Theano

Tutorial

Adding two Scalars

import numpy
import theano.tensor as T
from theano import function
x = T.dscalar('x')
y = T.dscalar('y')
z = x + y
f = function([x,y],z)
f(1,2)
numpy.allclose(z.eval({x:1, y:2}), 3)
numpy.allclose returns True if two arrays are element-wise equal within a tolerance.
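
A quick illustration with arbitrary values (not from the tutorial):

numpy.allclose([1.0, 2.0], [1.0, 2.0 + 1e-9])  # True: the difference is within the default tolerance
numpy.allclose([1.0, 2.0], [1.0, 2.1])         # False: 0.1 is far outside the tolerance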

QUESTION: What exactly is a tensor? What does numpy.allclose do?

Adding two Matrices

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x,y],z)
f([[1,2], [3,4]], [[10,20], [30,40]])
f(numpy.array([[1,2], [3,4]]), numpy.array([[10,20], [30,40]]))
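
Either way the result is the elementwise sum:

array([[ 11.,  22.],
       [ 33.,  44.]])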

Logistic Function

x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
s2 = (1 + T.tanh(x / 2)) / 2

The logistic function is applied elementwise because all of its operations (negation, exponentiation, addition, and division) are themselves elementwise operations.

s is equivalent to s2.
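
A quick numerical check (a sketch reusing x, s, s2, function, and numpy from the snippets above; the test matrix is arbitrary):

logistic = function([x], s)
logistic2 = function([x], s2)
mat = [[0, 1], [-1, -2]]
numpy.allclose(logistic(mat), logistic2(mat))  # -> True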

QUESTION: What is e here? QUESTION: Is T.dmatrix necessarily two-dimensional?

Setting a Default Value for an Argument

This makes use of the In class, which allows you to specify properties of your function’s parameters in greater detail.

from theano import In
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, In(y, value=1)], z)
f(33)
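
For reference, the behaviour with and without the default:

f(33)     # -> array(34.0): y falls back to its default value of 1
f(33, 2)  # -> array(35.0): the default can still be overridden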

Using Shared Variables

from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])

These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions. The value can be accessed and modified by the .get_value() and .set_value() methods.
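
Continuing with the accumulator defined above, the state can be read, advanced through the function, and reset directly:

state.get_value()    # -> array(0)
accumulator(1)       # returns the old state, array(0); the update happens afterwards
state.get_value()    # -> array(1)
accumulator(300)     # -> array(1)
state.get_value()    # -> array(301)
state.set_value(-1)  # overwrite the state directly
accumulator(3)       # -> array(-1)
state.get_value()    # -> array(2)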

The other new thing in this code is the updates parameter of function. updates must be supplied with a list of pairs of the form (shared-variable, new expression).

The updates mechanism can be a syntactic convenience, but it is mainly there for efficiency. Updates to shared variables can sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix updates). Also, Theano has more control over where and how shared variables are allocated, which is one of the important elements of getting good performance on the GPU.

givens

It may happen that you expressed some formula using a shared variable, but you do not want to use its value. In this case, you can use the givens parameter of function which replaces a particular node in a graph for the purpose of one particular function.

>>> fn_of_state = state * 2 + inc
>>> # The type of foo must match the shared variable we are replacing
>>> # with the ``givens``
>>> foo = T.scalar(dtype=state.dtype)
>>> skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])
>>> skip_shared(1, 3)  # we're using 3 for the state, not state.value
array(7)
>>> print(state.get_value())  # old state still there, but we didn't use it
0

The givens parameter can be used to replace any symbolic variable, not just a shared variable: you can replace constants and expressions in general. Be careful, though, not to let the expressions introduced by a givens substitution be co-dependent; the order of substitution is not defined, so the substitutions have to work in any order.

In practice, a good way of thinking about givens is as a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.
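
As a sketch of that more general use (the names a and expr are illustrative, not from the tutorial), givens can also substitute a constant for an ordinary symbolic input:

a = T.dscalar('a')
expr = a * 10
g = function([], expr, givens=[(a, T.constant(2.0))])  # a never has to be supplied
g()  # -> array(20.0)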

QUESTION: I don’t understand why givens is needed.

Note

The broadcast pattern of a Theano shared variable defaults to False for each dimension. A shared variable’s size can change over time, so we can’t use its shape to infer the broadcastable pattern. If you want a different pattern, just pass it as a parameter: theano.shared(…, broadcastable=(True, False))
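
A minimal sketch of what the pattern controls (the shapes here are illustrative assumptions):

import numpy
import theano
import theano.tensor as T
row = theano.shared(numpy.zeros((1, 5)), broadcastable=(True, False))
m = T.dmatrix('m')
add_row = theano.function([m], m + row)
add_row(numpy.ones((3, 5)))  # works: dimension 0 of row is declared broadcastable, so it is repeated over all 3 rows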

QUESTION: What is this about?

Copy functions

We can use copy() to create a similar accumulator but with its own internal state using the swap parameter, which is a dictionary of shared variables to exchange:

import theano
import theano.tensor as T
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = theano.function([inc], state, updates=[(state, state+inc)], on_unused_input='warn')
new_state = theano.shared(0)
new_accumulator = accumulator.copy(swap={state:new_state})
null_accumulator = accumulator.copy(delete_updates=True)

Because the last copy uses delete_updates=True, the argument inc becomes unused; the copy only succeeds because on_unused_input='warn' was set.
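
A short usage sketch of the two copies (the values follow from the definitions above, where state and new_state both start at 0):

new_accumulator(100)
print(new_state.get_value())  # -> 100: only the swapped-in shared variable changed
print(state.get_value())      # -> 0: the original state is untouched
null_accumulator(9000)        # updates were deleted, so nothing changes (inc is unused)
print(state.get_value())      # -> 0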

Using Random Numbers

Theano will allocate a NumPy RandomStream object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream.

Brief Example

from theano.tensor.shared_randomstreams import RandomStreams
from theano import function
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2,2))
rv_n = srng.normal((2,2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True)
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

rv_u draws from a uniform distribution, and rv_n from a normal distribution.

When we add the extra argument no_default_updates=True to function (as in g), then the random number generator state is not affected by calling the returned function. So, for example, calling g multiple times will return the same numbers.

An important remark is that a random variable is drawn at most once during any single function execution. So the nearly_zeros function is guaranteed to return approximately 0 (except for rounding error) even though the rv_u random variable appears three times in the output expression.
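
For reference, the behaviour described above looks like this when the compiled functions are called:

f_val0 = f()
f_val1 = f()    # different numbers from f_val0
g_val0 = g()    # different numbers from f_val0 and f_val1
g_val1 = g()    # same numbers as g_val0, because g does not update its generator
nearly_zeros()  # approximately a 2x2 matrix of zeros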

Sharing Streams Between Functions

state_after_v0 = rv_u.rng.get_value().get_state()
nearly_zeros() # this affects rv_u's generator
v1 = f()
rng = rv_u.rng.get_value(borrow=True)
rng.set_state(state_after_v0)
rv_u.rng.set_value(rng, borrow=True)
v2 = f() # v2 != v1
v3 = f() # v3 == v1

So the state of a random stream can be saved and replayed.

A Real Example: Logistic Regression

import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400                                   # training sample size
feats = 784                               # number of input variables

# generate a dataset: D = (input_values, target_class)
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.dmatrix("x")
y = T.dvector("y")

# initialize the weight vector w randomly
#
# this and the following bias variable b
# are shared so they keep their values
# between training iterations (updates)
w = theano.shared(rng.randn(feats), name="w")

# initialize the bias term
b = theano.shared(0., name="b")

print("Initial model:")
print(w.get_value())
print(b.get_value())

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to minimize
gw, gb = T.grad(cost, [w, b])               # Compute the gradient of the cost
                                            # w.r.t. the weight vector w and
                                            # the bias term b
                                            # (we shall return to this in a
                                            # following section of this tutorial)

# Compile
train = theano.function(
                  inputs=[x,y],
                  outputs=[prediction, xent],
                  updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print("Final model:")
print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))
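
As a small addition (not in the original code), a quick check of training accuracy:

print((predict(D[0]) == D[1]).mean())  # fraction of correct predictions on the training set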

QUESTION: I don’t understand the steps; this is a rare complete code example, so study it carefully. QUESTION: I don’t quite understand the bias term.