# Linear Regression with Multiple Variables in Tensorflow

In Lecture 4.1, Linear Regression with Multiple Variables, Andrew Ng shows how to generalize linear regression with a single variable to the case of multiple variables. He introduces a bit of notation to derive a more succinct formulation of the problem. Namely, the $n$ features $x_1, \ldots, x_n$ are extended by adding a feature $x_0$ that is always set to 1. This way the hypothesis can be expressed as:

$h_{\theta}(x) = \theta_{0} x_0 + \theta_{1} x_1 + \cdots + \theta_{n} x_n = \theta^T x$
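As a quick numeric check of this identity, here is a minimal sketch in plain Python. The values of `theta` and `features` are made up for illustration, and the helper name `hypothesis` is hypothetical rather than anything from the lecture.

```python
def hypothesis(theta, x):
    """Return theta^T x, where x already includes the constant feature x_0 = 1."""
    return sum(t * x_j for t, x_j in zip(theta, x))


theta = [1.0, 2.0, 3.0]   # theta_0, theta_1, theta_2 (illustrative values)
features = [4.0, 5.0]     # x_1, x_2
x = [1.0] + features      # prepend x_0 = 1

h = hypothesis(theta, x)  # 1*1 + 2*4 + 3*5
print(h)
```

Prepending the constant 1 is what lets the intercept $\theta_0$ be handled by the same dot product as the other parameters.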

For $m$ examples, the task of linear regression can be expressed as a task of finding vector $\theta$ such that

$\left[ \begin{array}{cccc} \theta_0 & \theta_1 & \cdots & \theta_n \end{array} \right] \times \left[ \begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x^{(1)}_1 & x^{(2)}_1 & \cdots & x^{(m)}_1 \\ \vdots & \vdots & & \vdots \\ x^{(1)}_n & x^{(2)}_n & \cdots & x^{(m)}_n \\ \end{array} \right]$

is as close as possible to some observed values $y_1, y_2, \cdots, y_m$. Here “as close as possible” typically means that the mean of the squared errors between $h_{\theta}(x^{(i)})$ and $y_i$ for $i \in [1, m]$ is minimized. This quantity is often referred to as the cost or loss function:

$J(\theta) = \dfrac{1}{2 m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y_i\right)^2$
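The cost function can be evaluated by hand on a tiny dataset. The following is a minimal sketch in plain Python; the data and the function name `cost` are illustrative assumptions, and each example is stored as a row that already starts with $x_0 = 1$.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors over the m examples in X."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(t * v for t, v in zip(theta, x_i))
        total += (prediction - y_i) ** 2
    return total / (2 * m)


# Three examples generated by y = 1 + 2 * x_1 (made-up data).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

perfect = cost([1.0, 2.0], X, y)  # theta that generated the data
off = cost([0.0, 2.0], X, y)      # every prediction is too low by 1
print(perfect, off)
```

With the parameters that generated the data the cost is zero; shifting the intercept by 1 gives three squared errors of 1 each, so the cost is $3 / (2 \cdot 3)$.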

To express the above concepts in Tensorflow and, more importantly, have Tensorflow find the $\theta$ that minimizes the cost function, we need to make a few adjustments. We rename the weights $\theta_1, \ldots, \theta_n$ to a vector w. We do not use $x_0 = 1$. Instead, we use a tensor of rank 0 (also known as a scalar), called b, to represent the intercept $\theta_0$. As it is easier to stack rows than columns, we form the matrix $X$ in such a way that the i-th row is the i-th sample. Our formulation thus has the form

$h_{w,b}(X) = \left[ \begin{array}{ccc} \text{---} & (x^{(1)})^T & \text{---} \\ \text{---} & (x^{(2)})^T & \text{---} \\ & \vdots & \\ \text{---} & (x^{(m)})^T & \text{---} \end{array} \right] \times \left[ \begin{array}{c} w_1 \\ w_2 \\ \vdots \\ w_n \end{array} \right] + b$
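The row-stacked form with a separate scalar b gives the same predictions as the $\theta^T x$ form with $x_0 = 1$. Here is a minimal sketch in plain Python that checks this on made-up numbers; the function names `predict` and `predict_theta` are hypothetical.

```python
def predict(X, w, b):
    """Compute X w + b, with X given as a list of rows (one row per example)."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) + b for row in X]


def predict_theta(X, theta):
    """Same model in the theta notation: prepend x_0 = 1 to every row."""
    return [sum(t * v for t, v in zip(theta, [1.0] + row)) for row in X]


X = [[4.0, 5.0], [6.0, 7.0]]  # m = 2 examples, n = 2 features (made-up data)
w = [2.0, 3.0]
b = 1.0
theta = [b] + w               # theta_0 = b, then theta_1..theta_n = w

predictions = predict(X, w, b)
print(predictions, predict_theta(X, theta))
```

Note that w has one entry per feature, so its length is $n$, while the result has one entry per example, so its length is $m$.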

This leads to the following Python code:

    X_in = tf.placeholder(tf.float32, [None, n_features], "X_in")
    w = tf.Variable(tf.random_normal([n_features, 1]), name="w")
    b = tf.Variable(tf.constant(0.1, shape=[]), name="b")
    h = tf.add(tf.matmul(X_in, w), b)


We first introduce a tf.placeholder named X_in. This is how we supply data to our model. Line 2 creates a vector w corresponding to $\theta_1, \ldots, \theta_n$. Line 3 creates a variable b corresponding to $\theta_0$. Finally, line 4 expresses the function h as a matrix multiplication of X_in and w plus the scalar b.

    y_in = tf.placeholder(tf.float32, [None, 1], "y_in")
    loss_op = tf.reduce_mean(tf.square(tf.subtract(y_in, h)),
                             name="loss")
    train_op = tf.train.GradientDescentOptimizer(0.3).minimize(loss_op)


To define the loss function, we introduce another placeholder, y_in. It holds the ideal (or target) values for the function h. Next we create loss_op, which corresponds to the loss function. The difference is that, rather than being a function directly, it tells Tensorflow which operations need to be run to compute the loss. Unlike $J(\theta)$, it omits the constant factor $\frac{1}{2}$, which does not change where the minimum lies. Finally, the training operation uses a gradient descent optimizer with a learning rate of 0.3 and tries to minimize the loss.
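To make the training operation less of a black box, the update that plain gradient descent performs on this loss can be written out by hand. This is a sketch derived from the loss as the code defines it (the mean of the squared errors, without the factor $\frac{1}{2}$). The symbols $L$ and $\alpha$ (the learning rate, here 0.3) are notation introduced only for this illustration:

$L(w, b) = \dfrac{1}{m} \sum_{i=1}^{m} \left( y_i - h_{w,b}(x^{(i)}) \right)^2$

$\dfrac{\partial L}{\partial w_j} = \dfrac{2}{m} \sum_{i=1}^{m} \left( h_{w,b}(x^{(i)}) - y_i \right) x^{(i)}_j, \qquad \dfrac{\partial L}{\partial b} = \dfrac{2}{m} \sum_{i=1}^{m} \left( h_{w,b}(x^{(i)}) - y_i \right)$

$w_j \leftarrow w_j - \alpha \dfrac{\partial L}{\partial w_j}, \qquad b \leftarrow b - \alpha \dfrac{\partial L}{\partial b}$

Tensorflow derives these gradients automatically from the graph, so they never have to be written down in the model code.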

Now we have all pieces in place to create a loop that finds w and b that minimize the loss function.

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch in range(1000):
            sess.run(train_op, feed_dict={
                X_in: X_true,
                y_in: y_true
            })
        w_computed = sess.run(w)
        b_computed = sess.run(b)


In line 1 we create a session that is going to run the operations we created before. First we initialize all global variables. In lines 3-7 we repeatedly run the training operation. Each run computes the value of h based on X_in, then computes the current loss based on h and y_in. Tensorflow uses the data flow graph to compute the derivatives of the loss function with respect to every variable in the computational graph, and automatically adjusts those variables using the specified learning rate of 0.3. Once the desired number of steps has been completed, we record the final values of the vector w and the scalar b computed by Tensorflow.
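For readers without a Tensorflow installation, the same loop can be imitated in plain Python. This is only a sketch of what one training step amounts to, not how Tensorflow implements it. The synthetic data, the seed, and the values of `w_true` and `b_true` are assumptions chosen for illustration, and the gradients are written by hand instead of being derived from a graph.

```python
import random

rng = random.Random(0)
m, n = 100, 2              # number of examples and features (illustrative)
w_true = [2.0, -3.0]       # made-up target weights, not the post's values
b_true = 0.5

# Noise-free synthetic data: y = X w_true + b_true.
X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(x_j * w_j for x_j, w_j in zip(row, w_true)) + b_true for row in X]

w = [0.0] * n
b = 0.0
learning_rate = 0.3

for step in range(1000):
    # Forward pass: residual h - y for every example.
    errors = [sum(x_j * w_j for x_j, w_j in zip(row, w)) + b - y_i
              for row, y_i in zip(X, y)]
    # Gradients of mean((y - h)^2) with respect to w and b.
    grad_w = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
              for j in range(n)]
    grad_b = 2.0 / m * sum(errors)
    # Gradient descent update.
    w = [w_j - learning_rate * g for w_j, g in zip(w, grad_w)]
    b -= learning_rate * grad_b

print("w computed", w)
print("b computed", b)
```

Because the features here are drawn with unit variance, a learning rate of 0.3 converges quickly; with badly scaled features the same rate could diverge.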

To see how well Tensorflow did, we print the final version of computed variables. We compare them with ideal values (which for the purpose of this exercise were initialized to random values):

    print("w computed [%s]" % ', '.join(['%.5f' % x for x in w_computed.flatten()]))
    print("w actual   [%s]" % ', '.join(['%.5f' % x for x in w_true.flatten()]))
    print("b computed %.3f" % b_computed)
    print("b actual  %.3f" % b_true[0])

    w computed [5.48375, 90.52216, 48.28834, 38.46674]
    w actual   [5.48446, 90.52165, 48.28952, 38.46534]
    b computed -9.326
    b actual  -9.331


### Resources

You can download the Jupyter notebook with the above code from a GitHub linear regression repository.