# Linear Regression Theory

## Overview

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable, “y”, and one or more explanatory (independent) variables. The case of one independent variable is called simple linear regression. For more than one independent variable, the process is called “multiple linear regression.” The compact (vectorized) form of the model is:

Y = W * X + b

Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine.

Linear regression models are often fitted using the least squares approach. They may also be fitted in other ways, such as by minimizing the “lack of fit” in some other norm, or by minimizing a penalized version of the least squares loss function, as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms “least squares” and “linear model” are closely linked, they are not synonymous.
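As a rough illustration of the least squares approach, the sketch below fits a simple linear regression by solving the normal equations and compares the result with R's built-in `lm()`. The six data points are the rows printed by `head(data)` later in this post; the names `fires`, `thefts`, and `beta_hat` are chosen here for illustration only.

```r
# First six rows of the fire/theft data shown later in this post
fires  <- c(6.2, 9.5, 10.5, 7.7, 8.6, 34.1)
thefts <- c(29, 44, 36, 37, 53, 68)

# Design matrix with a column of ones for the intercept b
X <- cbind(1, fires)

# Normal equations: solve (X'X) beta = X'y for beta = (b, W)
beta_hat <- solve(crossprod(X), crossprod(X, thefts))

# lm() minimizes the same sum of squared residuals
fit <- lm(thefts ~ fires)

print(drop(beta_hat))
print(coef(fit))
```

Both calls should report the same intercept and slope up to rounding, since they minimize the same criterion.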

## Assumptions

While linear regression seems nice and simple, there are a few assumptions that we have to make in order to apply a linear regression model.

The major assumptions made by standard linear regression models with standard estimation techniques are:

- Weak exogeneity: Predictor variables can be treated as fixed values rather than random variables.
- Linearity: The mean of the response variable is a linear combination of the parameters (regression coefficients) and the predictor variables.
- Homoscedasticity: Different values of the response variable have the same variance in their errors, regardless of the values of the predictor variables. In practice, this assumption is often violated.
- Independence of errors: The errors of the response variables are uncorrelated with each other.
- Lack of perfect multicollinearity in the predictors. For standard least squares estimation methods, the design matrix “X” must have full column rank; otherwise, we have a condition known as perfect multicollinearity in the predictor variables.
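The full column rank condition from the last assumption can be checked numerically. The following is a minimal sketch using base R's `qr()`; the matrices `X_bad` and `X_good` are hypothetical examples built for this illustration.

```r
# Hypothetical design matrix: x2 is an exact multiple of x1,
# so the two predictor columns are perfectly collinear
x1 <- c(6.2, 9.5, 10.5, 7.7, 8.6, 34.1)
x2 <- 2 * x1
X_bad  <- cbind(intercept = 1, x1, x2)
X_good <- cbind(intercept = 1, x1)

# qr()$rank gives the numerical column rank
rank_bad  <- qr(X_bad)$rank
rank_good <- qr(X_good)$rank

rank_bad < ncol(X_bad)     # TRUE: perfect multicollinearity
rank_good == ncol(X_good)  # TRUE: full column rank
```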

# Linear Regression in TensorFlow

In our example, we will use a data set which contains the number of fires in an area and the number of thefts in that area in Chicago. Our goal is to predict the number of thefts based on the number of fires.

Let’s look at the data set first to get an idea of what it contains.

`head(data)`

```
## X Y
## 1 6.2 29
## 2 9.5 44
## 3 10.5 36
## 4 7.7 37
## 5 8.6 53
## 6 34.1 68
```

`plot(data)`

Let’s standardize our data (zero mean and unit variance for each column) so that gradient descent converges more reliably.

```
data = scale(data)
data = as.data.frame(data)
```
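To make the standardization step concrete, here is a small sketch of what `scale()` does with its default arguments, using only the six rows printed by `head(data)` above. The helper `standardize` and the name `sample_rows` are introduced here for illustration.

```r
# First six rows from head(data) above
sample_rows <- data.frame(X = c(6.2, 9.5, 10.5, 7.7, 8.6, 34.1),
                          Y = c(29, 44, 36, 37, 53, 68))

# Manual z-score: subtract the column mean, divide by the sample sd
standardize <- function(v) (v - mean(v)) / sd(v)
manual <- as.data.frame(lapply(sample_rows, standardize))

# scale() with its defaults (center = TRUE, scale = TRUE) does the same
scaled <- as.data.frame(scale(sample_rows))

max(abs(as.matrix(manual) - as.matrix(scaled)))  # effectively zero
colMeans(scaled)                                 # approximately 0
sapply(scaled, sd)                               # 1 for both columns
```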

Let’s see what the scaled data looks like.

`plot(data)`

The goal of this tutorial is to try out different regression models (both simple linear and polynomial regressions). The models will be trained using the mean squared error.

The first approach to fit a model to the data will be a simple linear regression. Therefore, we assume that the relationship between the number of fires and thefts is linear.

The first thing we should do is create the placeholders (input and label):

```
X = tf$placeholder(tf$float32, name = "X")
Y = tf$placeholder(tf$float32, name = "Y")
```

Having created the placeholders, we should create the variables that will be updated during training.

```
W = tf$Variable(0.0, name = "weights")
b = tf$Variable(0.0, name = "bias")
```

Moreover, let’s create the initialization operation for the variables.

`init_op = tf$global_variables_initializer()`

After the definition of placeholders and the variables, we will construct the model to predict “Y.” Remember that the formula of this model should be:

Y = X * W + b

`pred = tf$add(tf$multiply(X, W),b)`

In order to train the model, we need to define a loss function that indicates how well our model performs and that is used to update the parameters (weights and bias).

`loss = tf$square(Y - pred, name = "loss")`
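The loss above is the squared error of a single example; averaging it over examples gives the mean squared error mentioned earlier. A plain R sketch of both quantities follows, where the function names and the three example values are hypothetical and used only for illustration.

```r
# Squared error for one example, mirroring tf$square(Y - pred)
squared_error <- function(y, pred) (y - pred)^2

# Mean squared error over a set of examples
mse <- function(y, pred) mean(squared_error(y, pred))

# Hypothetical values, for illustration only
y    <- c(1.0, 2.0, 3.0)
pred <- c(1.5, 2.0, 2.0)

squared_error(y, pred)  # 0.25 0.00 1.00
mse(y, pred)            # 0.4166667
```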

There are many optimization algorithms that help us minimize the loss. The most basic is the gradient descent algorithm, and today there are many algorithms that outperform it. In this tutorial, we will use the simplest, since the goal is not state-of-the-art performance but your comprehension of linear regression in TensorFlow.

`optimizer = tf$train$GradientDescentOptimizer(learning_rate = 0.001)$minimize(loss)`
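For intuition, the update that plain gradient descent performs on this model can be written out by hand. The sketch below is base R, not TensorFlow: it assumes the per-example loss `(y - pred)^2` with `pred = w * x + b`, and the function `sgd_step` and the sample values are hypothetical.

```r
# One stochastic gradient descent step for pred = w * x + b
# with the per-example loss (y - pred)^2
sgd_step <- function(w, b, x, y, learning_rate = 0.001) {
  err    <- y - (w * x + b)   # residual for this example
  grad_w <- -2 * x * err      # d(loss)/dw
  grad_b <- -2 * err          # d(loss)/db
  c(w = w - learning_rate * grad_w,
    b = b - learning_rate * grad_b)
}

# Hypothetical single example, starting from w = 0, b = 0
params <- sgd_step(w = 0, b = 0, x = 2, y = 1)
params  # w = 0.004, b = 0.002
```

Each parameter moves a small step against its gradient, so the squared error on that example shrinks slightly.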

Now that we have defined all the data types and operations, we will go ahead and train the model.

```
with(tf$Session() %as% sess, {
  sess$run(init_op) # Let's initialize the variables
  # Having initialized the parameters, let's start the training
  for (i in 1:100) { # run for 100 epochs
    for (j in 1:nrow(data)) { # go over all the training examples
      sess$run(optimizer, feed_dict = dict(X = data$X[j], Y = data$Y[j]))
    }
  }
  w_value = sess$run(W)
  b_value = sess$run(b)
  print(c(w_value, b_value))
})
```

`## [1] 0.552182734 -0.002773606`

Let’s plot it to see how it fits.

```
x = seq(-5, 5, 0.01)
y = w_value*x + b_value
plot(data)
lines(x, y, type="l")
```

We can see that it doesn’t fit very well. Let’s try a 2nd degree polynomial regression and see how it performs. The only difference is that we add a parameter that will be multiplied by the squared value of the independent variable.

```
U = tf$Variable(0.0, name = "quad_weights")
init_op = tf$global_variables_initializer()
```

`pred = tf$add(tf$multiply(X,tf$multiply(X,U)),tf$add(tf$multiply(X, W),b))`

`loss = tf$square(Y - pred, name = "loss")`

`optimizer = tf$train$GradientDescentOptimizer(learning_rate = 0.001)$minimize(loss)`

```
with(tf$Session() %as% sess, {
  sess$run(init_op) # Let's initialize the variables
  # Having initialized the parameters, let's start the training
  for (i in 1:100) { # run for 100 epochs
    for (j in 1:nrow(data)) { # go over all the training examples
      sess$run(optimizer, feed_dict = dict(X = data$X[j], Y = data$Y[j]))
    }
  }
  w_value = sess$run(W)
  b_value = sess$run(b)
  u_value = sess$run(U)
  print(c(w_value, u_value, b_value))
})
```

`## [1] 0.2931574 0.2319304 -0.2213118`

Let’s plot it to see how it fits. The fitted curve has the form:

Y = X^2*U + X*W + b

```
x = seq(-5,5,0.01)
y = u_value *x^2 + w_value*x + b_value
plot(data)
lines(x, y, type="l")
```
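As a cross-check on the gradient descent result, the same quadratic model can also be fitted in closed form with `lm()`. This sketch uses only the six rows printed by `head(data)`, so its coefficients will not match the values printed above for the full data set; the names `sample_rows`, `fit2`, and `curve_y` are illustrative.

```r
# First six rows from head(data), standardized as in the post
sample_rows <- as.data.frame(scale(data.frame(
  X = c(6.2, 9.5, 10.5, 7.7, 8.6, 34.1),
  Y = c(29, 44, 36, 37, 53, 68)
)))

# Closed-form least squares fit of Y = X^2*U + X*W + b
fit2 <- lm(Y ~ X + I(X^2), data = sample_rows)
coef(fit2)  # (Intercept) is b, X is W, I(X^2) is U

# Evaluate the fitted curve on the same grid used for plotting
grid <- data.frame(X = seq(-5, 5, 0.01))
curve_y <- predict(fit2, newdata = grid)
```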

It is getting close; let’s check out a 3rd degree polynomial.

The formula should be:

Y = X^3 * V + X^2*U + X*W + b

```
V = tf$Variable(0.0, name = "quad_3_weights")
init_op = tf$global_variables_initializer()
```

`pred = tf$add(tf$multiply(tf$pow(X,3),V),tf$add(tf$multiply(tf$pow(X,2),U),tf$add(tf$multiply(X,W),b)))`

`loss = tf$square(Y - pred, name = "loss")`

`optimizer = tf$train$GradientDescentOptimizer(learning_rate = 0.001)$minimize(loss)`

```
with(tf$Session() %as% sess, {
  sess$run(init_op) # Let's initialize the variables
  # Having initialized the parameters, let's start the training
  for (i in 1:100) { # run for 100 epochs
    for (j in 1:nrow(data)) { # go over all the training examples
      sess$run(optimizer, feed_dict = dict(X = data$X[j], Y = data$Y[j]))
    }
  }
  w_value = sess$run(W)
  b_value = sess$run(b)
  u_value = sess$run(U)
  v_value = sess$run(V)
  print(c(v_value, w_value, u_value, b_value))
})
```

`## [1] 0.5549336 -0.4178921 -0.7707534 0.1196232`

```
x = seq(-5,5,0.01)
y = v_value*x^3 + u_value *x^2 + w_value*x + b_value
plot(data)
lines(x, y, type="l")
```
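The three fits can also be compared numerically rather than by eye. The sketch below computes the training mean squared error of least squares polynomial fits of degree 1 to 3. It is an illustration under stated assumptions: it uses closed-form `lm()` fits instead of the gradient descent training above, and only the six rows printed by `head(data)`; the names `train_mse` and `mse_by_degree` are hypothetical.

```r
# First six rows from head(data), standardized as in the post
sample_rows <- as.data.frame(scale(data.frame(
  X = c(6.2, 9.5, 10.5, 7.7, 8.6, 34.1),
  Y = c(29, 44, 36, 37, 53, 68)
)))

# Training MSE of a least squares polynomial fit of a given degree
train_mse <- function(degree) {
  fit <- lm(Y ~ poly(X, degree, raw = TRUE), data = sample_rows)
  mean(residuals(fit)^2)
}

mse_by_degree <- sapply(1:3, train_mse)
names(mse_by_degree) <- paste0("degree_", 1:3)
mse_by_degree
```

For least squares fits, the training error cannot increase as the degree grows, because each model contains the previous one as a special case. A lower training error alone therefore says little about how the model behaves on new data.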

It seems to fit better. However, there is a danger of over-fitting. In the next post, we will go through some practical exercises on regression modeling.
