Cost Function in Single-Variable Linear Regression

CSPeach · Jan 21, 2024

In machine learning, a cost function, also known as a loss function, is a fundamental concept used to measure the performance of a model. It quantifies the error between predicted values and actual values and expresses it as a single real number.

For example, suppose we have a dataset of (house size, house price) pairs:

import numpy as np

# house size in 1000s of sqft
x_house_size = np.array([1.0, 2.0, 3.0, 5.0])

# house price in 1000s of dollars
y_house_price = np.array([300.0, 500.0, 700.0, 1000.0])

We want to apply linear regression to build a model that predicts the house price for any given house size. The linear regression model with one variable is

y_hat = w * x + b

where w is the weight/slope and b is the bias/intercept.
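As a quick illustration (my own minimal sketch, not from the original post), the model is a one-line Python function:

def predict(x, w, b):
    # straight-line model: y_hat = w * x + b
    return w * x + b

# e.g. with w=200 and b=100, a 1.0 (1,000 sqft) house
# is predicted at 300 (in 1,000s of dollars)
print(predict(1.0, w=200, b=100))  # 300.0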

We need to figure out the best choices of w and b. By best choice, we mean the values that produce the straight line that fits the data best.

[FigA: comparison between two different choices of w and b]

We can use the cost function to determine mathematically which choice is better. The equation for computing the cost with one variable is

J(w, b) = (1 / (2m)) * Σ (yi - yi_hat)²,  summed over i = 1, …, m

where m is the number of training examples, yi is the actual outcome, and yi_hat is the prediction.

The implementation of the cost function in Python is:

# x: the prediction inputs; in our example, x_house_size
# y: the actual outcomes; in our example, y_house_price
# w: the slope/weight we choose
# b: the bias/intercept we choose
def compute_cost(x, y, w, b):
    # number of training examples
    m = x.shape[0]

    cost_sum = 0
    for i in range(m):
        # yi_hat is the predicted value for x[i]
        yi_hat = w * x[i] + b
        cost = (y[i] - yi_hat) ** 2
        cost_sum = cost_sum + cost
    total_cost = (1 / (2 * m)) * cost_sum

    return total_cost
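For reference, here is an equivalent vectorized version using NumPy (my own sketch, not part of the original post); it computes the same quantity without an explicit loop:

def compute_cost_vectorized(x, y, w, b):
    # predictions for all examples at once
    y_hat = w * x + b
    # sum of squared errors, divided by 2m to match compute_cost above
    return np.sum((y - y_hat) ** 2) / (2 * x.shape[0])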

We can use the compute_cost function to compare the two choices {w=200, b=100} and {w=400, b=200}:

# house size in 1000s of sqft
x_house_size = np.array([1.0, 2.0, 3.0, 5.0])

# house price in 1000s of dollars
y_house_price = np.array([300.0, 500.0, 700.0, 1000.0])

cost_1 = compute_cost(x_house_size, y_house_price, w=200, b=100)
print(f"cost_1 with w=200 and b=100: {cost_1}")

cost_2 = compute_cost(x_house_size, y_house_price, w=400, b=200)
print(f"cost_2 with w=400 and b=200: {cost_2}")
The output is:

cost_1 with w=200 and b=100: 1250.0
cost_2 with w=400 and b=200: 283750.0

Since cost_1 is smaller than cost_2, {w=200, b=100} is the better choice, which aligns with what we see in FigA above.
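Going one step further (a sketch under my own assumptions, not part of the original post), we could brute-force a small grid of candidate w and b values and keep the pair with the lowest cost:

best = None
for w in range(100, 301, 50):  # candidate slopes: 100, 150, ..., 300
    for b in range(0, 201, 50):  # candidate intercepts: 0, 50, ..., 200
        c = compute_cost(x_house_size, y_house_price, w, b)
        if best is None or c < best[0]:
            best = (c, w, b)

print(f"lowest cost {best[0]} at w={best[1]}, b={best[2]}")

In practice, gradient descent replaces this exhaustive search, but the goal of minimizing the cost is the same.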

To learn more about the intuition behind the cost function, check out the Mean Squared Error (MSE) blog post below. The two functions are very similar, except that the cost function divides by 2m instead of m. This is purely for mathematical convenience: the extra factor of 2 that appears when differentiating the squared term cancels out during gradient descent.
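To make that cancellation concrete (my own sketch, not from the original post), here is what the resulting gradient looks like in code; differentiating the squared term brings down a factor of 2 that cancels the 1/2 in 1/(2m):

def compute_gradient(x, y, w, b):
    # partial derivatives of the cost above;
    # the 2 from squaring cancels the 1/2 in 1/(2m)
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        err = (w * x[i] + b) - y[i]  # prediction error for example i
        dj_dw += err * x[i]
        dj_db += err
    return dj_dw / m, dj_db / m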
