In machine learning, a cost function, also known as a loss function, is a fundamental concept used to measure the performance of a machine learning model. It quantifies the error between predicted values and actual values, expressing it as a single real number.
For example, suppose we have a dataset of (house size, house price) pairs:
```python
import numpy as np

# house size in 1000s of sqft
x_house_size = np.array([1.0, 2.0, 3.0, 5.0])
# house price in 1000s of dollars
y_house_price = np.array([300.0, 500.0, 700.0, 1000.0])
```
We want to apply linear regression to create a model that predicts the house price given any house size. The formula for the linear regression model with one variable is

$$f_{w,b}(x) = wx + b$$
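As a minimal sketch, this model is a one-line function in Python (the name `predict` is introduced here for illustration; it is not part of the original code):

```python
def predict(x, w, b):
    # f_wb(x) = w * x + b, applied elementwise when x is a NumPy array
    return w * x + b
```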
We need to figure out the best choices for $w$ and $b$. By the best choice, we mean the straight line that fits the data best.
We can use the cost function to determine mathematically which choice is better. The equation for computing the cost with one variable is

$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$

where $m$ is the number of training examples.
The implementation of the cost function in Python is:
```python
# x: the prediction inputs; in our example, x_house_size
# y: the actual outcomes; in our example, y_house_price
# w: the slope/weight we choose
# b: the bias/intercept we choose
def compute_cost(x, y, w, b):
    # number of training examples
    m = x.shape[0]

    cost_sum = 0
    for i in range(m):
        # yi_hat is the predicted value for x[i]
        yi_hat = w * x[i] + b
        cost = (y[i] - yi_hat) ** 2
        cost_sum = cost_sum + cost
    total_cost = (1 / (2 * m)) * cost_sum
    return total_cost
```
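For reference, the same cost can be computed without an explicit loop by using NumPy vectorization. This is an equivalent alternative sketch, not a replacement for the implementation above:

```python
def compute_cost_vectorized(x, y, w, b):
    # elementwise prediction errors: f_wb(x_i) - y_i
    errors = (w * x + b) - y
    # sum of squared errors, divided by 2m per the cost function convention
    return np.sum(errors ** 2) / (2 * x.shape[0])
```

Both versions return identical results; the vectorized form is simply faster on large arrays.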
We can use the `compute_cost` function to compare the two choices {w=200, b=100} and {w=400, b=200}:
```python
# house size in 1000s of sqft
x_house_size = np.array([1.0, 2.0, 3.0, 5.0])
# house price in 1000s of dollars
y_house_price = np.array([300.0, 500.0, 700.0, 1000.0])

cost_1 = compute_cost(x_house_size, y_house_price, w=200, b=100)
print(f"cost_1 with w=200 and b=100: {cost_1}")

cost_2 = compute_cost(x_house_size, y_house_price, w=400, b=200)
print(f"cost_2 with w=400 and b=200: {cost_2}")
```
Since `cost_1` is smaller than `cost_2`, {w=200, b=100} is the better choice, which aligns with what we see in figure FigA above.
To learn more about the intuition behind the cost function, check out the Mean Squared Error (MSE) blog post below. The two functions are very similar, except that the cost function divides by $2m$ instead of $m$. The extra factor of 2 is a mathematical convenience, specifically related to the gradient descent optimization process.
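To see why, differentiate the cost: the $\frac{1}{2}$ cancels the 2 produced by the power rule, leaving clean gradient expressions for gradient descent to use:

$$\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$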