A mathematician and a physicist are in an old cabin in the woods. With them is their psychologist friend who is collecting data on how mathematicians and physicists think. The psychologist poses the following problem.
"Here's a kettle, a tap with running water, a box of matches, and a functioning stove. Your task is to boil a kettle full of water. How would you do it?"
The physicist pipes up first. "I'd fill the kettle with the water from the tap and light the stove with the match. We then wait, let's see...", and he proceeds to calculate the time it would take for the water to boil given the barometric pressure and humidity at their location.
The mathematician says, "Yes, I'd do the same thing."
The psychologist is a bit disappointed. This is a null result, and null results don't get published #@!. But then he thinks of an ingenious second experiment. He says, "Here's an already fully filled kettle on the stove. Your task is to boil a kettle full of water. How would you do it?"
Once again, the physicist is first to answer. "Are you joking? I'd light the stove and wait 3 minutes and 24 seconds (plus or minus 8 seconds) for the water to boil."
The mathematician is looking out the window, completely bored by this line of questioning. He sighs, "It's trivial...I'd just pour out the water, reducing the problem to the one that's already been solved."
A model is just a fancy way of referring to the process by which the input is transformed into the output. In this case, the series of steps that transform the input (a kettle full of cold water) into the output (a kettle full of boiling water) are:

1. Fill the kettle with water from the tap.
2. Light a match.
3. Light the stove with the match.
4. Place the kettle on the stove.
5. Wait for the water to boil.
The model and its parameters are closely tied. If the model gives us the structure of the transformation, the parameters specify the extent to which the input is transformed into the output. Think of parameters as the knobs you turn to increase or decrease the amount of transformation. Parameters have values; as these values change, the output produced by the model changes.
In this case, the parameters and their allowed values are:
In [1]:
from tabulate import tabulate as tabl

# Each parameter of the model and the two values we allow it to take
params = [["Kettle Water", "Full", "Not Full"],
          ["Match", "Lit", "Not Lit"],
          ["Stove", "Lit", "Not Lit"],
          ["Kettle Position", "On Stove", "Not On Stove"]]
headers = ["Parameter", "Value 1", "Value 2"]
print(tabl(params, headers))
Typically, parameters take on a range of values -- an infinite range is quite common. Even in this case, we've simplified the possible values of the Kettle Position parameter to On Stove and Not On Stove; you could instead have this parameter take on the three coordinates of the kettle's position in space. For the moment, we'll keep it simple.
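To make the model concrete, here's a minimal sketch of it as a Python function. The function name `water_temperature`, and the rule that the water reaches 100°C only when all four parameters have the right values (and otherwise sits at an assumed room temperature of 20°C), are illustrative assumptions, not part of the problem statement:

In [2]:
def water_temperature(kettle_water, match, stove, kettle_position):
    # Toy model: the water boils only when every knob is set correctly
    boils = (kettle_water == "Full" and match == "Lit" and
             stove == "Lit" and kettle_position == "On Stove")
    return 100 if boils else 20  # degrees Celsius; 20 is an assumed room temperature

print(water_temperature("Full", "Lit", "Lit", "On Stove"))     # 100
print(water_temperature("Full", "Lit", "Not Lit", "On Stove")) # 20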
A machine goes about solving this problem by trying out a whole bunch of parameter values. To distinguish the parameter values that are on the right track from those that aren't, we need to define the cost of getting it wrong. In this case, if the parameter values make the water boil, the cost should be zero; if they don't, it should be high.
Suppose we define the cost as:
Cost(Kettle Water, Match, Stove, Kettle Position) = 100 - (temperature of the water in the kettle, in °C)
Then the cost for the correct parameter values above is 0, because the water reaches 100°C. For any other combination of parameter values the cost stays well above zero -- 80, say, for water sitting at a room temperature of 20°C -- because the water never boils in those conditions.
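Written on top of the toy `water_temperature` model from the cell above (so the 20°C figure is again our assumption), the cost function becomes a one-liner:

In [3]:
def cost(kettle_water, match, stove, kettle_position):
    # Cost of getting it wrong: how far the water is from boiling (100 deg C)
    return 100 - water_temperature(kettle_water, match, stove, kettle_position)

print(cost("Full", "Lit", "Lit", "On Stove"))      # 0  -- the right settings
print(cost("Not Full", "Lit", "Lit", "On Stove"))  # 80 -- a wrong setting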
The aim of thinking like a machine (machine learning, in other words) is to find the set of parameter values that minimizes the cost of getting it wrong.
The brute force way of minimizing the cost of getting it wrong is to try out every possible combination of parameter values. In this case, with only 2 × 2 × 2 × 2 = 16 combinations, we can get away with brute force. But in even relatively simple problems the number of combinations explodes, and the brute force method becomes infeasible. The method that's most commonly used instead is called Gradient Descent; we'll see the details in the next chapter.
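Here's what that brute-force search looks like in code, reusing the `cost` function from the cell above -- it simply tries all 16 combinations and keeps the cheapest:

In [4]:
from itertools import product

# The two allowed values for each of the four parameters
kettle_water_values    = ["Full", "Not Full"]
match_values           = ["Lit", "Not Lit"]
stove_values           = ["Lit", "Not Lit"]
kettle_position_values = ["On Stove", "Not On Stove"]

# Try every combination and keep the one with the lowest cost
best = min(product(kettle_water_values, match_values,
                   stove_values, kettle_position_values),
           key=lambda combo: cost(*combo))
print(best)  # ('Full', 'Lit', 'Lit', 'On Stove')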
An iterative method starts with a particular set of parameter values and calculates the cost for that set. Based on the properties of the cost function at those values, the next set of parameter values is chosen -- exactly how this choice is made is what distinguishes one iterative method from another. The cost for the new set of values is then calculated, and the process repeats, moving from one set of parameter values to the next. The process ends when the cost reaches (or comes close enough to) its minimum value.
The key point is that the method is iterative -- as more steps are taken, the parameter values come closer and closer to those values that will minimize the cost of getting it wrong.
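Our kettle parameters take discrete values, so there's no gradient to follow there; but here's a minimal sketch of the iterative idea on a smooth stand-in cost function, cost(x) = (x - 3)^2, whose minimum we know sits at x = 3. The starting point and the step size of 0.1 are arbitrary choices for illustration:

In [5]:
def smooth_cost(x):
    return (x - 3) ** 2            # a stand-in cost with its minimum at x = 3

def gradient(x):
    return 2 * (x - 3)             # slope of the cost at x

x = 0.0                            # start from an arbitrary guess
for step in range(50):
    x = x - 0.1 * gradient(x)      # step against the slope to lower the cost

print(round(x, 4), round(smooth_cost(x), 8))  # x is approx. 3, cost approx. 0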
To think like a machine is to think of every problem as a (giant) optimization problem. The optimization problem is broken down into the following seven steps:

1. Define the input.
2. Define the output.
3. Define the model -- the process that transforms the input into the output.
4. Define the parameters of the model and the values they are allowed to take.
5. Define the cost of getting it wrong.
6. Choose a method for minimizing the cost.
7. Let the machine grind through the method's calculations until the cost is minimized.

Following steps 1 through 6 sets the problem up; step 7 then leads the machine to the set of parameter values that minimize the cost function. That particular set of parameter values is the solution to the problem, and by arriving at it the machine is said to have "learned" the solution. The final step is also what machines make so easy for us to do -- that's where we harness the incredible computational power we have at our fingertips.
Being able to set problems up as optimization problems for a computing machine is the crux of machine learning. It also shows you that human intelligence -- how to set things up to make it possible for the machine to find the optimal answer -- is a critical part of machine learning.