Think Like a Machine - Chapter 1

Introduction

How Mathematicians Think

A mathematician and a physicist are in an old cabin in the woods. With them is their psychologist friend who is collecting data on how mathematicians and physicists think. The psychologist poses the following problem.

"Here's a kettle, a tap with running water, a box of matches, and a functioning stove. Your task is to boil a kettle full of water. How would you do it?"

The physicist pipes up first. "I'd fill the kettle with the water from the tap and light the stove with the match. We then wait, let's see...", and he proceeds to calculate the time it would take for the water to boil given the barometric pressure and humidity at their location.

The mathematician says, "Yes, I'd do the same thing."

The psychologist is a bit disappointed. This is a null result, and null results don't get published #@!. But then he thinks of an ingenious second experiment. He says, "Here's an already fully filled kettle on the stove. Your task is to boil a kettle full of water. How would you do it?"

Once again, the physicist is first to answer. "Are you joking? I'd light the stove and wait 3 minutes and 24 seconds (plus or minus 8 seconds) for the water to boil."

The mathematician is looking out the window, completely bored by this line of questioning. He sighs, "It's trivial...I'd just pour out the water, reducing the problem to the one that's already been solved."

How Machines Think

For his next experiment, the psychologist has brought along a computer, aka a calculating machine. It's the same problem as before -- get the kettle full of water boiling. How does the machine think through this problem?

Step 1: Define the Input(s)

The machine defines the input as the temperature of the tap water. This might strike you as strange, but roll with it for the moment. The input can take any value in a range, say from 5 to 100 degrees Celsius.

Step 2: Define the Output

There's usually only one output. In this case it's the temperature of the water in a full kettle. If the kettle is not full, the output is undefined. The world is a messy place.

Step 3: Define the Model

The model is just a fancy name for the process by which the input is transformed into the output. In this case, the series of steps that transform the input into the output are:

  • Fill the kettle
  • Light the match
  • Light the stove
  • Move the kettle onto the stove
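
As a rough sketch, the four steps can be written as a small Python function. The function name, the boolean flags, and the 100 degree boiling point are assumptions made for illustration and are not part of the original notebook. Each flag records whether a step has been carried out, and the water only reaches boiling when all of them have.

```python
BOILING_POINT_C = 100.0  # assumed boiling point, in degrees Celsius


def kettle_model(tap_temp_c, kettle_full, match_lit, stove_lit, kettle_on_stove):
    """Hypothetical model: turn the tap-water temperature into the kettle-water temperature.

    Returns None when the kettle is not full, because the output is
    only defined for a full kettle.
    """
    if not kettle_full:
        return None  # output undefined for a kettle that is not full
    if match_lit and stove_lit and kettle_on_stove:
        return BOILING_POINT_C  # every step was carried out: the water boils
    return tap_temp_c  # no heat reaches the water, so it stays at tap temperature


print(kettle_model(20.0, True, True, True, True))   # all steps done
print(kettle_model(20.0, True, True, False, True))  # stove never lit
```

The first call returns the boiling point and the second returns the unchanged tap temperature, which is all the "transformation" amounts to in this toy problem.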

Step 4: Define the Parameters of the Model

The model and its parameters are closely tied. If a model gives us the structure of the transformation, the parameters specify the amount or the extent to which the input has been transformed into the output. Think of parameters as the knobs you turn to increase or decrease the amount of transformation of the input. Parameters have values; as these values change, the output produced by the model changes.

In this case, the parameters and their allowed values are:


In [1]:
from tabulate import tabulate as tabl

# Each row: a parameter name followed by its two allowed values
params = [["Kettle Water Level", "Full", "Not Full"],
          ["Match", "Lit", "Not Lit"],
          ["Stove", "Lit", "Not Lit"],
          ["Kettle Position", "On Stove", "Not On Stove"]
         ]
headers = ["Parameter", "Value 1", "Value 2"]
print(tabl(params, headers))  # print() as a function, so this also runs on Python 3


Parameter           Value 1    Value 2
------------------  ---------  ------------
Kettle Water Level  Full       Not Full
Match               Lit        Not Lit
Stove               Lit        Not Lit
Kettle Position     On Stove   Not On Stove

Typically, a parameter can take on a whole range of values -- an infinite range is quite common. Even here, we've simplified the possible values of the Kettle Position parameter to On Stove and Not On Stove. You could instead have this parameter take on the three coordinates of the kettle's position in space. For the moment, we'll keep it simple.

Step 5: Define the Cost of Getting it Wrong

A machine goes about solving this problem by trying out a whole bunch of parameter values. To distinguish which parameter values are on the right track and which ones aren't, we need to define the cost of getting it wrong. In this case, if

  • Kettle Water Level = Full
  • Match = Lit
  • Stove = Lit
  • Kettle Position = On Stove

then the output is water boiling. We've got the parameters right. The cost of getting it wrong should now be 0.

Suppose we define the cost as:

Cost(param 1, param 2, param 3, param 4) = 100 - temperature of water in kettle

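Then the cost for our parameter values above is 0. For any other set of parameter values the water cannot boil: it stays at its tap temperature, so the cost is 100 minus that temperature, which is greater than 0 for any tap water below boiling (for example, 80 for 20 degree tap water).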
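
Here is a minimal Python sketch of this cost. The function names and the default tap temperature of 20 degrees are assumptions chosen for illustration. As a simplification, a kettle that is not full is treated as un-heated water rather than as an undefined output. Following the formula literally, water that stays at tap temperature has a cost of 100 minus that temperature.

```python
def water_temperature(level, match, stove, position, tap_temp_c=20.0):
    """Hypothetical model: the water boils (100 degrees) only if every parameter is right."""
    all_right = (level, match, stove, position) == ("Full", "Lit", "Lit", "On Stove")
    return 100.0 if all_right else tap_temp_c


def cost(level, match, stove, position, tap_temp_c=20.0):
    """Cost of getting it wrong: 100 minus the temperature of the water in the kettle."""
    return 100.0 - water_temperature(level, match, stove, position, tap_temp_c)


print(cost("Full", "Lit", "Lit", "On Stove"))      # 0.0: the water boils
print(cost("Full", "Lit", "Not Lit", "On Stove"))  # 80.0: water still at 20 degrees
```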

The aim of thinking like a machine (or, machine learning) is to find the set of parameter values that minimize the cost of getting it wrong.

Step 6: Pick an Iterative Method for Minimizing the Cost of Getting it Wrong

The brute force way of minimizing the cost of getting it wrong is to try out every possible combination of parameter values. In this case, with four parameters of two values each, there are only 2 x 2 x 2 x 2 = 16 combinations, so we can get away with brute force. But the number of combinations multiplies with every parameter added, so even in relatively simple problems the brute force method is not feasible. The method that's most commonly used is called Gradient Descent. We'll see the details in the next chapter.
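
For the kettle problem, brute force might look like the sketch below. The dictionary of allowed values mirrors the parameter table. The cost function and the assumed tap temperature of 20 degrees are stand-ins invented for this example.

```python
from itertools import product

# Allowed values for each parameter, in the order used by TARGET below
ALLOWED = {
    "Kettle Water Level": ["Full", "Not Full"],
    "Match": ["Lit", "Not Lit"],
    "Stove": ["Lit", "Not Lit"],
    "Kettle Position": ["On Stove", "Not On Stove"],
}
TARGET = ("Full", "Lit", "Lit", "On Stove")  # the only combination that boils the water


def cost(values, tap_temp_c=20.0):
    """Hypothetical cost: 100 minus the water temperature for one combination."""
    temperature = 100.0 if values == TARGET else tap_temp_c
    return 100.0 - temperature


# Try every combination of parameter values and keep the cheapest one
combos = list(product(*ALLOWED.values()))
best = min(combos, key=cost)

print(len(combos))       # 16 combinations in total
print(best, cost(best))  # the winning combination and its cost of 0.0
```

Adding one more two-valued parameter would double the number of combinations, which is why this approach stops being practical as problems grow.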

The iterative method starts with a particular set of parameter values and calculates the cost for that set. Based on the properties of the cost function at these values, it chooses the next set of parameter values; how that choice is made depends on the particular iterative method. The cost for the next set of parameter values is then calculated, and the process repeats. It ends when the minimum cost is reached.

The key point is that the method is iterative -- as more steps are taken, the parameter values come closer and closer to those values that will minimize the cost of getting it wrong.
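
To make the loop concrete, here is a small sketch of one iterative method, a basic gradient descent. The kettle's parameters are on/off switches, which gives a gradient nothing to work with, so this example assumes a made-up cost with a single continuous parameter whose minimum sits at 3. The step size, tolerance, and all names are illustrative assumptions.

```python
def cost(p):
    """Hypothetical cost with one continuous parameter; smallest at p = 3."""
    return (p - 3.0) ** 2


def gradient(p):
    """Slope of the cost at p; it tells us which direction is downhill."""
    return 2.0 * (p - 3.0)


def minimize(start, step_size=0.1, tolerance=1e-6, max_steps=1000):
    """Repeatedly step downhill until the parameter value stops changing."""
    p = start
    history = [cost(p)]  # cost at every step, to show it shrinking
    for _ in range(max_steps):
        new_p = p - step_size * gradient(p)  # choose the next parameter value
        history.append(cost(new_p))
        converged = abs(new_p - p) < tolerance
        p = new_p
        if converged:
            break
    return p, history


best_p, history = minimize(start=0.0)
print(round(best_p, 3))         # close to 3.0
print(history[0], history[-1])  # the cost falls from 9.0 to nearly 0
```

Each pass through the loop is one iteration: compute the cost, use the slope to pick the next value, and repeat until the change is negligible.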

Step 7: Implement the Iterative Method

Following steps 1 through 6 will lead the machine to the set of parameter values that minimize the cost function. This particular set of parameter values is the solution to the problem. By using this process, the machine is said to have "learned" the solution to the problem.

Exercise 1-1

You run a retail website. When a visitor buys an item, you recommend a list of other items that they may also be interested in. How would a machine think through this problem? What is the cost of getting it wrong in this case?

Summary

To think like a machine is to think of every problem as a (giant) optimization problem. The optimization problem is broken down into the following 7 steps.

  • Step 1: Define the input(s)
  • Step 2: Define the output
  • Step 3: Define the model
  • Step 4: Define the parameters of the model
  • Step 5: Define the cost of getting it wrong
  • Step 6: Pick an iterative method for minimizing the cost of getting it wrong
  • Step 7: Implement the iterative method

The final step is the one that machines make easy for us. That's where we harness the incredible computational power we have at our fingertips.

Being able to set problems up as optimization problems for a computing machine is the crux of machine learning. It also shows you that human intelligence -- how to set things up to make it possible for the machine to find the optimal answer -- is a critical part of machine learning.