Machine Learning Begins

Definition


Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed.

Arthur Samuel (1959)


A computer is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Tom Mitchell (1998)


Source: [Coursera's "What is Machine Learning?" lecture](https://pt.coursera.org/learn/machine-learning/lecture/Ujm7v/what-is-machine-learning)
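
Mitchell's definition can be made concrete with a toy experiment. The sketch below is only an illustration under stated assumptions: the "true rule" (label = x > 0.5), the 1-nearest-neighbor learner, and all function names are made up for this example. The task T is labeling points, the performance measure P is accuracy on a fixed test set, and the experience E is the number of labeled examples the learner has seen.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n points in [0, 1); the (hypothetical) true rule is label = x > 0.5."""
    x = rng.random(n)
    return x, (x > 0.5).astype(int)

def predict_1nn(x_train, y_train, x_new):
    """Task T: give each new point the label of its nearest training point."""
    nearest = np.abs(x_new[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def accuracy(y_true, y_pred):
    """Performance measure P: fraction of correct predictions."""
    return float((y_true == y_pred).mean())

x_test, y_test = make_data(1000)   # fixed test set used to measure P
x_pool, y_pool = make_data(500)    # pool of labeled examples (experience E)

scores = {}
for n in (1, 10, 100, 500):
    # The learner only sees the first n examples of the pool.
    scores[n] = accuracy(y_test, predict_1nn(x_pool[:n], y_pool[:n], x_test))
    print(f"E = {n:3d} examples -> P = {scores[n]:.3f}")
```

With a single example the learner can only predict one label for everything; as E grows, P should move towards 1.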

Hype

Skill set

Data Science Metromap

Typical kinds of resources we find on the web (and our reactions)

How neural network layers work internally


Regularized linear regression formula
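
As a reference point, one common way to write the regularized (ridge) linear regression cost function is shown below. The notation follows the convention used in the Coursera course cited earlier, which is an assumption here rather than something stated on this slide.

$$
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
$$

Here $m$ is the number of training examples, $h_\theta(x) = \theta^{T}x$ is the prediction, and $\lambda \ge 0$ controls how strongly large coefficients are penalized. The intercept $\theta_0$ is conventionally left out of the penalty.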

Typical concerns

But I won't manage to master everything on the Data Science Metromap!


But I don't understand those mathematical formulas! I'm not a statistician!


But I can't understand exactly how the internals work!


But I don't have Big Data!

Machine Learning Pyramid

(from the Computer Science point of view)

Baby steps

We can start by just understanding how different algorithms work, and when and how to use them to solve problems!

Model

  • Simple workflow (generic example)

  • Supervised learning workflow (example with an image as input)
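
A supervised workflow with an image as input can be sketched end to end in a few lines. This is a minimal illustration, not the workflow from the slide's figure: the 8x8 "images" are synthetic, and the nearest-centroid classifier is a deliberately simple stand-in for a real model.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Labeled data: 200 synthetic 8x8 grayscale "images" (hypothetical stand-ins).
#    Class 0 images are dark on average, class 1 images are bright.
labels = rng.integers(0, 2, size=200)
images = rng.normal(loc=0.3 + 0.4 * labels[:, None, None], scale=0.1, size=(200, 8, 8))

# 2. Feature extraction: flatten each image into a vector of 64 pixel values.
features = images.reshape(len(images), -1)

# 3. Split into a training set and a held-out test set.
x_train, y_train = features[:150], labels[:150]
x_test, y_test = features[150:], labels[150:]

# 4. Training: the "model" is just one mean vector (centroid) per class.
centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])

# 5. Prediction: assign each image to the class with the closest centroid.
def predict(x):
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# 6. Evaluation on images the model has never seen.
test_accuracy = float((predict(x_test) == y_test).mean())
print(f"test accuracy: {test_accuracy:.2f}")
```

The same six steps (labeled data, features, split, train, predict, evaluate) apply when the centroid model is swapped for a real algorithm from an ML library.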

Steps to build a great model


Some extra concerns

  • Bias
  • Underfitting
  • Overfitting
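
Underfitting and overfitting are easier to see with numbers. The sketch below assumes a made-up "true" curve, y = sin(2πx), observed with noise, and fits polynomials of increasing degree with NumPy. Comparing the error on the training data with the error on unseen data is the usual way to spot both problems.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Noisy observations of a hypothetical true curve y = sin(2*pi*x)."""
    x = rng.random(n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = sample(15)    # small training set
x_test, y_test = sample(200)     # unseen data from the same curve

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse[degree] = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse[degree] = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree}: train MSE = {train_mse[degree]:.3f}, "
          f"test MSE = {test_mse[degree]:.3f}")
```

A straight line (degree 1) is too simple for a sine wave, so it does poorly on both sets: that is underfitting. A high-degree polynomial always gets a lower training error than the line, but if its test error is much worse than its training error, it has started to memorize the noise: that is overfitting.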

Types of Algorithms


Machine Learning Types

  • Supervised Learning (predictive modeling, labeled data)
    • Classification (Diagnose Alzheimer's as positive or negative from an image)
    • Regression (Predict population growth)
  • Unsupervised Learning (descriptive modeling, unlabeled data)
    • Clustering (Customer segmentation)
    • Association (Product recommendation)
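
To contrast with supervised learning, here is a minimal clustering sketch for the customer segmentation case. Everything in it is an assumption made for illustration: the two customer features, the synthetic data, and the simple farthest-point initialization. The key point is that no labels are passed to the algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical customers described by (visits per month, average basket value).
# No labels are given: the groups have to be discovered from the data alone.
occasional = rng.normal(loc=[2.0, 20.0], scale=[0.5, 3.0], size=(50, 2))
frequent = rng.normal(loc=[12.0, 80.0], scale=[1.5, 5.0], size=(50, 2))
customers = np.vstack([occasional, frequent])

def kmeans(points, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) with a simple farthest-point start."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Pick the point farthest from the centers chosen so far.
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignment = dist.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        centers = np.array([points[assignment == j].mean(axis=0) for j in range(k)])
    return assignment, centers

segments, centers = kmeans(customers, k=2)
print("segment sizes:", np.bincount(segments, minlength=2))
print("segment centers:\n", centers.round(1))
```

This sketch does not scale the features or handle empty clusters, both of which a library implementation would normally take care of.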

Exercise!

  • Given a Stack Overflow post, find other similar Stack Overflow posts
  • Categorize a new candidate's interview data into Jr/Mid/Sr
  • Discover groups of users with similar interests on GitHub
  • Rank the best NYSE options to invest in
  • Crawl a real estate website and then estimate the price of your own house

Some technologies

Programming languages

Code sharing and visualization

Data structure packages

  • NumPy (N-dimensional array package)
  • Pandas (Data frame and data analysis package)
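
A quick taste of both packages, as a minimal sketch. The column names and values in the table are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: an N-dimensional array with vectorized arithmetic.
matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
column_means = matrix.mean(axis=0)    # mean of each column

# pandas: a labeled table (DataFrame) built on top of NumPy arrays.
houses = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "area_m2": [50, 70, 60, 100],
    "price": [100_000, 140_000, 90_000, 150_000],
})
houses["price_per_m2"] = houses["price"] / houses["area_m2"]
mean_by_city = houses.groupby("city")["price_per_m2"].mean()

print(column_means)
print(mean_by_city)
```

NumPy handles the raw numeric arrays, while pandas adds column names, grouping, and other table operations on top.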

Data visualization

ML algorithms & ecosystem

Distributed data processing

AIaaS/MLaaS

But what algorithm should I use?