Types of ML systems

  • trained w/ or w/o human supervision (supervised, unsupervised, semi-supervised, reinforcement learning)
  • can or cannot learn incrementally on the fly (online v batch learning)
  • comparing new data points to known ones, or detecting patterns in the training data and building a predictive model (instance-based v model-based learning)

Supervised/Unsupervised learning

Supervised

  • labels: solutions fed to algo along with training data
  • classification: algo is trained w/ many examples with their class
  • regression: predict a target numeric value given a set of features called predictors
  • attribute: a data type (eg, "mileage")
  • feature: an attribute plus its value (eg, "mileage = 15,000")
  • logistic regression can be used for classification, as it outputs a value that corresponds to the probability of belonging to a given class

  • most important supervised learning algos

    • k-nearest neighbors
    • linear regression
    • logistic regression
    • support vector machines (SVMs)
    • decision trees and random forests
    • neural networks
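
A minimal sketch tying the ideas above together (training data + labels, classification, class probabilities), assuming scikit-learn and its bundled iris dataset purely for illustration:

    # Supervised classification sketch: features + labels in, class predictions out.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)         # X = features, y = class labels (the "solutions")
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)                             # supervised: trained on examples w/ their class
    print(clf.predict(X[:3]))                 # predicted classes
    print(clf.predict_proba(X[:3]))           # probability of belonging to each class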

Unsupervised learning

  • clustering
    • k-means
    • hierarchical cluster analysis (HCA)
    • expectation maximization
  • visualization and dimensionality reduction
    • principal component analysis (PCA)
    • kernel PCA
    • locally-linear embedding (LLE)
    • t-distributed stochastic neighbor embedding (t-SNE)
  • association rule learning
    • apriori
    • eclat
  • dimensionality reduction: simplify data w/o losing too much info
    • example: a car's mileage highly correlated w/ its age, so merge them into one feature
  • feature extraction: building informative and non-redundant features from an initial set of measured data
  • anomaly detection: identifying outliers
  • association rule learning: discover interesting relations between attributes
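
A minimal unsupervised sketch (clustering + dimensionality reduction), assuming scikit-learn; the data is random and purely illustrative:

    # Unsupervised sketch: no labels, the algo finds structure on its own.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    X = np.random.rand(200, 5)                      # 200 unlabeled instances, 5 features

    kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
    print(kmeans.labels_[:10])                      # cluster assignment per instance

    pca = PCA(n_components=2)                       # dimensionality reduction: 5 features -> 2
    X_2d = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)            # how much info each new feature retains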

Semi-supervised learning

  • algos taking partially labeled data, usu. lots of unlabeled data and a little labeled
    • ex: step 1 unsupervised clustering of faces from an unlabeled set of photos; step 2 human assigns a name label per face
    • most are combinations of supervised and unsupervised algos
  • deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another
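
A rough sketch of the two-step photo example, assuming scikit-learn; the digits dataset stands in for the photos and the known labels stand in for the human labeling step:

    # Semi-supervised sketch: unsupervised clustering first, then a handful of
    # human-provided labels are propagated to every instance in each cluster.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits

    X, y = load_digits(return_X_y=True)
    k = 10
    kmeans = KMeans(n_clusters=k, n_init=10).fit(X)

    rep_idx = np.argmin(kmeans.transform(X), axis=0)   # instance closest to each cluster center
    cluster_labels = y[rep_idx]                        # step 2: a "human" labels only these k instances
    y_propagated = cluster_labels[kmeans.labels_]      # every other instance inherits its cluster's label
    print((y_propagated == y).mean())                  # rough quality of the propagated labels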

Reinforcement learning

  • very different
  • learning system (called an agent) can observe an environment, select and perform actions; gets rewards or penalties for its choices
  • learns a policy to get the most reward over time; a policy defines what action the agent should take in a given situation
  • ex: robots learning to walk and AlphaGo (just applying the policy it had learned)
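
A toy sketch of the agent / action / reward / policy loop, using a stateless multi-armed bandit as the environment; the reward probabilities are made up:

    # Reinforcement learning sketch: the agent learns which action pays off most
    # purely from rewards, with no labeled examples.
    import random

    reward_prob = [0.2, 0.5, 0.8]       # hidden environment: chance of a reward per action
    value = [0.0, 0.0, 0.0]             # agent's running estimate of each action's value
    counts = [0, 0, 0]

    for step in range(1000):
        if random.random() < 0.1:                   # occasionally explore
            action = random.randrange(3)
        else:                                       # otherwise exploit the current policy
            action = value.index(max(value))
        reward = 1.0 if random.random() < reward_prob[action] else 0.0
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]   # incremental average

    print(value)    # the learned policy: always pick the highest-value action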

Batch and online learning

  • whether or not the system can learn incrementally from a stream of incoming data

Batch learning

  • the system is trained on all available data at once; typically takes a lot of time and resources
  • also called offline learning
  • then launched, no more learning

Online learning

  • system trained incrementally, either individually or mini-batches
  • good for systems that receive a continuous flow of data and need to react rapidly (eg, stock prices); also for limited computing resources
  • can also be used for out-of-core learning (training on datasets too big to fit in memory)
  • learning rate: how fast to adapt to changing data (and thus forget old data)
    • high learning rate means adapt quickly but forget old data
    • low means more inertia, less sensitive to noise in new data or non-representative data points
  • danger of bad data input degrading system; may want to monitor input and react to abnormal data
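
A minimal online-learning sketch, assuming scikit-learn's SGDRegressor (which supports incremental updates via partial_fit); the incoming mini-batches are simulated:

    # Online learning sketch: the model is updated one mini-batch at a time,
    # so it can follow a continuous data stream or train out-of-core.
    import numpy as np
    from sklearn.linear_model import SGDRegressor

    model = SGDRegressor(learning_rate="constant", eta0=0.01)   # eta0 ~ the learning rate

    rng = np.random.default_rng(42)
    for _ in range(100):                             # simulate 100 incoming mini-batches
        X_batch = rng.random((32, 3))
        y_batch = X_batch @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 32)
        model.partial_fit(X_batch, y_batch)          # incremental update; old batches are not revisited

    print(model.coef_)    # should drift toward [2, -1, 0.5]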

Instance based v Model based learning

Distinction based on how ML systems generalize (how they perform on instances never seen before)

Instance based learning

  • the system learns the examples by heart, then generalizes to new cases by comparing them to known examples using a measure of similarity
  • measure of similarity: ex: count of words in common between two docs (spam and unknown)
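
A tiny sketch of that word-overlap similarity measure; the documents are made up:

    # Instance-based sketch: classify a new doc by comparing it to stored examples
    # with a similarity measure (here, the count of words in common).
    def similarity(doc_a: str, doc_b: str) -> int:
        return len(set(doc_a.lower().split()) & set(doc_b.lower().split()))

    known_spam = "win money now click here"
    known_ham = "meeting notes for the project tomorrow"
    new_doc = "click here to win a prize"

    print(similarity(new_doc, known_spam))   # 3 words in common -> most similar to spam
    print(similarity(new_doc, known_ham))    # 0 words in common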

Model based learning

  • generalize from a set of examples, then make predictions
  • How to know which model performs best
    • utility function (or fitness function) that measures how good your model is
    • cost function measures how bad it is
    • for linear regression, usu. use a cost function to measure distance between the model's predictions and the training examples
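
A minimal model-based sketch, assuming scikit-learn; the cost here is the mean squared error between predictions and training targets:

    # Model-based sketch: fit a linear model, then measure how bad it is with a cost function.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    model = LinearRegression().fit(X, y)              # picks the parameters that minimize the cost
    cost = mean_squared_error(y, model.predict(X))    # distance between predictions and examples
    print(model.coef_, model.intercept_, cost)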

Main challenges of Machine Learning

Insufficient data

On effectiveness of data:

  • Banko and Brill (2001): very diff algos perform similarly w/ enough data
  • Peter Norvig et al (2009). "The Unreasonable Effectiveness of Data". Data is more important than algo for complex problems; thus invest in corpus.

Non-representative training data

  • if the training data is too small, you get sampling noise
  • for large sets: if sampling method is flawed, we get sampling bias

Poor data quality

  • rm or fix outliers
  • if some instances are missing a few features, must decide whether to drop the instance, fill in w/ the average, ignore the attribute for all instances, or train 2 models, one w/ and one w/o the feature
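
A small sketch of two of those options (drop the instance vs fill in w/ the average), assuming pandas; the columns are made up:

    # Handling missing feature values: drop the instance or impute the column mean.
    import pandas as pd

    df = pd.DataFrame({"mileage": [15000, None, 48000, 62000],
                       "age":     [1, 2, None, 5]})

    dropped = df.dropna()              # option 1: drop instances w/ missing values
    imputed = df.fillna(df.mean())     # option 2: fill in w/ the column average
    print(dropped)
    print(imputed)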

Irrelevant features

  • feature engineering: deciding upon good features
    • feature selection: selecting more useful features
    • feature extraction: combining features to produce a more useful one (dimensionality reduction helps)
    • creating new features
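
A small feature-engineering sketch, assuming pandas; the car columns echo the mileage/age example above and are invented:

    # Feature extraction: combine two correlated attributes into one more useful feature.
    import pandas as pd

    cars = pd.DataFrame({"mileage":   [15000, 40000, 90000],
                         "age_years": [1, 3, 8],
                         "price":     [22000, 17000, 9000]})

    cars["wear"] = cars["mileage"] / 10000 + cars["age_years"]   # one crude combined feature
    X = cars[["wear"]]                                           # feature selection: keep only the useful one(s)
    y = cars["price"]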

Overfitting the training data

  • overfitting: model performs well on training data but does not generalize
    • happens when model is too complex relative to amount and/or noisiness of training data
  • solutions:
    • simplify the model by selecting a simpler algo or one w/ fewer parameters, reducing the number of attributes in the training data, or constraining the model
    • get more training data
    • reduce noise in training data (fix errors, rm outliers)
  • regularization: constraining the model to make it simpler, thus reducing the risk of overfitting
  • degrees of freedom: the number of free parameters in a model. Ex: z = Ax + y has two degrees of freedom (A and y); however if A is forced to stay within a small range, the model effectively has between 1 and 2 degrees of freedom
  • hyperparameter: controls the amount of regularization; a parameter of the learning algo, not the model
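
A minimal regularization sketch, assuming scikit-learn's Ridge; alpha is the hyperparameter that controls how strongly the model is constrained:

    # Regularization sketch: Ridge shrinks the linear model's weights toward zero.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.random((20, 5))
    y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.5, 20)   # only feature 0 matters

    plain = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)      # larger alpha = more regularization

    print(plain.coef_)    # free to chase the noise in the irrelevant features
    print(ridge.coef_)    # weights pulled toward zero: a simpler, more constrained model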

Underfitting the training data

  • underfitting: when the model is too simple to learn the underlying structure of the training data
  • to fix underfitting:
    • use more powerful model with more parameters
    • feature engineering: feed better features into the model
    • reduce constraints on the model (eg reducing regularization hyperparameter)
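
A small sketch of the "more powerful model" fix, assuming scikit-learn's PolynomialFeatures:

    # Underfitting sketch: a straight line can't fit a quadratic target;
    # adding polynomial features gives the model enough parameters.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = X.ravel() ** 2

    simple = LinearRegression().fit(X, y)
    richer = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

    print(simple.score(X, y))    # near 0: underfits even the training data
    print(richer.score(X, y))    # ~1.0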

Testing and validation

  • training set: what you train the model on
  • test set: what you evaluate the model on
  • generalization error (or out-of-sample error): estimate of the model's error rate on unseen data
  • validation set: a second holdout set in addition to the test set, used to compare models and tune hyperparameters
  • cross-validation: divide the data into complementary train/validation sets, making multiple passes with different splits; average the errors; train the final model on all the data
  • no free lunch theorem: Wolpert (1996): there is no model guaranteed a priori to work better than another; must try them all
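
A minimal sketch of the train/test split plus cross-validation, assuming scikit-learn:

    # Testing sketch: hold out a test set, use cross-validation on the rest to
    # pick/tune a model, then estimate the generalization error on the test set.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X_train, y_train, cv=5)   # 5 different train/validation splits
    print(scores.mean())                                      # averaged validation score

    model.fit(X_train, y_train)               # final model trained on all the training data
    print(model.score(X_test, y_test))        # estimate of the generalization error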