
Ch1 Giving Computers the Ability to Learn from Data

The Era of Machine Learning

Abundant (public) data: MNIST, ImageNet, Kaggle
Abundant algorithms: deep learning
Abundant open-source ML libraries: SciPy, Theano, TensorFlow, OpenAI
Abundant learning materials

What Is Machine Learning?

Traditionally classified as a subfield of AI
Turns data into knowledge through self-learning
Builds predictive models
Supports data-driven decisions

Three Types of Machine Learning

Supervised Learning

The training data comes with labels
The model learns to predict the labels of unseen data
Two main tasks: classification and regression
Examples:
- binary classification: spam/non-spam
- multiclass classification: handwritten letters
- linear/logistic regression
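
The spam/non-spam case above can be sketched with a very small classifier. The following is only a minimal illustration, not a method taken from these notes: it uses a nearest-centroid rule, and the two features (number of links, number of exclamation marks) and all values are hypothetical.

```python
import math

def fit_centroids(samples, labels):
    """Return {label: mean feature vector} computed from labeled training data."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical features per e-mail: [number of links, number of exclamation marks]
X_train = [[5, 7], [6, 9], [4, 8], [0, 1], [1, 0], [0, 0]]
y_train = ["spam", "spam", "spam", "non-spam", "non-spam", "non-spam"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [5, 6]))  # unseen e-mail close to the spam examples
print(predict(centroids, [1, 1]))  # unseen e-mail close to the non-spam examples
```

The labels in `y_train` are what makes this supervised: the model is fitted on labeled samples and then asked for the label of a sample it has not seen.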

Reinforcement Learning

An agent improves its performance through interaction with an environment
Key elements
- Agent
- Action
- Environment (or State)
- Reward (or feedback)
The agent learns a series of actions that maximizes long-term reward through an exploratory trial-and-error approach
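
To make the agent/action/state/reward loop concrete, here is a small sketch using tabular Q-learning. Q-learning is one possible algorithm chosen for illustration; the corridor environment, the reward of 1 at the goal, and all parameter values are assumptions, not part of the original notes.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when both values tie), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy action per non-terminal state after training (1 = move right).
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
print(policy)
```

The only feedback the agent receives is the reward at the goal, yet the learned policy moves right in every cell, because the discount factor propagates that delayed reward back along the corridor.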

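Unsupervised Learning

There are neither labels nor rewards
The goal is to uncover the inherent structure of the data from the data alone
Examples:
- finding homogeneous groups based on similarity between data points (clustering)
- compressing data based on what the data points have in common (dimensionality reduction)
  - improves time and space efficiency, reduces noise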
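
Grouping by similarity can be sketched with k-means, used here as one example of a clustering algorithm. The points and the choice of initial centroids are made up for illustration, and the loop runs a fixed number of iterations rather than testing for convergence.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical 2-D points forming two visually separate groups; no labels are given.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The cluster indices in `labels` are discovered from the distances alone; nothing in the input says which point belongs to which group.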

Basic Terminology and Concepts

Iris data: a flower dataset with 3 species
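
The usual vocabulary (samples, features, class labels) can be shown on a few rows. The sketch below hard-codes three rows, one per species, as they appear in the commonly distributed Iris dataset; the four feature names come from that dataset, not from these notes.

```python
# Each row is one sample (one flower); each column is one feature, measured in cm.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
# One class label (target) per sample.
y = ["setosa", "versicolor", "virginica"]

n_samples, n_features = len(X), len(X[0])
print(n_samples, n_features)
print(sorted(set(y)))
```

The full dataset has 150 samples laid out the same way: a 150 x 4 feature matrix `X` and a label vector `y` of length 150.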

Steps for Building a Machine Learning System

Preprocessing

Cleansing
- imputation, normalization (zero-centered, same scale)
Feature engineering
- invention, selection, transformation (by kernel)
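
Imputation and zero-centered, same-scale normalization can be sketched as follows. This is a minimal version under stated assumptions: missing values are represented as `None`, the mean is used as the fill value, and the column values are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed_mean = mean(v for v in column if v is not None)
    return [observed_mean if v is None else v for v in column]

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation.

    Assumes the column is not constant, otherwise sigma would be zero.
    """
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

heights_cm = [150.0, None, 170.0, 190.0]   # hypothetical feature column
filled = impute_mean(heights_cm)
scaled = standardize(filled)
print(filled)
print(scaled)
```

After standardization every feature column has mean 0 and standard deviation 1, so features measured in different units contribute on a comparable scale.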

Model selection

Which model suits this data?

Regression, decision tree, deep neural network, or ensemble?

Which evaluation metric will be used?

Recognition rate (classification accuracy), precision/recall, F-score, AIC, BIC, etc.
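
Accuracy, precision, recall, and F-score for a binary problem can be computed directly from true and predicted labels. The helper name `classification_report` and the label vectors below are illustrative choices, and the F-score shown is the F1 variant.

```python
def classification_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
report = classification_report(y_true, y_pred)
print(report)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is their harmonic mean.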

How will the dataset be split for evaluation?

Training set
Validation set
Test set
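
A three-way split can be sketched with a shuffle and two cuts. The 60/20/20 ratio, the fixed seed, and the function name `train_val_test_split` are assumptions made for this example.

```python
import random

def train_val_test_split(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into three disjoint parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Sample indices 0..99 stand in for a real dataset.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The training set fits the model, the validation set guides model and hyperparameter choices, and the test set is kept aside for a final estimate on data that played no part in those choices.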

How will the hyperparameters be optimized?

Hyperparameter: a setting whose value the learning algorithm cannot learn automatically from the data
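
One simple way to choose a hyperparameter is to try a few candidate values and keep the one that scores best on the validation set. The sketch below does this for `k` in a k-nearest-neighbors classifier; the classifier, the one-dimensional data, and the candidate values are all illustrative assumptions.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of (x, y) pairs in data whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Hypothetical (feature, label) pairs; (1.5, 1) and (3.1, 0) act as label noise.
train = [(1.0, 0), (1.2, 0), (1.4, 0), (1.5, 1), (3.0, 1), (3.2, 1), (3.4, 1), (3.1, 0)]
val = [(1.3, 0), (1.6, 0), (2.9, 1), (3.12, 1)]

# k is not learned from the training data, so each candidate is scored on the validation set.
scores = {k: accuracy(train, val, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first candidate with the highest score
print(scores, best_k)
```

With `k=1` the noisy training points are copied straight into the predictions, while larger values of `k` vote them down, which is why the validation score separates the candidates.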

Machine Learning with Python

Math libraries

NumPy, SciPy

Machine learning libraries

scikit-learn, pandas

Installation

virtualenv, pip, Anaconda


