NumPy, Data Science, and IMQAV

  • Ingest
  • Model
  • Query
  • Analyze
  • Visualize

Application of IMQAV

  • Organization
  • Architecture
  • Set of Tasks

Ingest

Ingestion is a set of software engineering techniques for handling high volumes of data that arrive rapidly (often via streaming).

  • Kafka
  • RabbitMQ
  • Fluentd
  • Sqoop
  • Kinesis (AWS)
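
The tools above share a common pattern: records arrive continuously and are grouped into manageable units before being stored. The following is a minimal sketch of that idea using only the Python standard library. The names `micro_batches` and `sensor_events` are hypothetical, and the generator merely stands in for a real message-queue consumer; it does not use the API of any tool listed above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group an incoming record stream into batches of at most batch_size."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def sensor_events(n: int) -> Iterator[dict]:
    """Hypothetical event source standing in for a message-queue consumer."""
    for i in range(n):
        yield {"id": i, "value": i * 0.5}


# Seven events grouped three at a time; the last batch is partial.
batches = list(micro_batches(sensor_events(7), batch_size=3))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [3, 3, 1]
```

Batching like this trades a little latency for fewer, larger writes to the storage layer.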

Model

Modeling is a set of data architecture techniques to create data storage that is appropriate for a particular domain.

  • Relational
    • MySQL
    • Postgres
    • RDS (AWS)
  • Key Value
    • Redis
    • Riak
    • DynamoDB (AWS)
  • Columnar
    • Cassandra
    • HBase
    • Redshift (AWS)
  • Document
    • MongoDB
    • Elasticsearch
    • Couchbase
  • Graph
    • Neo4j
    • OrientDB
    • ArangoDB
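
To make the difference between storage models concrete, here is a small sketch that holds the same records in a relational form and in a key-value form. It assumes Python's built-in `sqlite3` module as a stand-in for a relational database and a plain dictionary as a stand-in for a key-value store; the table, keys, and sample records are invented for illustration.

```python
import sqlite3

# Relational: a fixed schema, queried declaratively with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)
relational_result = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("London",)
).fetchall()
conn.close()

# Key-value: each value is fetched directly by its key, with no schema
# and no query language (a dict stands in for the store here).
kv_store = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Grace", "city": "New York"},
}
kv_result = kv_store["user:1"]["name"]

print(relational_result, kv_result)
```

The relational form can answer questions by any column, while the key-value form is only fast when the key is already known.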

Query

Query refers to extracting data (from storage) and modifying that data to accommodate anomalies such as missing data.

  • Batch
    • MapReduce
    • Spark
    • Elastic MapReduce (AWS)
  • Batch SQL
    • Hive
    • Presto
    • Drill
  • Streaming
    • Storm
    • Spark Streaming
    • Samza
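
As a small-scale illustration of accommodating missing data after extraction, the sketch below uses NumPy to detect missing values and fill them in. The sample readings are invented, and replacing gaps with the mean of the observed values is only one possible strategy, chosen here as an assumption.

```python
import numpy as np

# Hypothetical extracted readings; np.nan marks missing values.
readings = np.array([21.5, np.nan, 19.0, 22.5, np.nan])

# Boolean mask of the missing entries.
missing = np.isnan(readings)

# Mean of the observed values only (NaNs are ignored).
mean_observed = np.nanmean(readings)

# Replace each missing entry with the observed mean.
cleaned = np.where(missing, mean_observed, readings)

print(int(missing.sum()), cleaned)
```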

Analyze

Analyze is a broad category that includes techniques from computer science, mathematical modeling, artificial intelligence, statistics, and other disciplines.

NumPy falls within the Analyze category.

  • Statistics
    • SPSS
    • SAS
    • R
    • Statsmodels
    • SciPy
    • Pandas
  • Optimization and Mathematical Modeling (SciPy and other libraries)
    • Linear, Integer, and Dynamic Programming
    • Gradient and Lagrange methods
  • Machine Learning
    • Batch
      • H2O
      • Mahout
      • SparkML
    • Interactive
      • scikit-learn
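
To show where NumPy fits in this stage, the following sketch computes basic descriptive statistics and fits a straight line by least squares. The data are synthetic (generated from y = 2x + 1 as an assumption), so the fit should recover that slope and intercept up to floating-point error.

```python
import numpy as np

# Synthetic data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Descriptive statistics (std is the population standard deviation).
mean_y = y.mean()
std_y = y.std()

# Least-squares fit of y = slope * x + intercept.
# Each row of the design matrix is [x_i, 1].
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(mean_y, std_y, slope, intercept)
```

Libraries such as SciPy, Statsmodels, and scikit-learn build on these same array operations.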

Visualize

Visualize refers to transforming data into visually attractive and informative formats.

  • matplotlib
  • seaborn
  • bokeh
  • pandas
  • D3
  • Tableau
  • Leaflet
  • Highcharts
  • Kibana
