The term "data science" is itself still contested, but a concise definition makes a useful starting point:
To do data science, you have to be able to find and process large datasets. You’ll often need to understand and use programming, math, and technical communication skills. You’ll need to be a unicorn that can put together a lot of different skillsets.
- Roger Huang, Springboard blog - source
A longer definition is offered by the now-famous HBR article, Data Scientist: The Sexiest Job of the 21st Century (Oct 2012):
[...] what data scientists do is make discoveries while swimming in data [...] They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set. [...]
[...] Often they are creative in displaying information visually and making the patterns they find clear and compelling. They advise executives and product managers on the implications of the data for products, processes, and decisions.
Given the nascent state of their trade, it often falls to data scientists to fashion their own tools and even conduct academic-style research. Yahoo, one of the firms that employed a group of data scientists early on, was instrumental in developing Hadoop. [...]
What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful—and rare.
Source: hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
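The "join them with other, potentially incomplete data sources, and clean the resulting set" step in the HBR description can be illustrated with a small sketch. Everything here is an assumption made up for illustration: the `orders` and `customers` records, their field names, and the `join_and_clean` helper are hypothetical. In practice this is usually done with a dataframe library rather than plain dictionaries.

```python
# Hypothetical records; names and values are invented for illustration only.
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 12.5},
    {"customer_id": 3, "amount": None},  # incomplete: missing amount
    {"customer_id": 4, "amount": 8.0},   # no matching customer record
]
customers = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": " fr "},  # untidy: stray spaces, lower case
    {"customer_id": 3, "country": "US"},
]


def join_and_clean(orders, customers):
    """Left-join orders to customers on customer_id, then drop incomplete rows."""
    # Index the second source by the join key for constant-time lookups.
    by_id = {c["customer_id"]: c for c in customers}

    # Join: every order is kept, with country set to None when there is no match.
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"], {})
        joined.append({**order, "country": customer.get("country")})

    # Clean: drop rows with missing values and normalise the country code.
    cleaned = []
    for row in joined:
        if row["amount"] is None or row["country"] is None:
            continue
        cleaned.append({**row, "country": row["country"].strip().upper()})
    return cleaned


result = join_and_clean(orders, customers)
```

Dropping incomplete rows is only one possible cleaning policy; imputing missing values or flagging them for review may suit a real analysis better.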
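An earlier, more academic framing comes from William S. Cleveland's 2001 paper, Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics:
This document describes a plan to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called “data science.” [...]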
The six areas and their percentages are the following:
- (25%) Multidisciplinary Investigations: data analysis collaborations in a collection of subject matter areas.
- (20%) Models and Methods for Data: statistical models; methods of model building; methods of estimation and distribution based on probabilistic inference.
- (15%) Computing with Data: hardware systems; software systems; computational algorithms.
statistics: statistics
learning and generalizing: ML / ANN
Bayesian generalization: one-shot learning, pymc ...
optimization field: MCDA / MODA
Machine learning types:
%%svg
<svg width="720" height="80"><g>
<g><rect x="0" y="0" width="150" height="70" fill="#FFF" stroke="#000"></rect>
<text x="10" y="30" font-family="Verdana" font-size="20" fill="#444">Analysis</text></g>
<g transform="translate(170,0)">
<polyline fill="none" stroke="#AAA" stroke-width="1" stroke-linecap="round" stroke-linejoin="round" points="
0.375,0.375 45.63,38.087 0.375,75.8 "/>
</g>
<g transform="translate(230,0)">
<rect x="0" y="0" width="400" height="70" fill="#FFF" stroke="#000"></rect>
<text x="10" y="30" font-family="Verdana" font-size="20" fill="#444">Modelling</text></g>
</g></svg>