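Combining systems with algorithms means designing large-scale distributed machine learning algorithms and systems, so that machine learning algorithms can run on multi-processor, multi-machine clusters and handle much larger volumes of data. Well-known systems in this area include: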
In [7]:
from IPython.display import HTML

# Embed the webpage we would like to crawl
HTML('<iframe src="http://ccc.nju.edu.cn/newsmap/" width="1000" height="500"></iframe>')
Out[7]:
imMens: Real-time Visual Querying of Big Data (Stanford Visualization Group, Vimeo)
Bin-Summarize-Smooth: A Framework for Visualizing Large Data (Hadley Wickham)
"Why Exploring Big Data is Hard and What We Can Do About It", Danyel Fisher's talk at OpenVisConf 2015
d. boyd and K. Crawford. 2012. "Critical Questions for Big Data." Information, Communication & Society 15 (5). http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878
Google Flu Trends: The Limits of Big Data (NYT)
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (14 March): 1203-1205.
Halevy, Norvig, Pereira