Welcome to Harvard Data Ventures Workshop 0

Modified from an IPython Notebook created by Rick Muller.

Goals for today:

  1. Learn Python (if necessary)
  2. Start getting familiar with Python data munging and manipulation tools
  3. Tackle an introductory data science task

Why Python?

Python is the programming language of choice for many scientists, in large part because it offers a great deal of power to analyze and model scientific data with relatively little overhead in learning, installation, or development time. It is a language you can pick up in a weekend and use for the rest of your life.

In particular, the Python ecosystem (consisting of libraries and tools for data analysis and the people who write that code) is extremely rich, especially in the realm of data science.

Jupyter/IPython?

What you're seeing now is an IPython or Jupyter notebook. The names are roughly interchangeable, though technically speaking they are different components. A notebook is an interactive environment for running code.

Usually, we write Python code in the form of Python scripts (i.e., files with the extension .py). IPython notebooks let us write code and evaluate it right away, while saving the code so that we can share it.

Instructions

Just write code in the cells and press Ctrl + Enter to evaluate the cell contents. The output will show up below. For more detailed instructions, you can always reference the docs.
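
As a quick way to try this out, here is a minimal sketch of what a first cell might contain. It is not part of the original notebook, and the variable names are made up for illustration. Type it into an empty cell and evaluate it; anything passed to print() should appear directly below the cell.

```python
# A hypothetical first cell; the names below are illustrative only.
numbers = [1, 2, 3, 4]

# Add up the list with the built-in sum() function.
total = sum(numbers)

# print() writes to the output area below the cell.
print(total)
```

Variables defined in one cell remain available in later cells for as long as the notebook session is running, so you can build up an analysis step by step.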

Plan for Today

Feel free to skip any material if you know it already.

  1. Introduction to Python
  2. More on Python Tools
  3. Pandas Intro
  4. Pandas
  5. Addendum