Hour of Code 2015

For Mr. Clifford's Class (5C)

Perry Grossman

December 2015

Introduction

From the Hour of Code to the Power of Code

How to use programming skills for data analysis, or "data science," the new, hot term


In [1]:
# you can also access this directly:
from PIL import Image
im = Image.open("DataScienceProcess.jpg")
im
# path = 'DataScienceProcess.jpg'
# image = Image.open(path)


Out[1]:
[Image: DataScienceProcess.jpg]

Some Basic Things

Leveraging a tutorial by David Beazley, Ian Stokes-Rees and Continuum Analytics:
http://localhost:8888/notebooks/Dropbox/Python/Harvard%20SEAS%20Tutorial/python-mastery-isr19-master/1-PythonReview.ipynb

and other resources


In [2]:
# Comments
# ls lists the files in this folder. See below.
This line will make an error because this line is not python code and this is a code cell.
# Leveraging
#http://localhost:8888/notebooks/Dropbox/Python/Harvard%20SEAS%20Tutorial/python-mastery-isr19-master/1-PythonReview.ipynb


  File "<ipython-input-2-110c11bd529c>", line 3
    This line will make an error because this line is not python code and this is a code cell.
            ^
SyntaxError: invalid syntax

In [3]:
ls # NOT PYTHON! command line


DataScienceProcess.jpg  HourofCode2015-Copy1.ipynb  Hour of Code 2015.pdf
Data_Science_VD.png     HourofCode2015.ipynb

In [4]:
pwd # ALSO NOT PYTHON! Shows what folder you are in.


Out[4]:
u'/home/perry/Dropbox/Python/Hour of Code 2015'

In [5]:
# math
1+2


Out[5]:
3

In [6]:
4000*3


Out[6]:
12000

In [18]:
import math
math.sqrt(2)


Out[18]:
1.4142135623730951

In [10]:
2 ** (0.5)


Out[10]:
1.4142135623730951

In [11]:
637*532.6


Out[11]:
339266.2

In [14]:
from __future__ import division
1/2


Out[14]:
0.5
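
In Python 2, dividing one whole number by another with / throws away the remainder unless the `from __future__ import division` line has been run first. Here is a small sketch of the two kinds of division. The variable names are made up for illustration, and the 2.0 is there so the example gives the same answer with or without that import:

```python
# Two kinds of division. Writing 2.0 instead of 2 forces "true" division
# even in Python 2 when the __future__ import has not been run.
true_half = 1 / 2.0     # true division keeps the fractional part
floor_half = 1 // 2     # floor division rounds down to a whole number
floor_seven = 7 // 2    # 7 divided by 2 is 3.5, which rounds down to 3

print(true_half)
print(floor_half)
print(floor_seven)
```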

In [17]:
(8+5)*4


Out[17]:
52


In [19]:
# Create a variable
name = 'Perry Grossman'

In [20]:
# Print the variable
name


Out[20]:
'Perry Grossman'

In [23]:
name[6]


Out[23]:
'G'

Floor numbering is the numbering scheme used for a building's floors. There are two major schemes in use across the world. In one system, used in the majority of Europe, the ground floor is the floor on the ground and often has no number or is assigned the number zero. Therefore, the next floor up is assigned the number 1 and is the first floor. The other system, used primarily in the United States and Canada, counts the bottom floor as number 1 or first floor.

https://en.wikipedia.org/wiki/Storey
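
Python numbers the positions in a string the way most of Europe numbers floors: the first position is 0, not 1. That is why name[6] gave 'G' above, even though G is the seventh character. A small sketch with the same name (first, seventh and last are just names picked for this example):

```python
# Python counts positions starting from zero, like a European ground floor.
name = 'Perry Grossman'

first = name[0]      # position 0 is the first character
seventh = name[6]    # position 6 is the seventh character
last = name[-1]      # negative positions count back from the end

print(first)
print(seventh)
print(last)
print(len(name))     # total number of characters, the space included
```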


In [25]:
from functools import partial
# https://docs.python.org/2/library/functools.html
from random import choice, randint

In [38]:
choice('yes no maybe'.split()) # split is a method


Out[38]:
'maybe'

In [42]:
for i in range(10):
    print("Call me " + choice('yes no maybe'.split()))


Call me yes
Call me maybe
Call me yes
Call me maybe
Call me maybe
Call me yes
Call me maybe
Call me maybe
Call me no
Call me maybe
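
To see what the loop above is doing, it can help to pull the pieces apart. This sketch shows that split() turns the string into a list of words and that choice() then picks one of them at random (words and pick are names made up for this example):

```python
from random import choice

# split() breaks a string into a list of words wherever there is a space
words = 'yes no maybe'.split()
print(words)

# choice() picks one item from the list at random,
# so the answer can be different every time this runs
pick = choice(words)
print("Call me " + pick)
```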

In [46]:
randint(1, 6)


Out[46]:
3

In [49]:
# If you need dice, try this:

roll = partial(randint, 1, 20)

In [55]:
roll()
# roll is already a 20-sided die. How would you make a 6-sided one?


Out[55]:
6
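
One way to get dice with any number of sides is a small helper function that wraps partial. This is only a sketch: make_die, d6 and d20 are made-up names, not something that comes with the random or functools modules.

```python
from functools import partial
from random import randint

def make_die(sides):
    """Return a function that rolls a die with the given number of sides."""
    # partial "pre-fills" randint with the lowest and highest values
    return partial(randint, 1, sides)

d6 = make_die(6)     # ordinary six-sided die
d20 = make_die(20)   # twenty-sided die

print(d6())
print(d20())
```

Each call to d6() or d20() gives a new random roll, just like roll() above.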

In [56]:
# Create a list of numbers
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]

In [57]:
#Find the even numbers
evens = []
for v in vals:
    if v%2 == 0:
        evens.append(v)
#How is this working?

In [58]:
evens


Out[58]:
[-8, 2, 6, 2, 12, 4]
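
How is this working? The % operator gives the remainder left over after division, so v%2 is 0 for even numbers and 1 for odd numbers. A short sketch that works out the remainder for every value in the list (remainders is a name made up for this example):

```python
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]

# v % 2 is the remainder left over when v is divided by 2
remainders = []
for v in vals:
    remainders.append(v % 2)
print(remainders)

# a number is even exactly when that remainder is zero
for v in vals:
    if v % 2 == 0:
        print(str(v) + " is even")
    else:
        print(str(v) + " is odd")
```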

In [60]:
squares = []
for v in vals:
    squares.append(v*v)

In [61]:
squares


Out[61]:
[9, 64, 4, 49, 36, 4, 25, 144, 16, 81]

In [62]:
bigsquares = []
for v in vals:
    s = v*v
    if s > 10:
        bigsquares.append(s)

In [63]:
bigsquares


Out[63]:
[64, 49, 36, 25, 144, 16, 81]
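
The three loops above can also be written as list comprehensions, a shorter Python form that builds a whole list in one line. This is just another way to write the same calculations, sketched with the same list of numbers:

```python
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]

# keep only the values that divide evenly by 2
evens = [v for v in vals if v % 2 == 0]

# square every value
squares = [v * v for v in vals]

# square every value, but keep only the squares bigger than 10
bigsquares = [v * v for v in vals if v * v > 10]

print(evens)
print(squares)
print(bigsquares)
```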

See this Tutorial by Ian Stokes-Rees for more data examples:

https://github.com/ijstokes/python-mastery-isr19/blob/master/2-Data.ipynb

Now Let's use Python to Analyze US Energy Data (sorry, this is very rough; will try to update soon)

https://github.com/PerryGrossman/DI5/blob/master/EnergyConsumptionEstimates_1949_2012-Residential_Sector.ipynb

Now Let's use Python to Analyze the Effect of Advertising Spend on Sales

http://localhost:8888/notebooks/Dropbox/Python/Regression/08_linear_regression-Copy1.ipynb

Now Let's use Python to Look at Citizen Complaints in NYC

http://localhost:8888/notebooks/Dropbox/Python/Test/PyDataNYC_2013_tutorial_pg.ipynb


Am I a little obsessed with Python? Perhaps.

Now Let's use Python to Analyze US Presidential Election Data

(very rough)

http://localhost:8888/notebooks/Dropbox/Python/PandasTutorialFiles/FEC.v2-Copy_pg.ipynb

Now Let's use Python to Classify Olive Oils from their Fatty Acid Composition

The data set gives the percentage composition of fatty acids found in the lipid fraction of Italian olive oils, with oils from three regions of Italy: the North, the South, and Sardinia.

Forina, M., Armanino, C., Lanteri, S. & Tiscornia, E. (1983), Classification of Olive Oils from their Fatty Acid Composition, in Martens, H. and Russwurm Jr., H., eds, Food Research and Data Analysis, Applied Science Publishers, London, pp. 189–214.

http://localhost:8888/notebooks/Dropbox/laptop-repos/OliveOilMachineLearning/Olives_at_BDF-Copy0.ipynb

Want to make some money?

Start coding algorithms in Python to predict the stock market

https://www.quantopian.com/

https://www.quantopian.com/lectures

Resources

Boston Python meetup, with pizza!: http://www.meetup.com/bostonpython/

Anaconda bundles Python together with many popular packages

http://anaconda.org/

Ian Stokes-Rees, parent of your former classmate, Maggie Stokes-Rees, works at Continuum Analytics, the company that makes Anaconda and Wakari, a web-based Python solution.

https://www.wakari.io

Wakari can be used from the web, including from Chromebooks.