How to use programming skills for data analysis, or "data science," the hot new term
In [1]:
# you can also access this directly:
from PIL import Image
im = Image.open("DataScienceProcess.jpg")
im
# path = 'DataScienceProcess.jpg'
# image = Image.open(path)
Out[1]:
This notebook draws on a tutorial by David Beazley, Ian Stokes-Rees, and Continuum Analytics ( http://localhost:8888/notebooks/Dropbox/Python/Harvard%20SEAS%20Tutorial/python-mastery-isr19-master/1-PythonReview.ipynb ) and on other resources.
In [2]:
# Comments
# ls lists the files in this folder. See below.
This line will make an error because this line is not Python code and this is a code cell.
# Leveraging
#http://localhost:8888/notebooks/Dropbox/Python/Harvard%20SEAS%20Tutorial/python-mastery-isr19-master/1-PythonReview.ipynb
In [3]:
ls # NOT PYTHON! command line
In [4]:
pwd # ALSO NOT PYTHON! Shows what folder you are in.
Out[4]:
In [5]:
# math
1+2
Out[5]:
In [6]:
4000*3
Out[6]:
In [18]:
import math
math.sqrt(2)
Out[18]:
In [10]:
2 ** (0.5)
Out[10]:
In [11]:
637*532.6
Out[11]:
In [14]:
from __future__ import division
1/2
Out[14]:
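The cell above turns on "true" division. As a minimal sketch, assuming Python 3 (where this behavior is already the default and the `__future__` import is only needed on Python 2), the two division operators compare like this. The variable names are made up for illustration.

```python
# Assumes Python 3, where / is always true division.
true_quotient = 1 / 2    # true division keeps the fractional part
floor_quotient = 1 // 2  # floor division rounds down to a whole number

print(true_quotient)
print(floor_quotient)
```

Use `//` when you want a whole-number result on purpose, for example when counting how many full groups fit into a total.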
In [17]:
(8+5)*4
Out[17]:
In [19]:
# Create a variable
name = 'Perry Grossman'
In [20]:
# Print the variable
name
Out[20]:
In [23]:
name[6]
Out[23]:
Floor numbering is the numbering scheme used for a building's floors. There are two major schemes in use across the world. In one system, used in the majority of Europe, the ground floor is the floor on the ground and often has no number or is assigned the number zero. Therefore, the next floor up is assigned the number 1 and is the first floor. The other system, used primarily in the United States and Canada, counts the bottom floor as number 1 or first floor. Python counts positions the European way: the first character of a string is at index 0, which is why name[6] above is the seventh character, not the sixth.
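A small sketch of how the floor-numbering idea maps onto Python indexing, reusing the `name` variable from above. The other variable names are made up for illustration.

```python
name = 'Perry Grossman'

first = name[0]        # index 0 is the first character, like a ground floor numbered 0
seventh = name[6]      # index 6 is the seventh character
last = name[-1]        # negative indexes count back from the end
first_word = name[:5]  # a slice stops just before its end index

print(first, seventh, last, first_word)
```

Counting on your fingers from zero (P=0, e=1, r=2, ...) is a quick way to check which character an index will return.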
In [25]:
from functools import partial
# https://docs.python.org/2/library/functools.html
from random import choice, randint
In [38]:
choice('yes no maybe'.split()) # split is a method
Out[38]:
In [42]:
for i in range(10):
    print("Call me " + choice('yes no maybe'.split()))
In [46]:
randint(1, 6)
Out[46]:
In [49]:
# If you need dice, try this:
roll = partial(randint, 1, 20)
In [55]:
roll()
# roll() already behaves like a 20-sided die; how would you make a 6-sided one?
Out[55]:
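One way to think about `partial`: it takes a function and "freezes" some of its arguments, giving back a new function that needs fewer inputs. The sketch below builds two dice this way; the names `d6`, `d20`, and `rolls` are made up for illustration.

```python
from functools import partial
from random import randint

# partial fills in the arguments 1 and 6, so d6() behaves like randint(1, 6)
d6 = partial(randint, 1, 6)
# the same idea with a different upper bound gives a 20-sided die
d20 = partial(randint, 1, 20)

# roll the six-sided die five times and collect the results
rolls = [d6() for _ in range(5)]
print(rolls)
print(d20())
```

Because the results are random, the printed numbers will differ on every run.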
In [56]:
# Create a list of numbers
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]
In [57]:
# Find the even numbers
evens = []
for v in vals:
    if v % 2 == 0:
        evens.append(v)
# How is this working?
In [58]:
evens
Out[58]:
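To see how the loop works, it can help to spell out each step. The sketch below is the same loop with the remainder stored in a (made-up) variable called `remainder`: the `%` operator gives what is left over after dividing, so a remainder of 0 after dividing by 2 means the number is even.

```python
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]

evens = []
for v in vals:
    remainder = v % 2    # 0 for even numbers, 1 for odd numbers
    if remainder == 0:
        evens.append(v)  # keep only the values that divide evenly by 2
    print(v, "has remainder", remainder)

print(evens)
```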
In [60]:
squares = []
for v in vals:
    squares.append(v * v)
In [61]:
squares
Out[61]:
In [62]:
bigsquares = []
for v in vals:
    s = v * v
    if s > 10:
        bigsquares.append(s)
In [63]:
bigsquares
Out[63]:
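The three loops above all follow the same pattern: start with an empty list, walk through `vals`, and append something. Python has a shorthand for this pattern called a list comprehension. The sketch below is an alternative way to write the same loops, not a replacement for understanding them.

```python
vals = [3, -8, 2, 7, 6, 2, 5, 12, 4, 9]

# keep the values whose remainder after dividing by 2 is zero
evens = [v for v in vals if v % 2 == 0]

# square every value
squares = [v * v for v in vals]

# square every value, but keep only squares larger than 10
bigsquares = [v * v for v in vals if v * v > 10]

print(evens)
print(squares)
print(bigsquares)
```

Read each one from the middle outward: "for each v in vals, if the condition holds, put this expression in the new list."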
See these notebooks for more data examples:
http://localhost:8888/notebooks/Dropbox/Python/Dataminingtutorial/Python_For_Data_Mining-Copy1.ipynb
http://localhost:8888/notebooks/Dropbox/Python/Regression/08_linear_regression-Copy1.ipynb
http://localhost:8888/notebooks/Dropbox/Python/Test/PyDataNYC_2013_tutorial_pg.ipynb
Am I a little obsessed with Python? Perhaps
(rough)
http://localhost:8888/notebooks/Dropbox/Python/PandasTutorialFiles/NYCPython_FoodDB.ipynb
(very rough)
http://localhost:8888/notebooks/Dropbox/Python/PandasTutorialFiles/FEC.v2-Copy_pg.ipynb
The olive oil data set gives the percentage composition of fatty acids found in the lipid fraction of Italian olive oils, with oils from three regions of Italy: the North, the South, and Sardinia.
Forina, M., Armanino, C., Lanteri, S. & Tiscornia, E. (1983), Classification of Olive Oils from their Fatty Acid Composition, in Martens, H. and Russwurm Jr., H., eds, Food Research and Data Analysis, Applied Science Publishers, London, pp. 189–214.
Boston Python meetup, with pizza!: http://www.meetup.com/bostonpython/
Anaconda integrates Python packages
Ian Stokes-Rees, parent of your former classmate Maggie Stokes-Rees, works at Continuum Analytics, the company that makes Anaconda and Wakari, a web-based Python solution.
Wakari can be used from the web, including from Chromebooks.
Allen Downey