Text Using Markdown

If you double click on this cell, you will see the text change so that all of the formatting is removed. This allows you to edit this block of text. This block of text is written using Markdown, which is a way to format text using headers, links, italics, and many other options. Hit shift + enter or shift + return to show the formatted text again. This is called "running" the cell, and you can also do it using the run button in the toolbar.

Code cells

One great advantage of IPython notebooks is that you can show your Python code alongside the results, add comments to the code, or even add blocks of text using Markdown. These notebooks allow you to collaborate with others and share your work. The following cell is a code cell.


In [2]:
# Hit shift + enter or use the run button to run this cell and see the results

print 'hello world11_0_11'
print 'hello world'


hello world11_0_11
hello world

In [6]:
# The value of the last line of a code cell is displayed by default,
# even if you don't print it. Here both lines use print, so both
# results appear in the output.

print 2 + 2 # Displayed, because it is printed explicitly
print 3 + 3 # Displayed, because it is printed explicitly (as the last line, it would be shown even without print)


4
6
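
As a rough illustration of the difference between printing a value and letting the notebook display it: the notebook shows the value of a cell's last expression using its `repr`, while `print` uses `str`. The sketch below mimics that outside a notebook. The names `value`, `shown_by_print`, and `shown_as_out` are made up for this example.

```python
# Illustrative only: mimic how a string looks when printed versus when
# it is echoed as the value of a cell's last expression.
value = 'hello world'

shown_by_print = str(value)   # the text print writes (no quotes)
shown_as_out = repr(value)    # the text an Out[...] line shows (with quotes)

print(shown_by_print)
print(shown_as_out)
```

This is why a string echoed as a cell's last expression appears in quotes, while a printed string does not.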

Nicely formatted results

IPython notebooks allow you to display nicely formatted results, such as plots and tables, directly in the notebook. You'll learn how to use the following libraries later on in this course, but for now here's a preview of what IPython notebook can do.


In [ ]:
# If you run this cell, you should see the values displayed as a table.

# Pandas is a software library for data manipulation and analysis. You'll learn to use it later in this course.
import pandas as pd

df = pd.DataFrame({'a': [2, 4, 6, 8], 'b': [1, 3, 5, 7]})
df

In [9]:
# If you run this cell, you should see a scatter plot of the function y = x^2

%pylab inline
import matplotlib.pyplot as plt

xs = range(-30, 31)
ys = [x ** 2 for x in xs]

plt.scatter(xs, ys)


Populating the interactive namespace from numpy and matplotlib
Out[9]:
<matplotlib.collections.PathCollection at 0xb58fda0>

Creating cells

To create a new code cell, click "Insert > Insert Cell [Above or Below]". A code cell will automatically be created.

To create a new markdown cell, first follow the process above to create a code cell, then change the type from "Code" to "Markdown" using the dropdown next to the run, stop, and restart buttons.

Some Markdown data

Re-running cells

If you find a bug in your code, you can always update the cell and re-run it. However, any cells that come afterward won't be automatically updated. Try it out below. First run each of the three cells. The first two don't have any output, but you will be able to tell they've run because a number will appear next to them, for example, "In [5]". The third cell should output the message "Intro to Data Analysis is awesome!"


In [16]:
class_name = "BRUCE Woodley Intro to Data Analysis"

In [19]:
message = class_name + " is awesome!"

In [20]:
message


Out[20]:
'BRUCE Woodley Intro to Data Analysis is awesome!'

Once you've run all three cells, try modifying the first one to set class_name to your name, rather than "Intro to Data Analysis", so you can print that you are awesome. Then rerun the first and third cells without rerunning the second.

You should have seen that the third cell still printed "Intro to Data Analysis is awesome!" That's because you didn't rerun the second cell, so even though the class_name variable was updated, the message variable was not. Now try rerunning the second cell, and then the third.

You should have seen the output change to "your name is awesome!" Often, after changing a cell, you'll want to rerun all the cells below it. You can do that quickly by clicking "Cell > Run All Below".
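
The same effect can be reproduced in a single script, which may make it easier to see why the stale message appears. In this sketch each commented step stands in for running one cell; `stale_message`, `fresh_message`, and the placeholder name "Your Name" are invented for the illustration.

```python
# Each step below stands in for running one notebook cell.
class_name = "Intro to Data Analysis"     # cell 1
message = class_name + " is awesome!"     # cell 2

class_name = "Your Name"                  # cell 1 edited and rerun
stale_message = message                   # cell 3 rerun: message was never rebuilt

message = class_name + " is awesome!"     # cell 2 rerun
fresh_message = message                   # cell 3 rerun again

print(stale_message)
print(fresh_message)
```

Assigning a new value to `class_name` does not touch `message`, because `message` was computed once from the old value and stored as its own string.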


In [2]:
import unicodecsv

with open("enrollments.csv", "rb") as filein:
    line = unicodecsv.DictReader(filein)
    print("type(line) \t", type(line))
    enrollments = list(line)
print enrollments[0]


('type(line) \t', <type 'instance'>)
{u'status': u'canceled', u'is_udacity': u'True', u'is_canceled': u'True', u'join_date': u'2014-11-10', u'account_key': u'448', u'cancel_date': u'2015-01-14', u'days_to_cancel': u'65'}

In [3]:
import unicodecsv

with open("daily_engagement.csv", "rb") as filein:
    line = unicodecsv.DictReader(filein)
    #print("type(line) \t", type(line))
    daily_engagement = list(line)
print daily_engagement[0]


{u'lessons_completed': u'0.0', u'num_courses_visited': u'1.0', u'total_minutes_visited': u'11.6793745', u'projects_completed': u'0.0', u'acct': u'0', u'utc_date': u'2015-01-09'}

In [4]:
import unicodecsv

with open("project_submissions.csv", "rb") as filein:
    line = unicodecsv.DictReader(filein)
    project_submissions_fieldnames = line.fieldnames
    #print("type(line) \t", type(line))
    print("project_submissions_fieldnames = ", str(project_submissions_fieldnames))
    project_submissions = list(line)
print project_submissions[0]


('project_submissions_fieldnames = ', "[u'creation_date', u'completion_date', u'assigned_rating', u'account_key', u'lesson_key', u'processing_state']")
{u'lesson_key': u'3176718735', u'processing_state': u'EVALUATED', u'account_key': u'256', u'assigned_rating': u'UNGRADED', u'completion_date': u'2015-01-16', u'creation_date': u'2015-01-14'}
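
The three loading cells above repeat the same steps, so they could be folded into one helper. The following is a sketch, not the notebook's own code: `read_csv` is a hypothetical name, and it assumes Python 3 with the standard-library `csv` module rather than `unicodecsv` under Python 2. The sample file and its two columns are invented for the demonstration.

```python
import csv
import os
import tempfile


def read_csv(filename):
    """Return the rows of a CSV file as a list of dictionaries."""
    # newline='' is what the csv module documentation recommends for files.
    with open(filename, newline='') as f:
        return list(csv.DictReader(f))


# Demonstration on a tiny made-up file (not the course data).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('account_key,status\n448,canceled\n')
    sample_path = tmp.name

rows = read_csv(sample_path)
os.remove(sample_path)
print(rows[0]['account_key'], rows[0]['status'])
```

With a helper like this, each table could be loaded in one line, for example `enrollments = read_csv('enrollments.csv')`.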

Fixing Data Types


In [5]:
# Fixing Data Types.
# Hit shift + enter or use the run button to run this cell and see the results
from datetime import datetime as dt

# Takes a date as a string, and returns a Python datetime object. 
# If there is no date given, returns None
def parse_date(date):
    if date == '':
        return None
    else:
        return dt.strptime(date, '%Y-%m-%d')
    
# Takes a string which is either an empty string or represents an integer,
# and returns an int or None.
def parse_maybe_int(i):
    if i == '':
        return None
    else:
        return int(i)
    
# The loop variable `enrollment` does not exist until the loop below runs,
# so inspect the first row of the table instead.
print(" type(enrollments[0]) ", type(enrollments[0]))
# Clean up the data types in the enrollments table
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
    enrollment['days_to_cancel'] = parse_maybe_int(enrollment['days_to_cancel'])
    enrollment['is_canceled'] = enrollment['is_canceled'] == 'True'
    enrollment['is_udacity'] = enrollment['is_udacity'] == 'True'
    enrollment['join_date'] = parse_date(enrollment['join_date'])
    
enrollments[0]
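
To see what the two parsing helpers return, they can be tried on a few sample strings before being applied to a whole table. The sketch repeats the definitions so that it runs on its own. The sample values come from the first enrollment row shown earlier, and `parsed_cancel` and `parsed_days` are names made up for this check.

```python
from datetime import datetime as dt


def parse_date(date):
    # An empty string means no date was recorded.
    if date == '':
        return None
    return dt.strptime(date, '%Y-%m-%d')


def parse_maybe_int(i):
    # An empty string means the value is missing.
    if i == '':
        return None
    return int(i)


parsed_cancel = parse_date('2015-01-14')
parsed_days = parse_maybe_int('65')
print(parsed_cancel, parsed_days)
print(parse_date(''), parse_maybe_int(''))
```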

In [6]:
# enrollments
# daily_engagement
# project_submissions
# these are all a "List of Dictionaries"

# For each table, count the rows and the distinct students (account keys).
enrollments_set = set()
for line in enrollments:
    enrollments_set.add(line['account_key'])
print("enrollments", type(enrollments), " row total: ", len(enrollments), " total students: ", len(enrollments_set))

daily_engagement_set = set()
for line in daily_engagement:
    # This table names the account column 'acct' rather than 'account_key'.
    daily_engagement_set.add(line['acct'])
print("daily_engagement", type(daily_engagement), " row total: ", len(daily_engagement), " total students: ", len(daily_engagement_set))

project_submissions_set = set()
for line in project_submissions:
    project_submissions_set.add(line['account_key'])
print("project_submissions", type(project_submissions), " row total: ", len(project_submissions), " total students: ", len(project_submissions_set))

print(" ")
print('REM: these are all a "List of Dictionaries"...!')


('enrollments', <type 'list'>, ' row total: ', 1640, ' total students: ', 1302)
('daily_engagement', <type 'list'>, ' row total: ', 136240, ' total students: ', 1237)
('project_submissions', <type 'list'>, ' row total: ', 3642, ' total students: ', 743)
 
REM: these are all a "List of Dictionaries"...!
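
The three counting loops above differ only in the table and the key name, so one way to avoid the repetition is a small helper. `get_unique_students` and the sample rows below are hypothetical, invented for illustration. The `key` parameter is there because the daily engagement table uses `acct` where the other two use `account_key`.

```python
def get_unique_students(rows, key='account_key'):
    """Return the set of distinct account keys found in a list of dicts."""
    return {row[key] for row in rows}


# Made-up rows standing in for the real tables.
sample_enrollments = [
    {'account_key': '448'},
    {'account_key': '448'},
    {'account_key': '700'},
]
sample_engagement = [{'acct': '0'}, {'acct': '0'}]

unique_enrolled = get_unique_students(sample_enrollments)
unique_engaged = get_unique_students(sample_engagement, key='acct')
print(len(sample_enrollments), len(unique_enrolled))
print(len(sample_engagement), len(unique_engaged))
```

A set keeps each account key only once, which is why the student totals can be smaller than the row totals.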
