Python 3.6 Jupyter Notebook

Getting started with Jupyter Notebooks

Note: This notebook is not graded.
It serves as an introduction to the notebooks and submission process that you will use throughout the course.

Introduction

This notebook will introduce the concept of interactive notebooks to those who are unfamiliar with it. As part of the resources provisioned to all students, a dedicated Amazon Web Services (AWS) server is created for each student. Your virtual analysis environment is created using the Base Jupyter Notebook Stack for Anaconda with Python 3, as well as a number of additional installed packages that are relevant to the course, but not present in the default set of packages from Anaconda. All the necessary data, as well as the template notebooks required in this course, will be populated on this server.

1. What is a notebook?

Jupyter Notebooks are essentially documents that are designed to be easily read by both humans and computers. Jupyter Notebooks provide a web-based application that is suited to capture the whole computation process: developing, documenting, and executing code, as well as communicating the results. The elements of a typical notebook include descriptions and analyses, as well as executable code. You can share your code with others and communicate your process and results with a wide array of audiences, from technical to non-technical. This type of interactive computing allows for reporting as well as user interaction and is therefore ideally suited for educational purposes.

Note:

Try Jupyter is a platform from which you can try Jupyter without installing it onto your system. Please note that this is a service provided by Jupyter and that you may experience delays in cases where many people try to access the service at the same time.

For more information, refer to the Jupyter Documentation.

During this course, you will only use Python 3 based Jupyter Notebooks and the supplied kernel is all you need to complete the exercises.

Note:

A ‘kernel’ is a program that runs and introspects the user’s code. The Anaconda distribution used in this course includes a kernel for Python 3 code. Kernels for Python 2.7, as well as for several other programming languages such as R, Julia, and Octave, are available, but would need to be installed separately if you plan to set up your own server in the future. Managing multiple environments is outside the scope of this course.
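
If you would like to confirm which Python version the active kernel is running, the following minimal sketch uses only the standard library. The variable name kernel_version is illustrative and is not part of the course material.

```python
import sys

# sys.version_info describes the interpreter behind the current kernel.
kernel_version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
print("This kernel runs Python", kernel_version)
```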

Notebooks differ from Integrated Development Environments (IDEs) in that they are typically used to interactively analyze and visualize datasets, whereas IDEs are typically used to create standalone scripts. Should your focus be on pure scripts that run as standalone items, you can refer to the links below for Python IDEs. You will not make use of IDEs during this course.

1.1 Notebook tour

The IPython shell and subsequent notebooks were created using Python. Jupyter then added support for additional programming languages. For those of you who are not familiar with IPython or Jupyter, it is worthwhile to review this quick tour of IPython notebooks.

Note:

Autocompletion is achieved by pressing "Shift+Tab" instead of "Tab" as indicated in the resource above.

You can also download the notebook for later reference here.

1.2 Running Jupyter

For the purpose of this course, your environment has been preconfigured for you. To access your virtual analysis environment, you need to navigate to your Amazon Web Services instance and enter your password (Instance ID) as shown in the image below.

You will find your login details in the “0.9 Activity: Sign in to your virtual analysis environment” component on the Online Campus in the Orientation Module.

Note:

While running the Jupyter Notebook server locally or setting up your own cloud server is possible, no technical support will be provided for local or alternative cloud installations.

Those interested in setting up local or alternative cloud installations for use outside of this course can refer to the following links:

1.3 Executing your first Python code in a Jupyter Notebook

In the toolbar at the top of the notebook you have the option to change the type of cells. The two types that you will use frequently are "code" and "markdown" cells. Code cells are used to execute code and markdown cells are used to display formatted text. We will get back to these in Module 1.

When starting with a new programming language, the first example usually begins with outputting the text "hello world".


Try this
Select the code cell (shaded in grey) below by inserting your cursor and press the ▶ (run cell) button in the toolbar above to run it.
Alternatively, you can use the "Shift+Enter" keyboard shortcut to run the cell.

In [ ]:
print("hello world")

1.4 Basics

This section will demonstrate basic notebook commands. You will be provided with additional references that you can explore in your own time, should you be interested in doing so.

Note:

In Module 1 you will be provided with a notebook containing Python basics to prepare you for the remainder of the course.

1.4.1 Performing calculations

While it is possible to perform direct calculations, such as those shown in the cell below, you would typically assign the output of a calculation to variables for reuse in subsequent calculations or functions.

Execute the code cell below.
You will be provided with template notebooks where you need to execute the cells in sequence. Although this note will not be repeated in all cells, you can execute or run the code cells as demonstrated in the "hello world" example above.

Notebooks are set up to load libraries and data, create variables, and perform calculations that later sections depend on. It is therefore very important that you execute all code cells in sequence, from top to bottom, throughout the course.

In [ ]:
2 + 2
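
When cells are run out of sequence, the most common symptom is a NameError, because a variable is used before the cell that creates it has been executed. The sketch below reproduces this on purpose. The variable name total_sales is hypothetical and is assumed not to be defined anywhere earlier in the notebook.

```python
# Assumption: no earlier cell has defined the name 'total_sales'.
try:
    print(total_sales)
except NameError as error:
    # Python reports which name is missing.
    message = str(error)
    print("Cell run out of order:", message)
```

Running the cell that defines the missing variable, and then re-running the failing cell, resolves this kind of error.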

1.4.2 Assigning the output of a calculation to a variable


In [ ]:
# Assign the value 2 to variable a and the value 3 to variable b.
a = 2
b = 3

In [ ]:
# Assign the value of the product of a and b to the variable c.
c = a * b

In [ ]:
# Print the value of variable c.
print(c)

1.4.3 Printing text

Change the value of the variable “yourname”, which is currently set to "student", to your name and run the two cells below.


In [ ]:
yourname = "student"

In [ ]:
print('Hello', yourname)
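
As an optional aside, the same greeting can be built in other ways. The sketch below is not part of the course template. It assumes Python 3.6 or later (for the f-string), and the variable name greeting is illustrative.

```python
yourname = "student"  # same placeholder value as in the cell above

# Concatenation needs the space written explicitly;
# print('Hello', yourname) inserts it for you.
greeting = "Hello " + yourname
print(greeting)

# An f-string (available from Python 3.6) builds the same text inline.
print(f"Hello {yourname}")
```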

1.4.4 Writing code

You can use built-in Python functions and write your own functions to make tasks repeatable. The two cells below define a function and then execute it. Run them to see the results.

Note:

In Module 1, you will be provided with an overview and examples of the Python programming concepts and constructs required to complete this course. For now, this notebook simply includes a sample function without additional explanation.


In [ ]:
# In Python 3, print is a function that ends its output with a newline by default.
# To change this behaviour, you can pass optional arguments such as end.
# View the help for the print function.
help(print)
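
The help text lists the optional sep and end arguments. As a minimal sketch of their effect, the cell below writes to an in-memory buffer (an illustrative choice, so that the exact characters can be inspected) instead of printing directly to the screen.

```python
import io

# Write to an in-memory buffer so the exact characters can be inspected.
buffer = io.StringIO()
print("a", "b", file=buffer)                    # defaults: sep=' ' and end='\n'
print("a", "b", sep="-", end="!", file=buffer)  # custom separator and ending
output = buffer.getvalue()

# repr() makes the newline character visible as \n.
print(repr(output))
```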

In [ ]:
# Create a function that prints ten numbers.
def print_10_nums():
    for i in range(10):
        print(i, end=' ')

In [ ]:
# Execute your function.
print_10_nums()

Note:
Python uses 0-based indexing, and range(10) likewise starts counting at 0, so the function above prints the numbers 0 through 9. More information on the range function will be provided in the following notebook.
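
To see the values that range produces, you can convert it to a list. This is a small optional sketch, and the variable names are illustrative.

```python
# range(10) starts at 0 and stops before 10.
numbers = list(range(10))
print(numbers)

# range(start, stop) lets you choose a different starting point.
one_to_ten = list(range(1, 11))
print(one_to_ten)
```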

1.4.5 Adding comments to your code

It is good practice to add comments to your code. This makes it easier for others to interpret your code. It can also help you to understand your own intentions and processes when revisiting notebooks at a later stage. Comments can be added by starting the line with the # symbol.


In [ ]:
# Set a new variable, d, equal to 10 and print the variable.
d = 10
print(d)

1.4.6 Help and autocomplete

You will be introduced to packages and libraries in subsequent modules. These contain reusable code where you can “call” repeatable functions instead of manually writing code or creating your own functions. To load a package or library, you first need to ensure that it has been installed. Then you may import and finally use the package.

You can execute the cell below to check whether a package has been installed on your system.


In [ ]:
# Check whether Pandas is installed.
# importlib.util.find_spec locates a package without importing it.
import importlib.util

# Note that you may have multiple versions of Python installed in virtual environments. The statements in this
#   cell were created for your virtual analysis environment and may have different results when executed elsewhere.
# The expression evaluates to True if the Pandas package can be found.
importlib.util.find_spec('pandas') is not None
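
A common pitfall is to test for a package with 'pandas' in sys.modules. That expression only reports whether the module has already been imported in the current session, so it can be False for a package that is installed. The sketch below wraps a check in a small helper so that several packages can be tested without importing them. The helper name is_installed and the made-up package name are hypothetical and serve only as an illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None

# 'json' ships with Python; the last name is deliberately made up.
for name in ["json", "pandas", "not_a_real_package_xyz"]:
    print(name, is_installed(name))
```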

If you need to display a package’s help function, you can execute the following command:


In [ ]:
help('pandas')

If you know that a package has been installed, or you have checked that it exists, you can import it to make it available for you to use in your notebook. You should be able to execute the cell below without errors if the package is available to your notebook. The selected package is part of the standard packages installed with the Anaconda Python distribution, so the import will complete without errors.

Note:

You have to import the packages that you use for every notebook that you work on. It is good practice to load the packages once, close to the top of the notebook, prior to executing code. We will deviate from this best practice in the first module of the course, and load the packages when introducing their functionality.


In [ ]:
# Import Pandas.
import pandas as pd

You can also display the help function by using the ? symbol, as shown below. Close the output by clicking the x at the top right corner (of the output) once you have reviewed it.


In [ ]:
# Display the help for the library imported.
pd?
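
The ? syntax is a convenience offered by IPython and Jupyter and does not work in a plain Python script. As a rough equivalent, the sketch below retrieves the same kind of documentation text using the standard library. The json module is used purely as a stand-in so that the example does not depend on any installed package.

```python
import inspect
import json

# inspect.getdoc returns an object's docstring with its indentation cleaned up.
doc = inspect.getdoc(json.dumps)

# Show only the first line of the documentation.
print(doc.splitlines()[0])
```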

Autocomplete can be used to display information about the package as well as the libraries contained in the package. Place your cursor after the full stop in the cell below, and press "Shift+Tab" once to display information about the package. Pressing "Shift+Tab" a second time will display the libraries available in the package.

Note:
The cell below will produce an error if executed. Here, the intention is to demonstrate autocomplete. Please follow the instructions above.

In [ ]:
pd.

Once you have selected a library, you can also use the autocomplete to show the help function and the input parameters that can be utilized in a specific library. Place your cursor after the opening parenthesis in the cell below and press "Shift+Tab" to review the input parameters available to the specific library.

Note:
The cell below will produce an error if executed. Here, the intention is to demonstrate autocomplete. Please follow the instructions above.

In [ ]:
# Use autocomplete to see the available input parameters.
pd.read_csv(
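
Input parameters can also be listed programmatically, which is useful when the interactive tooltip is not available. The sketch below is an optional illustration. It uses json.dumps from the standard library as a stand-in, and the variable names are illustrative.

```python
import inspect
import json

# inspect.signature describes the parameters a function accepts.
signature = inspect.signature(json.dumps)
parameter_names = list(signature.parameters)
print(parameter_names)
```

The same approach is expected to work for functions from imported packages, although some built-in functions written in C do not expose a signature.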

2. Working with notebooks

This section will introduce some of the basics you will require when working with your notebooks throughout the course.

2.1 Navigating the Jupyter interface

When you are in a notebook such as this one, you can click on the "Jupyter" logo at the top left-hand corner of the screen to return to the directory view. The autosave function will save content periodically, but make sure to select "File" and "Save and Checkpoint" from the menu at the top of the screen, if there are any changes that you wish to save.

2.2 Creating notebooks

Once in the directory view, you can navigate to the required folder and select "New" and "Python notebook" at the top right-hand corner of the screen to create a new notebook. You can rename notebooks from the directory view or while in the actual notebook (by selecting the notebook name at the top of the screen).

2.3 Shutting down notebooks


Note:
You will often have multiple notebooks running at the same time, but it is good practice to shut down notebooks that are not in use. This is because you typically have limited resources on the machine hosting your notebooks or pay for the amount of cloud resources that you consume. Shutting down notebooks that are not in use frees up resources for computational purposes, which allows for an improved user experience, faster execution of notebooks, and reduced costs when hosting on your own cloud environments.

You can shut down notebooks in a number of ways. The first way is to select "File" and then "Close and Halt" (in the drop-down menu) from within the running notebook. The second (recommended) way is to navigate to your directory view by selecting the Jupyter logo or switching to a window in which the directory is already open. Then, select the "Running" tab at the top of the screen and shut down all unnecessary notebooks by selecting the "Shutdown" option on the right of the screen (above the notebooks).

When starting a session, it is recommended that you review all of the notebooks that are currently running, and shut down those not in use. Before leaving or logging out of your environment, revisit the “Running” tab on your directory view, and shut down all of the notebooks.

3. Additional resources

As part of their explanation of Open Data Science, Continuum Analytics states that:

Open data science is not a single technology, but a revolution within the data science community. Open Data Science is an inclusive movement that makes open source tools for data science - data, analytics and computation - easily work together as a connected ecosystem.

(Continuum Analytics 2016)

This course is not positioned as a Python training course, but it will introduce the relevant Python constructs for users who are not familiar with them. This section contains additional (optional) resources. You are welcome to review these resources during the Orientation Module, should you want to explore some of the related topics and improve your Python skills.

Python:

Python for beginners

Code Academy: Learn to program Python

Open Data Science is a team sport

4. Submit your notebook


Note:
There are no graded activities in this notebook, but you are required to submit this notebook in a valid ".ipynb" format in order to complete the Orientation Module.

Use the following four-step process when submitting your notebooks for assessment:

  1. Rename the file by clicking on the notebook name at the top ("M0_NB1_GettingStarted") and change this to “M0_NB1_YourName”.
  2. In the menu at the top left, select “File” and then “Save and Checkpoint”.
  3. Save a copy of the changed file to your own (offline) laptop or desktop. This needs to be submitted, and also serves as a backup of your files should any of the content on the virtual machine be refreshed. Select “File”, “Download as” and then select “IPython Notebook (.ipynb)” from the menu that appears.
  4. Upload this file to the Online Campus.
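
Before uploading, you may want to confirm that the downloaded file is still a readable notebook. A ".ipynb" file is a JSON document, so the optional sketch below checks that the file parses and contains the expected top-level keys. It assumes version 4 of the notebook format, where the keys "cells" and "nbformat" are present. The helper name looks_like_notebook is hypothetical, and the demonstration writes a throwaway file rather than touching your real notebook.

```python
import json
import os
import tempfile

def looks_like_notebook(path):
    """Return True if the file parses as JSON and has the top-level notebook keys."""
    try:
        with open(path, encoding="utf-8") as handle:
            content = json.load(handle)
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON.
        return False
    return isinstance(content, dict) and "cells" in content and "nbformat" in content

# Demonstration on a temporary file; replace the path with your downloaded notebook.
minimal = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "M0_NB1_YourName.ipynb")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump(minimal, handle)
    is_valid = looks_like_notebook(sample_path)

print(is_valid)
```

This check is only a quick sanity test and does not replace opening the file in Jupyter to confirm that it displays correctly.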

5. References

Continuum Analytics. 2016. “Open Data Science | Continuum.” Accessed September 9. https://www.slideshare.net/continuumio/why-open-data-science-matters-gartner-bi-analytics-summit-16

Note: Ensure that you keep a safe copy of all the notebooks that you change.