This notebook will introduce the concept of interactive notebooks to those who are unfamiliar with it. As part of the resources provisioned to all students, a dedicated Amazon Web Services (AWS) server is created for each student. Your virtual analysis environment is created using the Base Jupyter Notebook Stack for Anaconda with Python 3, as well as a number of additional installed packages that are relevant to the course, but not present in the default set of packages from Anaconda. All the necessary data, as well as the template notebooks required in this course, will be populated on this server.
Jupyter Notebooks are essentially documents that are designed to be easily read by both humans and computers. Jupyter Notebooks provide a web-based application that is suited to capture the whole computation process: developing, documenting, and executing code, as well as communicating the results. The elements of a typical notebook include descriptions and analyses, as well as executable code. You can share your code with others and communicate your process and results with a wide array of audiences, from technical to non-technical. This type of interactive computing allows for reporting as well as user interaction and is therefore ideally suited for educational purposes.
Note:
Try Jupyter is a platform from which you can try Jupyter without installing it onto your system. Please note that this is a service provided by Jupyter and that you may experience delays in cases where many people try to access the service at the same time.
For more information, refer to the Jupyter Documentation.
During this course, you will only use Python 3-based Jupyter Notebooks, and the supplied kernel is all you need to complete the exercises.
Note:
A ‘kernel’ is a program that runs and introspects the user’s code. The Anaconda distribution used in this course includes a kernel for Python 3 code. Kernels for Python 2.7 and for several other programming languages, such as R, Julia, and Octave, are available, but you would need to install them separately if you plan to set up your own server in the future. Managing multiple environments is outside the scope of this course.
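As a minimal sketch, the cell below confirms which Python version and interpreter the current kernel is running. It uses only the standard `sys` module; the variable names are illustrative and not part of the course material.

```python
import sys

# Major, minor, and micro version of the interpreter behind the current kernel.
major, minor, micro = sys.version_info[:3]
kernel_version = "{}.{}.{}".format(major, minor, micro)

# Path of the Python executable that the kernel is running.
kernel_executable = sys.executable

print("Python version:", kernel_version)
print("Interpreter:", kernel_executable)
```

If the major version printed is 3, the notebook is running on the Python 3 kernel used throughout this course.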
Notebooks differ from Integrated Development Environments (IDEs) in that notebooks are typically used to interactively analyze and visualize datasets, whereas IDEs are typically used to create standalone scripts. Should your focus be on scripts that run as standalone items, you can refer to the links below for Python IDEs. You will not make use of IDEs during this course.
The IPython shell and subsequent notebooks were created using Python. Jupyter then added support for additional programming languages. For those of you who are not familiar with IPython or Jupyter, it is worthwhile to review this quick tour of IPython notebooks.
Note:
Autocompletion is achieved by pressing "Shift+Tab" instead of "Tab" as indicated in the resource above.
You can also download the notebook for later reference here.
For the purpose of this course, your environment has been preconfigured for you. To access your virtual analysis environment, you need to navigate to your Amazon Web Services instance and enter your password (Instance ID) as shown in the image below.
You will find your login details in the “0.9 Activity: Sign in to your virtual analysis environment” component on the Online Campus in the Orientation Module.
Note:
While running the Jupyter Notebook server locally or setting up your own cloud server is possible, no technical support will be provided for local or alternative cloud installations.
Those interested in setting up local or alternative cloud installations for use outside of this course can refer to the following links:
Local installation: Instructions on Continuum Analytics's Anaconda download page.
Cloud setup: Instructions on Continuum Analytics's AWS instructions page.
In the toolbar at the top of the notebook you have the option to change the type of cells. The two types that you will use frequently are "code" and "markdown" cells. Code cells are used to execute code and markdown cells are used to display formatted text. We will get back to these in Module 1.
When starting with a new programming language, the first example usually begins with outputting the text "hello world".
In [ ]:
print("hello world")
This section will demonstrate basic notebook commands. You will be provided with additional references that you can explore in your own time, should you be interested in doing so.
Note:
In Module 1 you will be provided with a notebook containing Python basics to prepare you for the remainder of the course.
While it is possible to perform direct calculations, such as those shown in the cell below, you would typically assign the output of a calculation to variables for reuse in subsequent calculations or functions.
In [ ]:
2 + 2
In [ ]:
# Assign the value 2 to variable a and the value 3 to variable b.
a = 2
b = 3
In [ ]:
# Assign the value of the product of a and b to the variable c.
c = a * b
In [ ]:
# Print the value of variable c.
print(c)
In [ ]:
yourname = "student"
In [ ]:
print('Hello', yourname)
Note:
In Module 1, you will be provided with an overview and examples of the Python programming concepts and constructs required to complete this course. For now, this notebook simply includes a sample function without additional explanation.
In [ ]:
# In Python 3, print is a function, and by default it ends its output with a newline.
# To change this behaviour, you can supply optional arguments such as end.
# View the help for the print function.
help(print)
In [ ]:
# Create a function that prints ten numbers.
def print_10_nums():
    for i in range(10):
        print(i, end=' ')
In [ ]:
# Execute your function.
print_10_nums()
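The optional arguments of `print` mentioned in the comments above can be sketched as follows. This is an illustrative example, not part of the course material: it uses the `sep`, `end`, and `file` arguments of `print`, and writes to an in-memory buffer (the hypothetical name `buffer`) so that the exact output can be inspected.

```python
import io

# An in-memory text buffer that print can write to instead of the screen.
buffer = io.StringIO()

# sep is placed between the values; end is written after the last value.
print("a", "b", "c", sep="-", end="!\n", file=buffer)

# Retrieve exactly what print wrote, including the custom ending.
captured = buffer.getvalue()
print(repr(captured))
```

Using `repr` makes the trailing newline visible, which shows that `end` replaced the default line ending rather than adding to it.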
It is good practice to add comments to your code. This makes it easier for others to interpret your code. It can also help you to understand your own intentions and processes when revisiting notebooks at a later stage. Comments can be added by starting the line with the # symbol.
In [ ]:
# Set a new variable, d, equal to 10 and print the variable.
d = 10
print(d)
You will be introduced to packages and libraries in subsequent modules. These contain reusable code where you can “call” repeatable functions instead of manually writing code or creating your own functions. To load a package or library, you first need to ensure that it has been installed. Then you may import and finally use the package.
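The import-then-use sequence described above might look like this. The sketch uses the `statistics` module from the Python standard library, chosen here only because it is always installed; the sample values are made up for illustration.

```python
# Import the module and give it a short alias.
import statistics as st

# Illustrative values; any list of numbers would do.
sample_values = [2, 4, 6, 8]

# Call a reusable function from the module instead of writing your own.
sample_mean = st.mean(sample_values)
print(sample_mean)
```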
You can execute the cell below to check whether a package has been installed on your system.
In [ ]:
# Check whether Pandas is installed.
# importlib.util.find_spec looks the package up without importing it.
import importlib.util
# Note that you may have multiple versions of Python installed in virtual environments. The statements in this
# cell were created for your virtual analysis environment and may have different results when executed elsewhere.
importlib.util.find_spec('pandas') is not None
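A check for several packages can be wrapped in a small helper. This is a sketch, not part of the course material: `is_installed` is a hypothetical name, and it relies on `importlib.util.find_spec`, which looks up a top-level package without importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    # find_spec returns None when no package with this name can be located.
    return importlib.util.find_spec(package_name) is not None


# Report the status of a few packages; the names are examples only.
for name in ["json", "pandas", "numpy"]:
    print(name, "installed:", is_installed(name))
```

The result depends on the environment in which the cell runs, so a package reported as missing here may still be installed in a different Python environment.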
If you need to display a package’s help function, you can execute the following command:
In [ ]:
help('pandas')
If you know that a package has been installed, or you have checked that it exists, you can import it to make it available for use in your notebook. If the package is available to your notebook, the cell below will execute without errors. Pandas is one of the standard packages installed with the Anaconda Python distribution, so the import should complete without errors.
Note:
You have to import the packages that you use for every notebook that you work on. It is good practice to load the packages once, close to the top of the notebook, prior to executing code. We will deviate from this best practice in the first module of the course, and load the packages when introducing their functionality.
In [ ]:
# Import Pandas.
import pandas as pd
You can also display the help function by using the ? symbol, as shown below. Close the output by clicking the x at the top right corner (of the output) once you have reviewed it.
In [ ]:
# Display the help for the library imported.
pd?
Autocomplete can be used to display information about the package as well as the functions and classes contained in the package. Place your cursor after the full stop in the cell below, and press "Shift+Tab" once to display information about the package. Pressing "Shift+Tab" a second time will display the functions and classes available in the package.
In [ ]:
pd.
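The names that a package makes available can also be listed in code with the built-in `dir` function. The sketch below uses the standard-library `math` module as a stand-in, so that it runs even where Pandas is not available; the same call works on `pd` once Pandas has been imported.

```python
import math

# dir returns every attribute name; skip the private ones that start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

# Show the first few names.
print(public_names[:5])
```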
Once you have selected a function, you can also use autocomplete to show its help and the input parameters that it accepts. Place your cursor after the opening parenthesis in the cell below and press "Shift+Tab" to review the input parameters available to that function.
In [ ]:
# Use autocomplete to see the available input parameters.
pd.read_csv(
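The input parameters of a function can also be retrieved in code with `inspect.signature` from the standard library. The sketch below inspects `json.dumps` as a stand-in so that it does not depend on Pandas; `inspect.signature(pd.read_csv)` should work in the same way once Pandas has been imported.

```python
import inspect
import json

# Build a signature object describing the function's input parameters.
signature = inspect.signature(json.dumps)

# Collect the parameter names in the order in which they are declared.
parameter_names = list(signature.parameters)
print(parameter_names)
```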
This section will introduce some of the basics you will require when working with your notebooks throughout the course.
When you are in a notebook such as this one, you can click on the "Jupyter" logo at the top left-hand corner of the screen to return to the directory view. The autosave function will save content periodically, but make sure to select "File" and "Save and Checkpoint" from the menu at the top of the screen, if there are any changes that you wish to save.
Once in the directory view, you can navigate to the required folder and select "New" and "Python notebook" at the top right-hand corner of the screen to create a new notebook. You can rename notebooks from the directory view or while in the actual notebook (by selecting the notebook name at the top of the screen).
You can shut down notebooks in a number of ways. The first way is to select "File" and then "Close and Halt" (in the drop-down menu) from within the running notebook. The second (recommended) way is to navigate to your directory view by selecting the Jupyter logo or switching to a window in which the directory is already open. Then, select the "Running" tab at the top of the screen and shut down all unnecessary notebooks by selecting the "Shutdown" option on the right of the screen (above the notebooks).
When starting a session, it is recommended that you review all of the notebooks that are currently running, and shut down those not in use. Before leaving or logging out of your environment, revisit the “Running” tab on your directory view, and shut down all of the notebooks.
As part of their explanation of Open Data Science, Continuum Analytics states that:
Open data science is not a single technology, but a revolution within the data science community. Open Data Science is an inclusive movement that makes open source tools for data science - data, analytics and computation - easily work together as a connected ecosystem.
(Continuum Analytics 2016)
This course is not positioned as a Python training course, but it will introduce the relevant Python constructs for users who are not familiar with them. This section contains additional (optional) resources. You are welcome to review these resources during the Orientation Module, should you want to explore some of the related topics and improve your Python skills.
Python:
Use the following four-step process when submitting your notebooks for assessment:
Continuum Analytics. 2016. “Open Data Science | Continuum.” Accessed September 9. https://www.slideshare.net/continuumio/why-open-data-science-matters-gartner-bi-analytics-summit-16
Note: Ensure that you keep a safe copy of all the notebooks that you change.