This notebook focuses on the specifics of using Python in an interactive environment such as Datalab. It is not intended to be a complete tutorial on Python as a language. If you're completely new to Python, no problem! Python is quite straightforward, and there are lots of resources. The interactive step-by-step material at Codecademy might be of interest.
To get started, below is a code cell that contains a Python statement. You can run it by pressing Shift+Enter or clicking the Run toolbar button with the cell selected.
In [1]:
print("Hello World")
You can edit the cell above and re-execute it to iterate on it. You can also add more code cells to enter new blocks of code.
In [2]:
import sys
number = 10
def square(n):
    return n * n
The cell above created a variable named number and a function named square, and placed them into the global namespace. It also imported the sys module into the same namespace. This global namespace is shared across all the cells in the notebook.
As a result, the following cell should be able to access (as well as modify) them.
In [3]:
print('The number is currently %d' % number)
number = 11
sys.stderr.write('And now it is %d' % number)
square(number)
Out[3]:
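The shared namespace described above can be imitated in plain Python. The following is only a rough sketch, not how Datalab is actually implemented: it assumes each "cell" is a string of source code and runs them with exec against one shared dictionary. The names shared_namespace, cell_2 and cell_3 are hypothetical.

```python
# One dictionary plays the role of the notebook's global namespace.
shared_namespace = {}

# Source of two hypothetical cells, mirroring In [2] and In [3] above.
cell_2 = """
import sys
number = 10
def square(n):
    return n * n
"""

cell_3 = """
number = 11
result = square(number)
"""

# Running both cells against the same dictionary lets the second cell
# see (and modify) what the first one defined.
exec(cell_2, shared_namespace)
exec(cell_3, shared_namespace)

print(shared_namespace["number"])  # 11
print(shared_namespace["result"])  # 121
```

Because both cells write into the same dictionary, the order in which cells are run matters more than their position in the notebook.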
By now, you've probably noticed a few interesting things about code cells:
Upon execution, their results are shown inline in the notebook, after the code that produced the results. These results are included in the saved notebook. Results include the output of print statements (text written to stdout as well as stderr) and the final result of the cell.
Some code cells do not have any visible output.
Code cells have a distinguishing border on the left. This border is a washed-out gray when the notebook is first loaded, indicating that a cell has not been run yet; the border changes to a filled blue border after the cell runs.
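To make the three kinds of result more concrete, here is a minimal sketch that separates them in plain Python 3. It assumes nothing about how Datalab captures output; it simply redirects stdout and stderr into in-memory buffers (stdout_buffer and stderr_buffer are illustrative names) and keeps the value of the last expression in cell_result.

```python
import contextlib
import io
import sys


def square(n):
    return n * n


stdout_buffer = io.StringIO()
stderr_buffer = io.StringIO()

# Everything written to stdout or stderr inside this block is captured.
with contextlib.redirect_stdout(stdout_buffer), contextlib.redirect_stderr(stderr_buffer):
    number = 10
    print('The number is currently %d' % number)
    number = 11
    sys.stderr.write('And now it is %d' % number)
    cell_result = square(number)  # stands in for the cell's final result

print('stdout:', repr(stdout_buffer.getvalue()))
print('stderr:', repr(stderr_buffer.getvalue()))
print('result:', cell_result)
```

A cell that only assigns variables or defines functions writes nothing to either stream and ends without an expression, which is why some cells show no visible output.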
Python APIs are usually accompanied by documentation. You can use ? to invoke help on a class or a method. For example, execute the cells below:
In [4]:
str?
In [5]:
g = globals()
g.get?
When run, these cells produce docstring content that is displayed in the help pane within the sidebar.
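The ? suffix is notebook syntax rather than part of the Python language. As a rough equivalent that also works in a plain Python interpreter, the standard inspect module can return the same docstrings as text. This is a sketch of an alternative, not a description of what the help pane does internally.

```python
import inspect

# Docstring of a class, comparable to running `str?`.
str_doc = inspect.getdoc(str)
print(str_doc.splitlines()[0])

# Docstring of a bound method, comparable to running `g.get?`.
g = globals()
get_doc = inspect.getdoc(g.get)
print(get_doc)
```

The built-in help() function prints similar information in a longer, paged format.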
The code cells also provide auto-suggest. For example, press Tab after the '.' to see a list of members callable on the g variable that was just declared.
In [ ]:
# Intentionally incomplete for purposes of auto-suggest demo, rather than running unmodified.
g.
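The suggestion list is roughly comparable to what the built-in dir function reports for the same object. The sketch below lists the public members of g programmatically; it is an approximation, since the notebook's completer may filter or order names differently.

```python
g = globals()

# Keep only names that do not start with an underscore,
# which is approximately what a completion list shows first.
public_members = [name for name in dir(g) if not name.startswith('_')]
print(public_members)
```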
Function signature help is also available. For example, press Tab in the empty parentheses below.
In [7]:
str()
Out[7]:
Note that help in Python relies on the interpreter being able to resolve the type of the expression that you are invoking help on.
If you have not yet executed the code that defines a variable, you may be able to invoke help directly on the class or method you're interested in, rather than on the variable itself. Try this.
In [8]:
import datetime
datetime.datetime?
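The dependence on resolving names can be illustrated in plain Python. In this minimal sketch, not_yet_defined_variable is a deliberately hypothetical name that has never been assigned, so looking it up fails, while the imported class resolves and its documentation can be read.

```python
import datetime
import inspect

variable_resolved = True
try:
    # The name was never assigned, so the interpreter cannot resolve it.
    inspect.getdoc(not_yet_defined_variable)
except NameError as error:
    variable_resolved = False
    print('Cannot show help:', error)

# The class itself is resolvable as soon as the module is imported.
class_doc = inspect.getdoc(datetime.datetime)
print(class_doc.splitlines()[0])
```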
Datalab includes the Python standard library and a set of additional libraries that you can easily import. Most of these libraries were installed using pip, the Python package manager, or pip3 for Python 3.
In [1]:
%%bash
pip list --format=columns
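A similar listing can be produced from Python itself instead of a %%bash cell. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; installed_packages is a hypothetical helper name, and the result covers only the interpreter running the cell, which may differ from what a given pip executable reports.

```python
from importlib import metadata  # available in Python 3.8+


def installed_packages():
    """Return sorted (name, version) pairs visible to this interpreter."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get('Name')
        if name:  # skip entries whose metadata is missing or damaged
            packages.add((name, dist.metadata.get('Version')))
    return sorted(packages, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print('%-30s %s' % (name, version))
```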
If you have suggestions for additional packages to include, please submit feedback proposing the inclusion of the packages in a future version.
You can use pip to install your own Python 2 libraries, or pip3 to install Python 3 libraries.
Keep in mind that this will install the library within the virtual machine instance being used for Datalab, and the library will become available to all notebooks and all users sharing the same instance.
The library installation is temporary. If the virtual machine instance is recreated, you will need to reinstall the library.
The example below installs Scrapy, a library for scraping web content.
In [2]:
%%bash
apt-get update -y
apt-get install -y -q python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
pip install -q scrapy
Inspecting the Python environment by running pip list, we should now see that Scrapy is installed and ready to use.
In [5]:
%%bash
pip list --format=columns
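Besides reading the pip list output, you can check from Python whether a module is importable. The following is a minimal sketch using importlib.util.find_spec from the standard library; is_importable is a hypothetical helper name. It only tells you whether the interpreter running the cell can locate the module, so a library installed for a different Python version will still report False.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print('json importable:', is_importable('json'))
print('scrapy importable:', is_importable('scrapy'))
```

If the second line prints False right after the installation cell, the library was most likely installed for a different interpreter than the one backing the notebook.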