Assignment 0

CSCI 1360E: Foundations for Informatics and Analytics

Important Dates

  • Released: 2016-06-09 at 12pm EDT
  • Deadline: [n/a]

Grading Breakdown

  • Q1: 0pts
  • Q2: 0pts
  • Q3: 0pts

Total: 0pts

Overview

This assignment will introduce you to the basics of the online infrastructure we'll be using this semester and show you how to use it. In particular, we'll walk through how to

  1. Log into JupyterHub and interact with any released assignments.
  2. Open up a basic Python shell through JupyterHub.
  3. Run the lecture notebooks on your own.
  4. Install Python on your local machine (OPTIONAL).

Note the fourth point above: due to the online nature of this course and the heavy use of scientific notebooks, installing Python on your local machine will be optional. However, it is nonetheless highly recommended. If you don't have an internet connection and want/need to do some testing in Python, you'll be out of luck if you don't have it already installed.


Q1: JupyterHub

Jupyter notebooks are interactive web-based notebooks that can be viewed in multiple formats--HTML, PDF, even as lecture slides (as you'll see next week!)--and can also execute embedded code. In essence, they're an incredible instructional tool.

JupyterHub is a multi-tenant platform on which users can interact with Jupyter notebooks. Normally, one has to run a Jupyter notebook on a server in order for its interactivity to function properly. While Jupyter itself comes with a local server, this doesn't really help in the case of wanting to provide interactive versions to many users at once--say, a class of students. That's where JupyterHub comes in: I can release a single notebook, and every time someone requests to view it, a new server instance is spun up so everyone gets their own copy to edit. No Google-docs-ing each other's edits.

We have such an instance running!

1: Log into JupyterHub.

Remember, you'll need access to the UGA campus network. You'll need to either do this from campus or be logged into a VPN; otherwise, you'll get a frustrating error.

Go to http://jupyterhub.cs.uga.edu. At the login prompt, enter your MyID as your username, and your MyID as your password. Yes, totally insecure, but this is only accessible from the UGA network, and we'll get to changing your password shortly.

If your login was successful, you should see a URL that looks something along the lines of jupyterhub.cs.uga.edu/user/<MyID>/tree, and a screen that looks like this:

2: Change your JupyterHub password.

JupyterHub performs user authentication based on the accounts of the users that exist quite literally on the system running the JupyterHub server. So in order to change your password, we'll have to delve ever-so-slightly into the world of command prompts. I promise it'll be easy!

On the far right-side of the main JupyterHub landing screen, click the dropdown menu that says "New", and select "Terminal". You should be directed to a screen that looks like this:

This is a shell that runs on top of the core operating system. To change your password, we have to enter one simple command: passwd

Press Enter. You'll be asked to type in your existing password (again, your MyID), then you'll be prompted to enter an entirely new password and confirm it.

That's it! You should have a new password now. You can test it's working by clicking the "Control Panel" button in the top right corner, then "Logout". Try logging back in again, this time with your MyID and new password.

This has no bearing on your UGA MyID account. It's a completely separate system. It just happens to use your MyID as your username, so you don't have to remember something different. If you want to set your password to be the same as your UGA MyID password, be my guest.

3: Assignments

Here's the bread and butter of the course: you'll be doing your assignments in Jupyter notebooks.

To demonstrate, I've uploaded a tiny "Assignment 0" (part of this, you could say) to JupyterHub. Go ahead and log in, and we'll go through step-by-step how to use it.

After you've logged in, notice the "Assignments" tab. Click it. You should see a screen that looks like this:

You should see "A0" listed among the "Released" assignments; the other two rows will be empty.

First thing you do when attempting to complete an assignment--fetch it! This does what it sounds like: it "fetches" the assignment from the instructor and provides you with your own copy of it. Go ahead and click it!

Once you've fetched the assignment, it will show up in the "Files" tab: a folder named "A0". Inside, there should be a notebook named "TestA0.ipynb." Go ahead and click on it!

The assignment itself is very short; just a way to introduce how assignments may go. In fact, given how the autograder is set up--you can see its exact tests in the bottom cell of the notebook!--unless you hack the ever-loving daylights out of the problem, it's pretty much impossible to fail!

See this line?

Copy it, and paste it on top of the line that reads raise NotImplementedError. As in, delete that line entirely and replace it with the line above.

Now, put something in between the two quotation marks. Anything!

Got it? Sweet!
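For illustration only, here is a minimal sketch of what this kind of edit looks like in general. The function names and the returned text are hypothetical and are not copied from the A0 notebook; the sketch also assumes the replacement line returns a string. Only the raise NotImplementedError placeholder comes from the instructions above.

```python
# Hypothetical example; the real code in TestA0.ipynb may look different.

def my_answer_before():
    # This is the placeholder line you are asked to delete.
    raise NotImplementedError


def my_answer_after():
    # The placeholder has been replaced with a line containing a string.
    # Any text between the quotation marks will do.
    return "anything you like"


# The unedited version raises an error as soon as it is called...
try:
    my_answer_before()
    placeholder_raised = False
except NotImplementedError:
    placeholder_raised = True

# ...while the edited version hands back the string you typed.
result = my_answer_after()
print(placeholder_raised, result)
```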

Now, go to "File" -> "Save and Checkpoint". That will save your code. Lastly, go to "File" -> "Close and halt". That will shut down your notebook.

With that done, all that's left is to submit your code. Go back to the "Assignments" tab. Notice that, in the "Downloaded assignments" tab, there's a little arrow next to "A0". Click it, and a drop-down will appear with a handy "Validate" button on the far right. This button is awesome--it runs all the autograder tests before you even submit! So you automatically know what [autograded] questions work, and which don't. Of course, there can (and will!) still be manually-graded questions, but it should give you a good idea how you're doing.
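As a rough mental model, and only a sketch: autograder tests of this kind are typically ordinary assert statements, and validating amounts to running each one and reporting which fail. The run_checks helper and the sample checks below are hypothetical; they are not the course's actual tests.

```python
# Hypothetical sketch of "run every test, report the failures".

def run_checks(checks):
    """Run each (name, function) pair and return the names that failed."""
    failed = []
    for name, check in checks:
        try:
            check()
        except AssertionError:
            failed.append(name)
    return failed


answer = "hello"  # stands in for whatever you typed between the quotes


def check_is_string():
    assert isinstance(answer, str)


def check_not_empty():
    assert len(answer) > 0


failed = run_checks([("is a string", check_is_string),
                     ("not empty", check_not_empty)])
print("All tests passed!" if not failed else "Failed: " + ", ".join(failed))
```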

Go ahead and click validate. Provided you followed the directions, you should see this popup:

All tests passed, well done! All that's left is to send the assignment back to me so I can officially record your grade. Hit the blue "Submit" button on the right. Once it goes through, notice the timestamp that shows up:

This is the server's officially-recorded time of completion for your assignment. You can submit as many times as you want! Only the last submission will ever be graded.

This step is crucial. If you don't submit, I will have no way of knowing that you completed your assignment.

4: Python Shell

We interacted briefly with the operating system shell when we changed your JupyterHub password. We can also go into a Python shell. This is a really nice way of testing out code and immediately seeing results; a great tool for one-liners that you'd like some immediate feedback on.

No coding just yet! This is just to show you how to access a Python environment, should you choose not to install Python on your local machine.

Fire up an operating system Terminal again: from the main screen, go to the right-hand side, click the "New" dropdown, and select "Terminal".

In the terminal, type "python". The Python shell should start, and you should see something like this:

This is a Python environment! You can type and run Python code here, and it will be immediately executed and evaluated. We won't do any of that now; for now, just type exit() to drop out of the Python shell and back to the operating system.

5: Creating Jupyter notebooks

You can create entirely new Jupyter notebooks while in JupyterHub. This is one possible way you can test out Python code without actually installing Python on your local machine. Of course, it requires an internet connection.

It's easy enough: go to the "New" dropdown at the far right, and select "Python 3".

That's it! A new notebook should spin up using a Python kernel. Don't worry yet about doing any coding, but so you know how it works: all the code goes in cells. You can add new cells using the "+" button on the left side of the menubar.

You can shuffle existing cells around using the up and down arrows in the same menu. After you've made changes to a cell, you can execute and run it and see its output. To do this, make sure it's active, and click the button that looks like a "Play" button in the top menu bar (near the middle).
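If you'd like something harmless to try the "Play" button on, a cell like the following is enough. It is only an illustrative sketch; the variable name greeting is made up.

```python
# A tiny cell to practice running code; nothing here is graded.
greeting = "Hello, JupyterHub!"
print(greeting)

# In a notebook, the value of the last expression in a cell is also
# displayed as the cell's output, with no print() needed.
len(greeting)
```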


Q2: Executing Jupyter notebooks

For yesterday's Lecture 1 slides, you viewed a static rendering of the Jupyter notebook. The raw notebook itself is a JSON file; not something we really want to have to mess with ourselves. However, the whole point of these fancy Jupyter notebooks is that they're supposed to allow interactivity. What's the point if, in the end, they're just glorified HTML?
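To make "a JSON file" concrete, here is a minimal sketch that builds a tiny notebook-shaped structure and reads it back with Python's standard json module. The cell contents are made up for illustration; the top-level keys (cells, metadata, nbformat, nbformat_minor) follow the version 4 notebook format.

```python
import json

# A made-up, two-cell notebook in the version 4 layout.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Lecture 1"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

# This text is what would be stored inside an .ipynb file.
raw_text = json.dumps(notebook, indent=1)

# Reading it back gives ordinary Python dictionaries and lists.
loaded = json.loads(raw_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```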

There are a couple ways we can interactively view the notebook files. In doing so, you have full access to the source of the notebooks. You can change them, edit them, add to them, subtract from them; they're designed this way so, when the lectures start containing Python code, you can run these notebooks interactively, change the code yourself, and re-run it to see what the changes do.

For these notebooks without any real code...well, I suppose you could change the image paths to point to lolcats instead?

There are two ways of running the Jupyter notebooks interactively. I'll walk through them both.

1: mybinder

mybinder is an awesome service set up by some of the researchers at HHMI Janelia Farms for the express purpose of running Jupyter notebooks interactively. Their servers are public and completely free; they're run as a service to the scientific and educational communities.

I've already set up the notebooks to be viewed through mybinder, so all you have to do is navigate to the course's GitHub repository:

https://github.com/eds-uga/csci1360e-su16

and click the little mybinder button about 2/3 of the way down the page:

After you click the link, be patient. The Janelia Farms servers will be spinning up virtual machines to host the Jupyter notebook for you; this can take some time, considering their resources aren't unlimited and this service is used by a lot of people.

Eventually, you should see an interface pretty much identical to the one you saw in JupyterHub earlier, except with some stuff in it! Click the "lectures" folder, and then click on "L1 - What is data science?.ipynb" notebook.

In just a second, you should see a somewhat familiar sight:

I say "somewhat", because it looks a tiny bit different. That's the interactivity talking! If you double-click on any part of the notebook, the cell will activate, and you'll be given access to the raw content. For Lecture 1 it's pretty boring; all the content is Markdown, a lightweight mark-up language that is rendered as HTML. You're welcome to change it, tweak it, and add/remove things! After you've made changes, simply make sure the cell you want to re-execute is active, and click the "Play" button.

2: Locally-hosted using Jupyter

mybinder is a really nice service, but it requires an internet connection. That's not always a given. If you want to interact with the notebook content, test out some code, and generally push things around but know you won't have an internet connection, it's possible to host the notebooks locally.

This requires installing Python. Before proceeding any further, go to Q3 at the end for instructions on how to install Python on your local machine.

If you are only interested in STATIC downloads, such as getting a PDF version of the lecture slides, skip ahead to part 3 below.

If you're still reading, I'll assume you've already installed Python on your machine and tested to make sure it works.

First, we'll download the GitHub repository where the notebooks are stored. Go to the GitHub repository: https://github.com/eds-uga/csci1360e-su16.

On the main page, click the green button on the right that says "Clone or download", and choose "Download ZIP". This will give you a ZIP archive of the entire repository. Extract the archive.
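If you prefer to extract the archive with Python rather than your file manager, the standard zipfile module can do it. This is a sketch: the helper name extract_archive, the output folder name, and the file name csci1360e-su16-master.zip are assumptions, so substitute whatever your browser actually saved.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, destination):
    """Unpack zip_path into destination and return the extracted names."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return archive.namelist()


if __name__ == "__main__":
    # Assumed file name; adjust it to match your download.
    archive_path = Path("csci1360e-su16-master.zip")
    if archive_path.exists():
        names = extract_archive(archive_path, "csci1360e")
        print("Extracted {} entries".format(len(names)))
    else:
        print("{} not found; check your downloads folder".format(archive_path))
```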

Once it's downloaded and extracted, open up a command prompt / Terminal window. In the command prompt, navigate to the folder you just downloaded and extracted. This is usually done with the command cd <foldername>.

When you're in the directory with the *.ipynb lecture files (you can test by typing "ls" if you're using Linux or OS X, or "dir" if you're using Windows; in either case, running that command should output a list of files in the current directory), run the following command to start up Jupyter:

jupyter notebook

This will spin up a Jupyter server and open up a browser tab in the folder where you issued the command. You should see a window exactly like the one you saw in JupyterHub, this time with the lecture notebooks visible. Simply click on one of the notebooks, and you're up and running!
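As a cross-platform alternative to "ls" or "dir" for checking that you're in the right folder, the following sketch lists the notebook files in a directory using only the standard library. The helper name find_notebooks is hypothetical.

```python
from pathlib import Path


def find_notebooks(folder="."):
    """Return the sorted names of the .ipynb files directly inside folder."""
    return sorted(path.name for path in Path(folder).glob("*.ipynb"))


if __name__ == "__main__":
    notebooks = find_notebooks()
    if notebooks:
        print("Found notebooks:")
        for name in notebooks:
            print("  " + name)
    else:
        print("No .ipynb files here; you may need to cd into another folder.")
```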

3: Static downloads

When you're viewing the notebooks interactively, such as through mybinder or locally using Jupyter, you have the option to export them in a variety of different formats. If you want offline access to the lecture slides but don't care about interactivity, then you can follow these steps:

  1. Start up a notebook using mybinder (as in part 1) or locally-hosted Jupyter (as in part 2).
  2. Go to "File" -> "Download as".
  3. Select your favorite format! I would recommend PDF, or failing that, HTML.

This will provide static renderings of the notebooks for you to view at your leisure. They're completely inert, but if you just want offline access to the content, this is the easiest way.

Q3: Installing Python locally [OPTIONAL!!!]

Between mybinder and JupyterHub, you have plenty of online resources at your disposal for running very nice Python environments and testing out small snippets of code. But you don't want "nice." You want the real deal. It's certainly more than possible to ace this course without ever installing Python on your local machine, but just in case you really want to give it a shot...

It's pretty easy, actually. Well, if you wanted to install base Python, that can be a royal headache; different operating systems (Windows vs Unix), different versions (Python 2 vs Python 3), and how to get that pesky $PATH variable to cooperate? Fortunately, my recommendation is not to use base Python.

Instead, we'll use a Python distribution: Anaconda, to be precise. Anaconda is basically a fully-functional, prepackaged Python distribution that is ready-built for any of the three major operating systems: Windows, OS X, and Linux. It comes with a ton of very useful third-party packages, so (at least in this class) we won't even need to worry about installing anything beyond Anaconda itself. It also comes with a couple of package managers, so even if you wanted or needed extra packages, the tools are already provided. Pretty sweet deal!

MAKE SURE YOU DOWNLOAD ANACONDA FOR PYTHON 3.

It should be pretty much as easy as clicking the download link and downloading / installing the version that corresponds to your operating system. Once the install is finished, let's test that it worked.

Fire up a command prompt / Terminal window. Type python.

If all is well, you should see exactly what we saw in the JupyterHub terminal.
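For a slightly more explicit check, the short sketch below confirms that the interpreter you launched is Python 3 and shows where it lives on disk, which helps when $PATH points at a different installation than you expected. The variable names are illustrative. You can type these lines into the Python shell you just opened.

```python
import sys

# sys.version_info holds the interpreter's version as numbers.
major = sys.version_info.major
minor = sys.version_info.minor

# sys.executable is the full path of the running interpreter; for an
# Anaconda install it would normally sit inside the Anaconda folder.
print("Running Python {}.{} from {}".format(major, minor, sys.executable))

if major < 3:
    print("This is Python 2; install the Python 3 version of Anaconda.")
```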

Notes

No need to submit this homework through JupyterHub. This doesn't have any autograded (or graded in any way, for that matter) sections.

Questions? Concerns? Complaints? Criticisms?

  • If they're topical, related to what the homework discusses, post about them in #csci1360e-discussions.
  • If they're technical--that blasted $PATH isn't cooperating--post about them in #techprobs.
  • If you discovered some cool nuance like "accidentally" inventing SkyNet, post about it in #random.
  • If you have something to tell me but want to tell me 1-on-1, you can click my name in the Slack group and send me a direct message.