"You should have a strong command of at least one toolset that (a) allows for filtering, joining, pivoting, and aggregating tabular data, and (b) enables reproducible workflows." -- Buzzfeed job posting, 2017
"As a general rule, all assertions in a story based on data analysis should be reproducible. The methodology description in the story or accompanying materials should provide a road map to replicate the analysis." -- The Associated Press Stylebook
Trust in media is low and declining. The reasons for this are myriad, and we can control only what we can control. One thing we can do -- be more transparent about what we do. And this notion goes beyond just journalism -- why should anyone trust what you have to say if you can't show your work?
Jupyter Notebooks are good at this because they let you mix code and text. But your notebooks are currently visible only to you. So we're going to work on improving your notebooks with Markdown and sharing them with GitHub.
Markdown is what you are writing in when you aren't writing code in Jupyter Notebooks. It's very simple, and there's only a finite number of things you can do with it, but you can drastically improve your notebooks with some simple typographic tricks. Here's a partial listing of what you can do in Markdown that might be useful in notebooks.
# h1
## h2
### h3
#### h4
##### h5
Each added `#` steps the header down one level, so h1 renders largest and h5 smallest of the five.
To get a block quote, add `>` at the beginning of the line.
It looks like:
> This is a block quote
You can use a horizontal rule to separate content -- a thematic break.
In Jupyter notebooks, you create a horizontal rule with three dashes on a line of their own: `---`
It renders as a thin line across the width of the cell.
You can bold text, italicize text, even strikethrough text with `**bold**`, `_italicize_` and `~~strikethrough~~`.
You can create bulleted or numbered lists like this:
* Bullet 1
* Bullet 2
* Bullet 3
1. Numbered list 1
2. Numbered list 2
3. Numbered list 3
The first renders as a bulleted list and the second as a numbered list.
You can add a link like this: `[text to be linked](http://website.com)`
It looks like this: [text to be linked](http://website.com)
Tables are good at showing tabular data. Sounds basic, but people seem to forget tables when there are so many good data visualization options out there. Tables look like this:
| FieldName1 | FieldName2 |
| ---------- | ---------- |
| foo | bar |
| baz | bing |
| boo | buzz |
And that looks like:
| FieldName1 | FieldName2 |
| --- | --- |
| foo | bar |
| baz | bing |
| boo | buzz |
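Typing pipes by hand gets tedious once a table has more than a few rows. Here is a minimal Python sketch of one way to generate the same syntax from a list of rows. The function name `markdown_table` is made up for this illustration and is not part of Jupyter or any library.

```python
def markdown_table(headers, rows):
    """Return a Markdown table as a single string.

    headers: list of column names
    rows: list of rows, each a list with one value per column
    """
    # Header row, then the separator row Markdown requires under it.
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    # One line per data row; str() lets numbers pass through as text.
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


table = markdown_table(
    ["FieldName1", "FieldName2"],
    [["foo", "bar"], ["baz", "bing"], ["boo", "buzz"]],
)
print(table)
```

Paste the printed text into a Markdown cell and it renders as the table shown above.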
To use images in your notebook, put them in the same folder as the notebook and refer to them by file name.
To embed an image, the syntax looks like this: `![Dog](dog.jpg)`
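A broken image usually means the file is not where the Markdown says it is. As a rough sketch, assuming your images are local files, this Python snippet pulls the paths out of some Markdown text and lists the ones that are missing from a folder. The helper `missing_images` is a hypothetical name, and the temporary folder with an empty `dog.jpg` exists only to make the example runnable.

```python
import re
import tempfile
from pathlib import Path

# Matches Markdown image syntax: ![alt text](path)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(markdown_text, notebook_folder):
    """Return image paths used in markdown_text that are not files in notebook_folder."""
    folder = Path(notebook_folder)
    missing = []
    for path in IMAGE_PATTERN.findall(markdown_text):
        # Images loaded from the web are not local files, so skip them.
        if path.startswith(("http://", "https://")):
            continue
        if not (folder / path).is_file():
            missing.append(path)
    return missing


# Demo: a throwaway folder that holds dog.jpg but not cat.jpg.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "dog.jpg").write_bytes(b"")  # empty stand-in for a real image
    text = "![Dog](dog.jpg) and ![Cat](cat.jpg)"
    result = missing_images(text, folder)

print(result)  # prints ['cat.jpg']
```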
GitHub is a social code sharing website used by millions of developers around the world. It's a place for people to put their code so others can see it, be inspired by it, even participate in it. Other developers can make a copy of your software, improve it and give that back to you.
It's also an ideal place to store your notebooks to foster transparency. With some simple tools, you can publish your notebooks next to your stories so readers who want to know more can see how you did what you did.
You get transparency and replicability in one swoop.
First things first: Create an account.
On GitHub you create repositories of code. You will have a local copy on your computer and a copy on GitHub. You keep them in sync with commits, pushing code up to GitHub or pulling it down from GitHub, depending on which way you need to move code.
Let's make your first repository, just to test it out. First click on the green New Repository button. Now we need to give our repository a name and a description, and initialize it with a README file.
For most people, the easiest way to work with GitHub is through its desktop application, GitHub Desktop, which you can download from GitHub's website.
Log into your account via the desktop app. What we first need to do is clone our repository to our local machine.
Once logged in, click the plus button in the top right corner and then click Clone.
Click on your repository from the list and then click Clone your project. Tell GitHub where to clone it -- this is up to you, but make it somewhere you can find it again and do not move it.
Now that we have a clone of it, let's edit the README file. Let's add this sentence: "I am learning about GitHub."
Save the file and go back to GitHub Desktop. You should see you have 1 uncommitted change.
Click that. You are now going to create a commit message, which is like a note to yourself as to what this change is. In this case, we edited README, so add that as the summary and click Commit to Master, which is what you are doing. You have a master branch of your code. If, later, you wanted to try something new but didn't want to mess with your existing code, you could create a branch off of master, work there, and if it worked you could merge it back into master. But that's a topic for another day.
Committing to master does not send anything to GitHub. That happens when you hit the Sync button in the top right. Sync is the push and pull part of GitHub: the desktop app does both at once, while on the command line they are separate commands.
With a repository set up like this, you can add your Jupyter Notebooks and other files to the folder and commit them. GitHub will render a notebook as HTML in the browser, which is what makes it an ideal place to publish your work.
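For the curious, here is roughly what commit and sync look like as separate command-line steps, written as a small Python sketch so you can read the commands without running them. The function names and the `README.md` file name are assumptions for illustration, and the order shown (pull, then push) is an approximation of what the desktop app does, not a description of its internals. By default the sketch only prints the commands; set `dry_run=False` inside a cloned repository to run them.

```python
import shlex
import subprocess


def sync_commands(message, files=("README.md",)):
    """Return the git commands, in order, that commit files and sync with GitHub."""
    return [
        ["git", "add", *files],            # stage the changed files
        ["git", "commit", "-m", message],  # record them with a commit message
        ["git", "pull"],                   # bring down changes from GitHub
        ["git", "push"],                   # send your commits up to GitHub
    ]


def sync(repo_folder, message, files=("README.md",), dry_run=True):
    """Print each command; run it inside repo_folder only when dry_run is False."""
    for command in sync_commands(message, files):
        print(shlex.join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, cwd=repo_folder, check=True)


# Dry run: prints the four commands and sends nothing anywhere.
sync(".", "Edited README")
```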
This is how you are going to publish your first story. You are going to combine your code, graphics and text into a single notebook to tell your story. Your notebook should ONLY be the code needed to tell the story -- your scratch work or errors should be in a separate file. You will use Markdown to give it a headline, byline and add your story text between your graphics. You will embed your finished graphics -- whether you do them in ggplot or fix them up in Illustrator is up to you -- in the notebook. When it's done, it should be ready to publish in a particularly nerdy publication that likes R code mixed in with stories.
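One way to catch leftover errors before you publish is to look inside the notebook file itself. A notebook is stored as JSON, with a list of cells, and a code cell that failed has a saved output of type `error`. The Python sketch below flags those cells. The function names are made up for this example, and the tiny hand-built notebook stands in for a real file so the code runs on its own.

```python
import json


def load_notebook(path):
    """Read a .ipynb file from disk and return its contents as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cells_with_errors(notebook):
    """Return the positions (counting from 0) of code cells with a saved error output."""
    flagged = []
    for position, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no outputs, so only code cells are checked.
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        if any(output.get("output_type") == "error" for output in outputs):
            flagged.append(position)
    return flagged


# A hand-made stand-in for a real notebook: one Markdown cell, one code cell
# that ran cleanly and one code cell that ended in an error.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Headline"]},
        {"cell_type": "code", "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result"}]},
        {"cell_type": "code", "source": ["oops"],
         "outputs": [{"output_type": "error"}]},
    ]
}

print(cells_with_errors(example))  # prints [2]
```

With a real file you would call `cells_with_errors(load_notebook("story.ipynb"))`, where `story.ipynb` is a placeholder for your own notebook's name.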
You do not need to turn in anything for this assignment, but for your first major assignment, you will turn in the GitHub URL for your project. It will look something like this: https://github.com/mattwaite/JOUR491-Data-Visualization/blob/master/Assignments/11_FinishingTouches/FinishingTouches.ipynb