Publication & Sharing

Exporting your Notebook

Jupyter is able to export and convert your notebook to multiple formats. There are two ways you can achieve this. The first is from the command line interface and the second from the web page of the notebook.

Exercise -- Exporting (5 min)

Here we'll export one of the notebooks that you used today. So decide which notebook you'll use and then:

  • Download the notebook in HTML format from your browser
  • Open a new terminal window to your server (from the Jupyter Notebook) and export the notebook using the command line
  • You should be able to see the file in your directory

In [ ]:
!ls ~/reproducible_research

GitHub Pages

GitHub can render the repositories it hosts as static websites, a feature called GitHub Pages. A short guide can be found here. Detailed documentation can be found here.

There are two types of pages:

  • User Pages
    • Only one page can exist per user
    • Their URL is: https://<username>.github.io
  • Repository Pages
    • You can have one page per repository
    • They are accessible through: https://<username>.github.io/<repository_name>
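
The two URL patterns above can be sketched as a small helper. This is only an illustration: the function name `pages_url` and the example username and repository name are hypothetical, not part of any GitHub tooling.

```python
def pages_url(username, repository=None):
    """Return the GitHub Pages URL for a user page or a repository page."""
    base = f"https://{username}.github.io"
    if repository is None:
        # User page: one per user, served from the root of the domain.
        return base
    # Repository page: one per repository, served under its name.
    return f"{base}/{repository}"


# "octocat" and "my_project" are placeholder names for illustration only.
print(pages_url("octocat"))
print(pages_url("octocat", "my_project"))
```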

Exercise -- Enabling GitHub Pages for our repository (15 min)

  • Follow the above guides and enable GitHub Pages for your project repository
  • Choose a theme of your choice and make sure that the website is accessible
  • The "homepage" of your page should be your README file.

In the README file of your repository, add an entry with a link to the HTML report of your notebook. Make sure that it shows up on your website.

Add a Licence to your repository (10 min)

As already mentioned, your data and analysis should contain a licence. In this exercise you'll get familiar with some basic licences and apply one to your project.

  • To get familiar with the available licences, visit Choose a License and find a licence that works for you
  • Add a licence to your repository. Some instructions can be found here

Exercise -- Making your code citable (15 min)

You should have already heard about Digital Object Identifiers (DOIs). In this exercise, we will see how we can add a DOI, provided by Zenodo, to our repository.

  • You need to follow the instructions from here

  • After this task, you should have a badge shown both on your website and on the main page of your repository

More Documentation

Exporting your Notebooks

Introduction

The native Jupyter Notebook format is not (yet?) among those accepted by scholarly publishers. Nor do web browsers know how to render it. Hence, when it comes time to publish your Notebooks, whether as part of a scholarly publication or simply to the web, they need to be exported to a suitable output format. This is what the jupyter nbconvert command does.

In this lesson, we will look at several formats relevant to scholarly publishing and publishing to the web, and we will learn how to export a notebook to such formats.

Output formats

Scholarly publishers typically accept one or several Word-processing oriented formats, which are often binary or at least not meant for human consumption. For sharing or publishing data and analysis documentation, a text format that is easy to read and doesn't require special-purpose software is usually best. How do you determine the best format(s) to export your Notebook to?

Below we briefly describe a few formats that are widely used in scholarly communication.

PDF

The PDF format is primarily used for paper-based output. It contains information about the paper size and the margins. Most web browsers know how to display it, but posting it to the web is somewhat like posting a raster-graphics image - it is not meant to be built upon or modified.

You would use the PDF format if you were interested in printing a copy of your notebook (for filing a paper copy, or hand-writing comments), or for sending to co-authors for reading and commenting (which will likely require tools such as Adobe Acrobat).

HTML

HTML is the native format for the web. It does not usually contain information about printing on paper. Exporting to this format is a popular means of posting content to the web so that browsers can render it in the best way suitable for the device they are running on. (HTML is also a format rich in metadata and context information for search engines.)

If you have a website that you manage, publishing to HTML can make it easy to add your notebook as a page to a website.

Markdown

Markdown is a plain-text format that was designed to both be human-readable without any special-purpose rendering tools, and to be easily exported to HTML.

LaTeX

LaTeX is a plain-text format designed for authoring documents that will subsequently be typeset. It is widely used in publishing, and often accepted as a manuscript submission format, especially in fields that routinely need extensive mathematical typesetting capabilities. For publishing, sharing, and reading, LaTeX is usually compiled to PDF. Depending on your field, your co-authors may be more comfortable editing LaTeX files than Notebook files.

Exporting a Notebook

Notebooks can be exported through the web-based user interface, or from the command line. The web-based interface in essence runs the same command as one would on the command line, and hence has the same installation dependencies.

Dependencies

Exporting to LaTeX format will require Pandoc to be installed. Exporting to PDF works through generating LaTeX first (and has those dependencies as well), and then needs a TeX installation to generate PDF.
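
Before attempting a LaTeX or PDF export, it can help to check whether these tools are on your PATH. The following is a minimal sketch using only the Python standard library. The helper name `check_export_dependencies` is hypothetical, and the assumption that the TeX engine is invoked as `xelatex` may not match your installation, so adjust the tool names as needed.

```python
import shutil


def check_export_dependencies(tools=("pandoc", "xelatex")):
    """Map each tool name to True if an executable of that name is on PATH.

    The default tool names are assumptions; pass your own tuple if your
    TeX installation uses a different engine.
    """
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in check_export_dependencies().items():
    print(f"{tool}: {'found' if found else 'missing'}")
```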

From the command line

The command is jupyter nbconvert, followed by the notebook to convert, the destination format (option --to <format>), and the output filename (option --output <filename>):

$ jupyter nbconvert my_notebook.ipynb --to markdown --output output.md

Or to HTML format:

$ jupyter nbconvert my_notebook.ipynb --to html --output output.html

The default HTML template (full) includes headers and everything needed to form a complete HTML document. If you wanted to embed the resulting HTML as a fragment into, say, a blog post, use the basic template:

$ jupyter nbconvert my_notebook.ipynb --to html --template basic --output output.html

Full documentation of the NBConvert tool is available online.
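
If you would rather drive these conversions from a Python script (for example, to export several notebooks in a loop), one option is to assemble the same command line and hand it to `subprocess`. This is a sketch under stated assumptions: the helper names `nbconvert_command` and `convert` are hypothetical, and only the options shown in the commands above are used.

```python
import subprocess


def nbconvert_command(notebook, to, output=None, template=None):
    """Build the argument list for a `jupyter nbconvert` call."""
    cmd = ["jupyter", "nbconvert", notebook, "--to", to]
    if template is not None:
        cmd += ["--template", template]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def convert(notebook, to, output=None, template=None):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, to, output, template), check=True)


# Show the command that would be run, without executing it.
print(" ".join(nbconvert_command("my_notebook.ipynb", "html", output="output.html")))
# prints: jupyter nbconvert my_notebook.ipynb --to html --output output.html
```

Calling `convert("my_notebook.ipynb", "markdown", output="output.md")` would then perform the export, provided Jupyter is installed and the notebook exists.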

From the web-based user interface

In the File->Download As menu, click the desired format. The conversion result will download to your computer.

Using nbconvert to execute or extract code

The jupyter nbconvert command-line tool can be used to execute a notebook in whole and capture the result, by "converting" to the notebook format:

$ jupyter nbconvert --to notebook --execute my_notebook.ipynb

This generates my_notebook.nbconvert.ipynb, a new notebook that is the same as the source notebook but with all the output from code cells captured.

You can also extract the code from a notebook into an executable script, e.g., for an IPython notebook, extract the Python code cells into a Python script:

$ jupyter nbconvert --to script my_notebook.ipynb
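
To make the idea of extracting code concrete: a notebook file is a JSON document whose cells each carry a type and their source text. The sketch below collects the source of the code cells using only the standard library. It is a simplified illustration, not how nbconvert itself is implemented; the helper names are hypothetical, and it ignores details such as cell magics.

```python
import json


def load_notebook(path):
    """Parse a .ipynb file into a plain Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_code(notebook):
    """Return the source of all code cells, in order, separated by blank lines."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks)


# A tiny in-memory notebook standing in for load_notebook("my_notebook.ipynb").
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "y = x + 1"},
    ],
}
print(extract_code(example))
```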

Identifiers and Licensing for Research Products

Stable, globally unique, and resolvable identifiers for research products

Why archives for research products, and why use them

A not so uncommon story: You’re a graduate student reading a paper on which you want to base your analysis approach, and you therefore need to verify and reproduce the analysis. The paper gives the lab's website as the link for obtaining the code. However, it turns out the researcher has since left that university, and their new lab's website no longer has a link to that code. After several weeks of silence the author responds to your email saying they will try and find the code, but they're working on a different project now. That was a month ago.

Lab websites aren't archives. Doing archiving well is non-trivial, and likely isn't your line of research. Use an archive that specializes in doing well what you need from an archive.

There are many archives, for all imaginable purposes and domains. In fact, there are so many that there is re3data, a registry of repositories that allows browsing them by various attributes.

Why globally unique resolvable identifiers for non-paper research products?

One of the key benefits of using an archive is that nearly all of them will assign a globally unique resolvable identifier to deposits. This is because deposit identifiers benefit their users - both depositors, and those reusing deposits:

  • Ability to identify and cite the deposit in a manuscript and a CV
  • Ability to track views, downloads, or more generally impact
  • Ability to identify exactly what record, and which version of it was (re)used

Why DOIs

DOIs (digital object identifiers) are only one type of unique identifier, but they are the most frequently used type in scholarly communication, and for identifying research products. Some of their benefits include:

  • They separate the content from who hosts the content.
  • They cannot be minted ad hoc, and instead require interacting with a registration agency, which typically needs to be paid a fee. This fosters metadata quality, and assigns clear responsibility for maintaining the DOI's continued resolution.
  • Publishers, and the publishing industry, know how to deal with them.
  • Practically every scholar knows how to deal with them.

While DOIs on the surface all look the same, some expectations for their associated metadata (and programmable APIs) differ based on the issuing DOI registrar (often referred to as "type of DOI"). In scholarly publishing and communication, the most frequently encountered DOI registrars are CrossRef (issues almost all scientific paper DOIs, works with publishers) and DataCite. The latter is used for all kinds of "other" research products, including data, software, source code, and preprints.
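
As an illustration of what "resolvable" means in practice, a DOI can be turned into a web address by prefixing it with the doi.org resolver, regardless of which registrar issued it. The helper below is a minimal sketch with a hypothetical name, `doi_to_url`. It only strips a few common prefixes and checks that the identifier starts with "10.", and it does not validate that the DOI actually exists.

```python
def doi_to_url(doi):
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = doi.strip()
    # Accept a few common ways of writing a DOI.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Note: DOIs with unusual characters may need percent-encoding,
    # which this sketch does not attempt.
    return "https://doi.org/" + doi


# The DOI of the Bourne and Fink (2008) article cited in this lesson.
print(doi_to_url("doi:10.1371/journal.pcbi.1000247"))
# prints: https://doi.org/10.1371/journal.pcbi.1000247
```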

Licensing and Terms of Reuse

Upon publishing, different research products have different needs, and different eligibility for licensing. Determining an appropriate license should be an informed decision, and can be further complicated if multiple institutions with different intellectual property policies contributed to the products in a manner that can't be easily disentangled. Also, intellectual property and copyright laws differ across countries.

Why license in the first place?

In many jurisdictions (including the US) intellectual property rights vest in the author of a creative work whether they assert them or not. Also, in most jurisdictions (including the US) the rights one has to work copyrighted by someone else are limited to fair use (and what one believes is fair use versus what a court will say is not necessarily the same thing).

Hence, if you make public a work eligible for copyright protection yet don't say anything about terms of reuse, nobody has any rights to it beyond fair use. If you reuse work published in this way yourself, you risk that at any point the author will claim their right and ask to be compensated unless you cease to use the work immediately. Do you really want to base your research success on such a risk? If not, why do you expect anyone else to?

By publishing a research product, a scholar usually intends to benefit by allowing the product to have a wider impact. Not stating any license or terms of reuse effectively contradicts that intent.

ORCIDs

"I Am Not a Scientist, I Am a Number" Bourne PE, Fink JL (2008) I Am Not a Scientist, I Am a Number. PLoS Comput Biol 4(12): e1000247. doi:10.1371/journal.pcbi.1000247

Aggregating research products and their uptake in the scientific community as well as the public is very difficult without identifiers. Most authors of and contributors to research do not have distinctive names. If it's too difficult to aggregate someone's research outputs and their impact, then most research output will not be taken into account for assessment.

Enter ORCID, the Open Researcher and Contributor ID. ORCID allows you to create and maintain a fairly comprehensive biographic, grant support, and publication profile. Funders, institutions, and publishers are increasingly adopting it. (At least in the sense of allowing you to record your ORCID; using it for features that convey tangible benefits to you is still in its infancy.)