Jupyter is able to export and convert your notebook to multiple formats. There are two ways you can achieve this. The first is from the command line interface and the second from the web page of the notebook.
Here we'll export one of the notebooks that you used today. So decide which notebook you'll use and then:
!ls ~/reproducible_research
GitHub can render the repositories it hosts as static websites. A short guide can be found here. Detailed documentation can be found here.
There are two types of pages:
https://<username>.github.io
https://<username>.github.io/<repository_name>
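As a rough illustration of the two address patterns above, the following Python sketch builds either URL from its parts. The function name `pages_url` and the account and repository names are hypothetical, made up for this example.

```python
def pages_url(username, repository_name=None):
    """Return the site address for a user page or a repository page."""
    base = f"https://{username}.github.io"
    if repository_name is None:
        # First pattern: one site per user account
        return base
    # Second pattern: one site per repository
    return f"{base}/{repository_name}"


print(pages_url("jane-doe"))
print(pages_url("jane-doe", "reproducible_research"))
```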
As already mentioned, your data and analysis should contain a licence. In this exercise you'll get familiar with some basic licences and apply one to your project.
You should have already heard about Digital Object Identifiers (DOIs). In this exercise, we will see how we can add a DOI, provided by Zenodo, to our repository.
You need to follow the instructions from here.
After this task, you should have a badge shown both on your website and on the main page of your repo.
The native Jupyter Notebook format is not (yet?) among those accepted by scholarly publishers. Nor do web browsers know how to render it. Hence, when it comes time to publish your Notebooks, whether as part of a scholarly publication or simply to the web, they need to be exported to a suitable output format. This is what the jupyter nbconvert command does.
In this lesson, we will look at several formats relevant to scholarly publishing and publishing to the web, and we will learn how to export a notebook to such formats.
Scholarly publishers typically accept one or several Word-processing oriented formats, which are often binary or at least not meant for human consumption. For sharing or publishing data and analysis documentation, a text format that is easy to read and doesn't require special-purpose software is usually best. How do you determine the best format(s) to export your Notebook to?
Below we briefly describe a few formats that are widely used in scholarly communication.
The PDF format is primarily used for paper-based output. It contains information about the paper size and the margins. Most web browsers know how to display it, but posting it to the web is somewhat like posting a raster-graphics image: it is not meant to be built upon or modified.
You would use the PDF format if you were interested in printing a copy of your notebook (for filing a paper copy, or hand-writing comments), or for sending to co-authors for reading and commenting (which will likely require tools such as Adobe Acrobat).
HTML is the native format for the web. It does not usually contain information about printing on paper. Exporting to this format is a popular means of posting content to the web so that browsers can render it in the best way suitable for the device they are running on. (HTML is also a format rich in metadata and context information for search engines.)
If you have a website that you manage, publishing to HTML can make it easy to add your notebook as a page to a website.
Markdown is a plain-text format that was designed to both be human-readable without any special-purpose rendering tools, and to be easily exported to HTML.
LaTeX is a plain-text format designed for authoring documents that will subsequently be typeset. It is widely used in publishing, and often accepted as a manuscript submission format, especially in fields that routinely need extensive mathematical typesetting capabilities. For publishing, sharing, and reading, LaTeX is usually compiled into PDF. Depending on your field, your co-authors may be more comfortable editing LaTeX files than Notebook files.
Notebooks can be exported through the web-based user interface, or from the command line. The web-based interface in essence runs the same command as one would on the command line, and hence has the same installation dependencies.
Exporting to LaTeX format will require Pandoc to be installed. Exporting to PDF works through generating LaTeX first (and has those dependencies as well), and then needs a TeX installation to generate PDF.
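Before attempting an export, it can help to check whether these external tools are on your PATH. The sketch below is one way to do that with Python's standard library. The executable names are assumptions: `pandoc` for Pandoc and `xelatex` for the TeX engine, which may differ on your system or for your TeX distribution. The helper name `missing_tools` is made up for this example.

```python
import shutil

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = {
    "latex": ["pandoc"],
    "pdf": ["pandoc", "xelatex"],
}


def missing_tools(target_format):
    """Return the assumed tools for target_format that are not on the PATH."""
    tools = REQUIRED_TOOLS.get(target_format, [])
    return [tool for tool in tools if shutil.which(tool) is None]


for fmt in ("latex", "pdf"):
    missing = missing_tools(fmt)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{fmt}: {status}")
```

An empty list means every tool assumed for that format was found; it does not guarantee that the export itself will succeed.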
The command is jupyter nbconvert, followed by the notebook to convert, the destination format (option --to <format>) and the output filename (option --output <filename>):
$ jupyter nbconvert my_notebook.ipynb --to markdown --output output.md
Or to HTML format:
$ jupyter nbconvert my_notebook.ipynb --to html --output output.html
The default HTML template (full) includes headers and everything needed to form a complete HTML document. (In nbconvert 6 and later, the default template is lab, and the older look is available as classic.) If you wanted to embed the resulting HTML as a fragment into, say, a blog post, use the basic template:
$ jupyter nbconvert my_notebook.ipynb --to html --template basic --output output.html
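If you convert notebooks often, you may prefer to drive the same command from a script. The following is a minimal sketch, not part of nbconvert itself: the helper `nbconvert_command` is a hypothetical name, and it uses only the options shown above (--to, --template, --output). The command is run only if a `jupyter` executable and the example notebook are actually present.

```python
import os
import shlex
import shutil
import subprocess


def nbconvert_command(notebook, to, output=None, template=None):
    """Build the argument list for a jupyter nbconvert call."""
    cmd = ["jupyter", "nbconvert", notebook, "--to", to]
    if template is not None:
        cmd += ["--template", template]
    if output is not None:
        cmd += ["--output", output]
    return cmd


cmd = nbconvert_command(
    "my_notebook.ipynb", "html", output="output.html", template="basic"
)
# Show the command exactly as it would be typed in a shell
print(shlex.join(cmd))

# Run it only when jupyter is installed and the notebook exists
if shutil.which("jupyter") and os.path.exists("my_notebook.ipynb"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a notebook name contains spaces.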
Full documentation of the NBConvert tool is available online.
In the File->Download As menu, click the desired format. The conversion result will download to your computer.
nbconvert to execute or extract code
The jupyter nbconvert command-line tool can be used to execute a notebook in whole and capture the result, by "converting" to the notebook format:
$ jupyter nbconvert --to notebook --execute my_notebook.ipynb
This generates my_notebook.nbconvert.ipynb, a new notebook that is the same as the source notebook but with all the output from code cells captured.
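One way to confirm that a notebook was executed is to inspect the file directly, since a notebook is stored as JSON. The sketch below counts the code cells that carry an execution count. It assumes the version 4 notebook layout (a top-level `cells` list), and the helper name `count_executed_cells` and the hand-made demo notebook are made up for this example.

```python
import json
import os
import tempfile


def count_executed_cells(notebook_path):
    """Return (executed, total) counts of code cells in a notebook file."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    code_cells = [
        cell for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [
        cell for cell in code_cells
        if cell.get("execution_count") is not None
    ]
    return len(executed), len(code_cells)


# Hand-made two-cell notebook standing in for a real file
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print(x)"]},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ipynb")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(demo, handle)
    executed, total = count_executed_cells(path)

print(f"{executed} of {total} code cells have an execution count")
```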
You can also extract the code from a notebook into an executable script. For example, for a notebook with a Python kernel, this extracts the Python code cells into a Python script:
$ jupyter nbconvert --to script my_notebook.ipynb
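To see roughly what this extraction involves, the sketch below reads a notebook as JSON and joins the source of its code cells. This is a simplification for illustration, not how nbconvert is implemented: it assumes the version 4 notebook layout and does nothing special for cell magics or shell escapes such as lines starting with `!`. The helper name `extract_code` and the demo notebook are made up for this example.

```python
import json
import os
import tempfile


def extract_code(notebook_path):
    """Join the source of all code cells into a single script string."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            # Cell source may be stored as a list of lines
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


# Hand-made notebook with one markdown cell and two code cells
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "print(y)"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ipynb")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(demo, handle)
    script = extract_code(path)

print(script)
```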
A not so uncommon story: You’re a graduate student reading a paper on which you want to base your analysis approach, and you therefore need to verify and reproduce the analysis. The paper gives the lab's website as the link for obtaining the code. However, it turns out the researcher has since left that university, and their new lab's website no longer has a link to that code. After several weeks of silence the author responds to your email saying they will try and find the code, but they're working on a different project now. That was a month ago.
Lab websites aren't archives. Doing archiving well is non-trivial, and likely isn't your line of research. Use an archive that specializes in doing well what you need from an archive.
There are many archives, for all imaginable purposes and domains. In fact, there are so many that there is re3data, a registry of repositories that allows browsing them by various attributes.
One of the key benefits of using an archive is that nearly all of them will assign a globally unique resolvable identifier to deposits. This is because deposit identifiers benefit their users - both depositors, and those reusing deposits:
DOIs (digital object identifiers) are only one type of unique identifier, but they are the most frequently used type in scholarly communication, and for identifying research products. Some of their benefits include:
While DOIs on the surface all look the same, some expectations for their associated metadata (and programmable APIs) differ based on the issuing DOI registrar (often referred to as "type of DOI"). In scholarly publishing and communication, the most frequently encountered DOI registrars are CrossRef (issues almost all scientific paper DOIs, works with publishers) and DataCite. The latter is used for all kinds of "other" research products, including data, software, source code, and preprints.
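Whichever registrar issued it, a DOI can be turned into a resolvable link by prefixing it with the https://doi.org/ resolver address. The short sketch below does this for the DOI of the Bourne and Fink article cited later in this lesson. The function name `doi_to_url` is made up for this example.

```python
def doi_to_url(doi):
    """Turn a DOI, with or without a 'doi:' prefix, into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        # Drop the 'doi:' label that citations often include
        doi = doi[4:].strip()
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1371/journal.pcbi.1000247"))
# prints https://doi.org/10.1371/journal.pcbi.1000247
```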
Upon publishing, different research products have different needs, and different eligibility for licensing. Determining an appropriate license should be an informed decision, and can be further complicated if multiple institutions with different intellectual property policies contributed to the products in a manner that can't be easily disentangled. Also, intellectual property and copyright laws differ across countries.
In many jurisdictions (including the US) intellectual property rights vest in the author of a creative work whether they assert them or not. Also, in most jurisdictions (including the US) the rights one has to work copyrighted by someone else are limited to fair use (and what one believes is fair use versus what a court will say is not necessarily the same thing).
Hence, if you make public a work eligible for copyright protection yet don't say anything about terms of reuse, nobody has any rights to it beyond fair use. If you reuse work published in this way yourself, you risk that at any point the author will claim their right and ask to be compensated unless you cease to use the work immediately. Do you really want to base your research success on such a risk? If not, why do you expect anyone else to?
By publishing a research product, a scholar usually intends to benefit by allowing the product to have a wider impact. Not stating any license or terms of reuse effectively contradicts that intent.
"I Am Not a Scientist, I Am a Number" Bourne PE, Fink JL (2008) I Am Not a Scientist, I Am a Number. PLoS Comput Biol 4(12): e1000247. doi:10.1371/journal.pcbi.1000247
Aggregating research products and their uptake in the scientific community as well as the public is very difficult without identifiers. Most authors of and contributors to research do not have distinctive names. If it's too difficult to aggregate someone's research outputs and their impact, then most research output will not be taken into account for assessment.
Enter ORCID, the Open Researcher and Contributor ID. ORCID allows you to create and maintain a fairly comprehensive biographic, grant support, and publication profile. Funders, institutions, and publishers are increasingly adopting it. (At least in the sense of allowing you to record your ORCID; its use for features that convey tangible benefits to you is still in its infancy.)