BibSON (Bibliographic Serial Object Notation) to HTML

BibSON uses RSON (see https://code.google.com/p/rson/wiki/Manual and https://pypi.python.org/pypi/rsonlite/0.1.0) with special keywords for bibliographic data.

More on RSON:

RSON syntax is relaxed compared to JSON:

Any valid JSON file is also a valid RSON file, as long as it is encoded in UTF-8 or ASCII. (External conversion functions could detect other encodings and pass UTF-8 or Unicode to the RSON decoder.)

Comments are allowed with a leading # (in any column) if the comment is the only non-whitespace on the line. (Inside a triple-quoted or equal-delimited string, however, the # may be part of the string rather than the start of a comment.)

String quoting is not required unless the string contains any RSON special characters. The RSON special characters are { } [ ] : , " = (the JSON special characters, plus =).

Python-style triple-quoted (""") strings are supported for practically arbitrary embedded data.

Integer formats include hex, binary, and octal, and embedded underscores are allowed.

In addition to the relaxed JSON syntax, RSON supports Python-style indentation to create structures. Inside any [] or {} pair, the syntax is JSON syntax with the enhancements described above. Outside any [] or {} pair, RSON indented syntax is used to describe the structure.

RSON indentation controls the nesting levels of dicts or lists. As with Python, spaces or tabs may be used (in fact, any valid JSON whitespace except \n or \r may be used for any whitespace, including indentation), but RSON does not make any equivalence between tabs and spaces, so indenting a line the same as or more than a previous line requires prefixing the line with exactly the same whitespace characters as were used on the previous line. Mixing tab and space (or other) indentation whitespace will most likely result in an error.
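A minimal sketch of the indentation rule above, as a checker in plain Python. `check_indentation` is a hypothetical helper, not part of rson or rsonlite. It assumes indentation is made only of spaces and tabs, skips blank lines and # comment lines, and flags any line whose indentation neither extends the previous line's indentation exactly nor is an exact prefix of it. It does not verify that a dedent returns to a previously used level.

```python
import re

# Hypothetical helper illustrating the RSON indentation rule; not rsonlite's code.
# Assumption: indentation consists of spaces and/or tabs only.
INDENT = re.compile(r'[ \t]*')


def check_indentation(text):
    """Return the 1-based line numbers whose indentation is inconsistent with
    the previous non-blank, non-comment line. Tabs and spaces are never
    treated as equivalent."""
    bad = []
    prev = ''
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # blank lines and comment lines do not set nesting
        cur = INDENT.match(line).group()
        # Same or deeper indentation must start with exactly the previous prefix;
        # shallower indentation must be an exact prefix of the previous one.
        if not (cur.startswith(prev) or prev.startswith(cur)):
            bad.append(lineno)
        prev = cur
    return bad


# Line 3 is indented with a tab after a line indented with four spaces.
print(check_indentation('AuthorList\n    First = Etienne\n\tLast = Ackermann\n'))  # [3]
```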

More on BibSON:

BibSON makes it easy to ...

@_@ I just found a very similar (much more advanced) project! See http://scholar.berkeley.edu/pitman/software/bibjson and http://okfnlabs.org/bibjson/ ...

Although I have to say, BibSON is still much easier to read and type :D... I should consider writing parsers from RIS, BibTeX, etc. to BibSON, and formalizing my specification.

And some more: http://blog.martinfenner.org/2013/07/30/citeproc-yaml-for-bibliographies/ http://blog.martinfenner.org/2013/08/04/automatically-list-all-your-publications-in-your-blog/ https://github.com/inukshuk/jekyll-scholar

Additional features

Page ranges written as xx--xx are automatically rendered as xx–xx (&ndash;). In abstracts, -- and --- are converted to &ndash; and &mdash;, and & is escaped as &amp;.
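The order of the abstract replacements matters: existing &ndash;/&mdash; entities are first turned back into dashes so their & is not escaped twice, and --- is converted before -- so it is not read as -- followed by -. A standalone sketch of that sequence (the function name `convert_dashes` is hypothetical) is:

```python
def convert_dashes(text):
    """Turn --, --- and & into HTML entities, keeping existing dash entities intact."""
    # Undo existing dash entities so their '&' is not escaped a second time.
    text = text.replace('&ndash;', '--').replace('&mdash;', '---')
    # Escape every remaining ampersand.
    text = text.replace('&', '&amp;')
    # Replace the longer run first so '---' is not consumed as '--' plus '-'.
    return text.replace('---', '&mdash;').replace('--', '&ndash;')


print(convert_dashes('pp. 12--34'))  # pp. 12&ndash;34
print(convert_dashes('R&D --- a note'))  # R&amp;D &mdash; a note
```

Other entities already present in the text (for example &lt;) are not protected by this sequence and would be escaped again.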

Examples

Title = Latent variable modeling of hippocampal replay
#Type = [poster, cpaper, jpaper, talk, chapter, review, thesis, preprint, other]
Type = cpaper
AuthorList
    First = Etienne 
    Last = Ackermann
    ORCID = orcid.org/0000-0001-7139-9360
    Bold = TRUE
    First = [First, Middle]
    Last = Last
# Date = YYYYMMDD; use YYYY0000 for just the year;
Date = YYYYMMDD
ConfDates = October 17--21, 2015
Pages = 
DOI = 
URL = 
ExternLink = docs/posters/utaustin2015.pdf
PosterImg = images/poster-thumbs/SfN15.png
Abstract =
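The Date convention in the example (YYYYMMDD, or YYYY0000 when only the year is known) can be decoded as sketched below. `format_date` is a hypothetical helper and is stricter than the notebook code, which only prints a warning for a malformed date. It assumes the parser returns Date as a string, as the slicing in the notebook implies.

```python
# Index 0 is '' so that a month of 00 ("year only") maps to an empty name.
MONTHS = ('', 'January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')


def format_date(date):
    """Split a BibSON Date string into (month name, year).

    'YYYY0000' means only the year is known, so the month name is ''.
    Raises ValueError for anything other than eight digits with a month of 00-12.
    """
    if len(date) != 8 or not date.isdigit():
        raise ValueError('Date must be YYYYMMDD, got {!r}'.format(date))
    year, month = date[:4], int(date[4:6])
    if month > 12:
        raise ValueError('Month out of range in {!r}'.format(date))
    return MONTHS[month], year


print(format_date('20151017'))  # ('October', '2015')
print(format_date('20150000'))  # ('', '2015')
```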

TODO:

write specification

expand citeString for conferences, preprints, etc., as well as additional custom labels in a list...

write syntax highlighting for Sublime Text: http://sublimetext.info/docs/en/extensibility/syntaxdefs.html

import rsonlite as rsl
import io
import contextlib

mybibfile = 'kemerelab.bibson'
#mybibfile = 'pubs.bibson'
# RSON input must be UTF-8 (or ASCII), so decode the file explicitly as UTF-8.
with open(mybibfile, 'r', encoding='utf-8') as f:
    bibson = f.read()

mybib = rsl.simpleparse(bibson)

print('{0} entries read from "{1}"'.format(len(mybib), mybibfile))

3 entries read from "kemerelab.bibson"

import io
import sys

pubTypes = {'jpaper': 'journal paper', 'poster': 'poster', 'other': 'other', 'cpaper': 'conference paper', 'talk': 'talk', 'preprint': 'preprint', 'thesis': 'thesis', 'chapter': 'book chapter', 'review': 'review'}
pubLabels = {'jpaper': 'journal', 'poster': 'poster', 'other': 'other', 'cpaper': 'conference', 'talk': 'talk', 'preprint': 'preprint', 'thesis': 'thesis', 'chapter': 'chapter', 'review': 'review'}
months = {'MM': '', '00': '', '01': 'January', '02': 'February', '03': 'March', '04': 'April', '05': 'May', '06': 'June', '07': 'July', '08': 'August', '09': 'September', '10': 'October', '11': 'November', '12': 'December'}


def initials(first):
    """Turn a first name, or a list of first/middle names, into initials like 'E. '."""
    names = first if isinstance(first, list) else [first]
    return ''.join(name.capitalize()[0] + '. ' for name in names)


def html_dashes(text):
    """Escape & and convert --- / -- into &mdash; / &ndash; (existing dash entities are kept)."""
    text = text.replace('&ndash;', '--').replace('&mdash;', '---')
    text = text.replace('&', '&amp;')
    return text.replace('---', '&mdash;').replace('--', '&ndash;')


htmlwriter = io.StringIO()

for ii, pubitem in enumerate(mybib):
    warninglist = []
    pubRef = 'pub: ' + str(ii + 1)

    pubTitle = pubitem.get('Title', '')
    if pubTitle == '':
        warninglist.append('Title is empty in ' + pubRef)
    pubType = pubitem.get('Type', 'other')
    pubDate = pubitem.get('Date', 'YYYYMMDD')
    pubYear = pubDate[0:4]
    pubMonth = pubDate[4:6]
    if pubDate == 'YYYYMMDD' or len(pubDate) != 8:
        warninglist.append('Date not specified, or incorrect format! Use YYYYMMDD in ' + pubRef)
    pubExternLink = pubitem.get('ExternLink', '#')
    pubURL = pubitem.get('URL', '#')
    pubAbstract = html_dashes(pubitem.get('Abstract', ''))
    pubPosterImg = pubitem.get('PosterImg')

    # A single author parses as a dict, several authors as a list of dicts.
    AuthorList = pubitem.get('AuthorList', [])
    if isinstance(AuthorList, dict):
        AuthorList = [AuthorList]
    if not AuthorList:
        warninglist.append('AuthorList is empty in ' + pubRef)
    names = [initials(author['First']) + author['Last'] for author in AuthorList]
    # Produces "A", "A and B", "A, B and C", ...
    if len(names) > 1:
        authorliststring = ', '.join(names[:-1]) + ' and ' + names[-1]
    else:
        authorliststring = ''.join(names)

    pubConfName = pubitem.get('ConfName')
    pubConfDates = pubitem.get('ConfDates')
    pubGeneric = pubitem.get('Generic')
    pubJournal = pubitem.get('Journal')
    pubVolume = pubitem.get('Volume')
    pubIssue = pubitem.get('Issue')
    pubNumber = pubitem.get('Number')
    pubPages = pubitem.get('Pages')
    pubDOI = pubitem.get('DOI')  # read, but not yet written to the HTML

    citeString = ''
    if pubGeneric:
        citeString += pubGeneric
    if pubConfName:
        citeString += '<i>' + pubConfName + '</i>'
    if pubJournal:
        citeString += '<i>' + pubJournal + '</i>'
    if pubVolume:
        citeString += ', vol. ' + pubVolume
    if pubNumber:
        citeString += ', no. ' + pubNumber
    if pubIssue:
        citeString += ', issue ' + pubIssue
    if pubPages:
        citeString += ', pp. ' + pubPages.replace('--', '&ndash;')
    if pubJournal:
        # Leave out an unknown month ('00' or 'MM') instead of emitting a double space.
        citeString += ', ' + ' '.join(filter(None, [months.get(pubMonth, ''), pubYear])) + '.'
    elif pubConfDates:
        citeString += ', ' + pubConfDates.replace('--', '&ndash;')

    # Write the HTML snippet for this entry. The download icon sits inside its
    # <a> element (previously it followed an empty, unclickable link).
    htmlwriter.write(
        f'<div class="item mix {pubType}" data-year="{pubDate}">'
        '<div class="pubmain"><div class="pubassets">'
        f'<a href="{pubExternLink}" class="tooltips" title="Download" target="_blank">'
        '<i class="icon-cloud-download"></i></a>'
        f'<a href="{pubURL}" class="tooltips" title="Link" target="_blank"></a></div>'
        f'<h4 class="pubtitle">{pubTitle}</h4>'
        f'<div class="pubauthor">{authorliststring}</div>'
        f'<div class="pubcite"><span class="label label-{pubLabels.get(pubType, "other")}">'
        f'{pubTypes.get(pubType, pubType)}</span>'
        f'<span class="label label-year">{pubYear}</span>{citeString}</div></div>'
        '<div class="pubdetails">'
    )
    if pubPosterImg:
        htmlwriter.write(
            f'<a href="{pubExternLink}" target="_blank"><img alt="image" src="{pubPosterImg}" '
            'align="left" style="padding:0 15px 15px 0; width: 180px;"></a>'
        )
    htmlwriter.write(f'<h4>Abstract</h4><p>{pubAbstract}</p></div></div>')

    print('Finished parsing {0} "{1}..." ({2})'.format(pubTypes.get(pubType, pubType), pubTitle[0:40], pubYear))
    for warning in warninglist:
        print(warning, file=sys.stderr)

with open('pubs-pre.html', 'r', encoding='utf-8') as f:
    htmlpre = f.read()
with open('pubs-post.html', 'r', encoding='utf-8') as f:
    htmlpost = f.read()
with open('pubs.html', 'w', encoding='utf-8') as f:
    f.write(htmlpre + htmlwriter.getvalue() + htmlpost)

Finished parsing journal paper "Investigating irregularly patterned deep..." (2015)
Finished parsing journal paper "Characterizing Motor and Cognitive Effec..." (2014)
Finished parsing conference paper "Current Amplitude-Dependent Modulation o..." (2013)