YAML support is provided by PyYAML at http://pyyaml.org/. This notebook depends on it.



In [1]:

    
import yaml

The following cell provides an initial example of a note in our system.

A note is nothing more than a YAML document. The idea of notetaking is to keep it simple, so a note should make no assumptions about formatting whatsoever.

In our current thinking, we have the following sections:

title: an optional title (text)
tags: one or more keywords (text, sequence of text, no nesting)
mentions: one or more mentions (text, sequence of text, no nesting)
outline: one or more items (text, sequence of text, nesting is permitted)
dates (numeric text, sequence, must follow established historical ways of representing dates)
text (text from the source as multiline string)
bibtex, ris, or inline (text for the bibliographic item; will be syntax checked)
bibkey (text, a hopefully unique identifier for referring to this source in other Zettels)
cite: Used to cite a bibkey from the same or other notes. In addition, the citation may be represented as a list, where the first item is the bibkey and subsequent items are pages or ranges of page numbers. See below for a good example of how this will work.
note (any additional details that you wish to hide from indexing)

In most situations, freeform text is permitted. If you need to do crazy things, you must put quotes around the text so YAML can process it. However, words separated by whitespace and punctuation seems to work fine in most situations.

These all are intended to be string data, so there are no restrictions on what can be in any field; however, we will likely limit tags, mentions, dates in some way as we go forward. Fields such as bibtex, ris, or inline are also subject to validity checking.

Print the document to the console (nothing special here).



In [2]:

    
myFirstZettel="""
title: First BIB Note for Castells
tags:
  - Castells
  - Network Society
  - Charles Babbage is Awesome
  - Charles Didn't do Everything
mentions:
  - gkt
  - dbdennis
dates: 2016
cite:
  - Castells Rise 2016
  - ii-iv
  - 23-36
outline:
  - Introduction
  - - Computers
    - People
  - Conclusions
  - - Great Ideas of Computing

text: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam eleifend est sed diam maximus rutrum. Quisque sit amet imperdiet odio, id tristique libero. Aliquam viverra convallis mauris vel tristique. Cras ac dolor non risus porttitor molestie vel at nisi. Donec vitae finibus quam. Phasellus vehicula urna sed nibh condimentum, ultrices interdum velit eleifend. Nam suscipit dolor eu rutrum fringilla. Sed pulvinar purus purus, sit amet venenatis enim convallis a. Duis fringilla nisl sit amet erat lobortis dictum. Nunc fringilla arcu nec ex blandit, a gravida purus commodo. Vivamus lacinia tellus dui, vel maximus lacus ornare id.

  Vivamus euismod justo sit amet luctus bibendum. Integer non mi ullamcorper enim fringilla vulputate sit amet in urna. Nullam eu sodales ipsum. Curabitur id convallis ex. Duis a condimentum lorem. Nulla et urna massa. Duis in nibh eu elit lobortis vehicula. Mauris congue mauris mollis metus lacinia, ut suscipit mi egestas. Donec luctus ante ante, eget viverra est mollis vitae.

  Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis ligula ac euismod. Mauris elementum metus vel scelerisque feugiat. Vivamus bibendum massa eu pellentesque sodales. Nulla nec lacus dolor. Donec scelerisque, nibh sed placerat gravida, nunc turpis tristique nibh, ac feugiat enim massa ut eros. Nulla finibus, augue egestas hendrerit accumsan, tellus augue tempor eros, in sagittis dolor turpis nec mi. Nunc fringilla mi non malesuada aliquet.

bibkey:
  Castells Rise 1996
bibtex: |
  @book{castells_rise_1996,
    address = {Cambridge, Mass.},
    series = {Castells, {Manuel}, 1942- {Information} age . v},
    title = {The rise of the network society},
    isbn = {978-1-55786-616-5},
    language = {eng},
    publisher = {Blackwell Publishers},
    author = {Castells, Manuel},
    year = {1996},
    keywords = {Information networks., Information society., Information technology Economic aspects., Information technology Social aspects., Technology and civilization.}
  }

note:
  George likes this new format.
"""



In [3]:

    
print(myFirstZettel)









    



title: First BIB Note for Castells
tags:
  - Castells
  - Network Society
  - Charles Babbage is Awesome
  - Charles Didn't do Everything
mentions:
  - gkt
  - dbdennis
dates: 2016
cite:
  - Castells Rise 2016
  - ii-iv
  - 23-36
outline:
  - Introduction
  - - Computers
    - People
  - Conclusions
  - - Great Ideas of Computing

text: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam eleifend est sed diam maximus rutrum. Quisque sit amet imperdiet odio, id tristique libero. Aliquam viverra convallis mauris vel tristique. Cras ac dolor non risus porttitor molestie vel at nisi. Donec vitae finibus quam. Phasellus vehicula urna sed nibh condimentum, ultrices interdum velit eleifend. Nam suscipit dolor eu rutrum fringilla. Sed pulvinar purus purus, sit amet venenatis enim convallis a. Duis fringilla nisl sit amet erat lobortis dictum. Nunc fringilla arcu nec ex blandit, a gravida purus commodo. Vivamus lacinia tellus dui, vel maximus lacus ornare id.

  Vivamus euismod justo sit amet luctus bibendum. Integer non mi ullamcorper enim fringilla vulputate sit amet in urna. Nullam eu sodales ipsum. Curabitur id convallis ex. Duis a condimentum lorem. Nulla et urna massa. Duis in nibh eu elit lobortis vehicula. Mauris congue mauris mollis metus lacinia, ut suscipit mi egestas. Donec luctus ante ante, eget viverra est mollis vitae.

  Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis ligula ac euismod. Mauris elementum metus vel scelerisque feugiat. Vivamus bibendum massa eu pellentesque sodales. Nulla nec lacus dolor. Donec scelerisque, nibh sed placerat gravida, nunc turpis tristique nibh, ac feugiat enim massa ut eros. Nulla finibus, augue egestas hendrerit accumsan, tellus augue tempor eros, in sagittis dolor turpis nec mi. Nunc fringilla mi non malesuada aliquet.

bibkey:
  Castells Rise 1996
bibtex: |
  @book{castells_rise_1996,
    address = {Cambridge, Mass.},
    series = {Castells, {Manuel}, 1942- {Information} age . v},
    title = {The rise of the network society},
    isbn = {978-1-55786-616-5},
    language = {eng},
    publisher = {Blackwell Publishers},
    author = {Castells, Manuel},
    year = {1996},
    keywords = {Information networks., Information society., Information technology Economic aspects., Information technology Social aspects., Technology and civilization.}
  }

note:
  George likes this new format.

This shows how to load just the YAML portion of the document, resulting in a Python dictionary data structure. Observe that the Python dictionary has { key : value, ... }. So we can extract the YAML fields from the Python dictionary data structure.

Notice that when you write a YAML list of mentions, there is a nested Python list ['gkt', 'dbdennis'].



In [4]:

    
doc = yaml.load(myFirstZettel)

Closing the loop, the following shows how to iterate the keys of the data structure.



In [5]:

    
for key in doc.keys():
    print(key, "=", doc[key])









    



mentions = ['gkt', 'dbdennis']
dates = 2016
outline = ['Introduction', ['Computers', 'People'], 'Conclusions', ['Great Ideas of Computing']]
bibtex = @book{castells_rise_1996,
  address = {Cambridge, Mass.},
  series = {Castells, {Manuel}, 1942- {Information} age . v},
  title = {The rise of the network society},
  isbn = {978-1-55786-616-5},
  language = {eng},
  publisher = {Blackwell Publishers},
  author = {Castells, Manuel},
  year = {1996},
  keywords = {Information networks., Information society., Information technology Economic aspects., Information technology Social aspects., Technology and civilization.}
}

title = First BIB Note for Castells
bibkey = Castells Rise 1996
tags = ['Castells', 'Network Society', 'Charles Babbage is Awesome', "Charles Didn't do Everything"]
cite = ['Castells Rise 2016', 'ii-iv', '23-36']
note = George likes this new format.
text = Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam eleifend est sed diam maximus rutrum. Quisque sit amet imperdiet odio, id tristique libero. Aliquam viverra convallis mauris vel tristique. Cras ac dolor non risus porttitor molestie vel at nisi. Donec vitae finibus quam. Phasellus vehicula urna sed nibh condimentum, ultrices interdum velit eleifend. Nam suscipit dolor eu rutrum fringilla. Sed pulvinar purus purus, sit amet venenatis enim convallis a. Duis fringilla nisl sit amet erat lobortis dictum. Nunc fringilla arcu nec ex blandit, a gravida purus commodo. Vivamus lacinia tellus dui, vel maximus lacus ornare id.

Vivamus euismod justo sit amet luctus bibendum. Integer non mi ullamcorper enim fringilla vulputate sit amet in urna. Nullam eu sodales ipsum. Curabitur id convallis ex. Duis a condimentum lorem. Nulla et urna massa. Duis in nibh eu elit lobortis vehicula. Mauris congue mauris mollis metus lacinia, ut suscipit mi egestas. Donec luctus ante ante, eget viverra est mollis vitae.

Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis ligula ac euismod. Mauris elementum metus vel scelerisque feugiat. Vivamus bibendum massa eu pellentesque sodales. Nulla nec lacus dolor. Donec scelerisque, nibh sed placerat gravida, nunc turpis tristique nibh, ac feugiat enim massa ut eros. Nulla finibus, augue egestas hendrerit accumsan, tellus augue tempor eros, in sagittis dolor turpis nec mi. Nunc fringilla mi non malesuada aliquet.

And this shows how to get any particular item of interest. In this case, we're extracting the bibtex key so we can do something with the embedded BibTeX (e.g. print it).



In [6]:

    
print(doc['bibkey'])
print(doc['bibtex'])









    



Castells Rise 1996
@book{castells_rise_1996,
  address = {Cambridge, Mass.},
  series = {Castells, {Manuel}, 1942- {Information} age . v},
  title = {The rise of the network society},
  isbn = {978-1-55786-616-5},
  language = {eng},
  publisher = {Blackwell Publishers},
  author = {Castells, Manuel},
  year = {1996},
  keywords = {Information networks., Information society., Information technology Economic aspects., Information technology Social aspects., Technology and civilization.}
}

Adapted from http://stackoverflow.com/questions/12472338/flattening-a-list-recursively. There really must be a nicer way to do stuff like this. I will rewrite this using a walker so we can have custom processing of the list items.



In [7]:

    
def flatten(item):
    if type(item) != type([]):
        return [str(item)]
    if item == []:
        return item
    if isinstance(item[0], list):
        return flatten(item[0]) + flatten(item[1:])
    return item[:1] + flatten(item[1:])



In [8]:

    
flatten("George was here")









    Out[8]:





['George was here']



In [9]:

    
flatten(['A', ['B', 'C'], ['D', ['E']]])









    Out[9]:





['A', 'B', 'C', 'D', 'E']

Now we are onto some sqlite3 explorations.

Ordinarily, I would use some sort of mapping framework to handle database operations. However, it's not clear the FTS support is part of any ORM (yet). I will continue to research but since there is likely only one table, it might not be worth the trouble.

Next we will actually add the Zettel to the database and do a test query. Almost there.



In [10]:

    
import sqlite3

# This is for showing data structures only.

import pprint
printer = pprint.PrettyPrinter(indent=2)

class SQLiteFTS(object):  
  def __init__(self, db_name, table_name, field_names):
    self.db_name = db_name
    self.conn = sqlite3.connect(db_name)
    self.cursor = self.conn.cursor()
    
    self.table_name = table_name
    self.fts_field_names = field_names
    self.fts_field_refs = ['?'] * len(self.fts_field_names)  # for sqlite insert template generation
    self.fts_field_init = [''] * len(self.fts_field_names)
    self.fts_fields = dict(zip(self.fts_field_names, self.fts_field_refs))
    self.fts_default_record = dict(zip(self.fts_field_names, self.fts_field_init))

  def bind(self, doc):
    self.record = self.fts_default_record.copy()
    for k in doc.keys():
        if k in self.record.keys():
           self.record[k] = doc[k]
        else:
           print("Unknown fts field %s" % k)
    self.record.update(doc)
    
  def drop_table(self):
    self.conn.execute("DROP TABLE IF EXISTS %s" % self.table_name)

  def create_table(self):
    sql_fields = ",".join(self.fts_default_record.keys())
    print("CREATE VIRTUAL TABLE zettels USING fts4(%s)" % sql_fields)
    self.conn.execute("CREATE VIRTUAL TABLE zettels USING fts4(%s)" % sql_fields) 
    
  def insert_into_table(self):
    sql_params = ",".join(self.fts_fields.values())
    #printer.pprint(self.record)
    #printer.pprint(self.record.values())
    sql_insert_values = [ ",".join(flatten(value)) for value in list(self.record.values())]
    print("INSERT INTO zettels VALUES (%s)" % sql_params)
    print(self.record.keys())
    printer.pprint(sql_insert_values)
    self.conn.execute("INSERT INTO zettels VALUES (%s)" % sql_params, sql_insert_values)

  def done(self):
    self.conn.commit()
    self.conn.close()
    
sql = SQLiteFTS('zettels.db', 'zettels', ['title', 'tags', 'mentions', 'outline', 'cite', 'dates', 'summary', 'text', 'bibkey', 'bibtex', 'ris', 'inline', 'note' ])

#doc_keys = list(doc.keys())
#doc_keys.sort()
#rec_keys = list(sql.record.keys())
#rec_keys.sort()
#print("doc keys %s" % doc_keys)
#print("record keys %s" % rec_keys)

sql.drop_table()
sql.create_table()
printer.pprint(doc)
sql.bind(doc)
sql.insert_into_table()
sql.done()

#sql_insert_values = [ str(field) for field in sql.record.values()]
#print(sql_insert_values)

#print(record)









    



CREATE VIRTUAL TABLE zettels USING fts4(mentions,dates,outline,bibtex,inline,title,bibkey,ris,tags,cite,note,text,summary)
{ 'bibkey': 'Castells Rise 1996',
  'bibtex': '@book{castells_rise_1996,\n'
            '  address = {Cambridge, Mass.},\n'
            '  series = {Castells, {Manuel}, 1942- {Information} age . v},\n'
            '  title = {The rise of the network society},\n'
            '  isbn = {978-1-55786-616-5},\n'
            '  language = {eng},\n'
            '  publisher = {Blackwell Publishers},\n'
            '  author = {Castells, Manuel},\n'
            '  year = {1996},\n'
            '  keywords = {Information networks., Information society., '
            'Information technology Economic aspects., Information technology '
            'Social aspects., Technology and civilization.}\n'
            '}\n',
  'cite': ['Castells Rise 2016', 'ii-iv', '23-36'],
  'dates': 2016,
  'mentions': ['gkt', 'dbdennis'],
  'note': 'George likes this new format.',
  'outline': [ 'Introduction',
               ['Computers', 'People'],
               'Conclusions',
               ['Great Ideas of Computing']],
  'tags': [ 'Castells',
            'Network Society',
            'Charles Babbage is Awesome',
            "Charles Didn't do Everything"],
  'text': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam '
          'eleifend est sed diam maximus rutrum. Quisque sit amet imperdiet '
          'odio, id tristique libero. Aliquam viverra convallis mauris vel '
          'tristique. Cras ac dolor non risus porttitor molestie vel at nisi. '
          'Donec vitae finibus quam. Phasellus vehicula urna sed nibh '
          'condimentum, ultrices interdum velit eleifend. Nam suscipit dolor '
          'eu rutrum fringilla. Sed pulvinar purus purus, sit amet venenatis '
          'enim convallis a. Duis fringilla nisl sit amet erat lobortis '
          'dictum. Nunc fringilla arcu nec ex blandit, a gravida purus '
          'commodo. Vivamus lacinia tellus dui, vel maximus lacus ornare id.\n'
          '\n'
          'Vivamus euismod justo sit amet luctus bibendum. Integer non mi '
          'ullamcorper enim fringilla vulputate sit amet in urna. Nullam eu '
          'sodales ipsum. Curabitur id convallis ex. Duis a condimentum lorem. '
          'Nulla et urna massa. Duis in nibh eu elit lobortis vehicula. Mauris '
          'congue mauris mollis metus lacinia, ut suscipit mi egestas. Donec '
          'luctus ante ante, eget viverra est mollis vitae.\n'
          '\n'
          'Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis '
          'ligula ac euismod. Mauris elementum metus vel scelerisque feugiat. '
          'Vivamus bibendum massa eu pellentesque sodales. Nulla nec lacus '
          'dolor. Donec scelerisque, nibh sed placerat gravida, nunc turpis '
          'tristique nibh, ac feugiat enim massa ut eros. Nulla finibus, augue '
          'egestas hendrerit accumsan, tellus augue tempor eros, in sagittis '
          'dolor turpis nec mi. Nunc fringilla mi non malesuada aliquet.\n',
  'title': 'First BIB Note for Castells'}
INSERT INTO zettels VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)
dict_keys(['mentions', 'dates', 'outline', 'bibtex', 'inline', 'bibkey', 'ris', 'title', 'cite', 'tags', 'note', 'text', 'summary'])
[ 'gkt,dbdennis',
  '2016',
  'Introduction,Computers,People,Conclusions,Great Ideas of Computing',
  '@book{castells_rise_1996,\n'
  '  address = {Cambridge, Mass.},\n'
  '  series = {Castells, {Manuel}, 1942- {Information} age . v},\n'
  '  title = {The rise of the network society},\n'
  '  isbn = {978-1-55786-616-5},\n'
  '  language = {eng},\n'
  '  publisher = {Blackwell Publishers},\n'
  '  author = {Castells, Manuel},\n'
  '  year = {1996},\n'
  '  keywords = {Information networks., Information society., Information '
  'technology Economic aspects., Information technology Social aspects., '
  'Technology and civilization.}\n'
  '}\n',
  '',
  'Castells Rise 1996',
  '',
  'First BIB Note for Castells',
  'Castells Rise 2016,ii-iv,23-36',
  "Castells,Network Society,Charles Babbage is Awesome,Charles Didn't do "
  'Everything',
  'George likes this new format.',
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam eleifend est '
  'sed diam maximus rutrum. Quisque sit amet imperdiet odio, id tristique '
  'libero. Aliquam viverra convallis mauris vel tristique. Cras ac dolor non '
  'risus porttitor molestie vel at nisi. Donec vitae finibus quam. Phasellus '
  'vehicula urna sed nibh condimentum, ultrices interdum velit eleifend. Nam '
  'suscipit dolor eu rutrum fringilla. Sed pulvinar purus purus, sit amet '
  'venenatis enim convallis a. Duis fringilla nisl sit amet erat lobortis '
  'dictum. Nunc fringilla arcu nec ex blandit, a gravida purus commodo. '
  'Vivamus lacinia tellus dui, vel maximus lacus ornare id.\n'
  '\n'
  'Vivamus euismod justo sit amet luctus bibendum. Integer non mi ullamcorper '
  'enim fringilla vulputate sit amet in urna. Nullam eu sodales ipsum. '
  'Curabitur id convallis ex. Duis a condimentum lorem. Nulla et urna massa. '
  'Duis in nibh eu elit lobortis vehicula. Mauris congue mauris mollis metus '
  'lacinia, ut suscipit mi egestas. Donec luctus ante ante, eget viverra est '
  'mollis vitae.\n'
  '\n'
  'Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis ligula ac '
  'euismod. Mauris elementum metus vel scelerisque feugiat. Vivamus bibendum '
  'massa eu pellentesque sodales. Nulla nec lacus dolor. Donec scelerisque, '
  'nibh sed placerat gravida, nunc turpis tristique nibh, ac feugiat enim '
  'massa ut eros. Nulla finibus, augue egestas hendrerit accumsan, tellus '
  'augue tempor eros, in sagittis dolor turpis nec mi. Nunc fringilla mi non '
  'malesuada aliquet.\n',
  '']



In [14]:

    
with open("xyz.txt") as datafile:
  text = datafile.read()



In [15]:

    
print(text)









    



@misc{blahblahblah,
 title = {In Depth - In Depth: Ray Kurzweil - Book {TV}},
 url = {http://www.booktv.org/Program/7515/In+Depth+Ray+Kurzweil.aspx},
 urldate = {2011-02-11},
 keywords = {*{AddedToZettels}},
 file = {In Depth - In Depth\: Ray Kurzweil - Book TV:/Users/dbdennis/Library/Application Support/Zotero/Profiles/duztnovb.default/zotero/storage/TWWBX3QV/In+Depth+Ray+Kurzweil.html:text/html}
}



In [19]:

    
bibkey = 'blahblahblah'
bibtex = text



In [20]:

    
import yaml
from collections import OrderedDict

class quoted(str): pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str): pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(bibkey=bibkey, bibtex=literal(bibtex))
print(yaml.dump(d))









    



bibkey: blahblahblah
bibtex: |
  @misc{blahblahblah,
   title = {In Depth - In Depth: Ray Kurzweil - Book {TV}},
   url = {http://www.booktv.org/Program/7515/In+Depth+Ray+Kurzweil.aspx},
   urldate = {2011-02-11},
   keywords = {*{AddedToZettels}},
   file = {In Depth - In Depth\: Ray Kurzweil - Book TV:/Users/dbdennis/Library/Application Support/Zotero/Profiles/duztnovb.default/zotero/storage/TWWBX3QV/In+Depth+Ray+Kurzweil.html:text/html}
  }



In [ ]: