YAML support is provided by PyYAML at http://pyyaml.org/. This notebook depends on it.
In [1]:
import yaml
The following cell provides an initial example of a note in our system.
A note is nothing more than a YAML document. The idea of notetaking is to keep it simple, so a note should make no assumptions about formatting whatsoever.
In our current thinking, we have the following sections: title, tags, mentions, outline, cite, dates, summary, text, bibkey, bibtex, ris, inline, and note.
In most situations, freeform text is permitted. If you need to do crazy things (for example, text that contains a colon followed by a space), you must put quotes around the text so YAML can process it. However, words separated by whitespace and punctuation seem to work fine in most situations.
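As a small illustration of the quoting rule, here is a sketch with a made-up two-field note (the field values are hypothetical and not part of our system). The first value is plain text and needs no quotes; the second contains a colon followed by a space and, as far as we can tell, would fail to parse if the quotes were removed.

```python
import yaml

# Hypothetical two-field note; the values are invented for this illustration.
sample = """
title: Plain words, commas and dashes - no quotes needed
note: "Quoted because of the colon: it would otherwise confuse the parser"
"""

# safe_load only builds plain Python types (dict, list, str, int, ...).
fields = yaml.safe_load(sample)
print(fields["title"])
print(fields["note"])
```

Both values come back as ordinary Python strings.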
These are all intended to be string data, so there are no restrictions on what can be in any field; however, we will likely limit tags, mentions, and dates in some way as we go forward. Fields such as bibtex, ris, or inline are also subject to validity checking.
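The exact limits are not decided yet, so the following is only a sketch of what a first validity check might look like. The function name check_note and the rule that tags and mentions must be lists of strings are assumptions made for this example; the field names are the ones used for the full-text search table later in this notebook.

```python
# Field names taken from the SQLiteFTS table definition later in this notebook.
KNOWN_FIELDS = ['title', 'tags', 'mentions', 'outline', 'cite', 'dates',
                'summary', 'text', 'bibkey', 'bibtex', 'ris', 'inline', 'note']

# Assumption for this sketch: tags and mentions should be lists of strings.
LIST_OF_STRING_FIELDS = ['tags', 'mentions']

def check_note(doc):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for key in doc:
        if key not in KNOWN_FIELDS:
            problems.append("unknown field: %s" % key)
    for key in LIST_OF_STRING_FIELDS:
        value = doc.get(key, [])  # a missing field is treated as an empty list
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append("%s must be a list of strings" % key)
    return problems

good = {'title': 'A note', 'tags': ['Castells'], 'mentions': ['gkt']}
bad = {'title': 'A note', 'tags': 'Castells', 'colour': 'blue'}
print(check_note(good))
print(check_note(bad))
```

Returning a list of problems instead of raising on the first one makes it easy to report everything wrong with a note in one pass.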
Print the document to the console (nothing special here).
In [2]:
myFirstZettel="""
title: First BIB Note for Castells
tags:
  - Castells
  - Network Society
  - Charles Babbage is Awesome
  - Charles Didn't do Everything
mentions:
  - gkt
  - dbdennis
dates: 2016
cite:
  - Castells Rise 2016
  - ii-iv
  - 23-36
outline:
  - Introduction
  - - Computers
    - People
  - Conclusions
  - - Great Ideas of Computing
text: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam eleifend est sed diam maximus rutrum. Quisque sit amet imperdiet odio, id tristique libero. Aliquam viverra convallis mauris vel tristique. Cras ac dolor non risus porttitor molestie vel at nisi. Donec vitae finibus quam. Phasellus vehicula urna sed nibh condimentum, ultrices interdum velit eleifend. Nam suscipit dolor eu rutrum fringilla. Sed pulvinar purus purus, sit amet venenatis enim convallis a. Duis fringilla nisl sit amet erat lobortis dictum. Nunc fringilla arcu nec ex blandit, a gravida purus commodo. Vivamus lacinia tellus dui, vel maximus lacus ornare id.
  Vivamus euismod justo sit amet luctus bibendum. Integer non mi ullamcorper enim fringilla vulputate sit amet in urna. Nullam eu sodales ipsum. Curabitur id convallis ex. Duis a condimentum lorem. Nulla et urna massa. Duis in nibh eu elit lobortis vehicula. Mauris congue mauris mollis metus lacinia, ut suscipit mi egestas. Donec luctus ante ante, eget viverra est mollis vitae.
  Vivamus in purus in erat dictum scelerisque. Aliquam dictum quis ligula ac euismod. Mauris elementum metus vel scelerisque feugiat. Vivamus bibendum massa eu pellentesque sodales. Nulla nec lacus dolor. Donec scelerisque, nibh sed placerat gravida, nunc turpis tristique nibh, ac feugiat enim massa ut eros. Nulla finibus, augue egestas hendrerit accumsan, tellus augue tempor eros, in sagittis dolor turpis nec mi. Nunc fringilla mi non malesuada aliquet.
bibkey:
  Castells Rise 1996
bibtex: |
  @book{castells_rise_1996,
    address = {Cambridge, Mass.},
    series = {Castells, {Manuel}, 1942- {Information} age . v},
    title = {The rise of the network society},
    isbn = {978-1-55786-616-5},
    language = {eng},
    publisher = {Blackwell Publishers},
    author = {Castells, Manuel},
    year = {1996},
    keywords = {Information networks., Information society., Information technology Economic aspects., Information technology Social aspects., Technology and civilization.}
  }
note:
  George likes this new format.
"""
In [3]:
print(myFirstZettel)
This shows how to load just the YAML portion of the document, resulting in a Python dictionary of the form { key : value, ... }. Each YAML field can then be looked up in the dictionary by name.
Notice that a YAML list, such as mentions, becomes a Python list nested inside the dictionary: ['gkt', 'dbdennis'].
In [4]:
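# safe_load builds only plain Python types; recent PyYAML versions require an explicit Loader for yaml.load.
doc = yaml.safe_load(myFirstZettel)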
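To make the mapping from YAML to Python types concrete, here is a sketch using a cut-down, hypothetical note (it is not one of our real notes). It uses yaml.safe_load, which only constructs plain Python types.

```python
import yaml

# A cut-down, hypothetical note used only to show the resulting Python types.
small = """
title: Type demo
mentions:
  - gkt
  - dbdennis
dates: 2016
outline:
  - Introduction
  - - Computers
    - People
"""

loaded = yaml.safe_load(small)
print(type(loaded).__name__)           # the whole document is a dict
print(loaded["mentions"])              # a YAML list becomes a Python list
print(type(loaded["dates"]).__name__)  # an unquoted number becomes an int
print(loaded["outline"])               # a list inside a list stays nested
```

Note that dates comes back as an integer, not a string, so any code that expects string data has to convert it.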
Closing the loop, the following shows how to iterate the keys of the data structure.
In [5]:
for key in doc.keys():
    print(key, "=", doc[key])
And this shows how to get any particular item of interest. In this case, we're extracting the bibtex key so we can do something with the embedded BibTeX (e.g. print it).
In [6]:
print(doc['bibkey'])
print(doc['bibtex'])
Adapted from http://stackoverflow.com/questions/12472338/flattening-a-list-recursively. There really must be a nicer way to do stuff like this. I will rewrite this using a walker so we can have custom processing of the list items.
In [7]:
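def flatten(item):
    # Anything that is not a list becomes a one-element list holding its string form.
    if not isinstance(item, list):
        return [str(item)]
    if item == []:
        return item
    # A nested list at the head is flattened first, then the rest of the list.
    if isinstance(item[0], list):
        return flatten(item[0]) + flatten(item[1:])
    # Stringify the head so the result can always be passed to str.join.
    return [str(item[0])] + flatten(item[1:])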
In [8]:
flatten("George was here")
Out[8]:
['George was here']
In [9]:
flatten(['A', ['B', 'C'], ['D', ['E']]])
Out[9]:
['A', 'B', 'C', 'D', 'E']
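One possible shape for the walker mentioned above is a recursive generator that applies a caller-supplied function to each leaf. This is only a sketch; the names walk and visit are hypothetical and nothing else in the notebook depends on them.

```python
def walk(item, visit=str):
    """Yield visit(leaf) for every non-list leaf of a nested list, depth first."""
    if isinstance(item, list):
        for child in item:
            # Recurse into sub-lists; 'yield from' forwards their leaves.
            yield from walk(child, visit)
    else:
        yield visit(item)

nested = ['A', ['B', 'C'], ['D', ['E']]]
print(list(walk(nested)))                   # default processing: str()
print(list(walk(nested, visit=str.lower)))  # custom processing per item
print(list(walk("George was here")))        # a bare string is a single leaf
```

Because the per-item processing is passed in as visit, the traversal itself never needs to change when we want different handling for tags, mentions, or outline entries.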
Now we are onto some sqlite3 explorations.
Ordinarily, I would use some sort of mapping framework to handle database operations. However, it's not clear that full-text search (FTS) support is part of any ORM (yet). I will continue to research, but since there is likely only one table, it might not be worth the trouble.
Next we will actually add the Zettel to the database and do a test query. Almost there.
In [10]:
import sqlite3
# This is for showing data structures only.
import pprint
printer = pprint.PrettyPrinter(indent=2)
class SQLiteFTS(object):
    def __init__(self, db_name, table_name, field_names):
        self.db_name = db_name
        self.conn = sqlite3.connect(db_name)
        self.cursor = self.conn.cursor()
        self.table_name = table_name
        self.fts_field_names = field_names
        self.fts_field_refs = ['?'] * len(self.fts_field_names) # for sqlite insert template generation
        self.fts_field_init = [''] * len(self.fts_field_names)
        self.fts_fields = dict(zip(self.fts_field_names, self.fts_field_refs))
        self.fts_default_record = dict(zip(self.fts_field_names, self.fts_field_init))

    def bind(self, doc):
        # Start from an all-empty record and copy over only the known fields,
        # so the number of values always matches the number of columns.
        self.record = self.fts_default_record.copy()
        for k in doc.keys():
            if k in self.record.keys():
                self.record[k] = doc[k]
            else:
                print("Unknown fts field %s" % k)

    def drop_table(self):
        self.conn.execute("DROP TABLE IF EXISTS %s" % self.table_name)

    def create_table(self):
        sql_fields = ",".join(self.fts_default_record.keys())
        # Table and column names cannot be bound as parameters, hence the string formatting.
        statement = "CREATE VIRTUAL TABLE %s USING fts4(%s)" % (self.table_name, sql_fields)
        print(statement)
        self.conn.execute(statement)

    def insert_into_table(self):
        sql_params = ",".join(self.fts_fields.values())
        #printer.pprint(self.record)
        #printer.pprint(self.record.values())
        sql_insert_values = [ ",".join(flatten(value)) for value in list(self.record.values())]
        statement = "INSERT INTO %s VALUES (%s)" % (self.table_name, sql_params)
        print(statement)
        print(self.record.keys())
        printer.pprint(sql_insert_values)
        self.conn.execute(statement, sql_insert_values)

    def done(self):
        self.conn.commit()
        self.conn.close()
sql = SQLiteFTS('zettels.db', 'zettels', ['title', 'tags', 'mentions', 'outline', 'cite', 'dates', 'summary', 'text', 'bibkey', 'bibtex', 'ris', 'inline', 'note' ])
#doc_keys = list(doc.keys())
#doc_keys.sort()
#rec_keys = list(sql.record.keys())
#rec_keys.sort()
#print("doc keys %s" % doc_keys)
#print("record keys %s" % rec_keys)
sql.drop_table()
sql.create_table()
printer.pprint(doc)
sql.bind(doc)
sql.insert_into_table()
sql.done()
#sql_insert_values = [ str(field) for field in sql.record.values()]
#print(sql_insert_values)
#print(record)
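The test query could look something like the following sketch. It does not touch zettels.db; instead it builds a tiny in-memory table with three of our columns and two invented rows, and it assumes the SQLite build behind Python's sqlite3 module includes the FTS4 extension.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
# Assumes the SQLite build includes the FTS4 extension.
conn.execute("CREATE VIRTUAL TABLE zettels USING fts4(title, tags, text)")
rows = [
    ("First BIB Note for Castells", "Castells,Network Society", "Lorem ipsum dolor sit amet"),
    ("Unrelated note", "Babbage", "Nothing about networks here"),
]
conn.executemany("INSERT INTO zettels VALUES (?, ?, ?)", rows)

# MATCH against the table name searches every column of each row.
hits = conn.execute(
    "SELECT title FROM zettels WHERE zettels MATCH ?", ("castells",)
).fetchall()
print(hits)

# Prefixing the term with a column name restricts the search to that column.
tag_hits = conn.execute(
    "SELECT title FROM zettels WHERE zettels MATCH ?", ("tags:babbage",)
).fetchall()
print(tag_hits)
conn.close()
```

The search term is passed as a bound parameter, so only the table and column names ever need to be formatted into the SQL text.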
In [14]:
with open("xyz.txt") as datafile:
    text = datafile.read()
In [15]:
print(text)
In [19]:
bibkey = 'blahblahblah'
bibtex = text
In [20]:
import yaml
from collections import OrderedDict
# Marker type: strings of this class are dumped in double-quoted style.
class quoted(str): pass
def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)
# Marker type: strings of this class are dumped in literal block style (|).
class literal(str): pass
def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)
# Dump an OrderedDict as a plain mapping, keeping the insertion order of its keys.
def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)
d = OrderedDict(bibkey=bibkey, bibtex=literal(bibtex))
print(yaml.dump(d))
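A quick way to gain confidence in the literal style is to load the dumped text back and compare it with the original string. The sketch below does this with a short, made-up BibTeX entry (example_key is hypothetical) rather than the contents of xyz.txt, and it repeats the literal marker class so that it can run on its own.

```python
import yaml

class literal(str):
    """Marker type: strings of this class are dumped in literal block style."""

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(literal, literal_presenter)

# Made-up BibTeX entry, ending in a single newline and with no trailing spaces.
entry = "@book{example_key,\n  title = {An Example},\n  year = {2000}\n}\n"

dumped = yaml.dump({"bibkey": "example_key", "bibtex": literal(entry)})
print(dumped)

# Loading the dumped text should give back exactly the string we started with.
restored = yaml.safe_load(dumped)
print(restored["bibtex"] == entry)
```

One caveat we are aware of: if a line of the text ends in a space, PyYAML may fall back to a quoted style instead of the block style, so it is worth stripping trailing whitespace before dumping.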