The :mod:`diogenes.read` module provides tools for reading data from external sources into Diogenes' preferred NumPy
structured array format.
The module can read from either CSV files (local or at a URL) or SQL databases.
In [34]:
import diogenes
sample_csv_text = 'id,name,age\n0,Anne,57\n1,Bill,76\n2,Cecil,26\n'
with open('sample.csv', 'w') as csv_in:
    csv_in.write(sample_csv_text)
sample_table = diogenes.read.open_csv('sample.csv')
sample_table is a structured array (more specifically, a record array).
In [35]:
print sample_table.dtype
In [36]:
print sample_table
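To illustrate what a structured array offers, here is a minimal sketch that uses plain NumPy rather than Diogenes. It assumes Python 3 and a reasonably recent NumPy. `numpy.genfromtxt` stands in for `open_csv` purely for illustration and may infer different dtypes than Diogenes does.

```python
import io

import numpy as np

# Same CSV text as in the example above, read from memory instead of a file.
sample_csv_text = 'id,name,age\n0,Anne,57\n1,Bill,76\n2,Cecil,26\n'

# names=True takes field names from the header row; dtype=None lets NumPy
# infer a type per column. This is a stand-in, not the Diogenes reader.
table = np.genfromtxt(io.StringIO(sample_csv_text), delimiter=',',
                      names=True, dtype=None, encoding='utf-8')

# Each column is addressable by its field name.
ages = table['age']

# A boolean mask built from one field selects whole rows.
over_50 = table[table['age'] > 50]

print(table.dtype.names)
print(over_50['name'])
```

Field-name indexing and row masking work the same way on any structured array, so they should carry over to the arrays Diogenes returns.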
In [37]:
remote_csv = diogenes.read.open_csv_url(
    'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv',
    delimiter=';')
print remote_csv.dtype
In [38]:
print remote_csv[:10]
We read from, and write to, databases using :func:`diogenes.read.read.connect_sql`. When we pass an
SQLAlchemy connection string to connect_sql, we
get an instance of :class:`diogenes.read.read.SQLConnection`, which runs SQL queries through :meth:`diogenes.read.read.SQLConnection.execute`. The interface resembles, but does not strictly adhere to, DB-API 2.0.
In [39]:
conn = diogenes.read.read.connect_sql('sqlite://')
conn.execute('CREATE TABLE sample_table (id INT, name TEXT, age INT)')
for row in sample_table:
    conn.execute('INSERT INTO sample_table (id, name, age) VALUES (?, ?, ?)', row)
An important difference between Diogenes and most Python SQL libraries is that Diogenes returns query results as structured arrays.
In [40]:
sql_result = conn.execute('SELECT * FROM sample_table')
print sql_result.dtype
In [41]:
print sql_result
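For contrast, the sketch below runs the same query through the standard library's `sqlite3` module, a conventional DB-API 2.0 driver. It is not part of Diogenes and is written for Python 3. It only shows that such a driver hands back a list of plain tuples, with column names available separately, rather than a structured array.

```python
import sqlite3

# In-memory database, comparable to the 'sqlite://' connection string above.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE sample_table (id INT, name TEXT, age INT)')
db.executemany(
    'INSERT INTO sample_table (id, name, age) VALUES (?, ?, ?)',
    [(0, 'Anne', 57), (1, 'Bill', 76), (2, 'Cecil', 26)])

cursor = db.execute('SELECT * FROM sample_table ORDER BY id')
rows = cursor.fetchall()  # a list of tuples, one per row
# Column names live on the cursor, not on the rows themselves.
column_names = [col[0] for col in cursor.description]
db.close()

print(column_names)
print(rows)
```

With tuples, selecting a column means indexing every row by position. A structured array keeps the names and types attached to the data.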
In [41]: