At this point, you're probably tired of my expounding on how to use databases and would probably like some examples on what they can do. I've put together what is hopefully a nice set of examples for you using the tools I've already presented in addition to a few more I'll throw into the mix.
I'm defining a full database here for your convenience. It will include all the tools that we've used so far and some hopefully useful examples of their use together as a functioning whole. For your first time through this, don't worry too much about the minutiae of each class. Just execute all the way through to the next section, so you can play around a bit with a full database. Later, you should definitely come back and check out all the new pieces of code and look them up in SQLAlchemy's documentation.
In [1]:
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base
import math
engine = create_engine ('sqlite:///:memory:')
Base = declarative_base ()
# We're going to be using some trig functions in the database to do some great circle math, so we need to create these functions
raw_con = engine.raw_connection()
raw_con.create_function("cos", 1, math.cos)
raw_con.create_function("acos", 1, math.acos)
We'll now define a full Star object. We'll give it a convenient init method, so you can use RA and Dec tuples instead of floats. We'll also implement some great circle math, so that you can query based on proximity of stars to sky coordinates and other stars.
In [2]:
from sqlalchemy import func
from sqlalchemy.ext.hybrid import hybrid_property, hybrid_method
import numpy as np
class Star (Base):
__tablename__ = 'stars'
id = Column (Integer, primary_key = True)
name = Column (String, unique = True)
ra = Column (Float)
dec = Column (Float)
cos_dec = Column (Float)
sin_dec = Column (Float)
def __init__ (self, *args, **kwargs):
try:
kwargs ["ra"] = float (kwargs ["ra"])
except TypeError:
kwargs ["ra"] = self.raToFloat (*(kwargs ["ra"]))
try:
kwargs ["dec"] = float (kwargs ["dec"])
except TypeError:
kwargs ["dec"] = self.decToFloat (*(kwargs ["dec"]))
kwargs ["cos_dec"] = math.cos (kwargs ["dec"])
kwargs ["sin_dec"] = math.sin (kwargs ["dec"])
super (Star, self).__init__ (*args, **kwargs)
def __repr__ (self):
return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)
@classmethod
def load (cls, session, data, tags = None, convert = False, **kwargs):
filters = ('u', 'g', 'r', 'i', 'z')
if tags is None:
tags= []
for datum in data:
if convert:
star = Star (name = str (datum ['objid']), ra = np.radians (datum ['ra']), dec = np.radians (datum ['dec']))
else:
star = Star (name = str (datum ['objid']), ra = datum ['ra'], dec = datum ['dec'])
session.add (star)
for tag in tags:
star.tags.append (Tag.get (session, tag))
for filter in filters:
phot = Photometry (filter = filter, mag = datum [filter])
phot.star = star
session.commit ()
@staticmethod
def raToFloat (hours, minutes = 0.0, seconds = 0.0):
return np.radians ((hours + minutes / 60.0 + seconds / 3600.0) * 360 / 24)
@staticmethod
def decToFloat (degrees, arcmins = 0.0, arcsecs = 0.0):
return (np.radians (degrees + arcmins / 60.0 + arcsecs / 3600.0))
@hybrid_method
def greatCircleDistance (self, other):
return math.acos ((self.cos_dec * other.cos_dec * math.cos (self.ra - other.ra) + self.sin_dec * other.sin_dec) * (1.0 - 2.e-16))
@greatCircleDistance.expression
def greatCircleDistance (cls, other):
return func.acos ((cls.cos_dec * other.cos_dec * func.cos (cls.ra - other.ra) + cls.sin_dec * other.sin_dec) * (1.0 - 2.e-16))
We also implement a SkyLocation object. This is designed to hold an ra and dec and we can use it to calculate distances between fixed sky locations and individual stars. This can be used, for example, to calculate the angular separation between a star and the zenith to filter by airmass.
In [14]:
from sidereal import LatLon, SiderealTime, AltAz
from datetime import datetime
import numpy as np
class SkyLocation (object):
def __init__ (self, ra, dec):
try:
self.ra = float (ra)
except TypeError:
self.ra = Star.raToFloat (*ra)
try:
self.dec = float (dec)
except TypeError:
self.dec = Star.decToFloat (*dec)
self.cos_dec = math.cos (self.dec)
self.sin_dec = math.sin (self.dec)
super (SkyLocation, self).__init__ ()
def greatCircleDistance (self, other):
return math.acos (self.cos_dec * other.cos_dec * math.cos(self.ra - other.ra) + self.sin_dec * other.sin_dec)
@classmethod
def zenith (cls, latitude, longitude, time = None):
"""
time should be a datetime object or None to use the current time
"""
if time is None:
time = datetime.now ()
altaz = AltAz (math.pi / 2.0, 0.0)
radec = altaz.raDec (SiderealTime.fromDatetime (time), LatLon (np.radians (latitude), np.radians (longitude)))
return cls (radec.ra, radec.dec)
We also implement the Photometry class from the One to Many lesson to show some applications of the photometry class.
In [4]:
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship, backref
class Photometry (Base):
__tablename__ = 'photometry'
id = Column (Integer, primary_key = True)
filter = Column (String)
mag = Column (Float)
date = Column (Float)
comments = Column (String)
star_id = Column (Integer, ForeignKey("stars.id"))
star = relationship ("Star", backref = backref ("photometry_set",
cascade="all, delete-orphan", lazy = "dynamic"))
We can also implement the Tag class from the Many to Many lesson to show some of its applications.
In [5]:
from sqlalchemy import Table
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.orm.exc import NoResultFound
star_tag = Table('star_tag', Base.metadata,
Column('star_id', Integer, ForeignKey('stars.id')),
Column('tag_id', Integer, ForeignKey('tags.id'))
)
class Tag (Base):
__tablename__ = 'tags'
id = Column(Integer, primary_key = True)
tag = Column(String(50), nullable=False, unique=True)
stars = relationship ('Star', secondary = star_tag, backref = 'tags')
@classmethod
def get (cls, session, tag):
try:
return session.query (cls).filter_by (tag = tag).one ()
except NoResultFound:
return cls (tag = tag)
def __repr__ (self):
return "<Tag %s>" % self.tag
In [6]:
from sqlalchemy.orm import sessionmaker
Base.metadata.create_all (engine)
Session = sessionmaker (bind = engine)
We've now set up a database, and I'd like to show you a few neat things you can do with it, but first, we need to populate it. I have three datasets for you: some M3 photometry, some M13 photometry, and a list of the 10 brightest stars in the night sky. We're going to put all three into the database and look at some applications.
In [7]:
import numpy as np
session = Session ()
# Load the M3 dataset from a csv file using numpy, tag all the data as "M3"
Star.load (session, np.genfromtxt ("sdss-dr7-m3.csv", names = True, delimiter = ',', dtype = None), tags = ["M3"], convert = True)
In [8]:
import numpy as np
# Load the M13 dataset from a csv file using numpy, tag all the data as "M13"
Star.load (session, np.genfromtxt ("sdss-dr7-m13.csv", names = True, delimiter = ',', dtype = None), tags = ["M13"], convert = True)
In [9]:
# The 10 brightest stars in the night sky, tagged as "Bright"
sirius = Star (name = "Sirius", ra = (6, 45, 08.9), dec = (-16, 42, 58))
sirius.tags.append (Tag.get (session, "Bright"))
session.add (sirius)
canopus = Star (name = "Canopus", ra = (6, 23, 57.1), dec = (-52, 41, 45))
canopus.tags.append (Tag.get (session, "Bright"))
session.add (canopus)
rigil = Star (name = "Rigil Kentaurus", ra = (14, 39, 35.9), dec = (-60, 50, 7))
rigil.tags.append (Tag.get (session, "Bright"))
session.add (rigil)
arcturus = Star (name = "Arcturus", ra = (14, 15, 39.7), dec = (19, 10, 57))
arcturus.tags.append (Tag.get (session, "Bright"))
session.add (arcturus)
vega = Star (name = "Vega", ra = (18, 36, 56.3), dec = (38, 47, 1))
vega.tags.append (Tag.get (session, "Bright"))
session.add (vega)
capella = Star (name = "Capella", ra = (5, 16, 41.4), dec = (45, 59, 53))
capella.tags.append (Tag.get (session, "Bright"))
session.add (capella)
rigel = Star (name = "Rigel", ra = (5, 14, 32.3), dec = (-8, 12, 6))
rigel.tags.append (Tag.get (session, "Bright"))
session.add (rigel)
procyon = Star (name = "Procyon", ra = (7, 39, 18.1), dec = (5, 13, 30))
procyon.tags.append (Tag.get (session, "Bright"))
session.add (procyon)
achernar = Star (name = "Achernar", ra = (1, 37, 42.9), dec = (-57, 14, 12))
achernar.tags.append (Tag.get (session, "Bright"))
session.add (achernar)
betelgeuse = Star (name = "Betelgeuse", ra = (5, 55, 10.3), dec = (7, 24, 25))
betelgeuse.tags.append (Tag.get (session, "Bright"))
session.add (betelgeuse)
session.commit ()
In [10]:
import matplotlib.pyplot as plt
%matplotlib inline
In [11]:
import numpy as np
query = session.query (Star, Tag).join (Tag, Star.tags)
results = query.all ()
tagToColor = {"M13": "blue", "M3": "green", "Bright": "white"}
ras = np.array ([star.ra for star, tag in results])
decs = np.array ([star.dec for star, tag in results])
colors = np.array ([tagToColor [tag.tag] for star, tag in results])
fig = plt.figure ()
ax = plt.subplot (111, projection = "mollweide")
ax.scatter (ras - math.pi, decs, c = colors, s = 50)
ax.grid (True)
We can use the tags to filter and color by dataset, and we can even add filters that are functions of both color and magnitude.
There are a few interesting features we're using here. The big one is aliasing. An alias is a reference to a table in the database, but occasionally, we may want to filter by more than one entry in a database, e.g. when filtering by color. To filter by color, we need two Photometry type objects, and the way we specify this is with aliases.
In [12]:
import numpy as np
from sqlalchemy.orm import aliased
# Set up the aliases to the Photometry object
photog = aliased (Photometry)
photor = aliased (Photometry)
# We'll need to set up a query that returns two photometry objects and a Tag; we join the Tag table to the Star table
query = session.query (photog, photor, Tag).join (Tag, Star.tags)
# We join the photog Photometry table to the Star table and then filter by forcing photog to be a "g" filter
query = query.join (photog, Star.photometry_set).filter (photog.filter == "g")
# We repeart for photor
query = query.join (photor, Star.photometry_set).filter (photor.filter == "r")
# We then construct a filter to filter based on color and magnitude
# Try commenting out this next line
query = query.filter (5 * (photog.mag - photor.mag) + 20 > photog.mag)
# We separate the query into one for each cluster
m3 = query.filter (Star.tags.contains (Tag.get (session, "M3"))).all ()
m13 = query.filter (Star.tags.contains (Tag.get (session, "M13"))).all ()
m3gmags = np.array ([photog.mag for photog, photor, tag in m3])
m3rmags = np.array ([photor.mag for photog, photor, tag in m3])
m13gmags = np.array ([photog.mag for photog, photor, tag in m13])
m13rmags = np.array ([photor.mag for photog, photor, tag in m13])
fig = plt.figure ()
ax = plt.subplot (111)
ax.scatter (m3gmags - m3rmags, m3gmags, s = 5, lw = 0, c = "green", label = "M3")
ax.scatter (m13gmags - m13rmags, m13gmags, s = 5, lw = 0, c = "blue", label = "M13")
ax.set_xlabel ("g - r")
ax.set_ylabel ("g")
ax.set_ylim ((25, 15))
ax.set_xlim ((0, 2))
ax.legend ()
Out[12]:
We can use our database to query for any stars in our database that are currently observable, and we can prioritize them by airmass (or by whatever metric you desire). This could be useful in the construction of automated queues that choose the best targets on any given night.
In [15]:
import math
# We generate a SkyLocation object at the current zenith and specify our desired maximum airmass
zenith = SkyLocation.zenith (36.974117, -122.030796)
maxAirmass = 2.0
# We want the query to return a Star object and its distance from the zenith
query = session.query (Star, Star.greatCircleDistance (zenith)).filter (Star.tags.contains (Tag.get (session, "Bright")))
# We filter by our maximum airmass, making sure the angle from zenith doesn't exceed the desired airmass
query = query.filter (Star.greatCircleDistance (zenith) < math.acos (1.0 / maxAirmass))
# We order by the distance from the zenith so that the stars appear in order of the best potential observations
for star, dist in query.order_by (Star.greatCircleDistance (zenith)):
print ("Found visible star, %s, with airmass %f" % (star.name, 1.0 / math.cos (dist)))
In [16]:
from sqlalchemy.orm import aliased
from sqlalchemy import func
# We need two star aliases because we're comparing distances
first = aliased (Star)
second = aliased (Star)
# We want the query to return the star and the distance to the nearest neighbor (not including itself)
query = session.query (first, func.min (first.greatCircleDistance (second))).filter (first.id != second.id)
# Both stars should be members of M3
query = query.filter (first.tags.contains (Tag.get (session, "M3"))).filter (second.tags.contains (Tag.get (session, "M3")))
# Since we're returning the min of the collection, we need to group by the first star.
# This would return a group of results, except that the min function now takes the minimum of each group
query = query.order_by (first.greatCircleDistance (second)).group_by (first)
results = query.all ()
distances = np.array ([distance for star, distance in results])
Now we have a numpy array that contains, for every star, the distance to the nearest star in order of increasing separation. We can do any number of things with this information. For example, we can look at the probability density function of this information.
In [17]:
import numpy as np
values, base = np.histogram (distances, bins = np.logspace(np.log10 (min (distances)), np.log10 (max (distances)), 50))
fig = plt.figure ()
ax = plt.subplot (111)
ax.plot (base [:-1], values)
ax.set_xscale ("log")
# ax.set_yscale ("log")
ax.set_xlabel ("Angular Separation to Nearest Neighbor (Radians)")
Out[17]:
In [47]:
Hopefully, I've provided you with a series of useful tools that can be applied to your own unique data set, whatever that may be. Databases are extremely powerful ways to query and organize data, but like any tool, their features can be abused.
First, there are many examples (particularly with homogenous data sets) where a database is the extremely powerful tool you don't need. A small, homogenous data table is almost always easier to deal with in the form of numpy arrays. Relational databases are much more useful when dealing with large, heterogeneous data sets. In addition, excessively building relationships can slow down your database, particularly if your database contains huge numbers of entries.
Tha
A final note: all the work here has been done with sqlite databases, which are nice for learning purposes but do not scale well to production-level code. Now that you're familiar with the basics, I definitely recommend looking through some of the SQLAlchemy documentation here.
Good luck!
In [ ]: