Getting data into Python

Outline:

  • ASCII Files: numpy.loadtxt, astropy.io.ascii, read_csv (pandas)
  • Reading/Writing FITS files: astropy.io.fits, fitsio
  • IDL .sav files: scipy.io.readsav
  • Pandas

CSV data


In [ ]:
import os
import numpy as np
import requests

In [ ]:
# get some CSV data from the SDSS SQL server
URL = "http://skyserver.sdss.org/dr12/en/tools/search/x_sql.aspx"

cmd = """
SELECT TOP 1000
    p.u, p.g, p.r, p.i, p.z, s.class, s.z, s.zerr
FROM
    PhotoObj AS p
JOIN
    SpecObj AS s ON s.bestobjid = p.objid
WHERE
    p.u BETWEEN 0 AND 19.6 AND
    p.g BETWEEN 0 AND 20 AND
    s.class = 'GALAXY'
"""
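if not os.path.exists('galaxy_colors.csv'):
    # collapse the multi-line query into a single line before sending it
    cmd = ' '.join(line.strip() for line in cmd.splitlines())
    response = requests.get(URL, params={'cmd': cmd, 'format': 'csv'})
    response.raise_for_status()  # fail on an HTTP error instead of saving an error page
    with open('galaxy_colors.csv', 'w') as f:
        f.write(response.text)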

In [ ]:
!ls -lh galaxy_colors.csv

In [ ]:
!more galaxy_colors.csv

Using numpy.loadtxt


In [ ]:
dtype=[('u', 'f8'),
       ('g', 'f8'),
       ('r', 'f8'),
       ('i', 'f8'),
       ('z', 'f8'),
       ('class', 'S10'),
       ('redshift', 'f8'),
       ('redshift_err', 'f8')]
# skiprows=2 skips the leading comment line and the column-header line
data = np.loadtxt('galaxy_colors.csv', skiprows=2, delimiter=',', dtype=dtype)

In [ ]:
data[:10]
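
With a list-of-tuples dtype, np.loadtxt returns a structured array, so columns are accessed by field name. A minimal sketch of that, assuming a small made-up in-memory table in place of galaxy_colors.csv (the rows, the reduced column set, and the names `sample`, `g_minus_r` and `nearby` are illustrative, not SDSS data):

```python
import io
import numpy as np

# Made-up stand-in for galaxy_colors.csv: a comment line, a header line, then rows.
text = """#Table1
u,g,r,class,z
19.1,17.9,17.2,GALAXY,0.05
18.7,17.1,16.3,GALAXY,0.11
19.4,18.0,17.4,GALAXY,0.08
"""

dtype = [('u', 'f8'), ('g', 'f8'), ('r', 'f8'),
         ('class', 'U10'), ('redshift', 'f8')]

# skiprows=2 skips the comment line and the column-header line.
sample = np.loadtxt(io.StringIO(text), skiprows=2, delimiter=',', dtype=dtype)

# Fields are selected by name and behave like ordinary arrays.
g_minus_r = sample['g'] - sample['r']

# Boolean masks select rows of the structured array.
nearby = sample[sample['redshift'] < 0.1]

print(g_minus_r)
print(nearby['class'])
```

The drawback of this approach is that the dtype has to be written out by hand and kept in step with the file's columns.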

Using astropy.io.ascii


In [ ]:
from astropy.io import ascii

In [ ]:
data = ascii.read('galaxy_colors.csv', format='csv', comment='#')

In [ ]:
type(data)

In [ ]:
data[:10]

Using pandas


In [ ]:
import pandas

In [ ]:
data = pandas.read_csv('galaxy_colors.csv', comment='#')

In [ ]:
type(data)

In [ ]:
data.head()

In [ ]:
data.describe()
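
Once the data is in a DataFrame, derived columns and row selections take one line each. A minimal sketch, assuming a made-up in-memory table in place of galaxy_colors.csv (the rows and the names `df`, `g_r` and `low_z` are illustrative):

```python
import io
import pandas

# Made-up stand-in for galaxy_colors.csv (illustrative values, not SDSS data).
text = """#Table1
u,g,r,class,z
19.1,17.9,17.2,GALAXY,0.05
18.7,17.1,16.3,GALAXY,0.11
19.4,18.0,17.4,GALAXY,0.08
"""

# comment='#' drops the leading comment line, so the next line becomes the header.
df = pandas.read_csv(io.StringIO(text), comment='#')

# Add a derived color column.
df['g_r'] = df['g'] - df['r']

# Select the low-redshift rows with a boolean mask and summarize them.
low_z = df[df['z'] < 0.1]
mean_color = low_z['g_r'].mean()

print(low_z)
print(mean_color)
```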

In [ ]:
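# Pandas reads from *lots* of different data sources:
# type "pandas.read_" and press <TAB> to see them, or list them directly
[name for name in dir(pandas) if name.startswith('read_')]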

Specialized text formats


In [ ]:
# get some data from CDS
prefix = "http://cdsarc.u-strasbg.fr/vizier/ftp/cats/J/ApJ/686/749/"
for fname in ["ReadMe", "table10.dat"]:
    if not os.path.exists(fname):
        response = requests.get(prefix + fname)
        response.raise_for_status()  # fail on an HTTP error instead of saving an error page
        with open(fname, 'w') as f:
            f.write(response.text)

In [ ]:
!cat table10.dat

In [ ]:
!cat ReadMe

In [ ]:
# must specify the "readme" here.
data = ascii.read("table10.dat", format='cds', readme="ReadMe")

In [ ]:
data

Reading FITS files

Two options: astropy.io.fits (formerly pyfits) and fitsio.


In [ ]:
# get an SDSS image (can search for images from http://dr12.sdss3.org/fields/)
# bunzip2 deletes the .bz2 file, so only download when the unpacked file is missing
if not os.path.exists("frame-g-006728-4-0121.fits"):
    if not os.path.exists("frame-g-006728-4-0121.fits.bz2"):
        !wget http://dr12.sdss3.org/sas/dr12/boss/photoObj/frames/301/6728/4/frame-g-006728-4-0121.fits.bz2
    !bunzip2 frame-g-006728-4-0121.fits.bz2

astropy.io.fits


In [ ]:
from astropy.io import fits

hdulist = fits.open("frame-g-006728-4-0121.fits")

In [ ]:
hdulist

In [ ]:
hdulist.info()

In [ ]:
hdulist[0].data

In [ ]:
hdulist[0].header
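
Writing goes through the same HDU objects as reading. A minimal sketch of a write and read round trip, assuming a small made-up image written to a temporary directory (the file name `example.fits` and the `OBSERVER` value are hypothetical):

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# A small made-up image; the header value below is hypothetical.
image = np.arange(12, dtype='f4').reshape(3, 4)
hdu = fits.PrimaryHDU(data=image)
hdu.header['OBSERVER'] = 'example'

# Write to a temporary directory; overwrite=True replaces an existing file.
path = os.path.join(tempfile.mkdtemp(), 'example.fits')
hdu.writeto(path, overwrite=True)

# The context manager closes the file when the block exits; copy the data
# first because astropy may memory-map the array from the open file.
with fits.open(path) as hdul:
    image_back = hdul[0].data.copy()
    observer = hdul[0].header['OBSERVER']

print(image_back.shape, observer)
```

Opening the file in a `with` block avoids leaving a file handle open for the rest of the session.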

fitsio

(pip install --no-deps fitsio)

  • Faster than astropy.io.fits (mainly for tables)
  • Does a better job with ASCII table extensions

In [ ]:
import fitsio

In [ ]:
f = fitsio.FITS("frame-g-006728-4-0121.fits")

In [ ]:
# summary of file HDUs
f

In [ ]:
# summary of first HDU
f[0]

In [ ]:
# Summary of 3rd HDU
f[2]

In [ ]:
# Actually read the data.
data = f[0].read()
data

Salvaging data from IDL

scipy.io.readsav: Formerly a separate idlsave module by Tom Robitaille.


In [ ]:
from scipy.io import readsav

In [ ]:
# Note: won't work unless you have this sav file!
data = readsav("150623434_det8_8100keV.sav")

In [ ]:
data

In [ ]:
len(data.events)

Clean up downloaded files


In [ ]:
# -f: don't complain about files that are already gone (bunzip2 removes the .bz2)
!rm -f galaxy_colors.csv
!rm -f ReadMe
!rm -f table10.dat
!rm -f frame-g-006728-4-0121.fits.bz2
!rm -f frame-g-006728-4-0121.fits