Now that we have our skeleton API built and we can pull the metadata out of MP3 files, we can build an actual data driven API instead of processing everything by hand.
Why no TinyTag? I let TinyTag go after figuring out that the encoding error is centered there. Originally, I had not used Mutagen because it had given me issues. Turns out Mutagen supports Python2. There's a fork -- MutagenX -- that supports Python3. Problem solved.
Setting up the data driven API is pretty straight forward. However, I'm going to touch on a few more advanced things in here because simply doing an incremental development blog post would be terribly boring to me. The first slightly advanced thing is going to be modularizing the Flask app. Here's what my directory layout looks like right now:
FlaskAmp/
├── app
│ ├── config.py
│ ├── dev-sqlite.db
│ ├── factory.py
│ ├── __init__.py
│ ├── models.py
│ ├── serializers.py
│ └── utils.py
├── LICENSE
├── manager.py
├── README.md
└── requirements
It's a pretty standard layout. But it bears repeating. Under the app
directory is where our action takes place. We'll go over each of these files.
In [ ]:
#app/models.py
from flask.ext.sqlalchemy import SQLAlchemy
db = SQLAlchemy()
class Artist(db.Model):
__tablename__ = "artists"
id = db.Column('id', db.Integer, primary_key=True)
name = db.Column('name', db.Unicode, unique=True)
albums = db.relationship('Album', backref=db.backref('artist', uselist=False), order_by='Album.name')
tracks = db.relationship('Track', backref=db.backref('artist', uselist=False), lazy='noload')
class Album(db.Model):
__tablename__ = 'albums'
id = db.Column('id', db.Integer, primary_key=True)
name = db.Column('name', db.Unicode, unique=True)
artist_id = db.Column('artist_id', db.Integer, db.ForeignKey('artists.id'))
tracks = db.relationship(
'Track', backref=db.backref('album', uselist=False),
lazy='immediate', order_by='Track.position')
class Track(db.Model):
__tablename__ = 'tracks'
id = db.Column('id', db.Integer, primary_key=True)
name = db.Column('name', db.Unicode, primary_key=True)
length = db.Column('length', db.Integer)
position = db.Column('position', db.Integer)
location = db.Column('location', db.Unicode)
stream = db.Column('stream', db.String)
artist_id = db.Column('artist_id', db.Integer, db.ForeignKey('artists.id'))
album_id = db.Column('album_id', db.Integer, db.ForeignKey('albums.id'))
This is a pretty standard setup for Flask-SQLAlchemy models. The only thing most people might not be familar with is the lazy='noload'
on the relationship from Artist to Track. This tell SQLAlchemy to never load this data. Why? For now, we're not interested in loading it. Given that I'm already working on a much more indepth model structure, this'll get changed in the future.
In [ ]:
#app/serializers.py
#this is pretty similar to before
from flask.ext.marshmallow import Marshmallow
ma = Marshmallow()
class ArtistSerializer(ma.Serializer):
albums = ma.Nested('AlbumSerializer', exclude=('artist',), many=True)
_links = ma.Hyperlinks({
'self':ma.URL('artist', id='<id>'),
'collection':ma.URL('artist')
})
class Meta:
fields = ('id', 'name', 'albums', 'links')
class AlbumSerializer(ma.Serializer):
artist = ma.Nested('ArtistSerializer', exclude=('albums',))
tracks = ma.Nested('TrackSerializer', exclude=('albums', 'artist'), many=True)
_links = ma.Hyperlinks({
'self':ma.URL('album', id='<id>'),
'collection':ma.URL('album')
})
class Meta:
fields = ('id', 'name', 'artist', 'tracks', 'links')
class TrackSerializer(ma.Serializer):
albums = ma.Nested('AlbumSerializer', exclude=('artist', 'tracks'), many=True)
artist = ma.Nested('ArtistSerializer', exclude=('album'))
length = ma.Method('convert_time')
_links = ma.Hyperlinks({
'self':ma.URL('track', id='<id>'),
'collection':ma.URL('track')
})
def convert_time(self, track):
seconds = track.length
mins = seconds//60
seconds -= (mins*60)
return "{!s:2>0}:{!s:2>0}".format(mins, seconds)
class Meta:
fields = ('id', 'artist', 'album', 'name', 'position', 'length', 'links', 'stream')
The only thing I'll note here is the use of Meta.fields
rather than additional and that we're only exposing the streaming id for tracks to the serializer (and thus through the API) instead of both it and the file location. Giving away information like a file structure just makes it easier for people with malicious intent to gain access to the system.
In [ ]:
#app/config.py
import os
basedir = os.path.abspath(os.path.dirname(__file__))
class BaseConfig:
SQLALCHEMY_COMMIT_ON_TEARDOWN = True
SQLALCHEMY_RECORD_QUERIES = False
@staticmethod
def init_app(app):
pass
class DevConfig(BaseConfig):
DEBUG = True
SQLALCHEMY_DATABASE_URI = os.environ.get('DEV-DATABASE-URI') or 'sqlite:///{}'.format(os.path.join(basedir, 'dev-sqlite.db'))
configs = {
'dev' : DevConfig,
'default' : DevConfig
}
In [ ]:
#app/factory.py
from flask import Flask
from .config import configs
def create_app(env, exts=()):
app = Flask(__name__)
config = configs.get(env, 'default')
app.config.from_object(config)
config.init_app(app)
for ext in exts:
ext.init_app(app)
return app
You'll see a lot of people turning to a factory for creating their application. However, they'll explicitly initialize their extensions one at a time. In this case, we pass in an environment and a list of extensions to initialize instead. I find that this is a little more flexible and doesn't rely on abusing scoping in Python.
In [ ]:
#app/__init__.py
from flask import request, abort, send_file
from flask.ext.restful import Api, Resource
from .models import Artist, Album, Track
from .serializers import ArtistSerializer, AlbumSerializer, TrackSerializer
from .factory import create_app
app = create_app('dev', exts=[ma,db])
api = Api(app) # I've had strange things happen initializing the Restful API afterwards...
class SingleArtist(Resource):
def get(self, id):
artist = Artist.query.get_or_404(id)
return {'artist' : ArtistSerializer(artist)}
class SingleAlbum(Resource):
def get(self, id):
album = Album.query.get_or_404(id)
return {'album' : AlbumSerializer(artist)}
class SingleTrack(Resource):
def get(self, id):
track = Track.query.get_or_404(id)
return {'track' : TrackSerializer(artist)}
class ListArtist(Resource):
def get(self):
page = request.args.get('page', default=1, type=int)
limit = request.args.get('limit', default=10, type=int)
q = Artist.query.order_by(Artist.name)
q = q.paginate(page, limit, error_out=False)
return {'artists' : ArtistSerializer(q.items, many=True)}
class ListAlbum(Resource):
def get(self):
page = request.args.get('page', default=1, type=int)
limit = request.args.get('limit', default=10, type=int)
q = Album.query.join(Artist).filter(Artist.id==Album.artist_id)
q = q.order_by(Artist.name, Album.name)
q = q.paginate(page, limit, error_out=False)
return {'artists' : AlbumSerializer(q.items, many=True)}
class ListTrack(Resource):
def get(self):
page = request.args.get('page', default=1, type=int)
limit = request.args.get('limit', default=10, type=int)
q = Track.query.join(Artist, Album)
q = q.filter(Artist.id==Album.artist_id, Album.id==Track.album_id)
q = q.order_by(Artist.name, Album.name, Track.position)
q = q.paginate(page, limit, error_out=False)
return {'artists' : TrackSerializer(q.items, many=True)}
@app.route('/stream/<stream_id>/', methods=['GET'])
def stream(stream_id):
track = Track.query.filter(Track.stream==stream_id).first()
if not track:
abort(404)
return send_file(track.location)
api.add_resource(SingleArtist, '/artist/<int:id>/', endpoint='artist')
api.add_resource(ListArtist, '/artist/', endpoint='artists')
api.add_resource(SingleAlbum, '/album/<int:id>/', endpoint='album')
api.add_resource(ListAlbum, '/album/', endpoint='albums')
api.add_resource(SingleTrack, '/track/<int:id>/', endpoint='track')
api.add_resource(ListTrack, '/track/', endpoint='tracks')
There's a few differences here. I'll address a few:
paginate
will return several helpful bits such as has_next
, has_prev
, next_num
and prev_num
among others.abort
allows you to redirect to an error page. Flask-SQLAlchemy implements a get_or_404
helper which does something similar.send_file
tells Flask to send a file response instead of something more typical.This is all you need for your API to interact and stream. Except data. That's what manager.py
is for.
In [ ]:
#manager.py
from app import app, db, ma, models, serializers, utils
from flask.ext.script import Manager
manager = Manager(app)
@manager.option('-d', '--dir', dest='dir')
def add(dir):
utils.store_directory(dir)
@manager.shell
def _shell_context():
return dict(app=app, db=db, ma=ma, models=models, serials=serializers, utils=utils)
if __name__ == "__main__":
manager.run()
Flask-Script is an incredibly useful extension that allows you to interact with your application on the command line. Consider this an extremely brief explaination.
The utils file is the only one I haven't covered yet. These are functions and things that will assist in the running of the application but are otherwise unconnected to anything else.
In [ ]:
#app/utils.py
import os
from uuid import uuid4
from mutagenx import File
from .models import db, Album, Artist, Track
valid_types=('m4a', 'flac', 'mp3', 'ogg', 'oga')
def find(basedir, valid_types=valid_types):
'''Utilize os.walk to only select out the files we'd like to potentially
parse and yield them one at a time.'''
basedir = os.path.abspath(basedir)
for current, dirs, files in os.walk(basedir):
if not files:
continue
for file in files:
if file.endswith(valid_types):
yield os.path.join(os.path.abspath(current), file)
def adaptor(track):
return dict(
artist=track['artist'][0],
album=track['album'][0],
position=int(track['tracknumber'][0].split('/')[0]),
length=int(track.info.length),
location=track.filename,
name=track['title'][0],
stream=str(uuid4())
)
def adapt_track(track, adaptor=adaptor):
info = adaptor(track)
artist = Artist.query.filter(Artist.name == info['artist']).first()
if not artist:
artist = Artist(name=info['artist'])
db.session.add(artist)
info['artist'] = artist
album = Album.query.filter(Album.name==info['album']).first()
if not album:
album = Album(name=info['album'], artist=artist)
db.session.add(album)
info['album'] = album
track = Track.query.filter(Track.name==info['name'])
track = track.filter(Track.album_id==album.id)
track = track.first()
if not track:
track = Track(**info)
db.session.add(track)
return artist, album, track
def store_directory(basedir, valid_types=valid_types, adaptor=adaptor):
for file in find(basedir, valid_types):
file = File(file, easy=True)
try:
artist, album, track = adapt_track(file)
print(
" * Storing: {0.name} - {1.name} - {2.position:2>0} - {2.name}"
"".format(artist, album, track))
db.session.commit()
except KeyError:
pass
Now, to add information to the database, you'll simply do:
python manager.py add -d /home/justanr/Music/Acid Bath
And you'll get back something like this:
* Storing: Acid Bath - When the Kite String Pops - 5 - Jezebel
* Storing: Acid Bath - When the Kite String Pops - 11 - The Morticians Flame
* Storing: Acid Bath - When the Kite String Pops - 14 - Cassie Eats Cockroaches
* Storing: Acid Bath - When the Kite String Pops - 4 - Finger Paintings of the Insane
* Storing: Acid Bath - When the Kite String Pops - 9 - Toubabo Koomi
* Storing: Acid Bath - When the Kite String Pops - 13 - The Bones of Baby Dolls
* Storing: Acid Bath - When the Kite String Pops - 6 - Scream of the Butterfly
* Storing: Acid Bath - When the Kite String Pops - 10 - God Machine
* Storing: Acid Bath - When the Kite String Pops - 2 - Tranquilized
* Storing: Acid Bath - When the Kite String Pops - 8 - Dope Fiend
* Storing: Acid Bath - When the Kite String Pops - 3 - Cheap Vodka
* Storing: Acid Bath - When the Kite String Pops - 7 - Dr. Seuss Is Dead
* Storing: Acid Bath - When the Kite String Pops - 12 - What Color Is Death
* Storing: Acid Bath - When the Kite String Pops - 1 - The Blue
* Storing: Acid Bath - Paegan Terrorism Tactics - 6 - Old Skin
* Storing: Acid Bath - Paegan Terrorism Tactics - 10 - New Corpse
* Storing: Acid Bath - Paegan Terrorism Tactics - 5 - Locust Spawning
* Storing: Acid Bath - Paegan Terrorism Tactics - 12 - Ode of the Peagan
* Storing: Acid Bath - Paegan Terrorism Tactics - 7 - New Death Sensation
* Storing: Acid Bath - Paegan Terrorism Tactics - 11 - Dead Girl
* Storing: Acid Bath - Paegan Terrorism Tactics - 4 - Diäb Soulé
* Storing: Acid Bath - Paegan Terrorism Tactics - 3 - Graveflower
* Storing: Acid Bath - Paegan Terrorism Tactics - 9 - 13 Fingers
* Storing: Acid Bath - Paegan Terrorism Tactics - 2 - Bleed Me an Ocean
* Storing: Acid Bath - Paegan Terrorism Tactics - 8 - Venus Blue
* Storing: Acid Bath - Paegan Terrorism Tactics - 1 - Paegan Love Song
* Storing: Acid Bath - Demos: 1993 - 1996 - 7 - Finger Paintings of the Insane
* Storing: Acid Bath - Demos: 1993 - 1996 - 6 - Dope Fiend
* Storing: Acid Bath - Demos: 1993 - 1996 - 8 - Jezebel
* Storing: Acid Bath - Demos: 1993 - 1996 - 10 - Venus Blue
* Storing: Acid Bath - Demos: 1993 - 1996 - 11 - Bleed Me an Ocean
* Storing: Acid Bath - Demos: 1993 - 1996 - 3 - Scream of the Butterfly
* Storing: Acid Bath - Demos: 1993 - 1996 - 9 - The Bones of Baby Dolls
* Storing: Acid Bath - Demos: 1993 - 1996 - 2 - What Color Is Death
* Storing: Acid Bath - Demos: 1993 - 1996 - 1 - Dr. Seuss Is Dead
* Storing: Acid Bath - Demos: 1993 - 1996 - 5 - The Mortician's Flame
* Storing: Acid Bath - Demos: 1993 - 1996 - 12 - Graveflower
* Storing: Acid Bath - Demos: 1993 - 1996 - 4 - God Machine
Running the server is as simple as python manager.py runserver -p 5000 -h 0.0.0.0
. From there navigating to 127.0.0.1:5000/artist/
will output something like this:
{
"artists": [
{
"albums": [
{
"id": 3,
"links": {
"collection": "/album/",
"self": "/album/3/"
},
"name": "Demos: 1993 - 1996",
"tracks": [
{
"id": 35,
"length": "05:39",
"links": {
"collection": "/track/",
"self": "/track/35/"
},
"name": "Dr. Seuss Is Dead",
"position": 1,
"stream": "5ef608d4-aca1-496d-a876-0778c2e6ea6b"
}
]
}
],
"id": 1,
"links": {
"collection": "/artist/",
"self": "/artist/1/"
},
"name": "Acid Bath"
}
]
}
From there, navigating to 127.0.0.1:5000/stream/5ef608d4-aca1-496d-a876-0778c2e6ea6b/
will play the song.
There's many things wrong with the application. Flask is server static content, which is a job better served by a proper web server, such as Nginx. And we're just glossing over the issue of the application blowing up when it hits an object with only partial data. We're also storing objects one at a time instead of committing a batch of objects (say ten or fifteen at a time, or even a whole album or artist discography).
But this also lays the foundations of a much broader application. There's several features I intend on adding and building on to here until there's a full featured API.
But for now, feel free to tinker and change. If you develop something, submit me a pull request.