This example demonstrates how to build a simple content-based audio retrieval model and evaluate its retrieval accuracy on a small song dataset, CAL500. The dataset consists of 502 Western pop songs performed by 499 unique artists. Each song is tagged by at least three people using a standard survey and a fixed vocabulary of 174 musical concepts.
This package includes a loading utility that downloads and preprocesses the dataset for you.
In [1]:
from cbar.datasets import fetch_cal500
X, Y = fetch_cal500()
Calling fetch_cal500() initially downloads the CAL500 dataset to a subfolder of your home directory. You can specify a different location using the data_home parameter (fetch_cal500(data_home='path')). Subsequent calls simply load the cached dataset.
The raw dataset consists of about 10,000 39-dimensional feature vectors per minute of audio content, created by sliding a half-overlapping short-time window over each song's waveform and extracting the first 13 Mel-frequency cepstral coefficients (MFCCs) together with their first and second instantaneous derivatives.
Each song is then represented as a bag-of-frames of exactly 10,000 randomly subsampled, real-valued feature vectors. The bag-of-frames features are further processed into a single k-dimensional feature vector by encoding the frame vectors with a codebook and pooling them into one compact vector.
Specifically, k-means clusters all frame vectors into k clusters, and the resulting cluster centers correspond to the codewords of the codebook. Each frame vector is assigned to its closest cluster center, and a song is represented by the counts of frames assigned to each of the k cluster centers.
By default, fetch_cal500() uses a codebook size of 512 but this size is easily modified with the codebook_size parameter (fetch_cal500(codebook_size=1024)).
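To make the encoding concrete, here is a minimal sketch of the codebook construction and count pooling using scikit-learn's KMeans. The synthetic frames and the toy codebook size are assumptions for illustration only; fetch_cal500() already performs this step for you.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
frames = rng.randn(2000, 39)   # toy stand-in for a song's 39-dimensional frame vectors

# Fit the codebook: the cluster centers become the codewords.
# In the real pipeline, k-means is fit on the frames of all songs.
k = 32                         # toy codebook size; the package defaults to 512
codebook = KMeans(n_clusters=k, random_state=0).fit(frames)

# Assign each frame to its closest codeword and pool the assignments
# into a single k-dimensional count vector representing the song.
assignments = codebook.predict(frames)
song_vector = np.bincount(assignments, minlength=k)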
In [2]:
X.shape, Y.shape
Out[2]:
Let's split the data into training and test sets, fit the model on the training set, and evaluate it on the test set. First, import and instantiate the model.
In [3]:
from cbar.loreta import LoretaWARP
model = LoretaWARP(n0=0.1, valid_interval=1000)
Then split the data and fit the model using the training data.
In [4]:
from cbar.cross_validation import train_test_split_plus
(X_train, X_test,
Y_train, Y_test,
Q_vec, weights) = train_test_split_plus(X, Y)
%time model.fit(X_train, Y_train, Q_vec, X_test, Y_test)
Out[4]:
Now predict a score for every query-song pair. Ordering the songs from highest to lowest score yields the ranking for each query.
In [5]:
Y_score = model.predict(Q_vec, X_test)
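If you want the ranking explicitly, sort the songs by descending score for each query. This sketch assumes Y_score has one row per query and one column per song:
import numpy as np

# Song indices ordered from highest to lowest score, one row per query.
rankings = np.argsort(-Y_score, axis=1)
top_10 = rankings[:, :10]  # the ten highest-ranked songs for each query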
Evaluate the predictions.
In [6]:
from cbar.evaluation import Evaluator
from cbar.utils import make_relevance_matrix
# Number of relevant songs in the training set for each query
n_relevant = make_relevance_matrix(Q_vec, Y_train).sum(axis=1)
evaluator = Evaluator()
evaluator.eval(Q_vec, weights, Y_score, Y_test, n_relevant)
evaluator.prec_at
Out[6]:
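Precision-at-k is the fraction of the top k retrieved songs that are relevant to the query. As a reference, a hypothetical stand-alone helper (not part of cbar) could compute it like this:
import numpy as np

def precision_at_k(ranking, relevant, k=10):
    # ranking: song indices ordered from best to worst for one query.
    # relevant: boolean array, True where a song is relevant to the query.
    return np.mean(relevant[ranking[:k]])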
You can also run a complete cross-validation experiment with the cv utility.
In [7]:
from cbar.cross_validation import cv
The following call runs three-fold cross-validation of LORETA on CAL500 with a codebook size of 512.
In [8]:
cv('cal500', 512, n_folds=3, method='loreta', n0=0.1, valid_interval=1000)
The cross-validation results, including the retrieval method's parameters, are written to a JSON file. For each dataset, three separate result files are created: mean average precision (MAP), precision-at-k, and precision-at-10 as a function of the number of relevant training examples. Here are the mean average precision values of the last cross-validation run.
In [9]:
import json
import os
from cbar.settings import RESULTS_DIR
with open(os.path.join(RESULTS_DIR, 'cal500_ap.json')) as f:
    results = json.load(f)
results[list(results.keys())[-1]]['precision']
Out[9]:
This package comes with a simple CLI that makes it easy to start cross-validation experiments from the command line. It lets you specify a dataset, a retrieval method, and additional options in a single line.
To start an experiment on the CAL500 dataset with the LORETA retrieval method, use the following command.
$ cbar crossval --dataset cal500 loreta
This command uses the default parameters for LORETA, but you can pass any of its parameters as arguments to the loreta command. To see the available options, ask for help like this.
$ cbar crossval loreta --help
Usage: cbar crossval loreta [OPTIONS]

Options:
  -n, --max-iter INTEGER        Maximum number of iterations
  -i, --valid-interval INTEGER  Iterations between validation checks
  -k INTEGER                    Rank of parameter matrix W
  --n0 FLOAT                    Step size parameter 1
  --n1 FLOAT                    Step size parameter 2
  -t, --rank-thresh FLOAT       Threshold for early stopping
  -l, --lambda FLOAT            Regularization constant
  --loss [warp|auc]             Loss function
  -d, --max-dips INTEGER        Maximum number of dips
  -v, --verbose                 Verbosity
  --help                        Show this message and exit.
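For example, to mirror the parameters used earlier in this example (a step size of 0.1 and validation every 1000 iterations), pass the corresponding options:
$ cbar crossval --dataset cal500 loreta --n0 0.1 --valid-interval 1000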