University of Texas at San Antonio



**Open Cloud Institute**


Machine Learning/Big Data, EE-6973-001, Fall 2016


**Paul Rad, Ph.D.**

**Ali Miraftab, Research Fellow**



**Music Classification Using CNN + RNN**


Sharaj Panwar,
*University of Texas at San Antonio, San Antonio, Texas, USA*
sharaj18@gmail.com



**Project Definition:** A Convolutional Recurrent Neural Network (CRNN) is proposed for music tagging. The CRNN combines a convolutional neural network (CNN) with a recurrent neural network (RNN), using the CNN for local feature extraction and the RNN for temporal summarization of the extracted features. CRNNs work well for music tagging because RNNs summarize local features more flexibly than CNNs, which are static, relying on weighted averaging (convolution) and sub-sampling. This flexibility helps because some tags, such as moods, depend on the global structure of a song, while others, such as instruments, depend on local, short-segment information. The CRNN uses a 2-layer RNN with gated recurrent units (GRUs) to summarize temporal patterns on top of a two-dimensional 4-layer CNN. In the CNN sub-structure, the convolution kernels are 3x3 and the max-pooling sizes are (2x2)-(3x3)-(4x4)-(4x4). This sub-sampling yields a feature map of size Nx1x15, which is fed into the 2-layer RNN, whose last hidden state is connected to the output of the network.
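As a concrete illustration, the sketch below builds this CRNN in Keras (a framework assumption; the proposal does not name one). The input shape of 96 mel bands x 1440 frames, the channel counts, and the 50-tag output are illustrative values, chosen so the pooling chain (2x2)-(3x3)-(4x4)-(4x4) reduces the frequency axis to 1 and the time axis to 15, matching the Nx1x15 feature map described above.

```python
# Minimal CRNN sketch, assuming Keras and the illustrative dimensions above.
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_TAGS = 96, 1440, 50  # assumed input/output sizes

inp = layers.Input(shape=(N_MELS, N_FRAMES, 1))  # log-mel spectrogram
x = inp
# 4-layer CNN sub-structure: 3x3 convolutions, pooling (2x2)-(3x3)-(4x4)-(4x4)
for filters, pool in [(64, (2, 2)), (128, (3, 3)), (128, (4, 4)), (128, (4, 4))]:
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool)(x)

# Frequency axis is now 1 and time is 15; drop the frequency axis so the
# tensor becomes a (time, channels) sequence for the RNN.
x = layers.Reshape((15, 128))(x)

# 2-layer GRU; only the last hidden state is kept as the temporal summary.
x = layers.GRU(32, return_sequences=True)(x)
x = layers.GRU(32)(x)

out = layers.Dense(N_TAGS, activation="sigmoid")(x)  # multi-label tag output
model = models.Model(inp, out)
model.summary()
```

Connecting only the last GRU state to the sigmoid output layer mirrors the description above, where the RNN's final hidden state summarizes the whole sequence for multi-label tagging.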

**Outcome:** Applying Convolution Recurrent Neural Network for Music Classification.

**Dataset:** The music data can be found at http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset. The full Million Song Dataset is about 300 GB; a subset of 10,000 songs (1%, 1.8 GB), selected at random, is also provided.
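Since the dataset ships as per-track HDF5 files, the sketch below shows one way to inspect a single file with h5py. The group paths and field names follow the Million Song Dataset documentation, and the track ID is the example file from that documentation; treat both as assumptions to verify against the downloaded data (the dataset also provides an official `hdf5_getters.py` helper).

```python
# Sketch of reading one Million Song Dataset HDF5 file, assuming h5py
# and the documented /metadata/songs and /analysis/songs compound tables.
import h5py

def describe_song(path):
    with h5py.File(path, "r") as h5:
        meta = h5["metadata"]["songs"][0]      # compound record of metadata
        analysis = h5["analysis"]["songs"][0]  # compound record of audio analysis
        print("artist :", meta["artist_name"].decode())
        print("title  :", meta["title"].decode())
        print("tempo  :", analysis["tempo"], "BPM")
        print("length :", analysis["duration"], "s")

# Example track ID taken from the MSD documentation.
describe_song("TRAXLZU12903D05F94.h5")
```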