Exploring and Visualizing the Rolling Stone's 500 Greatest Albums of All Time

Data Source: https://www.kaggle.com/notgibs/500-greatest-albums-of-all-time-rolling-stone

Final Article: https://medium.com/@itzzthad/visualising-rolling-stones-500-greatest-songs-using-bokeh-78ebc0eaff3f#.hehircf02

Importing the necessary modules

I'll be using pandas to clean and structure the data and bokeh for visualization.

pandas: http://pandas.pydata.org/pandas-docs/stable/

bokeh: http://bokeh.pydata.org/en/latest/



In [1]:

    
import pandas as pd
from bokeh.charts import Donut, HeatMap, Histogram, Line, Scatter, show, output_notebook, output_file
from bokeh.plotting import figure



In [2]:

    
output_notebook()









    





    
        
        Loading BokehJS ...

Getting the data and structuring it

Get the album list from the xls file into a pandas dataframe



In [3]:

    
album_list = pd.read_excel('albumlist.xls')



In [4]:

    
album_list.dtypes









    Out[4]:





Number       int64
Year         int64
Album       object
Artist      object
Genre       object
Subgenre    object
dtype: object



In [5]:

    
album_list.head()









    Out[5]:






  
    
      
      Number
      Year
      Album
      Artist
      Genre
      Subgenre
    
  
  
    
      0
      1
      1967
      Sgt. Pepper's Lonely Hearts Club Band
      The Beatles
      Rock
      Rock & Roll, Psychedelic Rock
    
    
      1
      2
      1966
      Pet Sounds
      The Beach Boys
      Rock
      Pop Rock, Psychedelic Rock
    
    
      2
      3
      1966
      Revolver
      The Beatles
      Rock
      Psychedelic Rock, Pop Rock
    
    
      3
      4
      1965
      Highway 61 Revisited
      Bob Dylan
      Rock
      Folk Rock, Blues Rock
    
    
      4
      5
      1965
      Rubber Soul
      The Beatles
      Rock, Pop
      Pop Rock

The Genre and Subgenre categories have multiple comma separated values. I'm going to keep just the first value and drop the others for the category.



In [13]:

    
#efficent method to do the same thing commented below
for lab, row in album_list.iterrows():
    album_list.loc[lab, "Genre_Refined"] = row["Genre"].split(',')[0]
    album_list.loc[lab, "Subgenre_Refined"] = row["Subgenre"].split(',')[0]



In [14]:

    
#add Genre_Refined column by selecting just the first value
#album_list['Genre_Refined'] = album_list['Genre']
#for i in range(len(album_list)):
    #album_list['Genre_Refined'][i] = album_list['Genre'][i].split(',')[0]



In [15]:

    
#add Subgenre_Refined column by selecting just the first value
#album_list['Subgenre_Refined'] = album_list['Subgenre']
#for i in range(len(album_list)):
    #album_list['Subgenre_Refined'][i] = album_list['Subgenre'][i].split(',')[0]



In [16]:

    
album_list.head()









    Out[16]:






  
    
      
      Number
      Year
      Album
      Artist
      Genre
      Subgenre
      Genre_Refined
      Subgenre_Refined
    
  
  
    
      0
      1
      1967
      Sgt. Pepper's Lonely Hearts Club Band
      The Beatles
      Rock
      Rock & Roll, Psychedelic Rock
      Rock
      Rock & Roll
    
    
      1
      2
      1966
      Pet Sounds
      The Beach Boys
      Rock
      Pop Rock, Psychedelic Rock
      Rock
      Pop Rock
    
    
      2
      3
      1966
      Revolver
      The Beatles
      Rock
      Psychedelic Rock, Pop Rock
      Rock
      Psychedelic Rock
    
    
      3
      4
      1965
      Highway 61 Revisited
      Bob Dylan
      Rock
      Folk Rock, Blues Rock
      Rock
      Folk Rock
    
    
      4
      5
      1965
      Rubber Soul
      The Beatles
      Rock, Pop
      Pop Rock
      Rock
      Pop Rock



In [17]:

    
album_list.dtypes









    Out[17]:





Number               int64
Year                 int64
Album               object
Artist              object
Genre               object
Subgenre            object
Genre_Refined       object
Subgenre_Refined    object
dtype: object

Top 10 artists having the most number of albums in the list



In [18]:

    
#get a count of how many times each artist made it into the list
artists_count = album_list.groupby(['Artist'], as_index=False).count()



In [19]:

    
#find the top 10 artists
top_artists = artists_count.sort_values(by='Number', ascending=False).head(10)
top_artists = top_artists.reset_index().drop(['index', 'Album','Year','Genre','Subgenre', 'Genre_Refined', 'Subgenre_Refined'], axis=1)
top_artists.head(10)









    Out[19]:






  
    
      
      Artist
      Number
    
  
  
    
      0
      The Beatles
      10
    
    
      1
      Bob Dylan
      10
    
    
      2
      The Rolling Stones
      10
    
    
      3
      Bruce Springsteen
      8
    
    
      4
      The Who
      7
    
    
      5
      Radiohead
      5
    
    
      6
      U2
      5
    
    
      7
      Led Zeppelin
      5
    
    
      8
      Elton John
      5
    
    
      9
      David Bowie
      5



In [20]:

    
#get the artists and corresponding counts into two lists for plotting
top_artists_list = top_artists.Artist.values.tolist()
top_artists_count = top_artists.Number.astype(float).values.tolist()



In [21]:

    
#visualize the data using bokeh
#output_file("top_artists.html", title="top artists")
p = figure(x_range=top_artists_list, plot_height = 500, plot_width = 500)

#set x-axis properties
p.xgrid.visible = False
p.xaxis.major_label_orientation = 3.14/4
p.xaxis.axis_label = 'Artist Name'

#set y-axis properties
p.ygrid.visible = False
p.yaxis.axis_label = 'Album Count'

#draw circles
p.circle(y=top_artists_count, x=top_artists_list, size=15, fill_color="black")
show(p)

Year wise count of number of albums



In [22]:

    
#get count of albums in each year
yearwise_albums = album_list.groupby(['Year'], as_index=False).count()
yearwise_albums = yearwise_albums.sort_values(by='Year').reset_index().drop(['index', 'Album', 'Artist','Genre','Subgenre','Subgenre_Refined','Genre_Refined'], axis=1)
yearwise_albums.head()



In [23]:

    
#visulaizing the data using bokeh line graphs
#output_file("yearwise_albums.html", title="yearwise_albums")
line = Line(data=yearwise_albums, x='Year', y='Number')
line.yaxis.axis_label = 'Number of Albums'
show(line)

Top genres and subgenres



In [24]:

    
#pivot the data and get a subset of the pivoted data where each subgenre has a count of more than 5
pivoted = pd.pivot_table(album_list, index=['Genre_Refined', 'Subgenre_Refined'], values=['Number'], aggfunc='count')
pivoted_subset = pivoted[pivoted['Number'] > 5]
pivoted_subset = pivoted_subset.reset_index()
pivoted_subset









    Out[24]:






  
    
      
      Genre_Refined
      Subgenre_Refined
      Number
    
  
  
    
      0
      Electronic
      New Wave
      7
    
    
      1
      Electronic
      Synth-pop
      6
    
    
      2
      Folk
      Country
      9
    
    
      3
      Funk / Soul
      Rhythm & Blues
      11
    
    
      4
      Funk / Soul
      Soul
      19
    
    
      5
      Hip Hop
      None
      16
    
    
      6
      Rock
      Alternative Rock
      33
    
    
      7
      Rock
      Art Rock
      6
    
    
      8
      Rock
      Blues Rock
      37
    
    
      9
      Rock
      Classic Rock
      18
    
    
      10
      Rock
      Folk Rock
      37
    
    
      11
      Rock
      Garage Rock
      8
    
    
      12
      Rock
      Hard Rock
      17
    
    
      13
      Rock
      Indie Rock
      13
    
    
      14
      Rock
      New Wave
      12
    
    
      15
      Rock
      None
      8
    
    
      16
      Rock
      Pop Rock
      37
    
    
      17
      Rock
      Psychedelic Rock
      14
    
    
      18
      Rock
      Punk
      6
    
    
      19
      Rock
      Rock & Roll
      14
    
    
      20
      Rock
      Soft Rock
      6



In [25]:

    
#visualizing the data using the bokeh donut chart
#output_file("donut.html", title="donut")
from bokeh.palettes import Purples9 as palette1
palette1 = palette1[::-1]
d = Donut(pivoted_subset, label=['Genre_Refined', 'Subgenre_Refined'], values='Number', 
          text_font_size='15pt', plot_height=1000, plot_width=1000, palette=palette1)
show(d)

Songs in each genre by year



In [26]:

    
#getting yearwise data for each genre
yearwise_data = album_list.groupby(['Year', 'Genre_Refined'], as_index=False).count()
yearwise_data = yearwise_data.sort_values(by='Year').reset_index().drop(['index', 'Album', 'Artist','Genre','Subgenre','Subgenre_Refined'], axis=1)
yearwise_data.head(25)









    Out[26]:






  
    
      
      Year
      Genre_Refined
      Number
    
  
  
    
      0
      1955
      Jazz
      1
    
    
      1
      1956
      Jazz
      1
    
    
      2
      1956
      Rock
      1
    
    
      3
      1957
      Rock
      2
    
    
      4
      1958
      Blues
      1
    
    
      5
      1959
      Jazz
      4
    
    
      6
      1960
      Funk / Soul
      1
    
    
      7
      1960
      Jazz
      1
    
    
      8
      1960
      Rock
      1
    
    
      9
      1961
      Funk / Soul
      1
    
    
      10
      1962
      Blues
      1
    
    
      11
      1962
      Funk / Soul
      1
    
    
      12
      1963
      Folk
      1
    
    
      13
      1963
      Funk / Soul
      1
    
    
      14
      1963
      Rock
      3
    
    
      15
      1964
      Blues
      1
    
    
      16
      1964
      Jazz
      1
    
    
      17
      1964
      Rock
      2
    
    
      18
      1965
      Rock
      11
    
    
      19
      1965
      Jazz
      1
    
    
      20
      1965
      Blues
      1
    
    
      21
      1965
      Funk / Soul
      1
    
    
      22
      1966
      Electronic
      1
    
    
      23
      1966
      Funk / Soul
      2
    
    
      24
      1966
      Rock
      10



In [27]:

    
#visualizing the data using a bokeh heatmap
#output_file("yearwise_genre.html", title="yearwise_subgenre")
from bokeh.palettes import Reds9 as palette2
palette2 = palette2[::-1]
hm_year = HeatMap(yearwise_data, x='Year', y='Genre_Refined', values='Number', stat=None,
               width=750, plot_height=500, palette=palette2)
#y-axis properties
hm_year.yaxis.axis_label = 'Genre'
hm_year.yaxis.major_label_orientation = 'horizontal'
show(hm_year)

Rock subgenres over the years



In [28]:

    
#count subgenres yearwise and subset it for rock music
yearwise_subgenres = album_list.groupby(['Year', 'Genre_Refined', 'Subgenre_Refined'], as_index=False).count()
rock_subgenres_yearwise = yearwise_subgenres[yearwise_subgenres['Genre_Refined'] == 'Rock'].reset_index().drop(['index', 'Album', 'Artist','Genre','Subgenre'], axis=1)
rock_subgenres_yearwise.head()









    Out[28]:






  
    
      
      Year
      Genre_Refined
      Subgenre_Refined
      Number
    
  
  
    
      0
      1956
      Rock
      Rock & Roll
      1
    
    
      1
      1957
      Rock
      Rock & Roll
      1
    
    
      2
      1957
      Rock
      Rockabilly
      1
    
    
      3
      1960
      Rock
      Blues Rock
      1
    
    
      4
      1963
      Rock
      Beat
      1



In [29]:

    
#visualizing the data using bokeh scatterplot
#output_file("rock_subgenres_yearwise.html", title="rock_subgenres_yearwise")
hm_rock_subgenres = Scatter(rock_subgenres_yearwise, x='Year', y='Subgenre_Refined', width=800, plot_height=800)
#x-axis properties
hm_rock_subgenres.xgrid.visible = False
#y-axis properties
hm_rock_subgenres.yaxis.major_label_orientation = 'horizontal'
hm_rock_subgenres.yaxis.axis_label = 'Subgenres of Rock'
hm_rock_subgenres.ygrid.visible = False
show(hm_rock_subgenres)

Getting a summary of the top 10 Albums



In [30]:

    
#top 10 albums
top_albums = album_list.head(10)
#Get artists and albums into a new data frame
top_albums_a = top_albums['Artist']
top_albums_b = top_albums['Album']
top_albums_final = pd.concat([top_albums_a, top_albums_b], axis=1)
#groupby and summarize
top_albums_chart = top_albums_final.groupby(['Artist', 'Album']).count()
top_albums_chart









    Out[30]:






  
    
      
      
    
    
      Artist
      Album
    
  
  
    
      Bob Dylan
      Blonde on Blonde
    
    
      Highway 61 Revisited
    
    
      Marvin Gaye
      What's Going On
    
    
      The Beach Boys
      Pet Sounds
    
    
      The Beatles
      Revolver
    
    
      Rubber Soul
    
    
      Sgt. Pepper's Lonely Hearts Club Band
    
    
      The Beatles ("The White Album")
    
    
      The Clash
      London Calling
    
    
      The Rolling Stones
      Exile on Main St.



In [ ]:

	Number	Year	Album	Artist	Genre	Subgenre
0	1	1967	Sgt. Pepper's Lonely Hearts Club Band	The Beatles	Rock	Rock & Roll, Psychedelic Rock
1	2	1966	Pet Sounds	The Beach Boys	Rock	Pop Rock, Psychedelic Rock
2	3	1966	Revolver	The Beatles	Rock	Psychedelic Rock, Pop Rock
3	4	1965	Highway 61 Revisited	Bob Dylan	Rock	Folk Rock, Blues Rock
4	5	1965	Rubber Soul	The Beatles	Rock, Pop	Pop Rock

	Artist	Number
0	The Beatles	10
1	Bob Dylan	10
2	The Rolling Stones	10
3	Bruce Springsteen	8
4	The Who	7
5	Radiohead	5
6	U2	5
7	Led Zeppelin	5
8	Elton John	5
9	David Bowie	5

	Genre_Refined	Subgenre_Refined	Number
0	Electronic	New Wave	7
1	Electronic	Synth-pop	6
2	Folk	Country	9
3	Funk / Soul	Rhythm & Blues	11
4	Funk / Soul	Soul	19
5	Hip Hop	None	16
6	Rock	Alternative Rock	33
7	Rock	Art Rock	6
8	Rock	Blues Rock	37
9	Rock	Classic Rock	18
10	Rock	Folk Rock	37
11	Rock	Garage Rock	8
12	Rock	Hard Rock	17
13	Rock	Indie Rock	13
14	Rock	New Wave	12
15	Rock	None	8
16	Rock	Pop Rock	37
17	Rock	Psychedelic Rock	14
18	Rock	Punk	6
19	Rock	Rock & Roll	14
20	Rock	Soft Rock	6

	Year	Genre_Refined	Number
0	1955	Jazz	1
1	1956	Jazz	1
2	1956	Rock	1
3	1957	Rock	2
4	1958	Blues	1
5	1959	Jazz	4
6	1960	Funk / Soul	1
7	1960	Jazz	1
8	1960	Rock	1
9	1961	Funk / Soul	1
10	1962	Blues	1
11	1962	Funk / Soul	1
12	1963	Folk	1
13	1963	Funk / Soul	1
14	1963	Rock	3
15	1964	Blues	1
16	1964	Jazz	1
17	1964	Rock	2
18	1965	Rock	11
19	1965	Jazz	1
20	1965	Blues	1
21	1965	Funk / Soul	1
22	1966	Electronic	1
23	1966	Funk / Soul	2
24	1966	Rock	10


Artist	Album
Bob Dylan	Blonde on Blonde
Bob Dylan	Highway 61 Revisited
Marvin Gaye	What's Going On
The Beach Boys	Pet Sounds
The Beatles	Revolver
	Rubber Soul
	Sgt. Pepper's Lonely Hearts Club Band
	The Beatles ("The White Album")
The Clash	London Calling
The Rolling Stones	Exile on Main St.