Data Analysis using Pandas

Pandas has become the de facto package for data analysis in Python. In this workshop, we are going to use the basics of pandas to analyze the interests of today's group. We will use meetup.com's API to fetch the interests listed in each of our meetup.com profiles. We will then compute which interests are common and which are uncommon, and find out how topics of common interest can be used to form teams for project night.

Let's get started by importing the essentials. You will need meetup.com's Python API and pandas installed.



In [1]:

    
!pip install meetup-api
import meetup.api
import pandas as pd
from IPython.display import Image, display, HTML
from itertools import combinations
import sys

Next, one of you needs to get a meetup.com API key. You will find it at https://secure.meetup.com/meetup_api/key/. You'll also need tonight's event id, which is the nine-digit number in the meetup URL.



In [73]:

    
API_KEY = '3f6d3275d3b6314e73453c4aa27'
event_id='239174132'

The following functions use the API and load the data into a pandas DataFrame with one row per member and one 0/1 column per topic.
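
To see the target shape without calling the API, here is a minimal sketch that builds the same kind of table from made-up member records. The names, ids and topics are invented for illustration, and the records only mimic the fields the functions below read; they are not real API output.

```python
import pandas as pd

# Made-up records (assumption): only the keys read by the workshop code.
members = [
    {"name": "Ada", "id": 1, "topics": [{"name": "Python"}, {"name": "Data"}]},
    {"name": "Bob", "id": 2, "topics": [{"name": "Python"}]},
]

# Every distinct topic becomes one column.
topics = sorted({t["name"] for m in members for t in m["topics"]})

rows = []
for m in members:
    own = {t["name"] for t in m["topics"]}
    # 1 if the member lists the topic, 0 otherwise.
    rows.append([m["name"], m["id"]] + [int(topic in own) for topic in topics])

toy = pd.DataFrame(rows, columns=["name", "id"] + topics)
print(toy)
```

Each row is a member and each topic column holds a 0/1 flag, which is what makes the sums and filters in the questions further down straightforward.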



In [74]:

    
def get_members(event_id):
    """Fetch the profiles of everyone who RSVP'd to the event."""
    client = meetup.api.Client(API_KEY)
    rsvps = client.GetRsvps(event_id=event_id, urlname='_ChiPy_')
    # Join the member ids into one comma-separated string for the GetMembers call.
    member_id = ','.join(str(i['member']['member_id']) for i in rsvps.results)
    return client.GetMembers(member_id=member_id)

def get_topics(members):
    """Return the distinct topic names found across all members."""
    topics = set()
    for member in members.results:
        try:
            for t in member['topics']:
                topics.add(t['name'])
        except Exception:
            # Report the error type, then re-raise so the failure is not hidden.
            print("Unexpected error:", sys.exc_info()[0])
            raise

    return list(topics)

def df_topics(event_id):
    """Build a DataFrame with one row per member and one 0/1 column per topic."""
    members = get_members(event_id=event_id)
    topics = get_topics(members)
    columns = ['name', 'id', 'thumb_link'] + topics

    data = []
    for member in members.results:
        # Start with all zeros, then flag the topics this member lists.
        topic_vector = [0] * len(topics)
        for topic in member['topics']:
            index = topics.index(topic['name'])
            topic_vector[index] = 1
        try:
            data.append([member['name'], member['id'], member['photo']['thumb_link']] + topic_vector)
        except KeyError:
            # Members without a photo get 'NA' as the thumbnail link.
            data.append([member['name'], member['id'], 'NA'] + topic_vector)
        except Exception:
            print("Unexpected error:", sys.exc_info()[0])
            raise
    return pd.DataFrame(data=data, columns=columns)

Q1: Load data from meetup.com into a dataframe by calling df_topics.

You'll need to call the df_topics function with the event_id and assign the result to a variable so that you can use it in the following questions.



In [ ]:

Q2: What are the column names of the dataframe?



In [ ]:

Q3: How do you check the index of the dataframe? Can you set the index of the dataframe to be the name column?
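
As a hint, here is a small sketch on a made-up toy frame (the values are invented; your real frame comes from df_topics). It shows the default index and what `set_index` returns.

```python
import pandas as pd

# Toy data (assumption): stands in for the frame returned by df_topics.
toy = pd.DataFrame({"name": ["Ada", "Bob"], "id": [1, 2], "Python": [1, 1]})

print(toy.index)                 # the default positional index

# set_index returns a new frame; toy itself is left unchanged.
by_name = toy.set_index("name")
print(by_name.index)
```

Note that `name` stops being a regular column once it becomes the index.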



In [ ]:

Q4: How would you get the transpose of the dataframe?



In [ ]:

Q5: What do the first and last 10 rows of the dataset look like?



In [ ]:

Q6: Write the data out to a CSV file. Only include names and topics. Do not include the id and thumb_link columns.
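
As a hint, the sketch below drops two columns from a made-up toy frame and writes the rest as CSV. It writes to an in-memory buffer so the result can be printed; passing a file name to `to_csv` instead would write a file. The data is invented for illustration.

```python
import io
import pandas as pd

# Toy data (assumption): same column layout as the workshop frame.
toy = pd.DataFrame({
    "name": ["Ada", "Bob"],
    "id": [1, 2],
    "thumb_link": ["NA", "NA"],
    "Python": [1, 1],
    "Data": [1, 0],
})

# Drop the columns we do not want to export.
trimmed = toy.drop(columns=["id", "thumb_link"])

buffer = io.StringIO()
trimmed.to_csv(buffer, index=False)   # index=False omits the row numbers
csv_text = buffer.getvalue()
print(csv_text)
```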



In [ ]:

Q7: How many unique topics of interest do we have?



In [ ]:

Q8: Write a function that takes a name and gives back all of that member's interests.
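
One possible approach, sketched on a made-up toy frame. It assumes the frame is indexed by name and contains only the 0/1 topic columns; the function name `interests_of` is hypothetical.

```python
import pandas as pd

# Toy data (assumption): indexed by name, topic columns only.
toy = pd.DataFrame(
    {"Python": [1, 1], "Data": [1, 0], "Web": [0, 1]},
    index=pd.Index(["Ada", "Bob"], name="name"),
)

def interests_of(frame, name):
    """Return the topic columns flagged 1 in the row labelled `name`."""
    row = frame.loc[name]
    return list(row[row == 1].index)

print(interests_of(toy, "Bob"))
```

If the frame still carries non-topic columns such as id, drop them first; otherwise an id equal to 1 would be mistaken for a topic.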



In [ ]:

Q9: Write a function that takes a topic of interest and gives back the names of the members who are interested in it.
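
One possible approach, sketched on a made-up toy frame that keeps `name` as a regular column. The function name `members_interested_in` is hypothetical.

```python
import pandas as pd

# Toy data (assumption): one 0/1 column per topic.
toy = pd.DataFrame({
    "name": ["Ada", "Bob", "Cy"],
    "Python": [1, 1, 0],
    "Data": [1, 0, 1],
})

def members_interested_in(frame, topic):
    """Return the names of the rows whose `topic` column is 1."""
    return frame.loc[frame[topic] == 1, "name"].tolist()

print(members_interested_in(toy, "Data"))
```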



In [ ]:

Q10: Who has the highest number of topics? How many topics is he/she interested in?



In [ ]:

Q11: Which is the most common topic of interest? Which is the least popular topic of interest?
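
As a hint, summing a 0/1 column counts the members interested in that topic. The sketch below uses a made-up toy frame that contains topic columns only.

```python
import pandas as pd

# Toy data (assumption): topic columns only, indexed by name.
toy = pd.DataFrame(
    {"Python": [1, 1, 1], "Data": [1, 0, 1], "Web": [0, 1, 0]},
    index=["Ada", "Bob", "Cy"],
)

counts = toy.sum()               # members per topic, as a Series
most_common = counts.idxmax()    # label of the largest count
least_common = counts.idxmin()   # label of the smallest count
print(counts)
print(most_common, least_common)
```

When several topics tie, `idxmax` and `idxmin` return only the first one, so comparing `counts` against `counts.max()` is a way to list every tied topic.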



In [ ]:

Q12: Which names are associated with the topics of interest found in the previous question?



In [ ]:

Q13: Draw a plot that shows the frequency of each topic.



In [ ]:

Q14: Are there topic(s) common to all the members of your team?



In [ ]:

Q15: Write a function that takes the names of your team members and ranks every pair by the number of topics they have in common. For example, if the team has A, B, C and D, the output could be:

A, B - 6
A, C - 5
A, D - 4
B, C - 3
B, D - 2
C, D - 1

Note that the pairs are sorted by the number of topics they have in common, in descending order.
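
One possible approach, sketched on a made-up toy frame. It assumes the frame is indexed by name and holds only 0/1 topic columns; the function name `rank_pairs` is hypothetical. Multiplying two 0/1 rows leaves a 1 only where both members list the topic.

```python
from itertools import combinations

import pandas as pd

# Toy data (assumption): indexed by name, topic columns only.
toy = pd.DataFrame(
    {"Python": [1, 1, 1, 0], "Data": [1, 1, 0, 0], "Web": [1, 0, 1, 1]},
    index=["A", "B", "C", "D"],
)

def rank_pairs(frame, names):
    """Return (first, second, shared_count) tuples, most shared topics first."""
    scored = []
    for first, second in combinations(names, 2):
        # The product is 1 only for topics both members have flagged.
        shared = int((frame.loc[first] * frame.loc[second]).sum())
        scored.append((first, second, shared))
    return sorted(scored, key=lambda item: item[2], reverse=True)

ranked = rank_pairs(toy, ["A", "B", "C", "D"])
for first, second, shared in ranked:
    print(f"{first}, {second} - {shared}")
```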



In [ ]: