Welcome to our team project for Data Science. In this exercise we will build on the foundations created in our last two team projects, where we devised an algorithm to group attendees of Python Project Night and built a web-based roster app that can use the algorithm for grouping.
This will give you a gentle introduction to handling data with pandas and to using a third-party machine learning SaaS API for image recognition.
"Diversity is the engine of invention." Justin Trudeau, 2016
Diversity in tech communities has been a widely addressed topic. ChiPy is one of the most active tech communities in the world, and in this exercise we will try to measure some aspects of its diversity. We will use image recognition on the meetup.com profile pictures of the members of the ChiPy user group to determine how diverse our attendees are. Then we will compare the results with other tech groups in the city and around the world.
The first proof of concept implementation did not take long, and that concerned me. If it is so easy to build tools that can potentially be abused or misinterpreted, we need to think through the implications of the tools we build. So if you are concerned, we are on the same page. There is a question around the middle of the project to address this.
Note that the approach used here is a crude first step and is not without flaws. Like all software, what you will build is incomplete and needs a lot of refinement (that's why this is open source!) before we can get comprehensive results. So take the initial results of your analysis with a copious amount of salt.
For this project, we are going to look at just one facet of diversity - gender diversity of the members.
In [ ]:
!pip3 install meetup-api pandas pytest matplotlib clarifai
This part of the exercise is straight from the previous team project. We use the meetup.com API to get the ChiPy members who RSVP-ed for one event.
In [1]:
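import meetup.api
import pandas as pd

API_KEY = ''   # your meetup.com API key
event_id = ''  # id of the event whose RSVPs you want

def get_members(event_id):
    """Return the ChiPy members who RSVP-ed for the given event."""
    client = meetup.api.Client(API_KEY)
    rsvps = client.GetRsvps(event_id=event_id, urlname='_ChiPy_')
    # Join the member ids into one comma-separated string for the members call.
    member_id = ','.join(str(i['member']['member_id']) for i in rsvps.results)
    return client.GetMembers(member_id=member_id)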
Now let's load the data into a pandas DataFrame.
In [118]:
def load_members_to_data_frame(event_id):
    """Load the name, id and thumbnail URL of each RSVP-ed member into a DataFrame."""
    members = get_members(event_id=event_id)
    columns = ['name', 'id', 'thumb_link']
    data = []
    for member in members.results:
        try:
            data.append([member['name'], member['id'], member['photo']['thumb_link']])
        except (KeyError, TypeError):
            # Profiles with a missing field (for example, no photo) are skipped.
            print('Discard incomplete profile')
    return pd.DataFrame(data=data, columns=columns)

df = load_members_to_data_frame(event_id=event_id)
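To see what the loader does without calling the Meetup API, the sketch below runs the same row-building logic over a hand-made list of member dictionaries. The sample records and the helper name `members_to_frame` are made up for illustration; only the keys `name`, `id`, and `photo` / `thumb_link` come from the code above.

```python
import pandas as pd

# Hypothetical sample records shaped like the entries in members.results.
sample_members = [
    {"name": "Ada", "id": 1, "photo": {"thumb_link": "https://example.com/ada.jpg"}},
    {"name": "Grace", "id": 2},  # no photo: an incomplete profile
    {"name": "Guido", "id": 3, "photo": {"thumb_link": "https://example.com/guido.jpg"}},
]


def members_to_frame(members):
    """Build a DataFrame of members, skipping profiles without a photo."""
    columns = ["name", "id", "thumb_link"]
    rows = []
    for member in members:
        try:
            rows.append([member["name"], member["id"], member["photo"]["thumb_link"]])
        except KeyError:
            # A missing key means the profile is incomplete, so skip it.
            continue
    return pd.DataFrame(data=rows, columns=columns)


sample_df = members_to_frame(sample_members)
print(sample_df)
```

Only the two members with a photo end up in `sample_df`, which is the same filtering the real loader applies to RSVP data.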
In [ ]:
Next we introduce Clarifai. It is a powerful image recognition service.
Signing up is very easy.
The API is built around a simple idea. You send inputs (images) to the service and it returns predictions.
The type of prediction is based on what model you run the input through. For example, if you run your input through the 'food' model, the predictions it returns will contain concepts that the 'food' model knows about. If you run your input through the 'color' model, it will return predictions about the dominant colors in your image.
Input Output:
Here is the rest of the docs if you need them.
In [6]:
from clarifai.rest import ClarifaiApp

client_id, client_secret = '', ''  # your keys here

def analyze_image(url):
    """Run the image at `url` through the general model and return the raw response."""
    app = ClarifaiApp(client_id, client_secret)
    model = app.models.get("general-v1.3")
    return model.predict_by_url(url=url)
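The dictionary returned by `predict_by_url` can be flattened into a simple mapping from concept name to confidence. The sketch below assumes the response has the shape `outputs[0]['data']['concepts']`, where each concept carries a `name` and a `value` between 0 and 1; check this against the response you actually get back. The helper name `extract_concepts` and the sample response are hypothetical.

```python
def extract_concepts(response):
    """Return {concept name: confidence} from a prediction response.

    Assumes the response looks like
    {'outputs': [{'data': {'concepts': [{'name': ..., 'value': ...}]}}]}.
    """
    outputs = response.get("outputs") or []
    if not outputs:
        return {}
    concepts = outputs[0].get("data", {}).get("concepts", [])
    return {c["name"]: c["value"] for c in concepts}


# Hand-written sample in the assumed shape, not real API output.
sample_response = {
    "outputs": [
        {
            "data": {
                "concepts": [
                    {"name": "man", "value": 0.97},
                    {"name": "portrait", "value": 0.95},
                ]
            }
        }
    ]
}

concepts = extract_concepts(sample_response)
print(concepts)
```

Working with a flat dictionary like this keeps the later decision logic independent of the exact response layout.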
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [2]:
def determine_gender(url):
    # Placeholder: replace with your own implementation.
    return 'M'

assert determine_gender('iron_man') == 'M'
Test out your implementation of determine_gender with the profile pictures of your team members. Refine your algorithm to make changes based on your results. Some people like to have cats or pandas as their profile pictures. Think of a strategy for handling situations like that.
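One possible strategy for pictures of cats or pandas is to return an explicit "unknown" label whenever no person-related concept is confident enough. The sketch below is only a starting point, not a recommended classifier: it takes a dictionary of concept names to confidence scores, and the concept names (`man`, `boy`, `woman`, `girl`), the threshold of 0.9, the label `'U'`, and the helper name `gender_from_concepts` are all assumptions made for illustration.

```python
MALE_CONCEPTS = {"man", "boy"}        # assumed concept names
FEMALE_CONCEPTS = {"woman", "girl"}   # assumed concept names


def gender_from_concepts(concepts, threshold=0.9):
    """Return 'M', 'F', or 'U' (unknown) from {concept name: confidence}."""
    male = max((concepts.get(name, 0.0) for name in MALE_CONCEPTS), default=0.0)
    female = max((concepts.get(name, 0.0) for name in FEMALE_CONCEPTS), default=0.0)
    # Neither group is confident enough (e.g. a cat picture), or it is a tie.
    if max(male, female) < threshold or male == female:
        return "U"
    return "M" if male > female else "F"


print(gender_from_concepts({"man": 0.98, "portrait": 0.90}))
print(gender_from_concepts({"cat": 0.99, "pet": 0.97}))
```

Keeping the unknown cases as their own label, rather than forcing them into one of the two groups, makes it easier to see how much of the data the approach cannot handle.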
Before we bring the pieces together, we need to do a little bit of refining so that we can evaluate our results visually.
We will use IPython's HTML display features by inserting the thumb_link URLs into HTML img tags. Note that the function calls below mutate the dataframe itself, so if you execute the cell more than once, the img tags will be malformed and the images will not render correctly.
In [ ]:
from IPython.display import Image, display, HTML
pd.set_option('display.max_colwidth', None)  # None disables truncation in pandas 1.0+; older versions used -1
df['pic']=df.thumb_link.map(lambda x:'<img src="{0}" height=80 width=80 />'.format(x))
HTML(df[['name','pic']].to_html(escape=False))
Now that we have a visual way of evaluating the results, let's apply your determine_gender function to the list of attendees.
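Applying a function to every row of a column can be done with `Series.map`. The sketch below shows the pattern on a toy dataframe; `stub_determine_gender`, `toy_df`, and the URLs are stand-ins made up for illustration, so swap in your own `determine_gender` and the real `df`.

```python
import pandas as pd


def stub_determine_gender(url):
    """Stand-in for your own determine_gender implementation."""
    return "M"


# Toy data with the same column name as the real dataframe.
toy_df = pd.DataFrame(
    {
        "name": ["Ada", "Guido"],
        "thumb_link": ["https://example.com/ada.jpg", "https://example.com/guido.jpg"],
    }
)

# Call the function once per thumbnail URL and store the result in a new column.
toy_df["gender"] = toy_df["thumb_link"].map(stub_determine_gender)
print(toy_df[["name", "gender"]])
```

With a real implementation, each call hits the image recognition service, so it may be worth trying this on a few rows first.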
In [ ]:
Let's take a look at the results.
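A quick way to summarize a column of labels is `value_counts`, which gives raw counts, or proportions when called with `normalize=True`. The dataframe below is made-up data for illustration, and the column name `gender` and the label `'U'` for unknown are assumptions about how you stored your results.

```python
import pandas as pd

# Made-up results, not real attendee data.
results = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "gender": ["M", "F", "M", "U"],
    }
)

counts = results["gender"].value_counts()                # number of rows per label
shares = results["gender"].value_counts(normalize=True)  # fraction of rows per label

print(counts)
print(shares)
```

Looking at the share of unknown labels alongside the others gives a sense of how much of the group the analysis actually covers.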
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
Here are some. Feel free to include others you are aware of.
Feel free to collaborate with other teams on the #team-projects Slack channel so that we can cover all the different user groups. Share the meetup.com URLs that you have found.
In [ ]:
In [ ]: