Predicting Proposal Selection @ PyCon India

by @jaidevd


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
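# Load the tab-separated proposal data.
df = pd.read_table("tagged.tsv")
print(df.shape)
plt.figure(figsize=(10, 8))
# Pie chart of the counts of each value in the 'selected' column.
df['selected'].value_counts().plot(kind="pie")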


(289, 14)
Out[6]:
[Figure: pie chart of the `selected` value counts across all proposals]

In [13]:
# Compare selection outcomes for 2015 and 2016 side by side.
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
df[df.year == 2015]['selected'].value_counts().plot(kind="pie", ax=ax[0])
ax[0].set_title('PyCon 2015')
df[df.year == 2016]['selected'].value_counts().plot(kind="pie", ax=ax[1])
ax[1].set_title('PyCon 2016')


Out[13]:
[Figure: two pie charts of the `selected` value counts, titled 'PyCon 2015' and 'PyCon 2016']

Features used for Learning

  • Difference between deadline and last updated date

  • Workshop or Talk?

  • Section: Web, Scientific, Data Analysis, Infrastructure, Embedded, etc.

  • Number of upvotes

  • Number of _public_ comments

  • Content link present?

  • Target audience: beginner, intermediate, advanced

  • Speaker link present?
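As a rough illustration of how the features above could become model inputs, here is a minimal, dependency-free sketch. Everything in it is an assumption made for the example: the dictionary keys, the `encode_proposal` helper, the section vocabulary, and the dates are hypothetical and are not taken from `tagged.tsv`.

```python
from datetime import date

# Hypothetical vocabularies; the real dataset's values may differ.
SECTIONS = ["web", "scientific", "data analysis", "infrastructure", "embedded", "others"]
AUDIENCE = {"beginner": 0, "intermediate": 1, "advanced": 2}


def encode_proposal(p, deadline):
    """Turn one proposal (a dict with hypothetical keys) into a numeric feature list."""
    features = [
        (deadline - p["last_updated"]).days,  # days between last update and deadline
        1 if p["type"] == "workshop" else 0,  # workshop (1) or talk (0)
        p["upvotes"],
        p["public_comments"],
        1 if p["content_link"] else 0,        # content link present?
        AUDIENCE[p["audience"]],              # ordinal: beginner < intermediate < advanced
        1 if p["speaker_link"] else 0,        # speaker link present?
    ]
    # One-hot encode the section so the model sees no artificial ordering.
    features += [1 if p["section"] == s else 0 for s in SECTIONS]
    return features


# A made-up proposal and a made-up deadline, for illustration only.
example = {
    "last_updated": date(2016, 6, 28),
    "type": "talk",
    "section": "scientific",
    "upvotes": 12,
    "public_comments": 3,
    "content_link": "https://example.com/slides",
    "audience": "intermediate",
    "speaker_link": "",
}
row = encode_proposal(example, deadline=date(2016, 6, 30))
print(row)
```

The section is one-hot encoded because it has no natural order, while the target audience is mapped to an ordinal value; either choice could reasonably be made differently.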

ROC
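For readers unfamiliar with ROC curves, the sketch below shows how the points of such a curve and the area under it can be computed by hand. The labels and scores are made up for illustration and are not results from this analysis; `roc_points` and `auc` are hypothetical helpers written for this example. scikit-learn's `roc_curve` and `auc` functions compute the same quantities.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points from a threshold sweep."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Visit proposals from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all proposals sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy labels (1 = selected) and made-up classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))  # 7/9, about 0.778, for this toy data
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of selected proposals above rejected ones.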

The Decision Tree
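To give a feel for how a decision tree arrives at rules like the observations that follow, here is a minimal sketch of choosing a single split by Gini impurity. The notebook does not say which library or split criterion was used, so treat the criterion as an assumption; `gini` and `best_threshold` are hypothetical helpers and the data is made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity."""
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for t in sorted(set(values))[:-1]:  # splitting at the max leaves the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up data: upvotes per proposal and whether it was selected (1) or not (0).
upvotes = [1, 2, 3, 10, 12, 15]
selected = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(upvotes, selected)
print(threshold, impurity)
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split.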

Observations

Do not be late.

Be active around the deadline.

Infrastructure talks are dangerous.

Early birds are likely to be rejected.

If popular talks are selected, they had better be about data visualization and analytics.

If no content is uploaded and the talk belongs to the "others" category, it is likely to be rejected.

If it's a scientific computing talk, it is almost always selected, even if it's not popular!