Predicting Proposal Selection @ PyCon India

by @jaidevd



In [1]:

    
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline



In [6]:

    
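# Load the tagged proposals (tab-separated) and report (rows, columns).
df = pd.read_table("tagged.tsv")
print(df.shape)
# Pie chart of the counts of each value in the 'selected' column.
plt.figure(figsize=(10, 8))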
df['selected'].value_counts().plot(kind="pie")

(289, 14)

    Out[6]:

<matplotlib.axes._subplots.AxesSubplot at 0x11539b550>



In [13]:

    
# One pie chart per year, side by side, of the 'selected' value counts.
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
df[df.year == 2015]['selected'].value_counts().plot(kind="pie", ax=ax[0])
ax[0].set_title('PyCon 2015')
df[df.year == 2016]['selected'].value_counts().plot(kind="pie", ax=ax[1])
ax[1].set_title('PyCon 2016')

    Out[13]:

<matplotlib.text.Text at 0x117a0d490>

Features used for Learning

Difference between deadline and last updated date
Workshop or Talk?
Section: Web, Scientific, Data Analysis, Infrastructure, Embedded, etc.
Number of upvotes
Number of _public_ comments
Content link present?
Target audience: beginner, intermediate, advanced
Speaker link present?
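
As a rough illustration of how the features listed above could be turned into a numeric matrix for a classifier, here is a minimal sketch. The column names (`days_before_deadline`, `proposal_type`, `section`, `n_votes`, `n_comments`, `has_content_link`, `target_audience`, `has_speaker_link`) and the toy rows are assumptions made for this example; the actual columns of `tagged.tsv` are not shown in this notebook.

```python
import pandas as pd

# Hypothetical toy proposals; column names and values are illustrative only.
proposals = pd.DataFrame({
    "days_before_deadline": [0, 12, 3, 45],
    "proposal_type": ["talk", "workshop", "talk", "talk"],
    "section": ["web", "scientific", "infrastructure", "data analysis"],
    "n_votes": [10, 3, 25, 7],
    "n_comments": [2, 0, 5, 1],
    "has_content_link": [True, False, True, True],
    "target_audience": ["beginner", "advanced", "intermediate", "beginner"],
    "has_speaker_link": [True, True, False, True],
})


def encode_features(frame):
    """Return an all-numeric feature matrix from the raw proposal columns."""
    encoded = frame.copy()
    # Audience levels have a natural order, so map them to integers.
    levels = {"beginner": 0, "intermediate": 1, "advanced": 2}
    encoded["target_audience"] = encoded["target_audience"].map(levels)
    # Binary flags become 0/1.
    for column in ["has_content_link", "has_speaker_link"]:
        encoded[column] = encoded[column].astype(int)
    # Unordered categories are one-hot encoded.
    encoded = pd.get_dummies(encoded, columns=["proposal_type", "section"])
    return encoded.astype(float)


X = encode_features(proposals)
print(X.shape)
```

One-hot encoding is used for the section and the proposal type because those categories have no inherent order, while the audience level does.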

ROC

The Decision Tree
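
The fitted tree and its ROC curve are not reproduced in this export. The following is a minimal sketch of how a decision tree could be trained and scored with an ROC curve, assuming scikit-learn and a synthetic stand-in dataset. The model settings (`max_depth=3`), the train/test split, the two features, and the labelling rule are all assumptions for illustration, not the settings used in the talk.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 289 rows, two made-up features.
rng = np.random.RandomState(0)
n_votes = rng.poisson(8, size=289)
days_before_deadline = rng.randint(0, 60, size=289)
X = np.column_stack([n_votes, days_before_deadline])

# Made-up labelling rule plus 10% label noise, purely so the tree
# has something to learn. It is not derived from the real data.
y = ((n_votes > 8) & (days_before_deadline < 30)).astype(int)
flip = rng.rand(289) < 0.1
y = np.where(flip, 1 - y, y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A shallow tree keeps the learned rules readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# ROC needs scores, so use the predicted probability of the positive class.
scores = tree.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print("AUC on held-out data:", auc)
```

Plotting `fpr` against `tpr` with `plt.plot(fpr, tpr)` would draw the curve itself.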

Observations

Do not be late: be active around the deadline.

Infrastructure talks are risky: even popular ones are unlikely to be selected.

Early birds are likely to be rejected, even if the proposal is popular and all the content is ready.

Popular talks that do get selected are mostly about data visualization and analytics.

Proposals with no uploaded content that belong to the "others" category are likely to be rejected.


Scientific computing talks are almost always selected, even if they are not popular.
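
Observations of this kind are read off the branches of a fitted tree. A minimal sketch of how that can be done, assuming scikit-learn's `export_text` and a tiny made-up dataset (the feature names `n_votes` and `days_before_deadline` and all eight rows are hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up examples (assumption): [n_votes, days_before_deadline] -> selected?
X = [[20, 1], [15, 2], [18, 40], [2, 1], [3, 35], [25, 3], [1, 50], [22, 45]]
y = [1, 1, 0, 0, 0, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each root-to-leaf path is an if/then rule over the named features.
rules = export_text(tree, feature_names=["n_votes", "days_before_deadline"])
print(rules)
```

Each indented line in the printed text is one split, and each `class:` line is a leaf, so a path from the top to a leaf reads as a rule such as "updated long before the deadline, therefore rejected".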