Chair Selection for a Conference

Some very simple code to help automate the chair selection for the AAS meeting.

This assumes that the information about all potential chairs is compiled in a single spreadsheet. The spreadsheet has columns for each person's name and, more importantly, their areas of expertise and the date and time of the session in which they give a talk, as in the example in the data folder.

Requirements

  • python 2.7 or 3.5
  • numpy
  • pandas

In [1]:
import numpy as np
import pandas as pd

Let's make some example data. This requires the names package, so if you don't have it or don't want to install it, skip the following section and go to Selecting a Chair for a Session below.

We're going to generate 100 names and then define some areas of expertise as well as some sessions for each of them.


In [2]:
import names

In [3]:
## range works in both Python 2.7 and 3.x (xrange exists only in Python 2)
last_names = np.array([names.get_last_name() for _ in range(100)])
first_names = np.array([names.get_first_name() for _ in range(100)])

In [4]:
## Set up areas of expertise for all of them
expertises = ["Stars", "Galaxies", "Black Holes", "Neutron Stars", 
             "Planets", "Cosmology", "Gas and Dust", "Aliens", "Education"]

area_of_expertise_1 = np.random.choice(expertises, size=last_names.shape[0])
area_of_expertise_2 = np.random.choice(expertises, size=last_names.shape[0])
area_of_expertise_3 = np.random.choice(expertises, size=last_names.shape[0])

In [5]:
## set up session dates and times
dates = ["10/04/2015", "10/05/2015", "10/06/2015"]
session_times = ["9:00:00 AM,10:30:00 PM",
                 "11:00:00 AM,12:30:00 PM",
                 "2:00:00 PM,3:30:00 PM"]

session_date = np.random.choice(dates, size=last_names.shape[0])
session_times = np.random.choice(session_times, size=last_names.shape[0])

In [6]:
session_start = [s.split(",")[0] for s in session_times]
session_end = [s.split(",")[1] for s in session_times]

Now we can set up a DataFrame:


In [7]:
df_dict = {"last_name":last_names,
          "first_name":first_names,
          "area_of_expertise_1":area_of_expertise_1,
          "area_of_expertise_2":area_of_expertise_2,
          "area_of_expertise_3":area_of_expertise_3,
          "session_date":session_date,
          "session_start":session_start,
          "session_end": session_end
          }

df = pd.DataFrame(df_dict)

In [8]:
df.head()


Out[8]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
0   Black Holes          Stars                Aliens               Ramona      Walshe      10/06/2015    12:30:00 PM  11:00:00 AM
1   Planets              Aliens               Cosmology            Toshiko     Ungar       10/05/2015    12:30:00 PM  11:00:00 AM
2   Stars                Aliens               Aliens               James       Soria       10/05/2015    10:30:00 PM  9:00:00 AM
3   Stars                Cosmology            Planets              Tomas       Wesson      10/06/2015    10:30:00 PM  9:00:00 AM
4   Neutron Stars        Gas and Dust         Cosmology            Jeremy      Laday       10/06/2015    12:30:00 PM  11:00:00 AM

Let's dump this into a .csv file for future use:


In [11]:
## index=False drops the index column, so no index_label is needed
df.to_csv("../data/example_chairs.csv", index=False)

Selecting a Chair for a Session

Imagine you have a session on black holes and neutron stars, which will take place on 10/05/2015 from 2:00:00 PM to 3:30:00 PM.

You'd like to pick a person from your sample of possible chairs who has the right expertise, but also does not give a talk in that session.

Let's first load some sample data. As shown above, that data was randomly generated, so all names should be fictitious. Any similarity to living or dead persons is entirely coincidental.


In [12]:
df = pd.read_csv("../data/example_chairs.csv")

What does this table look like?


In [13]:
df.head()


Out[13]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
0   Black Holes          Stars                Aliens               Ramona      Walshe      10/06/2015    12:30:00 PM  11:00:00 AM
1   Planets              Aliens               Cosmology            Toshiko     Ungar       10/05/2015    12:30:00 PM  11:00:00 AM
2   Stars                Aliens               Aliens               James       Soria       10/05/2015    10:30:00 PM  9:00:00 AM
3   Stars                Cosmology            Planets              Tomas       Wesson      10/06/2015    10:30:00 PM  9:00:00 AM
4   Neutron Stars        Gas and Dust         Cosmology            Jeremy      Laday       10/06/2015    12:30:00 PM  11:00:00 AM

Your table might have more columns, but it needs to have at least the ones above. You can easily export it from, for example, Excel into a .csv file, which Pandas can read.
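Before running any queries on your own table, it can help to confirm that the required columns are really there. The following is a minimal sketch, not part of the original workflow: the helper name missing_columns and the toy table are made up for illustration, and the column list is taken from the example table above.

```python
import pandas as pd

# Columns the queries in this notebook rely on (taken from the example table).
REQUIRED_COLUMNS = [
    "first_name", "last_name",
    "area_of_expertise_1", "area_of_expertise_2", "area_of_expertise_3",
    "session_date", "session_start", "session_end",
]

def missing_columns(table, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `table`."""
    return [col for col in required if col not in table.columns]

# Hypothetical table that only has the name columns.
toy = pd.DataFrame({"first_name": ["Ada"], "last_name": ["Example"]})
print(missing_columns(toy))
```

An empty list means the table has everything the queries need. Extra columns are ignored.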

Two important things to note:

  • Your table might have NaN values (e.g. if a person does not give a talk at all). That's fine.
  • Apple's Numbers and Microsoft's Excel count rows from 1 and also count the header row. Pandas DataFrames count from 0 and don't count the header, which means that the index of a row in the DataFrame is two lower than the row number of the same person in your spreadsheet. We will fix this by adding 2 to the index, but if your software does something different, you might want to change that!

In [14]:
df.index += 2

In [15]:
df.head()


Out[15]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
2   Black Holes          Stars                Aliens               Ramona      Walshe      10/06/2015    12:30:00 PM  11:00:00 AM
3   Planets              Aliens               Cosmology            Toshiko     Ungar       10/05/2015    12:30:00 PM  11:00:00 AM
4   Stars                Aliens               Aliens               James       Soria       10/05/2015    10:30:00 PM  9:00:00 AM
5   Stars                Cosmology            Planets              Tomas       Wesson      10/06/2015    10:30:00 PM  9:00:00 AM
6   Neutron Stars        Gas and Dust         Cosmology            Jeremy      Laday       10/06/2015    12:30:00 PM  11:00:00 AM
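The first note above says that NaN values are fine. Here is a small sketch of why, using a made-up two-row table (the names and values are purely illustrative): a missing value never compares equal to anything, so a "not equal" test on the session date comes out True for someone who gives no talk at all.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row table: the second person gives no talk at all.
toy = pd.DataFrame({
    "last_name": ["Talker", "Listener"],
    "session_date": ["10/05/2015", np.nan],
    "session_start": ["2:00:00 PM", np.nan],
})

# NaN never compares equal to a string, so `!=` is True for missing values.
free_that_day = toy["session_date"] != "10/05/2015"
print(free_that_day.tolist())  # [False, True]
```

So a person without a talk passes any "does not give a talk on this date" condition, which is the behaviour we want for potential chairs.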

Okay, so now we have the data!

Let's pull out some columns to make our queries easier to read:


In [16]:
area1 = df["area_of_expertise_1"]
area2 = df["area_of_expertise_2"]
area3 = df["area_of_expertise_3"]

session_date = df["session_date"]
session_start = df["session_start"]
session_end = df["session_end"]
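All the queries in this notebook compare strings exactly, so stray spaces from a spreadsheet export can silently drop people from the results. The sketch below uses a made-up column to show the effect and one possible fix with pandas' str.strip; whether your own data needs this is an assumption you should check.

```python
import pandas as pd

# Hypothetical column with stray whitespace, as spreadsheet exports can produce.
raw = pd.Series(["Black Holes", " Black Holes", "Black Holes ", "Stars"])

# Exact comparison only matches the entry without padding.
print((raw == "Black Holes").sum())  # 1

# Stripping leading/trailing whitespace first makes all three variants match.
clean = raw.str.strip()
print((clean == "Black Holes").sum())  # 3
```

The same call could be applied to each text column of df before the columns are pulled out.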

First task: let's pick everyone whose first expertise is black holes and who is free during our session:


In [17]:
## Black holes and neutron stars session: 10/05/2015, 2:00:00 PM to 3:30:00 PM
black_holes = df[((area1 == "Black Holes") & (
                 ((session_date == "10/05/2015") & 
                            (session_start != "2:00:00 PM") & 
                            (session_end != "3:30:00 PM")) |
                      (session_date != "10/05/2015")))]

The query above basically says the following:

Pick all participants whose value for "area1" is "Black Holes", and who either do not give a talk on "10/05/2015" at all, or who give a talk on "10/05/2015" but not in the session from "2:00:00 PM" to "3:30:00 PM".

Note: Make sure that the strings you use in your query match exactly the entries in your table!

Here's the resulting list:


In [18]:
black_holes


Out[18]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
2   Black Holes          Stars                Aliens               Ramona      Walshe      10/06/2015    12:30:00 PM  11:00:00 AM
11  Black Holes          Galaxies             Planets              David       Bertholf    10/06/2015    12:30:00 PM  11:00:00 AM
15  Black Holes          Gas and Dust         Black Holes          Charles     Boles       10/04/2015    3:30:00 PM   2:00:00 PM
18  Black Holes          Aliens               Cosmology            Ethel       Villalobos  10/04/2015    12:30:00 PM  11:00:00 AM
20  Black Holes          Gas and Dust         Aliens               Adela       Guevara     10/06/2015    12:30:00 PM  11:00:00 AM
28  Black Holes          Gas and Dust         Education            Paul        Robles      10/04/2015    10:30:00 PM  9:00:00 AM
29  Black Holes          Neutron Stars        Gas and Dust         Kevin       Violette    10/06/2015    10:30:00 PM  9:00:00 AM
35  Black Holes          Aliens               Stars                Jessica     Bucknell    10/04/2015    12:30:00 PM  11:00:00 AM
40  Black Holes          Cosmology            Planets              Jeanne      Gomez       10/04/2015    10:30:00 PM  9:00:00 AM
41  Black Holes          Cosmology            Black Holes          Valerie     Reeves      10/06/2015    3:30:00 PM   2:00:00 PM
45  Black Holes          Galaxies             Stars                Tomas       Williams    10/05/2015    12:30:00 PM  11:00:00 AM
62  Black Holes          Stars                Education            Lori        Poole       10/04/2015    12:30:00 PM  11:00:00 AM
63  Black Holes          Planets              Gas and Dust         Corey       Gaines      10/06/2015    12:30:00 PM  11:00:00 AM
94  Black Holes          Aliens               Cosmology            Elizabeth   Kin         10/06/2015    12:30:00 PM  11:00:00 AM

If you want a random sample from that list, here is your solution:


In [19]:
black_holes.loc[np.random.choice(np.array(black_holes.index))]


Out[19]:
area_of_expertise_1    Black Holes
area_of_expertise_2      Cosmology
area_of_expertise_3    Black Holes
first_name                 Valerie
last_name                   Reeves
session_date            10/06/2015
session_end             3:30:00 PM
session_start           2:00:00 PM
Name: 41, dtype: object
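If you need the random pick to be repeatable (for example, to show a colleague how someone was chosen), pandas' DataFrame.sample accepts a random_state seed. This is an alternative sketch rather than the approach used above; the shortlist here is a made-up stand-in for the black_holes table.

```python
import pandas as pd

# Hypothetical shortlist standing in for the `black_holes` table above.
shortlist = pd.DataFrame(
    {"first_name": ["A", "B", "C"], "last_name": ["X", "Y", "Z"]},
    index=[2, 11, 15],
)

# Draw one row; a fixed random_state makes the draw repeatable.
pick = shortlist.sample(n=1, random_state=0)
again = shortlist.sample(n=1, random_state=0)
print(pick.index.equals(again.index))  # True
```

Note that sample returns a one-row DataFrame, whereas the .loc lookup above returns a single row as a Series.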

We can make our query more complex. For example, because our session is about both neutron stars and black holes, we might be happy with anyone whose expertise is either in neutron stars or black holes:


In [20]:
## Same session, but first expertise may be black holes or neutron stars
black_holes = df[(((area1 == "Black Holes") | (area1 == "Neutron Stars")) & (
                 ((session_date == "10/05/2015") & 
                            (session_start != "2:00:00 PM") & 
                            (session_end != "3:30:00 PM")) |
                      (session_date != "10/05/2015")))]

In [21]:
black_holes


Out[21]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
2   Black Holes          Stars                Aliens               Ramona      Walshe      10/06/2015    12:30:00 PM  11:00:00 AM
6   Neutron Stars        Gas and Dust         Cosmology            Jeremy      Laday       10/06/2015    12:30:00 PM  11:00:00 AM
7   Neutron Stars        Education            Planets              Mark        Skimehorn   10/05/2015    10:30:00 PM  9:00:00 AM
11  Black Holes          Galaxies             Planets              David       Bertholf    10/06/2015    12:30:00 PM  11:00:00 AM
12  Neutron Stars        Planets              Planets              Ann         Boyd        10/06/2015    3:30:00 PM   2:00:00 PM
15  Black Holes          Gas and Dust         Black Holes          Charles     Boles       10/04/2015    3:30:00 PM   2:00:00 PM
17  Neutron Stars        Cosmology            Aliens               David       Johnston    10/05/2015    12:30:00 PM  11:00:00 AM
18  Black Holes          Aliens               Cosmology            Ethel       Villalobos  10/04/2015    12:30:00 PM  11:00:00 AM
20  Black Holes          Gas and Dust         Aliens               Adela       Guevara     10/06/2015    12:30:00 PM  11:00:00 AM
25  Neutron Stars        Cosmology            Cosmology            Cassandra   Hubertus    10/04/2015    12:30:00 PM  11:00:00 AM
28  Black Holes          Gas and Dust         Education            Paul        Robles      10/04/2015    10:30:00 PM  9:00:00 AM
29  Black Holes          Neutron Stars        Gas and Dust         Kevin       Violette    10/06/2015    10:30:00 PM  9:00:00 AM
35  Black Holes          Aliens               Stars                Jessica     Bucknell    10/04/2015    12:30:00 PM  11:00:00 AM
38  Neutron Stars        Education            Cosmology            Carman      Tucker      10/05/2015    10:30:00 PM  9:00:00 AM
40  Black Holes          Cosmology            Planets              Jeanne      Gomez       10/04/2015    10:30:00 PM  9:00:00 AM
41  Black Holes          Cosmology            Black Holes          Valerie     Reeves      10/06/2015    3:30:00 PM   2:00:00 PM
45  Black Holes          Galaxies             Stars                Tomas       Williams    10/05/2015    12:30:00 PM  11:00:00 AM
46  Neutron Stars        Planets              Galaxies             Daniel      Kennison    10/06/2015    10:30:00 PM  9:00:00 AM
62  Black Holes          Stars                Education            Lori        Poole       10/04/2015    12:30:00 PM  11:00:00 AM
63  Black Holes          Planets              Gas and Dust         Corey       Gaines      10/06/2015    12:30:00 PM  11:00:00 AM
67  Neutron Stars        Neutron Stars        Gas and Dust         Joseph      Cate        10/06/2015    12:30:00 PM  11:00:00 AM
70  Neutron Stars        Neutron Stars        Black Holes          Patrick     Scurry      10/06/2015    3:30:00 PM   2:00:00 PM
90  Neutron Stars        Galaxies             Cosmology            George      Rodriquez   10/04/2015    10:30:00 PM  9:00:00 AM
94  Black Holes          Aliens               Cosmology            Elizabeth   Kin         10/06/2015    12:30:00 PM  11:00:00 AM

That's a much larger list! We might also want to narrow it down to people whose top two areas both match the session. That means that their first area of expertise needs to be either black holes or neutron stars, and their second area of expertise also needs to be one of the two:


In [22]:
## Same session, but first and second expertise must both be black holes or neutron stars
black_holes = df[(((area1 == "Black Holes") | (area1 == "Neutron Stars")) &
                  ((area2 == "Black Holes") | (area2 == "Neutron Stars")) &
                  (
                 ((session_date == "10/05/2015") & 
                            (session_start != "2:00:00 PM") & 
                            (session_end != "3:30:00 PM")) |
                      (session_date != "10/05/2015")))]

In [23]:
black_holes


Out[23]:
    area_of_expertise_1  area_of_expertise_2  area_of_expertise_3  first_name  last_name   session_date  session_end  session_start
29  Black Holes          Neutron Stars        Gas and Dust         Kevin       Violette    10/06/2015    10:30:00 PM  9:00:00 AM
67  Neutron Stars        Neutron Stars        Gas and Dust         Joseph      Cate        10/06/2015    12:30:00 PM  11:00:00 AM
70  Neutron Stars        Neutron Stars        Black Holes          Patrick     Scurry      10/06/2015    3:30:00 PM   2:00:00 PM

And that's about as difficult as our queries get!
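The three queries above share the same shape, so they can be folded into one small helper. The sketch below is one possible way to do that and is not necessarily how the script mentioned below works: the function name select_chairs, its arguments, and the toy table are all made up for illustration. It uses isin to accept several areas at once and keeps the same availability logic as the queries above.

```python
import numpy as np
import pandas as pd

def select_chairs(table, areas, date, start, end, areas2=None):
    """Filter `table` the same way as the notebook queries above."""
    # Expertise: the first area must be one of `areas`.
    expert = table["area_of_expertise_1"].isin(areas)
    if areas2 is not None:
        # Optionally require the second area to be one of `areas2` as well.
        expert = expert & table["area_of_expertise_2"].isin(areas2)
    # Availability: no talk on that date, or a talk in a different slot.
    same_day = table["session_date"] == date
    other_slot = (table["session_start"] != start) & (table["session_end"] != end)
    return table[expert & (~same_day | other_slot)]

# Hypothetical four-person table; person C gives no talk at all.
toy = pd.DataFrame({
    "last_name": ["A", "B", "C", "D"],
    "area_of_expertise_1": ["Black Holes", "Neutron Stars", "Black Holes", "Stars"],
    "area_of_expertise_2": ["Stars", "Black Holes", "Aliens", "Black Holes"],
    "session_date": ["10/05/2015", "10/06/2015", np.nan, "10/04/2015"],
    "session_start": ["2:00:00 PM", "2:00:00 PM", np.nan, "9:00:00 AM"],
    "session_end": ["3:30:00 PM", "3:30:00 PM", np.nan, "10:30:00 AM"],
})

result = select_chairs(toy, ["Black Holes", "Neutron Stars"],
                       "10/05/2015", "2:00:00 PM", "3:30:00 PM")
print(result["last_name"].tolist())  # ['B', 'C']
```

Person A is dropped because they speak in the session itself, and person D is dropped for lack of matching first expertise. Passing areas2=["Black Holes", "Neutron Stars"] would narrow the toy result to person B only.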

I've put this entire workflow into a little script (chair_selection.py) that should be easy to use. Make sure the csv file with your candidates is in the same folder as the script.

Some sample queries:

See the help message for the script: $> python chair_selection.py --help

All candidates whose primary expertise is "Black Holes" and who don't give a talk on Oct 5 between 2:00 and 3:30 PM:

$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all"

One random candidate whose primary expertise is "Black Holes" and who doesn't give a talk on Oct 5 between 2:00 and 3:30 PM:

$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "random"

All candidates whose primary expertise is either "Black Holes" or "Neutron Stars" and who don't give a talk on Oct 5 2015 between 2:00 and 3:30 PM:

$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" "Neutron Stars" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all"

All candidates whose first expertise is either "Black Holes" or "Neutron Stars" and whose second expertise is also either "Black Holes" or "Neutron Stars", with the same date and time constraint:

$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" "Neutron Stars" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all" --area2 "Black Holes" "Neutron Stars"
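If you want to adapt the command line for your own variant of the script, the flags used in the examples above could be parsed with argparse roughly as follows. This is only a hypothetical sketch: the actual parser in chair_selection.py is not shown here and may differ, and the destination names (filename, area1, date, start, end, mode) are assumptions made for illustration.

```python
import argparse

def build_parser():
    """Hypothetical parser accepting the flags used in the examples above."""
    parser = argparse.ArgumentParser(description="Select session chairs.")
    parser.add_argument("-f", dest="filename", required=True)            # csv file
    parser.add_argument("-a", dest="area1", nargs="+", required=True)    # first expertise
    parser.add_argument("--area2", nargs="+", default=None)              # second expertise
    parser.add_argument("-d", dest="date")                               # session date
    parser.add_argument("-s", dest="start")                              # session start
    parser.add_argument("-e", dest="end")                                # session end
    parser.add_argument("-m", dest="mode", choices=["all", "random"], default="all")
    return parser

# Parse the arguments of the third example command above.
args = build_parser().parse_args([
    "-f", "../data/example_chairs.csv",
    "-a", "Black Holes", "Neutron Stars",
    "-d", "10/05/2015", "-s", "2:00:00 PM", "-e", "3:30:00 PM",
    "-m", "all",
])
print(args.area1)  # ['Black Holes', 'Neutron Stars']
```

Using nargs="+" is what lets -a and --area2 take several quoted areas in a row.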

Good luck with selecting your chairs! :)