Some very simple code to help automate chair selection for the AAS meeting.
This assumes that the information about all potential chairs is compiled in a single spreadsheet with columns for each person's name and, more importantly, their areas of expertise and the date and time of the session in which they give a talk, as in the example in the data folder.
In [1]:
import numpy as np
import pandas as pd
Let's make some example data. This requires the names package, so if you don't have it or don't want to install it, skip the following section and go to Selecting a Chair for a Session below.
We're going to generate 100 names and then define some areas of expertise as well as some sessions for each of them.
In [2]:
import names
In [3]:
# range replaces the Python 2-only xrange
last_names = np.array([names.get_last_name() for i in range(100)])
first_names = np.array([names.get_first_name() for i in range(100)])
In [4]:
## Set up areas of expertise for all of them
expertises = ["Stars", "Galaxies", "Black Holes", "Neutron Stars",
"Planets", "Cosmology", "Gas and Dust", "Aliens", "Education"]
area_of_expertise_1 = np.random.choice(expertises, size=last_names.shape[0])
area_of_expertise_2 = np.random.choice(expertises, size=last_names.shape[0])
area_of_expertise_3 = np.random.choice(expertises, size=last_names.shape[0])
In [5]:
## set up session dates and times
dates = ["10/04/2015", "10/05/2015", "10/06/2015"]
session_times = ["9:00:00 AM,10:30:00 AM",
"11:00:00 AM,12:30:00 PM",
"2:00:00 PM,3:30:00 PM"]
session_date = np.random.choice(dates, size=last_names.shape[0])
session_times = np.random.choice(session_times, size=last_names.shape[0])
In [6]:
session_start = [s.split(",")[0] for s in session_times]
session_end = [s.split(",")[1] for s in session_times]
Now we can set up a DataFrame:
In [7]:
df_dict = {"last_name":last_names,
"first_name":first_names,
"area_of_expertise_1":area_of_expertise_1,
"area_of_expertise_2":area_of_expertise_2,
"area_of_expertise_3":area_of_expertise_3,
"session_date":session_date,
"session_start":session_start,
"session_end": session_end
}
df = pd.DataFrame(df_dict)
In [8]:
df.head()
Out[8]:
Let's dump this into a .csv file for future use:
In [11]:
# index=False leaves the row index out of the file, so no index_label is needed
df.to_csv("../data/example_chairs.csv", index=False)
Imagine you have a session on black holes and neutron stars, which will take place on 10/05/2015 from 2:00:00 PM to 3:30:00 PM.
You'd like to pick a person from your sample of possible chairs who has the right expertise, but also does not give a talk in that session.
Let's first load some sample data. As shown above, that data was randomly generated, so all names should be fictitious. Any similarity to persons living or dead is entirely coincidental.
In [12]:
df = pd.read_csv("../data/example_chairs.csv")
What does this table look like?
In [13]:
df.head()
Out[13]:
Your table might have more columns, but it needs to have at least the ones above. You can easily export it from, for example, Excel into a .csv file, which Pandas can read.
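Before running any queries on your own table, it can help to check that the required columns are actually there. The following is a minimal sketch, not part of the original workflow: the helper missing_columns and the list REQUIRED_COLUMNS are hypothetical names, and the single data row is made up for illustration.

```python
import io
import pandas as pd

# Columns the queries in this notebook rely on.
REQUIRED_COLUMNS = [
    "last_name", "first_name",
    "area_of_expertise_1", "area_of_expertise_2", "area_of_expertise_3",
    "session_date", "session_start", "session_end",
]

def missing_columns(table):
    """Return the required columns that are absent from the table."""
    return [col for col in REQUIRED_COLUMNS if col not in table.columns]

# Tiny in-memory stand-in for a real .csv file (made-up row).
csv_text = (
    "last_name,first_name,area_of_expertise_1,session_date\n"
    "Doe,Jane,Stars,10/04/2015\n"
)
example = pd.read_csv(io.StringIO(csv_text))

# Lists the columns that still need to be added to the spreadsheet.
print(missing_columns(example))
```

An empty list means the table has everything the queries below need.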
Two important things to note:
In [14]:
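# Shift the row labels by 2. Assumption: this lines them up with spreadsheet
# row numbers, which count from 1 and include the header row.
df.index += 2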
In [15]:
df.head()
Out[15]:
Okay, so now we have the data!
Let's pull out some columns to make our queries easier to read:
In [16]:
area1 = df["area_of_expertise_1"]
area2 = df["area_of_expertise_2"]
area3 = df["area_of_expertise_3"]
session_date = df["session_date"]
session_start = df["session_start"]
session_end = df["session_end"]
First task: let's pick a person whose first area of expertise is black holes:
In [17]:
## Black holes and neutron stars session, 10/05/2015, 2:00 PM to 3:30 PM
black_holes = df[((area1 == "Black Holes") & (
                  ((session_date == "10/05/2015") &
                   (session_start != "2:00:00 PM") &
                   (session_end != "3:30:00 PM")) |
                  (session_date != "10/05/2015")))]
The query above basically says the following:
Pick all participants whose value in the column area_of_expertise_1 is "Black Holes", and who either do not give a talk on "10/05/2015" at all, or who give a talk on "10/05/2015" but not in the session from "2:00:00 PM" to "3:30:00 PM".
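The availability condition can also be written the other way around: mark everyone who speaks in exactly that slot as busy, then negate. This is a sketch with made-up rows, and the variable names talks, busy and available are hypothetical. It gives the same result as the query above as long as each start time always pairs with the same end time, as it does in the example data.

```python
import pandas as pd

# Made-up rows, for illustration only.
talks = pd.DataFrame({
    "last_name": ["A", "B", "C"],
    "session_date": ["10/05/2015", "10/05/2015", "10/04/2015"],
    "session_start": ["2:00:00 PM", "11:00:00 AM", "2:00:00 PM"],
    "session_end": ["3:30:00 PM", "12:30:00 PM", "3:30:00 PM"],
})

# True for anyone speaking in exactly the slot we need a chair for.
busy = (
    (talks["session_date"] == "10/05/2015")
    & (talks["session_start"] == "2:00:00 PM")
    & (talks["session_end"] == "3:30:00 PM")
)

# Everyone else is available: B speaks earlier that day, C on another day.
available = talks[~busy]
print(available["last_name"].tolist())  # ['B', 'C']
```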
Note: Make sure that the strings you use in your query match exactly the entries in your table!
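Exact string matching is easy to trip over when the spreadsheet contains stray spaces or inconsistent capitalization. One possible safeguard, shown here as a sketch on made-up entries, is to normalize the column with the pandas string methods before comparing:

```python
import pandas as pd

# Made-up entries with the kind of stray whitespace and capitalization
# that spreadsheets often pick up.
raw = pd.Series(["Black Holes", " black holes", "Black Holes ", "Stars"])

# Strip surrounding whitespace and lower-case before comparing.
clean = raw.str.strip().str.lower()

exact_matches = int((raw == "Black Holes").sum())
normalized_matches = int((clean == "black holes").sum())
print(exact_matches, normalized_matches)  # 1 3
```

If you normalize a column this way, remember to write the query strings in lower case as well.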
Here's the resulting list:
In [18]:
black_holes
Out[18]:
If you want a random sample from that list, here is your solution:
In [19]:
black_holes.loc[np.random.choice(np.array(black_holes.index))]
Out[19]:
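pandas also has a built-in way to draw random rows, DataFrame.sample, which avoids going through the index by hand. A small sketch with a made-up candidate list (the name candidates is hypothetical):

```python
import pandas as pd

# Made-up candidate list with non-consecutive row labels, as you would
# get after filtering a larger table.
candidates = pd.DataFrame(
    {"last_name": ["A", "B", "C"]},
    index=[2, 7, 11],
)

# Draw one row; passing random_state makes the draw repeatable.
pick = candidates.sample(n=1, random_state=0)
print(pick)
```

Either way, it is worth checking that the list is not empty before drawing from it.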
We can make our query more complex. For example, because our session is about both neutron stars and black holes, we might be happy with anyone whose expertise is either in neutron stars or black holes:
In [20]:
## Black holes and neutron stars session: either area as first expertise
black_holes = df[(((area1 == "Black Holes") | (area1 == "Neutron Stars")) & (
                  ((session_date == "10/05/2015") &
                   (session_start != "2:00:00 PM") &
                   (session_end != "3:30:00 PM")) |
                  (session_date != "10/05/2015")))]
In [21]:
black_holes
Out[21]:
That's a much larger list! We might also want to consider only people who have expertise in both. That means their first area of expertise needs to be either black holes or neutron stars, and their second area of expertise also needs to be one of the two:
In [22]:
## Black holes and neutron stars session: first and second expertise both match
black_holes = df[(((area1 == "Black Holes") | (area1 == "Neutron Stars")) &
                  ((area2 == "Black Holes") | (area2 == "Neutron Stars")) &
                  (
                   ((session_date == "10/05/2015") &
                    (session_start != "2:00:00 PM") &
                    (session_end != "3:30:00 PM")) |
                   (session_date != "10/05/2015")))]
In [23]:
black_holes
Out[23]:
And that's about as difficult as our queries get!
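Since the three queries above differ only in which areas they accept, they can be folded into one function. The sketch below is an assumption about how one might do that, not the contents of any existing script: select_chairs and its parameters are hypothetical names, Series.isin replaces the chained | comparisons, and the rows are made up.

```python
import pandas as pd

def select_chairs(table, areas, date, start, end, areas2=None):
    """Return rows whose first expertise is in `areas` (and, if given,
    whose second expertise is in `areas2`) and who do not speak in the
    slot given by `date`, `start` and `end`."""
    expert = table["area_of_expertise_1"].isin(areas)
    if areas2 is not None:
        expert = expert & table["area_of_expertise_2"].isin(areas2)
    busy = (
        (table["session_date"] == date)
        & (table["session_start"] == start)
        & (table["session_end"] == end)
    )
    return table[expert & ~busy]

# Made-up rows, for illustration only.
table = pd.DataFrame({
    "last_name": ["A", "B", "C", "D"],
    "area_of_expertise_1": ["Black Holes", "Neutron Stars", "Stars", "Black Holes"],
    "area_of_expertise_2": ["Neutron Stars", "Stars", "Black Holes", "Black Holes"],
    "session_date": ["10/05/2015", "10/04/2015", "10/04/2015", "10/05/2015"],
    "session_start": ["11:00:00 AM", "2:00:00 PM", "2:00:00 PM", "2:00:00 PM"],
    "session_end": ["12:30:00 PM", "3:30:00 PM", "3:30:00 PM", "3:30:00 PM"],
})

topics = ["Black Holes", "Neutron Stars"]
slot = ("10/05/2015", "2:00:00 PM", "3:30:00 PM")

either = select_chairs(table, topics, *slot)
both = select_chairs(table, topics, *slot, areas2=topics)
print(either["last_name"].tolist())  # ['A', 'B']
print(both["last_name"].tolist())    # ['A']
```

Here D is excluded because D speaks in the slot itself, and C because the first area of expertise does not match.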
I've put this entire workflow into a little script (chair_selection.py) that should be easy to use. Make sure your csv file with the candidates is in the same folder as the script.
Some sample queries:
See the help message for the script: $> python chair_selection.py --help
All candidates whose primary expertise is "Black Holes" and who don't give a talk on Oct 5 2015 between 2:00 and 3:30 PM:
$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all"
One random candidate whose primary expertise is "Black Holes" and who doesn't give a talk on Oct 5 2015 between 2:00 and 3:30 PM:
$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "random"
All candidates whose primary expertise is either "Black Holes" or "Neutron Stars" and who don't give a talk on Oct 5 2015 between 2:00 and 3:30 PM:
$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" "Neutron Stars" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all"
All candidates whose first expertise is either "Black Holes" or "Neutron Stars", whose second expertise is also either "Black Holes" or "Neutron Stars", and who don't give a talk on Oct 5 2015 between 2:00 and 3:30 PM:
$> python chair_selection.py -f "../data/example_chairs.csv" -a "Black Holes" "Neutron Stars" -d "10/05/2015" -s "2:00:00 PM" -e "3:30:00 PM" -m "all" --area2 "Black Holes" "Neutron Stars"
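For readers curious how flags like these can be handled, here is a rough sketch using the standard argparse module. It is not the actual chair_selection.py: only the flag names are taken from the commands above, while build_parser, the dest names, the help texts, the defaults and which flags are required are all assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser that accepts the flags used in the sample queries."""
    parser = argparse.ArgumentParser(
        description="Select session chairs (illustrative sketch)."
    )
    parser.add_argument("-f", dest="filename", required=True,
                        help="csv file with the candidates")
    parser.add_argument("-a", dest="area", nargs="+", required=True,
                        help="accepted first areas of expertise")
    parser.add_argument("-d", dest="date", required=True, help="session date")
    parser.add_argument("-s", dest="start", required=True, help="session start")
    parser.add_argument("-e", dest="end", required=True, help="session end")
    parser.add_argument("-m", dest="mode", choices=["all", "random"],
                        default="all", help="list all matches or pick one")
    parser.add_argument("--area2", nargs="+", default=None,
                        help="accepted second areas of expertise")
    return parser

# Parse the third sample query from a list instead of the real command line.
args = build_parser().parse_args([
    "-f", "../data/example_chairs.csv",
    "-a", "Black Holes", "Neutron Stars",
    "-d", "10/05/2015", "-s", "2:00:00 PM", "-e", "3:30:00 PM",
    "-m", "all",
])
print(args.area, args.mode, args.area2)
```

Using nargs="+" is what lets -a and --area2 take several quoted areas in a row.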
Good luck with selecting your chairs! :)