This notebook was used to create the different graphs based on the requirements provided in the course descriptions.

Import the libraries:


In [1]:
# Useful starting lines
%matplotlib inline
import os
import numpy as np
import scipy
import scipy.sparse as sp
import matplotlib.pyplot as plt
import pandas as pd
import re
import networkx as nx
import itertools
import pygsp
from pygsp import graphs, filters, plotting
import pickle 


plt.rcParams['figure.figsize'] = (17, 5)
plotting.BACKEND = 'matplotlib'

%load_ext autoreload
%autoreload 2

In [2]:
courses=pd.read_pickle("../data/cleaned_courses_STI.pickle")
courses.head()


Out[2]:
AcademicYear CourseTitleFR CourseTitleEN ProfessorSCIPERs AssistantSCIPERs KeyWords_FR KeyWords_EN StudyPlans Requirements Summary_Concepts_Contents_EN Summary_Concepts_Contents_FR
CourseCode
BIOENG-404 2016-2017 Analysis and modelling of locomotion NaN 220184;115955;104385 266358;223203;170511;255602;181161;267756;210551 [neurophysiology, motor system, l ocomotion, k... [neurophysiology, motor system, locomotion, ki... EDNE -; SV [] [neuroprosthetics, and link to biped robot, ga... [la biomécanique, la modélisation numérique, l...
BIOENG-430 2016-2017 Introduction to cellular and molecular biotech... NaN 162095 NaN [biotechnologie, expression génique, transgénè... [biotechnology, gene expression, trasngenesis,... CGC; Mineur : Biotechnologie - [] [, the course present comparatively several to... [soulignant la réduction de l'impact environne...
BIOENG-433 2016-2017 Biotechnology lab (for CGC) NaN 110882 252599;253981;273636;273637;243036;273638;2427... [] [] CGC; Mineur : Biotechnologie - [BIOENG-437] [they purify the recombinant protein and chara... [ce cours donnera l'opportunité de se familiar...
BIOENG-437 2016-2017 Pharmaceutical biotechnology NaN 110882 NaN [] [] CGC; Mineur : Biotechnologie - [] [the course will try to trace back the think a... [the course will try to trace back the think a...
BIOENG-442 2016-2017 Biomaterials NaN 106911;254787 243036;260304;253981;266152 [] [cell, extracellular matrix, tissue, regenerat... CGC; Mineur : Neuroprosthétiques -; Mineur : T... [] [this course cover the fundamental concept beh... [ce cours couvre le concept fondamentaux sur l...

In [3]:
# Index for the courses
courses_index_dict=dict(zip(courses.index, np.arange(len(courses.index))))

1. Functions: weight matrix and graph construction

The following functions are used later to analyse the requirements. They cover the following steps:

- Compute a weight matrix from a dictionary whose keys are course codes and whose values are topics (e.g. professors or study plans).
- Compute a weight matrix from a dictionary whose keys are topics and whose values are course codes.
- Build and plot a graph from its weight matrix.
- Reshape a weight matrix by adding the courses that have no requirements.
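
As a toy illustration of the two dictionary orientations listed above, the sketch below inverts a course -> topics dictionary into a topic -> courses dictionary and counts, for each pair of courses, how many topics they share. The course codes and topic names are made up for the example, and the pair counts only stand in for the entries of a weight matrix.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical course -> topics dictionary (not taken from the dataset).
course_to_topics = {
    "C1": ["prof_a", "prof_b"],
    "C2": ["prof_a"],
    "C3": ["prof_a", "prof_b"],
}

# Invert it into a topic -> courses dictionary.
topic_to_courses = defaultdict(list)
for course, topics in course_to_topics.items():
    for topic in topics:
        topic_to_courses[topic].append(course)

# Every pair of courses sharing a topic gets +1 for that topic.
pair_weights = Counter()
for courses_with_topic in topic_to_courses.values():
    for c1, c2 in combinations(sorted(courses_with_topic), 2):
        pair_weights[(c1, c2)] += 1

print(dict(pair_weights))
```

Both orientations carry the same information, which is why either one can be used to build the weights.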

In [4]:
# Argument 1: Dictionary course -> topics
# Return: weight matrix
def compute_weight_matrix_ct(courses_dict):
    # Initialize the weight matrix and the list of the values.
    weight_mat=np.zeros((len(courses_dict),len(courses_dict)))
    values_for_each_course = list(courses_dict.values())
    
    # Create a list containing all the unique values.
    unique_values = []
    for i in range(0,len(values_for_each_course)):
        unique_values.extend(values_for_each_course[i])
    unique_values = list(set(unique_values))
    
    # Loop over the values: find every course that has the value, then add a weight between each pair of them.
    for i in range(0,len(unique_values)):
        # Variable to store the index of the courses that have the value i in common
        courses_index_with_value_i = []
        for j,lst in enumerate(values_for_each_course):
            for k,value in enumerate(lst):
                if value == unique_values[i]:
                    courses_index_with_value_i.append(j)
        # Add weight between the courses.
        for j in range(0,len(courses_index_with_value_i)):
            for k in range(j+1,len(courses_index_with_value_i)):
                weight_mat[courses_index_with_value_i[j],courses_index_with_value_i[k]] += 1
                weight_mat[courses_index_with_value_i[k],courses_index_with_value_i[j]] += 1        
    return weight_mat
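
For larger dictionaries, the same kind of matrix can be obtained without the nested loops. The sketch below is an alternative, not the function used in this notebook: it builds a course x value incidence matrix `B` and takes `B @ B.T`, which counts the values shared by each pair of courses. Repeated values within one course are counted once here, and the function name `weight_matrix_from_incidence` is hypothetical.

```python
import numpy as np

def weight_matrix_from_incidence(courses_dict):
    # Rows follow the dictionary order, like compute_weight_matrix_ct.
    values = sorted({v for lst in courses_dict.values() for v in lst})
    col = {v: j for j, v in enumerate(values)}
    # B[i, j] = 1 if course i has value j.
    B = np.zeros((len(courses_dict), len(values)))
    for i, lst in enumerate(courses_dict.values()):
        for v in set(lst):
            B[i, col[v]] = 1
    # (B B^T)[i, k] = number of values shared by courses i and k.
    W = B @ B.T
    np.fill_diagonal(W, 0)  # no self-loops
    return W

# Toy input with made-up course codes and values.
toy = {"C1": ["a", "b"], "C2": ["a"], "C3": ["a", "b"]}
W = weight_matrix_from_incidence(toy)
print(W)
```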

In [5]:
# Argument 1: dictionary topic -> course (topic = StudyPlans or Professors..)
# Argument 2 (optional): dictionary course -> index
# Argument 3 (optional): weight added for one link between two courses. Default = 1
# Return tuple: (weight_matrix) if argument 2 given
#               (weight_matrix, dictionary course -> index) if argument 2 not given

def compute_weight_matrix_tc(*args):
    # Initialize the dictionaries depending on the inputs
    topic_dict = args[0]
    if(len(args) > 1):
        courses_index_dict = args[1]
    else:
        # The index has to be determined from the topics themselves
        unique_courses = []
        # Collect every course that appears under at least one topic
        for course_list in topic_dict.values():
            unique_courses.extend(course_list)
        unique_courses = list(set(unique_courses))
        # Create the dictionary course -> index
        courses_index_dict=dict(zip(unique_courses, np.arange(len(unique_courses))))
    # Initialize the weight matrix
    weight_mat=np.zeros((len(courses_index_dict),len(courses_index_dict)))
    # Define the weight added for each link
    if(len(args) > 2):
        w = args[2]
    else:
        w = 1      
    # For all topics, find links between courses
    for topic in topic_dict.keys():
        for course1, course2 in itertools.combinations(topic_dict[topic], 2):
            weight_mat[courses_index_dict[course1],courses_index_dict[course2]]+=w
            weight_mat[courses_index_dict[course2],courses_index_dict[course1]]+=w
    # Return the weight matrix and the index dictionary if it was not given as an input.
    if(len(args) > 1):
        return weight_mat
    else:
        return weight_mat, courses_index_dict

In [6]:
def compute_graph(weight_mat):
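    # Create the networkx graph from the weight matrix.
    # (nx.from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array is its replacement.)
    Graph = nx.from_numpy_array(weight_mat)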
    # Initialize the plot and the position of the nodes.
    plt.figure(1,figsize=(16,16)) 
    pos = nx.spring_layout(Graph)
    # Draw and plot the nodes, labels and edges.
    nx.draw_networkx_nodes(Graph, pos, cmap=plt.get_cmap('jet'))
    nx.draw_networkx_labels(Graph, pos)
    nx.draw_networkx_edges(Graph, pos)
    plt.show()
    return Graph

In [7]:
# Reshape the weight matrix:
# - index_weight: dictionary course -> index for the current weight matrix
# - index_courses: dictionary course -> index for all the courses wanted in the new weight matrix
def reshape_weight_matrix(weight_matrix, index_weight, index_courses):
    result_weight_matrix = np.zeros((len(index_courses),len(index_courses)))
    for key in index_courses.keys():
        # If the course is present in the current matrix, add it to the new one
        if(key in index_weight.keys()):
            for key2 in index_weight.keys():
                # And if the course exists in the final list, add it to the new matrix
                if(key2 in index_courses.keys()):
                    result_weight_matrix[index_courses[key],index_courses[key2]] = weight_matrix[index_weight[key],index_weight[key2]]
                    result_weight_matrix[index_courses[key2],index_courses[key]] = weight_matrix[index_weight[key2],index_weight[key]]
    return result_weight_matrix

2. Requirements of the same course linked together

This graph links together the courses that are listed as requirements of the same course.
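
To make the linking rule concrete, here is a minimal sketch on made-up course codes: every pair of courses that appears in the same requirement list gets an edge. Only lists with at least two entries can produce a pair.

```python
from itertools import combinations

# Hypothetical course -> requirements dictionary.
requirements = {
    "X-300": ["X-100", "X-101", "X-102"],
    "X-301": ["X-100"],   # a single requirement produces no pair
    "X-302": [],
}

# Link the requirements that are listed together for the same course.
edges = [
    pair
    for reqs in requirements.values()
    for pair in combinations(reqs, 2)
]
print(edges)
```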


In [8]:
# Creation of the dictionary course -> requirements
dict_requirements_same_course = courses['Requirements'].to_dict()
# Drop the entries that have fewer than 2 requirements
new_dict = {}
for key in dict_requirements_same_course.keys():
    if (len(list(dict_requirements_same_course[key])) > 1):
        new_dict[key] = dict_requirements_same_course[key]
dict_requirements_same_course = new_dict

In [9]:
weight_requirements_same_course, index_requirements_same_course = compute_weight_matrix_tc(dict_requirements_same_course)
#weight_requirements_same_course = reshape_weight_matrix(weight_requirements_same_course,index_requirements_same_course,courses_index_dict)

In [10]:
# Number of courses that have been linked
len(index_requirements_same_course)


Out[10]:
111

In [11]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_requirements_same_course)


Out[11]:
<matplotlib.image.AxesImage at 0x11ea6ab38>

In [12]:
graph_requirements_same_course = compute_graph(weight_requirements_same_course)


3. Courses linked to their requirements

Creation of the graph obtained by linking a course to its requirements.
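
A minimal sketch of this construction on made-up course codes: each (course, requirement) pair becomes one dictionary entry whose key is the concatenation `course+requirement` and whose value is the two-element list to be linked. This sketch starts from an unfiltered dictionary, so courses with a single requirement are included.

```python
# Hypothetical course -> requirements dictionary.
requirements = {
    "X-300": ["X-100", "X-101"],
    "X-301": ["X-100"],
    "X-302": [],
}

# One entry per (course, requirement) pair; each value holds the two courses to link.
pairs = {
    course + "+" + req: [course, req]
    for course, reqs in requirements.items()
    for req in reqs
}
print(pairs)
```

Since every value has exactly two elements, each entry contributes a single edge when it is treated as a "topic" shared by two courses.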


In [13]:
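# Creation of the dictionary "course+requirement" -> [course, requirement]
dict_course_to_requirement = courses['Requirements'].to_dict()
new_dict = {}
# Note: this loop iterates over dict_requirements_same_course, which was filtered in section 2
# to courses with at least two requirements, so courses with a single requirement are not linked here.
for key in dict_requirements_same_course.keys():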
    if (len(list(dict_requirements_same_course[key])) > 0):
        for value in dict_requirements_same_course[key]:
            new_dict[key+"+"+value] = [key]
            new_dict[key+"+"+value] += [value]
dict_course_to_requirement = new_dict

In [14]:
weight_course_to_requirement, index_course_to_requirement = compute_weight_matrix_tc(dict_course_to_requirement)
#weight_course_to_requirement = reshape_weight_matrix(weight_course_to_requirement,index_course_to_requirement,courses_index_dict)

In [15]:
# Number of courses that have been linked
len(index_course_to_requirement)


Out[15]:
155

In [16]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_course_to_requirement)


Out[16]:
<matplotlib.image.AxesImage at 0x12312e940>

In [17]:
graph_course_to_requirement = compute_graph(weight_course_to_requirement)


4. Courses that have the same requirements linked together

Creation of the graph obtained by linking the courses that have the same requirements together.
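
The idea can be sketched on made-up course codes: the course -> requirements dictionary is inverted into requirement -> courses, and the courses found under the same requirement are then linked pairwise.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical course -> requirements dictionary.
requirements = {
    "X-300": ["X-100", "X-101"],
    "X-301": ["X-100"],
    "X-302": ["X-101"],
    "X-303": [],
}

# Invert: requirement -> courses that require it.
required_by = defaultdict(list)
for course, reqs in requirements.items():
    for req in reqs:
        required_by[req].append(course)

# Courses sharing a requirement are linked.
edges = [
    pair
    for courses_needing in required_by.values()
    for pair in combinations(courses_needing, 2)
]
print(dict(required_by))
print(edges)
```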


In [18]:
# Creation of the dictionary requirement -> courses
dict_course_same_requirements = courses['Requirements'].to_dict()

new_dict = {}
for key in dict_course_same_requirements.keys():
    for value in dict_course_same_requirements[key]:
        if(value in new_dict):
            new_dict[value] += [key]
        else:
            new_dict[value] = [key]
dict_course_same_requirements = new_dict

In [19]:
weight_course_same_requirements, index_course_same_requirements = compute_weight_matrix_tc(dict_course_same_requirements)
#weight_course_same_requirements = reshape_weight_matrix(weight_course_same_requirements,index_course_same_requirements,courses_index_dict)

In [20]:
# Number of courses that have been linked
len(index_course_same_requirements)


Out[20]:
109

In [21]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_course_same_requirements)


Out[21]:
<matplotlib.image.AxesImage at 0x123a94630>

In [22]:
graph_course_same_requirements = compute_graph(weight_course_same_requirements)


5. Reshape the weight matrices

In this part we reshape the weight matrices of the three previous parts so that every course is taken into account. Courses that were absent from a matrix get an all-zero row and column in the reshaped version.
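
The reshaping amounts to copying a small matrix into the matching rows and columns of a larger all-zero matrix. The sketch below shows one possible vectorised way to do this with `np.ix_`; the indices and values are made up, and it is an illustration rather than the function used in this notebook.

```python
import numpy as np

# Hypothetical indices: the small matrix covers two of the four courses.
index_small = {"C3": 0, "C1": 1}
index_full = {"C1": 0, "C2": 1, "C3": 2, "C4": 3}
W_small = np.array([[0.0, 2.0],
                    [2.0, 0.0]])

# Keep only the courses present in both indices.
shared = [c for c in index_small if c in index_full]
src = [index_small[c] for c in shared]
dst = [index_full[c] for c in shared]

# Copy the block of shared courses into an all-zero full-size matrix.
W_full = np.zeros((len(index_full), len(index_full)))
W_full[np.ix_(dst, dst)] = W_small[np.ix_(src, src)]
print(W_full)
```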


In [23]:
weight_requirements_same_course = reshape_weight_matrix(weight_requirements_same_course,index_requirements_same_course,courses_index_dict)

In [24]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_requirements_same_course)


Out[24]:
<matplotlib.image.AxesImage at 0x1232dcac8>

In [25]:
weight_course_to_requirement = reshape_weight_matrix(weight_course_to_requirement,index_course_to_requirement,courses_index_dict)

In [26]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_course_to_requirement)


Out[26]:
<matplotlib.image.AxesImage at 0x1227b50f0>

In [27]:
weight_course_same_requirements = reshape_weight_matrix(weight_course_same_requirements,index_course_same_requirements,courses_index_dict)

In [28]:
plt.figure(1,figsize=(10,10))
plt.spy(weight_course_same_requirements)


Out[28]:
<matplotlib.image.AxesImage at 0x123271e10>

Save the weight matrices to files.
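
As a quick check of the pickle round trip, the sketch below writes a small stand-in matrix to a temporary file and reads it back. The temporary directory and the file name `example_graph.pkl` are placeholders for this example only.

```python
import os
import pickle
import tempfile

# Stand-in for a weight matrix; a NumPy array is pickled the same way.
weights = [[0, 1], [1, 0]]

# Temporary directory and file name are placeholders for this example.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_graph.pkl")
    with open(path, "wb") as f:
        pickle.dump(weights, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == weights)
```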


In [29]:
# The context manager closes the file even if pickling fails
with open(os.path.join(os.getcwd(), "Graphs","req_course_same_req_graph_STI.pkl"), "wb") as pkl_file:
    pickle.dump(weight_requirements_same_course, pkl_file)

In [30]:
# The context manager closes the file even if pickling fails
with open(os.path.join(os.getcwd(), "Graphs","req_course_to_req_graph_STI.pkl"), "wb") as pkl_file:
    pickle.dump(weight_course_to_requirement, pkl_file)

In [31]:
# The context manager closes the file even if pickling fails
with open(os.path.join(os.getcwd(), "Graphs","req_same_course_graph_STI.pkl"), "wb") as pkl_file:
    pickle.dump(weight_course_same_requirements, pkl_file)