SDHacks 2015 Talk

Anatomy of Web Scrapers: Building Data Apps

$ whoami
{
       "name": "Sang Han",
    "website": "http://sanghan.me",
     "github": "https://github.com/jjangsangy",
  "education": [
        {
          "school": "University of California: San Diego",
          "degree": "B.Sc",
           "years": "2007-2011",
            "conc": ["Physical Chemistry", "Behavioral Psychology"]
        },
        {
          "school": "Stanford University",
          "degree": "M.Eng",
            "year": "2016+",
            "conc": ["Artificial Intelligence", "Systems Security"]
        }
    ],
    "work": {
         "company": "Qadium Inc",
        "position": "Data Scientist",
            "desc": "DARPA Research in Information Innovation and Machine Learning"
    }
}

GraphUCSD:

Class Study App using UCSD CAPE Data

Create an interactive visualization built from the CAPE surveys that students fill out at the end of each quarter. Code: Github


Installation

Easy way (Only OS X and Linux)

$ make

Hard Way (Windows)

Install Python using Anaconda

Anaconda Python Distribution

Install Python Packages

pip install -r requirements.txt

Important Libraries

Basically I just scraped the CAPE website using Python (both Python 2 and Python 3 work), with PostgreSQL as the backend. It took about a day to write, and then another day of messing around to get everything to fit the schema, so it was a fun weekend project. The packages required to run the scraper are the third-party ones imported in the code below: requests, pandas, NumPy, BeautifulSoup (bs4), SQLAlchemy, and SQLAlchemy-Utils.

The Anaconda Python Distribution is probably the easiest way to get all the packages needed if you wish to try out the code yourself.

I also use a ThreadPool to make the connections concurrently, so that this doesn't take a million years lol.
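
As a rough sketch of the thread-pool idea, the snippet below maps a fetch function over a few department codes using multiprocessing.dummy.Pool, the same ThreadPool imported in the code further down. The fetch_department function, its simulated delay, and the pool size of 4 are stand-ins invented for illustration; the real scraper would make an HTTP request at that point instead.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fetch_department(code):
    """Hypothetical stand-in for a network call; sleeps instead of hitting CAPE."""
    time.sleep(0.1)  # pretend this is network latency
    return code, "<html>results for %s</html>" % code


departments = ["ANTH", "BENG", "BIOL", "CHEM"]

# Four worker threads wait on their "requests" at the same time,
# so the batch takes roughly one delay instead of four.
pool = ThreadPool(4)
try:
    pages = dict(pool.map(fetch_department, departments))
finally:
    pool.close()
    pool.join()

print(sorted(pages))
```

Threads suit this kind of job because the work is mostly waiting on the network rather than computing, so the workers spend their time blocked on I/O and can overlap.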

Visualization

The visualizations here were made with Tableau.

Database

So most of the code is actually data munging: cleaning up the data so that it fits the PostgreSQL schema.

Ultimately, the schema for Postgres looks like this.

This image is a little bit old, the new schema is a little different, but you get the idea.

A Note on Scraping

I know it's usually not polite to scrape a service that already provides an API, like reddit for instance. However, when I went looking for one, I couldn't find any, so that gave me the green light to go ahead and write a scraper. And honestly, ever since I was a student (like 3 years ago), I've been unsatisfied with CAPE, so this is kind of my way of liberating the data so that students can access it more easily.

Etc..

Currently it only queries about 30-40 different departments and grabs the tables generated for those queries.

However, every single class also has its own page, but since I didn't want to make 20,000 HTTP requests, I only grabbed the front matter.

This kind of opens it up for anyone else, or even myself, to build a service that takes the rest of the data into account. In the scraper itself, I've created a column called link that points to the individual CAPEs for classes, so it should be really easy for people to do this.
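
One way a follow-up scraper might use that link column is sketched below. The base URL is the results page that connect requests; the rows, course names, and relative link format are made up for illustration and may not match what the site actually returns.

```python
try:
    from urllib.parse import urljoin
except ImportError:  # Python 2
    from urlparse import urljoin

BASE = "http://cape.ucsd.edu/responses/Results.aspx"

# Hypothetical rows; the real scraper keeps these in a table with a `link` column.
rows = [
    {"course": "EXAMPLE 1", "link": "detail.aspx?id=1"},
    {"course": "EXAMPLE 2", "link": "detail.aspx?id=2"},
]


def absolute_links(rows, base=BASE):
    """Resolve each row's relative link against the results page URL."""
    return [urljoin(base, row["link"]) for row in rows]


for url in absolute_links(rows):
    print(url)
```

Each resolved URL could then be fetched on its own, ideally slowly and in small batches, given that there are on the order of 20,000 of them.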

The Code


In [1]:
from __future__ import print_function

import requests
import sys
import itertools
import logging
import string
import os

import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
from operator import itemgetter
from multiprocessing.dummy import Pool as ThreadPool

try:
    from urllib.parse import urljoin
except ImportError:
    from urlparse import urljoin

Making an HTTP Connection


In [2]:
import requests

req = requests.get('http://google.com')

print(req.text)


<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:'B8TOVtqvGsuFmQGI2LDYBA',kEXPI:'3700330,3700388,4029815,4031109,4032677,4033307,4036509,4036527,4038012,4039268,4042490,4042785,4042793,4043492,4045841,4046304,4049501,4049551,4049573,4050912,4051034,4051241,4051558,4051596,4051714,4052304,4054117,4054284,4054551,4055202,4055744,4056038,4056163,4057169,4057324,4057586,4057836,4058117,4058228,4058316,4058330,4058337,4058384,4058544,4059318,4059438,4059446,4059635,4059767,4059860,4060683,4060845,4061089,8300273,8300310,8502095,8502315,8502451,8502691,8502986,8503012,8503132,8503150,8503157,8503304,8503307,8503404,8503585,8503719,8503744,10200083',authuser:0,kscs:'c9c918f0_24'};google.kHL='en';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.wl=function(a,b){try{google.ml(Error(a),!1,b)}catch(d){}};google.time=function(){return(new Date).getTime()};google.log=function(a,b,d,e,g){a=google.logUrl(a,b,d,e,g);if(""!=a){b=new Image;var c=google.lc,f=google.li;c[f]=b;b.onerror=b.onload=b.onabort=function(){delete c[f]};window.google&&window.google.vel&&window.google.vel.lu&&window.google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,d,e,g){var 
c="",f=google.ls||"";if(!d&&-1==b.search("&ei=")){var h=google.getEI(e),c="&ei="+h;-1==b.search("&lei=")&&((e=google.getLEI(e))?c+="&lei="+e:h!=google.kEI&&(c+="&lei="+google.kEI))}a=d||"/"+(g||"gen_204")+"?atyp=i&ct="+a+"&cad="+b+c+f+"&zx="+google.time();/^http:/i.test(a)&&google.https()&&(google.ml(Error("a"),!1,{src:a,glmm:1}),a="");return a};google.y={};google.x=function(a,b){google.y[a.id]=[a,b];return!1};google.load=function(a,b,d){google.x({id:a+k++},function(){google.load(a,b,d)})};var k=0;})();var _gjwl=location;function _gjuc(){var a=_gjwl.href.indexOf("#");if(0<=a&&(a=_gjwl.href.substring(a),0<a.indexOf("&q=")||0<=a.indexOf("#q="))&&(a=a.substring(1),-1==a.indexOf("#"))){for(var d=0;d<a.length;){var b=d;"&"==a.charAt(b)&&++b;var c=a.indexOf("&",b);-1==c&&(c=a.length);b=a.substring(b,c);if(0==b.indexOf("fp="))a=a.substring(0,d)+a.substring(c,a.length),c=d;else if("cad=h"==b)return 0;d=c}_gjwl.href="/search?"+a+"&cad=h";return 1}return 0}
function _gjh(){!_gjuc()&&window.google&&google.x&&google.x({id:"GJH"},function(){google.nav&&google.nav.gjh&&google.nav.gjh()})};window._gjh&&_gjh();</script><style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important}
</style><style>body,td,a,p,.h{font-family:arial,sans-serif}body{margin:0;overflow-y:scroll}#gog{padding:3px 8px 0}td{line-height:.8em}.gac_m td{line-height:17px}form{margin-bottom:20px}.h{color:#36c}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:collapse}em{font-weight:bold;font-style:normal}.lst{height:25px;width:496px}.gsfi,.lst{font:18px arial,sans-serif}.gsfs{font:17px arial,sans-serif}.ds{display:inline-box;display:inline-block;margin:3px 0 4px;margin-left:4px}input{font-family:inherit}a.gb1,a.gb2,a.gb3,a.gb4{color:#11c !important}body{background:#fff;color:black}a{color:#11c;text-decoration:none}a:hover,a:active{text-decoration:underline}.fl a{color:#36c}a:visited{color:#551a8b}a.gb1,a.gb4{text-decoration:underline}a.gb3:hover{text-decoration:none}#ghead a.gb2:hover{color:#fff !important}.sblc{padding-top:5px}.sblc a{display:block;margin:2px 0;margin-left:13px;font-size:11px}.lsbb{background:#eee;border:solid 1px;border-color:#ccc #999 #999 #ccc;height:30px}.lsbb{display:block}.ftl,#fll a{display:inline-block;margin:0 12px}.lsb{background:url(/images/nav_logo229.png) 0 -261px repeat-x;border:none;color:#000;cursor:pointer;height:30px;margin:0;outline:0;font:15px arial,sans-serif;vertical-align:top}.lsb:active{background:#ccc}.lst:focus{outline:none}</style><script></script><link href="/images/branding/product/ico/googleg_lodp.ico" rel="shortcut icon"></head><body bgcolor="#fff"><script>(function(){var src='/images/nav_logo229.png';var iesg=false;document.body.onload = function(){window.n && window.n();if (document.images){new Image().src=src;}
if (!iesg){document.f&&document.f.q.focus();document.gbqf&&document.gbqf.q.focus();}
}
})();</script><div id="mngb">    <div id=gbar><nobr><b class=gb1>Search</b> <a class=gb1 href="http://www.google.com/imghp?hl=en&tab=wi">Images</a> <a class=gb1 href="http://maps.google.com/maps?hl=en&tab=wl">Maps</a> <a class=gb1 href="https://play.google.com/?hl=en&tab=w8">Play</a> <a class=gb1 href="http://www.youtube.com/?tab=w1">YouTube</a> <a class=gb1 href="http://news.google.com/nwshp?hl=en&tab=wn">News</a> <a class=gb1 href="https://mail.google.com/mail/?tab=wm">Gmail</a> <a class=gb1 href="https://drive.google.com/?tab=wo">Drive</a> <a class=gb1 style="text-decoration:none" href="https://www.google.com/intl/en/options/"><u>More</u> &raquo;</a></nobr></div><div id=guser width=100%><nobr><span id=gbn class=gbi></span><span id=gbf class=gbf></span><span id=gbe></span><a href="http://www.google.com/history/optout?hl=en" class=gb4>Web History</a> | <a  href="/preferences?hl=en" class=gb4>Settings</a> | <a target=_top id=gb_70 href="https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=http://www.google.com/" class=gb4>Sign in</a></nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div>    </div><center><span id="prt" style="display:block"> <div><style>.pmoabs{background-color:#fff;border:1px solid #E5E5E5;color:#666;font-size:13px;padding-bottom:20px;position:absolute;right:2px;top:3px;z-index:986}#pmolnk{border-radius:2px;-moz-border-radius:2px;-webkit-border-radius:2px}.kd-button-submit{border:1px solid #3079ed;background-color:#4d90fe;background-image:-webkit-gradient(linear,left top,left 
bottom,from(#4d90fe),to(#4787ed));background-image:-webkit-linear-gradient(top,#4d90fe,#4787ed);background-image:-moz-linear-gradient(top,#4d90fe,#4787ed);background-image:-ms-linear-gradient(top,#4d90fe,#4787ed);background-image:-o-linear-gradient(top,#4d90fe,#4787ed);background-image:linear-gradient(top,#4d90fe,#4787ed);filter:progid:DXImageTransform.Microsoft.gradient(startColorStr='#4d90fe',EndColorStr='#4787ed')}.kd-button-submit:hover{border:1px solid #2f5bb7;background-color:#357ae8;background-image:-webkit-gradient(linear,left top,left bottom,from(#4d90fe),to(#357ae8));background-image:-webkit-linear-gradient(top,#4d90fe,#357ae8);background-image:-moz-linear-gradient(top,#4d90fe,#357ae8);background-image:-ms-linear-gradient(top,#4d90fe,#357ae8);background-image:-o-linear-gradient(top,#4d90fe,#357ae8);background-image:linear-gradient(top,#4d90fe,#357ae8);filter:progid:DXImageTransform.Microsoft.gradient(startColorStr='#4d90fe',EndColorStr='#357ae8')}.kd-button-submit:active{-webkit-box-shadow:inset 0 1px 2px rgba(0,0,0,0.3);-moz-box-shadow:inset 0 1px 2px rgba(0,0,0,0.3);box-shadow:inset 0 1px 2px rgba(0,0,0,0.3)}#pmolnk a{color:#fff;display:inline-block;font-weight:bold;padding:5px 20px;text-decoration:none;white-space:nowrap}.xbtn{color:#999;cursor:pointer;font-size:23px;line-height:5px;padding-top:5px}.padi{padding:0 8px 0 10px}.padt{padding:5px 20px 0 0;color:#444}.pads{text-align:left;max-width:200px}</style> <div class="pmoabs" id="pmocntr2" style="behavior:url(#default#userdata);display:none"> <table border="0"> <tr> <td colspan="2"> <div class="xbtn" onclick="google.promos&&google.promos.toast&& google.promos.toast.cpc()" style="float:right">&times;</div> </td> </tr> <tr> <td class="padi" rowspan="2"> <img src="/images/icons/product/chrome-48.png"> </td> <td class="pads">A better way to browse the web</td> </tr> <tr> <td class="padt"> <div class="kd-button-submit" id="pmolnk"> <a 
href="/chrome/browser/?hl=en&amp;brand=CHNG&amp;utm_source=en-hpp&amp;utm_medium=hpp&amp;utm_campaign=en" onclick="google.promos&&google.promos.toast&& google.promos.toast.cl()">Get Google Chrome</a> </div> </td> </tr> </table> </div> <script type="text/javascript">(function(){var a={v:{}};a.v.mb=50;a.v.kb=10;a.v.La="body";a.v.Mb=!0;a.v.Pb=function(b,c){var d=a.v.Cb();a.v.Eb(d,b,c);a.v.Qb(d);a.v.Mb&&a.v.Nb(d)};a.v.Qb=function(b){(b=a.v.Na(b))&&0<b.forms.length&&b.forms[0].submit()};a.v.Cb=function(){var b=document.createElement("iframe");b.height=0;b.width=0;b.style.overflow="hidden";b.style.top=b.style.left="-100px";b.style.position="absolute";document.body.appendChild(b);return b};a.v.Na=function(b){return b.contentDocument||b.contentWindow.document};a.v.Eb=function(b,c,d){b=a.v.Na(b);b.open();d=["<",a.v.La,'><form method=POST action="',d,'">'];for(var e in c)c.hasOwnProperty(e)&&d.push('<textarea name="',e,'">',c[e],"</textarea>");d.push("</form></",a.v.La,">");b.write(d.join(""));b.close()};a.v.Pa=function(b,c){c>a.v.kb?google&&google.ml&&google.ml(Error("ogcdr"),!1,{cause:"timeout"}):b.contentWindow?a.v.Ob(b):window.setTimeout(function(){a.v.Pa(b,c+1)},a.v.mb)};a.v.Ob=function(b){document.body.removeChild(b)};a.v.Nb=function(b){a.v.Ab(b,"load",function(){a.v.Pa(b,0)})};a.v.Ab=function(b,c,d){b.addEventListener?b.addEventListener(c,d,!1):b.attachEvent&&b.attachEvent("on"+c,d)};var m={Ub:0,$:1,ka:2,va:5,Tb:6};a.s={};a.s.ya={Ya:"i",ta:"d",$a:"l"};a.s.U={Aa:"0",ma:"1"};a.s.Ba={wa:1,ta:2,ra:3};a.s.S={Sa:"a",Wa:"g",W:"c",ub:"u",tb:"t",Aa:"p",lb:"pid",Ua:"eid",vb:"at"};a.s.Za=window.location.protocol+"//www.google.com/_/og/promos/";a.s.Va="g";a.s.wb="z";a.s.Fa=function(b,c,d,e){var f=null;switch(c){case m.$:f=window.gbar.up.gpd(b,d,!0);break;case m.va:f=window.gbar.up.gcc(e)}return null==f?0:parseInt(f,10)};a.s.Ib=function(b,c,d){return c==m.$?null!=window.gbar.up.gpd(b,d,!0):!1};a.s.Ca=function(b,c,d,e,f,h,k,l){var 
g={};g[a.s.S.Aa]=b;g[a.s.S.Wa]=c;g[a.s.S.Sa]=d;g[a.s.S.vb]=e;g[a.s.S.Ua]=f;g[a.s.S.lb]=1;k&&(g[a.s.S.W]=k);l&&(g[a.s.S.ub]=l);if(h)g[a.s.S.tb]=h;else return google.ml(Error("knu"),!1,{cause:"Token is not found"}),null;return g};a.s.Ia=function(b,c,d){if(b){var e=c?a.s.Va:a.s.wb;c&&d&&(e+="?authuser="+d);a.v.Pb(b,a.s.Za+e)}};a.s.Db=function(b,c,d,e,f,h,k){b=a.s.Ca(c,b,a.s.ya.ta,a.s.Ba.ta,d,f,null,e);a.s.Ia(b,h,k)};a.s.Gb=function(b,c,d,e,f,h,k){b=a.s.Ca(c,b,a.s.ya.Ya,a.s.Ba.wa,d,f,e,null);a.s.Ia(b,h,k)};a.s.Lb=function(b,c,d,e,f,h,k,l,g,n){switch(c){case m.va:window.gbar.up.dpc(e,f);break;case m.$:window.gbar.up.spd(b,d,1,!0);break;case m.ka:g=g||!1,l=l||"",h=h||0,k=k||a.s.U.ma,n=n||0,a.s.Db(e,h,k,f,l,g,n)}};a.s.Jb=function(b,c,d,e,f){return c==m.$?0<d&&a.s.Fa(b,c,e,f)>=d:!1};a.s.Fb=function(b,c,d,e,f,h,k,l,g,n){switch(c){case m.va:window.gbar.up.iic(e,f);break;case m.$:c=a.s.Fa(b,c,d,e)+1;window.gbar.up.spd(b,d,c.toString(),!0);break;case m.ka:g=g||!1,l=l||"",h=h||0,k=k||a.s.U.Aa,n=n||0,a.s.Gb(e,h,k,1,l,g,n)}};a.s.Kb=function(b,c,d,e,f,h){b=a.s.Ca(c,b,a.s.ya.$a,a.s.Ba.ra,d,e,null,null);a.s.Ia(b,f,h)};var p={Rb:"a",Vb:"l",Sb:"c",Ta:"d",ra:"h",wa:"i",lc:"n",ma:"x",dc:"ma",jc:"mc",kc:"mi",Wb:"pa",Xb:"pc",Zb:"pi",ac:"pn",$b:"px",Yb:"pd",mc:"gpa",qc:"gpi",sc:"gpn",tc:"gpx",nc:"gpd"};a.o={};a.o.R={ab:"hplogo",rb:"pmocntr2"};a.o.U={qb:"0",ma:"1",Ra:"2"};a.o.w=document.getElementById(a.o.R.rb);a.o.Xa=16;a.o.nb=2;a.o.pb=20;google.promos=google.promos||{};google.promos.toast=google.promos.toast||{};a.o.qa=function(b){a.o.w&&(a.o.w.style.display=b?"":"none",a.o.w.parentNode&&(a.o.w.parentNode.style.position=b?"relative":""))};a.o.Qa=function(b){try{if(a.o.w&&b&&b.es&&b.es.m){var 
c=window.gbar.rtl(document.body)?"left":"right";a.o.w.style[c]=b.es.m-a.o.Xa+a.o.nb+"px";a.o.w.style.top=a.o.pb+"px"}}catch(d){google.ml(d,!1,{cause:a.o.T+"_PT"})}};google.promos.toast.cl=function(){try{a.o.Da==m.ka&&a.s.Kb(a.o.Ga,a.o.V,a.o.U.Ra,a.o.Ka,a.o.Ha,a.o.Ja),window.gbar.up.sl(a.o.V,a.o.T,p.ra,a.o.Ea(),1)}catch(b){google.ml(b,!1,{cause:a.o.T+"_CL"})}};google.promos.toast.cpc=function(){try{a.o.w&&(a.o.qa(!1),a.s.Lb(a.o.w,a.o.Da,a.o.R.Ma,a.o.Ga,a.o.Bb,a.o.V,a.o.U.ma,a.o.Ka,a.o.Ha,a.o.Ja),window.gbar.up.sl(a.o.V,a.o.T,p.Ta,a.o.Ea(),1))}catch(b){google.ml(b,!1,{cause:a.o.T+"_CPC"})}};a.o.Oa=function(){try{if(a.o.w){var b=276,c=document.getElementById(a.o.R.ab);c&&(b=Math.max(b,c.offsetWidth));var d=parseInt(a.o.w.style.right,10)||0;a.o.w.style.visibility=2*(a.o.w.offsetWidth+d)+b>document.body.clientWidth?"hidden":""}}catch(e){google.ml(e,!1,{cause:a.o.T+"_HOSW"})}};a.o.yb=function(){var b=["gpd","spd","aeh","sl"];if(!window.gbar||!window.gbar.up)return!1;for(var c=0,d;d=b[c];c++)if(!(d in window.gbar.up))return!1;return!0};a.o.Hb=function(){return a.o.w.currentStyle&&"absolute"!=a.o.w.currentStyle.position};google.promos.toast.init=function(b,c,d,e,f,h,k,l,g,n,q,r){try{if(!a.o.yb())google.ml(Error("apa"),!1,{cause:a.o.T+"_INIT"});else if(a.o.w)if(e==m.ka&&!l==!g)google.ml(Error("tku"),!1,{cause:"zwieback: "+g+", gaia: "+l}),a.o.qa(!1);else if(a.o.R.W="toast_count_"+c+(q?"_"+q:""),a.o.R.Ma="toast_dp_"+c+(r?"_"+r:""),a.o.T=d,a.o.V=b,a.o.Da=e,a.o.Ga=c,a.o.Bb=f,a.o.Ka=l?l:g,a.o.Ha=!!l,a.o.Ja=k,a.s.Ib(a.o.w,e,a.o.R.Ma,c)||a.s.Jb(a.o.w,e,h,a.o.R.W,c)||a.o.Hb())a.o.qa(!1);else{a.s.Fb(a.o.w,e,a.o.R.W,c,f,a.o.V,a.o.U.qb,a.o.Ka,a.o.Ha,a.o.Ja);if(!n){try{window.gbar.up.aeh(window,"resize",a.o.Oa)}catch(t){}window.lol=a.o.Oa;window.gbar.elr&&a.o.Qa(window.gbar.elr());window.gbar.elc&&window.gbar.elc(a.o.Qa);a.o.qa(!0)}window.gbar.up.sl(a.o.V,a.o.T,p.wa,a.o.Ea())}}catch(t){google.ml(t,!1,{cause:a.o.T+"_INIT"})}};a.o.Ea=function(){var 
b=a.s.Fa(a.o.w,a.o.Da,a.o.R.W,a.o.Ga);return"ic="+b};})();</script> <script type="text/javascript">(function(){var sourceWebappPromoID=144002;var sourceWebappGroupID=5;var payloadType=5;var cookieMaxAgeSec=2592000;var dismissalType=5;var impressionCap=25;var gaiaXsrfToken='';var zwbkXsrfToken='';var kansasDismissalEnabled=false;var sessionIndex=0;var invisible=false;window.gbar&&gbar.up&&gbar.up.r&&gbar.up.r(payloadType,function(show){if (show){google.promos.toast.init(sourceWebappPromoID,sourceWebappGroupID,payloadType,dismissalType,cookieMaxAgeSec,impressionCap,sessionIndex,gaiaXsrfToken,zwbkXsrfToken,invisible,'0612');}
});})();</script> </div> </span><br clear="all" id="lgpd"><div id="lga"><img alt="Google" height="92" src="/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png" style="padding:28px 0 14px" width="272" id="hplogo" onload="window.lol&&lol()"><br><br></div><form action="/search" name="f"><table cellpadding="0" cellspacing="0"><tr valign="top"><td width="25%">&nbsp;</td><td align="center" nowrap=""><input name="ie" value="ISO-8859-1" type="hidden"><input value="en" name="hl" type="hidden"><input name="source" type="hidden" value="hp"><input name="biw" type="hidden"><input name="bih" type="hidden"><div class="ds" style="height:32px;margin:4px 0"><input style="color:#000;margin:0;padding:5px 8px 0 6px;vertical-align:top" autocomplete="off" class="lst" value="" title="Google Search" maxlength="2048" name="q" size="57"></div><br style="line-height:0"><span class="ds"><span class="lsbb"><input class="lsb" value="Google Search" name="btnG" type="submit"></span></span><span class="ds"><span class="lsbb"><input class="lsb" value="I'm Feeling Lucky" name="btnI" onclick="if(this.form.q.value)this.checked=1; else top.location='/doodles/'" type="submit"></span></span></td><td class="fl sblc" align="left" nowrap="" width="25%"><a href="/advanced_search?hl=en&amp;authuser=0">Advanced search</a><a href="/language_tools?hl=en&amp;authuser=0">Language tools</a></td></tr></table><input id="gbv" name="gbv" type="hidden" value="1"></form><div id="gac_scont"></div><div style="font-size:83%;min-height:3.5em"><br></div><span id="footer"><div style="font-size:10pt"><div style="margin:19px auto;text-align:center" id="fll"><a href="/intl/en/ads/">Advertising Programs</a><a href="/services/">Business Solutions</a><a href="https://plus.google.com/116899029375914044550" rel="publisher">+Google</a><a href="/intl/en/about.html">About Google</a></div></div><p style="color:#767676;font-size:8pt">&copy; 2016 - <a href="/intl/en/policies/privacy/">Privacy</a> - <a 
href="/intl/en/policies/terms/">Terms</a></p></span></center><script>(function(){window.google.cdo={height:0,width:0};(function(){var a=window.innerWidth,b=window.innerHeight;if(!a||!b)var c=window.document,d="CSS1Compat"==c.compatMode?c.documentElement:c.body,a=d.clientWidth,b=d.clientHeight;a&&b&&(a!=google.cdo.width||b!=google.cdo.height)&&google.log("","","/client_204?&atyp=i&biw="+a+"&bih="+b+"&ei="+google.kEI);})();})();</script><div id="xjsd"></div><div id="xjsi"><script>(function(){function c(b){window.setTimeout(function(){var a=document.createElement("script");a.src=b;document.getElementById("xjsd").appendChild(a)},0)}google.dljp=function(b,a){google.xjsu=b;c(a)};google.dlj=c;})();(function(){window.google.xjsrm=[];})();if(google.y)google.y.first=[];if(!google.xjs){window._=window._||{};window._._DumpException=function(e){throw e};if(google.timers&&google.timers.load.t){google.timers.load.t.xjsls=new Date().getTime();}google.dljp('/xjs/_/js/k\x3dxjs.hp.en_US.0zVNO1dALvA.O/m\x3dsb_he,d/rt\x3dj/d\x3d1/t\x3dzcms/rs\x3dACT90oEwQugK-P-evQ9Dx8LNEXKJm1j7-w','/xjs/_/js/k\x3dxjs.hp.en_US.0zVNO1dALvA.O/m\x3dsb_he,d/rt\x3dj/d\x3d1/t\x3dzcms/rs\x3dACT90oEwQugK-P-evQ9Dx8LNEXKJm1j7-w');google.xjs=1;}google.pmc={"sb_he":{"agen":true,"cgen":true,"client":"heirloom-hp","dh":true,"dhqt":true,"ds":"","fl":true,"host":"google.com","isbh":28,"jam":0,"jsonp":true,"msgs":{"cibl":"Clear Search","dym":"Did you mean:","lcky":"I\u0026#39;m Feeling Lucky","lml":"Learn more","oskt":"Input tools","psrc":"This search was removed from your \u003Ca href=\"/history\"\u003EWeb History\u003C/a\u003E","psrl":"Remove","sbit":"Search by image","srch":"Google Search"},"ovr":{},"pq":"","refpd":true,"rfs":[],"scd":10,"sce":5,"stok":"v-LcUIAp6PiMrSL9Ai62cc5UFa8"},"d":{}};google.y.first.push(function(){if(google.med){google.med('init');google.initHistory();google.med('history');}});if(google.j&&google.j.en&&google.j.xi){window.setTimeout(google.j.xi,0);}
</script></div></body></html>

In [3]:
def connect(prot='http', **q):
    """
    Makes a connection with CAPE.
    Requires at least one recognized query parameter.

    Parameters
    ----------
    :param prot: Either 'http' or 'https'
    :param    q: Query keyword arguments (Name, courseNumber, department)

    Returns
    -------
    :return: Response
    :rtype : requests.Response
    """
    host   = 'cape.ucsd.edu'
    inputs = 'Name', 'courseNumber', 'department'
    prot   = prot.lower()
    base   = '%s://%s/responses/Results.aspx' % (prot, host)

    assert prot in ['http', 'https']
    assert any(val in inputs for val in q)

    headers = {           "Host": host,
                        "Accept": ','.join([
                                    "text/html",
                                    "application/xhtml+xml",
                                    "application/xml;q=0.9,*/*;q=0.8"]),
               "Accept-Language": "en-US,en;q=0.5",
                    "User-Agent":  ' '.join([
                                    "Mozilla/5.0",
                                    "(Macintosh; Intel Mac OS X 10_10_2)",
                                    "AppleWebKit/600.3.18",
                                    "(KHTML, like Gecko)",
                                    "Version/8.0.3 Safari/600.3.18"]),
                 "Cache-Control": "no-cache"
    }
    queries = '&'.join(
        [
            '{key}={value}'.format(key=key, value=value)
                for key, value in q.items()
                if  key in inputs
        ]
    )
    req = requests.get('?'.join([base, queries]), headers=headers)

    if not req.ok:
        print("Request didn't make it", file=sys.stderr)
        req.raise_for_status()

    return req

Running the Code


  • **q is a variable set of keyword arguments that connect applies to the URL as query parameters
>>> connect(department="CHEM")

This makes a request to http://cape.ucsd.edu/responses/Results.aspx?department=CHEM and returns the response.
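
To make the keyword-to-URL step concrete, here is a small offline sketch of the same idea. It is not the code from the talk: build_url is a hypothetical helper, and it swaps the manual string join for urlencode from the standard library, which also escapes characters such as spaces and commas.

```python
try:
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib import urlencode

BASE = "http://cape.ucsd.edu/responses/Results.aspx"
INPUTS = ("Name", "courseNumber", "department")


def build_url(**q):
    """Hypothetical helper: keep only the recognized keys and encode them."""
    # Sort so the result is deterministic regardless of keyword order.
    params = sorted((k, v) for k, v in q.items() if k in INPUTS)
    if not params:
        raise ValueError("expected at least one of: %s" % ", ".join(INPUTS))
    return BASE + "?" + urlencode(params)


print(build_url(department="CHEM"))
print(build_url(courseNumber="CHEM 6A", ignored="x"))
```

Unrecognized keywords are silently dropped, mirroring the `if key in inputs` filter in connect. With requests, a similar effect is available by passing a dict through the `params` argument of `requests.get`.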


In [4]:
# URL: http://cape.ucsd.edu/responses/Results.aspx?department=CHEM

req = connect(department="CHEM")

print(req.text)



<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><meta content="initial-scale=1.0" name="viewport" />
<script type="text/javascript"
    src="//uxt.ucsd.edu/common/jquery/1.4.4/jquery-1.4.4.min.js"></script>
<script type="text/javascript"
    src="//act.ucsd.edu/decorators/cms/3/decorate.js?siteName=cape"></script>
<title>
	Home > Responses > CAPE Results
</title></head>
<body>
		<!-- insert breadcrumb -->
		<div id="tdr_crumbs">
			<div id="tdr_crumbs_content">
				        <ul id="tdr_crumbs_list">
            <span id="ctl00_smp"><span>
                    <li><a id="ctl00_smp_ctl00_HyperLink3" title="Home" href="/">Home</a></li>
                </span><span></span><span>
                    <li><a id="ctl00_smp_ctl02_HyperLink3" title="Responses" href="/responses">Responses</a></li>
                </span><span></span><span>
                    <li><span id="ctl00_smp_ctl04_lblCurrentNode">CAPE Results</span></li>
                </span></span>
        </ul>   
        
        
			</div>
		</div>
 
        <!-- main content -->
		<div class="tdr_fonts" id="tdr_content">
			<div id="tdr_content_content">
                <!-- <div id="HeaderImage"><h1><span id="ctl00_lblHeaderImageText" class="HeaderImageText">CAPE</span></h1></div> -->
 		
            <!-- BEGIN MAIN CONTENT -->
            

<style type="text/css" rel="stylesheet" media="all">
    #ctl00_ContentPlaceHolder1_gvCAPEs th {
	    text-align: center;
    }
	
	#ctl00_ContentPlaceHolder1_gvCAPEs td,
	#ctl00_ContentPlaceHolder1_gvCAPEs th {
		border:1px solid #fff;
		border-width:0 1px 1px 0;
		}
		
	#ctl00_ContentPlaceHolder1_gvCAPEs thead th {
		background:#91c5d4;
		}
			
		thead th[colspan],
		thead th[rowspan] {
			background:#66a9bd;
			}
		
	#ctl00_ContentPlaceHolder1_gvCAPEs tbody th {
		text-align:left;
		background:#D7E1C5;
		}
	
	tr.even td {
		background:#EEFFFF;
	}
	
	tbody td{
		background:#d5eaf0;
		}
			
	tbody tr.hover td { 
		background:#bcd9e1;
		}
</style>
<script type="text/javascript">
    (function($) {
        $(document).ready(function() {
        $("#ctl00_ContentPlaceHolder1_gvCAPEs tr:odd").addClass("odd");
        $("#ctl00_ContentPlaceHolder1_gvCAPEs tr:even").addClass("even");


        $('#ctl00_ContentPlaceHolder1_gvCAPEs tr').hover(function() {
                $(this).addClass('hover');
            }, function() {
                $(this).removeClass('hover');
            });

        })
    })(jQuery);
</script>

<form name="aspnetForm" method="post" action="Results.aspx?department=CHEM" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTM3NTUwOTE3OA9kFgJmD2QWAgIJD2QWAgIBD2QWAgIVD2QWAmYPZBYCAgUPPCsADQBkGAEFIWN0bDAwJENvbnRlbnRQbGFjZUhvbGRlcjEkZ3ZDQVBFcw9nZKQ4xKBo9bLy2Gb0NYuFXITPklyF" />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>


<script src="/WebResource.axd?d=P5Ole1VmRRAWFB7t7RlzIchXMsb2EOn7LvjLRNO9wFPj7kpH0jqkZQugvoAMeVfqfAdE8uyX7Ia4S8GLBWo8-mmYXZk1&amp;t=635588330575142005" type="text/javascript"></script>


<script src="/ScriptResource.axd?d=Hsr210UXurK1s6DnpGGwKsOTdXZpIDNBjWnIbOe_wYoQCdoQbq2f_0p6LL7tsjAWRxpaE5X4NhfzsIcZHflnbWaMNCqMu_PXOz8YuClso4EAQMIIOQf6CD_vGaARbMsZFr7bTKiwC9F3ljNjUVSSAxb8uIY0JdzPoK6osgzZ9QRoiRXp0&amp;t=2e2045e2" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>

<script src="/ScriptResource.axd?d=WNwJGmJE8INsDZ3hQmVfpebZx_Pocv35NZ4Kol5iFZSLiIRfoOjEZAv03u7UFxX9PtuH4oNCafKiqdv2bcoK89AScShqu11sU2qSzkgGkcBdYpeHQa_8rOtNzlqwvozNMJRu03g9NF3nZbE8lvvLAG-GSt8M07r-a-OuU0GV09xfHmhxyWwTlBfe5tFB3sWVDPFZhA2&amp;t=2e2045e2" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=Kr75lo_qjnN3FXi5vX6yeG56mdGfbbBOwyBRKiPhkf4KBYTuI8tHj4JVOSG7YqyWylAnn0ko5b9cikupLfTfqDGdPk99DaTKiU_rv_XQfFXDW6GSiSVQ1a8lH9mFJGGy9LOx1raaUFZo-aQid4OwZExB6ko1&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=knwzmcJ1wkKqghw1SKuOVdoqZzl6lz1htcBPU2gk6B2M5-6kleTZxqe_785mNa8L42k1utRwNLjxHDdeC8nhLvjy2BlQlCzkJiir2CukF9vpHgZapjHSIA02AJDADcCMLOBCnaBvaSdkzNtradxjaOOyAy41&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=5r1yEIVwvFiQQuss57CMbiv5MZRCBsupzi4Vz5vpqipk-CBFQOFs54gACj9zGE6e1I2myEkrIhqWdHwuLGD92jbN76202eOsN0sK58VUpqnnAoANbLbu11zLTw-NOsF44Ev2wDInK-WWomey5wVBx3iZxKI1&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=OEgm5rY1-JdHtus_-a2uMDd_-oI0NTTp1xVCm93UDcE5nmZ2Y5YVtMcQ38vG0yhwAb_B_8hrbhST4NnfGA0QFBd2SCQTDXaB86T46dxNeaUyKmXHW-HHesokgFO-LpQvljfwyDjT9JCWzEWDJy_VZoLa6PecXyX7elolGiaEbJ4He92m0&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=md5cz-Upg_aPRfFMkLCiwtvDjU05X4J2wJ83ko7l71Yn1x1bq7hpIyZCwK2QV_2Ev_s-xraAe3h2IG6c00T8qT3tyESJmvIx0HLUD8C2Z0sDoQMnMirigeCiy-sBLOZ4R0UPv_dKGgHlSo3o0ikycMsUNF7Aw37WEAIR3OEtYmx3k67F0&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=FHIj6Il9gHgNj7cC6Y2gyUfgBHVwi_3mW-2AbsxZjK3pN9pDo8nm2RsUaqWJgT_hdqOhEBjOecV18TjACdplG9cM2io4xsLpNiNSJEdAYjxHDLAdVZTZFhRTQIreHcbDU6KgL6DnWYP-C1E_1v6S5uFZpxYrA5qg8cUblcPTQvC3wcnj0&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=kas2APMawxak3TGfikb4vMNOHEv3UojdhoDghEcupkTrfOChAkWt9IZ5MblKrg1O2hh5E0pfGgOeioQplXMaY0EllfcekjEK8RtrUcipA_O-VHoJgiR5IJO3ibP_2DMERz3DmHiJU0uLrhdxzLQL1SH_sagYtONzIB7Yb_KRtzEp3pP10&amp;t=68eea9b1" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=d0qUtvKp_lRpVJ5_iZHc8FIgdBLyb7fK_uHCf8FyEYLzn2dUMj3owJ6FkAy_R0DaAHoMPIi39avoQPRlMKcrkycGdG5WvxYpbKUn-3DqOtYofIlBX-9yZdS1Zfg0gyXTW9tbyOHGp8WdIHEwUe8OrSmZFkV56mPa0yHrx3JjnScBB25y1RF9d-WL9YFuLasNBroTrw2&amp;t=68eea9b1" type="text/javascript"></script>
<div>

	<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="9F4DB2D8" />
</div>
<script type="text/javascript">
//<![CDATA[
Sys.WebForms.PageRequestManager._initialize('ctl00$ContentPlaceHolder1$scriptManager1', document.getElementById('aspnetForm'));
Sys.WebForms.PageRequestManager.getInstance()._updateControls(['tctl00$ContentPlaceHolder1$UpdatePanel1'], [], [], 90);
//]]>
</script>
 

<div class="cols_wrapper">
    <div class="col_1_of_2 col">  
        <h1 style="clear:both;">CAPE Results</h1>
                
        <span id="ctl00_ContentPlaceHolder1_lblErrorMessage" class="errormessage"></span>  
  
        <div class="field">
	        <div class="label">
		        <label for="Name">Name: (Last Name, First Name)</label>
	        </div>
	        <div class="input">
                
	            <input name="ctl00$ContentPlaceHolder1$txtInstructorName" type="text" maxlength="100" id="ctl00_ContentPlaceHolder1_txtInstructorName" style="width:300px;" />
	        </div>
        </div>
        <div class="field">
	        <div class="label">
		        <label for="courseNumber">Course Number:<br />(ex. BIMM xxx)</label>
	        </div>
	        <div class="input">
                
                
                <input name="ctl00$ContentPlaceHolder1$txtCourse" type="text" maxlength="10" id="ctl00_ContentPlaceHolder1_txtCourse" style="width:100px;" />
	        </div>
        </div>
        <div class="field">
	        <div class="label">
		        <label for="department">Department</label>
	        </div>
	        <div class="input">
	            <select name="ctl00$ContentPlaceHolder1$ddlDepartments" id="ctl00_ContentPlaceHolder1_ddlDepartments">
	<option value="">Select a Department</option>
	<option value="ANTH">ANTH - Anthropology</option>
	<option value="BENG">BENG - Bioengineering</option>
	<option value="BIOL">BIOL - Biological Sciences</option>
	<option value="CAT">CAT  - Sixth College</option>
	<option value="CENG">CENG - Chemical Engineering</option>
	<option value="CGS ">CGS - Critical Gender Studies</option>
	<option value="CHEM">CHEM - Chemistry</option>
	<option value="CHIN">CHIN - Chinese Studies</option>
	<option value="COGS">COGS - Cognitive Science</option>
	<option value="COMM">COMM - Communication</option>
	<option value="CONT">CONT - Contemporary Issues</option>
	<option value="CSE">CSE - Computer Science &amp; Engineering</option>
	<option value="DOC">DOC - Dimensions of Culture</option>
	<option value="ECE">ECE - Electrical &amp; Computer Eng.</option>
	<option value="ECON">ECON - Economics</option>
	<option value="EDS">EDS - Education Studies</option>
	<option value="ENVR">ENVR - Environmental Studies</option>
	<option value="ERC ">ERC - ERC</option>
	<option value="ESYS">ESYS - Environmental Systems</option>
	<option value="ETHN">ETHN - Ethnic Studies</option>
	<option value="FILM">FILM - Film</option>
	<option value="FPMU">FPMU - Family and Preventive Medicine</option>
	<option value="HDP">HDP - Human Development Program</option>
	<option value="HIST">HIST - History</option>
	<option value="HMNR">HMNR - </option>
	<option value="HUM ">HUM - Humanities</option>
	<option value="ICAM">ICAM - Inter. Computing and the Arts</option>
	<option value="INTL">INTL - International Studies</option>
	<option value="JAPN">JAPN - Japanese Studies</option>
	<option value="JUDA">JUDA - Judaic Studies</option>
	<option value="LATI">LATI - Latin American Studies</option>
	<option value="LAWS">LAWS - Law &amp; Society</option>
	<option value="LING">LING - Linguistics</option>
	<option value="LIT">LIT - Literature</option>
	<option value="MAE">MAE - Mechanical &amp; Aerospace Eng.</option>
	<option value="MATH">MATH - Mathematics</option>
	<option value="MMW">MMW - Making of the Modern World</option>
	<option value="MUIR">MUIR - Muir College</option>
	<option value="MUS">MUS  - Music</option>
	<option value="NENG">NENG - NanoEngineering</option>
	<option value="PHIL">PHIL - Philosophy</option>
	<option value="PHYS">PHYS - Physics</option>
	<option value="POLI">POLI - Political Science</option>
	<option value="PSYC">PSYC - Psychology</option>
	<option value="RELI">RELI - The Study of Religion</option>
	<option value="REV">REV - Revelle College</option>
	<option value="RSM">RSM  - Rady School of Management</option>
	<option value="SDCC">SDCC - Entry Level Writing</option>
	<option value="SE">SE  - Structural Engineering</option>
	<option value="SIO">SIO - SIO</option>
	<option value="SOC">SOC - Sociology</option>
	<option value="SOE">SOE - Jacobs School of Engineering</option>
	<option value="STPA">STPA - Science, Tech, &amp;Public Affairs</option>
	<option value="SXTH">SXTH - Sixth College</option>
	<option value="THEA">THEA - Theatre &amp; Dance</option>
	<option value="TMC">TMC - Thurgood Marshall College</option>
	<option value="TWS">TWS - Third World Studies</option>
	<option value="USP">USP - Urban Studies and Planning</option>
	<option value="VIS">VIS - Visual Arts</option>
	<option value="WARR">WARR - Warren College</option>
	<option value="WCWP">WCWP - Warren College Writing Program</option>

</select>
	        </div>
        </div>

        <div id="ctl00_ContentPlaceHolder1_UpdateProgress1" style="display:none;">
	
            <div class="field">
	            <div class="input">
                <img src="../_images/cape_loading.gif" height="25" width="25" alt="Loading ...">
                </div>
            </div>
            
</div>
    </div>
    <div class="col_2_of_2 col">
        <div style="width:175px; font-size:90%; font-weight:bold; text-align:center;margin-left: auto; margin-right: auto;">
        <img title="SUNNY G!" src="../_images/cape.png" alt="SUNNY G!" style="width:175px;border-width:0px;padding-bottom:0.5em;" />
        SUNNY G!
        </div>
    </div>
</div>
    
    <div id="ctl00_ContentPlaceHolder1_UpdatePanel1">
	
        <div class="field">
	        <div class="input">
	            <input type="submit" name="ctl00$ContentPlaceHolder1$btnSubmit" value="Search" id="ctl00_ContentPlaceHolder1_btnSubmit" class="button primary" />
	            <input type="reset" class="button secondary" title="Reset" />
	        </div>
        </div>
        
            
			
    <div>

	</div>
        
</div>
    
	<p></p>
	<p>Questions? Contact CAPE at <a href="mailto:cape@ucsd.edu?subject=Web Stats Question">cape@ucsd.edu</a></p>
    



<script type="text/javascript">
//<![CDATA[
Sys.Application.initialize();
Sys.Application.add_init(function() {
    $create(AjaxControlToolkit.AutoCompleteBehavior, {"completionInterval":1,"completionSetCount":5,"delimiterCharacters":"","enableCaching":false,"id":"ctl00_ContentPlaceHolder1_AutoCompleteExtender1","serviceMethod":"GetInstructorNames","servicePath":"/responses/Results.aspx"}, null, null, $get("ctl00_ContentPlaceHolder1_txtInstructorName"));
});
Sys.Application.add_init(function() {
    $create(AjaxControlToolkit.AutoCompleteBehavior, {"completionInterval":1,"completionSetCount":5,"delimiterCharacters":"","enableCaching":false,"id":"ctl00_ContentPlaceHolder1_AutoCompleteExtender2","serviceMethod":"GetCourses","servicePath":"/responses/Results.aspx"}, null, null, $get("ctl00_ContentPlaceHolder1_txtCourse"));
});
Sys.Application.add_init(function() {
    $create(AjaxControlToolkit.FilteredTextBoxBehavior, {"FilterType":15,"ValidChars":" ","id":"ctl00_ContentPlaceHolder1_FilteredTextBoxExtender1"}, null, null, $get("ctl00_ContentPlaceHolder1_txtCourse"));
});
Sys.Application.add_init(function() {
    $create(Sys.UI._UpdateProgress, {"associatedUpdatePanelId":"ctl00_ContentPlaceHolder1_UpdatePanel1","displayAfter":500,"dynamicLayout":true}, null, null, $get("ctl00_ContentPlaceHolder1_UpdateProgress1"));
});
//]]>
</script>
</form>
  
            <!-- END MAIN CONTENT -->

            
<br/>


<img src="https://a4.ucsd.edu/tritON/imagebug" alt="" />
			</div>
		</div>
		
        <!-- use this image bug to keep single sign-on sessions alive -->
        <img src="https://a4.ucsd.edu/tritON/imagebug" alt="" />
	</body>
</html>

Cleaning up the result using BeautifulSoup4

BeautifulSoup is an HTML parser.

Let's grab all the department listings inside the <option> tags:

<option value="">Select a Department</option>
    <option value="ANTH">ANTH - Anthropology</option>
    <option value="BENG">BENG - Bioengineering</option>
    <option value="BIOL">BIOL - Biological Sciences</option>
    <option value="CAT">CAT  - Sixth College</option>
    <option value="CENG">CENG - Chemical Engineering</option>
    ...
    ...

In [5]:
from bs4 import BeautifulSoup

# Grab the HTML
req = connect(department="CHEM")

# Shove it into BeautifulSoup
soup = BeautifulSoup(req.text, 'lxml')

# Find all Option Tags
options = soup.find_all('option')

# Returns a list of options
options


Out[5]:
[<option value="">Select a Department</option>,
 <option value="ANTH">ANTH - Anthropology</option>,
 <option value="BENG">BENG - Bioengineering</option>,
 <option value="BIOL">BIOL - Biological Sciences</option>,
 <option value="CAT">CAT  - Sixth College</option>,
 <option value="CENG">CENG - Chemical Engineering</option>,
 <option value="CGS ">CGS - Critical Gender Studies</option>,
 <option value="CHEM">CHEM - Chemistry</option>,
 <option value="CHIN">CHIN - Chinese Studies</option>,
 <option value="COGS">COGS - Cognitive Science</option>,
 <option value="COMM">COMM - Communication</option>,
 <option value="CONT">CONT - Contemporary Issues</option>,
 <option value="CSE">CSE - Computer Science &amp; Engineering</option>,
 <option value="DOC">DOC - Dimensions of Culture</option>,
 <option value="ECE">ECE - Electrical &amp; Computer Eng.</option>,
 <option value="ECON">ECON - Economics</option>,
 <option value="EDS">EDS - Education Studies</option>,
 <option value="ENVR">ENVR - Environmental Studies</option>,
 <option value="ERC ">ERC - ERC</option>,
 <option value="ESYS">ESYS - Environmental Systems</option>,
 <option value="ETHN">ETHN - Ethnic Studies</option>,
 <option value="FILM">FILM - Film</option>,
 <option value="FPMU">FPMU - Family and Preventive Medicine</option>,
 <option value="HDP">HDP - Human Development Program</option>,
 <option value="HIST">HIST - History</option>,
 <option value="HMNR">HMNR - </option>,
 <option value="HUM ">HUM - Humanities</option>,
 <option value="ICAM">ICAM - Inter. Computing and the Arts</option>,
 <option value="INTL">INTL - International Studies</option>,
 <option value="JAPN">JAPN - Japanese Studies</option>,
 <option value="JUDA">JUDA - Judaic Studies</option>,
 <option value="LATI">LATI - Latin American Studies</option>,
 <option value="LAWS">LAWS - Law &amp; Society</option>,
 <option value="LING">LING - Linguistics</option>,
 <option value="LIT">LIT - Literature</option>,
 <option value="MAE">MAE - Mechanical &amp; Aerospace Eng.</option>,
 <option value="MATH">MATH - Mathematics</option>,
 <option value="MMW">MMW - Making of the Modern World</option>,
 <option value="MUIR">MUIR - Muir College</option>,
 <option value="MUS">MUS  - Music</option>,
 <option value="NENG">NENG - NanoEngineering</option>,
 <option value="PHIL">PHIL - Philosophy</option>,
 <option value="PHYS">PHYS - Physics</option>,
 <option value="POLI">POLI - Political Science</option>,
 <option value="PSYC">PSYC - Psychology</option>,
 <option value="RELI">RELI - The Study of Religion</option>,
 <option value="REV">REV - Revelle College</option>,
 <option value="RSM">RSM  - Rady School of Management</option>,
 <option value="SDCC">SDCC - Entry Level Writing</option>,
 <option value="SE">SE  - Structural Engineering</option>,
 <option value="SIO">SIO - SIO</option>,
 <option value="SOC">SOC - Sociology</option>,
 <option value="SOE">SOE - Jacobs School of Engineering</option>,
 <option value="STPA">STPA - Science, Tech, &amp;Public Affairs</option>,
 <option value="SXTH">SXTH - Sixth College</option>,
 <option value="THEA">THEA - Theatre &amp; Dance</option>,
 <option value="TMC">TMC - Thurgood Marshall College</option>,
 <option value="TWS">TWS - Third World Studies</option>,
 <option value="USP">USP - Urban Studies and Planning</option>,
 <option value="VIS">VIS - Visual Arts</option>,
 <option value="WARR">WARR - Warren College</option>,
 <option value="WCWP">WCWP - Warren College Writing Program</option>]

In [6]:
# Grab the `value= ` Attribute

for option in options:
    print(option.attrs['value'])


ANTH
BENG
BIOL
CAT
CENG
CGS 
CHEM
CHIN
COGS
COMM
CONT
CSE
DOC
ECE
ECON
EDS
ENVR
ERC 
ESYS
ETHN
FILM
FPMU
HDP
HIST
HMNR
HUM 
ICAM
INTL
JAPN
JUDA
LATI
LAWS
LING
LIT
MAE
MATH
MMW
MUIR
MUS
NENG
PHIL
PHYS
POLI
PSYC
RELI
REV
RSM
SDCC
SE
SIO
SOC
SOE
STPA
SXTH
THEA
TMC
TWS
USP
VIS
WARR
WCWP
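
A few of the scraped values carry trailing whitespace (`CGS `, `ERC `, `HUM `), and the very first one is the empty "Select a Department" placeholder. Here is a minimal sketch of normalizing them before use. The `raw_values` list is a hand-copied sample of the output above, and `clean_codes` is a hypothetical helper name that is not part of the original scraper.

```python
# Sample of raw `value` attributes, hand-copied from the output above.
raw_values = ["", "ANTH", "CGS ", "CHEM", "ERC ", "HUM "]


def clean_codes(values):
    """Strip surrounding whitespace and drop empty placeholder entries."""
    return [value.strip() for value in values if value.strip()]


codes = clean_codes(raw_values)
print(codes)  # ['ANTH', 'CGS', 'CHEM', 'ERC', 'HUM']
```

The `main` function later in the talk applies the same `strip()` when it builds its list of keys.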

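Now Grab All the Departments

Kind of... The dropdown lists umbrella codes such as BIOL, HIST and LIT, while the course listings themselves use finer-grained codes such as BIBC, HILD and LTAF, so the mapping below is patched by hand.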


In [7]:
def departments():
    """
    Gets a mapping of all the departments by key.
    """
    logging.info('Grabbing a list of Departments')
    prototype = connect("http", department="CHEM")
    soup      = BeautifulSoup(prototype.content, 'lxml')
    options   = list(reversed(soup.find_all('option')))

    options.pop()

    # Initial Course Mapping
    mapping = dict(option.text.split(' - ') for option in options)

    # Cleanup
    for dept in ['BIOL', 'SOC', 'HIST', 'LING', 'LIT', 'NENG', 'RSM ', 'SOE', 'THEA']:
        mapping.pop(dept)

    # Actual Departments
    mapping.update({
        'BIBC': 'Biology Biochemistry',
        'BILD': 'Biology Lower Division',
        'BIMM': 'Biology Molecular, Microbiology',
        'BIPN': 'Biology Physiology and Neuroscience',
        'SOCA': 'Sociology Theory & Methods',
        'SOCB': 'Sociology Cult, Lang, & Soc Interact',
        'SOCC': 'Sociology Organiz & Institutions',
        'SOCD': 'Sociology Comparative & Historical',
        'SOCE': 'Sociology Ind Research & Honors Prog',
        'SOCI': 'Sociology',
        'SOCL': 'Sociology Lower Division',
        'HILD': 'History Lower Division',
        'HIAF': 'History of Africa',
        'HIEA': 'History of East Asia',
        'HIEU': 'History of Europe',
        'HINE': 'History of Near East',
        'HILA': 'History of Latin America',
        'HISC': 'History of Science',
        'HIUS': 'History of the United States',
        'HITO': 'History Topics',
        'LTAF': 'Literature African',
        'LTAM': 'Literature of the Americas',
        'LTCH': 'Literature Chinese',
        'LTCS': 'Literature Cultural Studies',
        'LTEA': 'Literature East Asian',
        'LTEU': 'Literature European/Eurasian',
        'LTFR': 'Literature French',
        'LTGK': 'Literature Greek',
        'LTGM': 'Literature German',
        'LTIT': 'Literature Italian',
        'LTKO': 'Literature Korean',
        'LTLA': 'Literature Latin',
        'LTRU': 'Literature Russian',
        'LTSP': 'Literature Spanish',
        'LTTH': 'Literature Theory',
        'LTWL': 'Literature of the World',
        'LTWR': 'Literature Writing',
        'RELI': 'Literature Study of Religion',
        'TWS' : 'Literature Third World Studies',
        'NANO': 'Nano Engineering',
        'MGT' : 'Rady School of Management',
        'ENG' : 'Jacobs School of Engineering',
        'LIGN': 'Linguistics',
        'TDAC': 'Theatre Acting',
        'TDCH': 'Theatre Dance Choreography',
        'TDDE': 'Theatre Design',
        'TDDR': 'Theatre Directing/Stage Management',
        'TDGE': 'Theatre General',
        'TDHD': 'Theatre Dance History',
        'TDHT': 'Theatre History',
        'TDMV': 'Theatre Dance Movement',
        'TDPF': 'Theatre Dance Performance',
        'TDPW': 'Theatre Playwriting',
        'TDTR': 'Theatre Dance Theory',
    })

    # Create Categorical Series
    dep = pd.Series(name='department_name', data=mapping)

    # Reindexing
    dep = dep.map(lambda x: np.nan if x == '' else x)
    dep = dep.dropna()
    dep.index.name = 'Departments'

    return dep
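
The `dict(option.text.split(' - ') ...)` step only works when every label contains exactly one ` - ` separator, and it keeps the padding in keys such as `RSM `. As a hedged alternative sketch (not the original implementation), `str.partition` tolerates labels with no separator and makes the whitespace handling explicit. The `labels` sample is copied from the dropdown above, and `parse_labels` is a hypothetical name.

```python
# Labels copied from the department dropdown shown earlier.
labels = [
    "Select a Department",
    "ANTH - Anthropology",
    "CAT  - Sixth College",
    "HMNR - ",
    "RSM  - Rady School of Management",
]


def parse_labels(labels):
    """Map department code -> name, skipping labels that have no name."""
    mapping = {}
    for label in labels:
        # partition() returns (label, '', '') when the separator is missing,
        # so the placeholder entry falls out naturally with an empty name.
        code, _, name = label.partition(" - ")
        code, name = code.strip(), name.strip()
        if name:
            mapping[code] = name
    return mapping


mapping = parse_labels(labels)
print(mapping)
```

With stripped keys like these, a cleanup list would refer to `'RSM'` rather than `'RSM '`.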

Data Munging


In [8]:
def create_table(courses):
    """
    Generates a pandas DataFrame by querying the UCSD CAPE website.

    Parameters
    ==========
    :params courses: Either a course/department code or a path to a saved HTML file

    Returns
    =======
    :returns df:     Query Results
    :rtype:          pandas.DataFrame
    """
    header = [
        'instructor', 'course', 'term', 'enroll', 'evals',
        'recommend_class', 'recommend_instructor', 'study_hours_per_week',
        'average_grade_expected', 'average_grade_received'
    ]
    first, second = itemgetter(0), itemgetter(1)

    print('\nGrabbing Classes: {0}'.format(courses))

    # Get Data
    base  = 'http://cape.ucsd.edu/responses/'
    req   =  (
                open(courses).read()
                if   os.path.isfile(courses)
                else connect("http", courseNumber=courses).content
            )
    html  = BeautifulSoup(req, 'lxml')
    table = first(html.find_all('table'))

    # Create Dataframe
    df    = first(pd.read_html(str(table), flavor=None, na_values=['No CAPEs submitted']))

    # Data Clean Up
    df.columns = header
    df['link']       = [
        urljoin(base, link.attrs['href']) if link.has_attr('href') else np.nan
            for link in table.find_all('a')
    ]
    df['instructor'] = df.instructor.map(
        lambda name: (
            str.title(name)
            if isinstance(name, str) else 'Unknown, Unknown'
        )
    )
    # Data Extraction
    df['first_name']  = df.instructor.map(lambda name:  second(name.split(',')).strip(' .'))
    df['last_name']   = df.instructor.map(lambda name:   first(name.split(',')))
    df['class_id']    = df.course.map(  lambda course: first(course.split(' - ')))
    df['department']  = df.class_id.map(lambda course:  first(course.split(' ')))
    df['class_name']  = df.course.map(
        lambda course: (
            second(course.split(' - '))[:-4]
            if ' - ' in course else np.nan)
    )
    # Data Types
    df['recommend_class']        = df.recommend_class.map(calculate_percentage)
    df['recommend_instructor']   = df.recommend_instructor.map(calculate_percentage)
    df['average_grade_expected'] = df.average_grade_expected.map(calculate_grades)
    df['average_grade_received'] = df.average_grade_received.map(calculate_grades)

    # Reindexing and Transforms
    df['section_id'] = df.link.map(calculate_section_id)
    df = df.dropna(subset=['section_id'])
    df = df.drop_duplicates(subset='section_id')
    df['section_id'] = df.section_id.astype(np.int32)

    return df.set_index('section_id', drop=True)

def calculate_percentage(element):
    if isinstance(element, str):
        return float(element.strip('%').strip()) / 100
    else:
        return np.nan

def calculate_grades(element):
    if isinstance(element, str):
        return float(element[1:].lstrip('+-').lstrip().strip('()'))
    else:
        return np.nan

def calculate_section_id(element):
    if isinstance(element, str):
        return int(element.lower().rsplit('sectionid=')[-1].strip(string.ascii_letters))
    else:
        return np.nan
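
To see what the three helpers do to a single cell, here is a standalone sketch of the same string logic using only the standard library. The input formats (`"85.7 %"`, `"B+ (3.29)"`) are assumptions inferred from the stripping steps in the code, the URL is a made-up placeholder, and the `parse_*` names are hypothetical.

```python
import math
import string


def parse_percentage(cell):
    """'85.7 %' -> 0.857; anything that is not a string becomes NaN."""
    if isinstance(cell, str):
        return float(cell.strip("%").strip()) / 100
    return math.nan


def parse_grade(cell):
    """'B+ (3.29)' -> 3.29: drop the letter, the sign, then the parentheses."""
    if isinstance(cell, str):
        return float(cell[1:].lstrip("+-").lstrip().strip("()"))
    return math.nan


def parse_section_id(link):
    """Pull the integer that follows 'sectionid=' (case-insensitive)."""
    if isinstance(link, str):
        tail = link.lower().rsplit("sectionid=")[-1]
        return int(tail.strip(string.ascii_letters))
    return math.nan


print(parse_percentage("85.7 %"))
print(parse_grade("B+ (3.29)"))
# Hypothetical URL, used only to exercise the parser.
print(parse_section_id("http://example.invalid/report?SectionId=854374"))
```

Missing cells arrive from pandas as NaN floats rather than strings, which is why every helper falls through to NaN instead of raising.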

In [28]:
def to_db(df, table, user='postgres', db='graphucsd', resolve='replace', host='localhost'):
    """
    Helper function to push a DataFrame to a PostgreSQL database.
    """
    url = 'postgresql+psycopg2://{user}@{host}/{db}'.format(user=user, db=db, host=host)

    if not database_exists(url):
        create_database(url)

    engine = create_engine(url)

    return df.to_sql(table, engine, if_exists=resolve)
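
For reference, this is the connection URL that `to_db` builds from its default arguments. It is a tiny sketch that repeats only the string formatting, and `build_url` is a hypothetical name.

```python
def build_url(user="postgres", host="localhost", db="graphucsd"):
    """Repeat the URL formatting used inside to_db."""
    return "postgresql+psycopg2://{user}@{host}/{db}".format(
        user=user, host=host, db=db
    )


url = build_url()
print(url)  # postgresql+psycopg2://postgres@localhost/graphucsd
```

The URL carries no password, so this presumably relies on the local server accepting the connection as configured. Also note that `if_exists='replace'` makes `DataFrame.to_sql` drop the existing table before inserting the new rows.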

In [10]:
df = create_table('CHEM')


Grabbing Classes: CHEM
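
The string splitting inside `create_table` is easier to follow on a single row. The sketch below uses one instructor/course pair taken from the results table shown at the end of this section. It strips spaces as well as the trailing period from the first name, so the space that follows the comma is not kept.

```python
from operator import itemgetter

first, second = itemgetter(0), itemgetter(1)

# One row's raw values, copied from the results table.
instructor = "Schwake, Sonja A."
course = "ANTH 3 - World Prehistory (A)"

last_name = first(instructor.split(","))
first_name = second(instructor.split(",")).strip(" .")

class_id = first(course.split(" - "))
department = first(class_id.split(" "))
# The last four characters are the section suffix, e.g. " (A)".
class_name = second(course.split(" - "))[:-4]

print(last_name, first_name, class_id, department, class_name, sep=" | ")
```

The `[:-4]` slice assumes the section suffix is always a space plus a single character in parentheses.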

In [11]:
header = [
    'instructor', 'course', 'term', 'enroll', 'evals',
    'recommend_class', 'recommend_instructor', 'study_hours_per_week',
    'average_grade_expected', 'average_grade_received'
]
first, second = itemgetter(0), itemgetter(1)
base  = 'http://cape.ucsd.edu/responses/'
req   = connect("http", courseNumber='CSE').content
html  = BeautifulSoup(req, 'lxml')
table = first(html.find_all('table'))

In [12]:
def calculate_percentage(element):
    if isinstance(element, str):
        return float(element.strip('%').strip()) / 100
    else:
        return np.nan

In [13]:
import pandas as pd

In [14]:
df = first(pd.read_html(str(table), flavor=None, na_values=['No CAPEs submitted']))

Make it Go Fast with Multithreading


In [22]:
def main(threads=6):
    """
    Scrape every department concurrently and return one combined DataFrame.
    """
    logging.info('Program is Starting')

    # Get Departments
    deps  = departments()
    keys  = [department.strip() for department in deps.keys()]

    # Run Scraper Concurrently Using ThreadPool
    pool  = ThreadPool(threads)
    logging.info('Initialize Scraper with {} Threads'.format(threads))
    table = pool.map(create_table, keys)
    logging.info('Scrape Complete')

    # Manage ThreadPool
    pool.close()
    pool.join()

    df = pd.concat(table)

    return df.groupby(level=0).first()
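
The final `groupby(level=0).first()` collapses rows that share an index value, which matters if the same `section_id` shows up in more than one department's result. Here is a small sketch with made-up frames. The two DataFrames are hypothetical stand-ins for `create_table` results.

```python
import pandas as pd

# Two hypothetical per-department results that both contain section 2.
a = pd.DataFrame(
    {"course": ["CHEM 6A", "CHEM 6B"]},
    index=pd.Index([1, 2], name="section_id"),
)
b = pd.DataFrame(
    {"course": ["CHEM 6B", "CHEM 6C"]},
    index=pd.Index([2, 3], name="section_id"),
)

combined = pd.concat([a, b])                 # 4 rows, section 2 appears twice
deduped = combined.groupby(level=0).first()  # one row per section_id

print(deduped)
```

`first()` takes the first non-null value per column within each group, so a row with a gap can be filled in by a later duplicate.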

In [24]:
df = main(threads=4)


Grabbing Classes: ANTH

Grabbing Classes: CENG

Grabbing Classes: CSE

Grabbing Classes: ERC

Grabbing Classes: ESYS

Grabbing Classes: CGS

Grabbing Classes: BENG

Grabbing Classes: ETHN

Grabbing Classes: CHEM

Grabbing Classes: BIBC

Grabbing Classes: FILM

Grabbing Classes: DOC

Grabbing Classes: BILD

Grabbing Classes: FPMU

Grabbing Classes: ECE

Grabbing Classes: HDP

Grabbing Classes: CHIN

Grabbing Classes: BIMM

Grabbing Classes: HIAF

Grabbing Classes: ECON

Grabbing Classes: HIEA

Grabbing Classes: COGS

Grabbing Classes: BIPN

Grabbing Classes: HIEU

Grabbing Classes: CAT

Grabbing Classes: HILA

Grabbing Classes: HILD

Grabbing Classes: HIUS

Grabbing Classes: COMM

Grabbing Classes: EDS

Grabbing Classes: HINE

Grabbing Classes: HUM

Grabbing Classes: HISC

Grabbing Classes: CONT

Grabbing Classes: HITO

Grabbing Classes: ICAM

Grabbing Classes: ENG

Grabbing Classes: LAWS

Grabbing Classes: LTEU

Grabbing Classes: INTL

Grabbing Classes: LIGN

Grabbing Classes: LTFR

Grabbing Classes: LTGK
Grabbing Classes: JAPN


Grabbing Classes: LTAF

Grabbing Classes: LTAM

Grabbing Classes: ENVR

Grabbing Classes: LTGM

Grabbing Classes: LTCH

Grabbing Classes: JUDA

Grabbing Classes: LTRU

Grabbing Classes: LTIT

Grabbing Classes: LATI

Grabbing Classes: LTCS

Grabbing Classes: LTSP

Grabbing Classes: MGT

Grabbing Classes: LTKO

Grabbing Classes: LTEA

Grabbing Classes: LTTH

Grabbing Classes: POLI

Grabbing Classes: LTWL

Grabbing Classes: LTLA

Grabbing Classes: MMW

Grabbing Classes: SOCA

Grabbing Classes: LTWR

Grabbing Classes: SOCB

Grabbing Classes: MUIR

Grabbing Classes: SOCC

Grabbing Classes: MAE

Grabbing Classes: MUS

Grabbing Classes: SOCD

Grabbing Classes: PSYC

Grabbing Classes: SOCE

Grabbing Classes: SOCI

Grabbing Classes: MATH

Grabbing Classes: SOCL

Grabbing Classes: RELI

Grabbing Classes: STPA

Grabbing Classes: NANO

Grabbing Classes: SXTH

Grabbing Classes: REV

Grabbing Classes: PHIL

Grabbing Classes: TDAC

Grabbing Classes: TDHD

Grabbing Classes: SDCC

Grabbing Classes: TDHT

Grabbing Classes: SE

Grabbing Classes: TDCH

Grabbing Classes: TDMV

Grabbing Classes: PHYS

Grabbing Classes: TDDE

Grabbing Classes: TDDR

Grabbing Classes: TDPF

Grabbing Classes: TDGE

Grabbing Classes: TDPW

Grabbing Classes: TDTR

Grabbing Classes: TWS

Grabbing Classes: TMC

Grabbing Classes: USP

Grabbing Classes: SIO

Grabbing Classes: VIS

Grabbing Classes: WARR

Grabbing Classes: WCWP
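
The log lines above come out in whatever order the worker threads happen to run, but `pool.map` still returns its results in the order of the inputs. Here is a minimal sketch of that behavior. `fake_fetch` is a hypothetical stand-in that sleeps instead of hitting the network, and the import path `multiprocessing.pool.ThreadPool` is an assumption about where the talk's `ThreadPool` comes from.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_fetch(code):
    """Stand-in for create_table: sleep briefly instead of making a request."""
    # Make the first item the slowest so it finishes last.
    time.sleep(0.05 if code == "ANTH" else 0.01)
    return code.lower()


codes = ["ANTH", "BENG", "CHEM", "CSE"]

with ThreadPool(4) as pool:
    results = pool.map(fake_fetch, codes)

print(results)  # ['anth', 'beng', 'chem', 'cse']
```

Threads help here because the work is dominated by waiting on HTTP responses, so the workers spend most of their time idle rather than competing for the interpreter.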

In [25]:
df


Out[25]:
instructor course term enroll evals recommend_class recommend_instructor study_hours_per_week average_grade_expected average_grade_received link first_name last_name class_id department class_name
section_id
594782 Schwake, Sonja A. ANTH 3 - World Prehistory (A) SU07 16 15 1.000 1.000 6.63 3.29 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Sonja A Schwake ANTH 3 ANTH World Prehistory
594783 Buehler, Lukas K. BIBC 100 - Structural Biochemistry (A) SU07 115 90 0.857 0.871 6.97 3.35 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Lukas K Buehler BIBC 100 BIBC Structural Biochemistry
594787 Coleman, Aaron B. BIBC 102 - Metabolic Biochemistry (A) SU07 108 82 0.888 0.938 6.72 3.10 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Aaron B Coleman BIBC 102 BIBC Metabolic Biochemistry
594816 Towb, Par BILD 1 - The Cell (A) SU07 98 74 0.930 0.887 7.73 3.47 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Par Towb BILD 1 BILD The Cell
594820 Towb, Par BILD 2 - Multicellular Life (A) SU07 96 73 0.944 0.958 6.69 3.48 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Par Towb BILD 2 BILD Multicellular Life
594830 Gustafson-Brown, Cindy BILD 10 - Fundamental Concepts/Modrn Bio (A) SU07 36 25 0.708 0.708 5.25 2.71 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Cindy Gustafson-Brown BILD 10 BILD Fundamental Concepts/Modrn Bio
594833 Saier, Milton H. BILD 18 - Human Impact on the Environmnt (A) SU07 23 17 1.000 1.000 2.85 3.80 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Milton H Saier BILD 18 BILD Human Impact on the Environmnt
594835 Ghiara, Jayant BIMM 100 - Molecular Biology (A) SU07 187 106 0.938 0.980 6.48 3.44 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Jayant Ghiara BIMM 100 BIMM Molecular Biology
594854 Zupanc, Gunther Karl-Heinz BIPN 142 - Systems Neurobiology (A) SU07 39 29 0.929 0.926 5.91 3.19 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Gunther Karl-Heinz Zupanc BIPN 142 BIPN Systems Neurobiology
594868 Ternansky, Robert J. CHEM 4 - Basic Chemistry (A) SU07 6 6 1.000 1.000 5.50 2.80 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Robert J Ternansky CHEM 4 CHEM Basic Chemistry
594870 Ball, Ian James CHEM 100A - Analytical Chem Lab (A) SU07 17 17 0.533 1.000 16.15 3.50 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Ian James Ball CHEM 100A CHEM Analytical Chem Lab
594872 Ternansky, Robert J. CHEM 140A - Organic Chemistry I (A) SU07 140 108 0.887 0.943 11.99 3.08 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Robert J Ternansky CHEM 140A CHEM Organic Chemistry I
594874 Nefzi, Adel CHEM 140C - Organic Chemistry III (A) SU07 162 105 0.861 0.941 8.21 3.20 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Adel Nefzi CHEM 140C CHEM Organic Chemistry III
594876 Weizman, Haim CHEM 143A - Organic Chemistry Laboratory (A) SU07 76 49 0.894 1.000 11.01 3.38 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Haim Weizman CHEM 143A CHEM Organic Chemistry Laboratory
594881 Hoeger, Carl CHEM 6A - General Chemistry I (A) SU07 66 34 0.818 0.939 9.31 3.41 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Carl Hoeger CHEM 6A CHEM General Chemistry I
594882 Dipasquale, Antonio CHEM 6B - General Chemistry II (A) SU07 104 79 0.744 0.753 7.29 3.07 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Antonio Dipasquale CHEM 6B CHEM General Chemistry II
594883 Ball, Ian James CHEM 6BL - Intr Inorganic Chem Laboratory (A) SU07 83 60 0.509 0.707 12.50 3.21 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Ian James Ball CHEM 6BL CHEM Intr Inorganic Chem Laboratory
594888 Hoeger, Carl CHEM 6C - General Chemistry III (A) SU07 89 54 0.904 0.942 9.09 3.33 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Carl Hoeger CHEM 6C CHEM General Chemistry III
594891 Osborn, Wayne H. COMM 146 - Adv Topics/Communication Cult (A) SU07 23 18 1.000 1.000 4.72 3.44 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Wayne H Osborn COMM 146 COMM Adv Topics/Communication Cult
594892 Davis, Patricia Gail COMM 10 - Introduction to Communication (A) SU07 36 28 0.963 1.000 5.57 3.44 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Patricia Gail Davis COMM 10 COMM Introduction to Communication
594894 Becvar, Laura A. COGS 10 - Cognitv Consequence/Technology (A) SU07 12 6 1.000 1.000 5.70 3.50 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Laura A Becvar COGS 10 COGS Cognitv Consequence/Technology
594896 Groppe, David M COGS 14 - Design & Analysis of Expermnts (A) SU07 20 15 0.933 1.000 3.17 3.62 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... David M Groppe COGS 14 COGS Design & Analysis of Expermnts
594898 Robinson, Alan E. COGS 101B - Learning, Memory and Attention (A) SU07 24 19 0.842 0.667 6.61 3.26 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Alan E Robinson COGS 101B COGS Learning, Memory and Attention
594900 Davis, Patricia Gail COMM 120M - Media Stereotypes (A) SU07 44 33 0.767 0.800 5.17 3.30 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Patricia Gail Davis COMM 120M COMM Media Stereotypes
594903 Schudson, Michael S. COMM 109N - American News Media (A) SU07 10 9 1.000 1.000 5.00 3.71 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Michael S Schudson COMM 109N COMM American News Media
594904 Ord, Richard CSE 3 - Fluency/Information Technology (A) SU07 35 26 0.962 1.000 3.12 3.96 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Richard Ord CSE 3 CSE Fluency/Information Technology
594905 Marx, Susan S CSE 11 - Intr/Computer Sci&Obj-Ori:Java (A) SU07 13 11 1.000 0.909 10.14 2.90 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Susan S Marx CSE 11 CSE Intr/Computer Sci&Obj-Ori:Java
594909 Kleint, John Timothy CSE 30 - Computer Organiz&Systms Progrm (A) SU07 4 5 1.000 1.000 12.90 3.40 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... John Timothy Kleint CSE 30 CSE Computer Organiz&Systms Progrm
594910 Glick, John Edward CSE 100 - Advanced Data Structures (A) SU07 30 23 0.957 1.000 10.41 3.65 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... John Edward Glick CSE 100 CSE Advanced Data Structures
594911 Paturi, Ramamohan CSE 101 - Design & Analysis of Algorithm (A) SU07 23 16 0.938 0.938 9.17 3.33 NaN http://cape.ucsd.edu/scripts/detailedStats.asp... Ramamohan Paturi CSE 101 CSE Design & Analysis of Algorithm
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
854374 Xiang, Jie ECE 35 - Introduction to Analog Design (B) FA15 100 51 0.936 0.936 10.28 2.87 2.42 http://cape.ucsd.edu/responses/CAPEReport.aspx... Jie Xiang ECE 35 ECE Introduction to Analog Design
854566 Hamaoka, Brent Y CHEM 7L - General Chemistry Laboratory (C) FA15 132 60 0.946 0.821 7.01 3.35 3.54 http://cape.ucsd.edu/responses/CAPEReport.aspx... Brent Y Hamaoka CHEM 7L CHEM General Chemistry Laboratory
854722 Evans, Ivan T SOCI 87 - Freshman Seminar (B) FA15 17 9 1.000 1.000 2.06 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Ivan T Evans SOCI 87 SOCI Freshman Seminar
854895 Halicioglu, Daniel T CSE 191 - Semnr/Computer Sci & Engineer (C) FA15 11 4 1.000 1.000 0.50 NaN NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Daniel T Halicioglu CSE 191 CSE Semnr/Computer Sci & Engineer
855238 Roxworthy, Emily TDGE 25 - Public Speaking (C) S215 13 4 1.000 1.000 6.00 3.75 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Emily Roxworthy TDGE 25 TDGE Public Speaking
856120 Griswold, William G. ENG 100L - Design for Development Lab (0) FA15 7 3 1.000 0.500 5.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... William G Griswold ENG 100L ENG Design for Development Lab
856147 Coimbra, Carlos F. ENG 100L - Design for Development Lab (0) FA15 9 6 1.000 1.000 2.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Carlos F Coimbra ENG 100L ENG Design for Development Lab
856148 Smith, David M. ENG 100L - Design for Development Lab (0) FA15 14 6 1.000 1.000 5.00 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... David M Smith ENG 100L ENG Design for Development Lab
856149 Kleissl, Jan ENG 100L - Design for Development Lab (0) FA15 9 4 1.000 0.750 2.00 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Jan Kleissl ENG 100L ENG Design for Development Lab
856150 Bartsch, Dirk-Uwe Guenther ENG 100L - Design for Development Lab (0) FA15 8 1 0.000 0.000 0.00 NaN NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Dirk-Uwe Guenther Bartsch ENG 100L ENG Design for Development Lab
856153 Cruz, Edwin Teddy ENG 100L - Design for Development Lab (0) FA15 9 2 1.000 1.000 3.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Edwin Teddy Cruz ENG 100L ENG Design for Development Lab
856154 Bratton, Maryann ENG 100L - Design for Development Lab (0) FA15 20 12 0.917 0.833 4.50 4.00 3.96 http://cape.ucsd.edu/responses/CAPEReport.aspx... Maryann Bratton ENG 100L ENG Design for Development Lab
856155 Pawlak, Geno Ronald ENG 100L - Design for Development Lab (0) FA15 6 2 1.000 1.000 4.50 3.50 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Geno Ronald Pawlak ENG 100L ENG Design for Development Lab
856157 Voelker, Geoffrey M. ENG 100L - Design for Development Lab (0) FA15 7 1 1.000 1.000 4.50 3.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Geoffrey M Voelker ENG 100L ENG Design for Development Lab
856774 Perez, Jason Magabo EDS 150 - CASP Transfer Intro Course (A) FA15 35 23 0.913 1.000 4.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Jason Magabo Perez EDS 150 EDS CASP Transfer Intro Course
856775 Mac Leod, Donald I. PSYC 90 - Undergraduate Seminar (A) FA15 23 12 1.000 1.000 0.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Donald I Mac Leod PSYC 90 PSYC Undergraduate Seminar
856787 Murakami, Hidenori MAE 93 - Design Compet/Design Race Car (A) FA15 20 9 0.778 0.667 4.25 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Hidenori Murakami MAE 93 MAE Design Compet/Design Race Car
858416 Caligagan, Maria T. TDMV 142 - Latin Dance of the World (B) FA15 41 14 1.000 0.929 3.07 3.86 3.94 http://cape.ucsd.edu/responses/CAPEReport.aspx... Maria T Caligagan TDMV 142 TDMV Latin Dance of the World
858417 Watkins, Eric PHIL 179 - Topic/Germn Phil Transltn-Adv (A) FA15 6 5 1.000 1.000 1.00 NaN NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Eric Watkins PHIL 179 PHIL Topic/Germn Phil Transltn-Adv
858423 Minnes Kemp, Mor Mia CSE 191 - Semnr/Computer Sci & Engineer (D) FA15 13 4 1.000 1.000 9.00 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Mor Mia Minnes Kemp CSE 191 CSE Semnr/Computer Sci & Engineer
858660 Ng, Kwai LAWS 101 - Contemporary Legal Issues (C) FA15 28 6 1.000 1.000 5.50 3.50 3.33 http://cape.ucsd.edu/responses/CAPEReport.aspx... Kwai Ng LAWS 101 LAWS Contemporary Legal Issues
858961 Cao, Yingjun CSE 90 - Undergraduate Seminar (B) FA15 59 46 0.913 0.978 0.81 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Yingjun Cao CSE 90 CSE Undergraduate Seminar
858989 Cabrales, Pedro BENG 187B - BENG Design Project:Developmnt (B) FA15 114 35 0.968 0.968 7.47 3.75 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Pedro Cabrales BENG 187B BENG BENG Design Project:Developmnt
858990 Hanes, Esther PSYC 180 - Adolescence (B) FA15 85 41 0.974 1.000 4.29 3.54 3.12 http://cape.ucsd.edu/responses/CAPEReport.aspx... Esther Hanes PSYC 180 PSYC Adolescence
859278 Linke, Sarah Elizabeth PSYC 124 - Clinical Assessment/Treatment (B) FA15 79 69 0.971 0.913 4.53 3.48 3.35 http://cape.ucsd.edu/responses/CAPEReport.aspx... Sarah Elizabeth Linke PSYC 124 PSYC Clinical Assessment/Treatment
859289 Cabrales, Pedro BENG 139A - Design Develpmt Molecular BENG (A) FA15 12 5 0.750 1.000 6.00 3.50 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Pedro Cabrales BENG 139A BENG Design Develpmt Molecular BENG
860178 Strom, Megan C. DOC 100D - Promises&Contradictions/USCult (D) FA15 20 12 1.000 1.000 5.17 3.20 3.57 http://cape.ucsd.edu/responses/CAPEReport.aspx... Megan C Strom DOC 100D DOC Promises&Contradictions/USCult
860255 Carver, Leslie J. PSYC 193 - Topics in Psychology (A) FA15 3 1 1.000 1.000 2.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Leslie J Carver PSYC 193 PSYC Topics in Psychology
860256 Booker, Angela N COMM 114K - CSI: Community Field Work (A) FA15 11 3 1.000 1.000 4.50 4.00 NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Angela N Booker COMM 114K COMM CSI: Community Field Work
861922 Vul, Edward PSYC 99 - Independent Study (A) FA15 1 1 1.000 1.000 4.50 NaN NaN http://cape.ucsd.edu/responses/CAPEReport.aspx... Edward Vul PSYC 99 PSYC Independent Study

27514 rows × 16 columns
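
The last few columns of each row look like they are derived from the raw instructor and course strings: "Smith, David M." becomes "David M Smith", and "ENG 100L - Design for Development Lab (0)" splits into "ENG 100L", "ENG", and "Design for Development Lab". Here is a minimal sketch of how that cleanup could be done. The function names, the dictionary keys, and the regex are my own assumptions for illustration, not necessarily what the notebook's code does.

```python
import re

# Hypothetical pattern for raw course strings such as
# "ENG 100L - Design for Development Lab (0)".
COURSE_RE = re.compile(
    r"^(?P<dept>[A-Z]+) (?P<num>\w+) - (?P<title>.*?) \((?P<section>\w+)\)$"
)


def clean_instructor(raw):
    """Turn 'Last, First M.' into 'First M Last' (periods dropped)."""
    last, _, first = raw.partition(", ")
    return f"{first} {last}".replace(".", "").strip()


def split_course(raw):
    """Split a raw course string into course code, department, and title.

    Returns None when the string does not match the assumed layout.
    """
    match = COURSE_RE.match(raw)
    if match is None:
        return None
    return {
        "course": f"{match['dept']} {match['num']}",
        "dept": match["dept"],
        "title": match["title"],
        "section": match["section"],
    }


print(clean_instructor("Mac Leod, Donald I."))
print(split_course("BENG 187B - BENG Design Project:Developmnt (B)"))
```

With a pandas DataFrame, functions like these could be applied to a column with `Series.map`, which keeps the parsing logic testable on plain strings.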

