Computing In Context

Social Sciences Track

Matthew L. Jones

History

Lecture the first

you will not learn to price options here, sorry

Let's start with data

The good, the bad, and the downright ugly

Profoundly paradoxical

The good

Beauty in the eye of the beholder

'{\n"num_found": 193,\n    "results": [\n {\n"speaker_state": null,\n"speaker_first": null,\n"congress": 106,\n"title": "Amendment No. 3753",\n"origin_url": "http://origin.www.gpo.gov/fdsys/pkg/CREC-2000-07-13/html/CREC-2000-07-13-pt1-PgS6593.htm",\n"number": 90,\n"order": 154,\n"volume": 146,\n            "chamber": "Senate",\n"session": 2,\n"id": "CREC-2000-07-13-pt1-PgS6593.chunk154",\n"speaking": [\n"Cyber Attack Sensing and Warning...................................20"\n],\n"capitolwords_url": "http://capitolwords.org/date/2000/07/13/S6593_amendment-no-3753/",\n"speaker_party": null,\n"date": "2000-07-13",\n"bills": null,\n"bioguide_id": null,\n"pages": "S6593-S6598",\n"speaker_last": null,\n"speaker_raw": "recorder"\n},\n{\n"speaker_state": null,\n"speaker_first": null,\n"congress": 105,\n"title": "THE DISTRICT OF COLUMBIA APPROPRIATIONS ACT, 1998",\n"origin_url": "http://origin.www.gpo.gov/fdsys/pkg/CREC-1997-11-09/html/CREC-1997-11-09-pt2-PgS12315-2.htm",\n"number": 157,\n"order": 1798,\n"volume": 143,\n"chamber": "Senate",\n"session": 1,\n"id": "CREC-1997-11-09-pt2-PgS12315-2.chunk1798",\n"speaking": [\n"biological, nuclear, and cyber attack prevention and response "\n],\n"capitolwords_url": "http://capitolwords.org/date/1997/11/09/S12315-2_the-district-of-columbia-appropriations-act-1998/",\n"speaker_party": null,\n"date": "1997-11-09",\n"bills": null,\n"bioguide_id": null,\n"pages": "S12315-S12391",\n"speaker_last": null,\n"speaker_raw": "recorder"\n},\n{\n"speaker_state": "GA",\n"speaker_first": "Paul",\n"congress": 106,\n
{u'bills': None,
 u'bioguide_id': u'C000813',
 u'capitolwords_url': u'http://capitolwords.org/date/2000/03/02/S1149_authority-for-committees-to-meet/',
 u'chamber': u'Senate',
 u'congress': 106,
 u'date': u'2000-03-02',
 u'id': u'CREC-2000-03-02-pt1-PgS1149.chunk8',
 u'number': 22,
 u'order': 8,
 u'origin_url': u'http://origin.www.gpo.gov/fdsys/pkg/CREC-2000-03-02/html/CREC-2000-03-02-pt1-PgS1149.htm',
 u'pages': u'S1149-S1150',
 u'session': 2,
 u'speaker_first': u'Paul',
 u'speaker_last': u'Coverdell',
 u'speaker_party': u'R',
 u'speaker_raw': u'mr. coverdell',
 u'speaker_state': u'GA',
 u'speaking': [u"Mr. President, I ask unanimous consent that the Senate Committee on Governmental Affairs be authorized to meet during the session of the Senate on Thursday, March 2, 2000 at 10 a.m., for a hearing entitled ``Cyber Attack: Is the Government Safe?''"],
 u'title': u'AUTHORITY FOR COMMITTEES TO MEET',
 u'volume': 146}
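Why this counts as "good": once the text is parsed, every field can be asked for by name. A minimal sketch, assuming Python 3 and the standard `json` module. The `raw` string is a hand-trimmed copy of the response above (most fields dropped), not the full output.

```python
import json

# Hand-trimmed copy of the response shown above; most fields are omitted.
raw = '''
{
  "num_found": 193,
  "results": [
    {"congress": 106, "chamber": "Senate", "date": "2000-07-13",
     "title": "Amendment No. 3753", "speaker_last": null},
    {"congress": 105, "chamber": "Senate", "date": "1997-11-09",
     "title": "THE DISTRICT OF COLUMBIA APPROPRIATIONS ACT, 1998",
     "speaker_last": null}
  ]
}
'''

response = json.loads(raw)  # JSON text -> nested dicts and lists
results = response["results"]

# JSON null becomes Python None, so missing speakers are easy to spot.
for record in results:
    speaker = record["speaker_last"] or "(no speaker recorded)"
    print(record["date"], record["title"], speaker)

dates = [record["date"] for record in results]
```

The same few lines would work on the full response; only the `raw` string changes.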

A little less good, but we'll deal

"You will learn to love me," quoth this data.

8510000600121001710032070014100100000000001010001400141000035991200001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600222001710032070014100100000000001010001400141000023991200001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600321001710032070014100100000000001010001400141000030991200001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600421001710032070014100100000000001010001400141000018992200001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600521001710032070014100100000000001010001400141000017992200001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600621001710032070014100100000000001010001400141000012992210001111010000000100000000000000000000000000BROWN JAMES W 0 8510000600722001710032070014100100000000001010001400141000009991200001111010000000100000000000000000000000000BROWN JAMES [...]

data list file ='C:\slave1850_1.dat'/
 year       1-2
 datanum   3
 serial     4-8
 slavenum  9-11
 weight  12-13
 reel   14-17
 [...]
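The column list above is all a program needs to carve up each line. A minimal sketch in Python, assuming only the six fields listed before the "[...]"; the names `LAYOUT` and `parse_record` are made up for illustration.

```python
# Column layout copied from the `data list` statement above
# (positions are 1-indexed and inclusive). Fields after "[...]" are unknown.
LAYOUT = [
    ("year", 1, 2),
    ("datanum", 3, 3),
    ("serial", 4, 8),
    ("slavenum", 9, 11),
    ("weight", 12, 13),
    ("reel", 14, 17),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of field name -> raw string."""
    # Python slices are 0-indexed and exclude the end, hence start - 1.
    return {name: line[start - 1:end] for name, start, end in layout}


# First 17 characters of the first record shown above.
sample = "85100006001210017"
record = parse_record(sample)
print(record)
```

The values are left as strings on purpose: leading zeros may carry meaning, and converting to numbers is a decision to make later, not by accident.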

now it starts to get bad....


In [11]:
from IPython.display import Image
Image("http://ieeexplore.ieee.org/xploreAssets/images/absImages/01173453.png")


Out[11]:

And then the truly ugly



Ye olde PDF of a bad scan of an Excel spreadsheet

Fun with FOIA!

laws of data accessibility

First law of data accessibility

never discuss data accessibility

Second law of data accessibility

data is actually useful in inverse proportion to its readability by mere mortals

In other words, if you can read it easily (and you are not a computer), then the computer probably can't read it easily.

Our gripes with the bad data practices of others lead us to impose a law unto ourselves:

Third law of data accessibility

It is a universal maxim to strive to produce our data findings in formats good for human beings and also in formats open for other computational tools
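One way to obey the third law, sketched with the Python standard library only: publish the same findings once for people and once for machines. The `rows` tally is illustrative, counted by hand from the three Congressional Record entries shown earlier.

```python
import csv
import io
import json

# Illustrative tally of the three records visible earlier in this lecture
# (Congress number -> how many of those records belong to it).
rows = [{"congress": 105, "records": 1}, {"congress": 106, "records": 2}]

# For human beings: an aligned plain-text table.
human = "\n".join(
    "Congress {congress:>4}: {records} record(s)".format(**row) for row in rows
)

# For other tools: CSV and JSON, both open, documented formats.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["congress", "records"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()

as_json = json.dumps(rows)

print(human)
print(as_csv)
print(as_json)
```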

First Blackbox


In [18]:
#first blackbox
from IPython.display import YouTubeVideo
YouTubeVideo('W8h5OEivJdA')


Out[18]:

They took the credit for your second symphony

re-written by machine and new technology

and now I understand the problems you can see.

Oh oh -- I met your children

oh oh -- what did you tell them?

video killed the radio star

video killed the radio star

pictures came and broke your heart

we can't rewind we've gone too far

--Buggles, 1979

fundamental thesis of "Video Killed the Radio Star"

In my mind and in my car

We can't rewind we've gone too far

Pictures came and broke your heart

Put the blame on VTR

lest the point be lost: the “radio star” is stuck in a plastic tube from which she cannot escape [1:51]

metaphysics of 'Video Killed the Radio Star'

  • agency of technology
  • inevitability
  • outside our control
  • evolution unilinear
  • clear link to normative
    • technological development means (creative) destruction and downsizing
    • necessarily so, so accept it
    • "disruption" etc.

Examples

  • NSA: We can collect communications, so we should/must/can't not do it
  • “Internet Killed the Video Star: How In-House Internet Distribution of Home Video Will Affect Profit”
  • “Video killed the radio star, but has Google killed the learning organization?”

technological determinism

roughly, a belief that technology causes social, economic and cultural transformation

often belief that technology primary, most important cause of these changes
often belief that technology has internal dynamic, a univocal path of development

belief in technological determinism itself a major cause

even if false (as it surely is), belief in inevitability of technological change a major political and economic argument
need to figure out what to do given the change, or surrender to its (non-extant) inevitability

OUR COURSE:

Not "digital" literacy

Technological autonomy

Letting us direct technology critically, rather than being ruled by it.

too binary: the trick is to learn the affordances of extant technologies while appreciating tradeoffs

YouTubeVideo('W8h5OEivJdA')

is very easy. All I had to do was copy a part of the web address (aka URL).
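The copying can itself be handed to Python. A minimal sketch, assuming the video's address has the usual `watch?v=...` shape (the lecture gives only the ID, so the full URL below is an assumption); `youtube_id` is a made-up helper name.

```python
from urllib.parse import urlparse, parse_qs


def youtube_id(url):
    """Return the value of the `v` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


# Assumed URL shape; only the ID itself appears in the lecture.
url = "https://www.youtube.com/watch?v=W8h5OEivJdA"
video_id = youtube_id(url)
print(video_id)
```

The result is the string that goes inside `YouTubeVideo(...)`. Other address shapes (short links, embeds) would need their own handling.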

  • no global optimal solution in use of pre-built technologies
  • recognize problems with solutions
  • confidence in opening black boxes IF AND WHEN called for

black box

We will at first use a fair number of black boxes to get you moving. These are procedures, initially rote gobbledygook. We'll get back to some of them. Others will likely remain rote unless you descend deeper into programming.

  • black boxes enable and constrain
    • think of doing graphs in Excel or typography in Word

opening boxes starts with learning Python

tie together black boxes that help us

then start opening them if necessary

python = best gosh darn'd data manipulation tool yet invented

Our process

  • find data sources, good and bad
  • data munge (clean, massage, correct, format and normalize them)
  • analyze and process them
  • visualize them
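The four steps in miniature, as a rough sketch using only the Python standard library. The dates are copied from the three Congressional Record entries shown earlier; a real project would have far more data and a proper plotting tool.

```python
from collections import Counter

# Step 1, find: dates copied from the three entries shown earlier.
found = ["2000-07-13", "1997-11-09", "2000-03-02"]

# Step 2, munge: normalize each date string down to an integer year.
years = [int(date.split("-")[0]) for date in found]

# Step 3, analyze: count entries per year.
per_year = Counter(years)

# Step 4, visualize: a crude text bar chart, one '#' per entry.
chart = ["{} {}".format(year, "#" * per_year[year]) for year in sorted(per_year)]
print("\n".join(chart))
```

Every step here is a choice: reducing a date to a year already throws information away.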

Learn the insides of some algorithms

Mostly learn how to use 'blackboxes' and to properly format data for them

  • learn some of their affordances and dangers
  • the more you use a blackbox, the more you should take it apart and redo it

"Raw data" is an oxymoron.

We make data from sources: we don't find it pregiven.

Data is made, not born: fully artificial

Artificiality of data first moment of reflection

  • who produced this data?
  • is there a documented standard for this data? what interests produced this standard?
  • what is and isn't recorded?
  • how frequently? Are these sensors calibrated? Are the people drunk half the time? What sort of drunk?
  • what systems of classification used?
  • what thrown out and how?

Against the repressive hypothesis

Could treat as negative:

artificial therefore false

Or artificial therefore way to create something positive

Artificiality of data as positive critical stance

  • Biology (Bioinformatics)
  • Literary criticism (Ramsay)
  • Sociology
  • History
  • Polisci

Our plan

But this is an experiment.

  • Doing things with numbers
  • Doing things with text (at least two lectures)
  • Doing things with networks
  • Doing things with maps
