[Data, the Humanist's New Best Friend](index.ipynb)
*Syllabus*

Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 2304B

Office: UC 114

Office Hours: Mondays 3:00pm-5:00pm

Meets: Fall 2014, Mondays 12:30pm-2:30pm, Wednesdays 12:30pm-2:30pm

Room: AB 148

Description

This course is a hands-on, pragmatic introduction to the computer tools and theoretical aspects involved in the use of data by humanists in a variety of academic pursuits. It also introduces the techniques and methods used today to make sense of data from a Humanities point of view.

Data, the Humanist's New Best Friend is divided into three blocks, plus an introductory review of the Python programming language:

  • Data Mining, explaining the past and predicting the future by means of data analysis.
  • Text Analysis, producing valuable information from text sources.
  • Network Science, understanding complex structures by analyzing the relationships among their entities.

Pre-requisites

This course is intended for final-year undergraduate students. Since it relies heavily on the Python programming language, students must have mastered the fundamental concepts of programming. There are two routes to fulfilling the prerequisites:

  • Route 1. The student must have completed Computer Sciences CS 2120: Computing for Life Scientists, with a minimum standing of 60%. This course teaches the core concepts of algorithms and data structures, leading to the ability to write simple programs and scripts in Python.
  • Route 2. The student must have completed Digital Humanities DH 2220A/B: Computing and Informatics for the Humanities I and Digital Humanities DH 2221A/B: Computing and Informatics for the Humanities II, with a minimum standing of 60%. The former includes an introduction to programming, using Python, in order to create programs and scripts to address problems that arise in applied research. The latter introduces methods to deal with data.

The course will include three review classes on Python at the beginning of the term.

Course Aims

  • Highlight the importance of being able to handle data as an integral part of digital humanists' tasks.

  • Connect students to cutting-edge research in Digital Humanities.

  • Make students aware of the possibilities and limits of current personal computers, and of how certain tasks can be done more easily than they might expect.

  • Foster independence and self-sufficiency during research, and develop critical analysis skills.

  • Expose students to new ways of thinking while providing them with what they need to explore other computational tools on their own.

Learning Outcomes

Upon successfully completing this course, students will be able to:

  • Break problems down into the workflow that a program should follow in order to produce an adequate solution.
  • Write working code that does what the student wants it to do and helps the student automate tasks.
  • Draw on the wide variety of tools, libraries and applications already available and ready to use, so the student does not have to write everything from scratch.
  • Analyze data, extract value from it, and generate clear visualizations that highlight information that is almost invisible in traditional grids and spreadsheets.
  • Efficiently manipulate data of different kinds, such as text sources, networks, or tables.
  • Apply the knowledge acquired in this course to reflect on their own experience and needs as future researchers.

Course Materials

The course is based on micro-lectures, class activities, and recommended readings available on the calendar.

Micro-lectures are given as interactive Python notebooks, which can be viewed as regular HTML pages in a browser on the course website. However, to take full advantage of the materials and their interactivity, students are expected to set up the IPython environment, download the notebooks, and run them on their own computers.

All readings are recommended rather than required, and will be listed and linked (when available) in the class notebooks.

The OWL website will be used solely to submit the assignments and the final project.

Texts

There is no required textbook for this course. However, the following books may help in general terms:

Any extra material or readings will be uploaded or linked, and will be available to every student for downloading and printing (if needed).

Topics

Because of the way the class is taught, the material listed below will be adapted to the interests and abilities of the class:

  • Block 1: How to Think Like a Computer Scientist (Review)
    • Python syntax, variables and values
    • Statements and expressions
    • Functions
    • Control flow execution
    • Data types
  • Block 2: Dealing with Data
    • Arrays and Matrices
    • Cleaning data
    • Summary statistics
    • Statistical modeling
    • Visualizations
    • Correlations
    • Machine Learning
  • Block 3: Text Analysis
    • Tokenization
    • Word and phrase frequencies
    • Co-Occurrence and similarity
    • Word inflection and lemmatization
    • n-grams
    • Information extraction
    • Sentiment Analysis
    • Generative Writing
  • Block 4: Network Science
    • Graph Theory and Networks
    • Centrality Measures
    • Assortativity and Degree Correlations
    • Modularity and Community Structure
    • Small Worlds
    • Network Dynamics
    • Social Network Analysis
    • Plot Analysis

Evaluation

Evaluation is calculated as follows:

  • Attendance: 10% (more than 3 unjustified absences means a zero)
  • Participation: 15% (including activities done during the lectures)
  • Assignments: 36% (3 assignments, 12% each)
  • Final Project: 39% (5% Proposal, 20% Notebook, 14% Oral Presentation)
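
As a rough illustration of how these weights combine, the following Python sketch computes a final grade from per-component scores. It is not an official grading tool: the function name `final_grade`, the dictionary keys, and the assumption that every component is scored from 0 to 100 are all hypothetical, introduced here only for the example. The attendance rule (more than 3 unjustified absences means a zero for that component) is taken from the list above.

```python
# Weights (percent of the final grade) copied from the evaluation breakdown.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "attendance": 10,
    "participation": 15,
    "assignment_data_mining": 12,
    "assignment_text_analysis": 12,
    "assignment_network_science": 12,
    "project_proposal": 5,
    "project_notebook": 20,      # the 20% written part of the final project
    "project_presentation": 14,
}


def final_grade(scores, unjustified_absences=0):
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps each key of WEIGHTS to a score between 0 and 100
    (an assumption of this sketch, not something the syllabus states).
    """
    total = 0.0
    for component, weight in WEIGHTS.items():
        score = scores[component]
        # More than 3 unjustified absences zeroes the attendance component.
        if component == "attendance" and unjustified_absences > 3:
            score = 0
        total += weight * score / 100
    return total


# Example: a student with full marks everywhere but four unjustified absences.
perfect = {component: 100 for component in WEIGHTS}
print(final_grade(perfect, unjustified_absences=4))
```

In this sketch the weights sum to 100, so a student who loses the attendance component can reach at most 90% of the final grade.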

Attendance (10%)

Class attendance plays an important role in the successful completion of this course. Students are expected to come to class regularly and those missing four or more classes will automatically lose their attendance grade.

Participation (15%)

In-class participation: students are expected to come to class prepared to participate actively. During the theoretical explanation, a set of activities related to the content of the class will be proposed for students to solve in situ. These activities are meant to give students the opportunity to apply the knowledge they have acquired in the course. Students will work in groups or with a partner, depending on the number of students enrolled in the course.

Participation will be evaluated according to the following criteria:

  • Questions and comments during the explanations.
  • Activities done in class. Some may be assigned as take-home activities to be sent to the instructor.

Assignments (36%)

There will be three assignments, one for each of the three main practical blocks of content (the introduction and programming review are excluded). Each assignment will cover a real use case in Digital Humanities.

  • Data Mining (12%).
  • Text Analysis (12%).
  • Network Science (12%).

Final Project (39%)

Students must take into account the different theoretical approaches analyzed in class, choose a problem, phenomenon, dataset or topic of interest to them, and use at least two of the three blocks to write a Notebook about it. There is no minimum length for the Notebook, as long as the project covers all of these aspects. The final project is divided into three parts:

  • Proposal (5%). One page that clearly outlines what the student has in mind for the final project. A preliminary bibliography must be provided.
  • Notebook (20%). The project itself will be an IPython Notebook, since that is the best way to evaluate the skills acquired in exploratory programming.
  • Presentation (14%). Oral presentations will be given in class at the end of the course, and the format is entirely up to the student.

Note: Depending on the number of students, final projects might be carried out in groups, since there are only two classes available for all the oral presentations.

Plagiarism

Plagiarism is a major academic offence (see Scholastic Offence Policy in the Western Academic Calendar). Plagiarism is the inclusion of someone else’s verbatim or paraphrased text in one’s own written work without immediate reference. Verbatim text must be surrounded by quotation marks, or indented if it is longer than four lines. A reference must follow right after borrowed material (usually the author’s name and page number). Without immediate reference to borrowed material, a list of sources at the end of a written assignment does not protect a writer against the possible charge of plagiarism. Western University uses a plagiarism-checking service called Turnitin.

Absenteeism

Students seeking academic accommodation on medical grounds for any missed tests, exams, participation components and/or assignments must apply to the Academic Counselling office of their home faculty and provide documentation. Academic accommodation cannot be granted by the instructor or department.

Accessibility Statement

Please contact the course instructor if you require material in an alternate format or if you require any other arrangements to make this course more accessible to you. You may also wish to contact Services for Students with Disabilities (SSD) at 661-2111 x 82147 with any specific questions regarding an accommodation.

Calendar

The schedule of classes and the deadlines for assignments are available in the course calendar.