This notebook introduces students to popular computational tools used in the Digital Humanities and Social Sciences and the research possibilities they create. It then gives an abbreviated introduction to Python that focuses on analysis and prepares for the Twitter data analysis on Day 2.
Estimated Time: 180 minutes
Topics Covered:
Parts:
Online Point-And-Click Tools
Common Programming Languages for Research and Visualization
Qualitative Data Analysis
Geospatial Analysis
Data Management
Qualitative Analysis
Quantitative Analysis
Model Building and Machine Learning
Linguistic Analysis and Natural Language Processing (NLP)
Visualization
Pedagogy
The `=` symbol assigns the value on the right to the name on the left. Here, an age is assigned to a variable `age` and a name in quotation marks to a variable `first_name`.
In [ ]:
age = 42
first_name = 'Ahmed'
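Variable names that start with underscores, like `__alistairs_real_age`, have a special meaning, so we won't do that until we understand the convention.
Use `print` to display values. Python has a built-in function called `print` that prints things as text.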
In [ ]:
print(first_name, 'is', age, 'years old')
`print` automatically puts a single space between items to separate them.
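As a small sketch of this spacing behaviour, the cell below compares `print` with a string assembled by hand. The variable names and values are made up for illustration, and the `sep` keyword argument of `print` is not used elsewhere in this notebook.

```python
# Illustrative values only; they are not part of the notebook's data.
item = 'apples'
count = 3

# print() inserts a single space between the items it is given.
print('There are', count, item)

# Building the same text by hand means spelling out the spaces
# and converting the number to a string first.
manual = 'There are' + ' ' + str(count) + ' ' + item
print(manual)

# The optional sep keyword argument replaces the default single space.
print(item, count, sep=' | ')
```

Letting `print` handle the spacing is usually less error-prone than concatenating pieces manually.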
In [ ]:
print(last_name)
Variable names are case-sensitive, so `Name` and `name` are different variables.
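A minimal sketch of case sensitivity, using two hypothetical variables that differ only in the case of their first letter:

```python
# Hypothetical variable names chosen only to show case sensitivity.
name = 'lowercase n'
Name = 'uppercase N'

# Each name refers to its own value.
print(name)
print(Name)
print('Same value?', name == Name)
```

Because the two names are unrelated as far as Python is concerned, a typo in capitalization usually shows up as a `NameError` or as an unexpected value.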
In [ ]:
flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')
In [ ]:
age = age + 3
print('Age in three years:', age)
In [ ]:
* Integer (`int`): counting numbers like 3 or -512.
* Floating point number (`float`): fractional numbers like 3.14159 or -2.5.
* Character string (`str`): text.

Use the built-in function `type` to find out what type a value has.
In [ ]:
print(type(52))
In [ ]:
pi = 3.14159
print(type(pi))
In [ ]:
fitness = 'average'
print(type(fitness))
In [ ]:
print(5 - 3)
In [ ]:
print('hello' - 'h')
In [ ]:
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
In [ ]:
separator = '=' * 10
print(separator)
In [ ]:
print(len(full_name))
In [ ]:
print(len(52))
In [ ]:
print(1 + '2')
Should `1 + '2'` be `3` or `'12'`?
In [ ]:
print(1 + int('2'))
print(str(1) + '2')
In [ ]:
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
In [ ]:
first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)
Python reads the value of `first` when doing the multiplication, creates a new value, and assigns it to `second`. After that, `second` does not remember where it came from.
* Create a variable `year` and assign it as the year you were born.
* Convert it to a float, and assign the result to `year_float`.
* Convert `year_float` to a string, and assign it to a new variable `year_string`.
In [ ]:
In [ ]:
first_name = "Johan"
last_name = "Gambolputty"
full_name = first_name + last_name
print(full_name)
In [ ]:
full_name = first_name + " " + last_name
print(full_name)
Characters in a string can be accessed by position using square brackets `[]`.
In [ ]:
full_name[1]
Gotcha: Python (and many other languages) starts counting from 0.
In [ ]:
full_name[0]
In [ ]:
full_name[4]
In [ ]:
full_name[0:4]
In [ ]:
full_name[0:5]
In [ ]:
full_name[:5]
In [ ]:
full_name[5:]
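The slices above follow a half-open convention: the start index is included and the stop index is excluded. A short sketch on a made-up string (the name `word` is only for illustration):

```python
# Hypothetical example string; any string behaves the same way.
word = 'notebook'

first_four = word[0:4]   # characters at positions 0, 1, 2 and 3
rest = word[4:]          # everything from position 4 to the end

print(first_four)
print(rest)

# Because the stop index is excluded, the two pieces rejoin
# with nothing lost or repeated.
print(first_four + rest == word)

# The length of a slice is the stop index minus the start index.
print(len(word[2:6]))
```

This is why `full_name[:5]` and `full_name[5:]` split a string into two pieces that do not overlap.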
In [ ]:
str.
In [ ]:
str.upper?
So we can use it to upper-caseify a string.
In [ ]:
full_name.upper()
You have to use the parentheses at the end because `upper` is a method of the string class.
Don't forget: simply calling the method does not change the original variable. You must reassign the variable:
In [ ]:
print(full_name)
In [ ]:
full_name = full_name.upper()
print(full_name)
For what it's worth, you don't need a variable to use the `upper()` method; you can call it on the string itself.
In [ ]:
"Johann Gambolputty".upper()
What do you think should happen when you take upper of an int? What about a string representation of an int?
In [ ]:
In [ ]:
tweet = 'RT @JasonBelich: #March4Trump #berkeley elderly man pepper sprayed by #antifa https://t.co/5z3O6UZuhL'
Using this tweet, try seeing what the following string methods do:
* `split`
* `join`
* `replace`
* `strip`
* `find`
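Before trying these on the tweet, here is a rough sketch of the five methods applied to a made-up string. The string and variable names are assumptions for illustration only and are not part of the tweet data.

```python
# A made-up string used only for illustration; it is not from the tweet data.
text = '  learning python, one step at a time  '

stripped = text.strip()                    # remove leading and trailing whitespace
words = stripped.split()                   # split on whitespace into a list of words
rejoined = '-'.join(words)                 # glue the list back together with '-'
swapped = stripped.replace('python', 'R')  # new string with the substring replaced
position = stripped.find('python')         # index where 'python' starts, or -1 if absent

print(stripped)
print(words)
print(rejoined)
print(swapped)
print(position)
```

Note that `join` is called on the separator string and takes the list as its argument, which is the reverse of how `split` is called.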
In [ ]:
In [ ]:
country_list = ["Afghanistan", "Canada", "Sierra Leone", "Denmark", "Japan"]
type(country_list)
Use `len` to find out how many values are in a list.
In [ ]:
len(country_list)
In [ ]:
print('the first item is:', country_list[0])
print('the fourth item is:', country_list[3])
In [ ]:
print(country_list[-1])
print(country_list[-2])
In [ ]:
print(country_list[1:4])
In [ ]:
print(country_list[:4])
In [ ]:
print(country_list[2:])
In [ ]:
country_list[0] = "Iran"
print('Country List is now:', country_list)
In [ ]:
mystring = "Donut"
mystring[0] = 'C'
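The assignment `mystring[0] = 'C'` fails with a `TypeError` because strings, unlike lists, cannot be changed in place. One possible workaround, sketched below, is to build a new string and reassign the name; the variables `original` and `changed` are introduced here only for illustration.

```python
mystring = 'Donut'

# Strings are immutable, so build a new string from a new first letter
# plus a slice of the old one, then reassign the name.
mystring = 'C' + mystring[1:]
print(mystring)

# replace() also returns a new string rather than modifying the original.
original = 'Donut'
changed = original.replace('D', 'C')
print(original, changed)
```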
Use `object_name.method_name` to call methods.
In [ ]:
country_list.
Use the `append` method to add an item to the end of a list.
In [ ]:
country_list.append("United States")
print(country_list)
In [ ]:
print("original list was:", country_list)
del country_list[3]
print("the list is now:", country_list)
In [ ]:
complex_list = ['life', 42, 'the universe', [1,2,3]]
print(complex_list)
In [ ]:
print(complex_list[3])
print(complex_list[3][0])
Use `[]` on its own to represent a list that doesn't contain any values.
Python reports an `IndexError` if we attempt to access a value that doesn't exist.
In [ ]:
print(country_list[99])
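One way to avoid this error, sketched here under the assumption that a simple length check is enough, is to compare the index with `len` before using it. The list `countries` and the variable `position` are stand-ins invented for this example.

```python
# A short stand-in list; the same check works for country_list.
countries = ['Canada', 'Denmark', 'Japan']
position = 99

# Valid non-negative indexes run from 0 up to len(countries) - 1.
if position < len(countries):
    print(countries[position])
else:
    print('No item at index', position, 'because the list has only', len(countries), 'items')

last_valid = len(countries) - 1
print('The last valid index is', last_valid)
```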
In [ ]:
hashtags = ['#March4Trump',
'#Fascism',
'#TwitterIsFascist',
'#majority',
'#CouldntEvenStopDeVos',
'#IsTrumpCompromised',
'#Berkeley',
'#NotMyPresident',
'#mondaymotivation',
'#BlueLivesMatter',
'#Action4Trump',
'#impeachtrump',
'#Periscope',
'#march',
'#TrumpRussia',
'#obamagate',
'#Resist',
'#sedition',
'#NeverTrump',
'#maga']
print(hashtags[::2])
print()
print(hashtags[::-1])
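The two slices above use the optional third slice value, the step. A small sketch on a made-up list (the name `letters` is only for illustration) shows the pattern `list[start:stop:step]`:

```python
# A small made-up list so the result is easy to check by eye.
letters = ['a', 'b', 'c', 'd', 'e', 'f']

every_other = letters[::2]     # start at the beginning, take every second item
reversed_copy = letters[::-1]  # a step of -1 walks the list backwards
middle_step = letters[1:5:2]   # items at indexes 1 and 3

print(every_other)
print(reversed_copy)
print(middle_step)

# Slicing returns new lists; the original is unchanged.
print(letters)
```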
How long is the `hashtags` list?
In [ ]:
Use the `.index()` method to find out what the index number is for `#Resist`:
In [ ]:
Read the help file (or the Python documentation) for `join()`, a string method.
In [ ]:
str.join?
Using the `join` method, concatenate all the values in `hashtags` into one long string:
In [ ]:
Using the string `replace` method and the list `index` method, print 'Never Trump' without the '#':
In [ ]: