| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 9:00 - 10:00 | Modelling (basic principles) | Modelling (transcriptions) | Modelling (collation) | Modelling (annotations) | Modelling (queries and visualization) |
| 10:30 - 11:00 | Modelling (XML) | Transcription / markup 1 | Transcription / markup 2 | Transcription / markup 3 | Markup / annotation |
| 11:30 - 12:30 | Modelling (XML/TAG) | Markup / tokenization 1 | Tokenization 2 | Annotation 1 | Queries 1 |
| 12:30 - 14:00 | lunch | lunch | lunch | lunch | lunch |
| 14:00 - 15:30 | Modelling (TAG) | Normalization | Collation | Annotation 2 | Queries 2 |
| 15:30 - 16:00 | tea | tea | tea | tea | tea |
| 16:00 - 17:30 | Review | Review | Collation | Review | Visualization / review |
Note: each day starts with a recap on modelling, focused on the topic of the day (tokenization, collation, annotation, and queries, respectively), and ends with time for review.
| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 9:00 - 10:30 | Model, syntax, and markup semantics | Computational pipelines | Modelling (collation), Transcription, Markup 2 | Modelling (annotations), Transcription, Markup 3 | Modelling (queries and visualization), Markup, Annotation |
| 10:30 - 11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00 - 12:30 | Transcription with markup: XML | Tokenization 1 | Tokenization 2 | Annotation 1 | Queries 1 |
| 12:30 - 14:00 | lunch | lunch | lunch | lunch | lunch |
| 14:00 - 15:30 | XML as a tree | Normalization | Collation 1 | Text Analytics | Queries 2 |
| 15:30 - 16:00 | tea | tea | tea | tea | tea |
| 16:00 - 17:30 | Transcription with markup (LMNL/Alexandria) | Review | Collation 2 | Review | Visualization/Review |
Week 2, Day 2 introduces the idea of processing pipelines. It focuses on modular approaches to, and algorithmic aspects of, textual criticism. Outcome goals:
- Introduction to the idea of computational pipelines.
- The Gothenburg model (GM) of textual variation serves as an example of a computational pipeline for the analysis of textual variation. The focus here is not on textual variation itself: tokenization and normalization are necessary steps for every form of text processing and analysis.
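As a minimal sketch of the pipeline idea (an illustrative assumption, not course code: the stage names follow the GM, but the tokenize and normalize implementations here are toys), each stage can be written as a small function and the pipeline as their composition:

```python
# Minimal sketch of a computational pipeline in the spirit of the
# Gothenburg model: each stage is a small function, and the pipeline
# is simply their composition. The implementations are deliberately toy.

def tokenize(text):
    # Split a plain-text witness on whitespace.
    return text.split()

def normalize(tokens):
    # Keep the original reading ("t") and add a normalized form ("n").
    return [{"t": t, "n": t.lower().strip(".,;:!?")} for t in tokens]

def run_pipeline(data, stages):
    # Apply each stage to the output of the previous one.
    for stage in stages:
        data = stage(data)
    return data

witness = "The sonne shines, the Sun shines."
print(run_pipeline(witness, [tokenize, normalize]))
```

Because each stage depends only on the output of the previous one, an individual stage (say, a more sophisticated tokenizer) can be swapped out without touching the rest of the pipeline.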
Concepts:
- Tokenizing using different text models: plain text, TEI/XML.
- "Simple" tokenization: plain text.
- Normalization strategies.
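To make the notion of normalization strategies concrete, here is a small comparison (illustrative only; the token and the choice of strategies are assumptions) of three common approaches applied to the same token:

```python
import unicodedata

def normalize_unicode(token):
    # Unicode normalization: fold canonically equivalent sequences (NFC here).
    return unicodedata.normalize("NFC", token)

def normalize_case(token):
    # Case folding: more thorough than lower() for non-ASCII scripts.
    return token.casefold()

def normalize_punctuation(token):
    # Strip leading/trailing punctuation, keep word-internal characters.
    return token.strip(".,;:!?\"'()")

token = "Poe\u0301sie,"  # "Poésie," spelled with a combining accent
for strategy in (normalize_unicode, normalize_case, normalize_punctuation):
    print(strategy.__name__, "->", repr(strategy(token)))
```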
Review session. Focus on participants' questions.
An alternative (in case everything is clear) is to focus on markup as an expression of a data model: expand upon the tokenization of different kinds of text with markup (TEI) or, more generally, on processing text models.
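As a sketch of tokenizing marked-up text (the TEI-like fragment below is invented, and namespaces are omitted for brevity), one simple approach walks the XML tree and tokenizes the text content element by element, so each token can still be related back to its place in the markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment; namespaces omitted for brevity.
tei = """<p>The <hi rend="italic">sonne</hi> shines, the sun shines.</p>"""

def tokenize_element(elem, path=""):
    """Yield (token, element path) pairs, keeping track of where
    each token came from in the markup."""
    here = f"{path}/{elem.tag}"
    if elem.text:
        for token in elem.text.split():
            yield token, here
    for child in elem:
        yield from tokenize_element(child, here)
        # Text that follows a child element belongs to the parent context.
        if child.tail:
            for token in child.tail.split():
                yield token, here

root = ET.fromstring(tei)
for token, where in tokenize_element(root):
    print(f"{token!r:12} from {where}")
```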
Modelling: collation, transcription, and markup. Concepts:
- Advanced tokenization: XML markup; Unicode; Soundex.
- Collation.
- Collation and TEI/XML markup.
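To illustrate collation at the token level, here is a self-contained sketch using only the Python standard library (a stand-in for a dedicated collation tool, with invented witnesses): two normalized token sequences are aligned, and agreements and variants are reported:

```python
from difflib import SequenceMatcher

def prepare(text):
    # Tokenize and normalize (lowercase, strip punctuation) for comparison.
    return [t.lower().strip(".,;:!?") for t in text.split()]

witness_a = prepare("The sonne shines bright, the moon does not.")
witness_b = prepare("The sun shines bright and the moon does not.")

matcher = SequenceMatcher(a=witness_a, b=witness_b, autojunk=False)
for op, a1, a2, b1, b2 in matcher.get_opcodes():
    if op == "equal":
        print("agree  :", " ".join(witness_a[a1:a2]))
    else:
        print("variant:", " ".join(witness_a[a1:a2]) or "-",
              "|", " ".join(witness_b[b1:b2]) or "-")
```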
Modelling: Annotations, Transcription. The goal is to make participants aware of "Research-driven annotation", letting them ask two questions: 1) What are the inherent properties of the text? 2) What do I need for my research? Describe the computational pipeline: research questions → data model (including query facilities) → markup/annotation.
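As a hedged sketch of that pipeline (the annotation categories, offsets, and query below are invented for illustration), annotations can be modelled as standoff records over the transcription, with the data model chosen so that the research question can be asked directly as a query:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    # Standoff annotation: character offsets into the transcription,
    # a category, and an optional normalized value.
    start: int
    end: int
    category: str
    value: str = ""

transcription = "Mr Darcy walked to Pemberley with Miss Bennet."

# Hypothetical research question: which persons co-occur with which places?
annotations = [
    Annotation(0, 8, "person", "Darcy"),
    Annotation(19, 28, "place", "Pemberley"),
    Annotation(34, 45, "person", "Bennet"),
]

def query(annotations, category):
    # A trivial "query facility": select annotations by category.
    return [a for a in annotations if a.category == category]

print("persons:", [a.value for a in query(annotations, "person")])
print("places: ", [a.value for a in query(annotations, "place")])
print("spans:  ", [transcription[a.start:a.end] for a in annotations])
```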
...
MK: Bag of words, text processing, text as tables, query the tables
MK: Bag of words, text processing, text as tables, query the tables (continued)
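A hedged sketch of the bag-of-words / text-as-tables idea (the witnesses and word counts are invented): documents become rows of a term-frequency table, which can then be queried like any other table:

```python
from collections import Counter
import pandas as pd

docs = {
    "witness_A": "the sun shines the sun rises",
    "witness_B": "the moon rises the moon sets",
}

# Bag of words: one Counter of token frequencies per document.
counts = {name: Counter(text.split()) for name, text in docs.items()}

# Text as a table: rows are documents, columns are word types.
table = pd.DataFrame(counts).T.fillna(0).astype(int)
print(table)

# Query the table: documents in which "sun" occurs at least twice.
print(table.query("sun >= 2"))
```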
MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis
MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis (continued)
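A similarly hedged sketch of the unsupervised-learning session (the feature matrix is random stand-in data, not real paleographic measurements): reduce the features with PCA, then cluster the result with k-means:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in feature matrix: rows could be letter forms or scribal hands,
# columns measured features. Here it is just two noisy blobs of random data.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 6)),
    rng.normal(loc=3.0, scale=0.5, size=(20, 6)),
])

# PCA: project the 6-dimensional features onto 2 components for inspection.
reduced = PCA(n_components=2).fit_transform(features)

# Cluster analysis: k-means on the reduced space, here with k=2.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

print("reduced shape:", reduced.shape)
print("cluster sizes:", np.bincount(labels))
```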