| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 9:00 - 10:00 | Modelling (basic principles) | Modelling (transcriptions) | Modelling (collation) | Modelling (annotations) | Modelling (queries and visualization) |
| 10:30 - 11:00 | Modelling (XML) | Transcription / markup 1 | Transcription / markup 2 | Transcription / markup 3 | Markup / annotation |
| 11:30 - 12:30 | Modelling (XML/TAG) | Markup / tokenization 1 | Tokenization 2 | Annotation 1 | Queries 1 |
| 12:30 - 14:00 | lunch | lunch | lunch | lunch | lunch |
| 14:00 - 15:30 | Modelling (TAG) | Normalization | Collation | Annotation 2 | Queries 2 |
| 15:30 - 16:00 | tea | tea | tea | tea | tea |
| 16:00 - 17:30 | Review | Review | Collation | Review | Visualization / review |
Note: each day starts with a recap on modelling, focused on the topic of the day (tokenization, collation, annotation, and queries, respectively), and ends with time for review.
| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 9:00 - 10:30 | Model, syntax, and markup semantics | Computational pipelines | Modelling (collation), Transcription, Markup 2 | Modelling (annotations), Transcription, Markup 3 | Modelling (queries and visualization), Markup, Annotation |
| 10:30 - 11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00 - 12:30 | Transcription with markup: XML | Tokenization 1 | Tokenization 2 | Annotation 1 | Queries 1 |
| 12:30 - 14:00 | lunch | lunch | lunch | lunch | lunch |
| 14:00 - 15:30 | XML as a tree | Normalization | Collation 1 | Text Analytics | Queries 2 |
| 15:30 - 16:00 | tea | tea | tea | tea | tea |
| 16:00 - 17:30 | Transcription with markup (LMNL/Alexandria) | Review | Collation 2 | Review | Visualization/Review |
Week 2, Day 2 introduces the idea of processing pipelines. It focuses on modular approaches to, and algorithmic aspects of, textual criticism. Outcome goals:
- Introduction to the idea of computational pipelines.
- The Gothenburg model (GM) of textual variation serves as an example of a computational pipeline for the analysis of textual variation. The focus here is not on textual variation itself: tokenization and normalization are necessary steps for every form of text processing and analysis.
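As a minimal sketch of the pipeline idea (an illustrative assumption, not course code: the stage names follow the GM, but the tokenize and normalize implementations here are toys), each stage can be written as a small function and the pipeline as their composition:

```python
# Minimal sketch of a computational pipeline in the spirit of the
# Gothenburg model: each stage is a small function, and the pipeline
# is simply their composition. The implementations are deliberately toy.

def tokenize(text):
    # Split a plain-text witness on whitespace.
    return text.split()

def normalize(tokens):
    # Keep the original reading ("t") and add a normalized form ("n").
    return [{"t": t, "n": t.lower().strip(".,;:!?")} for t in tokens]

def run_pipeline(data, stages):
    # Apply each stage to the output of the previous one.
    for stage in stages:
        data = stage(data)
    return data

witness = "The sonne shines, the Sun shines."
print(run_pipeline(witness, [tokenize, normalize]))
```

Because each stage depends only on the output of the previous one, an individual stage (say, a more sophisticated tokenizer) can be swapped out without touching the rest of the pipeline.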
Concepts:
- Tokenizing using different text models: plain text, TEI/XML.
- "Simple" tokenization: plain text.
- Normalization strategies.
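To make the notion of normalization strategies concrete, here is a small comparison (illustrative only; the token and the choice of strategies are assumptions) of three common approaches applied to the same token:

```python
import unicodedata

def normalize_unicode(token):
    # Unicode normalization: fold canonically equivalent sequences (NFC here).
    return unicodedata.normalize("NFC", token)

def normalize_case(token):
    # Case folding: more thorough than lower() for non-ASCII scripts.
    return token.casefold()

def normalize_punctuation(token):
    # Strip leading/trailing punctuation, keep word-internal characters.
    return token.strip(".,;:!?\"'()")

token = "Poe\u0301sie,"  # "Poésie," spelled with a combining accent
for strategy in (normalize_unicode, normalize_case, normalize_punctuation):
    print(strategy.__name__, "->", repr(strategy(token)))
```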
Review session. Focus on participants' questions.
An alternative (in case everything is clear) is to focus on markup as an expression of a data model: expand upon the tokenization of different kinds of text with markup (TEI) or, more generally, on processing text models.
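As a sketch of tokenizing marked-up text (the TEI-like fragment below is invented, and namespaces are omitted for brevity), one simple approach walks the XML tree and tokenizes the text content element by element, so each token can still be related back to its place in the markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment; namespaces omitted for brevity.
tei = """<p>The <hi rend="italic">sonne</hi> shines, the sun shines.</p>"""

def tokenize_element(elem, path=""):
    """Yield (token, element path) pairs, keeping track of where
    each token came from in the markup."""
    here = f"{path}/{elem.tag}"
    if elem.text:
        for token in elem.text.split():
            yield token, here
    for child in elem:
        yield from tokenize_element(child, here)
        # Text that follows a child element belongs to the parent context.
        if child.tail:
            for token in child.tail.split():
                yield token, here

root = ET.fromstring(tei)
for token, where in tokenize_element(root):
    print(f"{token!r:12} from {where}")
```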
Modelling: collation, transcription, and markup. Concepts:
- Advanced tokenization: XML markup; Unicode; Soundex.
- Collation.
- Collation and TEI/XML markup.
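To illustrate collation at the token level, here is a self-contained sketch using only the Python standard library (a stand-in for a dedicated collation tool, with invented witnesses): two normalized token sequences are aligned, and agreements and variants are reported:

```python
from difflib import SequenceMatcher

def prepare(text):
    # Tokenize and normalize (lowercase, strip punctuation) for comparison.
    return [t.lower().strip(".,;:!?") for t in text.split()]

witness_a = prepare("The sonne shines bright, the moon does not.")
witness_b = prepare("The sun shines bright and the moon does not.")

matcher = SequenceMatcher(a=witness_a, b=witness_b, autojunk=False)
for op, a1, a2, b1, b2 in matcher.get_opcodes():
    if op == "equal":
        print("agree  :", " ".join(witness_a[a1:a2]))
    else:
        print("variant:", " ".join(witness_a[a1:a2]) or "-",
              "|", " ".join(witness_b[b1:b2]) or "-")
```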
Modelling: Annotations, Transcription. The goal is to make participants aware of "Research-driven annotation", letting them ask two questions: 1) What are the inherent properties of the text? 2) What do I need for my research? Describe the computational pipeline: research questions → data model (including query facilities) → markup/annotation.
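As a hedged sketch of that pipeline (the annotation categories, offsets, and query below are invented for illustration), annotations can be modelled as standoff records over the transcription, with the data model chosen so that the research question can be asked directly as a query:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    # Standoff annotation: character offsets into the transcription,
    # a category, and an optional normalized value.
    start: int
    end: int
    category: str
    value: str = ""

transcription = "Mr Darcy walked to Pemberley with Miss Bennet."

# Hypothetical research question: which persons co-occur with which places?
annotations = [
    Annotation(0, 8, "person", "Darcy"),
    Annotation(19, 28, "place", "Pemberley"),
    Annotation(34, 45, "person", "Bennet"),
]

def query(annotations, category):
    # A trivial "query facility": select annotations by category.
    return [a for a in annotations if a.category == category]

print("persons:", [a.value for a in query(annotations, "person")])
print("places: ", [a.value for a in query(annotations, "place")])
print("spans:  ", [transcription[a.start:a.end] for a in annotations])
```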
...
MK: Bag of words, text processing, text as tables, query the tables
MK: Bag of words, text processing, text as tables, query the tables (continued)
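A hedged sketch of the bag-of-words / text-as-tables idea (the witnesses and word counts are invented): documents become rows of a term-frequency table, which can then be queried like any other table:

```python
from collections import Counter
import pandas as pd

docs = {
    "witness_A": "the sun shines the sun rises",
    "witness_B": "the moon rises the moon sets",
}

# Bag of words: one Counter of token frequencies per document.
counts = {name: Counter(text.split()) for name, text in docs.items()}

# Text as a table: rows are documents, columns are word types.
table = pd.DataFrame(counts).T.fillna(0).astype(int)
print(table)

# Query the table: documents in which "sun" occurs at least twice.
print(table.query("sun >= 2"))
```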
MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis
MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis (continued)
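A similarly hedged sketch of the unsupervised-learning session (the feature matrix is random stand-in data, not real paleographic measurements): reduce the features with PCA, then cluster the result with k-means:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in feature matrix: rows could be letter forms or scribal hands,
# columns measured features. Here it is just two noisy blobs of random data.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 6)),
    rng.normal(loc=3.0, scale=0.5, size=(20, 6)),
])

# PCA: project the 6-dimensional features onto 2 components for inspection.
reduced = PCA(n_components=2).fit_transform(features)

# Cluster analysis: k-means on the reduced space, here with k=2.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

print("reduced shape:", reduced.shape)
print("cluster sizes:", np.bincount(labels))
```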