NEH Institute

planning week 2 [suggestion]

| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
| --- | --- | --- | --- | --- | --- |
| 9:00 - 10:00 | Modelling (basic principles) | Modelling (transcriptions) | Modelling (collation) | Modelling (annotations) | Modelling (queries and visualization) |
| 10:30 - 11:00 | Modelling (XML) | transcriptions / markup 1 | transcr./markup 2 | transcr./markup 3 | markup/annotation |
| 11:30 - 12:30 | Modelling (XML/TAG) | markup/tokenization 1 | tokenization 2 | annotation 1 | queries 1 |
| 12:30 - 2:00 | lunch | lunch | lunch | lunch | lunch |
| 2:00 - 3:30 | Modelling (TAG) | normalization | collation | annotation 2 | queries 2 |
| 3:30 - 4:00 | tea | tea | tea | tea | tea |
| 4:00 - 5:30 | review | review | collation | review | visualization/review |

Note: each day starts with a recap on modelling, focused on the topic of the day (respectively tokenization, collation, annotation, queries), and ends with time for review.

updated planning week 2

| Time/Day | Monday | Tuesday | Wednesday | Thursday | Friday |
| --- | --- | --- | --- | --- | --- |
| 9:00 - 10:30 | Model, syntax, and markup semantics | Computational pipelines | Modelling (collation), Transcription, Markup 2 | Modelling (annotations), Transcription, Markup 3 | Modelling (queries and visualization), Markup, Annotation |
| 10:30 - 11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00 - 12:30 | Transcription with markup: XML | Tokenization 1 | Tokenization 2 | Annotation 1 | Queries 1 |
| 12:30 - 14:00 | lunch | lunch | lunch | lunch | lunch |
| 14:00 - 15:30 | XML as a tree | Normalization | Collation 1 | Text Analytics | Queries 2 |
| 15:30 - 16:00 | tea | tea | tea | tea | tea |
| 16:00 - 17:30 | Transcription with markup (LMNL/Alexandria) | Review | Collation 2 | Review | Visualization/Review |

Week 2, Day 2: Tuesday, July 18

Synopsis

Week 2, Day 2 introduces the idea of processing pipelines. It focuses on modular approaches to and algorithmic aspects of textual criticism. Outcome goals:

  • A modular understanding of textual criticism
  • Modular approaches to designing and implementing an edition

9:00–10:30: Computational pipelines and text models

Introduction to the idea of computational pipelines.

The Gothenburg model (GM) of textual variation can serve as an example of a computational pipeline for the analysis of textual variation. The focus here is not on textual variation itself, however: tokenization and normalization are necessary steps in every form of text processing and analysis.

Concepts:

  • Tokenization
  • Normalization
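
To make the pipeline idea concrete for the session, a minimal Python sketch could look like the following. It assumes that every stage is a plain function taking the output of the previous stage; the names `run_pipeline`, `tokenize`, and `normalize` are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, Iterable

# A stage is any function that takes the previous stage's output
# and returns new data for the next stage.
Stage = Callable[[object], object]


def run_pipeline(data: object, stages: Iterable[Stage]) -> object:
    """Feed data through the stages in order and return the final result."""
    return reduce(lambda value, stage: stage(value), stages, data)


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer stage: split on whitespace."""
    return text.split()


def normalize(tokens: list[str]) -> list[str]:
    """Hypothetical normalizer stage: lowercase every token."""
    return [token.lower() for token in tokens]


result = run_pipeline("The Quick brown Fox", [tokenize, normalize])
print(result)  # ['the', 'quick', 'brown', 'fox']
```

Because the stages only share their input and output, each one can be swapped out or refined without touching the others, which is the modularity the day is meant to convey.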

11:00–12:30: Tokenization 1

Tokenizing using different text models: plain text, TEI/XML.

"Simple" tokenization: plain text

  • Exercises
  • Materials from the Amsterdam workshop?
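
As a possible starting point for the exercises, the sketch below contrasts two simple ways of tokenizing plain text. It is an illustration only; the function names and the regular expression are assumptions, not prescribed workshop material.

```python
import re


def tokenize_whitespace(text):
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


# Either a run of word characters, or one character that is
# neither a word character nor whitespace (i.e. punctuation).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_regex(text):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN_PATTERN.findall(text)


sample = "Well, that's that."
print(tokenize_whitespace(sample))  # ['Well,', "that's", 'that.']
print(tokenize_regex(sample))       # ['Well', ',', 'that', "'", 's', 'that', '.']
```

The two results differ in how they treat the comma and the apostrophe, which shows that even "simple" tokenization already encodes a decision about what counts as a unit of text.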

2:00–3:30: Normalization

Normalization strategies

  • part of speech tagging
  • upper case / lower case normalization
  • etc.
  • Issues
  • Materials(?)
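
The case-normalization strategy listed above could be demonstrated with a sketch like this one. The lookup table of spelling variants is a hypothetical addition for illustration; part-of-speech tagging is left out because it would need an external library.

```python
# Hypothetical table of spelling variants. A real project would derive
# its own table from the witnesses it is working with.
SPELLING_VARIANTS = {
    "colour": "color",
    "vniuersity": "university",
}


def normalize(token):
    """Fold case, then map known spelling variants to one shared form."""
    folded = token.casefold()
    return SPELLING_VARIANTS.get(folded, folded)


def normalize_tokens(tokens):
    """Pair every original token with its normalized form."""
    return [(token, normalize(token)) for token in tokens]


print(normalize_tokens(["The", "Colour", "cat"]))
# [('The', 'the'), ('Colour', 'color'), ('cat', 'cat')]
```

Keeping the original reading next to the normalized form means that later stages can compare the normalized forms while the edition still displays what the witness actually says.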

4:00–5:30: Review

Review section. Focus on participants' questions.

An alternative (in case all is clear) is to focus on markup as an expression of a data model: expand upon the tokenization of different kinds of text with markup (TEI) or, more generally, on processing text models.

Week 2, Day 3: Wednesday, July 19

9:00–10:30: Modelling (collation)

Modelling: collation, transcription, and markup. Concepts.

11:00–12:30: Tokenization 2

Advanced tokenization: XML markup; Unicode; SoundEx.
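
Two of these topics lend themselves to a short demonstration. The sketch below first shows why Unicode normalization matters when comparing tokens, then gives a compact implementation of the standard American Soundex code; the helper names are assumptions made for this example.

```python
import unicodedata


def nfc(text):
    """Return the text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


composed = "caf\u00e9"     # 'é' as one code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
print(composed == decomposed)            # False
print(nfc(composed) == nfc(decomposed))  # True

# Soundex digit for each consonant; vowels, h, w and y have no digit.
SOUNDEX_CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        # Skip letters without a digit and repeats of the previous digit.
        if digit and digit != previous:
            code += digit
        # 'h' and 'w' do not separate two letters with the same digit.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Soundex can act as a very coarse normalization: two spellings that receive the same code can be treated as the same token in later stages, at the cost of some false matches.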

2:00–3:30: Collation 1

Collation.

  • Gothenburg Model (GM)
  • Theory of automated collation and textual variation analysis
  • Collation goals
  • Exercises with automated collation software: CollateX and Stemmaweb?
  • near-matching?
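
Before turning to CollateX, the alignment stage of the Gothenburg Model could be introduced with a small stand-in example. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to align two tokenized witnesses; this is not the algorithm CollateX uses, and the function name `align` and the sample witnesses are hypothetical.

```python
from difflib import SequenceMatcher


def align(witness_a, witness_b):
    """Align two token lists and label each segment pair.

    Returns a list of (tag, tokens_a, tokens_b) rows, where tag is one of
    'equal', 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=witness_a, b=witness_b, autojunk=False)
    table = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        table.append((tag, witness_a[a_start:a_end], witness_b[b_start:b_end]))
    return table


witness_a = "the black cat sat".split()
witness_b = "the white cat sat down".split()

for row in align(witness_a, witness_b):
    print(row)
# ('equal', ['the'], ['the'])
# ('replace', ['black'], ['white'])
# ('equal', ['cat', 'sat'], ['cat', 'sat'])
# ('insert', [], ['down'])
```

Each row corresponds to one column group of an alignment table: agreement, substitution, or a reading present in only one witness. Pairwise comparison like this does not extend to more than two witnesses or to transpositions, which is one reason to move on to dedicated collation software.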

4:00–5:30: Collation 2

Collation and TEI/XML markup.

  • different approaches to markup collation:
    • flattening text
    • passing along markup
    • JSON (Sydney workshop)
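
The difference between flattening the text and passing markup along can be sketched as follows. The XML snippet is a hypothetical TEI-like line without namespaces, the `t` key follows the token-list style of JSON input that CollateX accepts, and the `markup` key is an assumption added purely for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical TEI-like transcription of one line (namespaces omitted).
SOURCE = "<l>The <del>black</del><add>white</add> cat</l>"


def tokens_with_markup(element, ancestors=()):
    """Tokenize the text nodes and record the enclosing elements per token."""
    path = ancestors + (element.tag,)
    tokens = [{"t": word, "markup": list(path)}
              for word in (element.text or "").split()]
    for child in element:
        tokens.extend(tokens_with_markup(child, path))
        # Text after a child's closing tag belongs to the current element.
        tokens.extend({"t": word, "markup": list(path)}
                      for word in (child.tail or "").split())
    return tokens


root = ET.fromstring(SOURCE)
tokens = tokens_with_markup(root)

# Approach 1: flatten the text, discarding all markup.
flattened = [token["t"] for token in tokens]
print(flattened)  # ['The', 'black', 'white', 'cat']

# Approach 2: pass the markup along as a property of each token.
witness = {"id": "A", "tokens": tokens}
print(json.dumps(witness, indent=2))
```

This sketch tokenizes each text node separately, so a word interrupted by a tag would end up as two tokens; handling that case is one of the issues worth discussing in the session.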

Week 2, Day 4: Thursday, July 20

Modelling: Annotations, Transcription. The goal is to make participants aware of "research-driven annotation" by letting them ask two questions: 1) What are the inherent properties of the text? 2) What do I need for my research? Describe the computational pipeline: research questions → data model (including query facilities) → markup/annotation.

9:00–10:30

  • Definition of Text Annotation
  • Pipeline of making annotations to text
  • Different approaches to and forms of annotation (allude e.g. to Alexandria)
    • TEI
    • Text-to-image linking; IIIF
    • etc.

11:00–12:30

...

2:00–3:30: Text analytics 1

MK: Bag of words, text processing, text as tables, query the tables
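
As one possible illustration of these topics, the sketch below builds a bag of words per document, stores the counts as table rows, and queries the rows. The two sample documents and all function names are hypothetical; the session itself may well use a dataframe library instead.

```python
from collections import Counter

# Hypothetical miniature corpus.
documents = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog sat",
}


def bag_of_words(text):
    """Count how often each lowercased word occurs, ignoring word order."""
    return Counter(text.lower().split())


# Text as a table: one row per (document, word, count).
table = [
    {"doc": doc_id, "word": word, "count": count}
    for doc_id, text in documents.items()
    for word, count in bag_of_words(text).items()
]


def query(rows, **conditions):
    """Return the rows whose columns equal all of the given values."""
    return [row for row in rows
            if all(row[column] == value for column, value in conditions.items())]


print(query(table, word="the"))
# [{'doc': 'doc1', 'word': 'the', 'count': 2}, {'doc': 'doc2', 'word': 'the', 'count': 1}]
```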

4:00–5:30: Text analytics 2

MK: Bag of words, text processing, text as tables, query the tables (continued)

Week 2, Day 5: Friday, July 21

9:00–10:30

11:00–12:30

2:00–3:30: Text analytics 3

MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis

4:00–5:30: Text analytics 4

MK: Unsupervised learning, cluster analysis, PCA, paleographic analysis (continued)

