Portraits of dramatists and critics


Contents

  1. How it works
  2. Digital humanities today
  3. Design

How it works

CTI was started in July 2008. We strive to bring together a collection of documents that is large enough to provide a significant sample of inter-war critical discourse. Our general method is derived from corpus linguistics: given a large enough sample of language data, the meaning of a given expression from that language will emerge from the various instances that the corpus features. In this case, however, the expressions under consideration are not derived from a linguistic research programme, but rather from a research project concerning theatrical discourse ("Mass Theatre in Flanders" -- see About CTI).

Digital humanities today

The goal of Corpus Toneelkritiek Interbellum is to provide a well-organized and fully indexed corpus of documents from Flemish theatre criticism of the interbellum period. This collection takes advantage of recent advances in "digital humanities." Since more digital text is available than ever before, the technology for indexing and processing a large collection of documents has also become much more accessible (examples include Digital Texts 2.0, The Programming Historian, or Text Analysis Portal for Research).

Digital text allows for new possibilities in organizing and visualizing a collection of documents. Lists of word frequencies, concordancies, and indices may be generated in only a fraction of the time that was previously needed. Machine learning algorithms provide new ways to determine the author or date of a document. Texts may be grouped by automatic clustering -- using metadata or simply the data themselves, such as in clustering by compression -- in order to detect new groupings of historical data, and thus new ways to describe a historical development. Different manuscript versions of one single sentence may be instantly visualized.

Taken together, these developments imply that the first phase of digital text -- the digital library, exemplified by gutenberg.org -- is behind us now. Gutenberg.org, for all its merits, is poor on metadata. It does not provide any data on the source editions used to generate the digital documents. Subsequent digital libraries did much better. Witness, for example, the wealth of metadata included in projects such as the Perseus Digital Library, The William Blake Archive, or dbnl.org (Digital Library of Dutch Literature).

Digital Text

However, even digital libraries that feature excellent metadata not always succeed in taking advantage of all the possibilities which their document collection holds. The very concept of a "digital library" does not do justice to the potential of digital texts. Both long-existing repositories (such as gutenberg.org) and more recent projects conceive a digital library as a directory of digital texts with a website built around it. However modern the appearance, it does not add anything new to the age-old library concept -- in the sense of a directory of books with a building built around it.

Design

Corpus Toneelkritiek Interbellum was explicitly designed to make digital documents available online and visualize their mutual relationships by means of added metadata.

Next to every digital document, the reader may find a sidebar featuring all names, companies, and theatres that have been marked-up in the document. The reader may quickly decide if a text is useful by scanning the keywords.

Sidebar

Furthermore, related texts are suggested for every document. This was achieved by compiling a "document profile." Every profile includes the document's author, date, a list of all names mentioned in the text, and a list of the most frequent words (unigrams) and two-word expressions (bigrams), disregarding function words. Profiles are compared by computing the mutual overlap of the document with every other document:

degree of relatedness = (number of overlapping items) / (total length of both document profiles)

The result is a percentage that expresses how much two documents are related. The most strongly related documents are listed above and below the document that is being studied, together with the elements both documents have in common.

Related

This website was constructed from the original TEI XML files by parsing it to HTML using Python. See Credits