Sustaining communities in Digital Humanities: diversity, collaboration, and capacity
building in digital scholarship and pedagogy
Anna-Maria Sichani

Research Fellow in Media History & Historical Data Modeling, School of Media, Arts and Humanities | Sussex Humanities Lab, University of Sussex

Current discourses around the term “sustainability” in Digital Humanities have been
focusing on our expectations of longevity for digital projects as regards mainly their
technological, financial and environmental aspects. While we are designing strategies in
order to secure the necessary financial streams and the technical robustness that will
ensure the maintenance of a digital project, we often omit to think about ways to sustain
the communities of people around the project, both creators and users.
By focusing on the example of The Programming Historian, I am going to explore ways that
allow us to invest in and sustain communities of people in digital scholarship. Prioritising
the human aspect of a digital project by supporting diversity, collaboration, and capacity
building among team members should be seen as the key sustainability element in our
digital scholarship practice and pedagogy.

Progress and Challenges in Olfactory Information Extraction
Sara Tonelli

Head of Digital Humanities, FBK, Trento

Our senses are our gateways to the past. Much more so than any other sense, our sense of smell is linked directly to our emotions and our memories. However, smells are intangible and very difficult to preserve, making it hard to effectively identify, consolidate, and promote the wide-ranging role scents and smelling have in our cultural heritage. While some novel approaches have been recently proposed to monitor so-called urban smellscapes (Quercia et al., 2015), relying both on smellwalks in different cities and on the extraction of olfactory information from social media, when it comes to smellscapes from the past little research has been done to keep track of how places and events have been described from an olfactory perspective. Fortunately, some key prerequisites for addressing this problem are now in place. In recent years, European cultural heritage institutions have invested heavily in large-scale digitisation: we hold a wealth of object, text and image data which can now be analysed using artificial intelligence. What remains missing is a methodology for scent information extraction, as well as a broader awareness of the wealth of historical olfactory descriptions, experiences and memories contained within the heritage datasets. In this talk I will describe ongoing activities towards this goal, focused on text mining and linguistic annotation for olfactory information extraction. I will present some examples related to the olfactory annotation of travel narratives as well as the creation of a multilingual taxonomy for smells. I will discuss the main findings and the challenges related to modelling textual descriptions of odours, including the metaphorical use of smell-related terms and the well-known limitations of smell vocabulary in European languages compared to other senses.   

24) Coding Art: Multi-Modal Iconclass Codes Retrieval using Neural Networks

Nikolay Banar1,2, Walter Daelemans1, and Mike Kestemont1,2

1CLiPS, University of Antwerp, Antwerp, Belgium
2ACDC, University of Antwerp, Antwerp, Belgium

Iconclass (Brandhorst, 2019) is an iconographic classification system used by cultural heritage institutions to describe and retrieve content in artworks. Iconclass codes represent objects, people, events and ideas depicted in visual artworks. Assigning Iconclass codes is a complicated task which requires (expensive) highly-trained subject experts. The large number of available codes makes this task very challenging. In spite of the increasing application of machine learning in digital heritage, the automatic assignment of Iconclass codes still remains an unsolved task.

In this work, we apply the latest advances in information retrieval to match artworks with Iconclass codes. We use the textual features from Dutch and English artwork titles, as well as the visual features, to match Iconclass codes. First, we apply the “Self-Attention Embeddings” framework (SAEM; Wu et al., 2019) to investigate the feasibility of cross-modal (image-to-text) matching. Second, we investigate text-to-text matching using the text branch of SAEM. Finally, we propose a multi-modal extension of SAEM in order to exploit multi-lingual and visual data simultaneously.
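The matching step described above can be pictured as nearest-neighbour ranking in a shared embedding space. The following is only a minimal sketch under stated assumptions: the three-dimensional vectors and the labels `code_A`, `code_B`, `code_C` are invented placeholders, and cosine similarity is assumed as the ranking score. In a system such as SAEM the vectors would be produced by trained image and text encoders, which are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_codes(query_vec, code_vecs):
    """Return code labels sorted from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), label) for label, vec in code_vecs.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored]


# Invented toy embeddings (hypothetical, not real Iconclass codes).
code_vecs = {
    "code_A": [0.9, 0.1, 0.0],
    "code_B": [0.1, 0.8, 0.1],
    "code_C": [0.0, 0.2, 0.9],
}

# Invented embedding of one artwork (image or title).
artwork_vec = [0.8, 0.2, 0.1]

ranking = rank_codes(artwork_vec, code_vecs)
print(ranking)
```

Because both modalities end up in the same vector space, the same ranking function serves image-to-text and text-to-text matching; only the encoder that produces the query vector changes.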

40) Ancient Egyptian tombs in the digital age: painting semantic landscapes (POSTER)

Nicky van de Beek
In this poster I will demonstrate the use and possibilities of semantic network analysis on iconographic and textual sources from ancient Egyptian tombs. For my PhD research I study representations of landscape in ancient Egyptian private tombs from the Old to the New Kingdom (ca. 2700-1050 BCE). These elite tombs are decorated with an elaborate iconography of ‘scenes of daily life’, showing the tomb owner engaged in various activities such as hunting game in the desert, spearing fish in the marshes and inspecting the agricultural harvest. This is often set against a stylized landscape of hills, water, vegetation and animal life, in which human activities such as fishing, fowling, planting and herding are shown. These scenes offer unique insight into the way in which the ancient Egyptians perceived, used and manipulated their natural environment. The scenes are accompanied by hieroglyphic captions explaining the activity taking place, the (emic) terminology of which reveals how they classified the world around them.

10) Finding your way through Keystroke Logging Data: Visualizations of the Born-Digital Literary Writing Process

Lamyk Bekius (Huygens ING (KNAW) & University of Antwerp)

Genetic criticism investigates analogue drafts and manuscripts to gain insights into the author’s way of working and the creative process, as well as to gain a better understanding of the work itself. Nowadays, most literary authors compose their text in a digital environment. This has led to the fear of the end of genetic criticism (Mathijsen 2009). However, the research carried out by a number of authors shows a variety of techniques to study born-digital writing processes which proves that the digital writing process leaves sufficient traces to ensure genetic analysis (Ries (2018, 2014); Kirschenbaum and Reside (2013), Kirschenbaum (2008), Crombez and Cassiers (2017), Vásári (2019), Vauthier (2016) and Fenoglio (2009)). In the NWO-project ‘Track Changes: Textual Scholarship and the challenge of digital literary writing’ (2018-2023) we investigate the possibilities of yet another tool for enabling the study of the genesis of a digital (literary) writing process: the usage of the keystroke logger Inputlog at the moment of composition. Inputlog is developed at the University of Antwerp and allows the author to write in Microsoft Word (Leijten and Van Waes 2013). While the program is running, it records every keystroke and mouse movement in combination with a timestamp and saves the Word document at the start and end of each writing session.
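To make the idea of a timestamped keystroke log concrete, here is a minimal sketch of how such a log could be replayed to rebuild the text and to locate long pauses. The event format `(timestamp_ms, position, action, char)` is a hypothetical simplification chosen for illustration; it is not Inputlog's actual output format.

```python
def replay(events):
    """Rebuild the text by applying insert/delete events in time order."""
    text = []
    for _, position, action, char in sorted(events):
        if action == "insert":
            text.insert(position, char)
        elif action == "delete":
            del text[position]
    return "".join(text)


def long_pauses(events, threshold_ms=2000):
    """Return (start, end) timestamp pairs separated by at least threshold_ms."""
    times = sorted(event[0] for event in events)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= threshold_ms]


# Hypothetical log: the writer types "Ross", pauses, then corrects it to "Rose".
events = [
    (0, 0, "insert", "R"),
    (150, 1, "insert", "o"),
    (300, 2, "insert", "s"),
    (450, 3, "insert", "s"),
    (3000, 3, "delete", ""),
    (3200, 3, "insert", "e"),
]

print(replay(events))
print(long_pauses(events))
```

Replaying only the events up to a chosen timestamp yields the text as it stood at that moment, which is the kind of intermediate state a genetic analysis is interested in.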

In this paper, I will report on my ongoing PhD research (2018-2022) on how the keystroke logging material can yield (new) opportunities for genetic criticism, here specifically through the use of different visualizations of the writing process. The research focuses on the Inputlog material of the writing process of the novel Roosevelt by the Flemish author Gie Bogaert. Bogaert wrote Roosevelt in 422 writing sessions over a period of 259 days. These writing sessions resulted in 277 hours, 14 minutes and 22 seconds of keystroke logging data and 453 Word documents that show the text in different stages of composition. This vast amount of data can be overwhelming and may obscure valuable insights on the writing process from a textual perspective. Therefore, building on existing methodologies in genetic editing, I explore the role of the scholar in making the writing process intelligible from a textual genetic perspective by means of visualizations.

3) Distant and close reading in literature: a case of networks in Periodical Studies

Julie M. Birkholz and Leah Budke (Ghent University)

The question of how to read, and specifically the opposed approaches of distant and close reading, has long been contentious in the humanities and specifically in the field of literary studies. In practice, computational tools are increasingly implemented by scholars, resulting in a constantly evolving debate around the reading of research objects. We seek to explain through the lens of network studies the various ways networks are being used and can be used in humanities research including modelling information as relational, visualizing networks, and implementing quantitative network analysis. We argue that, in contemporary research practice, distant and close reading necessarily coexist and are methodological approaches that answer different yet complementary research questions. This is explained through three types of applications of network studies which focus on periodicals: close reading and network visualisations, close reading and network visualisations and analysis, and the computational generation of networks from databases for exploring relational phenomena with network analysis. We assert that there is a need for greater awareness and transparency about the role different approaches play in present-day research. Instead of pitting these two approaches against one another, we urge researchers to consider the mutual benefits of reading practices in order to facilitate a conversation on the value and role of different technologies and ways of reading in the humanities. This is a presentation of a forthcoming paper of the same title in a special issue on digital literary studies in the journal Interférences littéraires.

16) Evaluating the multilingual capabilities of the OCCAM workflow: a case of digitised historical newspapers

Julie M. Birkholz, Sally Chambers, Michal Hradis, and Pavel Smrz
(On behalf of the EU funded OCCAM Project on OCR, Classification & Machine Translation.)

Increased digitization of historical newspapers by cultural heritage institutions has allowed humanities scholars to expand their corpora in terms of volume and diversity. This has been accompanied by an increase in the use of computational tools. At the same time, it has given rise to a new set of questions around the accuracy and value of this data and related studies. In this presentation we will focus on the multilingual challenge of studying historical newspapers through the lens of Belgium: questioning how a humanities researcher with an understanding of one language (Dutch or French) can accurately implement a study of a national press without knowledge of the other most populous language. This question is core to the workflows being developed in the OCCAM (OCR, ClassificAtion & Machine Translation) project’s digital humanities case. OCCAM implements a workflow for the integration of image classification, Translation Memories (TMs), Optical Character Recognition (OCR), and Machine Translation (MT) to support the automated translation of scanned documents.

We explain this through the case of the Belgian press, using a set of multilingual historical newspapers from the historical newspaper collections of KBR, the Royal Library of Belgium: BelgicaPress. Through examples from a set of Dutch and French language newspapers from the early 1900s we explain how images of textual sources in multiple languages can efficiently be OCRed using the machine-learning-based model of PERO. In a subsequent workflow, the results of the OCR are then fed through a machine translation module. Because this module was originally developed for the machine translation of contemporary documents, we will report on the results obtained with our digitized historical newspapers test case, in order to support research on these historical documents.

22) Persistence, self-doubt, and curiosity: Surveying code literacy in Digital Humanities

Elli Bleeker1, Kaspar Beelen2, Sally Chambers3, Marijn Koolen1, Liliana Melgar Estrada4, and Joris van Zundert1
1Huygens ING
2Alan Turing Institute
3Ghent University
4Utrecht University

“Do humanities students need to know how to code?” Less than a decade ago this was a sincerely provocative question not just in the humanities but even within digital humanities (cf. Kirschenbaum, 2009; Ramsay 2013a, 2013b). At present digital humanists seem to agree that knowing how to code is relevant, if not essential for DH research (cf. inter alia Klein 2020). The community ponders if coding can be called “the new literacy”. This prompts the question: what do we mean by “code literacy”? Is it knowing how to add markup to texts, being proficient in a general purpose programming language, knowing how to apply statistics in R, or rather having a high-level understanding of data structures? To date, various researchers have tackled this question, presenting valuable insights that are often based on their own experiences with teaching digital humanities courses or with working in a digital humanities context (e.g. Van Zundert, Antonijević, and Andrews 2020). However, a broadly informed evidence-based examination of the meaning and status of code literacy in the humanities has remained a desideratum. With what is, to our knowledge, the largest ever survey on code literacy and related questions in the field, we hope to fill this gap. In our paper we present a number of preliminary findings from this survey. An exhaustive analysis augmented with qualitative interviews with respondents will be presented in a follow-up research article.

38) Using SPARQL to investigate the research potential of a Linked Open Data knowledge graph: the Mapping Manuscript Migrations project

Toby Burrows, Laura Cleaver, Doug Emery, Mikko Koho, Lynn Ransom, and Emma Thomson

The Mapping Manuscript Migrations (MMM) project has developed a large Linked Open Data knowledge graph relating to the histories of 220,000 medieval and Renaissance manuscripts. The graph aggregates data from three sources: the Schoenberg Database of Manuscripts, the Bibale database, and Oxford University’s catalogue of medieval manuscripts. The project produced a Web portal where users can browse, search, and visualize the data. With a combination of filtering and searching, this portal supports relatively complex queries. But the project team wanted to explore still more ambitious and analytical queries, using the SPARQL query language directly with the MMM triple store.

This paper reports on how the MMM project team used SPARQL to address research questions, enhance and expand the context of the MMM data, and carry out diagnostic exploration of the contents of the source datasets. It includes worked examples of specific queries, and compares them with other ways of querying the same dataset. It also reports on the lessons learned from this process.
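For readers unfamiliar with querying a triple store directly, the sketch below shows the general shape of an analytical SPARQL query and how it can be sent to an endpoint over HTTP. Everything specific in it is an assumption made for illustration: the endpoint URL, the `ex:` namespace, and the `ex:Manuscript` and `ex:hasOwner` terms are placeholders, not the MMM project's actual endpoint or data model.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show the request shape.
ENDPOINT = "https://example.org/sparql"

# Hypothetical vocabulary: count distinct manuscripts per owner, top ten.
QUERY = """
PREFIX ex: <https://example.org/schema/>
SELECT ?owner (COUNT(DISTINCT ?manuscript) AS ?n)
WHERE {
  ?manuscript a ex:Manuscript ;
              ex:hasOwner ?owner .
}
GROUP BY ?owner
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Build a SPARQL-protocol GET URL with the query as a URL-encoded parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Aggregations such as `GROUP BY` and `COUNT` are the kind of operation that is awkward to express through a faceted portal but straightforward in SPARQL.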

59) Linked open data and ISNI as a crucial bridge identifier for cultural heritage (POSTER)

Ann Van Camp (KBR, Royal Library of Belgium)

Keywords: Linked Open Data – International Standard Name Identifier (ISNI) – Cultural heritage – Scientific research – Learning network

This poster presents the LOD-ISNI project aimed at strengthening the long-term cooperation with international partners, and at promoting the use of Linked Open Data and ISNI as a crucial bridge identifier for cultural heritage materials.

36) Examining a multi-layered approach for classification of Optical Character Recognition (OCR) quality without Ground Truth

Mirjam Cuper

In the past few decades, more and more heritage institutions have made their collections digitally available. At the same time, the availability of computer-driven research tasks is increasing. With this combination, large data sets can be analysed in a fairly short time, something which would not be possible by hand. However, there is a pitfall: the OCR quality of digitised text is not always high enough. This leads to several possible problems which can cause bias in the research results, both on the information retrieval and the analysis level. Although most researchers are aware of the presence of OCR errors, often they are not able to quantify these errors or the impact on their research. This leads to uncertainty about whether results can be published or not [3]. A measure for the (relative) quality of OCR would be very beneficial in the field of Digital Humanities. Furthermore, digital heritage institutions can use such a measure for improving their digitised collection.
For a few collections, a so-called ‘Ground Truth’ set is available. A Ground Truth set consists of digitised texts that are manually corrected by humans. Therefore, these digitised texts are of high quality and contain very few or no errors. They can be used to determine the quality of the corresponding OCR output. However, the creation of a Ground Truth set is time-consuming and expensive, resulting in only a small number of available Ground Truth sets. The above leads to the following question: How can the quality of OCR-ed texts be determined when Ground Truth data is absent?
This study examines a multi-layered approach for measuring the OCR quality of digitised texts without the presence of a Ground Truth. Previous research has mentioned a few possibilities, such as garbage detection [2, 4], lexicality, and confidence values from the OCR engine [1]. All of these measurements have problems with their accuracy due to the nature of texts. Therefore, we propose a multi-layered approach that combines various measurements. We include statistical measurements, such as average sentence length, average word length and letter ratio. In addition, we include more complex measurements like the aforementioned garbage detection, language detection, and various approaches to dictionary lookup.
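Some of the simpler measurements named above can be sketched in a few lines. This is an illustrative approximation, not the study's implementation: the tokenisation rules, the definition of letter ratio (letters divided by non-whitespace characters) and the tiny lexicon are all assumptions made for the example.

```python
import re


def ocr_features(text, lexicon):
    """Compute simple, ground-truth-free quality indicators for an OCR-ed text."""
    words = re.findall(r"\S+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Strip surrounding punctuation and lowercase before the dictionary lookup.
    cleaned = [w.strip(".,;:!?\"'()").lower() for w in words]
    cleaned = [w for w in cleaned if w]
    if not cleaned or not sentences:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for ch in text)
    non_space = sum(not ch.isspace() for ch in text)
    in_lexicon = sum(w in lexicon for w in cleaned)
    return {
        "avg_word_length": sum(len(w) for w in cleaned) / len(cleaned),
        "avg_sentence_length": len(words) / len(sentences),
        "letter_ratio": letters / non_space,
        "lexicon_ratio": in_lexicon / len(cleaned),
    }


# Toy lexicon and two invented inputs: a clean sentence and a noisy OCR-like variant.
lexicon = {"the", "cat", "sat", "on", "mat"}
clean = ocr_features("The cat sat on the mat.", lexicon)
noisy = ocr_features("Tbe c@t s4t on tlie m@t.", lexicon)

print(clean)
print(noisy)
```

No single indicator is reliable on its own (short words, names and historical spelling all distort a dictionary lookup), which is the motivation for combining several layers.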

49) Interdisciplinary Resilience: critical humanities infrastructure in the times of a global pandemic: a Flemish case study

Sally Chambers (Ghent University), Wout Dillen (University of Antwerp; University of Borås), Tom Gheldof (KU Leuven), Wouter Ryckbosch (Vrije Universiteit Brussel), Vincent Vandeghinste (Instituut voor de Nederlandse Taal, INT), Christophe Verbruggen (Ghent University), and Els Lefever (Ghent University)

Keywords: Open Humanities, Humanities Data Infrastructure, Research infrastructure, DARIAH, CLARIN, CLARIAH

Researchers in the Humanities and Arts explore, contextualise, analyse and attribute meaning to cultural and social phenomena through the expressions of individual and collective human knowledge, perceptions, experiences, emotions and creativity documented in a rich variety of languages, literary and artistic works, archival records and artefacts. These cultural artefacts, preserved in galleries, libraries, archives and museums are the essential sources for humanistic exploration. Since the turn of the century, cultural heritage institutions around the world have been digitising such artefacts in earnest (Hughes, 2004; Terras, 2012). Digitisation and datafication (Borgman, 2015; Tasovac, 2020) of these gathered resources radically increase the human ability for interpretation and understanding of culture and society. At the same time, “digital humanities are uniquely placed to interpret and critique culture at the level of the infrastructure” (Liu, 2017). It is within this context that humanities research infrastructures such as DARIAH and CLARIN take on a critical role in facilitating computer-assisted humanities data exploration through advanced digital humanities methods. …
In this research paper, we will outline the three major infrastructural challenges that CLARIAH-VL intends to address with the aim of advancing digitally-enabled research in Humanities and the Arts in Flanders, Belgium and beyond. Furthermore, we will present the results achieved in the first phase of CLARIAH-VL (FWO IRI – 2019-2020) and how they will be developed further in this second phase:

50) Collections as Data: interdisciplinary experiments with KBR’s digitised historical newspapers: a Belgian case study

Sally Chambers¹,², Frédéric Lemmers¹, Thuy-An Pham¹, Julie Birkholz¹,², Antoine Jacquet¹,³, Wout Dillen⁴,⁶, Dilawar Ali⁵, and Steven Verstockt⁵
¹KBR, Royal Library of Belgium
²Ghent Centre for Digital Humanities (GhentCDH), Ghent University
³Université libre de Bruxelles, Sciences de l’information et de la communication Department
⁴Antwerp Centre for Digital Humanities and Literary Criticism (ACDC), University of Antwerp
⁵Internet Technology and Data Science Lab (IDLab), Ghent University
⁶University of Borås, Sweden

Keywords: Collections as Data; Digital Cultural Heritage; Digital Humanities Datasets; Data-level access; FAIR (Findable, Accessible, Interoperable, Reusable) data; Open Science

Digital cultural heritage collections in libraries, archives, and museums are increasingly being used for digital humanities research. However, traditional ways of providing access to such collections, for example through digital library interfaces, are less than ideal for researchers who are looking to build datasets around specific research questions. Inspired by the ‘Collections as Data’ movement as an approach for cultural heritage institutions to prepare their digital collections for analysis using digital methods, KBR, the Royal Library of Belgium, has embarked on a 24-month project called DATA-KBR-BE (2020-2022) to facilitate data-level access to its digitised and born-digital collections for digital humanities research.

This paper will: a) introduce the concept of ‘Collections as Data’ while exploring the opportunities and challenges for cultural heritage institutions; b) outline the DATA-KBR-BE project and the methodology for piloting ‘Collections as Data’ at KBR; and c) present the preliminary results of initial experiments to extract thematic datasets in support of digital humanities research scenarios as a first step towards designing a sustainable data extraction workflow for KBR.

21) Competition between Dutch and Latin: a Case of Cultural Drift?

Arjan van Dalfsen (Utrecht University), Folgert Karsdorp (KNAW Meertens Instituut), and Els Stronks (Utrecht University)

In recent years, several studies have emphasized the importance of neutral, stochastic forces in processes of cultural change. Phenomena varying from baby names (Bentley et al., 2004) to dog breeds (Herzog et al., 2004; Ghirlanda et al., 2014) and from archeological pottery (Shennan & Wilkinson, 2001; Hahn & Bentley, 2003; Bentley & Shennan, 2003) to irregular verbs (Newberry et al., 2017) were all shown to have patterns typical of neutral evolution. The term neutral evolution (or drift as it is also called) stems from biology. It refers to the type of evolutionary changes that, due to the workings of randomness, take place without affecting the fitness of an organism. Researchers in the field of Cultural Evolution have adopted the term as a possible explanation for developments in culture. While it is now commonly established in both the fields of evolutionary biology and cultural evolution that many changes can be accounted for (at least partly) by neutral evolution, humanities scholars in general have paid little attention to the possibility that random, neutral factors can have a severe impact on cultural change (Karsdorp, 2019). To enhance our understanding of the applicability of neutral theory in the humanities, this study examines whether the dynamics of Dutch texts compared with Latin texts between 1600 and 1800 might be an example of neutral drift.
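What neutral drift means in practice can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher-style copying model with two variants, which is a standard textbook model of drift and not necessarily the model used in this study; the population size, number of generations and starting frequency are invented. In each generation every new text copies the language of a randomly chosen text from the previous generation, with no preference for either language.

```python
import random


def simulate_drift(pop_size, generations, start_freq=0.5, seed=None):
    """Return the frequency of one variant per generation under neutral copying."""
    rng = random.Random(seed)
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each new item copies the variant with probability equal to its
        # current frequency; there is no selective advantage.
        count = sum(rng.random() < freq for _ in range(pop_size))
        freq = count / pop_size
        history.append(freq)
    return history


# Hypothetical run: share of one language among 100 texts over 50 generations.
history = simulate_drift(pop_size=100, generations=50, start_freq=0.5, seed=1)
print(history[:5])
```

Even without any advantage for either variant, the frequency wanders and can eventually reach 0 or 1, so an observed rise of one language over another is not in itself evidence of selection.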

18) Interdisciplinary synergies: opening up new perspectives for data-driven research of the Southern Dutch Dialects

Veronique De Tier, Jesse de Does, Katrien Depuydt, Tanneke Schoonheim (†), Koen Mertens
Dutch Language Institute
Jacques Van Keymeulen, Roxane Vandenberghe, Lien Hellebaut
Department of Linguistics, Ghent University
Sally Chambers
Ghent Centre for Digital Humanities, Ghent University

Keywords: CLARIAH, CLARIN, dialectology, FAIR (Findable, Accessible, Interoperable, Reusable) data, language atlases, lexicography, linguistic data, geo-visualisation, Open Science

The Database of the Southern Dutch Dialects (DSDD) aims to aggregate and standardise three existing comprehensive lexicographic dialect databases of the Flemish, Brabantic and Limburgian dialects into one integrated dataset (Van Keymeulen, et al., 2019; Van Hout, et al., 2018; De Vriend, et al., 2006). In 2016, the Research Foundation Flanders funded a medium-scale research infrastructure project, the Database of Southern Dutch Dialects (DSDD), which ran from 2017 until 2020 and resulted in a harmonised dataset of concepts (Van den Heuvel, et al., 2016), a user-friendly search engine and a geo-visualisation tool. The application backend provides an Application Programming Interface (API) which will be further developed and made available to researchers to export subsets of the data for analysis using existing digital research tools.

At the previous DH Benelux conferences the project team introduced the project (2017), explored the cartographic tools (2018) and demonstrated the prototype (2019). Following the launch of the DSDD platform in 2020, the team will now present the importance of the interdisciplinary approach, the final results of the project, as well as future directions beyond the project lifetime.

30) After Reading the Romance (POSTER)

Hannah Ensign-George (Indiana University Bloomington), and Allen Riddell (Indiana University Bloomington)

The American romance novel bestrides the globe. Harlequin, Mills & Boon, and Avon sell millions of novels each year, in mass-market, paperback, and eBook editions, some translated into 32 different languages. Most recently, Lady Whistledown and the Bridgerton siblings have entranced audiences from Indiana to Amsterdam. Produced by Shonda Rhimes, Bridgerton is an adaptation of Julia Quinn’s bestselling novels. The Netflix show reached 82 million households within its first month; since its December 25th release the novels have sold 750,000 copies, of a total of at least 10 million copies in the U.S. alone.
Why do romance novels captivate the popular imagination? Although academic research on the subject is now ubiquitous, in the early 1980s, the genre was scarcely 20 years old and academic studies numbered in the single digits. Among the early studies, Radway’s Reading the Romance: Women, Patriarchy, and Popular Literature (1984) is by far the most cited. Indeed, it is among the most cited works in cultural studies tout court.
Reading the Romance explores the commercially successful genre through the eyes of a community of avid romance novel readers in “Smithton,” a town in the midwestern United States. The book studies the social practice of romance reading as well as the readers’ judgments of ideal and failed romance novels.
Our poster describes a dataset, ARtR (After Reading the Romance) which makes available machine-readable information about the novels and novelists featured in Radway’s book. …

15) Legacies of bondage: Towards a Database of Life Courses in the Dutch West Indies (1830-1950)

Coen van Galen, Rick Mourits, Thunnis van Oort and Jan Kok
(Radboud University, Nijmegen)

Proposal for a Research Paper at DH Benelux 2021

Nineteenth-century records of the population of Dutch colonies in the West Indies are among the richest and most detailed in the colonial Western Hemisphere. These registers potentially allow for reconstructing the life courses of both enslaved and free people of Suriname and Curacao for up to 6 generations, with obvious relevance for members of the general public rooted in the region, but also for a wide range of scholarly investigations into the long-term effects of slavery, indentured labour and other institutions of tropical colonial societies. Life courses are standardised biographies, allowing for the comparison of life events (birth, marriage, death, geographical and social mobility and so forth) of large numbers of persons (Kok 2007, 2017).

In order to unlock this treasure of information, the Platform Digital Infrastructure SSH has recently granted funding for the construction of a consolidated database of the population of Suriname and Curacao between 1830 and 1950, which will include the last decades of slavery, Dutch and other European immigration, the migration of Caribbean, Chinese, Indian and Indonesian indentured labourers to Suriname, and the gradual integration of all social groups within the framework of the Dutch colonial state. This project is feasible due to the relatively small population size and the extraordinary quality of the sources. In this paper we want to introduce the project to the wider DH community, present preliminary results and benefit from constructive commentary of our peers.

9) Encoded Parchment: Reflections on Digital Codicology

Suzette van Haaren (University of Groningen and University of St Andrews)

This paper explores the principles of ‘digital codicology’, a method for analysing digital facsimiles of medieval manuscripts, as developed during my PhD project. I propose using codicological tools to understand digital facsimiles, what they are and how they function in the world. Codicology, the study of the medieval book, places the material manuscript at the centre to explore its historical, cultural, social complexities. Like the codicologist, the digital codicologist aspires to analyse the digital facsimile’s material elements, its place as cultural object and carrier of knowledge and tradition, and its position in history. This begins by acknowledging that the digital facsimile is (1) a material object and (2) a distinct object in itself. Through the digital codicological method we can lay bare the digital facsimile’s place in the manuscript’s life and understand exactly how the digital continues and affects the medieval book’s life. …

In this paper I share some (preliminary) conclusions from my PhD research: I show how the digital codicological approach sprang from three main case studies, and look at how analysing them leads to a greater understanding of the effects of digitisation in general.

44) Exploiting Dutch speech collections for forensic speech science (POSTER)

Willemijn Heeren1,2, Meike de Boer1, Laura Smorenburg1, and David van der Vloed2
1Leiden University Centre for Linguistics, Leiden University, Leiden
2Netherlands Forensic Institute, The Hague

Whereas linguistic research has mainly focused on capturing the behavior of language users in a general sense, more recently, individual variation has been taken up in speech perception and speech production modelling. In forensic speech science, individual variation is at the center of attention; how do individual speakers compare to the general population in their speaking behavior?

In forensic speaker comparisons (FSC), one or more disputed recordings by an unknown speaker (e.g. a threat, intercepted telephone recordings), are compared to one or more recordings from the suspect in the case to assess similarity. The speech features included in this comparative analysis are also related to other speakers’ behavior, to estimate the typicality of the speech features. The ‘other speakers’ are known as the background population, which in casework often comes down to a representative group of other same-sex speakers with a comparable language background. Ideally, background populations are available for the modelling of all relevant speech features. In practice, however, such statistics are not yet widely available; this presentation explains the issue, and how it is being dealt with in current research through the use of large speech corpora.
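The interplay of similarity and typicality can be made concrete with a likelihood ratio. This framing is an assumption added for illustration rather than something stated in the abstract, and the single Gaussian feature model and all numbers below are invented. The ratio compares how probable a measured value is under a model of the suspect's speech with how probable it is under a model of the background population.

```python
from statistics import NormalDist


def likelihood_ratio(value, suspect_mean, suspect_sd, background_mean, background_sd):
    """Ratio of the value's density under the suspect model to the background model."""
    similarity = NormalDist(suspect_mean, suspect_sd).pdf(value)
    typicality = NormalDist(background_mean, background_sd).pdf(value)
    return similarity / typicality


# Hypothetical feature: some acoustic measurement of the disputed recording.
# All parameters are invented for the example.
close_to_suspect = likelihood_ratio(118.0, 120.0, 5.0, 140.0, 20.0)
far_from_suspect = likelihood_ratio(160.0, 120.0, 5.0, 140.0, 20.0)

print(close_to_suspect, far_from_suspect)
```

A ratio above 1 supports the same-speaker hypothesis and a ratio below 1 supports the different-speaker hypothesis. The background parameters are precisely the population statistics that the abstract notes are not yet widely available, which is why large speech corpora matter here.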

12) Replicating the Kinora: 3D Modelling and Printing as Heuristics in Digital Media History

Tim van der Heijden (C²DH, University of Luxembourg), and Claude Wolf (DoE, University of Luxembourg)

This presentation reflects on the Kinora replica project, an interdisciplinary collaboration between the Luxembourg Centre for Contemporary and Digital History (C²DH) and the Department of Engineering (DoE) of the University of Luxembourg. Combining historical inquiry with a hands-on and technical approach – involving the latest 3D modelling and desktop additive manufacturing engineering techniques – it provides insights into the process of making a working replica of the Kinora motion picture technology from the early 1900s.

53) Linked Data in a 3D context: Experiential Interfaces for Exploring the Interconnections of 17th-Century Historical Data

Hugo Huurdeman (University of Amsterdam), Chiara Piccoli (University of Amsterdam), and Leon van Wissen (University of Amsterdam)

Within the Virtual Interiors project, various Amsterdam houses are reconstructed in three dimensions based on historical research. One of the case studies is the house of 17th-century Amsterdam-born patrician Pieter de Graeff and his wife Jacoba Bicker. The interdisciplinary dataset resulting from this case study (detailed in Piccoli, 2021) is presented within an online 3D research environment, which also serves as a starting point for further exploration of historical data. In earlier work (Huurdeman & Piccoli, 2021), we described this as a three-step process model of knowledge creation, sharing and discovery: new insights emerge during the creation of the 3D reconstructions (1), these insights are shared via an online research environment (2) and users may discover new knowledge through interacting with the environment and additional Linked Data (3). This paper focuses on the third step, the integration of related Linked Data relevant for Humanities researchers in a 3D research environment.

We have come a long way since the floppy-drive based datasets researchers would refuse to share. Actually, quite a lot of data is coming out as Linked (Open) Data, but this presents us with new challenges. Specifically for the case of publishing, connecting and sustaining Linked Data on persons, we pose five challenges that we would like to discuss with the community, especially now that the problems we face are becoming more widespread, as more and more cultural heritage institutions open up their collections, and digitization initiatives of archives take flight.

The first two challenges concern publishing Linked Data ourselves. The next two are relevant once a dataset is published and its existence is advocated in order for others to benefit from it and overcome the challenge of lack of interactivity of Linked Data. We finally address a challenge regarding the sustainability of Linked Data, which is especially related to the fact that Linked Data often comes forth from research projects.

  1. How can we model our datasets in such a way that they integrate well with existing Linked Data? Which are the (preferred) vocabularies or modelling strategies to choose from? And where can these be found and/or published once created?
  2. Where do we make our datasets accessible? Where can we host our Linked Data so that it can be accessed by others interactively?
  3. How can we make others interact with and reuse our data on the level of vocabulary, thesauri, or resources?
  4. How can we construct links between datasets generated in research projects, on the level of individual resources?
  5. Where are datasets that are created in the time span of a research project hosted, curated, and kept alive/accessible after the project finishes?
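
Challenges 1 and 4 can be illustrated with a minimal sketch in plain Python. It describes a person with terms from an existing vocabulary and links that person to the same individual in another dataset through owl:sameAs, serialised as N-Triples. The choice of schema.org is an assumption made for illustration only (no vocabulary is named above), and the person name and all example.org URIs are hypothetical.

```python
# Minimal sketch: a person record as N-Triples, reusing an existing
# vocabulary and linking to another dataset with owl:sameAs.
# Assumptions: schema.org as vocabulary; all example.org URIs and the
# person name are hypothetical placeholders.

SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def person_triples(person_uri, name, same_as=()):
    """Build (subject, predicate, object) triples for one person."""
    triples = [
        (uri(person_uri), uri(RDF_TYPE), uri(SCHEMA + "Person")),
        (uri(person_uri), uri(SCHEMA + "name"), literal(name)),
    ]
    # One owl:sameAs link per matching resource in another dataset.
    for other in same_as:
        triples.append((uri(person_uri), uri(OWL_SAME_AS), uri(other)))
    return triples


def to_ntriples(triples):
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = person_triples(
    "https://example.org/project-a/person/1",
    "Jan Jansen",
    same_as=["https://example.org/project-b/persons/p42"],
)
print(to_ntriples(triples))
```

Reusing a shared vocabulary addresses integration at the schema level, while the owl:sameAs statement is one possible way to express a link between individual resources across project datasets. Hosting and long-term curation (challenges 2 and 5) are organisational questions that a sketch like this does not touch.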

55) Relationship between Illustrations and Texts of 19th Century Children and Youth Literature: Linking Topic Models and Computer Vision

Chanjong Im1, Mohana Dave2, Vraj Shah2, and Thomas Mandl1
1 University of Hildesheim, Germany
2 Kadi Sarva Vishwavidyalaya (KSV), Gandhinagar, India

For Digital Humanities, the development of tools and methods for Distant Viewing, which stands for the automatic analysis of large amounts of visual data with AI algorithms, is still an emerging research field. We propose a method for combining text and image analysis results into a joint topic model for exploring illustrated books.
This research is built around historic children's and youth literature, which typically contains more images than adult literature. We derived clusters using both text and illustrations in a collection of 623 picture books from the 19th century. The topic models based on the text for each cluster were further developed and, in parallel, correlation analyses were conducted.
It is important to include both textual and visual information in research. This approach is particularly helpful for finding topics and gaining an overview of the content of the collection.
An analysis of the proceedings of the DH conference shows that there are still few articles on visual data; however, their number is growing. The identification of objects within images or illustrations can be seen as a subset of this task (Crowley & Zisserman 2014). Very different results can be expected even within e.g. various books of a genre (Mitera et al. 2021).
In recent years, considerable progress has been made in image processing, especially through approaches of so-called Deep Learning. Convolutional neural networks (CNN) are a technology within the deep learning paradigm that has been developed for object detection. One such system is the algorithm You Only Look Once (YOLO) (Redmon et al. 2016).
The joint analysis of text and images has been achieved by e.g. common embeddings (Chen et al. 2020) or the integration of knowledge graphs (Li et al. 2020).

39) Getting Cultural Heritage Data Easier: A New Service by Europeana (POSTER)

Alba Irollo, and Hugo Manguinhas
(Europeana Foundation)

As libraries, archives and museums make progress in embracing the digital shift, cultural heritage data becomes increasingly central to teaching, learning and research in all the fields of the humanities. At the same time, new requirements in these fields push heritage institutions to improve the features of their digital collections.

The interest in the so-called digital humanities arises at very different levels. It ranges from the use of simple ready-made tools to complex projects that are interdisciplinary by nature and require the involvement of IT technicians, digital scholarship librarians or experts in computer sciences. Only large amounts of data pave the way to experimentations with computational methods and digital tools that can shed new light on traditional disciplines.

Besides the Europeana APIs, Europeana has recently added to its offer a new service that aims to meet the needs of researchers who are more interested in data processing and need to get access to cultural heritage data in an easier way. The new service offers all the Europeana metadata resources in XML (and Turtle), including links to the content, for bulk download. The DH Benelux Conference will be the occasion to present this service for the first time in an international academic and research context.

58) Unsilencing the VOC Testaments: Using Digital Methods to Decolonise the Archive

Charles Jeurgens (Universiteit van Amsterdam), and Mrinalini Luthra (Universiteit van Amsterdam)

The use of digital methods is emerging in the discipline of history, but has until now been less visible in the archival discipline. The ongoing project ‘Unsilencing the VOC testaments’ is a showcase of how digital methods can be applied in creating new avenues to access records from the past and contribute to the current debates on decolonising archives, especially those concerning silences in the archive (Jeurgens and Karabinos 2020, Johnson 2017, Trouillot 1995). The Dutch East India Company (Vereenigde Oostindische Compagnie: VOC) offered jobs to thousands of people as sailors, soldiers and servants working in the trading posts. Since many people died while serving the VOC, it was mandatory for VOC servants to notarize a will. Copies of the testaments drafted on VOC ships and at VOC trading posts had to be sent to the headquarters of the Company in Amsterdam. Nowadays these testaments are important sources to get a glimpse into the private lives of common people. The historical value of these documents was soon recognized and already in the 19th century archivists decided to create an index to improve access to these wills. This resulted in an index of approximately 10,000 names of male testators. …

19) Dutch television history through the lens of digitized viewing reports: a reconstruction of television viewing and programming during the oil crisis in the 1970s (POSTER)

Jasper Keijzer (Utrecht University), Jasmijn Van Gorp (Utrecht University), and Alexander Badenoch (Utrecht University/VU Amsterdam)

Practical experimentation
Advances in digital technologies have laid the groundwork for previously impossible research in the Humanities in general and in media history specifically. Media archives have innovated to account for the vast amount of digitally native and digitized collections (De Leeuw 2012, 3-11). An array of digital tools are being developed and refined to navigate these data collections. As Digital Humanities increasingly moves toward doing digital research, methodological challenges will necessarily surface. This project takes a practical step towards a new digital historicism, which requires, as Fickers (2012, 25) remarks, a ‘scholarly rigor’ in order to convert methodological roadblocks into stepping stones in the process of creating new methodologies.
This research project aims to understand Dutch television history through the lens of digitized viewing reports, programming schedules, and broadcasts of Dutch Public Service Broadcasting (PSB) as archived by the Netherlands Institute for Sound and Vision. The study uses the Media Suite platform of CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) to access these digitized collections. The Media Suite is an innovative digital research environment that facilitates access to key Dutch media collections with advanced multimedia search and analysis tools (Ordelman et al. 2018). The transparent search and analysis tools allow for many new research possibilities in an environment designed for both the algorithmically literate and the data novice, making it specifically suitable for experimenting with new ways of working with multimedia data collections.
In this project, we are the first to make use of a new digitized collection added to the Media Suite in 2020: digitized viewing reports, as held by ‘Stichting Kijkcijferonderzoek’ (Dutch foundation of viewing rate research). The collection holds weekly and yearly reports on the audience rates of television, radio, and commercials from 1967 to 1997. As such, the collection also features the evolution of data collection methods in the Netherlands from viewing-diaries to the electronic Intometer and eventually the digital Peoplemeter.
As a case in point, this research takes the public service television history, and specifically its audiences and programming, during the Dutch oil crisis of 1973. Through an empirical analysis of digitized viewing rates, programming schedules, and policy documents, we will reconstruct the relation between television programming and viewing behaviour during a time of perceived crisis. Specific attention will be given to viewing and programming during the so-called “Car-free Sundays”, a restriction imposed by the Dutch Government to deal with the scarcity of oil after multiple Arab countries boycotted the oil trade. For ten historic Sundays, the Dutch Government banned travel by automotive vehicles of any kind, forcing modern society to a grinding halt, and supposedly creating a ‘captive’ television audience. Using the data collections, as well as search and visualisation tools provided by the Media Suite, our research shows changing patterns in viewing behaviour during a unique moment in Dutch television history. As such, it exemplifies how doing digital media historiography can be a reflexive, hands-on approach to develop and refine new methodologies for understanding multimedia datasets through digital tools.
Fickers, Andreas. 2012. “Towards a New Digital Historicism? Doing History in the Age of Abundance.” VIEW (1): 19–26. DOI:
Leeuw, Sonja de. 2012. “European Television History Online: History and Challenges.” VIEW 1(1):
Ordelman, Roeland, Melgar, Liliana, Martinez-Ortiz, Carlos, Noordegraaf, Julia and Blom, Jaap, et al. 2019. “Media Suite: Unlocking Archives for Mixed Media Scholarly Research.” Selected papers from the CLARIN Annual Conference 2018. Linköping Electronic

52) A thesaurus-based semantic annotation framework for Danish

Ross Deans Kristensen-McLachlan

Semantic analysis of natural language is a fundamentally important task for a wide range of research disciplines. Many of the most successful current approaches to this problem draw on unsupervised deep learning which is essentially language agnostic. However, such approaches have their limitations. In many cases, the computational algorithms are both data-hungry and computationally intensive. Moreover, the sophisticated mathematical nature of these models often means that they are out of reach of many researchers in the humanities and social sciences, resulting in the infamous ‘black box’ problem.

51) Unveiling the character gallery of Danish sermon

Ross Deans Kristensen-McLachlan, Anne Agersnap, and Kirstine Helboe Johansen

Literary worlds centre on a gallery of characters situated in temporal and spatial contexts, in which other subjects and objects exist (Vosmar 1969, 89-101). More generally, within narrative theory it has been argued that humans fundamentally understand the world through narratives (Fludernik 1996; Herman 2004; Nielsen 2010: 173). As such, narratives can establish coherence between past and present (Polkinghorne 1988: 21). In this paper, we ask a number of questions related to the unfolding of a character gallery in contemporary Danish sermons: Which characters populate Danish sermons? How do pastors adapt these characters in their sermons? What does the encounter between characters tell us about preaching practices today?

2) First-Hand Accounts of War: War letters (1935-1950) from NIOD digitized (POSTER)

Milan van Lange, Annelies van Nispen, and Carlijn Keijzer

The project ‘First-Hand Accounts of War’ is concerned with the digitization and so-called ‘datafication’ of more than 190,000 pages of historical – mostly – handwritten letters. These are not just some old letters; the letters are part of the archival collection of the NIOD Institute for War, Holocaust, and Genocide Studies. The war letters collection contains correspondence from the period before, during, and after World War II. The letters give a first-hand account of experiences from people of all layers of society. There are letters from Jewish people in hiding and camps, letters from collaborators, letters from people in forced labor camps – to name but a few.

26) The Mediascape of Dutch Chroniclers (1500 – 1850)

Alie Lassche, and Roser Morante

This study explores the media landscape of early modern Dutch middle class people, using a large corpus of Dutch chronicles. Previous studies have already shown how early modern Europe witnessed an eruption of news, due to developments such as the burgeoning business of print, the professionalization of postal networks, and the invention of the newspaper (Pettegree, 2014; van Groesen and Helmers, 2016). The production of news by various sources is therefore a well-researched topic, yet little is known about the spread and reception of this news. Existing studies took a qualitative approach and often focused on one type of source (van Groesen, 2016). What is missing is a quantitative study of the variety of sources used in the early modern period. The current study aims to fill this gap. We focus on exploring whether the use of computational methods allows us to gain more insight into the sources early modern Dutch chroniclers used, as well as into the variety of information these sources were bringing.

48) Literary criticism 2.0: A digital analysis of the professional and community-driven evaluative talk of literature surrounding the Ingeborg Bachmann Prize (POSTER)

Gunther Martens, and Lore De Greve (Ghent University, Belgium)

In recent times, the knowledge of a limited number of professional literary critics has been challenged by technological developments and the “wisdom of the crowds”. Ample research has been devoted to shifts in traditional gatekeepers, such as hybrid publishers (Vandersmissen 2020) and prizes (English 2009, Sapiro 2016), and to the demise of professional critics’ authority at the hands of online literary criticism (Dorleijn et al. 2009, Löffler 2017, Schneider 2018; Kempke et al. 2019, Chong 2020). Nevertheless, comparatively little research (Allington 2016, Kellermann et al. 2016; Kellermann and Mehling 2017; Bogaert 2017, Pianzola et al. 2020) has actually attempted to directly ingest and mine the content of user-generated online literary criticism, as well as to examine the role of peer-to-peer recommendation systems and layperson critics as new literary gatekeepers and cultural transmitters. This project aims to study the differences between professional critics and this ‘wisdom of the crowd’, especially since traditional gatekeepers of the literary field (publishers, reviewers) are increasingly trying to tap the potential of online reading communities.

We will present the preliminary results of the FWO-funded research project “Evaluation of literature by professional and layperson critics… .

14) BESOCIAL – Towards a sustainable strategy for archiving and preserving social media in Belgium (POSTER)
Alejandra Michel, CRIDS (University of Namur), Anastasia Dimou, IDLAB (Ghent University), Eva Rolin, CENTAL (UCLouvain), Eveline Vlassenroot, MICT (Ghent University), Fien Messens, KBR, Friedel Geeraert, KBR, Julie Birkholz, GhentCDH (Ghent University), Patrick Watrin, CENTAL (UCLouvain), Peter Mechant, MICT (Ghent University), Sally Chambers, GhentCDH (Ghent University), Sophie Vandepontseele, KBR, Sven Lieber, IDLAB (Ghent University)

Keywords: Web Archiving, Research, Social Media Archiving (SMA), Social Media, Interdisciplinarity

The amount of digital data we produce every day is mind-boggling. The content produced on social media fuels the flood of data. However, social media content is very ephemeral and is often not archived by universities or heritage institutional stakeholders, creating serious challenges for (digital) scholars who want to use the content of archived social media as a source. Also, the absence of social media in national web archiving initiatives raises great challenges of preserving our digital heritage and supporting the understanding of contemporary society and its future.

This poster abstract introduces BESOCIAL (2020-2022), launched by KBR and financed by BELSPO, a cross-university collaborative and interdisciplinary project with a goal to develop a sustainable strategy for archiving and preserving social media in Belgium. This project builds on the PROMISE research project (2017-2019), which developed a strategy for archiving Belgian websites. Together with an interdisciplinary team (CRIDS, CENTAL, IDLab, GhentCDH, and MICT), a Belgian opening in the social media archiving landscape is made. The goal is to create a strategy for analyzing social media expressions from a linguistic, cultural, historical, … perspective. In this way, this “new” form of born digital data will be preserved and searchable as part of our cultural heritage.

13) When no news is bad news – Detection of negative events from news media content

Kristoffer L. Nielbo1,2, Frida Haestrup1, Kenneth C. Enevoldsen1, Peter B. Vahlstrup1, Rebekah B. Baglini2, and Andreas Roepstorff2
1 Center for Humanities Computing Aarhus, Aarhus University, Denmark
2 Interacting Minds Centre, Aarhus University, Denmark

Keywords: Newspapers, Pandemic Response, Bayesian Change Detection, Information Theory

During the first wave of Covid-19, information decoupling could be observed in the flow of news media content. The corollary of the content alignment within and between news sources experienced by readers (i.e., all news transformed into Corona-news) was that the novelty of news content went down as media focused monotonically on the pandemic event. This all-important Covid-19 news theme turned out to be quite persistent as the pandemic continued, resulting in the, from a news media’s perspective, paradoxical situation where the same news was repeated over and over. This information phenomenon, where novelty decreases and persistence increases, has previously been used to track change in news media, but in this study we specifically test the claim that new information decoupling behavior of media can be used to reliably detect change in news media content originating in a negative event, using a Bayesian approach to change point detection.
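
The abstract does not state how novelty is computed. As a hedged illustration only, the sketch below assumes one common information-theoretic formulation: each day's news is represented as a probability distribution over topics, and novelty is the mean relative entropy (Kullback-Leibler divergence) between a day's distribution and those of the preceding days in a window. The toy distributions, the function names and the window size are hypothetical, and the Bayesian change point detection step is not shown.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits.

    Assumes q has no zero entries where p is non-zero (e.g. smoothed
    topic distributions).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def novelty(series, window):
    """Mean KL divergence of each item from the `window` items before it."""
    scores = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        scores.append(sum(kl_divergence(series[i], q) for q in past) / window)
    return scores


# Hypothetical topic distributions per day over three topics.
varied = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7], [0.7, 0.2, 0.1]]
monotone = [[0.8, 0.1, 0.1]] * 4  # the same theme repeated every day

print(novelty(varied, window=2))    # positive scores: content keeps changing
print(novelty(monotone, window=2))  # zero scores: nothing new is reported
```

Under this assumed measure, a period in which all outlets repeat one theme shows up as a sustained drop in novelty, which is the kind of signal a change point detector could then be run on.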

20) Black Box Computer Vision Models in Archives: A Siren’s Call

Nanne van Noord, and Julia Noordegraaf (CREATE – University of Amsterdam)

Beyond developments that are focused on applying Computer Vision (CV) techniques to curated visual corpora in DH research (Arnold & Tilton, 2019; Wevers & Smits, 2020), there has also been an increase in efforts to make CV models an inherent part of archives, in particular AV archives (Lincoln et al., 2020; Masson et al., 2020; van Noord et al., 2021). Although AV archives can be considered curated visual corpora themselves, they differ in aim from corpora selected to answer specific research questions in that their mission is to be a comprehensive resource for a specific domain or topic. For instance, the focus of the Eye film museum “is on films and objects that say something about Dutch film culture”, whereas the Netherlands Institute for Sound and Vision aims to document the Dutch media landscape in a broad sense (from written press, to TV material, to computer games). A consequence of such ambitions is that the volume of material to be archived is huge; the archive of Sound and Vision, for example, contains about 1 million hours of AV material. Due to this scale, AV archives are often difficult to explore and navigate, and finding all the material relevant to a question can be near impossible. Because of these difficulties, the promise that CV offers, of being able to automatically generate rich content-related metadata at scale, sounds incredibly alluring. Yet, in some sense this promise might actually be a siren’s call.

28) Charting Kaebyŏk’s Semantics: Creating a topic model of a 20th century Korean magazine

Aron van de Pol (Leiden University)

In this paper I utilise topic modelling on the Korean magazine Kaebyŏk (開闢), to investigate how socialist thought grew within it. This research indicates socialist/leftist ideology was more prevalent and significant than originally believed.

7) Modeling Ontologies for Individual Artists: A Case Study of a Dutch Ceramic-Glass Sculptor

Daan Raven1,2, Victor de Boer2, Erik Esmeijer, and Johan Oomen1
1 Netherlands Institute for Sound and Vision, Hilversum
2 Vrije Universiteit Amsterdam

There is a long tradition in the Cultural Heritage domain [1,2] of using structured, machine-interoperable knowledge using semantic methods and tools. However, research into developing and using ontologies specific to works of art of individual artists is persistently lacking. Such knowledge graphs would improve access to heritage information by making reasoning and inferencing possible. We present a re-usable method, building on the ‘Methontology’ method for ontology development [3]. We describe the steps of specification, conceptualization, integration, implementation and evaluation in a case study concerning ceramic-glass sculptor Barbara Nanning. The ontology specifies activities of the creative and productive processes. Challenges were to model design and production processes in general and particular glass-techniques specifically. We reuse existing vocabularies including those in the Termennetwerk. We present the overall method and results; details and intermediary results can be found in [4,5].

47) The PEACE PORTAL: Revisiting the Sea of Stone

Ortal-Paz Saar (Department of History and Art History, Utrecht University)

At the DH Benelux conference of 2017 I presented a poster titled “A Sea of Stone: Digitally Analyzing Jewish Funerary Inscriptions”. The poster described a new digital initiative: an international portal of Jewish funerary culture, part of which would be epigraphic and explore epitaphs from the sixth century BCE to the present day. The portal has recently been launched. It consists of four sections, of which the epigraphic one, containing over 40,000 inscriptions, is fully functional. This paper will describe the PEACE portal focusing on its epigraphic section: content, aims, challenges and future goals.

PEACE stands for Portal of Epigraphy, Archaeology, Conservation and Education on Jewish Funerary Culture. Funerary culture refers to the array of practices pertaining to the last point in every person’s life: burial methods, epitaphs, rituals, and iconography are some of these. While seemingly pertaining to the dead, funerary culture is very much a part of life. It is the living who choose how to separate from their loved ones, to commemorate them, to emphasize particular themes in their epitaphs and to record specific details, such as the date of birth or the date of death, to mention how they died or to express the hope that they will rest in peace. Even though sharing a religion can also mean sharing some cultural traits, the funerary traditions of Jews living in third-century Palestine differed from those of their co-religionists in third-century Italy, and both differed from those of Jews in medieval Germany. The PEACE portal aims to shed light on Jewish practices related to death and parting from the deceased, as well as provide information on the conservation of funerary heritage and education about it. In other words, it is meant to serve as a hub of Jewish funerary culture.

In its first stage the portal concentrates on epigraphic data, which will also be the focus of this paper. Currently, PEACE encompasses four epigraphical databases:…

54) Online, Blended, and Flipped Classrooms for Covid-19 and Beyond (PANEL)

Susan Schreibman (Maastricht University, the Netherlands), Zsolt Almási (Pázmány Péter Catholic University, Hungary), Amelia del Rosario Sanz Cabrerizo (Complutense University of Madrid, Spain), Esther Kamara (Maastricht University, The Netherlands), Marianne Ping Huang (Aarhus University, Denmark), Costas Papadopoulos (Maastricht University, The Netherlands), and Claartje Rasterhoff (Maastricht University, The Netherlands)

In Spring 2020 the Covid-19 pandemic upended the teaching practice of educators around the world, from elementary school teachers to university professors. From one day to the next, rather than teaching using many methods that date back to the nineteenth century, we were forced online, utilizing software developed for business meetings rather than classroom teaching. Digital Humanists have been at the forefront of integrating digital methods into our teaching practice. While most digital humanists have experimented with and adopted hybrid pedagogies in both formal and informal settings, even educators who teach with technology in face-to-face classroom settings were caught off guard by the challenges of moving our entire practice to remote teaching. …

This panel will explore new models of research-led, project and problem-based pedagogic practices (Kuladinithi et al., 2020; Jia et al., 2021) utilising #dariahTeach. #dariahTeach is a platform for Open Educational Resources (OER) for Digital Arts and Humanities educators and students, and for those in the cultural and creative industries, to be used both in formal and informal educational settings. As such, a key objective of #dariahTeach is sharing and reuse, thus becoming a site for educators to publish their teaching material and for others to re-use it. It was also designed for those outside formal educational settings (lone learners) to study DH theories, methods, and skills. #dariahTeach was thus designed to create a symbiotic relationship between formal learning, professional practice, and societal engagement as a way to create a multimodal educational space, to open up education, especially for teaching in blended and flipped classroom settings. It does this through a platform and a pedagogic philosophy that fosters active learning, research-led teaching, and critical making. It encourages students to understand the society-technology relationship through engaging with a digital platform, i.e., not learning about technology by reading books or articles, but learning by and through multimodal content that provides ample opportunity for self-directed learning.

61) Digital Manuscript Platforms to support online learning

Laurents Sesink, Ben Companjen, and Peter Verhaar
(Leiden University Centre for Digital Scholarship)

Digital Humanities Education and Digital Education in the Humanities
Education in the Humanities often aims to illuminate the nature, the history or the reception of primary sources managed by libraries, museums or archival institutions. While the more precise focus of the courses on such sources may vary, it can be observed that lecturers and students working with historical artefacts are typically engaged in a limited number of generic activities. Within the courses they follow, and as part of the research they carry out, scholars and students often make transcriptions, add annotations, produce translations or describe the historical background and the provenance of specific historical documents. Such groups of scholars and students who are interested in working with specific sources, and who perform this work through a series of prototypical activities, can be referred to as source communities. Now that large portions of the collections of heritage institutions have been digitised, lecturers and students may also choose to work with digital manifestations of these artefacts, rather than with the physical originals. In many cases, the use of such digital manifestations has also been enforced by the fact that many heritage institutions needed to close their doors due to Covid-19.

37) Mimesis and the importance of female characters: A comparative social network analysis of Dutch literary fiction, 1960s vs 2012

Roel Smeets (Radboud University), Jurrian Kooiman (Utrecht University), and Nils Lommerde (Radboud University)

Mimesis is among the oldest and most fundamental concepts of literary theory. Since Plato’s introduction of the term in the Republic it has continued to exert influence over theories of artistic representation. As Derrida wrote: ‘the whole history of the interpretation of the arts and letters has moved and been transformed within the diverse logical possibilities opened up by the concept of mimesis’ (cited in Potolsky 2006: 2). At first glance, the idea that literature imitates life makes sense as authors often seem to write about the world around them. However, the history of literary theory has witnessed a diverse range of attitudes towards this seemingly clear idea. While both Plato and Aristotle take their cue from the belief that art mirrors reality, they draw different conclusions as to the moral aspects of artistic representation. For Plato, the imitative nature of literature is a reason to ban poets and artists from the perfect city. As a mere copy of a copy, literature is illusory and deceptive. By contrast, Aristotle sees artistic imitation as perfectly ‘natural, rational and educational’ and even ‘beneficial’ (Potolsky 2006: 46). It does not merely copy the real; it has the potential to reveal universal truths and produce cathartic effects in human beings.

This paper contributes to the longstanding discussion on the imitative dimension of literary representation by approaching it from a computational and statistical perspective. More specifically, it explores the potential of social network analysis for studying the ways in which societal dynamics are realistically reflected in products of literary fiction. Using data-driven methods, it thus draws on a tradition of literary criticism that prevailed between the 1930s and 1950s and that has recently been revitalized by scholars working with character network analysis (e.g. Smeets forthcoming, Selisker 2015, Labatut & Bost 2019). …

42) Discovering Pandemic Topics on Twitter (POSTER)

Erik Tjong Kim Sang1, Marijn Schraagen2, Mehdi Dastani2, and Shihan Wang2
1Netherlands eScience Center, Amsterdam, The Netherlands
2Utrecht University, Utrecht, The Netherlands

Discussions on social media reflect and potentially influence discourse in society as a whole [5]. Therefore, knowing which topics are discussed online can be a valuable proxy for measuring opinions in society, especially when validated against other types of data such as questionnaires [8]. In this paper we examine three different methods for finding topics within COVID-19 discussions on Twitter: hashtags, the t-test and Latent Dirichlet Allocation. We present an overview of the topics found in COVID-19 tweets during the first year of the pandemic. Next, we build a query using these topics and show that it finds 136% more relevant tweets than a basic query, thus substantially improving our ability to study social media discussions related to the pandemic.
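As an illustration of the second method, a t-test can rank tokens by how much more frequent they are in a target collection than in a reference collection. The sketch below is a minimal version under stated assumptions: the abstract does not specify how the t-test was applied, so the formula used here (the common two-sample approximation for word frequencies, with the variance of a rare token approximated by its relative frequency) is an assumption, and the function name `t_scores` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def t_scores(target_tokens, reference_tokens):
    """Return a t-like score per target token; higher means more typical of the target."""
    target_counts = Counter(target_tokens)
    reference_counts = Counter(reference_tokens)
    n_target = len(target_tokens)
    n_reference = len(reference_tokens)
    scores = {}
    for token, count in target_counts.items():
        p_target = count / n_target
        p_reference = reference_counts[token] / n_reference
        # Assumption: variance p(1 - p) is approximated by p, as is usual for rare tokens.
        standard_error = math.sqrt(p_target / n_target + p_reference / n_reference)
        scores[token] = (p_target - p_reference) / standard_error
    return scores


# Toy data, invented for illustration only (not taken from the paper).
pandemic_tokens = "lockdown mask the vaccine the lockdown".split()
reference_tokens = "the weather the football the news".split()

scores = t_scores(pandemic_tokens, reference_tokens)
for token, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{token}\t{score:.2f}")
```

Tokens with high positive scores are candidates for topic terms or query expansion; tokens that are equally or more common in the reference collection, such as function words, receive scores at or below zero.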

8) Is your OCR good enough? Probably so. An assessment of the impact of OCR quality on downstream tasks for Dutch texts

Konstantin Todorov (University of Amsterdam), Mirjam Cuper (National Library of the Netherlands), and Giovanni Colavizza (University of Amsterdam)

Keywords: OCR, Dutch, Natural Language Processing, Machine Learning, Text Analysis

We conduct an assessment of the impact of OCR quality in collections in Dutch, considering two tasks: document classification and document clustering via topic modelling. We find that for both topic modelling (using LDA) and document classification (using a variety of methods, including deep neural networks), working with an OCRed version of a corpus does not in general compromise results. On the contrary, it may sometimes lead to better results. While more work is needed, including on evaluating different datasets and methods, our results further confirm previous work in suggesting that the quality of existing OCR is often sufficient to apply machine learning techniques.

17) Linking Women Editors of Periodicals to the Wikidata Knowledge Graph (POSTER)

Marianne Van Remoortel (Department of Literary Studies, Ghent University, Belgium), Julie M. Birkholz (Ghent Centre for Digital Humanities Ghent University and KBR Royal Library of Belgium’s Digital Research Lab, Belgium), Pieterjan De Potter (Ghent Centre for Digital Humanities, Ghent University, Belgium), Katherine Thornton (Stories Services Collaborative), and Kenneth Seals-Nutt (Stories Services Collaborative)

Stories are important tools for recounting and sharing the past. To tell a story one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of the Semantic Web, can compile scattered and diverse materials and information to construct stories. Through the example of the WeChangEd research project on women editors of periodicals in Europe from 1710 to 1920, we detail how to move from archive, to a structured data model and relational database, to a Linked Open Data model that makes this information available on Wikidata, and finally to the use of the Stories Services API to generate multimedia stories related to people, organizations and periodicals. This resulted in the WeChangEd Stories App.

29) A Toe in the Water – Introducing Students to Digital Humanities at Leiden University Library (POSTER)

Vincent Victor Wintermans (Leiden University)

Keywords: Education for Digital Humanities – Beginners – Role of Subject Librarians

Digital Humanities (DH) is a daunting subject for absolute beginners. For students who have
somehow developed an interest, there is no lack of tutorials, introductory courses, summer
schools and certificate programmes for DH. But how to kindle this interest in the first place?

In 2020 a group of subject librarians at Leiden University began investigating the possibilities
of supporting DH at this elementary stage. In this poster presentation we will present the
subject guide that we have developed and present the plans for further teaching materials.

The subject guide covers Data Visualisation and Network Analysis, Mapping and Geographic
Information Systems, Text Mining, Text Encoding and Enrichment, Digital Image Search and
Web-based applications and intellectual property.

It is expected that research databases that university libraries purchase for their collections
will be increasingly provided with easy-to-use tools for data analysis. In connection with the
themes listed above, we present such a product (The Gale Digital Scholar Lab) and other tools
that beginners can start using immediately to analyse and visualize data, like Voyant Tools,
Palladio or FromThePage.

Making these instruments known to humanities students is a logical extension of the
introductory courses in information literacy that subject librarians have been teaching for a
long time already. Such courses will serve the double goal of increasing the use of the
databases that the library provides and giving students a first inkling of the possibilities of
computer-assisted research.

These introductory courses should not only refer students to tools and databases, but also give
examples of research that uses DH techniques and that can inspire the neophyte. The tools
should be basic, but the examples can be of a more sophisticated kind.
We learn from contact with colleagues from other Dutch university libraries that the need for
this elementary instruction in DH is widely shared. As part of the presentation, we briefly
compare the introductory materials for DH from university libraries of Amsterdam, Antwerp,
Leiden and Utrecht.

4) Introducing the DHARPA Project: An interdisciplinary lab to enable critical DH practice

Lorella Viola, Angela Cunningham, and Helena Jaskov

Traditional humanities research has been leveraged but also destabilised by the increasing accessibility of digitized sources and computational tools for analysis. As traditional close reading methods alone are no longer sufficient to analyse such an unprecedented mass of digital data, a plethora of platforms have appeared in order to help researchers query and visualise the networks and patterns latent in these sources. However, this flood of data and push-button technologies has also threatened to obscure through abundance and instill in scholars a false sense of mastery. Some scholars have criticized DH for its naive and starry-eyed application of computational techniques, citing not only how the uncritical adoption of black-box technologies affects substantive research but also how it reproduces a positivism at odds with the purpose of humanities inquiry (e.g., Liu 2012; Chun 2013; Jagoda 2013; Raley 2013; Allington et al 2016; Brennan 2017; Grimshaw 2018). Both uncritical DH practice and the software it employs can be compared to a “Mechanical Turk,” with the decisions and interventions made by the researcher hidden from view and only the well-oiled and seemingly autonomous product on display. These trends have constituted a crisis for humanities scholarship but also an extraordinary opportunity to transform the field.

In this presentation, we introduce the Digital History Advanced Research Projects Accelerator (DHARPA), a diverse and interdisciplinary research and development laboratory based in the University of Luxembourg’s Centre for Contemporary and Digital History (C²DH). While software development is central to our project, our aim is not merely to build more tools but to encourage methodologies that self-reflexively examine the interaction of technology and historical practice. We want to show how the application of expertise works in tandem with technology to produce knowledge, how digitally enabled research is not a product but rather a process, reliant on the critical engagement of the scholar. We want scholars to open the black box and to be empowered to tinker with what’s inside.

6) Developing New Methods for Analysing Ephemeral Digital Primary Sources (PANEL)

Anne Marieke van der Wal (Leiden University)

When introducing students of history and other humanities disciplines to the methodology of primary source analysis, most instruction programmes focus on ‘traditional’ source analysis practices by physically going to an archive and analysing printed or handwritten textual sources. Studies from the American Historical Association (AHA) have shown that whereas most historians use digitalized primary sources in their research (Putnam: 2016, 388), they are nevertheless apprehensive of using digital/online sources in their class materials (Townsend: 2010, 26), even though ‘the first, and often the only, place students of any age look for primary sources is online’ (Mills Kelly, 10). Dutch history teachers have recently warned that students today are ill prepared for evaluating the authenticity of online information (Gortworst & De Vries: 2019). What’s more, students are often unaware of the archival bias and source inequity that continue to exist in the digital environment, i.e. the gap in archival collections and in the digitalization of collections caused by selection and curating processes, which limits the availability of non-Western and non-Anglophone sources (Putnam: 2016, 389). Nor are they sufficiently challenged to reflect critically on the hegemonic structures of information dissemination online (Cohen & Mihailidis: 2013). …

This panel aims to stimulate a discussion with scholars working in the historical discipline for developing a revision of the standard source critique methodology skills training courses currently offered in academia based on the results from the project. As such it invites other historians or scholars from the Humanities and Social Sciences working with primary sources to contribute to this panel and send in their proposal for a presentation and research paper.

41) Cross-Register Authorship Attribution Using Vernacular and Classical Chinese Texts

Haining Wang1, Xin Xie2, and Allen Riddell1
1Indiana University Bloomington
2Shanghai Normal University

Today, vernacular Chinese fiction from the Ming and Qing dynasties (1368 to 1912) is widely regarded as the pinnacle of Chinese literature. At the time, however, composing in vernacular Chinese was regarded as unorthodox. Classical Chinese was the privileged register. For example, official documents were all composed using classical Chinese. Classical Chinese can be understood as preserving the grammar and semantics of Chinese as it was used before the Qin period (i.e., before 221 BC). Written vernacular Chinese evolved from this version of Chinese. The differences between the two versions of Chinese are considerable. Relative to vernacular Chinese, classical Chinese has a “denser” lexicon (words tend to consist of a single character), more frequent part-of-speech ambiguity, and more variation in part-of-speech order. Frequently, especially during the Ming and Qing periods, the boundary between vernacular and classical Chinese is not clear. Many texts mix the two registers together in various ways. For example, dialog in classical texts often resembles the vernacular equivalent. Vernacular fiction also has a tradition of opening and closing a chapter with classical verse. The boundary blurred further when classical grammar was mixed with the vernacular lexicon at the end of the Qing dynasty.
In the Ming and Qing dynasties, most vernacular fiction, including some masterpieces, was published anonymously or under a pseudonym. The authorship of these novels has puzzled scholars for more than a century.
By characterizing writing styles of specific authors, authorship attribution enables inferences about the likely author of texts of unknown authorship. …

46) Linked Data on historical persons: publishability, interconnectivity and sustainability (PANEL)

Leon van Wissen (University of Amsterdam, UvA), Richard Zijdeman (International Institute of Social History, IISG), Rick Mourits (Radboud University, RU), Ivo Zandhuis (International Institute of Social History, IISG), Lodewijk Petram (Huygens ING), and Laura Hollink (CWI Amsterdam)

Keywords: Linked Open Data, Vocabularies, Infrastructure, Data Reuse, RDF, Interdisciplinary Datasets

We have come a long way since the floppy-drive based datasets researchers would refuse to share. Actually, quite a lot of data is coming out as Linked (Open) Data, but this presents us with new challenges. Specifically for the case of publishing, connecting and sustaining Linked Data on persons, we pose five challenges that we would like to discuss with the community, especially now that the problems we face are becoming more widespread, as more and more cultural heritage institutions open up their collections and digitization initiatives of archives take flight.

32) Linking Linked Data tools: an example in creating social historical data

Ivo Zandhuis (International Institute of Social History)

Linked Data for 19th-century social history

Archival sources, digitally processible text/data and relevant literature are the raw materials a historian uses to construct his story and analysis. All these building blocks are interrelated: the primary source is the basis for a dataset or processible text, and data analysis is part of the argument made in a historical paper, whose footnotes contain references to sources and literature. In order to create a thorough and reproducible historical reconstruction or explanation, all these building blocks and their relations should be stored in a network (or ‘graph’) that is reusable and processible in an automatic way. Because of its gradual adoption, Linked Data is the most obvious technique to use in this endeavour [1].
For my project, I’m building such a graph for research into a Dutch phenomenon called ‘typografische verenigingen’. During the first half of the 19th century, print labourers in The Netherlands (‘typografen’) organized themselves in these local associations, comparable to the English ‘friendly societies’ and the French ‘sociétés mutuelles’, of which many towns and cities had one or two instantiations. These associations were founded to ensure sickness benefits and organize a yearly feast to celebrate their identity as ‘children of Laurens Coster’, the Dutchman they believed invented printing. They were connected in a nationwide network and organized the erection of a statue in Coster’s honour in 1856. Eventually, this led to the establishment of the first national union in The Netherlands in 1866 [2]. I’m interested in how this phenomenon developed through time, until its cancellation in the 20th century.

In this paper, I present the first results in creating the graph that will help me investigate this subject and create reproducible research. It is not about the subject itself, but about how I combine tools to create the graph. As an example, I demonstrate the data about Jan Hendrik Regenboog, bookbinder and board member of the typographical association in The Hague in the 1850s.
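To make the idea of such a graph concrete, the sketch below stores the statements about Jan Hendrik Regenboog mentioned above as subject-predicate-object triples and serialises them in an N-Triples-like form. It is a minimal illustration under stated assumptions, not the tool chain presented in the paper: the `https://example.org/` namespace, all predicate names, the resource identifiers and the `source/placeholder` node are hypothetical, and literal escaping is deliberately omitted.

```python
# Hypothetical namespace; a real project would reuse established vocabularies.
EX = "https://example.org/"

REGENBOOG = EX + "person/jan-hendrik-regenboog"
ASSOCIATION = EX + "association/the-hague"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (REGENBOOG, EX + "name", "Jan Hendrik Regenboog"),
    (REGENBOOG, EX + "occupation", "bookbinder"),
    (REGENBOOG, EX + "boardMemberOf", ASSOCIATION),
    (ASSOCIATION, EX + "locatedIn", "The Hague"),
    # Placeholder only: stands in for a link to the archival source of a statement.
    (REGENBOOG, EX + "statedIn", EX + "source/placeholder"),
}


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def to_ntriples(graph):
    """Serialise the graph line by line; values in the EX namespace are treated as resources."""
    lines = []
    for s, p, o in sorted(graph):
        obj = f"<{o}>" if o.startswith(EX) else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(objects(triples, REGENBOOG, EX + "occupation"))
print(to_ntriples(triples))
```

Because every statement is an explicit triple, a claim made in a paper can point to the dataset row and the archival source it rests on, which is what makes the reconstruction traceable and reusable.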