Prof. Dr. Werner Wegstein, Senior Professor at the University of Wurzburg, on the beginnings of TextGrid, community building and the beauty of TEI

When did you start conceptualizing TextGrid?

Wegstein: Strictly speaking, I didn’t start anything. Somehow the concept simply emerged. Maybe it was Heike Neuroth who initiated it, by looking for possible partners with whom to submit a project proposal in response to a funding call from the Federal Ministry of Education and Research (BMBF). In Wurzburg we have been using data processing for linguistic and literary research since the seventies, compiling a considerable number of digital text editions and linguistic data. By holding conferences in Wurzburg, and by networking and collaborating with other national and international digital researchers, we were able to look for new approaches to data processing suited to the philological needs of humanities research. And then, one day in 2004, there was Ms Neuroth sitting on our red sofa in the German Languages Department in Wurzburg, and together with colleagues from other universities we discussed what was necessary for putting together a promising project proposal and what preparatory work the various institutions could contribute to such an endeavour.

So you designed the TextGrid project together?

Wegstein: The term “designed” suggests a clearer set of objectives than we actually had. As I remember it, the goal of the whole project only took definite shape in the process of working on the proposal itself. Even the name “TextGrid” was coined rather late, during a planning session in Trier. The eventual success of the project proposal is, in my view at least, mainly due to the inspiring collaboration of all the participants, who came from a range of institutions with very different backgrounds and research interests and who, at first glance, came together by chance. On the one hand, there were specialised academic researchers in Modern and Medieval German Literary Studies and in Linguistics, with varying research methods and interests but a shared endorsement of digital methods in the Humanities. On the other hand, there was an institution like the Goettingen State and University Library, which as a library not only covers the increasingly important area of metadata in a digital environment but also provides a sustainable organisational foundation for the project. A professor in an academic department cannot sustain a project of this scale; only institutions such as libraries can undertake such a task. In a way, the Institute for the German Language (IDS) in Mannheim is the icing on the cake. As a central, non-university organisation for the research and documentation of the German language, it is an ideal partner for a project that seeks to adapt grid technologies for the digital preservation of linguistic cultural heritage. The technical implementation of new technologies is of course carried out by computer scientists: two university-affiliated IT companies handle the authentication and the connection to the computational grid. That’s basically the full set of competences required to initiate a project of this kind in the Humanities.

Why do humanists need grid power?

Wegstein: At first we weren’t so sure about this either. It seemed obvious that a fail-safe storage grid would be ideal for collaborative work on the enormous amount of our cultural heritage data, especially when you intend, as we do, to transfer into digital form not only the language data but also the physical material in which that data has been passed down to us: manuscripts, incunables, prints and whatever else may have served as a carrier of linguistic information, and all that in archival quality, virtually as a substitute for the original document. However, the further we proceeded with structuring digital language data at a very fine granularity, the clearer it became that complex analysis of language data in corpora as extensive as the IDS’s requires enormous processing power. That makes a computing grid indispensable in the Humanities, unless you’re willing to wait weeks for any results.

Have any of your expectations of TextGrid been disappointed?

Wegstein: Not as far as content is concerned. But we had to revise our initial ideas of the time frame required to develop a software package that is ready for production. We simply underestimated the complexity of the task. On the other hand, for an academic research cooperation, the pace of work was still rather respectable and probably compares well internationally. And not only the pace of work: I especially valued the working atmosphere within the TextGrid project, a pleasant and inspiring co-operation, motivating and harmonious, even when substantive differences were debated vigorously.

What do you yourself hope to achieve through TextGrid?

Wegstein: Within the project I represent the users; I apply the tools developed in TextGrid in research projects in order to convince sceptical colleagues of the benefits of the TextGrid project. My first test case will be the edition of Joachim Heinrich Campe’s Dictionary of the German Language, an extensive dictionary with rather complex structures, which are, however, not always applied consistently. It is an ideal example of complicated electronic encoding according to international standards: an encoding with a consistent structure that can be validated.

For the digitisation you have to encode the entire dictionary according to the TEI encoding scheme. Doesn’t that become somewhat monotonous?

Wegstein: Not at all. For me, TEI encoding is like writing an orchestral score, even though the comparison is flawed with respect to the proportions of linearity and hierarchy. An orchestral score visualises the structure of the many simultaneous parts that together form a sound, much as a TEI encoding represents the complex structure of a text. The first thing you would notice is a mass of angle brackets. But they can be arranged; they don’t need to come at you in waves. To me they are like a symphonic score, with the various musical instruments in different tags (now the trombones join in, now these, now those), all in angle brackets. And as a whole their interplay creates something brilliant.

To the outsider TEI is as inscrutable as written music to the non-musician: an accumulation of dots and dashes, no music at all…

Wegstein: This is why I recommend beginning with the text instead of the brackets. First, you contemplate the text, decide what exactly you are trying to find out, how best to visualise that, and how to make it verifiable. Seen in this light, I don’t consider the restrictions laid upon me by the XML format and the logic of TEI encoding as a straitjacket but as a philologist’s challenge. I’ve been assigned the task of designing an encoding that reveals the structures of the text.

What’s TextGrid’s next task/homework?

Wegstein: We have to find a permanent funding structure and we need to promote community building. There’s still a lot to do. I concur with Nancy Ide, who in 1999, in the introduction to the first TEI guidelines, P1, stated that “standards cannot be imposed: they must be accepted by the community.” That’s exactly where TextGrid has to take action, since for us Humanists community building is essential. And it only works one project, one researcher at a time. Humanists are unlikely to be persuaded in bulk; you have to convince them individually, which is really tough work. In order to win over Humanists you need to show them that the new technique you promote really works. They wouldn’t accept that from an information scientist. Hence we have to come up with good knowledge transfer concepts, possibly as flexible and dynamic as the TEI training courses you can book at Brown University. They send over a team to impart all the necessary TEI knowledge to your project, from basics to technical subtleties. That’s the way we’ll have to spread the word about TextGrid.

Interview by Esther Lauer.

