
Question 3. Digital technologies in philology.

Digital technologies continue to change our daily lives, including the way scholars work. As a result, the Classics are also undergoing constant change. Having established itself as an important field in the scientific landscape, Digital Humanities (DH) research offers a number of new possibilities to scholars who analyze and interpret ancient works. Greek and Latin texts are made digitally available and searchable (editing, encoding). They can be analyzed to find certain structures (text mining). They can also be provided with metadata (annotation, linking, textual alignment), e.g. in the manner of traditional commentaries, to explain terms, vocabulary or syntactic relationships (in particular through treebanking), for intra- and intertextual linking, and for connections with research literature. An important keyword here is ‘networking’: there is so much potential for Classical Philology to collaborate with the Digital Humanities in creating useful tools for textual work that a clear overview is difficult to obtain. Moreover, this scientific interest is by no means one-sided: collaboration is equally important for the Digital Humanities as a way of (further) developing and testing digital methods.

However, digital technologies are useful not only for classical philology.


Searching linguistic data, although it can also be done with command-line tools or even standard word-processing software, generally involves the use of concordance programs. A concordance program is a computer program that, at a minimum, lets the user specify a search term along with a list of files or directories to search through, and then lists all occurrences of the search term found in these files, usually in the so-called keyword-in-context (KWIC) format, which highlights the chosen word and shows it together with its neighbors.
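Concordance programs differ in their features, but the core KWIC display can be sketched in a few lines of Python. The function name `kwic`, the naive regex tokenizer (which keeps hyphenated forms such as ice-cream as a single token) and the fixed context width are illustrative assumptions. For brevity, the sketch searches a single string rather than a list of files.

```python
import re


def kwic(text, term, width=3):
    """Return one KWIC line per occurrence of `term` in `text`.

    Assumption: tokens are runs of word characters, optionally joined by
    internal hyphens or apostrophes; real concordancers are more flexible.
    """
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    target = term.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Up to `width` tokens of left and right context
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column,
            # and bracket the keyword to mimic KWIC highlighting
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


for line in kwic("I like ice cream. Ice cream is cold.", "cream", width=2):
    print(line)
```

Aligning the keywords in one column is what makes KWIC output easy to scan for recurring neighbors.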


‘Statistics’ on Text

Doing ‘statistics’ on text essentially involves counting the frequencies of lexical items in texts, establishing measures of how often they co-occur, and interpreting the results. However, many students of linguistics still ‘operate with’ the simplest ‘schoolbook’ definition of a word, i.e. “something that is delimited by spaces or punctuation”. One of the most important things to do when discussing quantitative methods is therefore to make students aware of the problems they may encounter when dealing with real-life data, where the distinctions are not always so clear-cut. For English, probably the best way to demonstrate this is through compounds that are spelt in different ways. For instance, the compound ice cream occurs in the following three forms:

word form    matches    no. of texts
icecream          28              17
ice-cream        368             174
ice cream        471             203

As the table above shows, the only form that would correspond to the simple definition of a word, i.e. the one without a space or hyphen, is comparatively rare (although it still occurs in 17 texts), and some native speakers may even intuitively judge this form to be incorrect.
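The figures in the table come from the author's corpus, which is not reproduced here. As a rough sketch of how such figures might be obtained, the Python snippet below counts, for each spelling variant, the total number of matches and the number of texts containing at least one match. The function name `variant_table`, the regular expressions for the variants and the toy corpus (a list of strings, one per text) are all assumptions, not the method used to produce the table.

```python
import re

# Hypothetical patterns for the three spellings of one compound.
# "\s+" lets "ice cream" match across a line break as well as a single space.
VARIANTS = {
    "icecream": r"\bicecream\b",
    "ice-cream": r"\bice-cream\b",
    "ice cream": r"\bice\s+cream\b",
}


def variant_table(texts):
    """Map each variant to (total matches, number of texts containing it)."""
    table = {}
    for form, pattern in VARIANTS.items():
        rx = re.compile(pattern, re.IGNORECASE)
        per_text = [len(rx.findall(t)) for t in texts]
        table[form] = (sum(per_text), sum(1 for n in per_text if n > 0))
    return table


corpus = [
    "Ice cream and more ice cream.",
    "An ice-cream van.",
    "icecream? ice cream!",
]
for form, (matches, n_texts) in variant_table(corpus).items():
    print(f"{form:<10} {matches:>5} {n_texts:>5}")
```

Counting the number of texts alongside raw matches helps show whether a variant is spread across the corpus or concentrated in a few texts.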

(Frequency counts of this kind help to determine the usual spelling of a word in a particular context.)
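Co-occurrence measures, the other half of doing ‘statistics’ on text, are usually built on raw counts of which words appear near a given node word. The hypothetical `collocates` function below collects these counts within a fixed window of tokens. The window size and the tokenizer are assumptions. Association scores such as mutual information or log-likelihood, which real tools compute from these counts together with overall word frequencies, are left out.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count the words occurring within `window` tokens on either side of `node`.

    Returns raw co-occurrence counts only (no association score).
    """
    tokens = [t.lower() for t in re.findall(r"\w+(?:[-']\w+)*", text)]
    target = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Left and right context, excluding the node itself
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts


print(collocates("sweet ice cream and cold ice cream", "ice", window=1).most_common())
```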


