Database of Norwegian texts from the Middle Ages
Access to medieval Norwegian texts in a single database will be a significant advance for Norwegian research groups in language history, medieval history, grammar, lexicography and comparative syntax.
Menotec - Medieval Norwegian Text Corpus
Status: UNDER IMPLEMENTATION
Access to Norwegian texts from the period 1150–1550 in digital form has been very limited compared with the digital collections of similar texts available in other countries. The Menotec project will remedy this by bringing together all medieval Norwegian texts in a single, dedicated database. The Menotec database will be an expansion of the Medieval Nordic Text Archive (Menota) and will offer access to a substantially larger text corpus, much of which will be linguistically tagged.
Developing the database involves transcribing 1.5 million words, 1 million of which come primarily from legal texts and official documents. This material will be morphologically tagged with base forms and grammatical information, while the remaining 500,000 words will be syntactically tagged. Texts will be added to the Menota archive as they are completed. Syntactic tagging is a new feature of the Menota archive and will follow tagging templates for early Indo-European languages developed in previous projects.
The morphological data will be stored in a meta-dictionary using the same format as the Norwegian Dictionary 2014 project. This will facilitate the creation of a pan-Nordic meta-dictionary for medieval Norwegian, Swedish, Icelandic and Danish texts, with links to the national academic dictionary projects. The medieval Nordic meta-dictionary will contain semantic, morphological and comparative linguistic information spanning these languages.
The database will build on existing technology and employ international standards, facilitating its use by relevant research communities both in Norway and abroad. A new search interface for syntactically tagged texts will be developed.
The infrastructure will be of major benefit to Norwegian research groups that play a key role in research on language history, grammar, lexicography and comparative syntax, and will open the door to international collaboration, particularly in historical linguistics. The database will also greatly advance lexicography in Norway and the rest of the Nordic region.
The Menotec project may be incorporated into the Common Language Resources and Technology Infrastructure (CLARIN), one of the projects on the European Strategy Forum on Research Infrastructures (ESFRI) Roadmap for which the Research Council has signalled its support. Results from the Menotec project would then be available in a pan-European context, extending their significance beyond the Nordic region.
The University of Bergen and the University of Oslo are collaborating on the development of the Menotec database.
The Research Council has allocated NOK 7 million to establish the database, and the partners are contributing some NOK 3 million in funding of their own.
The database is scheduled for completion in 2012. Menota, a network of 18 Nordic archives, libraries and institutes working with medieval texts and manuscript facsimiles, will be responsible for operating and maintaining the infrastructure.