Papyrus Puzzle Project
Historical handwritten documents remain a major challenge for computer-based analysis. Unlike digital or printed documents, they are far less regular: the handwriting varies greatly, the documents sometimes show colour degradation and, depending on their state of preservation, may have holes or tears or be completely fragmentary. Depending on the object of study, ancient languages and writing systems make the task even more difficult.
This papyrus puzzle project was part of the SNSF project Crossing Boundaries: Understanding Complex Writing Practices in Ancient Egypt (2019-2023). The object of study is a corpus of approximately 200 ancient Egyptian papyri and more than 12,000 individual papyrus fragments coming from Deir el-Medina, the workers’ settlement close to the Valleys of the Kings and Queens in Egypt and now housed at the Museo Egizio in Turin, Italy. In the course of the project, the fragments have been documented, consolidated and subsequently researched. One of the main goals is to establish links between individual fragments in order to reconstruct documents and establish relationships between different components. With such a large conglomerate of individual objects, this is an extremely time-consuming and difficult task for human researchers.
Therefore, the papyrus puzzle project aims to analyse the fragments using machine learning approaches. Indications of connections between individual objects may exist at different levels. This project examines how machine learning can be used to find similarities in the colour of the objects, the texture of the material, the handwriting and the type of text written. Each of these aspects will be considered separately and similarity spaces will be calculated, eventually allowing for a hierarchical search of fragments with respect to specific aspects. This task is complicated by the fact that not every fragment necessarily exhibits all of these aspects (for example, objects without any text on them may be included). In addition, the documents are usually not long, detailed text compositions but so-called heterogeneous documents, in which different texts, some with different content and from different authors, are juxtaposed on one and the same document.
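The idea of separate similarity spaces combined into a hierarchical search can be illustrated with a minimal Python sketch. It is not the project's implementation: the feature vectors, the aspect names ("colour", "texture", "script"), the function `hierarchical_search` and the use of cosine similarity are all assumptions chosen for illustration. Fragments that lack an aspect (stored as `None`) simply cannot be compared at that level.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hierarchical_search(query_id, fragments, aspects, keep):
    """Narrow the candidate set one aspect at a time (hypothetical helper).

    At each level the remaining candidates are ranked by similarity to the
    query in that aspect's space and only the best `keep` are retained.
    """
    query = fragments[query_id]
    candidates = [fid for fid in fragments if fid != query_id]
    for aspect in aspects:
        q_vec = query.get(aspect)
        if q_vec is None:
            continue  # the query lacks this aspect, so this level is skipped
        scored = [
            (cosine_similarity(q_vec, fragments[fid][aspect]), fid)
            for fid in candidates
            if fragments[fid].get(aspect) is not None  # skip fragments without it
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        candidates = [fid for _, fid in scored[:keep]]
    return candidates


# Invented toy features; real features would come from trained models.
fragments = {
    "F1": {"colour": [0.8, 0.6, 0.4], "texture": [0.2, 0.9], "script": [1.0, 0.0]},
    "F2": {"colour": [0.79, 0.61, 0.41], "texture": [0.25, 0.85], "script": [0.9, 0.1]},
    "F3": {"colour": [0.8, 0.6, 0.45], "texture": [0.9, 0.1], "script": None},
    "F4": {"colour": [0.1, 0.2, 0.9], "texture": [0.2, 0.9], "script": [1.0, 0.0]},
}

by_material = hierarchical_search("F1", fragments, ["colour", "texture"], keep=2)
by_writing = hierarchical_search("F1", fragments, ["colour", "script"], keep=2)
print(by_material)  # ['F2', 'F3']
print(by_writing)   # ['F2']
```

In this toy example the uninscribed fragment F3 survives the colour level but drops out once the search descends into the script level. How such fragments should be handled in practice (dropped, carried along, or ranked separately) is a design decision the sketch does not settle.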
In order to make the results of the research accessible to scholars in the field of ancient studies, the implementation of reconstruction software, a "Virtual Light Table", is also part of the project. A first version for the digital processing and compilation of the fragments has already been published, and the further integration of machine learning processes for the automated analysis of the fragments will follow.