Machine learning and AI may be deployed on such grand tasks as finding
exoplanets and creating photorealistic people, but the same techniques also have
some surprising applications in academia: DeepMind has created an AI system
[https://deepmind.com/research/publications/Restoring-ancient-text-using-deep-learning-a-case-study-on-Greek-epigraphy] that helps scholars understand and recreate fragmentary ancient Greek texts on
broken stone tablets.
These clay, stone or metal tablets, inscribed as much as 2,700 years ago, are
invaluable primary sources for history, literature and anthropology. They're
covered in letters, naturally, but often the millennia have not been kind and
there are not just cracks and chips but entire missing pieces that may comprise
Such gaps, or lacunae, are sometimes easy to complete: If I wrote "the sp_der
caught the fl_," anyone could tell you that it's actually "the spider caught the
fly." But what if it were missing many more letters, and in a dead language, to
boot? Not so easy to fill in the gaps.
Doing so is a science (and art) called epigraphy, and it involves an intuitive
understanding of these texts as well as of others that add context; one can make
an educated guess at what was once written based on what has survived elsewhere.
But it's painstaking and difficult work, which is why we give it to grad students, the
Coming to their rescue is a new system created by DeepMind
[https://crunchbase.com/organization/deepmind] researchers that they call Pythia,
after the oracle at Delphi who translated the divine word of Apollo for the
benefit of mortals.
The team first created a "nontrivial" pipeline to convert the world's largest
digital collection of ancient Greek inscriptions into text that a machine
learning system could understand. From there it was just a matter of creating an
algorithm that accurately guesses sequences of letters, just like you did for
the spider and the fly.
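To make the idea of guessing missing letters concrete, here is a minimal sketch in Python. It is not Pythia's method, which relies on deep learning; it swaps in a much simpler technique, matching a damaged word against a word list and ranking the matches by how often they occur. The corpus string and the function name `rank_candidates` are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical toy corpus standing in for a large collection of transcribed texts.
CORPUS = "the spider caught the fly but the fly had the flu"


def rank_candidates(pattern: str, corpus: str, top_k: int = 20) -> List[str]:
    """Return up to top_k corpus words that fit the pattern, most frequent first.

    Each "_" in the pattern stands for exactly one missing letter.
    """
    # Turn the damaged word into a regex: "_" becomes "any one lowercase letter".
    regex = re.compile(
        "".join("[a-z]" if ch == "_" else re.escape(ch) for ch in pattern.lower())
    )
    # Count only the corpus words that match the whole pattern.
    counts = Counter(w for w in corpus.lower().split() if regex.fullmatch(w))
    return [word for word, _ in counts.most_common(top_k)]


print(rank_candidates("sp_der", CORPUS))
print(rank_candidates("fl_", CORPUS))
```

With this toy corpus, "sp_der" has a single candidate while "fl_" has two, which is the kind of ambiguity that makes a ranked list of suggestions more useful than one answer.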
PhD students and Pythia were both given ground-truth texts with artificially
excised portions. The paper reports that the students' restorations had a
character error rate of about 57%, which isn't bad, as restoration of texts is a
long and iterative process. Pythia's error rate was... well, 30%.
Better still, the correct answer was in its top 20 answers 73% of the time. Admittedly
that might not sound so impressive, but you try it and see if you can get it in
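The difference between scoring only the first guess and scoring a top-20 list can be sketched in a few lines of Python. This is an illustrative scoring function under simple assumptions, not the evaluation code used in the paper; the name `top_k_accuracy` and the sample guesses are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_guesses: Sequence[Sequence[str]], truths: Sequence[str], k: int
) -> float:
    """Fraction of gaps whose true text appears among the first k ranked guesses."""
    if len(ranked_guesses) != len(truths):
        raise ValueError("need exactly one ranked guess list per ground-truth answer")
    if not truths:
        raise ValueError("need at least one test case")
    hits = sum(truth in guesses[:k] for guesses, truth in zip(ranked_guesses, truths))
    return hits / len(truths)


# Hypothetical ranked suggestions for three gaps, best guess first.
guesses = [["fly", "flu"], ["spider"], ["cat", "cot", "cut"]]
truths = ["fly", "spider", "cut"]

print(top_k_accuracy(guesses, truths, k=1))
print(top_k_accuracy(guesses, truths, k=3))
```

In this made-up sample the third gap is missed by the first guess but caught once three suggestions are allowed, so the score rises as k grows. A longer list only helps if a human is there to pick the right entry from it.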
The truth is the system isn't good enough to do this work on its own, but it
doesn't need to. It's based on the efforts of humans (how else could it be
trained on what's in those gaps?) and it will augment them, not replace them.
Pythia's suggestions may not be perfectly right on the first try very often, but
it could easily help someone struggling with a tricky lacuna by giving them some
options to work from. Taking a bit of the cognitive load off these folks may
lead to increases in speed and accuracy in taking on remaining unrestored texts.
The paper describing Pythia is available to read here
[https://arxiv.org/abs/1910.06262], and some of the software they developed to
create it is in this GitHub repository