Since farmers first dug up ancient bone fragments in the fields around the Yellow River in eastern China over 100 years ago, researchers have been poring over the mysterious script carved on their surfaces.
Dating back more than 3,000 years, this "oracle bone" script is the earliest known form of Chinese writing. Researching this script is very challenging: the bones, mainly ox scapulae and turtle plastrons, are fragile and often found in fragments; copies of the inscriptions made by ink rubbings can be blurry or incomplete; and the pieces are scattered across national museums and private collections in China and around the world.
Now researchers in Beijing are using AI to fast-track the basic but necessary work of comparing each script sample with thousands of others in databases. This work is paving the way for researchers to decipher the inscriptions and shed light on everything from the daily concerns of people in ancient times to how Chinese writing first developed.
"This is a great example of human-machine collaboration," said Mo Bofeng, a professor from the Center for Oracle Bone Studies at Capital Normal University, who is working on the "Diviner" project with Wu Zhirong, a senior researcher at Microsoft Research Asia.
Oracle bone inscriptions have been recognized by UNESCO's Memory of the World International Register as a valuable record of the people of the Shang Dynasty (c. 1600 BC–1046 BC), covering the period from 1400 BC to 1100 BC, in addition to being the earliest extant evidence of a Chinese writing system. In China, every kid learns about the oracle bones in school.
Since 1899, about 150,000 pieces have been unearthed and are now housed in more than 100 institutes around the world, according to experts behind the UNESCO nomination. With no equivalent of a Rosetta Stone as a guide, scientists have deciphered only about 1,000 of the approximately 4,000 characters categorized so far.
Until now, the study of the oracle bone script has been painstakingly laborious. The earliest copies of these inscriptions were Chinese ink rubbings, while more recently photography and 3D imaging technology have been used. Previously, researchers had to manually compare each image to find duplicates or overlaps, with the goal of stitching together fragments – like a jigsaw puzzle – into a more complete whole for study.
"Since a single oracle bone may have been documented several times with different levels of clarity and integrity, a lot of work is needed to relate, compare and interpret them," Yubin Jiang, a researcher at the Research Center for Unearthed Documents and Ancient Characters at Fudan University, explained.
"In the past, this burden fell solely on the shoulders of scholars with rich experience and sharp memory, but their research only led to random findings."
"Diviner has managed to complete wide-ranging duplication detection in a highly efficient, fruitful and exciting way," he added.
Wu, a researcher at Microsoft, focuses on the nascent field of self-supervised learning, a type of machine learning that does not rely on people to label data manually. He approached Mo about a year ago after hearing that the professor was experimenting with AI to study the oracle bone script. At the time, Mo was using off-the-shelf image recognition software, which allowed only a few images to be uploaded at a time and required a user to pick one as a reference image.
Wu said he and another team member took eight to nine months to build a model. In November 2022, in the space of one week, the Diviner Project compared 181,134 examples of inscription rubbings across 100 databases. It not only reproduced tens of thousands of duplicates previously identified by people but also found more than 300 new pairs.
After Wu and Mo shared the results on the website of the Pre-Qin Research Office at the Chinese Academy of Social Sciences, which has its own substantial collection of oracle bones, researchers at other institutions reached out to them for help, said Wu. The project was also featured in a special oracle bones episode on national broadcaster CCTV on January 2.
"The current project is to organize the data and recover the data to its original form by joining small fragments together," said Wu.
"With this, we hope we can move on to the final challenge – deciphering the meaning of these characters."
Those findings could have implications for different fields.
"To archaeologists, they are the cultural remains of humans. To historians, they are historical materials from the Shang Dynasty. To linguists, they are the earliest systemic Chinese characters," said Mo. Moreover, "records of solar eclipses, lunar eclipses and meteor showers found in oracle bone inscriptions can be merged with astronomy."
(The author is from Microsoft Stories Asia)