24-year-old Liu Xueyan has never seen a self-driving car, but her work has helped to develop an artificial intelligence (AI) algorithm that could power autonomous driving.
At a data annotation base two hours away from central Beijing, Liu was marking objects in hundreds of images shown on her computer screen. She zoomed in, drew a square around the shape of a bus, and added a label "bus" to the specific zone, before moving on to mark sidewalks, pedestrians or traffic signs on the images.
Liu is among the thousands of young workers at the data annotation base operated by Testin, a Chinese tech company founded in 2011 that offers AI data collection and annotation services. Data processed by the young workers will power applications as diverse as autonomous driving, public security cameras, medical diagnosis and retail.
Contrary to what many believe, AI cannot learn on its own; it has to be taught. A large data set is needed to train the algorithm to find patterns and thus draw conclusions in future scenarios. But machines cannot recognize raw data. Scientists need to use clean, annotated data to train machines to learn.
"We can think of annotated data as the textbook for the machines. If content in the textbook is bad, the algorithm that is developed will have low accuracy," said Xu Kun, president of Testin, in an interview with CGTN. Algorithms with low accuracy may pose security risks, for example by making it easier for others to falsify identities in facial recognition applications, he added.
Given the widespread application of AI across industries, the quality requirement for data annotation is on the rise – most industries now require data annotation to achieve a 99.9-percent accuracy rate. This means a left eye cannot be identified as a right eye in an image used for facial recognition, and a liver cannot be categorized as a lung in a CT scan image.
This is having wider implications for an industry traditionally populated by small data annotation farms in remote and impoverished areas of China. Employing mostly low-wage workers with little educational background and minimal job training, those data annotation farms operate like assembly lines of the digital age.
But since AI companies now demand highly accurate data annotation, more professional service providers that have a reliable workforce are popping up across China.
At Testin, one of the companies dedicated to providing professional data annotation services, training can last for weeks. While general projects such as facial recognition and natural language processing require data annotation engineers to have graduated from secondary educational institutions, highly specialized projects, such as those in the insurance, finance and medical industries, require a college degree.
The first time Liu took on a data annotation project more than one year ago, it took her only three days to master basic tagging. All she needed to do was draw circles and tag objects. It's a repetitive task that has a low skill threshold, she recalled.
Her next project, tagging objects in road scenes, was more challenging. It required her to differentiate double yellow lines from dotted white lines so that a self-driving car knows when to make a turn. She also needed to accurately tag pedestrians, bicycles, motorcycles and electric scooters so the autonomous driving software knows how to respond when it encounters them in real life.
"What we did matters a lot to the application of AI software," said Liu. "If an object is tagged wrongly, it might cause a traffic accident."
Workload varies with the nature of the project. For a simple AI tagging project, an annotator is required to draw around 3,000 circles every day. For a road scene tagging project, the figure is around 2,600. For the more complicated task of labeling 3D point cloud models, the number of images processed each day is much lower.
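To put these daily workloads next to the 99.9-percent accuracy requirement cited earlier, the rough arithmetic can be sketched as follows. This is an illustration only: it assumes accuracy is counted per drawn shape, which the companies involved may measure differently, and the function name is hypothetical.

```python
# Illustrative arithmetic only. Assumption: accuracy is counted per
# annotation (one drawn shape = one unit), which may not match how
# annotation providers actually measure it.
ERRORS_ALLOWED_PER_THOUSAND = 1  # 99.9 percent accuracy = 1 error per 1,000


def max_errors_per_day(annotations_per_day: int) -> int:
    """Largest whole number of wrong annotations that still meets the target."""
    return annotations_per_day * ERRORS_ALLOWED_PER_THOUSAND // 1000


simple_project = max_errors_per_day(3000)  # simple tagging project
road_scene = max_errors_per_day(2600)      # road scene tagging project
print(simple_project, road_scene)
```

Under that assumption, an annotator drawing about 3,000 shapes a day could get no more than three of them wrong, and one drawing about 2,600 could get no more than two wrong.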
For Liu and most of her colleagues who are in their 20s, the data labeling job is a satisfactory one, at least for now. She follows a 9 to 6 work schedule, enjoys her weekends off – unless there are urgent tasks – and has a salary ranging from 3,500 yuan (507 U.S. dollars) to 6,000 yuan (869 U.S. dollars) depending on her experience and work performance.
Despite the sometimes repetitive nature of the job, AI is far from taking over the industry, according to Xu. AI in China is still in its infancy, but demand for AI applications that increase efficiency and reduce costs will surge in the near future, and demand for data annotation will skyrocket, he said.
But there are signs that performance improvements can be achieved by having humans and machines work together. Scale AI, a San Francisco-based data labeling firm, pioneered a model in which algorithms do the initial labeling before data annotation engineers perform a final check on the results.
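The general idea of machine pre-labeling followed by a human check can be sketched in a few lines of code. The sketch below is a minimal illustration, not a description of Scale AI's or Testin's actual systems: the `Annotation` type, the `annotate` function, and the stub model and reviewer are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    label: str                      # e.g. "bus" or "pedestrian"
    box: Tuple[int, int, int, int]  # x, y, width, height in pixels


def annotate(
    image_id: str,
    pre_label: Callable[[str], List[Annotation]],
    human_review: Callable[[str, Annotation], Optional[Annotation]],
) -> List[Annotation]:
    """The algorithm proposes labels; a person confirms, corrects or rejects each."""
    reviewed = (human_review(image_id, p) for p in pre_label(image_id))
    return [a for a in reviewed if a is not None]  # None means "rejected"


# Hypothetical stand-ins for a real model and a real annotator.
def stub_model(image_id: str) -> List[Annotation]:
    return [
        Annotation("truck", (10, 20, 200, 80)),      # wrong label on purpose
        Annotation("pedestrian", (300, 40, 30, 90)),
    ]


def stub_reviewer(image_id: str, proposal: Annotation) -> Optional[Annotation]:
    if proposal.label == "truck":
        return replace(proposal, label="bus")  # the human corrects the label
    return proposal                            # the human accepts it as-is


final = annotate("road_scene_001", stub_model, stub_reviewer)
print([a.label for a in final])
```

In this arrangement the person no longer draws every shape from scratch, but every label that reaches the training set has still passed a human check.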
So far, most companies use AI and humans in a complementary manner. While AI is deployed to take over repetitive tasks, jobs that require teamwork, creativity and social skills still demand human input.
For 24-year-old Liu, the idea that her work will one day be taken over by AI still seems far-fetched. "If AI products are like the newborns, the software developers are like the parents, and we are the people who cook for the newborns," said Liu. "Without the food we provide, the newborns cannot survive."