As the Covid epidemic spreads across the world, a second epidemic is following hot on its heels.

Rather than physical disease, this one involves the transmission of dubious, confusing or harmful information. While Covid-19 is largely passed from person to person through coughs and physical contact, the medium for the information epidemic is the internet.

Increasingly, when searching online, chatting on social media or checking the daily feed, people are encountering opinions about Covid that are bizarre, provocative, or positively dangerous. The coronavirus is a hoax; it was released by Bill Gates as cover for his plan to implant people with microchips; it can be treated by drinking cow urine or Hennessy cognac.

Strange bubbles of opinion of this kind can be found wherever information circulates between people. But the internet allows them to form particularly rapidly – and during the Covid crisis, they can have particularly serious effects. Can we control how misinformation spreads online during this crisis – and should we? How should we even go about defining ‘misinformation’? And what tools are available to control how information flows on the internet?

The pandemic is shining a spotlight on how the big tech companies are approaching these questions. Companies like Google, Facebook, YouTube and Twitter all engage in some form of moderation or censorship of the information that passes through their platforms. Media attention often focuses on companies’ attitudes towards high-profile individuals: for instance, on Twitter’s decision to flag or hide messages from Trump, and on Facebook’s decision to refrain from doing so.

These decisions come from the people at the very top of the tech companies: in making them, CEOs like Mark Zuckerberg and Jack Dorsey are acting very much like old-fashioned newspaper barons, deciding to withhold a story, or run it on the front page. But the tech companies also engage in mass moderation and censorship, which is less widely reported, and less well understood by the public.

At Otago’s Centre for AI and Public Policy, Professors Alistair Knott (Computer Science) and Colin Gavaghan (Law) have used the Covid infodemic as the backdrop for research setting out the methods used by the big tech companies to perform moderation and censorship at scale, and discussing the thorny issues these methods raise. Their focus is on how companies like Facebook and Google use Artificial Intelligence (AI) techniques to curate information.

The basic process is to define various classes of internet content that are to be curated, and then build AI classifiers that can automatically recognise these classes and take the prescribed action. These classifiers work through machine learning: they are trained on large sets of hand-identified examples of the defined classes; after training, they can recognise new, unseen examples by themselves. Classifiers are the core technology of modern AI, so essentially any process of mass curation on social media sites is delegated to AI systems.
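To make this process concrete, the sketch below shows, in Python, what training a very small text classifier might look like. The library (scikit-learn), the example texts and the class labels are chosen purely for illustration, and bear no relation to any company’s actual systems.

```python
# A minimal, purely illustrative sketch of the classifier-training workflow
# described above, using scikit-learn. Real systems train on millions of items.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labelled training examples (invented for illustration).
texts = [
    "Drinking cow urine cures the coronavirus",        # labelled misinformation
    "5G towers are spreading Covid-19",                 # labelled misinformation
    "Wash your hands regularly to reduce infection",    # labelled acceptable
    "Vaccines are being tested in clinical trials",     # labelled acceptable
]
labels = ["misinformation", "misinformation", "acceptable", "acceptable"]

# Training: convert each text into numerical features, then fit a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# After training, the classifier can label new, unseen items automatically.
print(model.predict(["Cow urine is a proven Covid treatment"]))
```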

When it comes to mass curation, each internet company has to define its own classes of information for curation, and specify an appropriate curation action for each class. (The internet offers a wide range of possible actions, running from outright removal of an item, to tagging it with various kinds of warning flag, to demoting or removing it from content feeds or search results.)
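As a rough illustration of how classes of content might be mapped to curation actions, the following sketch defines a hypothetical policy table; the class names and actions are invented for the example, not drawn from any real platform’s policy.

```python
# A hypothetical mapping from content classes to curation actions.
# Class names and actions are invented for illustration only.
CURATION_POLICY = {
    "covid_health_misinformation": "remove",        # take the item down
    "disputed_covid_claim":        "flag_warning",  # attach a warning label
    "borderline_covid_content":    "demote",        # rank lower in feeds/search
    "acceptable":                  "no_action",
}

def curate(item_text: str, predicted_class: str) -> str:
    """Return the action prescribed for the class a classifier assigned."""
    action = CURATION_POLICY.get(predicted_class, "no_action")
    return f"{action}: {item_text[:40]}"

print(curate("The coronavirus is a hoax released as cover for...",
             "covid_health_misinformation"))
```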

The company must then create a training set for each classifier: a collection of items identified by human annotators as falling within the class in question. This human annotation task is vitally important: much of the subtlety of the classifier depends on the quality of its training set. But in practice, the task of building training sets is often outsourced to contractor companies, whose employees often perform repetitive or psychologically harrowing work, in precarious working arrangements. Finally, the trained classifiers must be tested. If their task is to identify subtle distinctions in text or video content, their performance is often far from perfect. But it is with these imperfect tools that the big tech companies must tackle the spread of disinformation and dangerous information online.
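To give a sense of what ‘far from perfect’ can mean in practice, the sketch below tests a hypothetical classifier’s output against human annotations using the standard measures of precision and recall; the labels, predictions and resulting numbers are invented for illustration.

```python
# A sketch of how a trained classifier might be tested against a held-out
# set of human-annotated items. All labels and predictions are invented.
from sklearn.metrics import precision_score, recall_score

# 1 = misinformation, 0 = acceptable, as judged by human annotators.
human_labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
classifier_output = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

# Precision: of the items the classifier flagged, how many really were
# misinformation? Recall: of the real misinformation, how much was caught?
print("precision:", precision_score(human_labels, classifier_output))  # ≈ 0.67
print("recall:   ", recall_score(human_labels, classifier_output))     # 0.5
```

On these invented figures, a precision of about 0.67 would mean a third of the flagged items were legitimate content wrongly acted upon, while a recall of 0.5 would mean half of the genuinely harmful items slipped through – a reminder of why the imperfections of these tools matter at scale.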

In their research, Knott and Gavaghan illustrate how the big tech companies are currently deploying these methods to tackle disinformation during the Covid crisis, so that their strengths and shortcomings can be understood. They conclude with a more general discussion of how society should approach the mass curation of information online.

They argue that while AI classifiers implemented by the big tech companies play a vital role in this process, the decisions about how these classifiers are designed and built are not a matter for the companies alone: governments and international bodies should also be involved, and so too should citizens, and citizens’ groups. They argue that the human annotators who build the training sets for classifiers should be employed in-house, on improved terms, and that the processes of building training sets and testing classifiers should be much more transparent, so they can be scrutinised more closely. Curating information on the internet will never be easy – but we can certainly improve the processes and social institutions through which this curation happens.
