Analysis: There are thousands of mutations of the coronavirus out there. What can New Zealand’s tell us? Marc Daalder reports

On February 26, a woman in her 60s was rushed into Auckland Hospital in a wheelchair, after arriving on a flight from Iran via Bali earlier that day. Longstanding protocols for infectious diseases, which had been ramped up in recent weeks, snapped into place. Medical staff donned gloves, face masks and goggles before moving to treat the woman.

It would be two more days – and two false-negative tests of nasopharyngeal swabs – before a test of the woman’s sputum confirmed what doctors already suspected: she had been infected with the novel coronavirus spreading out of China.

How did this tiny strand of RNA, 400,000 times smaller than a 10c coin, travel from Wuhan, China through Iran to Auckland, New Zealand? Where did it stop along the way?

The clues, it turns out, were in the sample that tested positive for the coronavirus, now called SARS-CoV-2. The genome of that virus – ever so slightly different from the “official” genome for SARS-CoV-2 due to natural mutations that occurred as the virus replicated – and seven others found in New Zealand have been uploaded to a public database.

Here are the stories these strains tell.


The coronavirus started small, a 30,000-letter string of the same four letters (A, C, G and U) endlessly replicating itself as it circulated through a colony of bats in central or southwestern China.

An examination of the genome of SARS-CoV-2 has found it is 79 percent identical to the original SARS virus and 96 percent identical to a coronavirus found in horseshoe bats in China’s southwestern Yunnan province.

Nearly 2000 kilometres away from this colony of horseshoe bats, the new coronavirus exploded onto the world stage, sickening tens of thousands of people in Wuhan in December and January – and likely infecting many more.

How it made the jump from bat to human remains unknown. Perhaps it came via an intermediary – a pangolin slaughtered at a wet market, for example. Some have theorised it was a natural pathogen that escaped from an infectious diseases lab in Wuhan that was studying bat-borne coronaviruses, but New Zealand scientists are doubtful of this explanation, as Farah Hancock reports.

Even less likely is the notion it was bioengineered. None of the telltale signs – cropped nucleotides clumsily sutured together to create a new and more deadly or transmissible virus – are visible in an analysis of the virus’ RNA.

“All evidence so far points to the fact that Covid-19 has naturally derived from a non-manmade source,” Jemma Geoghegan, a senior lecturer in viral evolution with a focus on infectious diseases at the University of Otago, told reporters during a briefing organised by the Science Media Centre last week.

“We know this from comparing the available genomic data from known coronavirus strains and it has been firmly determined that the virus originated through natural processes.”

That same process of analysing genomic data, however, can tell us more than just where the virus came from. It can tell us where the virus has been, too.

On December 26, as Chinese authorities told worried Wuhan residents there was nothing to fear, even as they locked up whistleblowing doctors and journalists, one Wuhan man got tested. The result came back positive – a swab from the back of his throat or the inside of his nose had detected the presence of the novel coronavirus.

Scientists at the Chinese Center for Disease Control sequenced the RNA of the virus and uploaded it to a global flu-tracking database called GISAID on January 12. This sample, 29,903 letters long, became the official genome for the virus.

A graphic representing the SARS-CoV-2 virus from Nextstrain. Each section contains instructions for the creation of a specific protein.

As it circulated in Wuhan in the early days of the outbreak, the virus largely kept its form. Although it replicated thousands of times to seize control of the respiratory systems of hundreds of patients, in-built mechanisms ensured that, the few times it made a typo in that sequence of 30,000 nucleotides, the error was corrected.

Then, finally, a mistake slipped through the cracks. Some time between December 22 and January 12, the guanine nucleotide 11,083 letters into the RNA code swapped to a uracil. The G changed to a U.

“Like all viruses, this is evolving over time,” Geoghegan said.

“How it evolves is it accumulates mutations as the virus replicates inside a host. It does that imperfectly, so these mistakes that the virus makes are mutations and they’re easily transmitted to the next person.”

Nextstrain, another open-source tool originally designed to track strains of influenza, has synthesised thousands of genomes uploaded to GISAID and estimated when and where mutations took place.* The G-to-U mutation at the 11,083rd letter, it estimates, took place on December 15. However, the first genome with that mutation doesn’t show up in Nextstrain’s dataset until January 18.

While the rest of the world was that day wondering whether the war of words between Iran and the United States would escalate, or combing through the agreement between Buckingham Palace and Harry and Meghan, two coronavirus tests were carried out in China, one in Chongqing and one in Wuhan. These tests both came back positive for the virus – and showed the 11,083 mutation.

This genome, in which the 11,083rd letter changed from G to U, showed up in Chongqing on January 18. Graphic modified from Nextstrain.

By that time, however, another mutation had taken place, splitting the virus into two more strains. On January 8, according to Nextstrain, the 1397th letter changed from a G to an A. This mutation would show up in the January 18 Wuhan sample, but not the one taken in Chongqing.

As the virus tore through the population of Wuhan, replicating in greater numbers than it ever had a chance to in its old haunt of the cave-bound bat colony, the typos grew more and more common. By January 14, two more mutations had occurred – the 28,688th letter had changed from a U to a C and the 29,742nd letter had changed from a G to a U.
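The idea of a mutation as a changed letter at a numbered position can be sketched in a few lines of Python. This is only an illustration, not how Nextstrain or GISAID process genomes: the function name `call_mutations` and the two 12-letter sequences are invented for the example, and it assumes the sequences are already aligned and of equal length.

```python
def call_mutations(reference: str, sample: str) -> list[str]:
    """List substitutions as '<old letter><1-based position><new letter>'."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Toy sequences invented for illustration (not real SARS-CoV-2 data).
toy_reference = "AUGGCGUACGUA"
toy_sample = "AUGGUGUACGUA"

print(call_mutations(toy_reference, toy_sample))  # ['C5U']
```

In the same notation, the change described earlier at the 11,083rd letter would be written G11083U.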

Canada, March 3. Graphic modified from Nextstrain.

This strain spread widely, with genomes with these four mutations – and sometimes a handful of new ones – appearing in Hong Kong in January and Canada, the United Arab Emirates, Sweden, England and New Zealand by the end of February.

By January 23, however, it was also spreading in Iran. Few of Iran’s coronavirus genomes have been uploaded to GISAID, but enough of the descendants of this mutation were linked to Iran that Nextstrain estimates the strain arrived in the country by late January.

From there, it travelled to the UAE, Canada and Australia, often without further modification. While in Iran, however, at least one more mutation occurred, in which the 25,618th letter of the virus switched to an A from a G. We know this occurred in Iran because the woman who was shepherded into Auckland Hospital on February 26 – and who had contracted the virus in Iran – was infected with this unique strain, which has not shown up anywhere else in the Nextstrain dataset.

This is the genome of New Zealand’s first coronavirus case. Graphic modified from Nextstrain.

Mapping genomes onto maps

The example of the Iran-specific mutations shows the usefulness of genome mapping. If someone turns up to a hospital in Wellington tomorrow with Covid-19, no travel history, and no clue where they got it, looking to the genome could help.

If the genome had the same five mutations as our first case, that could indicate that they were infected by someone who had recently been to Iran, significantly narrowing the scope of the investigation.
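As a rough sketch of that matching step, the five mutations described above for the first New Zealand case can be treated as a signature and compared against a new genome's mutations. The function name `carries_signature` and both "mystery" cases are hypothetical, and real investigations would weigh far more evidence than a simple set comparison.

```python
# The five mutations this article describes for New Zealand's first case,
# written as <old letter><position><new letter>.
FIRST_NZ_CASE = frozenset({"G11083U", "G1397A", "U28688C", "G29742U", "G25618A"})


def carries_signature(case_mutations: set[str], signature: frozenset[str]) -> bool:
    """True if every mutation in the signature also appears in the case."""
    return signature <= case_mutations


# Hypothetical new cases, invented for illustration.
mystery_case_a = set(FIRST_NZ_CASE) | {"C100U"}  # same five plus one made-up extra
mystery_case_b = {"C27964U"}  # the mutation the article links to the United States

print(carries_signature(mystery_case_a, FIRST_NZ_CASE))  # True
print(carries_signature(mystery_case_b, FIRST_NZ_CASE))  # False
```

A match would only suggest a shared chain of transmission; it would still need to be checked against interviews and travel history.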

Nearly every country with a significant outbreak now has identifying mutations. Viruses where the 27,964th letter has changed from a C to a U are American. A mutation in Belgium in late January, in which the 28,881st and 28,882nd letters changed from G to A and the 28,883rd letter changed from G to C, has now spread throughout Europe.

This phylogeny tree from Nextstrain, with each shade representing a different country and each branch a separate mutation, shows how mutations tend to be linked to a specific geographic location. Each dot represents a genome from the GISAID database.

Provincial strains also exist. Unique mutations associated with outbreaks in Washington and California have shown up in New York City – as have strains from Europe and Asia. 

Why does this matter? It’s a crucial tool in tracing how the virus moves and where it may be coming from.

In China, for example, new cases that bear the hallmarks of European or American strains can be classified as linked to overseas travel and not indicative of undetected spread.

New Zealand could also benefit.

“When new cases are being announced, often it says they are still under investigation. Often that means that people are still being interviewed and that they are still trying to establish links to known clusters,” Joep de Ligt told the Science Media Centre briefing last week. De Ligt is the head of bioinformatics and genomics at the Institute of Environmental Science and Research (ESR) – the national lab testing organisation.

“Luckily in New Zealand, we are doing a very good job of containing the virus. But in some cases, it’s just difficult to make that link, either because it’s some time ago that the person had the symptoms or just because the virus can spread so easily that you might not always make the connection. In some of those cases, we can use the genome of the virus to identify a cluster that it is associated to.”

This has happened on at least one occasion, de Ligt said.

“One of the ones that was followed up by Public Health Units was a case where genomics indicated that it was linked to the Queenstown outbreak. They followed up with those persons and yes, they were in the area, but they didn’t necessarily have direct links to any of the events, but it is clear that somewhere, during that time, that [infection] must have happened,” he said.

“That’s one of the things that it was unclear where that person might have been infected but we can now link it to that local event.”

As before, these mutations can also help us understand how the coronavirus got here.

This map charts the progress of the strains that made up the eight New Zealand cases on Nextstrain.

On or around December 27, a new strain emerged in China. This one started off with two mutations – the 8,782nd letter changing from C to U and the 28,144th letter from U to C. The virus remained stable for a time after that, until January 9, when the 24,034th letter changed from C to U.

Wuhan, January 5. Graphic modified from Nextstrain.

Then, as with the earlier strain, it sped up. Just four days later, it had tagged on two new mutations (the 28,077th letter from G to C and the 26,729th from U to C). Five days after that, a new one. Another 10 days later, another mutation. At some point along the way, the strain split to make two new ones.

Shanghai, February 1. Graphic modified from Nextstrain.

By late February, one of these strains made it to the United States, picking up four more mutations along the way. By the time it arrived in a test tube at Wellington Hospital on March 15, it contained 11 mutations distinguishing it from the original Wuhan virus.

Wellington, March 15. Graphic modified from Nextstrain.

But the test tube didn’t arrive alone. Another sample arrived the same day which bore remarkable similarities to the first. It was, in fact, the second of the strains, which likely split off in China before making its own way to the United States.

Wellington, March 15. Graphic modified from Nextstrain.

On March 17, Director-General of Health Ashley Bloomfield told the nation that New Zealand had three new cases of Covid-19, bringing the total to 11 confirmed cases and two probable.

“A Wellington man in his 30s and his father in his 70s have tested positive on their return from the United States. The man in his 30s became unwell on the flight and his father became unwell the day after they arrived,” Bloomfield said.

These two men, however, did not infect one another. Instead, they were likely infected in separate circumstances, each with one of the two strains detailed above. One of the viruses had five mutations that the other did not, making it extremely unlikely that one man had infected the other.

De Ligt and Geoghegan declined to comment on whether the virus samples came from the cases outlined above, for privacy reasons, but confirmed that they were likely separately infected.

“While we can’t go into specific details of these cases you are correct in your understanding that that amount of mutations is very, very unlikely to occur in one jump,” de Ligt told Newsroom.

“This means that while these people might be related or even were on the same flight they were most likely infected by different people that themselves were infected by people from different regions. [One of the viruses] is much closer to the virus that was seen in China compared to [the other] which looks more like sequences seen in the USA.

“Because of the large outbreak in the US, with many jumps in a short time frame, the virus rapidly accumulated more mutations. This is why it is so important to study the genomes of the virus alongside information like travel history etcetera. It provides an additional layer in our understanding on how this virus spreads and how it came to New Zealand.

“It is likely that these viruses had different origins and that the two men you referred to didn’t infect each other but rather had independent sources of infection that resulted in independent incursions of the virus into New Zealand, albeit on the same flight. Only genomics can provide these types of insights,” Geoghegan told Newsroom.

Do mutations matter?

Despite the hundreds of mutations the coronavirus has undergone in producing hundreds of different strains all over the world, it is unlikely that any of these have actually altered the biology of the virus.

“There’s actually hundreds of mutations that are present in the virus’ genome already and that’s to be expected with any virus that’s continuing to circulate. But there’s actually no evidence to suggest that any of them led to changes in the way that the virus behaves or transmits between people,” Geoghegan said.

“I think it’s too early to say whether or not these mutations are actually causing any functional changes in the virus.”

Every set of three nucleotides, known as a codon, codes for an amino acid. Often, when the last letter in the set is changed, the amino acid will still stay the same.
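A small Python sketch shows why a changed final letter often makes no difference. The table below is a deliberately partial slice of the standard genetic code, with only a handful of three-letter sets included, and the function name `is_silent` is made up for this example.

```python
# Partial slice of the standard genetic code (RNA letters). A full table has 64 entries.
PARTIAL_CODON_TABLE = {
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AGU": "Ser",
    "GAU": "Asp",
}


def is_silent(before: str, after: str) -> bool:
    """True if both three-letter sets code for the same amino acid."""
    return PARTIAL_CODON_TABLE[before] == PARTIAL_CODON_TABLE[after]


print(is_silent("GGU", "GGC"))  # True: last letter changed, still glycine
print(is_silent("GGU", "AGU"))  # False: first letter changed, glycine becomes serine
```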

Beyond that, the amino acids when combined together make up proteins – the major actors in the virus. Nearly half of the SARS-CoV-2 genome – about 13,500 nucleotides – is devoted to creating the ORF1a protein. This consists of about 4400 amino acids. Changing just a couple of these is unlikely to have a significant impact on the virus’ biology.

The good news is that this virus doesn’t mutate quickly. This stability is due to proteins that act as proofreaders, fixing many of the typos that would otherwise slip through.

“In comparison to other RNA viruses, it’s a little bit more stable because it has a way of correcting some of the errors that it makes while it replicates,” Geoghegan said.

“The stability is actually a good thing. It doesn’t necessarily mutate as quickly as some other viruses do and it’s quite encouraging news, for example, for the hope of creating a long-lasting vaccine.”

The vast majority of strains are still concentrated at or under 10 mutations, as the below chart shows. The x-axis on this chart is the number of divergences – changed letters – each genome has when compared with the original sequence.

This chart from Nextstrain shows the number of mutations in each strain. Each shade represents a different country.

On average, the virus is thought to undergo about 23.92 mutations a year, or two every month. This figure can help us backdate its emergence. If there is an average of 10 mutations from the supposed index case, then the virus likely entered the human population around five months ago – or early December.
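That back-of-the-envelope dating can be reproduced directly. The sketch below assumes, as the paragraph above does, a constant rate of 23.92 mutations a year; the function name `months_since_emergence` is invented for illustration, and real estimates carry wide uncertainty.

```python
MUTATIONS_PER_YEAR = 23.92  # average rate cited in this article


def months_since_emergence(average_mutations: float) -> float:
    """Months needed to accumulate the given number of mutations at a constant rate."""
    mutations_per_month = MUTATIONS_PER_YEAR / 12
    return average_mutations / mutations_per_month


print(round(months_since_emergence(10), 1))  # 5.0
```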

Even the two most divergent strains have just 35 mutations, representing about 0.12 percent of the genome. By comparison, SARS, which is significantly deadlier but still very similar to Covid-19, is 21 percent distinct from SARS-CoV-2.
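The percentage is simple to check against the 29,903-letter official genome mentioned earlier. This is just the arithmetic, with a made-up function name.

```python
GENOME_LENGTH = 29_903  # letters in the official genome, per this article


def percent_changed(mutations: int, genome_length: int = GENOME_LENGTH) -> float:
    """Share of the genome, in percent, that differs from the original sequence."""
    return mutations / genome_length * 100


print(round(percent_changed(35), 2))  # 0.12
```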

While the mutations may not make the virus more deadly or more transmissible, understanding where in the virus these mutations occur could help with efforts to create an antiviral medication.

The mutations are not evenly distributed throughout the virus. Some mutations are far more common than others and some letters have mutated far more than others. Remember that first mutation on December 15, when the 11,083rd letter swapped from G to U? That mutation showed up in another New Zealand case and, it turns out, has mutated at least 36 times according to Nextstrain.

This modified chart from Nextstrain shows the number of mutations of each nucleotide, with the proteins overlaid.

“The parts of the genome that have accumulated many mutations are more flexible. They can tolerate changes to their genetic sequence without causing harm to the virus. The parts with few mutations are more brittle. Mutations in those parts may destroy the coronavirus by causing catastrophic changes to its proteins. Those essential regions may be especially good targets for attacking the virus with antiviral drugs,” the New York Times reported in April.

That’s part of why New Zealand scientists are seeking to sequence the genomes of all 1138 of our confirmed cases. This could help with vaccine research and the quest for an effective antiviral.

“Just today, a new sequencer arrived at ESR that will double our capacity, so we can now do 100 genomes a week. We are endeavouring to paint as complete as possible a picture of the genetics of the virus in New Zealand and how it might have been spreading,” de Ligt said last week.

ESR has now sequenced 171 of the more than 600 cases referred to them.

“Our aim is to sequence every positive case in New Zealand,” Geoghegan said.

“Fortunately, that seems like a very realistic goal because we haven’t had that many cases. We’re in a really unique position to be able to do that. That will really provide us with an amazing dataset and a great case study, especially for international collaborations, to be able to understand how the virus spread here, what happened after we closed our borders, what happened after we went into Level 4 lockdown and as we begin to lift those lockdown restrictions, what happens to the transmission of the virus?”

Since New Zealand is a closed population, such studies could help researchers understand how the virus changes, without having to deal with the pressures of managing an active outbreak.

For example, an examination of different clusters in New Zealand could show cluster-specific mutations, allowing health officials to link closed cases with unknown origins to where they came from. Similarly, if a new case emerges out of nowhere, sequencing the genome of the patient’s virus could link it to another extant case or, through ruling out genomic connections to any of New Zealand’s cases, declare it an imported case.

Just as useful would be an audit of all previous cases, confirming that they came from where we suspected they came from. Take New Zealand’s second and fourth coronavirus cases.

The couple, in their 30s, had returned to Auckland from northern Italy and the woman had subsequently developed symptoms. She was tested and announced as the country’s second case on March 4. Two days later, the woman’s partner was announced as New Zealand’s fourth case and it became known that he had attended the Tool concert on February 28, when he may have been infectious.

But what if he hadn’t caught the virus from his partner and had instead, by coincidence, contracted it from someone else at the concert? What if Covid-19 had been spreading undetected in the community until then? That’s where genome mapping would come in.

An examination of the genomes of the couple shows they are identical. The first mutation developed in China around January 8, when the 241st letter changed from a C to a U. As with previous strains, the process accelerated in the subsequent days as the outbreak in Wuhan intensified. Two more mutations were tacked on by January 12 – the 23,403rd letter changing from A to G and the 3037th from C to U.

Shanghai, January 28. Graphic modified from Nextstrain.

Then the virus made the jump to Belgium, where the 14,408th letter swapped from C to U on January 18. By January 29, the Belgian hallmark mutation had appeared, in which the 28,881st and 28,882nd letters changed from G to A and the 28,883rd letter changed from G to C.

This strain spread throughout Europe, appearing by late February in England, the Netherlands, Portugal, Germany, Switzerland and, yes, Italy. It was this virus that both Aucklanders returned home with, proving that they had received it together – or from each other – while abroad.

The virus that originated in Belgium and spread throughout Europe before, ultimately, ending up in New Zealand. Graphic modified from Nextstrain.

In addition to all of this, it is likely that more uses for genome mapping will emerge in the coming months, alongside more revelations about how the virus mutates and spreads.

“In relative terms, our knowledge of the new coronavirus is quite remarkable,” Geoghegan said. “If you think about it, it was only five months ago that this virus was completely unknown to us and today it’s a subject of research at an unprecedented scale.

“We’re learning more every day.”

*Nextstrain estimates are based on the composition of its dataset. As new genomes are added, some of the dates and figures in this article may no longer match what is on the Nextstrain website, but they are accurate as of time of writing. 

Marc Daalder is a senior political reporter based in Wellington who covers climate change, health, energy and violent extremism. Twitter/Bluesky: @marcdaalder
