For all their simplicity, viruses are sneaky little life forces. Take SARS-CoV-2, the virus behind Covid-19. Challenged by the human immune system, the virus has gradually reshuffled parts of its genetic material, making it much easier to spread among a human population. “Viral escape” is the nightmare scenario, in which the virus mutates just enough that existing antibodies no longer recognize it, rendering obsolete the vaccines that used to be an effective countermeasure.
From an evolutionary perspective, viral mutations and our immune system are constantly engaged in a struggle to win out. The critical insight was to construct a “viral language” of sorts, based purely on its genetic sequences. This language, if given sufficient examples, can then be analyzed using natural language processing techniques to predict how changes to the viral genome alter the virus’s interaction with our immune system. That is, using artificial language techniques, it may be possible to track down key areas in a viral genome that, when mutated, allow it to escape roaming antibodies.
Yet when tested on some of our greatest viral foes, like influenza, HIV, and SARS-CoV-2, the algorithm was able to discern critical mutations that “transform” each virus just enough to escape the grasp of our immune surveillance system. “This is a phenomenal way of narrowing down the entire universe of potential mutant viruses,” added Dr. Benhur Lee at Mount Sinai. It could also provide insight into how the new coronavirus could further mutate and put our immune system in “check,” and in turn, give us time to battle its escape plans and end the pandemic once and for all. The idea of using language to examine viruses started with an analogy.
Language contains both grammar and semantics. Changing a single word can alter a sentence’s meaning to the point that a listener can no longer comprehend it, all while keeping the grammar intact. Both involve their interaction with our immune system. This trait, dubbed “virulence,” needs to stay semi-consistent so that the virus can maintain itself inside a host.
The spike proteins are necessary for the virus to “talk” to our cells, allowing the virus to enter. But it’s the viral genes that dictate the shape of the spike proteins. In other words, if changes to the viral genes also alter spike proteins, these mutations would change the virus’s interaction with our cells and immune system. In order to survive, any given virus needs to follow its own system of ‘grammar’.
Break the grammar with too many mutations, or mutations in critical spots, and the virus will no longer be able to enter a cell and replicate, and will reach an evolutionary dead end. Yet grammar is just half of comprehension. Imagine the virus as a speaker, and our immune system as a listener. Yet because the virus’s grammar remains, it’s free to replicate and cause havoc, hidden away from the immune system’s defenses.
In other words, if a mutation allows a virus to keep its grammar but changes its semantics, it also allows viral escape.
In recent years, AI has gotten extremely efficient at modeling both grammar and semantics in human language, without any prior knowledge or understanding of the content. Take GPT-3 by OpenAI, which produces startling human-like prose that’s both grammatically correct and stays mostly on topic. Even without prior training, an NLP algorithm is capable of grasping patterns in human language. Take “grammar,” for example, or sequences in a viral genome that enable its entry into a cell.
If the viral genome is treated as a language, an NLP algorithm could begin grasping sequences related to a virus’s infectiousness, without needing any previous knowledge of microbiology. A similar idea works for viral semantics. It’s possible to systematically change one viral genetic letter at a time. Using the language example, swapping “cat” for “feline” is a tiny change.
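To make the letter-swapping idea concrete, here is a minimal Python sketch that generates every one-letter variant of a short sequence and measures how far each variant lands from the original. The 20-letter amino acid alphabet, the fragment-counting `toy_embedding`, and the example sequence are all assumptions for illustration; a real system would use representations learned from many viral sequences rather than simple counts.

```python
from collections import Counter

# Assumed alphabet: the 20 standard amino acid letters.
# Swap in "ACGT" or "ACGU" to work with nucleotide letters instead.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(seq, alphabet=AMINO_ACIDS):
    """Yield (position, old_letter, new_letter, mutant) for every one-letter change."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield i, old, new, seq[:i] + new + seq[i + 1:]


def toy_embedding(seq, k=2):
    """Placeholder embedding: counts of overlapping k-letter fragments.

    This stands in for a learned language-model embedding.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def l1_distance(a, b):
    """Sum of absolute differences between two count vectors."""
    return sum(abs(a[key] - b[key]) for key in set(a) | set(b))


def score_mutants(seq):
    """Return (mutation_label, distance) pairs, largest change first."""
    reference = toy_embedding(seq)
    scores = []
    for pos, old, new, mutant in single_substitutions(seq):
        label = f"{old}{pos + 1}{new}"  # e.g. "K2A": K at position 2 becomes A
        scores.append((label, l1_distance(reference, toy_embedding(mutant))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-letter sequence, used only to exercise the functions.
scores = score_mutants("MKT")
```

With this toy embedding, a change in the middle of the sequence disturbs two fragments and so scores higher than a change at either end, which is an artifact of the placeholder rather than a biological claim.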
The degree of these alterations is captured by a number, rather than intuition, and allows the algorithm to judge how far a virus has strayed from its original form. But based solely on the “language” of the virus, it replicated previous lab results of sequences that led to influenza escape. Further tapping into the language analogy, it’s possible that some people comprehend the same sentence differently based on their history, culture, and experience. It will be interesting to see whether the proposed approach can be adapted to provide a ‘personalized’ view of the language of virus evolution,