Searching for novel viruses in environmental samples is a lot like searching for a needle in a haystack, but harder. After all, with hay, you can pick up a straw, look it over, quickly judge whether it is hay or a needle, and move on to the next piece. With viruses, researchers are usually forced to extract genetic material from a sample, sequence it, and then align those sequences against a database of known sequences. Aligning genes is difficult and laborious, and for various biological and technical reasons, it isn’t always possible.
But can we skip the alignment step? In a study just released by mBio, researchers at Columbia University describe a new way to make the search for novel organisms easier by using a frequency analysis technique on sequence data.
“It's an interesting new approach to finding meaning in sequences that might otherwise be missed,” says Ian Lipkin, the John Snow Professor of Epidemiology, and Professor of Neurology and Pathology at Columbia University and a member of mBio’s Board of Editors.
The method is based on obtaining a signature of the genetic data and defining a distance measure that quantifies how similar two signatures are, thereby avoiding the pitfalls and labor involved in gene alignment. Trifonov and Rabadan applied the technique to categorize negative-sense ssRNA viral genetic data and showed that it provides a viable way of discovering genetic relationships. Although the study applied the technique to one specific type of virus, the authors expect it could also be applied to other kinds of viruses in the search for the underlying causes of idiopathic disease.
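To make the idea of a signature and a distance concrete, here is a minimal sketch in Python. It is not the authors' method: the article does not say how the signature or the distance is defined, so this example assumes the signature is a vector of normalized k-mer (short subsequence) frequencies and the distance is plain Euclidean distance. The function names `kmer_signature` and `signature_distance` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=3):
    """Return normalized k-mer frequencies over the alphabet ACGT.

    Assumption: the 'signature' is a k-mer frequency vector. RNA input is
    accepted by mapping U to T. Windows containing other characters
    (for example N) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers of length %d" % k)
    return [counts[km] / total for km in kmers]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    a = kmer_signature("ACGUACGUACGUACGU")
    b = kmer_signature("ACGUACGAACGUACGU")
    c = kmer_signature("GGGGCCCCGGGGCCCC")
    print(signature_distance(a, b))  # near-identical sequences
    print(signature_distance(a, c))  # very different composition
```

Because each sequence is reduced to a fixed-length vector, any two sequences can be compared directly without aligning them, which is the property the article highlights. Other choices of signature or distance would fit the same pattern.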