(CN) — Hundreds of mammalian species remain unclassified by scientists, many of them hiding in plain sight by blending in with closely related species, though experts may be closing in on where to find some of them.
Most of these undescribed mammal species are exceptionally small, many weighing only a few grams, which makes it hard for experts to tell them apart without genetic analysis. The gap between the number of species on Earth and the number that have been formally described is known as the Linnean shortfall, which experts say hinders progress across the biological sciences.
Researchers from Ohio State University describe the problem and their approach to solving it in a new study published Monday in the journal PNAS. The team paired a powerful supercomputer with a novel machine learning algorithm to determine where these unclassified species are most likely to be hiding.
"The major takeaway of the study is that there are still a lot of species left to be discovered in mammals," said co-author Dr. Bryan Carstens, a professor of evolution, ecology and organismal biology at Ohio State University. "We in the field recognize that mammals are very well studied by scientists in comparison to most other species (after all, we are mammals). The finding that there are still a lot of undescribed species in mammals is sobering, because we know that there are groups like beetles and mites where most of the species are not currently known to science."
The authors analyzed 90,759 DNA sequences from 4,310 recognized mammal species to build a predictive model that can identify and characterize the taxa most likely to contain undescribed species. They also compiled about 3.3 million GPS coordinates from recorded animal sightings, which they used to gather additional taxonomic, climate, environmental and other relevant data to improve the model.
To train their model, the authors used random forest analysis, a machine learning method that combines the predictions of many decision trees, each built from a random sample of the data. They used 80% of the dataset to train the model and held back the remaining 20% to test the accuracy of its predictions.
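For readers curious how a random forest with an 80/20 holdout works in general, here is a minimal from-scratch sketch in Python. It is not the authors' code or data: the toy records, the four features (body mass, range size, latitude, climate variability) and the labeling rule are all invented for illustration, and helper names such as `make_toy_data` and `train_forest` are hypothetical. Each tree is grown on a bootstrap sample and considers a random subset of features at each split; the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) among `features` that most reduces Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def build_tree(rows, labels, n_features, max_depth, rng):
    """Grow a depth-limited tree; each split looks at a random subset of features."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: predict the most common label
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], n_features, max_depth - 1, rng),
            build_tree([r for r, _ in right], [y for _, y in right], n_features, max_depth - 1, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, max_depth=4, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def make_toy_data(n=200, seed=1):
    """Hypothetical records: (body_mass_g, range_size, abs_latitude, climate_variability).
    Invented rule: label 1 ("may hide undescribed species") if small and in a variable climate."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        mass, size = rng.uniform(1, 100), rng.uniform(0, 1000)
        lat, var = rng.uniform(0, 60), rng.uniform(0, 1)
        rows.append((mass, size, lat, var))
        labels.append(1 if mass < 50 and var > 0.5 else 0)
    return rows, labels

# 80/20 split: train on 80% of the shuffled records, score on the held-out 20%.
rows, labels = make_toy_data()
order = list(range(len(rows)))
random.Random(2).shuffle(order)
cut = int(0.8 * len(order))
train_idx, test_idx = order[:cut], order[cut:]
forest = train_forest([rows[i] for i in train_idx], [labels[i] for i in train_idx])
accuracy = sum(predict_forest(forest, rows[i]) == labels[i] for i in test_idx) / len(test_idx)
print(f"held-out accuracy: {accuracy:.2f}")
```

Holding back the 20% test set matters because a model scored on the same records it was trained on can look far more accurate than it really is. Real studies typically rely on an established library implementation rather than hand-rolled trees like these.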
Their analysis points to hundreds of currently unknown mammal species, mostly among groups of small animals such as bats and rodents, and offers valuable new insight into the environments in which these animals live. Most of these hidden species are predicted to be small-bodied and to occupy extensive geographic ranges in lower-latitude regions with highly variable temperatures, rainfall and humidity. These regions are also predicted to face great environmental risk due to their insular nature.
"We were surprised that our predictive model uncovered clues that suggest that taxonomists are generally aware of the types of named mammal species that are likely to be good candidates for containing hidden species within them," Carstens explained. "This suggests to us that the reason that we haven't described more of the species isn't due to inadequacies in terms of the data or the methods that we use to describe species, but just a lack of people who are actually doing this work."
Species-level taxonomic designations are important to anyone who studies or legislates on wildlife – from conservationists and biologists, to the politicians tasked with turning scientists’ recommendations into laws. Before a species can be protected, it must first be studied and understood, and that requires distinguishing it from similar-looking species.
Closing the gap between the number of known species and the number that actually exist will require more funding for taxonomic research, especially for understudied and undescribed taxa facing extinction, according to the authors. Their research shows that many of these undescribed species are hiding in predictable places, and that scientists simply need the resources to formally describe them.
"My lab has been investigating the theory and practice of species delimitation and identification since it was founded in 2007," Carstens said. "We'll keep developing new statistical approaches for species discovery and delimitation with genetic data, and will keep doing these large-scale analyses because they are interesting and also really good mechanisms for student training."