(CN) — Researchers using artificial intelligence to grade decades of conservation efforts have determined we’re getting better at reintroducing once-endangered species to the wild.
In their study published Thursday in the journal Patterns, the researchers analyzed the abstracts of more than 4,000 studies of species reintroduction across four decades and found that we’re generally improving in our conservation efforts. The authors hope that machine learning could be used in this field, as well as others, to discover the best techniques and solutions from the ever-growing plethora of scientific research.
“We wanted to learn some lessons from the vast body of conservation biology literature on reintroduction programs that we could use here in California as we try to put sea otters back into places they haven’t roamed for decades,” said senior author Kyle Van Houtan, chief scientist at Monterey Bay Aquarium in California. “But what sat in front of us was millions of words and thousands of manuscripts. We wondered how we could extract data from them that we could actually analyze, and so we turned to natural language processing.”
Natural language processing is a branch of machine learning that analyzes long strings of human language to extract usable information. In effect, it lets a computer read documents much as a human would. Sentiment analysis, the specific technique the researchers used in this paper, is narrower: it relies on a set of words that have each been assigned a positive or negative emotional value and uses them to assess the overall positivity or negativity of a text.
Using the database Web of Science, the researchers identified 4,313 species reintroduction studies with searchable abstracts published between 1987 and 2016. They then applied “off-the-shelf” sentiment analysis lexicons, in which the words had already been assigned a positive or negative score based on sources such as movie and restaurant reviews. From this, they built a model that could give each abstract an overall score.
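To make the idea of lexicon-based scoring concrete, here is a minimal sketch in Python. It is not the researchers' model. The word list borrows terms the researchers cite as common indicators of sentiment, but the +1 and -1 values, the `LEXICON` name and the `score_abstract` function are illustrative assumptions.

```python
import re

# Hypothetical toy lexicon. The +1/-1 values are assumptions for illustration,
# not the scores used in the off-the-shelf lexicons from the study.
LEXICON = {
    "success": 1, "protect": 1, "growth": 1,
    "support": 1, "help": 1, "benefit": 1,
    "threaten": -1, "loss": -1, "risk": -1,
    "threat": -1, "problem": -1, "kill": -1,
}

def score_abstract(text, lexicon=LEXICON):
    """Return the mean lexicon value of the scored words found in the text.

    Words missing from the lexicon are ignored. A text with no scored
    words gets a neutral score of 0.0.
    """
    words = re.findall(r"[a-z]+", text.lower())
    values = [lexicon[w] for w in words if w in lexicon]
    return sum(values) / len(values) if values else 0.0

# Made-up example sentence, not taken from any real abstract.
example = "The reintroduction was a success and will help growth despite the risk of loss."
print(score_abstract(example))
```

The example sentence contains three positive words and two negative ones, so its score is slightly positive. A real lexicon would hold thousands of words and would typically handle word variants such as “threatened” or “successful,” which this sketch does not.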
“We didn’t have to train the models, so after running them for a few hours we all of a sudden had all these results at our disposal,” said Van Houtan. “The scores gave us a trend over time, and we could query the results to see what the sentiment was associated with studies on pandas or on California condors or coral reefs.”
The trends in the positive and negative scores suggested that conservation efforts have grown more successful over time. “Over time, there’s a lot less uncertainty in the assessment of sentiment in the studies, and we see reintroduction projects become more successful – and that’s a big takeaway,” Van Houtan continued. “Looking at thousands of studies, it seems like we’re getting better at it, and that’s encouraging.”
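One simple way to look for a trend of this kind is to average the abstract scores for each publication year and fit a straight line through the yearly averages. The Python sketch below assumes each abstract has already been reduced to a (year, score) pair. The function names and the sample numbers are invented for illustration and do not come from the study, which may have used a different method.

```python
from collections import defaultdict
from statistics import mean

def mean_score_by_year(records):
    """Group (year, score) pairs by year and return {year: mean score}, sorted by year."""
    by_year = defaultdict(list)
    for year, score in records:
        by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}

def linear_slope(points):
    """Least-squares slope of y against x for a dict {x: y} with at least two distinct x values."""
    xs, ys = list(points), list(points.values())
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Made-up scores for illustration only.
records = [(1990, -0.2), (1990, 0.0), (2000, 0.1), (2010, 0.4)]
yearly = mean_score_by_year(records)
print(yearly)
print(linear_slope(yearly))  # a positive slope means sentiment rose over time
```

A positive slope indicates that the average sentiment of abstracts rose over the period, which is the pattern the researchers describe. Checking whether uncertainty also shrank would need a measure of spread, such as the standard deviation of scores within each year.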
Study co-author Lucas Joppa, the chief environmental officer at Microsoft, added: “If we are going to maximize our conservation dollars, then we need to be able to quickly assess what works and what doesn’t. Machine learning, and natural language processing in particular, has the ability to sift through results and shine a light on success stories that others can learn from.”
To check that their approach and results were sound, the researchers looked at the words that were the most common indicators of positive sentiment, and therefore of conservation success. These included “success,” “protect,” “growth,” “support,” “help” and “benefit.” The most common words indicating negative sentiment, on the other hand, included “threaten,” “loss,” “risk,” “threat,” “problem” and “kill.”
The researchers note that these words aligned closely with the ones they, as long-time conservation biologists, would typically use to indicate success and failure in their own studies. They also checked the sentiment trends for specific reintroduction programs that are known successes or failures, such as the reintroduction of the California condor, and found that the trends matched the known outcomes.
The researchers said they were pleasantly surprised that off-the-shelf sentiment analysis worked so well for them. They suggested this was likely because many words used in conservation biology are part of our everyday vocabulary and were therefore already coded with the appropriate positive or negative sentiment.
They believe that for the program to work in other fields, more research is necessary to develop and train a model that could accurately code the sentiment value of more technical, field-specific language and syntax. Another roadblock they identified is the limited number of open-access papers, which meant they had to assess the abstracts rather than the full papers.
“We’re really just scratching the surface here, but this is definitely a step in the right direction,” said Van Houtan.
They believe this promising technique should be applied in conservation biology and in other fields to make sense of the vast amount of research that is now being conducted and published.
“So much local conservation work goes unnoticed by the global conservation community, and this paper shows how machine learning can help close that information gap,” said Joppa.
Van Houtan added: “Many of these techniques have been in use for over a decade in commercial settings, but we’re hoping to translate them into settings like ours to combat climate change or plastic pollution or to promote endangered species conservation. There’s a plethora of data that’s right at our fingertips, but it’s this sleeping giant because it isn’t properly curated or organized, which makes it challenging to analyze. We want to connect people with ideas, capacity, and technical solutions they might not otherwise encounter so we can bring some progress to these seemingly intractable problems.”