Fentanyl or Phony? Machine Learning Algorithm Learns to Pick Out Opioid Signatures
Newswise — New forms of fentanyl are created every day. For law enforcement, that poses a challenge: how do you identify a chemical you’ve never seen before?
Researchers at Lawrence Livermore National Laboratory (LLNL) aim to answer that question with a machine learning model that can distinguish opioids from other chemicals with an accuracy over 95% in a laboratory setting. The foundation for this new technique was published in Analytical Methods.
Today, to identify a synthetic opioid like fentanyl, chemists try to match its signature against a library of a few hundred known samples. But studies suggest there could be thousands of unknown forms, some more dangerous than others. Recognizing those new versions requires a reference-free identification system: a way to catch an opioid even if it does not yet exist in any chemical database.
“When law enforcement finds a new clandestine drug operation, those labs often produce never-before-seen fentanyl derivatives. We can’t just go check a database, and we can’t just go back to who made it and ask how they did it,” said LLNL computational mathematician and author Colin Ponce. “And law enforcement needs to identify the samples they find quickly because there’s going to be another sample tomorrow. I think that’s a little bit of a unique situation.”
Machine learning might seem like a natural fit to identify novel or unknown opioids. And it is — to an extent. The method works best with large data sets, which are difficult to generate for toxic substances like synthetic opioids.
To even get a machine learning algorithm off the ground, the team had to create the chemical data. They did so with LLNL’s mass spectrometry capabilities coupled to an autosampler, which enabled them to measure hundreds of samples under the same experimental conditions. This minimized variables for the machine learning algorithms.
“In the world of AI, data is gold, and if you don’t have good data, then you’re not going to generate accurate machine learning models,” said LLNL chemist and author Carolyn Fisher. “Good data is something that we can control and generate at LLNL.”
With that data in hand, they tried different machine learning techniques as they homed in on the best method: a random forest model.
“When a model like this eventually gets into the hands of a user, the output has to be interpretable and trustworthy,” said LLNL scientist and author Kourosh Arasteh. “We explored machine learning methods ranging from simple regression and random forests to more complex neural network approaches to balance interpretability with performance.”
The random forest approach runs through a collection of decision trees. Each tree asks a series of questions about the data and, based on each answer, lands on a prediction: opioid or not. Together, they vote on the final classification.
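The paper's code is not included in the article, but the general idea can be sketched in a few lines. The example below is purely illustrative, not the LLNL team's implementation: it uses randomly generated stand-in "spectra" and toy labels in place of real mass-spectrometry data, and trains a random forest whose trees each vote opioid or not.

```python
# Illustrative sketch only -- not the LLNL team's code or data.
# It shows the voting idea described above: many decision trees each
# classify a vector of spectral features, and the majority wins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for measured spectra: 650 samples x 200 intensity bins (toy data).
X = rng.random((650, 200))
y = rng.integers(0, 2, size=650)  # 1 = opioid, 0 = non-opioid (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Each tree is fit on a bootstrap sample of the training data;
# the forest's prediction is the majority vote across all trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```

With real spectra in place of the random arrays, the same structure also exposes which features the trees lean on most, which is part of why tree-based models are often considered more interpretable than neural networks.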
“Our 650 samples are not the same as having 300,000 samples. On the machine learning side, we needed to make sure that we were designing techniques that were appropriate for that kind of scale,” said Ponce.
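One common way to get a trustworthy performance estimate from a data set of this size is cross-validation, which averages accuracy over several different train/test splits rather than trusting a single one. The snippet below is a generic small-data practice offered as an assumption for illustration, not a description of the paper's exact evaluation protocol.

```python
# Small-data evaluation sketch (illustrative assumption, not the paper's protocol).
# With ~650 samples, one split can be noisy, so 5-fold cross-validation
# scores the model on five different held-out folds and averages the result.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.random((650, 200))        # toy spectra: 650 samples x 200 bins
y = rng.integers(0, 2, size=650)  # toy labels: 1 = opioid, 0 = not

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=1),
    X, y, cv=cv, scoring="accuracy",
)
print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```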
The study trained and tested the algorithm on analytically pure samples: idealized chemicals free of contaminants and impurities.
“The challenge is that nothing is analytically pure in the real world,” said Fisher. “The next step is to add in background noise and have the AI understand what it should care about during a classification task.”
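The article does not say how that background noise would be introduced. One generic way to simulate messier, real-world measurements, sketched here purely as an assumption, is to perturb clean spectra with a random baseline and measurement noise before retraining the classifier.

```python
# Illustrative augmentation sketch (an assumption, not the team's method):
# perturb clean "analytically pure" spectra so a classifier learns which
# features matter and which are just background.
import numpy as np

rng = np.random.default_rng(2)

def add_background(spectrum, noise_level=0.05, baseline_level=0.1):
    """Return a noisier copy of a clean spectrum (parameter values are toy choices)."""
    baseline = baseline_level * rng.random(spectrum.shape)      # random background offset
    noise = noise_level * rng.standard_normal(spectrum.shape)   # measurement noise
    return np.clip(spectrum + baseline + noise, 0.0, None)      # keep intensities non-negative

clean = rng.random(200)   # toy clean spectrum with 200 intensity bins
noisy = add_background(clean)
print(noisy[:5])
```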
Fisher and Ponce emphasized that this work would have been impossible without collaboration across the disciplines of data science and chemistry. The two are friends outside of work, and this study, a Laboratory Directed Research and Development project, emerged from a series of organic conversations between them.
“To me, this project really captures what LLNL does best,” said fellow author and LLNL software engineer Steven Magana-Zook. “When you get chemists and data scientists working side by side, you end up with results that neither group could get on their own. That kind of cross-disciplinary work is exactly what makes this place so strong.”
That approach, while essential to the work, initially proved to be an obstacle: two journals rejected the manuscript because reviewers with chemistry backgrounds didn't fully grasp the machine learning aspects, while experts on the computational side felt uncertain about the chemistry.
“I don’t think people talk about failure enough. It’s so common in science. We fail so much more than we succeed,” said Fisher. “But we keep iterating and improving. I’m proud of our resilience.”
The team’s persistence paid off. Looking ahead, they aim to further develop their algorithm using real-world samples with higher background signals.
Other LLNL coauthors include Roald Leif, Alex Vu, Mark Dreyer, Brian Mayer and Audrey Williams.