Non-Invasive Diagnosis by Mass Spectrometry and Machine Learning

Zhenpeng Zhou, Vishnu Shankar

Our current work focuses on the non-invasive diagnosis of metabolic conditions from breath and sweat, using desorption electrospray ionization (DESI) mass spectrometry combined with artificial intelligence (AI) techniques.

Mass spectrometry is a simple, fast, and cost-effective method for analyzing chemical species by ionizing a sample and sorting the resulting ions by their mass-to-charge (m/z) ratios. DESI is a more recently developed ionization method in which a spray of charged droplets impacts a solid sample on a substrate and creates a thin liquid film. Subsequent incoming droplets splash against this film and eject microdroplets, which are then drawn into the mass spectrometer.

Compared to other chemical methods, DESI offers several advantages that make it well suited to analyzing sweat and breath for diagnostic purposes. For example, DESI allows samples to be examined with minimal preparation, which offers much potential for rapid, routine, real-time analysis. Additionally, because AI methods need large amounts of data for training, testing, and validation, the ability of DESI to capture a rich spectrum of molecular species makes it a natural complement to AI-based analysis. Beyond this complementarity, AI, specifically machine learning and statistical pattern recognition, is an appropriate technique for medical diagnostics because it can offer improved sensitivity, specificity, and objectivity compared to human clinical practice.
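Since sensitivity and specificity are the figures of merit named above, a minimal sketch of how they are computed from predicted and true labels may be useful. The labels below are hypothetical and are not data from this work; 1 denotes "condition present" and 0 denotes "condition absent".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = condition present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: fraction of affected subjects that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of unaffected subjects that are correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for eight subjects (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```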

As a proof of concept, we applied DESI to latent fingerprints to obtain not only spatial patterns but also chemical maps. Samples with lipid compositions similar to those of fingerprints were collected by swiping a glass slide across the foreheads of consenting adults. A gradient boosting decision tree (GBDT) machine learning model applied to these samples allowed us to distinguish gender, ethnicity, and age (within 10 years). The results from 194 samples showed accuracies of 89.2%, 82.4%, and 84.3%, respectively. The chemical species singled out by GBDT feature selection were identified by tandem mass spectrometry. The model trained on the sample data was then applied to overlaid latent fingerprints from different individuals and gave accurate gender and ethnicity information for those fingerprints (Figure 1). The results suggest that DESI mass spectrometry imaging (DESI-MSI) of fingerprints with GBDT analysis might offer a significant advance in forensic science.
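To illustrate the idea behind GBDT classification and its built-in feature selection, the following is a simplified sketch and not the implementation used in this work. It assumes depth-1 trees (stumps), a logistic loss, and synthetic data in which each feature stands for the intensity in a hypothetical m/z bin; all names and parameter values are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (squared error)."""
    best = None  # (error, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:]


def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3):
    """Boost stumps on the negative gradient of the log-loss (y - p)."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1.0 - pos))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= thr else rv)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps


def predict(model, row):
    f0, lr, stumps = model
    score = f0 + sum(lr * (lv if row[j] <= thr else rv) for j, thr, lv, rv in stumps)
    return 1 if score > 0 else 0


def split_counts(model, n_features):
    """Count how often each feature is chosen for a split (a crude importance measure)."""
    counts = [0] * n_features
    for j, _thr, _lv, _rv in model[2]:
        counts[j] += 1
    return counts


# Synthetic data: 80 samples, 5 hypothetical m/z bins; only bin 2 carries class signal.
random.seed(0)
X, y = [], []
for i in range(80):
    label = i % 2
    row = [random.random() for _ in range(5)]
    row[2] = random.gauss(1.0 if label else 0.0, 0.2)
    X.append(row)
    y.append(label)

model = fit_gbdt(X, y)
accuracy = sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)
counts = split_counts(model, 5)
print("training accuracy:", accuracy)
print("splits per feature:", counts)
```

In this toy setting the informative bin is chosen for most splits, which mirrors how the features that a GBDT relies on can point to candidate chemical species for follow-up identification. The accuracy printed here is a training accuracy on synthetic data; a real evaluation would use held-out samples.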

Figure 1. Classification result for each pixel in the image, as predicted by the pretrained model. Pixels predicted to belong to a Chinese male are shown in blue, and pixels predicted to belong to an Indian female are shown in red. In both cases, the predictions were correct.
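The per-pixel labeling summarized in the caption can be sketched as follows. This is a minimal illustration under stated assumptions: the image is represented as a grid in which each pixel holds a short spectrum, and `toy_classifier` is a hypothetical stand-in for a pretrained model, not the model used in this work.

```python
from collections import Counter


def classify_pixels(image, classify):
    """Apply a per-spectrum classifier to every pixel; return a 2-D grid of labels."""
    return [[classify(spectrum) for spectrum in row] for row in image]


def label_fractions(label_map):
    """Return the fraction of pixels assigned to each label."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def toy_classifier(spectrum):
    # Hypothetical rule: threshold the intensity in the first bin.
    return "class_A" if spectrum[0] > 0.5 else "class_B"


# Hypothetical 2 x 3 image; each pixel holds a two-bin spectrum.
image = [
    [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7]],
    [[0.7, 0.3], [0.1, 0.9], [0.3, 0.6]],
]
label_map = classify_pixels(image, toy_classifier)
fractions = label_fractions(label_map)
print(label_map)
print(fractions)
```

Mapping each label to a color then yields a class map of the kind shown in the figure, in which regions belonging to different overlaid fingerprints appear in different colors.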