TxGNN improves drug repurposing by predicting treatments for rare diseases for which there are no approved therapies

Researchers have developed TxGNN, an AI-powered model that outperforms existing methods at predicting treatments for diseases with no approved therapies and uses multi-hop explanations to provide greater transparency and trust.

Study: A foundation model for clinician-centered drug repurposing. Image source: unoL / Shutterstock

A study recently published in the journal Nature Medicine developed TxGNN, a graph-based foundation model for zero-shot drug repurposing. Only 5% to 7% of rare diseases have approved drugs. Expanding the use of existing drugs to new indications can help reduce the global burden of disease. Drug repurposing leverages existing safety and efficacy data, enabling faster clinical translation and lower development costs.

Predicting the effectiveness of drugs against all diseases can make it possible to select drugs with fewer side effects, develop more effective treatments for multiple targets in a disease progression, and repurpose available drugs for new therapeutic uses.

Analyzing medical knowledge graphs (KGs) makes it possible to match the effects of existing drugs to new indications. Computational methods have identified repurposing candidates, but they face two significant challenges. First, these approaches assume that the diseases for which therapeutic predictions are needed already have some existing drugs.

Second, most models identify drugs based on their similarity to existing treatments, which leaves out diseases that have no treatment options. For clinical use, machine learning models need to make zero-shot predictions, that is, predict drugs for diseases with limited molecular understanding and no approved drugs. However, existing models perform markedly worse at this task.

TxGNN closes this gap with a zero-shot approach to drug repurposing, using a graph neural network (GNN) and a dedicated disease similarity-based metric learning module to transfer knowledge from treatable diseases to those without treatments.

The study and results

In the present study, researchers developed TxGNN, a graph foundation model for zero-shot drug repurposing that predicts repurposing candidates, including for diseases that currently lack treatments. TxGNN consists of 1) a graph neural network (GNN)-based encoder, 2) a disease similarity-based metric learning decoder, 3) cross-relationship stochastic pre-training followed by fine-tuning, and 4) a multi-hop graph explanation module.
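The idea behind the disease similarity-based component can be illustrated with a small sketch. This is not the authors' implementation: the cosine similarity, the `k` and `mix` parameters, the function names, and the toy vectors are all assumptions made for illustration. It only shows how a disease with little data might borrow representation from its most similar, better-characterized neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment_embedding(target, signatures, embeddings, k=2, mix=0.5):
    """Blend the target disease's embedding with a similarity-weighted
    average of the embeddings of its k most similar diseases.

    signatures: disease -> feature vector used only to measure similarity
    embeddings: disease -> learned embedding (hypothetical values here)
    mix: share of the final embedding borrowed from similar diseases (assumed)
    """
    sims = [
        (cosine(signatures[target], signatures[d]), d)
        for d in embeddings
        if d != target
    ]
    # Keep only positively similar diseases, most similar first.
    top = sorted((s, d) for s, d in sims if s > 0)[::-1][:k]
    total = sum(s for s, _ in top)
    own = embeddings[target]
    if total == 0:
        return list(own)  # nothing similar enough to borrow from
    borrowed = [
        sum(s * embeddings[d][i] for s, d in top) / total
        for i in range(len(own))
    ]
    return [(1 - mix) * o + mix * b for o, b in zip(own, borrowed)]


# Hypothetical toy data: binary signatures and 2-dimensional embeddings.
signatures = {
    "rare": [1, 1, 0, 0],
    "common_a": [1, 1, 0, 1],
    "common_b": [0, 0, 1, 1],
}
embeddings = {
    "rare": [0.0, 0.0],
    "common_a": [1.0, 0.0],
    "common_b": [0.0, 1.0],
}
augmented = augment_embedding("rare", signatures, embeddings, k=2, mix=0.5)
```

In this toy case only `common_a` shares features with `rare`, so the augmented embedding moves halfway toward `common_a` and ignores `common_b`.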

TxGNN was trained on a medical KG that collates decades of research across 17,080 diseases. In addition, a multi-hop TxGNN explainer was developed to facilitate the interpretation of drug candidates by linking drug-disease pairs through interpretable medical knowledge pathways. This explainer provides human experts with transparent, multi-hop explanations that increase trust in AI-generated predictions.
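To make "multi-hop" concrete, the sketch below enumerates short paths between a drug and a disease in a toy knowledge graph. It is not the TxGNN explainer's algorithm, which the article does not describe. The graph, node names, and relation names are hypothetical placeholders. It only illustrates what a pathway connecting a drug-disease pair through intermediate nodes looks like.

```python
def multi_hop_paths(graph, source, target, max_hops=3):
    """List simple paths from source to target that use at most max_hops edges.

    graph maps a node to a list of (relation, neighbor) pairs. Each returned
    path alternates node, relation, node, relation, node, ...
    """
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) // 2 >= max_hops:  # number of edges used so far
            continue
        visited = set(path[::2])  # nodes sit at even positions in the path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                stack.append(path + [relation, neighbor])
    return paths


# Hypothetical toy knowledge graph.
toy_kg = {
    "drug_x": [("targets", "gene_g"), ("indicated_for", "disease_z")],
    "gene_g": [("associated_with", "disease_y")],
    "disease_z": [("shares_phenotype_with", "disease_y")],
}
paths = multi_hop_paths(toy_kg, "drug_x", "disease_y", max_hops=2)
```

Each returned path can be read as a chain of statements, for example "drug_x targets gene_g, which is associated with disease_y", which is the kind of pathway a human expert can check against medical knowledge.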

Model performance was evaluated on different holdout datasets. A holdout dataset was generated by sampling diseases from the KG and omitting them during training so they could be used as test cases later. These held-out diseases were selected either randomly or deliberately to assess zero-shot prediction.
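A random disease-level holdout of this kind might look like the following sketch. The triple format, the function name, and the `holdout_fraction` and `seed` parameters are assumptions for illustration, and the study's actual splitting procedure may differ in its details.

```python
import random


def zero_shot_split(edges, holdout_fraction=0.2, seed=0):
    """Split (drug, relation, disease) triples for zero-shot evaluation.

    Every edge touching a held-out disease goes to the test set, so the
    training set contains no treatment information for those diseases.
    """
    diseases = sorted({disease for _, _, disease in edges})
    rng = random.Random(seed)
    n_holdout = max(1, int(len(diseases) * holdout_fraction))
    holdout = set(rng.sample(diseases, n_holdout))
    train = [e for e in edges if e[2] not in holdout]
    test = [e for e in edges if e[2] in holdout]
    return train, test, holdout


# Hypothetical toy edges; real KG triples would replace these.
edges = [
    ("drug_a", "indication", "disease_1"),
    ("drug_b", "contraindication", "disease_1"),
    ("drug_a", "indication", "disease_2"),
    ("drug_c", "indication", "disease_3"),
    ("drug_d", "indication", "disease_4"),
]
train_edges, test_edges, holdout_diseases = zero_shot_split(
    edges, holdout_fraction=0.5, seed=42
)
```

Splitting by disease rather than by individual edge is what makes the evaluation zero-shot: a model scored on `test_edges` has never seen an indication or contraindication for those diseases.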

TxGNN was compared with eight state-of-the-art methods, including a natural language processing model, BioBERT, GNN methods such as HGT and HAN, and statistical network medicine techniques. Under the standard benchmarking strategy, where diseases in the test set already had some indications or contraindications during training, TxGNN outperformed the strongest method, HAN, by 4.3% in the Area Under Precision-Recall Curve (AUPRC) for indications.
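For readers unfamiliar with the metric, AUPRC summarizes how well a ranked list of predictions places true drug-disease pairs near the top. The sketch below uses average precision, one common estimator of AUPRC. Whether the study used this exact estimator is an assumption, and the labels and scores are made-up toy values.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: 1 for a true drug-disease pair, 0 otherwise
    scores: model scores, higher meaning more likely to be a true pair
    Tied scores are not treated specially in this sketch.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / total_positives


# Toy example: the two true pairs are ranked 1st and 3rd out of 5.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auprc = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

Because true indications are rare among all possible drug-disease pairs, a precision-based summary such as this is more informative than plain accuracy.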

Next, the team evaluated the models in a zero-shot repurposing setting, which requires models to predict therapeutic candidates for diseases that have no treatments. Here, TxGNN showed a 49.2% increase in AUPRC for drug indications and a 35.1% increase for contraindications compared to the next best model.

These gains are particularly significant because traditional models struggle in zero-shot settings, where no prior drug-disease relationships are available for training. TxGNN was also evaluated under stringent conditions in nine disease areas, where it consistently improved on existing models, with AUPRC gains ranging from 0.5% to 59.3% for drug indications and 11.8% to 35.6% for contraindications.

In addition, a pilot study was conducted with scientists and clinicians. Participants included two pharmacists, five clinicians, and five clinical researchers. They were asked to evaluate 16 TxGNN predictions, 12 of which were correct.

Participants' exploration time, rating accuracy, and confidence scores for each prediction were recorded. Their confidence and accuracy improved significantly when predictions were accompanied by explanations. Additionally, in post-task interviews and questionnaires, participants reported greater satisfaction with the TxGNN explainer, with 91.6% of participants agreeing that TxGNN predictions and explanations were valuable. In contrast, 75% disagreed with relying on TxGNN predictions without explanations.

Next, the team evaluated whether the predicted drugs and their explanations were consistent with the medical rationale for the following rare diseases: Kleefstra syndrome, Ehlers-Danlos syndrome, and nephrogenic syndrome of inappropriate antidiuresis (NSIAD).

This assessment protocol included three phases. First, a human expert queried TxGNN to identify potential repurposing candidates. Next, the TxGNN explainer was queried to show why each drug was being considered. In the third phase, independent medical evidence was analyzed to verify TxGNN's predictions and explanations.

The model identified zolpidem, tretinoin, and amyl nitrite for Kleefstra syndrome, Ehlers-Danlos syndrome, and NSIAD, respectively. In all cases, the TxGNN explanations were consistent with medical evidence.

Real-world validation through EMRs

Using electronic medical records (EMRs) from a health system, researchers curated a cohort of over 1.2 million adults with at least one drug prescription and one disease diagnosis and measured the enrichment of drug and disease co-occurrence. This validation compares TxGNN's predictions with actual clinical practice.

Enrichment was estimated as the ratio of the odds of using a drug for one disease to the odds of using it for other diseases. In total, 619,200 log odds ratio [log(OR)] values were derived. TxGNN compiled a ranked list of therapeutic candidates for each EMR-phenotyped disease.
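A log odds ratio for a single drug-disease pair can be computed from a two-by-two table of patient counts, as in the sketch below. The counts are invented for illustration, and the continuity correction is an assumption added to avoid division by zero. The article does not say how the study handled empty cells.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio for drug-disease co-occurrence from a 2x2 table.

    a: patients with the disease who were prescribed the drug
    b: patients with the disease who were not prescribed the drug
    c: patients without the disease who were prescribed the drug
    d: patients without the disease who were not prescribed the drug
    correction: value added to every cell (an assumed continuity
        correction; pass 0 to use the raw counts)
    """
    a, b, c, d = (count + correction for count in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Made-up counts: odds with the disease are 30/70, odds without are 100/9800.
example_log_or = log_odds_ratio(30, 70, 100, 9800, correction=0)  # log(42)
```

A positive log(OR) means the drug is prescribed more often to patients with the disease than to other patients, and a value near zero means no enrichment.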

Drugs already linked to the disease were omitted, and the remaining new drug candidates were grouped into “top 5”, “top 5%”, and “bottom 50%” categories. The highest-scoring predicted drugs had, on average, about 107% higher log(OR) values than the mean log(OR) of the bottom 50% of predictions, suggesting that TxGNN's predictions align well with off-label prescriptions made by physicians.

Conclusions

Taken together, the study developed TxGNN for zero-shot drug repurposing, specifically targeting diseases for which limited data and therapeutic options are available. TxGNN consistently outperforms existing methods and provides multi-hop interpretable explanations for its predictions, increasing confidence and usability in clinical workflows. Furthermore, the predicted drugs agree with the medical consensus of human experts and with off-label prescription rates in EMRs.

TxGNN's interpretable multi-hop explanations provide new levels of transparency, promote trust, and improve model integration into clinical workflows.