Data scientists at the Icahn School of Medicine at Mount Sinai in New York and colleagues have created an artificial intelligence model that may more accurately predict which existing medicines, not currently classified as harmful, could in fact lead to congenital disabilities.
The model, or “knowledge graph,” described in the July 17 issue of the Nature journal Communications Medicine [DOI: 10.1038/s43856-023-00329-2], also has the potential to predict which preclinical compounds may harm the developing fetus. The study is believed to be the first of its kind to use knowledge graphs to integrate various data types to investigate the causes of congenital disabilities.
Birth defects are abnormalities that affect about 1 in 33 births in the United States. They can be functional or structural and are believed to result from various factors, including genetics. However, the causes of most of these disabilities remain unknown. Certain substances found in medicines, cosmetics, food, and environmental pollutants can potentially lead to birth defects if exposure occurs during pregnancy.
“We wanted to improve our understanding of reproductive health and fetal development, and importantly, warn about the potential of new drugs to cause birth defects before these drugs are widely marketed and distributed. Although identifying the underlying causes is a complicated task, we offer hope that through complex data analysis like this that integrates evidence from multiple sources, we will be able, in some cases, to better predict, regulate, and protect against the significant harm that congenital disabilities could cause,” says Avi Ma’ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at Icahn Mount Sinai, and senior author of the paper.
The researchers gathered knowledge from several datasets on birth-defect associations noted in published work, including those produced by NIH Common Fund programs, to demonstrate how integrating data from these resources can lead to synergistic discoveries. In particular, the combined data cover the known genetics of reproductive health, the classification of medicines based on their risk during pregnancy, and how drugs and preclinical compounds affect the biological mechanisms inside human cells.
Specifically, the data included studies on genetic associations, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecule drugs.
Importantly, using the knowledge graph, called ReproTox-KG, together with semi-supervised learning (SSL), the research team prioritized 30,000 preclinical small molecule drugs for their potential to cross the placenta and induce birth defects. SSL is a branch of machine learning that uses a small amount of labeled data to guide predictions for much larger unlabeled data. In addition, by analyzing the topology of ReproTox-KG, the team identified more than 500 birth-defect/gene/drug cliques that could explain the molecular mechanisms underlying drug-induced birth defects. In graph theory terms, a clique is a subset of a graph's nodes in which every node is directly connected to every other node in the subset.
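To make these two ideas concrete, here is a minimal Python sketch on a toy graph. It is not the study's method or data: the node names, the edges, the two seed labels, and the simple neighbor-averaging scheme are all assumptions chosen for illustration. It first lists birth-defect/gene/drug cliques of size three, then spreads a few known labels to unlabeled nodes, which is one basic form of semi-supervised learning on a graph.

```python
# Toy edges: every name below is hypothetical and not taken from ReproTox-KG.
EDGES = [
    ("defect:D1", "gene:G1"),
    ("gene:G1", "drug:X"),
    ("drug:X", "defect:D1"),
    ("gene:G1", "drug:Y"),
    ("defect:D1", "gene:G2"),
    ("gene:G2", "drug:Z"),
    ("gene:G3", "drug:W"),
    ("gene:G3", "drug:Z"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def kind(node):
    """Return the node type encoded before the colon, e.g. 'drug'."""
    return node.split(":", 1)[0]


def find_triads(adj):
    """List (defect, gene, drug) triples whose three nodes are pairwise linked."""
    defects = sorted(n for n in adj if kind(n) == "defect")
    genes = sorted(n for n in adj if kind(n) == "gene")
    drugs = sorted(n for n in adj if kind(n) == "drug")
    return [
        (d, g, x)
        for d in defects
        for g in genes
        for x in drugs
        if g in adj[d] and x in adj[g] and d in adj[x]
    ]


def propagate(adj, seeds, rounds=200):
    """Spread seed labels: each unlabeled node takes the mean of its neighbors.

    Seed nodes stay fixed at their known label; all others start at 0.5.
    """
    scores = {n: seeds.get(n, 0.5) for n in adj}
    for _ in range(rounds):
        scores = {
            n: seeds[n] if n in seeds
            else sum(scores[m] for m in nbrs) / len(nbrs)
            for n, nbrs in adj.items()
        }
    return scores


adj = build_adjacency(EDGES)
triads = find_triads(adj)

# Assumed labels: drug:X is treated as known harmful (1.0), drug:W as known safe (0.0).
seeds = {"drug:X": 1.0, "drug:W": 0.0}
scores = propagate(adj, seeds)

# Rank the unlabeled drugs by propagated score, highest first.
ranked = sorted(
    (n for n in adj if kind(n) == "drug" and n not in seeds),
    key=lambda n: scores[n],
    reverse=True,
)
print(triads)
print(ranked)
```

In this toy graph the only fully connected triple is defect D1, gene G1, and drug X. Among the unlabeled drugs, Y ranks above Z because Y sits closer to the drug labeled harmful, while Z is also linked, through gene G3, to the drug labeled safe.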
The investigators caution that the study’s findings are preliminary and that further experiments are needed for validation.
Next, the investigators plan to use a similar graph-based approach for other projects focusing on the relationship between genes, drugs, and diseases. They also aim to use the processed dataset as training materials for courses and workshops on bioinformatics analysis. In addition, they plan to extend the study to consider more complex data, such as gene expression from specific tissues and cell types collected at multiple stages of development.
“We hope that our collaborative work will lead to a new global framework to assess potential toxicity for new drugs and explain the biological mechanisms by which some drugs, known to cause birth defects, may operate. It’s possible that at some point in the future, regulatory agencies such as the U.S. Food and Drug Administration and the U.S. Environmental Protection Agency may use this approach to evaluate the risk of new drugs or other chemical applications,” says Dr. Ma’ayan.