
Scientific Publications

29 publications, including peer-reviewed journal papers, conference proceedings, preprints, and a doctoral thesis, spanning AI for science, computational biology, drug discovery, and RNA design.

2026
RECOMB 2026
[2]
mRNA-GPT: A Novel Generative Model for mRNA Design

Li S., Chauvin P., Gross O., Bailey M., Jager S.

RECOMB 2026

Introduces a generative language model tailored to mRNA sequence design, enabling the de novo generation of optimised therapeutic mRNA molecules with improved stability and translational efficiency.

Under review
2025
JCompBio 2025
[3] Accepted
Deep Batch Active Learning for Protein Structure Modeling

Xue Z., Bailey M., Gupta A., Li R., Corrochano-Navarro A., Li S., Kogler-Anele L., Yu Q., Rommelaere H., Van Breedam W., Furtmann N., Batchelor J., Bar-Joseph Z., Jager S.

Journal of Computational Biology, 2025

Applies deep batch active learning to efficiently guide experimental protein structure determination, dramatically reducing the number of experiments needed to build accurate structural models.

In press
NeurIPS SimBioChem 2025
[4] Highlighted · Forwarded to Nature
Evaluating Time-Series Foundation Models as Zero-Shot Surrogates for Mechanistic Virtual Patients

Mulyadi A.W., Barker C.G., Kuttae S.B., Wehling L., Rückle T., Singh G., Boucher N., Abdessalem F., Jager S., Siokis A., et al.

EurIPS/NeurIPS SimBioChem 2025

Evaluates whether large time-series foundation models can serve as zero-shot surrogates for mechanistic virtual patient models in clinical pharmacology, avoiding costly model re-training. Highlighted and forwarded to Nature.

Read Paper
NeurIPS AI4Science 2025
[5]
BioMedReasoner: Towards Multi-Hop Reasoning using Path-based Relational Learning on Biomedical Knowledge Graphs

Mulyadi A.W., Wehling L., Kumar A., Boucher N., Abdessalem F., Jager S., Mosa M.H., et al.

NeurIPS AI4Science 2025

A multi-hop reasoning framework over biomedical knowledge graphs using path-based relational learning, enabling complex question answering across linked biological entities.

Read Paper
NeurIPS Negel 2025
[6]
Solvaformer: an SE(3)-equivariant Graph Transformer for Small Molecule Solubility Prediction

Broadbent J., Bailey M., Li M., Paul A., De Lescure L., Chauvin P., Kogler-Anele L., Jangjou Y., Jager S.

NeurIPS Negel 2025

An SE(3)-equivariant graph transformer that predicts aqueous solubility of small molecules with state-of-the-art accuracy by exploiting 3D molecular geometry while respecting rotational and translational symmetry.

Read Paper
ICML FM4LS 2025
[7] Selected Talk
NextGenPLM: A Novel Structure-Infused Foundational Protein Language Model for Antibody Discovery and Optimization

Gupta A., Li R., Leal C., Sevvana M., Mackness B.C., Casner R.G., Batchelor J.D., Park A., Bailey M., Kogler-Anele L., Jager S., et al.

ICML FM4LS 2025 — Selected Talk

A structure-aware protein language model that integrates 3D structural information into sequence embeddings, achieving superior performance on antibody discovery and optimisation benchmarks.

Read Paper
RECOMB 2025
[10]
Active Learning for Protein Structure Prediction

Xue Z., Bailey M., Gupta A., Li R., Corrochano-Navarro A., Li S., Kogler-Anele L., Yu Q., Bar-Joseph Z., Jager S.

RECOMB 2025 · Lecture Notes in Computer Science, Springer · pp. 17–33

An active learning framework that selectively acquires experimental protein structure data to maximally improve prediction models, minimising the number of expensive crystallography experiments required.

Read Paper
I&EC Res. 2025
[11]
Mechanistic Exploration and Kinetic Modeling Through In Silico Data Generation and Probabilistic Machine Learning Analysis

Li X., Amirmoshiri R., Davis C.R., Muthancheri I., de Gombert A., Moayedpour S., Jager S., Rötheli A.R., Jangjou Y.

Industrial & Engineering Chemistry Research, Vol. 64, Issue 13, 2025 · pp. 6825–6837

Combines in silico data generation with probabilistic machine learning to explore reaction mechanisms and build kinetic models for complex chemical processes, accelerating process development.

Read Paper
NAR 2025
[12]
mRNA-LM: Full-Length Integrated SLM for mRNA Analysis

Li S., Noroozizadeh S., Moayedpour S., Kogler-Anele L., Xue Z., Zheng D., Ulloa Montoya F., Agarwal V., Bar-Joseph Z., Jager S.

Nucleic Acids Research, Vol. 53, Issue 3, 2025

A full-length sequence language model for mRNA that jointly models UTR and coding regions, enabling comprehensive mRNA analysis, stability prediction, and sequence optimisation for therapeutic applications.

Read Paper
arXiv 2025
[13]
Distilling and Exploiting Quantitative Insights from Large Language Models for Enhanced Bayesian Optimization of Chemical Reactions

Patel R., Moayedpour S., De Lescure L., Kogler-Anele L., Cherney A., Jager S., Jangjou Y.

arXiv, 2025

Distils quantitative chemistry knowledge from LLMs into a prior for Bayesian optimisation, significantly accelerating the search for optimal chemical reaction conditions and reducing the number of laboratory experiments required.

Read Paper
2024
Genome Res. 2024
[14]
CodonBERT Large Language Model for mRNA Vaccines

Li S., Moayedpour S., Li R., Bailey M., Riahi S., Kogler-Anele L., Miladi M., Miner J., Pertuy F., Zheng D., Wang J., Agarwal V., Bar-Joseph Z., Jager S., et al.

Genome Research, Vol. 34, Issue 7, 2024 · pp. 1027–1035

A large language model pre-trained on codon sequences that improves mRNA vaccine design by predicting translation efficiency and stability, directly supporting the development of next-generation mRNA therapeutics.

Read Paper
ICML AI4Science 2024
[15]
Many-Shot In-Context Learning for Molecular Inverse Design

Moayedpour S., Corrochano-Navarro A., Sahneh F., Noroozizadeh S., Kötter A., Vymetal J., Kogler-Anele L., Mas P., Jangjou Y., Li S., Bailey M., Matter H., Grebner C., Hessler G., Bar-Joseph Z., Jager S.

ICML AI for Science Workshop 2024

Demonstrates that many-shot in-context learning with LLMs can effectively tackle molecular inverse design, generating molecules with target properties without additional fine-tuning.

Read Paper
Bioinform. 2024
[16]
Representations of Lipid Nanoparticles using Large Language Models for Transfection Efficiency Prediction

Moayedpour S., Broadbent J., Riahi S., Bailey M., Thu H.V., Dobchev D., Balsubramani A., Santos R.N.D., Kogler-Anele L., Corrochano-Navarro A., Li S., Ulloa Montoya F., Agarwal V., Bar-Joseph Z., Jager S.

Bioinformatics, Vol. 40, Issue 7, 2024

Uses large language model embeddings to represent lipid nanoparticles (LNPs) and predict their mRNA transfection efficiency, accelerating the design of more effective mRNA delivery vehicles.

Read Paper
2023
NeurIPS GenBio 2023
[17] Spotlight & Talk
CodonBERT: Large Language Models for mRNA Design and Optimization

Li S., Moayedpour S., Li R., Bailey M., Riahi S., Miladi M., Miner J., Zheng D., Wang J., Balsubramani A., Agarwal V., Bar-Joseph Z., Jager S., et al.

NeurIPS Generative AI and Biology 2023 — Spotlight Paper & Talk

The original CodonBERT spotlight paper: a codon-level BERT model that learns rich representations of mRNA sequences, enabling optimisation of codon usage for improved expression in therapeutic contexts.

Read Paper
eLife 2023
[18]
Deep Batch Active Learning for Drug Discovery

Bailey M., Moayedpour S., Li R., Corrochano-Navarro A., Kötter A., Kogler-Anele L., Riahi S., Grebner C., Hessler G., Matter H., Bianciotto M., Mas P., Bar-Joseph Z., Jager S.

eLife, Vol. 12, 2023

Applies deep learning-guided batch active learning to drug discovery, intelligently selecting which compounds to screen experimentally to identify potent candidates while minimising assay costs.

Read Paper
Bioinform. 2023
[19]
Surface ID: A Geometry-Aware System for Protein Molecular Surface Comparison

Riahi S., Lee J.H., Sorenson T., Wei S., Jager S., Olfati-Saber R., Zhou Y., Park A., Wendt M., Minoux H., Qiu Y.

Bioinformatics, Vol. 39, Issue 4, 2023

A geometry-aware computational system for comparing protein molecular surfaces using 3D shape descriptors, enabling structure-based drug design and protein function annotation across large protein databases.

Read Paper
2015 – 2019
Oxf. SynBio 2019
[20]
Optimization of the Experimental Parameters of the Ligase Cycling Reaction

Schlichting N., Reinhardt F., Jager S., Schmidt M., Kabisch J.

Synthetic Biology (Oxford University Press), Vol. 4, Issue 1, 2019

Systematically optimises the experimental conditions for the ligase cycling reaction (LCR), a key technique in synthetic biology for seamless gene assembly and library construction.

Read Paper
IEEE/ACM TCBB 2019
[21]
SICOR: Subgraph Isomorphism Comparison of RNA Secondary Structures

Schmidt M., Hamacher K., Reinhardt F., Lotz T.S., Groher F., Suess B., Jager S.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019

A subgraph isomorphism-based method for comparing RNA secondary structures, identifying structural similarities and functional relationships across large RNA databases with high accuracy.

Read Paper
ACS SynBio 2019
[22]
Tuning the Performance of Riboswitches using Machine Learning

Groher A., Jager S., Schneider C., Hamacher K., Suess B.

ACS Synthetic Biology, 2019

Applies machine learning to rationally tune riboswitch performance, enabling programmable and quantitatively predictable gene regulation for synthetic biology circuits.

Read Paper
PhD Thesis TU Darmstadt 2018
[23]
Development of Computer-aided Concepts for the Optimization of Single-Molecules and their Integration for High-Throughput Screenings

Jager S.

TU Darmstadt, 2018

Doctoral thesis developing computational methods for optimising single molecules and integrating these approaches into automated high-throughput screening workflows for accelerated drug and material discovery.

Read Thesis
BIOspektrum 2018
[24]
Neue in silico-Methoden für die Etablierung einer Grünen Chemie (New In Silico Methods for Establishing Green Chemistry)

Jager S., Buss O.

BIOspektrum, Issue 01/18, Springer, 2018

Reviews new in silico methods for establishing green chemistry practices, highlighting how computer-aided approaches can reduce the use of hazardous reagents and the generation of waste in chemical synthesis.

Read Paper
NAR 2018
[25]
Riboswitching with Ciprofloxacin — Development and Characterization of a Novel RNA Regulator

Groher F., Bofill-Bosch C., Schneider C., Braun J., Jager S., Geißler K., Hamacher K., Suess B.

Nucleic Acids Research, 2018

Develops and characterises a novel RNA-based genetic switch triggered by the antibiotic ciprofloxacin, expanding the toolkit for ligand-responsive gene regulation in synthetic biology.

Read Paper
JCompChem 2018
[26]
StreaMD: Advanced Analysis of Molecular Dynamics using R

Dombrowsky M.J., Jager S., Schiller B., Mayer B.E., Stammler S., Hamacher K.

Journal of Computational Chemistry, 2018

A stream-based R framework for advanced statistical analysis of molecular dynamics simulation trajectories, enabling real-time analysis of large-scale MD simulations without full trajectory storage.

Read Paper
ChemBioChem 2017
[27]
Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX

Buss O., Muller D., Jager S., Rudat J., Rabe K.S.

ChemBioChem, 2017

Uses the computational protein design tool FoldX to engineer improved thermostability in an industrially relevant transaminase enzyme, demonstrating in silico-guided biocatalyst engineering.

Read Paper
Alg. Mol. Bio. 2017
[28]
StreAM-Tg: Algorithms for Analyzing Coarse Grained RNA Dynamics Based on Markov Models of Connectivity-Graphs

Jager S., Schiller B., Babel P., Blumenroth M., Strufe T., Hamacher K.

Algorithms for Molecular Biology, 12(1):15, 2017

Introduces Markov model-based algorithms for analysing coarse-grained RNA molecular dynamics, capturing conformational dynamics of RNA structures from connectivity-graph representations.

Read Paper
ACS JCIM 2017
[29]
Cleavage Product Accumulation Decreases the Activity of Cutinase during PET Hydrolysis

Gross C., Hamacher K., Schmitz K., Jager S.

ACS Journal of Chemical Information and Modeling, 57(2):243–255, 2017

Computationally investigates how cleavage product accumulation progressively inhibits cutinase activity during enzymatic PET plastic biodegradation, informing enzyme engineering for improved plastic recycling.

Read Paper
WABI 2016
[30]
StreAM-Tg: Algorithms for Analyzing Coarse Grained RNA Dynamics Based on Markov Models of Connectivity-Graphs

Jager S., Schiller B., Strufe T., Hamacher K.

WABI 2016 · Lecture Notes in Computer Science, Vol. 9838, Springer

Conference version presenting the StreAM-Tg algorithm for coarse-grained RNA dynamics analysis, demonstrating its application to benchmark RNA systems and comparing its results with all-atom MD.

Read Paper
PLoS ONE 2016
[31]
Statistical Evaluation of HTS Assays for Enzymatic Hydrolysis of β-Keto Esters

Buss O., Jager S., Dold S.-M., Zimmermann S., Hamacher K., Schmitz K., Rudat J.

PLoS ONE, 11(1):e0146104, 2016

Develops a statistical framework for evaluating high-throughput screening assays targeting enzymatic β-keto ester hydrolysis, enabling reliable hit identification and false positive reduction in enzyme discovery campaigns.

Read Paper
AlCoB 2015
[32]
StreaM — A Stream-Based Algorithm for Counting Motifs in Dynamic Graphs

Schiller B., Jager S., Hamacher K., Strufe T.

AlCoB 2015 · Lecture Notes in Computer Science, Vol. 9199, Springer

Introduces a memory-efficient stream-based algorithm for counting network motifs in large dynamic graphs, applicable to biological interaction networks and real-time social network analysis.

Read Paper