LMLF — LLM-Driven Molecule & Materials Generation
Semi-automatic pipeline using LLMs with logical feedback loops to propose, filter, and validate novel molecules and materials
Motivation
LLMs trained on chemistry literature have absorbed a large body of structure-activity relationship (SAR) knowledge. The question is whether they can serve as hypothesis generators in drug discovery and materials science: not as oracles, but as a creative front end to a rigorous validation pipeline.
The naive approach (prompt → SMILES → done) doesn’t work: LLMs hallucinate invalid molecules, ignore synthesizability, and don’t respect target-specific constraints. We needed an architecture that closes the loop by feeding validation failures back to the model.
Architecture
Target spec (protein / material property)
↓
LLM candidate generation (GPT-4 / Claude)
↓
Validity filter (RDKit, Lipinski, SCScore)
↓
Retrosynthesis check (ASKCOS / similar)
↓
Quantitative validation (docking / MD / DFT)
↓
Feedback prompts → LLM (next iteration)
The feedback loop is the key. Failed candidates are returned to the LLM with structured reasons (“violates Lipinski Rule of 5 on H-bond donors”, “no synthetic route under 5 steps”, “predicted binding affinity below threshold”). The LLM uses these to refine its next round.
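A minimal sketch of this loop, using only the standard library, may help make the pattern concrete. Everything named here is hypothetical and not taken from the project's code: the `Candidate` record and its precomputed fields (which a real pipeline would fill in with RDKit, a retrosynthesis planner, and a docking run), the `check_*` functions, the binding threshold of 7.0, the prompt wording, and `stub_llm`, which stands in for an actual GPT-4 or Claude call. The failure reasons reuse the wording quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """Hypothetical record; field values would normally be computed, not typed in."""
    smiles: str
    h_bond_donors: int
    synthesis_steps: Optional[int]  # None means no route was found
    binding_score: float            # assumed convention: higher is better


# A check returns None on success, or a structured failure reason.
Check = Callable[[Candidate], Optional[str]]


def check_h_bond_donors(c: Candidate) -> Optional[str]:
    if c.h_bond_donors > 5:  # Lipinski: no more than 5 H-bond donors
        return f"violates Lipinski Rule of 5 on H-bond donors ({c.h_bond_donors} > 5)"
    return None


def check_route(c: Candidate) -> Optional[str]:
    if c.synthesis_steps is None or c.synthesis_steps >= 5:
        return "no synthetic route under 5 steps"
    return None


def check_binding(c: Candidate, threshold: float = 7.0) -> Optional[str]:
    # The threshold is an arbitrary placeholder for this sketch.
    if c.binding_score < threshold:
        return f"predicted binding affinity below threshold ({c.binding_score} < {threshold})"
    return None


# Ordered cheapest-first, mirroring the pipeline stages.
CHECKS: list[Check] = [check_h_bond_donors, check_route, check_binding]


def first_failure(c: Candidate, checks: list[Check]) -> Optional[str]:
    """Run the checks in order and stop at the first failure."""
    for check in checks:
        reason = check(c)
        if reason is not None:
            return reason
    return None


def feedback_prompt(failures: dict[str, str]) -> str:
    """Turn per-candidate failure reasons into the next-round prompt."""
    lines = [f"- {smiles}: {reason}" for smiles, reason in failures.items()]
    return (
        "The following candidates were rejected:\n"
        + "\n".join(lines)
        + "\nPropose new candidates that avoid these failures."
    )


def run_loop(propose, checks, max_rounds=3):
    """Alternate generation and filtering; return accepted candidates and prompts used."""
    accepted, prompts = [], []
    prompt = "Propose candidate inhibitors as SMILES strings."
    for _ in range(max_rounds):
        prompts.append(prompt)
        failures = {}
        for c in propose(prompt):
            reason = first_failure(c, checks)
            if reason is None:
                accepted.append(c)
            else:
                failures[c.smiles] = reason
        if not failures:
            break  # nothing left to correct
        prompt = feedback_prompt(failures)
    return accepted, prompts


def stub_llm(prompt: str) -> list[Candidate]:
    """Stand-in for an LLM call; property values are made up for illustration."""
    if "rejected" not in prompt:
        return [
            Candidate("C(O)C(O)C(O)C(O)C(O)CO", 6, 2, 8.1),
            Candidate("c1ccccc1", 0, None, 7.5),
        ]
    return [Candidate("CC(=O)Nc1ccc(O)cc1", 2, 3, 7.9)]
```

Calling `run_loop(stub_llm, CHECKS)` runs two rounds with the stub: both first-round candidates are rejected, their reasons are folded into the second prompt, and the second-round candidate is accepted. Stopping at the first failed check means the expensive stages (retrosynthesis, docking) only run on candidates that survive the cheap ones, and each rejected candidate carries exactly one reason back to the model.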
Two deployments
Drug discovery (APPCAIR, BITS Pilani)
Targeting JAK2 kinase and dopamine beta-hydroxylase (DBH):
- Generated 20+ novel inhibitor candidates passing all filters
- Achieved a 15–20% success rate (candidates passing the validity, retrosynthesis, and docking thresholds)
- Inhibitor scaffolds proposed by the system have been forwarded for experimental validation
Low-k dielectric materials (Deep Forest Sciences)
Targeting novel materials for semiconductor interlayer dielectrics:
- LLM-generated material candidates validated via molecular dynamics (MD) and DFT simulations
- Screened 40+ candidate materials
- Reduced screening cycle time by 83% (30+ days → 5 days)
What came of it
- Production deployment at Deep Forest Sciences for their Prithvi drug discovery platform
- Inhibitor candidates from the JAK2/DBH pipeline forwarded for wet-lab synthesis
- The methodology generalizes: the same feedback-loop pattern is now being adapted for catalyst design
Stack: Python · RDKit · GPT-4 / Claude APIs · ASKCOS · Gnina (docking) · LAMMPS (MD)
Collaborators: Prof. Ashwin Srinivasan (APPCAIR) · Bharath Ramsundar (Deep Forest Sciences)