RNAvec & LSRPI
Compact RNA representations + residue-level RNA-protein interaction prediction from primary sequences alone
The problem
RNA-protein interactions (RPIs) are central to gene regulation, splicing, mRNA stability, and many disease mechanisms. The state of the art for residue-level RPI prediction has two big issues:
- It needs 3D structure. The most accurate methods condition on AlphaFold-like structural predictions or experimentally resolved complexes. Both are costly to obtain: AlphaFold inference at scale is heavy, and experimentally resolved RNA-protein complexes exist for only a tiny fraction of the proteins in UniProt.
- Existing sequence-only methods predict at the protein or RNA level, not the residue level. You get “this protein binds this RNA,” not “this nucleotide-residue pair makes contact.”
We wanted: residue-level RPI prediction, from primary sequences only, that runs fast on a single GPU.
The pieces
RNAvec — RNA representation
RNAvec is a compact RNA embedding trained with multiple self-supervised objectives that bake in:
- Local context — k-mer motif patterns
- Global context — long-range positional information via sinusoidal encoding
- Structural context — secondary structure propensity learned implicitly from sequence
- Self-attention — shared ResNet + secondary structure prediction head as a pretraining auxiliary
Output: dense per-nucleotide embedding (L × 32) — small, but structure-aware.
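To make the local and global context inputs concrete, here is a minimal sketch in plain Python. It assumes overlapping 3-mers for the local view and the standard Transformer-style sinusoidal table for positions, sized to match the L × 32 output. The helper names (`kmers`, `sinusoidal_encoding`) and the choice of k are illustrative assumptions, not the RNAvec implementation.

```python
import math

def kmers(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (local motif context)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def sinusoidal_encoding(length, dim=32):
    """Standard sinusoidal positional table with shape (length, dim).

    Even columns hold sin(pos / 10000^(i/dim)), odd columns the matching cos.
    """
    table = []
    for pos in range(length):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row[:dim])
    return table

seq = "AUGGCUACG"                              # toy sequence, hypothetical
tokens = kmers(seq, k=3)                       # local context tokens
pos = sinusoidal_encoding(len(seq), dim=32)    # one 32-dim row per nucleotide
```

In a real model the k-mer tokens would be embedded and combined with the positional rows before the learned layers. Here they are kept separate so the shapes are easy to inspect.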
LSRPI — Location-Specific RNA-Protein Interaction model
A multimodal Transformer that takes RNAvec (RNA side) + ESM protein embeddings (protein side), fuses them, and outputs a residue-level interaction matrix (which nucleotide contacts which residue).
Architecture:
- Dual-encoder: RNAvec + ESM
- Multimodal Transformer fusion layer
- Convolutional head → predicted interaction matrix
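The shape of the output is easier to see in code. The sketch below is a deliberately simplified stand-in: it replaces the Transformer fusion layer and convolutional head with one linear projection per modality, followed by a scaled dot product and a sigmoid. All dimensions except the RNA embedding width of 32, and all names and random weights, are hypothetical. It only illustrates how per-nucleotide and per-residue embeddings become an L_rna × L_prot matrix of contact scores.

```python
import math
import random

random.seed(0)

def random_matrix(rows, cols, scale=1.0):
    """Toy stand-in for learned weights or precomputed embeddings."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def project(x, w):
    """Multiply x (n x d_in) by w (d_in x d_out)."""
    return [[sum(v * w[i][j] for i, v in enumerate(row)) for j in range(len(w[0]))]
            for row in x]

def interaction_matrix(rna_emb, prot_emb, w_rna, w_prot):
    """Score every (nucleotide, residue) pair in a shared space, squashed to (0, 1)."""
    r = project(rna_emb, w_rna)
    p = project(prot_emb, w_prot)
    scale = math.sqrt(len(r[0]))
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(ri, pj)) / scale))
             for pj in p] for ri in r]

L_RNA, L_PROT = 5, 7                   # toy sequence lengths
D_RNA, D_PROT, D_SHARED = 32, 16, 8    # D_PROT and D_SHARED are assumptions
rna_emb = random_matrix(L_RNA, D_RNA)       # stands in for RNAvec output
prot_emb = random_matrix(L_PROT, D_PROT)    # stands in for ESM embeddings
w_rna = random_matrix(D_RNA, D_SHARED, 0.1)
w_prot = random_matrix(D_PROT, D_SHARED, 0.1)

contacts = interaction_matrix(rna_emb, prot_emb, w_rna, w_prot)
```

Entry `contacts[i][j]` is the score that nucleotide `i` contacts residue `j`. With random weights the values carry no meaning; only the shape and value range are informative.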
Results
- Beats structure-aware baselines on standard RPI benchmarks despite using only sequence input
- Generates interpretable interaction matrices — saliency over both sequences highlights binding motifs and protein binding pockets
- Transfer learning hooks — RNAvec embeddings reused for RNA-binding protein prediction, RNA-RNA interaction prediction, and mRNA degradation prediction
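One simple way to reuse per-nucleotide embeddings in a sequence-level task is to pool them into a fixed-length vector and pass that to a standard classifier or regressor. The sketch below uses mean pooling as an assumption; the source does not state which pooling or downstream model was used, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(embedding):
    """Average an (L x d) per-nucleotide embedding into one d-dim feature vector."""
    n = len(embedding)
    dim = len(embedding[0])
    return [sum(row[j] for row in embedding) / n for j in range(dim)]

# Toy 2-nucleotide, 2-dim embedding (real RNAvec output would be L x 32)
features = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

The pooled vector has the same length for every RNA, so sequences of different lengths can share one downstream model.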
Why it matters
A sequence-only, residue-level RPI predictor opens up:
- Large-scale screening of novel RNAs (no need to wait for structure prediction)
- Better priors for cryo-EM and crystallography target selection
- Faster iteration in RNA therapeutics design (siRNA, ASO, mRNA vaccine targets)
Status: Under review, 2026 — paper PDF
Stack: Python · PyTorch · HuggingFace · ESM-2 · Scikit-learn
Collaborators: Omkar S. Sathe (co-first author) · Sanket R. Gupte · Aman A. Kattuparambil · Prof. Ashwin Srinivasan · Prof. Raviprasad Aduri (BITS Pilani)