RNAvec & LSRPI

Compact RNA representations + residue-level RNA-protein interaction prediction from primary sequences alone

The problem

RNA-protein interactions (RPIs) are central to gene regulation, splicing, mRNA stability, and many disease mechanisms. The state of the art for residue-level RPI prediction has two big issues:

  1. It needs 3D structure. The most accurate methods condition on AlphaFold-like structural predictions or experimentally resolved complexes. Neither scales well: AlphaFold inference at scale is compute-heavy, and experimentally resolved RNA-protein complexes cover only a tiny fraction of the proteins catalogued in UniProt.
  2. Existing sequence-only methods predict at the level of whole proteins or RNAs, not individual residues. You get “this protein binds this RNA,” not “this residue pair makes contact.”

We wanted: residue-level RPI prediction, from primary sequences only, that runs fast on a single GPU.

The pieces

RNAvec — RNA representation

RNAvec is a compact RNA embedding trained with multiple self-supervised objectives that bake in:

  • Local context — k-mer motif patterns
  • Global context — long-range positional information via sinusoidal encoding
  • Structural context — secondary structure propensity learned implicitly from sequence
  • Self-attention — shared ResNet + secondary structure prediction head as a pretraining auxiliary

Output: dense per-nucleotide embedding (L × 32) — small, but structure-aware.
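
To make the local and global context inputs concrete, here is a minimal sketch of overlapping k-mer tokenization and the standard sinusoidal positional encoding, sized to the 32-dimensional output mentioned above. It is an illustration only, not the RNAvec code: the function names, the choice of k = 3, the T-to-U normalization, and the toy sequence are all assumptions.

```python
import math


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers, one per window start."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style input as RNA
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sinusoidal_encoding(length, dim=32):
    """Standard sinusoidal positional encoding as a length x dim table."""
    if dim % 2:
        raise ValueError("dim must be even")
    table = []
    for pos in range(length):
        row = []
        for i in range(0, dim, 2):
            # Each (sin, cos) pair shares one wavelength; wavelengths grow with i.
            angle = pos / (10000 ** (i / dim))
            row.extend((math.sin(angle), math.cos(angle)))
        table.append(row)
    return table


rna = "AUGGCUACGUAGC"          # toy sequence, 13 nucleotides
tokens = kmer_tokens(rna, k=3)  # local context: 3-mer motifs
positions = sinusoidal_encoding(len(rna), dim=32)  # global context: position signal
```

In a real model the k-mer tokens would index a learned embedding table and the positional table would be added to (or concatenated with) those embeddings before any further layers.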

LSRPI — Location-Specific RNA-Protein Interaction model

A multimodal Transformer that takes RNAvec (RNA side) + ESM protein embeddings (protein side), fuses them, and outputs a residue-level interaction matrix (which nucleotide contacts which residue).

Architecture:

  • Dual-encoder: RNAvec + ESM
  • Multimodal Transformer fusion layer
  • Convolutional head → predicted interaction matrix
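
The shape of the output is easier to see with a toy example. The sketch below scores every (nucleotide, residue) pair with a simple bilinear form followed by a sigmoid, producing an L_rna × L_prot matrix of contact probabilities. This is a deliberately simplified stand-in, not LSRPI itself: the actual model uses a multimodal Transformer fusion layer and a convolutional head, and the random embeddings, the 8-dimensional protein vectors, and the `weight` matrix here are hypothetical placeholders.

```python
import math
import random


def interaction_matrix(rna_emb, prot_emb, weight):
    """Return an L_rna x L_prot matrix of pairwise contact probabilities.

    rna_emb:  L_rna  x d_rna   per-nucleotide embeddings
    prot_emb: L_prot x d_prot  per-residue embeddings
    weight:   d_rna  x d_prot  hypothetical learned bilinear parameters
    """
    d_prot = len(weight[0])
    # Project each nucleotide into the protein embedding space once.
    projected = [
        [sum(r[a] * weight[a][b] for a in range(len(r))) for b in range(d_prot)]
        for r in rna_emb
    ]
    # Dot product of every projected nucleotide with every residue, then sigmoid.
    return [
        [1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(p, q)))) for q in prot_emb]
        for p in projected
    ]


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random values standing in for real embeddings or weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rna_emb = rand_matrix(13, 32)   # 13 nucleotides, 32-dim (the L x 32 shape above)
prot_emb = rand_matrix(20, 8)   # 20 residues, toy 8-dim stand-in for ESM embeddings
weight = rand_matrix(32, 8)

contacts = interaction_matrix(rna_emb, prot_emb, weight)

# Strongest predicted pair: (nucleotide index, residue index).
best_pair = max(
    ((i, j) for i in range(len(rna_emb)) for j in range(len(prot_emb))),
    key=lambda ij: contacts[ij[0]][ij[1]],
)
```

Each row of `contacts` corresponds to one nucleotide and each column to one protein residue, so thresholding the matrix or ranking its entries gives a list of candidate contact pairs.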

Results

  • Beats structure-aware baselines on standard RPI benchmarks despite using only sequence input
  • Generates interpretable interaction matrices — saliency over both sequences highlights binding motifs and protein binding pockets
  • Transfer learning hooks — RNAvec embeddings reused for RNA-binding protein prediction, RNA-RNA interaction prediction, and mRNA degradation prediction

Why it matters

A sequence-only, residue-level RPI predictor opens up:

  • Large-scale screening of novel RNAs (no need to wait for structure prediction)
  • Better priors for cryo-EM and crystallography target selection
  • Faster iteration in RNA therapeutics design (siRNA, ASO, mRNA vaccine targets)

Status: Under review, 2026 — paper PDF

Stack: Python · PyTorch · Hugging Face · ESM-2 · scikit-learn

Collaborators: Omkar S. Sathe (co-first author) · Sanket R. Gupte · Aman A. Kattuparambil · Prof. Ashwin Srinivasan · Prof. Raviprasad Aduri (BITS Pilani)
