Learning path

SPEL and ProteoGPT-like analogy

This module reframes active learning as a model-lab-model loop: generate/prioritize, assay a limited batch, and use feedback to decide the next iteration.

Generate and prioritize: a ProteoGPT-like model proposes candidates; filters and scores reduce the search space.
Assay less, assay better: a real lab would measure MIC/MBC, toxicity, and stability. Here everything is simulated to illustrate the process.
Learn the next round: results guide new rules, retraining, or reprioritization under governance and traceability.
Computational shortlist: 30
Assayed candidates: 10
Promising: 4

Learning cycle

Not everything is tested: score exploitation is combined with diversity exploration to learn more with less lab work.

30 input candidates
10 selected
10 simulated assays
3 recommendations

Wet lab classification

active toxic: 4
promising: 4
safe low activity: 2

Learning for the next round

  • Optimize close analogs of the promising candidates, keeping charge and reducing length where possible.
  • Explore less hydrophobic variants of active but toxic candidates.
  • Use safe, low-activity profiles as a safe region, but increase simulated charge or amphipathicity.

Simulated results

Each row represents a candidate selected for a fictitious first synthesis and assay batch.

Selection     Sequence                        MIC CRAB  MIC CRE  MIC CRPA  Hemolysis  Cytotoxicity  Stability  Class
exploitation  HRARMWVKRRQ                     4.0       2.0      2.0       21.6%      7.7%          85.2%      active toxic
exploitation  SIQIMERKRIAMKKRLHKFQMPK         1.0       2.0      1.0       9.9%       14.9%         73.3%      promising
exploitation  WKYHIKINQVHSVSIRH               0.5       4.0      8.0       6.7%       4.2%          54.0%      promising
exploitation  CCIHTFIKKNKKAQMRRQSLFA          8.0       8.0      2.0       17.6%      13.0%         57.6%      safe low activity
exploitation  INMKAWHAWMGCANHHKRMRTQER        2.0       2.0      4.0       10.7%      8.7%          56.2%      promising
exploitation  NNIAIVFGPHKHVLRLHGRKSK          4.0       4.0      2.0       23.5%      29.0%         77.3%      active toxic
exploration   HEMKMRAKMHEVTE                  4.0       16.0     4.0       4.5%       27.5%         71.1%      active toxic
exploration   SNCFVFFSEFIQNWKAMKILHKSQDKKYTK  4.0       2.0      16.0      15.1%      33.2%         44.5%      active toxic
exploration   HLNNLISTKWMVFKHNT               1.0       8.0      8.0       17.0%      19.1%         47.3%      safe low activity
exploration   HWWRSFQTLH                      2.0       2.0      4.0       3.7%       9.3%          59.0%      promising
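
To see how the class counts relate to the selection mode, the rows above can be tallied with a few lines of Python. This is only an illustrative sketch: the `rows` list is transcribed by hand from the Selection and Class columns of the table, and the variable names are not taken from the module.

```python
from collections import Counter

# (selection mode, class) for the ten simulated rows, in table order.
rows = [
    ("exploitation", "active toxic"),
    ("exploitation", "promising"),
    ("exploitation", "promising"),
    ("exploitation", "safe low activity"),
    ("exploitation", "promising"),
    ("exploitation", "active toxic"),
    ("exploration", "active toxic"),
    ("exploration", "active toxic"),
    ("exploration", "safe low activity"),
    ("exploration", "promising"),
]

# Totals per class, and per (mode, class) pair.
by_class = Counter(cls for _, cls in rows)
by_mode_and_class = Counter(rows)

for (mode, cls), count in sorted(by_mode_and_class.items()):
    print(f"{mode:<12}  {cls:<17}  {count}")
```

In this synthetic batch, three of the four promising candidates come from exploitation and one from exploration. With ten simulated rows, that is an illustration of the bookkeeping, not evidence about either strategy.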

Module README

07 Simulated Active Learning

This module simulates the next bottleneck after the peptide funnel:

We already have 30 computational candidates, but the lab cannot assay everything. The first batch must be chosen wisely.

Objective

Simulate a first fictitious wet-lab round from the shortlist produced by module 06_problema_cientifico_ia.

This module also explains the bridge to SPEL-like approaches: generating candidates with a protein LLM or a ProteoGPT-like model is not enough. An experimental batch must be selected, its activity, toxicity, and stability measured, and that feedback used to decide the next iteration.

The didactic flow is:

Computational shortlist of 30 peptides
        |
        v
Selection of 10 candidates
        - 6 by exploitation: best score
        - 4 by exploration: diversity of length, charge, and hydrophobicity
        |
        v
Simulated wet lab
        - MIC against CRAB/CRE/CRPA
        - Hemolysis
        - Cytotoxicity
        - Serum stability
        |
        v
Classification
        - promising
        - active_but_toxic
        - safe_low_activity
        - discard
        |
        v
Recommendations for second round
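
The classification step of the flow above can be sketched as a small rule-based function. The thresholds and the use of the median MIC are hypothetical, chosen only for illustration; this page does not show the module's actual rules, and serum stability is deliberately left out of the sketch. The three peptide rows are taken from the simulated results, and the fourth example is invented to exercise the discard branch.

```python
from statistics import median

# Hypothetical cut-offs, NOT taken from the module.
MIC_ACTIVE_MAX = 4.0    # median MIC at or below this counts as "active"
HEMOLYSIS_MAX = 20.0    # percent; above this counts as "toxic"
CYTOTOX_MAX = 25.0      # percent; above this counts as "toxic"


def classify(mics, hemolysis, cytotoxicity):
    """Map simulated assay values to one of the four classes."""
    active = median(mics) <= MIC_ACTIVE_MAX
    toxic = hemolysis > HEMOLYSIS_MAX or cytotoxicity > CYTOTOX_MAX
    if active and not toxic:
        return "promising"
    if active and toxic:
        return "active_but_toxic"
    if not toxic:
        return "safe_low_activity"
    return "discard"  # neither active nor safe


examples = {
    # MIC values are ordered CRAB, CRE, CRPA.
    "SIQIMERKRIAMKKRLHKFQMPK": classify([1.0, 2.0, 1.0], 9.9, 14.9),
    "HRARMWVKRRQ": classify([4.0, 2.0, 2.0], 21.6, 7.7),
    "CCIHTFIKKNKKAQMRRQSLFA": classify([8.0, 8.0, 2.0], 17.6, 13.0),
    # Invented values, used only to reach the "discard" branch.
    "hypothetical_inactive_toxic": classify([16.0, 32.0, 16.0], 30.0, 40.0),
}

for name, label in examples.items():
    print(f"{name}: {label}")
```

Keeping activity and toxicity as two separate booleans makes the four classes fall out of a 2x2 grid, which is easy to audit when scientific partners later replace the placeholder thresholds with real criteria.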

What the model learns

The simulation illustrates the core idea of active learning: the batch is not simply the top-scoring candidates. Assaying diverse candidates is also valuable, because it shows which regions of sequence space deserve further exploration and which can be discarded.
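
One way to combine exploitation and exploration is sketched below: take the top 6 by score, then add 4 candidates by greedy farthest-point selection over length, charge, and hydrophobicity. This is a minimal sketch under stated assumptions, not the module's implementation. The `Candidate` fields, the `select_batch` function, the min-max scaling, and the randomly generated shortlist are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float            # computational priority, higher is better
    length: int
    charge: int
    hydrophobicity: float


def select_batch(candidates, n_exploit=6, n_explore=4):
    """Return (exploit, explore): top scores plus diverse extras."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    exploit, pool = ranked[:n_exploit], ranked[n_exploit:]

    # Min-max scale each descriptor so no single one dominates the distance.
    raw = {c: (c.length, c.charge, c.hydrophobicity) for c in candidates}
    bounds = [(min(col), max(col)) for col in zip(*raw.values())]
    scaled = {
        c: tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, (lo, hi) in zip(vals, bounds))
        for c, vals in raw.items()
    }

    chosen, explore = list(exploit), []
    for _ in range(min(n_explore, len(pool))):
        # Farthest-point step: pick the candidate whose nearest
        # already-chosen neighbour is as far away as possible.
        pick = max(pool, key=lambda c: min(
            (math.dist(scaled[c], scaled[s]) for s in chosen), default=0.0))
        explore.append(pick)
        chosen.append(pick)
        pool.remove(pick)
    return exploit, explore


# Hypothetical shortlist of 30 candidates with random descriptors.
rng = random.Random(0)
shortlist = [
    Candidate(f"pep{i:02d}", rng.random(), rng.randint(10, 30),
              rng.randint(0, 8), rng.uniform(-2.0, 2.0))
    for i in range(30)
]
exploit, explore = select_batch(shortlist)
print("exploitation:", [c.name for c in exploit])
print("exploration: ", [c.name for c in explore])
```

Measuring distance against everything already chosen, including the exploitation picks, keeps the exploration slots from duplicating profiles the batch already covers.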

In PROTEONEXT, the real equivalent would require:

  • criteria defined by scientific partners,
  • real peptide synthesis,
  • MIC/MBC assays,
  • hemolysis, cytotoxicity, and stability,
  • result recording with traceability,
  • governed retraining or re-prioritization.

Here all of that is reduced to a pedagogical simulation to understand the decision pattern.

Run

From the Desarrollo directory:

& 'C:\ProgramData\miniconda3\python.exe' .\07_active_learning_simulado\simular_active_learning.py

Outputs:

  • salida/resultados_wetlab_simulados.csv
  • salida/active_learning_resumen.json

Disclaimer

Results are synthetic and pedagogical. There is no real antimicrobial activity, no real assays, and no clinical or biological value.