Simulated active learning
From 30 computational candidates to a simulated batch of 10 assays, to learn which candidates deserve optimization.
SPEL and ProteoGPT-like analogy
This module reframes active learning as a model-lab-model loop: generate/prioritize, assay a limited batch, and use feedback to decide the next iteration.
Learning cycle
Not every candidate is assayed: exploitation of the model score is combined with diversity-driven exploration to learn more with less lab work.
Wet-lab classification
Learning for the next round
- Optimize close analogs of the promising candidates, keeping their charge and reducing length where possible.
- Explore less hydrophobic variants of candidates that are active but toxic.
- Use the safe low-activity profiles as a safe region of sequence space, but increase their charge or simulated amphipathicity.
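The first recommendation can be made concrete with a small helper. This is a minimal sketch that enumerates end-truncated analogs which keep the parent's net charge. The function names, the crude charge rule (+1 per K/R, -1 per D/E, His ignored) and the `max_trim`/`min_len` defaults are all hypothetical and are not taken from the module's code.

```python
# Hypothetical helpers for illustration; not the module's actual recommendation logic.
CHARGED = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Approximate net charge at neutral pH: +1 per K/R, -1 per D/E (His ignored)."""
    return sum(CHARGED.get(aa, 0) for aa in seq)


def charge_preserving_truncations(seq: str, max_trim: int = 3, min_len: int = 8) -> list[str]:
    """Return end-truncated analogs of seq with the same net charge.

    Removes i residues from the N-terminus and j from the C-terminus
    (1 <= i + j <= max_trim) and keeps variants of at least min_len residues.
    """
    target = net_charge(seq)
    variants = []
    for i in range(max_trim + 1):
        for j in range(max_trim + 1 - i):
            if i + j == 0:
                continue  # skip the unmodified parent
            variant = seq[i:len(seq) - j]
            if len(variant) >= min_len and net_charge(variant) == target:
                variants.append(variant)
    return variants


# Usage with HWWRSFQTLH, a "promising" sequence from the simulated results table
for v in charge_preserving_truncations("HWWRSFQTLH"):
    print(v, len(v), net_charge(v))
```

Trimming only at the termini keeps each analog a contiguous piece of the parent. Truncating this way is one simple reading of "reducing length while keeping charge".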
Simulated results
Each row represents a candidate selected for a fictitious first synthesis and assay batch.
| Selection | Sequence | MIC CRAB | MIC CRE | MIC CRPA | Hemolysis | Cytotoxicity | Stability | Class |
|---|---|---|---|---|---|---|---|---|
| exploitation | HRARMWVKRRQ | 4.0 | 2.0 | 2.0 | 21.6% | 7.7% | 85.2% | active toxic |
| exploitation | SIQIMERKRIAMKKRLHKFQMPK | 1.0 | 2.0 | 1.0 | 9.9% | 14.9% | 73.3% | promising |
| exploitation | WKYHIKINQVHSVSIRH | 0.5 | 4.0 | 8.0 | 6.7% | 4.2% | 54.0% | promising |
| exploitation | CCIHTFIKKNKKAQMRRQSLFA | 8.0 | 8.0 | 2.0 | 17.6% | 13.0% | 57.6% | safe low activity |
| exploitation | INMKAWHAWMGCANHHKRMRTQER | 2.0 | 2.0 | 4.0 | 10.7% | 8.7% | 56.2% | promising |
| exploitation | NNIAIVFGPHKHVLRLHGRKSK | 4.0 | 4.0 | 2.0 | 23.5% | 29.0% | 77.3% | active toxic |
| exploration | HEMKMRAKMHEVTE | 4.0 | 16.0 | 4.0 | 4.5% | 27.5% | 71.1% | active toxic |
| exploration | SNCFVFFSEFIQNWKAMKILHKSQDKKYTK | 4.0 | 2.0 | 16.0 | 15.1% | 33.2% | 44.5% | active toxic |
| exploration | HLNNLISTKWMVFKHNT | 1.0 | 8.0 | 8.0 | 17.0% | 19.1% | 47.3% | safe low activity |
| exploration | HWWRSFQTLH | 2.0 | 2.0 | 4.0 | 3.7% | 9.3% | 59.0% | promising |
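The four classes require explicit cutoffs. The sketch below assumes hypothetical thresholds and ignores stability:

- A peptide counts as active if its median MIC across CRAB/CRE/CRPA is at most 4.
- A peptide counts as toxic if hemolysis is above 20% or cytotoxicity is above 25%.

With these illustrative cutoffs the ten rows above receive the labels shown in the Class column. The actual rules inside `simular_active_learning.py` are not documented here and may differ; `classify` and the constant names are hypothetical.

```python
from statistics import median

# Hypothetical cutoffs chosen for illustration only.
ACTIVE_MEDIAN_MIC = 4.0   # active if median MIC over CRAB/CRE/CRPA <= this value
MAX_HEMOLYSIS = 20.0      # % hemolysis above which a peptide is flagged toxic
MAX_CYTOTOXICITY = 25.0   # % cytotoxicity above which a peptide is flagged toxic


def classify(mics: tuple[float, float, float], hemolysis: float, cytotoxicity: float) -> str:
    """Map simulated readouts to one of the four classes (stability is not used here)."""
    active = median(mics) <= ACTIVE_MEDIAN_MIC
    toxic = hemolysis > MAX_HEMOLYSIS or cytotoxicity > MAX_CYTOTOXICITY
    if active and not toxic:
        return "promising"
    if active:
        return "active_but_toxic"
    if not toxic:
        return "safe_low_activity"
    return "discard"


print(classify((4.0, 2.0, 2.0), 21.6, 7.7))   # active_but_toxic
print(classify((1.0, 8.0, 8.0), 17.0, 19.1))  # safe_low_activity
```

The median tolerates a single weak pathogen, so 0.5/4.0/8.0 still counts as active. This is a design choice of the sketch, not a documented rule.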
Module README
07 Simulated Active Learning
This module simulates the next bottleneck after the peptide funnel:
We already have 30 computational candidates, but the lab cannot assay everything. The first batch must be chosen wisely.
Objective
Simulate a first fictitious wet-lab round from the shortlist produced by module 06_problema_cientifico_ia.
This module also explains the bridge to SPEL-like approaches: generating candidates with a protein LLM or a ProteoGPT-like model is not enough. An experimental batch must be selected; activity, toxicity, and stability must be measured; and that feedback must be used to decide the next iteration.
The didactic flow is:
Computational shortlist of 30 peptides
|
v
Selection of 10 candidates
- 6 by exploitation: best score
- 4 by exploration: diversity of length, charge, and hydrophobicity
|
v
Simulated wet lab
- MIC against CRAB/CRE/CRPA
- Hemolysis
- Cytotoxicity
- Serum stability
|
v
Classification
- promising
- active_but_toxic
- safe_low_activity
- discard
|
v
Recommendations for second round
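The selection step (6 by score, 4 by diversity of length, charge, and hydrophobicity) can be sketched as follows. This is a minimal sketch, assuming each candidate has a single model score and a unique sequence. The farthest-point (max-min) rule, the Kyte-Doolittle hydropathy scale and the names `features`/`select_batch` are illustrative choices, not necessarily what `simular_active_learning.py` does.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # crude net charge, His ignored


def features(seq: str) -> tuple[float, float, float]:
    """Length, approximate net charge, and mean hydropathy (GRAVY) of a peptide."""
    return (len(seq), sum(CHARGE.get(a, 0) for a in seq), mean(KD[a] for a in seq))


def select_batch(candidates: list[tuple[str, float]], n_exploit: int = 6,
                 n_explore: int = 4) -> list[tuple[str, str]]:
    """Pick n_exploit top-scoring peptides, then n_explore diverse ones (max-min distance).

    candidates: (unique sequence, model score) pairs; a higher score is better.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    exploit = [seq for seq, _ in ranked[:n_exploit]]
    pool = [seq for seq, _ in ranked[n_exploit:]]

    # Standardize each feature over the shortlist so length does not dominate the distance.
    feats = {seq: features(seq) for seq, _ in candidates}
    cols = list(zip(*feats.values()))
    mus = [mean(col) for col in cols]
    sds = [pstdev(col) or 1.0 for col in cols]  # guard against a constant feature
    z = {s: tuple((x - m) / sd for x, m, sd in zip(f, mus, sds)) for s, f in feats.items()}

    chosen, explore = list(exploit), []
    for _ in range(min(n_explore, len(pool))):
        # Farthest-point step: the pool peptide whose nearest already-chosen peptide is farthest.
        best = max(pool, key=lambda s: min((math.dist(z[s], z[c]) for c in chosen),
                                           default=math.inf))
        explore.append(best)
        chosen.append(best)
        pool.remove(best)
    return [(s, "exploitation") for s in exploit] + [(s, "exploration") for s in explore]


# Toy shortlist: sequences from the results table; the scores are made-up placeholders.
toy = [("HRARMWVKRRQ", 0.91), ("SIQIMERKRIAMKKRLHKFQMPK", 0.88), ("WKYHIKINQVHSVSIRH", 0.85),
       ("CCIHTFIKKNKKAQMRRQSLFA", 0.80), ("HEMKMRAKMHEVTE", 0.62), ("HWWRSFQTLH", 0.55),
       ("SNCFVFFSEFIQNWKAMKILHKSQDKKYTK", 0.50), ("HLNNLISTKWMVFKHNT", 0.47)]
for seq, reason in select_batch(toy, n_exploit=3, n_explore=2):
    print(reason, seq)
```

Max-min selection is a common way to spread a small exploration budget across feature space. Standardizing first keeps length (tens of residues) from swamping mean hydropathy (a few units).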
What the model learns
The simulation illustrates the core idea of active learning: the batch is not filled only with the highest-scoring candidates. It is also valuable to assay diverse candidates to learn which regions of sequence space deserve further exploration and which should be discarded.
In PROTEONEXT, the real equivalent would require:
- criteria defined by scientific partners,
- real peptide synthesis,
- MIC/MBC assays,
- hemolysis, cytotoxicity, and stability,
- result recording with traceability,
- governed retraining or re-prioritization.
Here all of that is reduced to a pedagogical simulation to understand the decision pattern.
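A minimal sketch of what a "simulated wet lab" can look like in code, with these assumptions:

- Each selected peptide carries a model score in [0, 1].
- Better-scored peptides should tend to get lower MICs.

The function name `simulate_assay`, the score-to-MIC link, the jitter and the percentage ranges are all hypothetical. Only the two-fold MIC series mirrors the values that appear in the results table.

```python
import random

# Two-fold MIC dilution series matching the values seen in the results table (units not stated).
MIC_SERIES = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
PATHOGENS = ("CRAB", "CRE", "CRPA")


def simulate_assay(score: float, rng: random.Random) -> dict[str, float]:
    """Draw synthetic readouts for one peptide (illustrative only, no biological meaning).

    score is assumed to lie in [0, 1]; a higher score shifts MICs toward the low end.
    """
    center = round((1.0 - score) * (len(MIC_SERIES) - 1))  # expected position in the series
    readout = {}
    for pathogen in PATHOGENS:
        # Jitter by up to two dilution steps, clamped to the ends of the series.
        idx = min(max(center + rng.randint(-2, 2), 0), len(MIC_SERIES) - 1)
        readout[f"mic_{pathogen.lower()}"] = MIC_SERIES[idx]
    # Percentages drawn uniformly from arbitrary illustrative ranges.
    readout["hemolysis_pct"] = round(rng.uniform(2.0, 30.0), 1)
    readout["cytotoxicity_pct"] = round(rng.uniform(2.0, 35.0), 1)
    readout["stability_pct"] = round(rng.uniform(40.0, 90.0), 1)
    return readout


rng = random.Random(7)  # fixed seed: the same "batch" on every run
print(simulate_assay(0.9, rng))
```

Passing an explicit `random.Random` instance keeps the simulation reproducible. This matters when the readouts feed the classification and recommendation steps.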
Run
From the Desarrollo directory (PowerShell):
& 'C:\ProgramData\miniconda3\python.exe' .\07_active_learning_simulado\simular_active_learning.py
Outputs:
- salida/resultados_wetlab_simulados.csv
- salida/active_learning_resumen.json
Disclaimer
Results are synthetic and pedagogical. There is no real antimicrobial activity, no real assays, and no clinical or biological value.