Fictitious wet lab

MIC, hemolysis, cytotoxicity, and stability are simulated. There is no real biological evidence.

SPEL and ProteoGPT-like analogy

This module reframes active learning as a model-lab-model loop: generate/prioritize, assay a limited batch, and use feedback to decide the next iteration.

Generate and prioritize: a ProteoGPT-like model proposes candidates; filters and scores reduce the search space.

Assay less, assay better: a real lab would measure MIC/MBC, toxicity, and stability. Here everything is simulated to understand the process.

Learn the next round: results guide new rules, retraining, or reprioritization under governance and traceability.

Computational shortlist: 30

Assayed candidates: 10

Promising: 4

Learning cycle

Not everything is tested: score exploitation is combined with diversity exploration to learn more with less lab work.

30 input candidates

10 selected

10 simulated assays

3 recommendations

Wet lab classification

active toxic

promising

safe low activity

Learning for the next round

Optimize close analogs of the promising candidates, preserving charge and reducing length where possible.
Explore less hydrophobic variants of candidates that are active but toxic.
Use safe, low-activity profiles as a safe region, but increase charge or simulated amphipathicity.
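
The three rules above can be read as a lookup from wet-lab class to a next-round action. The sketch below is only an illustration of that reading: the dictionary name, the function name, and the shortened action texts are hypothetical and are not taken from the module's code.

```python
# Hypothetical mapping from wet-lab class to a next-round action.
# The wording paraphrases the three rules listed above.
NEXT_ROUND_ACTIONS = {
    "promising": "optimize close analogs; keep charge, shorten if possible",
    "active toxic": "explore less hydrophobic variants",
    "safe low activity": "treat as safe region; raise charge or simulated amphipathicity",
}


def recommendations(classes):
    """Return one (class, action) pair per class seen in the batch, in first-seen order."""
    seen = []
    for cls in classes:
        if cls in NEXT_ROUND_ACTIONS and cls not in seen:
            seen.append(cls)
    return [(cls, NEXT_ROUND_ACTIONS[cls]) for cls in seen]


batch_classes = ["active toxic", "promising", "promising", "safe low activity"]
for cls, action in recommendations(batch_classes):
    print(f"{cls}: {action}")
```

Under this reading, a batch that contains all three classes yields three recommendations, one per class.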

Simulated results

Each row represents a candidate selected for a fictitious first synthesis and assay batch.

Selection	Sequence	MIC CRAB	MIC CRE	MIC CRPA	Hemolysis	Cytotoxicity	Stability	Class
exploitation	`HRARMWVKRRQ`	4.0	2.0	2.0	21.6%	7.7%	85.2%	active toxic
exploitation	`SIQIMERKRIAMKKRLHKFQMPK`	1.0	2.0	1.0	9.9%	14.9%	73.3%	promising
exploitation	`WKYHIKINQVHSVSIRH`	0.5	4.0	8.0	6.7%	4.2%	54.0%	promising
exploitation	`CCIHTFIKKNKKAQMRRQSLFA`	8.0	8.0	2.0	17.6%	13.0%	57.6%	safe low activity
exploitation	`INMKAWHAWMGCANHHKRMRTQER`	2.0	2.0	4.0	10.7%	8.7%	56.2%	promising
exploitation	`NNIAIVFGPHKHVLRLHGRKSK`	4.0	4.0	2.0	23.5%	29.0%	77.3%	active toxic
exploration	`HEMKMRAKMHEVTE`	4.0	16.0	4.0	4.5%	27.5%	71.1%	active toxic
exploration	`SNCFVFFSEFIQNWKAMKILHKSQDKKYTK`	4.0	2.0	16.0	15.1%	33.2%	44.5%	active toxic
exploration	`HLNNLISTKWMVFKHNT`	1.0	8.0	8.0	17.0%	19.1%	47.3%	safe low activity
exploration	`HWWRSFQTLH`	2.0	2.0	4.0	3.7%	9.3%	59.0%	promising
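
As a quick check on the table, the class counts can be tallied per selection strategy. The snippet below is a minimal sketch: the rows are copied by hand from the table above (selection, sequence, class), and the names `ROWS` and `summarize` are illustrative, not part of the module.

```python
from collections import Counter

# (selection, sequence, class) copied from the simulated results table.
ROWS = [
    ("exploitation", "HRARMWVKRRQ", "active toxic"),
    ("exploitation", "SIQIMERKRIAMKKRLHKFQMPK", "promising"),
    ("exploitation", "WKYHIKINQVHSVSIRH", "promising"),
    ("exploitation", "CCIHTFIKKNKKAQMRRQSLFA", "safe low activity"),
    ("exploitation", "INMKAWHAWMGCANHHKRMRTQER", "promising"),
    ("exploitation", "NNIAIVFGPHKHVLRLHGRKSK", "active toxic"),
    ("exploration", "HEMKMRAKMHEVTE", "active toxic"),
    ("exploration", "SNCFVFFSEFIQNWKAMKILHKSQDKKYTK", "active toxic"),
    ("exploration", "HLNNLISTKWMVFKHNT", "safe low activity"),
    ("exploration", "HWWRSFQTLH", "promising"),
]


def summarize(rows):
    """Count candidates per class and per (selection, class) pair."""
    by_class = Counter(cls for _, _, cls in rows)
    by_strategy = Counter((sel, cls) for sel, _, cls in rows)
    return by_class, by_strategy


by_class, by_strategy = summarize(ROWS)
print(dict(by_class))
print(by_strategy[("exploration", "promising")])
```

The tally gives 4 promising, 4 active toxic, and 2 safe low activity candidates. One of the four promising peptides came from the exploration picks rather than from the top scores.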

Module README

07 Simulated Active Learning

This module simulates the next bottleneck after the peptide funnel:

We already have 30 computational candidates, but the lab cannot assay everything. The first batch must be chosen wisely.

Objective

Simulate a first fictitious wet-lab round from the shortlist produced by module 06_problema_cientifico_ia.

This module also explains the bridge to SPEL-like approaches: generating candidates with a protein LLM or a ProteoGPT-like model is not enough. An experimental batch must be selected, its activity, toxicity, and stability measured, and that feedback used to decide the next iteration.

The didactic flow is:

Computational shortlist of 30 peptides
        |
        v
Selection of 10 candidates
        - 6 by exploitation: best score
        - 4 by exploration: diversity of length, charge, and hydrophobicity
        |
        v
Simulated wet lab
        - MIC against CRAB/CRE/CRPA
        - Hemolysis
        - Cytotoxicity
        - Serum stability
        |
        v
Classification
        - promising
        - active_but_toxic
        - safe_low_activity
        - discard
        |
        v
Recommendations for second round
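
The classification step of the flow above can be sketched as a pair of yes/no questions: is the peptide active, and is it safe? The cut-offs below are hypothetical values chosen only for illustration; this document does not state the module's real thresholds, and the sketch is not claimed to reproduce the simulated results table. Serum stability is left out to keep the example short.

```python
from dataclasses import dataclass
from statistics import geometric_mean

# Hypothetical cut-offs, NOT the module's real thresholds.
MIC_ACTIVE_MAX = 4.0   # geometric-mean MIC at or below this counts as active
HEMOLYSIS_MAX = 20.0   # percent
CYTOTOX_MAX = 20.0     # percent


@dataclass
class Assay:
    mic_crab: float
    mic_cre: float
    mic_crpa: float
    hemolysis: float      # percent
    cytotoxicity: float   # percent


def classify(a: Assay) -> str:
    """Assign one of the four classes from activity and safety flags."""
    active = geometric_mean([a.mic_crab, a.mic_cre, a.mic_crpa]) <= MIC_ACTIVE_MAX
    safe = a.hemolysis <= HEMOLYSIS_MAX and a.cytotoxicity <= CYTOTOX_MAX
    if active and safe:
        return "promising"
    if active:
        return "active_but_toxic"
    if safe:
        return "safe_low_activity"
    return "discard"


# Made-up assay values, one per class.
print(classify(Assay(1.0, 2.0, 1.0, 8.0, 10.0)))
print(classify(Assay(2.0, 2.0, 1.0, 30.0, 10.0)))
print(classify(Assay(16.0, 16.0, 32.0, 5.0, 5.0)))
print(classify(Assay(16.0, 16.0, 32.0, 30.0, 30.0)))
```

Using the geometric mean across the three pathogens is one design choice among several; a stricter rule could require every individual MIC to fall below the cut-off.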

What the model learns

The simulation illustrates the core idea of active learning: the batch is not simply the top-scoring candidates. Assaying diverse candidates is also valuable, because it shows which regions of sequence space deserve further exploration and which can be discarded.
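
A minimal sketch of such a mixed selection follows. It assumes each candidate is a `(sequence, score)` pair, takes the top scores for exploitation, and then fills the exploration slots greedily with the candidates farthest (in length, net charge, and hydrophobic fraction) from everything already chosen. The descriptor definitions, the greedy max-min rule, and the function names are assumptions made for illustration, not the module's actual method.

```python
import math


def features(seq):
    """Crude descriptors: length, net charge (K/R minus D/E), hydrophobic fraction."""
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydrophobic = sum(seq.count(a) for a in "AILMFVW") / len(seq)
    return (len(seq), charge, hydrophobic)


def select_batch(candidates, n_exploit=6, n_explore=4):
    """Split a batch into top-score picks and diversity picks.

    candidates: list of (sequence, score). Returns (exploit, explore) sequence lists.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    exploit = [seq for seq, _ in ranked[:n_exploit]]
    pool = [seq for seq, _ in ranked[n_exploit:]]

    # Scale each descriptor to [0, 1] so no single one dominates the distance.
    feats = {seq: features(seq) for seq, _ in candidates}
    cols = list(zip(*feats.values()))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    norm = {
        s: tuple((v - l) / sp for v, l, sp in zip(f, lo, span))
        for s, f in feats.items()
    }

    explore = []
    chosen = list(exploit)
    while pool and len(explore) < n_explore:
        # Greedy max-min: pick the candidate farthest from everything chosen so far.
        best = max(pool, key=lambda s: min(math.dist(norm[s], norm[c]) for c in chosen))
        explore.append(best)
        chosen.append(best)
        pool.remove(best)
    return exploit, explore


# Toy input with made-up sequences and scores.
toy = [("KKKK", 0.9), ("KKKR", 0.8), ("KKRR", 0.5), ("AAAAAAAAAADE", 0.1)]
print(select_batch(toy, n_exploit=2, n_explore=1))
```

In the toy input, the lowest-scoring sequence is the one picked for exploration, because it differs most from the two top-scoring picks in length, charge, and hydrophobicity.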

In PROTEONEXT, the real equivalent would require:

criteria defined by scientific partners,
real peptide synthesis,
MIC/MBC assays,
hemolysis, cytotoxicity, and stability,
result recording with traceability,
governed retraining or re-prioritization.

Here all of that is reduced to a pedagogical simulation to understand the decision pattern.

Run

From the Desarrollo directory, in PowerShell:

& 'C:\ProgramData\miniconda3\python.exe' .\07_active_learning_simulado\simular_active_learning.py

Outputs:

salida/resultados_wetlab_simulados.csv
salida/active_learning_resumen.json

Disclaimer

Results are synthetic and pedagogical. There is no real antimicrobial activity, no real assays, and no clinical or biological value.
