Educational simulation

No candidate on this page has validated biological activity. Scores are didactic proxies to explain the funnel.

The problem we target

Antimicrobial resistance (AMR) leaves some infections with few therapeutic options. Scientists need to connect dispersed data, prioritize isolates, and choose candidates with limited experimental capacity.

Where AI helps

AI can reduce the search space, filter sequences, prioritize candidates, and guide active learning, but validation remains experimental.

Where GPT-5.5/Codex helps

GPT-5.5/Codex helps build the platform, document, simulate, code, test, and explain. It does not replace biomolecular models or wet lab work.

ProteoGPT-like inside PROTEONEXT

PROTEONEXT is not ProteoGPT 2.0: it is a sovereign platform that could encapsulate specialized protein LLMs and connect them with federated AMR data, governance, MLOps, and experimental validation.

Protein LLM / ProteoGPT-like: Specialized scientific engine to generate, transform, or prioritize peptide sequences. It still requires filters, benchmarks, and validation.

PROTEONEXT: Translational ecosystem of federated nodes, Azure, Fabric, Purview, confidential computing, MLOps, traceability, and scientific partners.

GPT-5.5 / Codex: Technical copilot to build, document, test, and explain the platform. It does not decide real biological activity.

Generated sequences: 10000

Shortlist: 30

Biologically validated: 0

Prioritization funnel

Each filter stage discards candidates, and ranking orders the survivors before the shortlist is cut. The key point is that a computational shortlist is still not biological evidence.

Stage	Candidates	Share of generated
Mass generation	10000	100.00%
Physicochemical filter	1682	16.82%
Simulated safety filter	1680	16.80%
Score ranking	1680	16.80%
Synthesis shortlist	30	0.30%
Real biological validation	0	0.00%
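The percentages in the funnel are each stage's count divided by the 10000 generated sequences. The following minimal sketch recomputes them and also shows how many candidates each stage drops relative to the previous one. The function name `funnel_report` and the row layout are illustrative assumptions, not part of the module's script.

```python
# Stage counts copied from the prioritization funnel on this page.
STAGES = [
    ("Mass generation", 10000),
    ("Physicochemical filter", 1682),
    ("Simulated safety filter", 1680),
    ("Score ranking", 1680),
    ("Synthesis shortlist", 30),
    ("Real biological validation", 0),
]


def funnel_report(stages):
    """Return rows of (name, count, share_of_generated_pct, dropped_vs_previous).

    Hypothetical helper for illustration only.
    """
    total = stages[0][1]
    previous = total
    rows = []
    for name, count in stages:
        share = round(100.0 * count / total, 2)
        rows.append((name, count, share, previous - count))
        previous = count
    return rows


if __name__ == "__main__":
    for name, count, share, dropped in funnel_report(STAGES):
        print(f"{name:<28}{count:>6}{share:>8.2f}%  dropped: {dropped}")
```

Read this way, the physicochemical filter does most of the pruning, the ranking step drops nothing, and the shortlist cut is a fixed-size selection rather than a filter.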

Top 30 didactic candidates

These candidates are useful for inspecting how the scoring behaves, not for making real scientific decisions.

#	Sequence	Score	Length	Charge	Hydrophobicity	Solubility proxy
1	`HRARMWVKRRQ`	99.8	11	+6	0.36	0.82
2	`SIQIMERKRIAMKKRLHKFQMPK`	99.7	23	+8	0.39	0.82
3	`WKYHIKINQVHSVSIRH`	99.0	17	+6	0.35	0.78
4	`CCIHTFIKKNKKAQMRRQSLFA`	98.8	22	+7	0.36	0.76
5	`INMKAWHAWMGCANHHKRMRTQER`	98.7	24	+7	0.38	0.76
6	`NNIAIVFGPHKHVLRLHGRKSK`	98.5	22	+8	0.36	0.75
7	`KMIAKNRCVHHRGNKVTTIVI`	98.4	21	+7	0.38	0.74
8	`HSFDHRTMHFFAK`	98.3	13	+4	0.38	0.74
9	`SIALDAWSHHQSHQRWHIMASQKVKNLVC`	98.3	29	+6	0.41	0.74
10	`MVCDWKKIWKNGHLNKRSVR`	98.3	20	+6	0.35	0.74
11	`IFKDKVMYHLLWKTASTKHHD`	98.1	21	+5	0.38	0.73
12	`HKHFITVDNLINSLLTRKSC`	98.0	20	+4	0.35	0.72
13	`SNCFVFFSEFIQNWKAMKILHKSQDKKYTK`	98.0	30	+5	0.37	0.72
14	`KGHSATRHKTIHVAVHAVPEFVDTGQATRV`	98.0	30	+6	0.37	0.72
15	`FLNRFIKNKVHDHPKV`	97.9	16	+5	0.38	0.72
16	`GILHWRQKYKAKCPHFERWRAKEAMFWHFN`	97.9	30	+8	0.40	0.72
17	`RHSHKWPFWITTVRRIHFAPAWWNPKGN`	97.9	28	+8	0.39	0.71
18	`HLNNLISTKWMVFKHNT`	97.8	17	+4	0.41	0.71
19	`MNTVAIKTFHLHGNKHE`	97.8	17	+4	0.35	0.71
20	`MVRQMEHRWLFCANAQKEPMHKRHM`	97.7	25	+6	0.40	0.71
21	`VSAKEFHATLWCKVIHPNNLKQVQKIR`	97.7	27	+6	0.41	0.71
22	`TERKYMKDHLQAMPRKANQAWRCRFIW`	97.7	27	+6	0.37	0.71
23	`HEMKMRAKMHEVTE`	97.7	14	+2	0.36	0.71
24	`FCHQDHVAAEAVKCHTKRAVSH`	97.6	22	+5	0.36	0.70
25	`WDHIFHREMHT`	97.6	11	+2	0.36	0.70
26	`WFLVKCKKVEIHAYAKLSFRRIPFRECHH`	97.6	29	+8	0.41	0.70
27	`MCIKKVQAHQTHSI`	97.5	14	+4	0.36	0.70
28	`QQNKTHLHFRLIGV`	97.5	14	+4	0.36	0.70
29	`HILIPFNKSKHYKRVLWRMMCWLPRHD`	97.5	27	+8	0.41	0.69
30	`HWWRSFQTLH`	97.5	10	+3	0.40	0.69
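The Length and Charge columns can be recomputed directly from each sequence. The sketch below assumes a simple charge proxy that counts K, R, and H as +1 and D and E as -1. That rule reproduces the Charge column for the rows spot-checked here (1, 3, 8, 23, 25, and 30), but whether the script uses exactly this rule is an assumption. The function name `net_charge_proxy` is hypothetical. Hydrophobicity and solubility are not reproduced because their formulas are not given on this page.

```python
# Assumed charge rule (not confirmed by the source): K, R, H count +1; D, E count -1.
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def net_charge_proxy(sequence: str) -> int:
    """Crude integer net-charge proxy for a peptide sequence."""
    positives = sum(1 for aa in sequence if aa in POSITIVE)
    negatives = sum(1 for aa in sequence if aa in NEGATIVE)
    return positives - negatives


if __name__ == "__main__":
    # Sequences copied from rows 1, 8, and 23 of the table above.
    for seq in ("HRARMWVKRRQ", "HSFDHRTMHFFAK", "HEMKMRAKMHEVTE"):
        print(seq, len(seq), f"{net_charge_proxy(seq):+d}")
```

A proxy like this ignores pH, termini, and side-chain pKa values, which is one reason the scores here are didactic rather than predictive.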

Module README

06 Scientific Problem and AI

This module explains the challenge that PROTEONEXT aims to address and simulates the central bottleneck in antimicrobial peptide discovery:

Generating many sequences is easy; deciding which ones deserve synthesis and assay is hard.

The real problem as of May 1, 2026

Antimicrobial resistance (AMR) is a health, scientific, and industrial problem. In infections caused by multidrug-resistant pathogens, especially Gram-negative bacteria in hospital settings, clinicians may have very few therapeutic options. Scientists need more than new molecules: they need better ways to prioritize candidates, connect hospital data, leverage experimental results, and avoid iterations that teach them nothing.

In PROTEONEXT the initial focus is best understood through three groups:

CRAB: carbapenem-resistant Acinetobacter baumannii.
CRE: carbapenem-resistant Enterobacterales.
CRPA: carbapenem-resistant Pseudomonas aeruginosa.

Why AI

AI can help with several tasks:

Federated analytics: understand which data and isolates exist without moving sensitive rows.
Predictive models: prioritize phenotypes, nodes, mechanisms, and candidates.
Generative protein AI: explore AMP sequences at scale.
Filters and scoring: discard unlikely candidates before synthesis.
Active learning: choose the next experimental batch to learn more with fewer assays.
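To make the active learning item concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy. It is not stated to be the strategy PROTEONEXT uses. It assumes a hypothetical model that outputs a probability of activity per candidate and picks the candidates whose predictions are closest to 0.5, where an assay would be most informative. The candidate identifiers and probabilities are invented for illustration.

```python
def select_batch(predictions, batch_size):
    """Pick the candidates the model is least sure about.

    predictions: dict mapping candidate id -> predicted probability of
    activity in [0, 1] (output of a hypothetical model).
    """
    # Distance from 0.5 measures confidence; smaller means more uncertain.
    ranked = sorted(predictions, key=lambda cid: abs(predictions[cid] - 0.5))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Fictitious predictions, not results from any real model or assay.
    predictions = {
        "pep_A": 0.97,
        "pep_B": 0.52,
        "pep_C": 0.10,
        "pep_D": 0.45,
        "pep_E": 0.80,
    }
    print(select_batch(predictions, 2))
```

In practice a batch would usually balance uncertainty against predicted activity and sequence diversity. This sketch shows only the uncertainty term.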

AI does not validate antimicrobial activity. Real validation requires minimum inhibitory and minimum bactericidal concentration (MIC/MBC) measurements, hemolysis, cytotoxicity, stability, and additional assays.

Where GPT-5.5/Codex helps

GPT-5.5/Codex helps as a technical copilot:

Building didactic simulators.
Generating and reviewing code.
Creating validators, APIs, dashboards, and tests.
Translating scientific concepts into Microsoft architecture.
Helping document decisions and risks.
Preparing prompts, playbooks, and literature reviews for humans.

It should not be used as a biomedical validator or as a substitute for specialized protein LLMs, QSAR, scientific committees, or wet lab work.

ProteoGPT-like inside PROTEONEXT

PROTEONEXT should not be explained as a "version 2.0" of ProteoGPT. They are different by nature:

ProteoGPT or a ProteoGPT-like model represents the specialized scientific layer: protein language models capable of generating, transforming, or prioritizing peptide sequences.
PROTEONEXT represents the sovereign translational platform: federated AMR data, governance, security, confidential computing, MLOps, Fabric/Purview, traceability, scientific partners, and experimental validation.
GPT-5.5/Codex represents the technical copilot: it helps build, test, document, explain, and operate the platform, but does not replace biomolecular models or wet lab work.

The correct relationship is that PROTEONEXT could encapsulate or integrate ProteoGPT-like models as a specialized generative engine. The differential contribution of PROTEONEXT is not just generating peptides, but connecting that generation with authorized microbiological/genomic data, privacy, governance, MLOps, and an experimental validation loop.

Module 07 didactically simulates that model-lab-model loop: a computational shortlist moves to fictitious assays, and those results guide the next iteration. It is an analogy for SPEL-like active learning, not a real biological experiment.

Funnel simulation

The script simular_funnel_peptidos.py generates 10,000 synthetic peptides of 8 to 40 amino acids and applies a funnel:

Mass generation.
Basic physicochemical filter.
Simulated safety filter.
Score ranking.
Shortlist of 30 candidates.

Results are written to:

salida/funnel_resultados.json
salida/shortlist_peptidos.csv
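The funnel stages listed above can be sketched in a few lines. This is not the code of simular_funnel_peptidos.py: every threshold, the hydrophobic residue set, the safety rule, and the scoring formula below are hypothetical placeholders chosen only to show the shape of the pipeline, so the counts will not match the ones reported on this page.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")  # assumed residue set, for illustration only


def net_charge(seq):
    # Assumed proxy: K, R, H count +1; D, E count -1.
    return sum(aa in "KRH" for aa in seq) - sum(aa in "DE" for aa in seq)


def hydrophobic_fraction(seq):
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def passes_physchem(seq):
    # Hypothetical thresholds: cationic and moderately hydrophobic.
    return net_charge(seq) >= 2 and 0.30 <= hydrophobic_fraction(seq) <= 0.45


def passes_safety(seq):
    # Placeholder "safety" rule: reject runs of 5 or more hydrophobic residues.
    run = 0
    for aa in seq:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= 5:
            return False
    return True


def score(seq):
    # Toy score: reward charge density, penalize distance from a target fraction.
    return net_charge(seq) / len(seq) - abs(hydrophobic_fraction(seq) - 0.38)


def run_funnel(n=10_000, shortlist_size=30, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    generated = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(rng.randint(8, 40)))
        for _ in range(n)
    ]
    physchem = [s for s in generated if passes_physchem(s)]
    safe = [s for s in physchem if passes_safety(s)]
    ranked = sorted(safe, key=score, reverse=True)
    shortlist = ranked[:shortlist_size]
    counts = {
        "generated": len(generated),
        "physchem": len(physchem),
        "safety": len(safe),
        "ranked": len(ranked),
        "shortlist": len(shortlist),
        "validated": 0,  # nothing here is ever biologically validated
    }
    return counts, shortlist


if __name__ == "__main__":
    counts, shortlist = run_funnel()
    print(counts)
```

Keeping each stage as a separate function makes it easy to swap a placeholder rule for a real filter later without touching the rest of the pipeline.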

Run

From the Desarrollo directory:

& 'C:\ProgramData\miniconda3\python.exe' .\06_problema_cientifico_ia\simular_funnel_peptidos.py
