Learning path

PROTEONEXT Technical Summary

PROTEONEXT is conceived as a sovereign R&D platform to accelerate the discovery and prioritization of antimicrobial peptides against multidrug-resistant pathogens.

The defensible promise is not a clinical antibiotic within a few months. It is a reproducible platform that narrows the search space, prioritizes candidates, and closes a first loop between laboratory results and models without moving sensitive data.

Natural role of SYNTAX

SYNTAX should lead the technology layer:

  • Azure landing zone.
  • Microsoft Entra ID, RBAC, PIM, and workload identities.
  • Private network, Private Link, Azure Firewall, and private DNS.
  • Key Vault or Managed HSM.
  • Federated analytics and federated learning.
  • MLOps, model registry, and traceability.
  • Microsoft Purview for governance and lineage.
  • Fabric and Power BI for project control with aggregated data.
  • Defender for Cloud and Sentinel for security.
  • Confidential computing and attestation when the risk justifies it.

Natural role of scientific partners

Scientific and clinical partners should contribute:

  • Clinical microbiology.
  • Bacterial isolates.
  • Antibiograms, MIC, and resistance phenotypes.
  • Bacterial genomics when available.
  • MIC/MBC, hemolysis, cytotoxicity, and stability assays.
  • Scientific criteria on pathogens, panels, and candidates.

Security rule

Sensitive data stays at the source. In the federated design, only queries, code, models, metrics, protected gradients, and authorized artifacts travel; raw rows do not.
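
The rule can be illustrated with a minimal federated-analytics sketch. It assumes two simulated nodes holding synthetic rows. The names `NODE_A_ROWS`, `NODE_B_ROWS`, `local_aggregate`, and `combine` are hypothetical and only serve the illustration. Each node reduces its rows to counts, and only those counts reach the coordinator.

```python
from collections import Counter

# Hypothetical synthetic rows held by two simulated nodes:
# (pathogen, antibiotic, result), where "R" = resistant and "S" = susceptible.
NODE_A_ROWS = [
    ("K. pneumoniae", "meropenem", "R"),
    ("K. pneumoniae", "meropenem", "S"),
    ("A. baumannii", "meropenem", "R"),
]
NODE_B_ROWS = [
    ("K. pneumoniae", "meropenem", "R"),
    ("A. baumannii", "meropenem", "R"),
    ("A. baumannii", "meropenem", "S"),
]


def local_aggregate(rows):
    """Runs inside the node: reduce raw rows to counts per (pathogen, antibiotic)."""
    tested = Counter()
    resistant = Counter()
    for pathogen, antibiotic, result in rows:
        key = (pathogen, antibiotic)
        tested[key] += 1
        if result == "R":
            resistant[key] += 1
    # Only these two count tables leave the node, never the rows themselves.
    return {"tested": tested, "resistant": resistant}


def combine(aggregates):
    """Runs at the coordinator: sum the counts and derive resistance rates."""
    tested = Counter()
    resistant = Counter()
    for agg in aggregates:
        tested.update(agg["tested"])
        resistant.update(agg["resistant"])
    return {key: resistant[key] / tested[key] for key in tested}


rates = combine([local_aggregate(NODE_A_ROWS), local_aggregate(NODE_B_ROWS)])
for (pathogen, antibiotic), rate in sorted(rates.items()):
    print(f"{pathogen} / {antibiotic}: {rate:.2f} resistant")
```

The coordinator never receives a row, only the per-key counts, which is the property the security rule asks for.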

First didactic laboratory

The Desarrollo folder starts with synthetic data and simulated nodes. This makes it possible to learn and demonstrate:

  • What an antibiogram is.
  • What MIC means.
  • How to aggregate AMR metrics without moving rows.
  • How to train a basic federated model.
  • How to score a peptide in a simplified way.
  • Where confidential computing fits in Azure.
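
As one possible reading of "score a peptide in a simplified way", the sketch below computes two coarse sequence features and combines them into a toy ranking score. Everything here is an assumption made for illustration: the residue groupings, the charge cap, the 50% hydrophobicity target, the equal weights, and the example sequences are arbitrary, and the score is not a validated activity predictor.

```python
# Illustrative residue groups (assumption): simple one-letter amino acid classes.
HYDROPHOBIC = set("AILMFVW")
POSITIVE = set("KR")
NEGATIVE = set("DE")
VALID = set("ACDEFGHIKLMNPQRSTVWY")


def peptide_features(sequence):
    """Return length, approximate net charge, and hydrophobic fraction."""
    seq = sequence.upper()
    if not seq or set(seq) - VALID:
        raise ValueError(f"invalid peptide sequence: {sequence!r}")
    net_charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return {
        "length": len(seq),
        "net_charge": net_charge,
        "hydrophobic_fraction": hydrophobic_fraction,
    }


def toy_score(sequence):
    """Toy ranking heuristic in [0, 1]; thresholds and weights are assumptions."""
    features = peptide_features(sequence)
    # Reward positive net charge, capped at +6.
    charge_term = min(max(features["net_charge"], 0), 6) / 6
    # Reward a hydrophobic fraction close to 50%.
    hydro_term = 1 - min(abs(features["hydrophobic_fraction"] - 0.5) / 0.5, 1)
    return round(0.5 * charge_term + 0.5 * hydro_term, 3)


# Arbitrary example sequences, not real candidates.
candidates = ["KKLLKKLLKKLL", "DDEEGGSSDDEE", "GASTKLGASTKL"]
for seq in sorted(candidates, key=toy_score, reverse=True):
    print(seq, toy_score(seq))
```

A score like this only orders candidates for a didactic funnel. Any real prioritization would rely on trained models and on the assays provided by the scientific partners.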

PROTEONEXT Glossary

Scientific and AMR

Term             Brief explanation
AMR              Antimicrobial resistance. The ability of microorganisms to withstand antimicrobials.
One Health       Approach connecting human, animal, and environmental health.
AMP              Antimicrobial peptide. Peptide with possible antimicrobial activity.
MIC              Minimum inhibitory concentration. Lowest concentration that inhibits visible bacterial growth.
MBC              Minimum bactericidal concentration. Lowest concentration that kills the bacteria rather than only inhibiting growth (conventionally a 99.9% reduction of the initial inoculum).
AST              Antimicrobial susceptibility testing. Test that measures how susceptible an isolate is to antimicrobials.
CRAB             Carbapenem-resistant Acinetobacter baumannii.
CRE              Carbapenem-resistant Enterobacterales.
CRPA             Carbapenem-resistant Pseudomonas aeruginosa.
Hemolysis        Rupture of red blood cells; a signal of toxicity for some peptides.
Cytotoxicity     Damage to human or other mammalian cells.
Active learning  Strategy in which the model selects the new experiments expected to provide the most information.
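
To make the MIC definition concrete, here is a small sketch that reads a MIC from a dilution series. It assumes a hypothetical input format, a mapping from tested concentration to whether visible growth was observed, and it takes the MIC as the lowest concentration from which this and all higher tested concentrations show no growth. The function name `mic_from_dilution` and the readings are illustrative only.

```python
def mic_from_dilution(readings):
    """Return the MIC from {concentration: growth_observed}, or None.

    Walks from the highest concentration downward and stops at the first
    well with visible growth. None means growth was seen even at the highest
    tested concentration, so the MIC lies above the tested range.
    """
    mic = None
    for concentration in sorted(readings, reverse=True):
        if readings[concentration]:  # visible growth: inhibition ends here
            break
        mic = concentration
    return mic


# Hypothetical two-fold dilution series in mg/L (synthetic readings).
readings = {0.5: True, 1: True, 2: True, 4: False, 8: False, 16: False}
mic = mic_from_dilution(readings)
print(mic)
```

The result is bounded by the tested range: a return value equal to the lowest tested concentration only says the MIC is at or below that value.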

Data and federation

Term                 Brief explanation
Node                 Hospital, laboratory, or entity that keeps its data locally.
Federated analytics  Distributed queries where only aggregates leave the node.
Federated learning   Distributed training where each node trains locally and shares only model parameters or updates.
Secure aggregation   Technique that lets the coordinator see only the combined result, not any single node's contribution.
DUA                  Data use agreement.
DPIA                 Data protection impact assessment.
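
The idea behind secure aggregation can be sketched with pairwise masks that cancel in the sum. This is a toy illustration under strong assumptions: the masks come from a single seeded generator instead of a key agreement between nodes, no node drops out, and the names `pairwise_masks` and `mask_value` are hypothetical. It is not a production protocol.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_masks(node_ids, seed=0):
    """Toy stand-in for pairwise key agreement: one shared mask per node pair."""
    rng = random.Random(seed)
    return {
        (a, b): rng.randrange(MODULUS)
        for i, a in enumerate(node_ids)
        for b in node_ids[i + 1:]
    }


def mask_value(node_id, value, node_ids, masks):
    """Runs at the node: hide the local value behind masks shared with each peer."""
    masked = value
    for other in node_ids:
        if other == node_id:
            continue
        pair = (node_id, other) if (node_id, other) in masks else (other, node_id)
        # The first node of a pair adds the mask, the second subtracts it.
        sign = 1 if pair[0] == node_id else -1
        masked += sign * masks[pair]
    return masked % MODULUS


node_ids = ["node_a", "node_b", "node_c"]
local_counts = {"node_a": 12, "node_b": 7, "node_c": 30}  # hypothetical counts
masks = pairwise_masks(node_ids)

masked = {n: mask_value(n, local_counts[n], node_ids, masks) for n in node_ids}
total = sum(masked.values()) % MODULUS  # masks cancel pairwise in the sum
print(total)
```

Each masked value on its own looks random to the coordinator, while the sum of all masked values equals the sum of the local counts.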

Microsoft / Azure

Term             Brief explanation
Entra ID         Microsoft identity and access service: users, groups, role assignments, PIM, and application identities.
Private Link     Private access to PaaS services without public exposure.
Key Vault        Management of secrets, keys, and certificates.
Managed HSM      Fully managed, single-tenant hardware security module service for keys with stricter requirements.
Purview          Data governance: catalog, lineage, classification, and policies.
Fabric           Analytics platform for lakehouse, pipelines, reporting, and Power BI.
Sentinel         SIEM/SOAR for security monitoring and response.
Confidential VM  Virtual machine whose memory is protected while in use.
Attestation      Cryptographic verification that a workload runs in an expected environment.

Concept Map

AMR pathogens and local data
        |
        v
Simulated hospital nodes
        |
        | No raw rows leave
        v
Federated analytics
        |
        | Only aggregates leave
        v
Dashboard / Fabric / Power BI

Synthetic data + future experimental results
        |
        v
Local models per node
        |
        | Only weights or gradients leave
        v
Federated learning
        |
        v
Global model
        |
        v
AMP candidate scoring
        |
        v
Wet lab validation by scientific partners
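
The federated learning branch of the map (local models per node, weights leaving, global model) can be sketched as federated averaging over a tiny linear model. This is a minimal illustration under stated assumptions: the data are synthetic points generated from y = 2x + 1, and the function names, learning rate, epoch count, and round count are arbitrary choices, not the project's training setup.

```python
def local_train(weights, data, lr=0.1, epochs=20):
    """Runs at the node: gradient steps of least squares on y ~ w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the weights and the sample count leave the node.
    return (w, b), n


def federated_average(updates):
    """Runs at the coordinator: average weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


# Hypothetical synthetic data per node, generated from y = 2x + 1.
node_data = {
    "node_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "node_b": [(0.5, 2.0), (1.5, 4.0)],
}

global_weights = (0.0, 0.0)
for round_number in range(50):
    updates = [local_train(global_weights, rows) for rows in node_data.values()]
    global_weights = federated_average(updates)

print(global_weights)
```

With this synthetic data the global weights should approach w = 2 and b = 1, although no node ever shares its points with the coordinator.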

How to read it from the Microsoft perspective

  • Federation defines the data pattern: do not centralize sensitive rows.
  • Entra ID defines who can launch jobs and from where.
  • Key Vault / Managed HSM protects secrets and keys, including the keys used for job signing.
  • Purview documents which data exists, who uses it, and with what lineage.
  • Fabric / Power BI shows aggregates, data quality, federated learning (FL) rounds, and the AMP funnel.
  • Confidential Computing protects sensitive computation and provides attestation.

What this laboratory does not solve

  • Does not validate a therapeutic candidate.
  • Does not replace microbiologists or bioinformaticians.
  • Does not use real data.
  • Does not determine the final Azure architecture.
  • Does not review licenses for models such as ESM, AlphaFold, or RFdiffusion.