Learning path
Nodes: 3
Isolates: 1160
AST results: 4640

Minimum contract tables

The goal is not to copy the whole medical record, but to build the minimum useful dataset for federated AMR.

patients     Pseudonymous demographic minimum: age band, sex, and simulated region.
encounters   Hospital encounter context: month, ICU flag, infection focus, and aggregated outcome.
specimens    Microbiology samples with sample type, local code, and simulated terminology reference.
isolates     Bacterial isolates with AMR group, species, resistance mechanism, and genomics availability.
ast_results  Antimicrobial susceptibility results with antibiotic, MIC, breakpoint, and S/I/R interpretation.
genomics     Optional bacterial genomics metadata: MLST, AMR genes, and simulated plasmid marker.
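
As a minimal sketch of how such a contract could be written down and checked, the snippet below lists one possible set of columns per table and reports which ones a CSV header is missing. The column names and the `missing_columns` helper are assumptions made for illustration; they are derived from the descriptions above and may not match the headers the module actually generates.

```python
# Hypothetical column names derived from the table descriptions;
# the module's real CSV headers may differ.
CONTRACT = {
    "patients": ["patient_id", "age_band", "sex", "region"],
    "encounters": ["encounter_id", "patient_id", "month", "icu_flag",
                   "infection_focus", "outcome"],
    "specimens": ["specimen_id", "encounter_id", "sample_type",
                  "local_code", "terminology_ref"],
    "isolates": ["isolate_id", "specimen_id", "amr_group", "species",
                 "resistance_mechanism", "has_genomics"],
    "ast_results": ["ast_id", "isolate_id", "antibiotic", "mic",
                    "breakpoint", "interpretation"],
    "genomics": ["isolate_id", "mlst", "amr_genes", "plasmid_marker"],
}


def missing_columns(table, header):
    """Return the contract columns absent from a CSV header, in contract order."""
    present = set(header)
    return [col for col in CONTRACT[table] if col not in present]


# A header with only two of the four assumed patient columns.
gaps = missing_columns("patients", ["patient_id", "sex"])
print(gaps)
```

A node could run a check like this before publishing anything, so that contract mismatches surface locally rather than at the central platform.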

Volume by node

Each node keeps its local tables. The central platform only needs contracts and authorized aggregates.

Node                 Patients  Encounters  Specimens  Isolates   AST  Genomics
nodo_madrid_norte         327         327        327       353  1412       165
nodo_barcelona_mar        433         433        433       459  1836       286
nodo_valencia_turia       315         315        315       348  1392       121
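
The per-node counts can be cross-checked against the totals reported for the module. The sketch below copies the row counts from the table above, sums them across nodes, and confirms that every node holds four AST results per isolate. The `VOLUMES` dictionary and the `totals` helper are illustrative names, not part of the module.

```python
# Row counts copied from the "Volume by node" table.
VOLUMES = {
    "nodo_madrid_norte": {"patients": 327, "encounters": 327, "specimens": 327,
                          "isolates": 353, "ast_results": 1412, "genomics": 165},
    "nodo_barcelona_mar": {"patients": 433, "encounters": 433, "specimens": 433,
                           "isolates": 459, "ast_results": 1836, "genomics": 286},
    "nodo_valencia_turia": {"patients": 315, "encounters": 315, "specimens": 315,
                            "isolates": 348, "ast_results": 1392, "genomics": 121},
}


def totals(volumes):
    """Sum each table's row count across all nodes."""
    result = {}
    for counts in volumes.values():
        for table, n in counts.items():
            result[table] = result.get(table, 0) + n
    return result


TOTALS = totals(VOLUMES)

# Every node holds exactly four AST results per isolate.
FOUR_PER_ISOLATE = all(
    c["ast_results"] == 4 * c["isolates"] for c in VOLUMES.values()
)

print(TOTALS["isolates"], TOTALS["ast_results"])  # 1160 4640
```

These sums are the kind of authorized aggregate the central platform needs; the row-level tables stay in each node.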

Synthetic quality

Overall status: OK

  • one_patient_per_encounter: OK
  • one_specimen_per_isolate: OK
  • four_ast_results_per_isolate: OK
  • no_direct_identifiers: OK
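
Two of these checks could be implemented roughly as follows. This is a sketch under stated assumptions: rows are plain dictionaries, `isolate_id` is a hypothetical key name, and the `DIRECT_IDENTIFIERS` set is an illustrative list, not the one the module uses.

```python
from collections import Counter

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "surname", "national_id", "address",
                      "phone", "email", "birth_date"}


def four_ast_results_per_isolate(isolates, ast_results):
    """True when every isolate has exactly four AST rows and none is orphaned."""
    per_isolate = Counter(row["isolate_id"] for row in ast_results)
    isolate_ids = {row["isolate_id"] for row in isolates}
    same_ids = set(per_isolate) == isolate_ids
    return same_ids and all(n == 4 for n in per_isolate.values())


def no_direct_identifiers(tables):
    """True when no row in any table exposes a column from DIRECT_IDENTIFIERS."""
    return all(
        DIRECT_IDENTIFIERS.isdisjoint(row.keys())
        for rows in tables.values()
        for row in rows
    )


# Tiny synthetic example: one isolate tested against four antibiotics.
isolates = [{"isolate_id": "iso-1"}]
ast_results = [{"isolate_id": "iso-1", "antibiotic": f"drug-{i}"} for i in range(4)]

report = {
    "four_ast_results_per_isolate": four_ast_results_per_isolate(isolates, ast_results),
    "no_direct_identifiers": no_direct_identifiers(
        {"isolates": isolates, "ast_results": ast_results}
    ),
}
print(report)
```

Dropping one AST row or adding a `name` column to any table makes the corresponding check return `False`.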

Privacy

No real patients, no direct identifiers, and no dates below month-level granularity.
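
A minimal sketch of the coarsening this implies, using only the Python standard library. The function names and the ten-year band width are assumptions for illustration; the module may use different bands.

```python
from datetime import date


def to_month(d: date) -> str:
    """Reduce a date to month-level granularity, e.g. 2024-03."""
    return d.strftime("%Y-%m")


def to_age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as 60-69 (band width is an assumption)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


month = to_month(date(2024, 3, 17))
band = to_age_band(67)
print(month, band)  # 2024-03 60-69
```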

Module README

08 More Realistic AMR Data Model

This module evolves the simple CSV from the start of the learning path into a relational AMR model, closer to what a hospital node would need to normalize before participating in PROTEONEXT.

No real data is used. Everything is synthetic and pedagogical.

Problem it solves

In a real hospital, data does not arrive as a clean table. It is usually spread across:

  • Clinical systems.
  • LIS / microbiology.
  • Antibiograms and MIC.
  • Bacterial sequencing.
  • Local catalogs and terminologies.

The goal is not to copy the entire medical record. The goal is to build the minimum useful dataset for AMR:

  • Episode and minimum clinical context.
  • Sample and sample type.
  • Bacterial isolate.
  • AST/MIC results.
  • Resistance mechanisms.
  • Genomic metadata if available.

Synthetic tables

Each node generates these tables:

Table            Content
patients.csv     Pseudonymous patients with age band, sex, and simulated region
encounters.csv   Hospital episodes with ICU/non-ICU flag and aggregated outcome
specimens.csv    Microbiological samples
isolates.csv     Bacterial isolates and AMR group
ast_results.csv  Antibiotic, MIC, S/I/R interpretation, and breakpoint
genomics.csv     Simulated genomic metadata, AMR genes, and MLST

Also generated:

  • salida/data_dictionary.json
  • salida/quality_report.json
  • salida/manifest.json

Nodes do not share the same size or profile. To make the data more credible, each simulated hospital has its own volume, pathogen mix, ICU rate, genomics availability, and resistance pressure.

Run

From the Desarrollo directory:

& 'C:\ProgramData\miniconda3\python.exe' .\08_modelo_datos_amr\generar_modelo_amr.py

Relationship with EHDS

This module does not implement EHDS, OMOP, or FHIR. It only simulates the kind of data contracts we would need to discuss with hospitals and scientific partners before connecting federated nodes.