Module README
08 More Realistic AMR Data Model
This module evolves the initial simple CSV into a relational AMR model, closer to what a hospital node would need to normalize before participating in PROTEONEXT.
No real data is used. Everything is synthetic and pedagogical.
Problem it solves
In a real hospital, data does not arrive as a clean table. It is usually spread across:
- Clinical systems.
- LIS / microbiology.
- Antibiograms and MIC.
- Bacterial sequencing.
- Local catalogs and terminologies.
The goal is not to copy the entire medical record. The goal is to build the minimum useful dataset for AMR:
- Episode and minimum clinical context.
- Sample and sample type.
- Bacterial isolate.
- AST/MIC results.
- Resistance mechanisms.
- Genomic metadata if available.
Synthetic tables
Each node generates these tables:
| Table | Content |
|---|---|
| `patients.csv` | Pseudonymous patients with age band, sex, and simulated region |
| `encounters.csv` | Hospital episodes with ICU/non-ICU flag and aggregated outcome |
| `specimens.csv` | Microbiological samples |
| `isolates.csv` | Bacterial isolates and AMR group |
| `ast_results.csv` | Antibiotic, MIC, S/I/R interpretation, and breakpoint |
| `genomics.csv` | Simulated genomic metadata, AMR genes, and MLST |
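As a rough illustration of how these tables relate, the sketch below joins a few synthetic rows from specimens, isolates, and AST results into one flat view. The column names (`specimen_id`, `isolate_id`, `mic_mg_l`, and so on) and the sample rows are assumptions made for this example; the files produced by the generator may use different keys and columns.

```python
import csv
import io

# Hypothetical columns and rows, for illustration only.
SPECIMENS_CSV = """specimen_id,encounter_id,specimen_type
S1,E1,blood
S2,E1,urine
"""
ISOLATES_CSV = """isolate_id,specimen_id,organism
I1,S1,Klebsiella pneumoniae
I2,S2,Escherichia coli
"""
AST_CSV = """isolate_id,antibiotic,mic_mg_l,interpretation
I1,meropenem,16,R
I1,amikacin,4,S
I2,ciprofloxacin,0.25,S
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows, key):
    """Build a lookup from a key column to its row."""
    return {row[key]: row for row in rows}


def flatten_ast(specimens, isolates, ast_results):
    """Follow AST result -> isolate -> specimen and return one flat row per result."""
    specimen_by_id = index_by(specimens, "specimen_id")
    isolate_by_id = index_by(isolates, "isolate_id")
    flat = []
    for result in ast_results:
        isolate = isolate_by_id[result["isolate_id"]]
        specimen = specimen_by_id[isolate["specimen_id"]]
        flat.append({
            "specimen_type": specimen["specimen_type"],
            "organism": isolate["organism"],
            "antibiotic": result["antibiotic"],
            "interpretation": result["interpretation"],
        })
    return flat


flat_rows = flatten_ast(
    read_rows(SPECIMENS_CSV), read_rows(ISOLATES_CSV), read_rows(AST_CSV)
)
for row in flat_rows:
    print(row)
```

The same pattern would extend upward to encounters and patients, and sideways to genomics, as long as each table carries the key of its parent.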
Also generated:
- `salida/data_dictionary.json`
- `salida/quality_report.json`
- `salida/manifest.json`
Nodes differ in size and profile. To make the data more credible, each simulated hospital has its own volume, pathogen mix, ICU rate, genomics availability, and resistance pressure.
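One possible way to express per-node differences is a small profile object that drives a seeded random generator. This is a minimal sketch, not the logic of `generar_modelo_amr.py`: the `NodeProfile` class, its fields, the node names, and the rates are all hypothetical, and resistance is simplified to one flag per encounter.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical knobs describing one simulated hospital."""
    name: str
    n_encounters: int
    icu_rate: float         # probability that an encounter is in the ICU
    genomics_rate: float    # probability that sequencing metadata exists
    resistance_rate: float  # simplified per-encounter resistance pressure


def simulate_node(profile, seed):
    """Generate synthetic encounter rows for one node, reproducibly."""
    rng = random.Random(seed)
    rows = []
    for i in range(profile.n_encounters):
        rows.append({
            "encounter_id": f"{profile.name}-E{i:04d}",
            "icu": rng.random() < profile.icu_rate,
            "has_genomics": rng.random() < profile.genomics_rate,
            "resistant": rng.random() < profile.resistance_rate,
        })
    return rows


# Illustrative values only: a large tertiary node and a small regional one.
PROFILES = [
    NodeProfile("node_a", 500, 0.30, 0.60, 0.40),
    NodeProfile("node_b", 120, 0.05, 0.10, 0.15),
]

summaries = {}
for idx, profile in enumerate(PROFILES):
    rows = simulate_node(profile, seed=idx)
    summaries[profile.name] = {
        "n": len(rows),
        "icu_share": sum(r["icu"] for r in rows) / len(rows),
    }
print(summaries)
```

Fixing the seed per node keeps each synthetic hospital reproducible between runs while still letting the nodes diverge from one another.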
Run
From the Desarrollo directory:
& 'C:\ProgramData\miniconda3\python.exe' .\08_modelo_datos_amr\generar_modelo_amr.py
Relationship with EHDS
This module does not implement EHDS, OMOP, or real FHIR. It simulates the type of data contracts we would need to discuss with hospitals and scientific partners before connecting federated nodes.