Module README
01 Synthetic AMR Data
Generates and validates fictitious microbiology data for three simulated hospital nodes.
Each node has its own volume and epidemiological profile, which keeps the demo from being artificially symmetric. Generation uses a fixed seed, so repeated runs are reproducible. The nodes, however, deliberately do not all end up with the same number of rows.
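A minimal sketch of how a fixed seed can still yield node-specific row counts, assuming each node has a base volume with some random jitter. The node volumes, the ±10% jitter, and the seed 42 are hypothetical and not taken from generar_dataset_sintetico.py.

```python
import random

# Hypothetical base volumes; the real generator's values are not documented in this README.
BASE_VOLUMES = {"Madrid Norte": 800, "Barcelona Mar": 1200, "Valencia Turia": 700}


def row_counts(seed: int = 42) -> dict[str, int]:
    """Draw a reproducible, node-specific row count around each base volume."""
    rng = random.Random(seed)  # private generator: the same seed gives the same counts
    counts = {}
    for node, base in BASE_VOLUMES.items():
        # +/-10% jitter, so node volumes differ and are not artificially symmetric
        counts[node] = int(base * rng.uniform(0.9, 1.1))
    return counts


print(row_counts(42))
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the counts reproducible even if other code also draws random numbers.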
The data includes:
- Priority AMR pathogen.
- Antibiotic.
- Synthetic MIC (minimum inhibitory concentration).
- Resistant/susceptible interpretation.
- Fictitious resistance mechanism.
- Minimal aggregated clinical context.
- Genomics availability indicator.
There are no real patients, real identifiers, or EHDS data.
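One way to picture a single synthetic record is the sketch below, assuming the resistant/susceptible call is derived by comparing the MIC with a breakpoint. The field names, the `Isolate` class, and the breakpoint values are all hypothetical and illustrative only; they are not clinical breakpoints and may not match the module's actual schema.

```python
from dataclasses import dataclass

# Illustrative breakpoints in mg/L, invented for this sketch; NOT clinical values.
ILLUSTRATIVE_BREAKPOINTS = {"meropenem": 8.0, "ciprofloxacin": 0.5}


@dataclass
class Isolate:
    node: str
    pathogen: str        # priority AMR pathogen group, e.g. "CRE" or "CRPA"
    antibiotic: str
    mic_mg_l: float      # synthetic MIC
    mechanism: str       # fictitious mechanism label, e.g. "mech_A"
    icu: bool            # minimal aggregated clinical context (hypothetical field)
    has_genomics: bool   # genomics availability indicator

    @property
    def interpretation(self) -> str:
        """'R' if the MIC exceeds the illustrative breakpoint, otherwise 'S'."""
        limit = ILLUSTRATIVE_BREAKPOINTS[self.antibiotic]
        return "R" if self.mic_mg_l > limit else "S"


isolate = Isolate("Madrid Norte", "CRE", "meropenem", 16.0, "mech_A", True, False)
print(isolate.interpretation)  # R
```

Deriving the interpretation from the MIC, instead of sampling it independently, keeps the two columns consistent with each other.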
Run
From the Desarrollo directory:
python .\01_datos_sinteticos_amr\generar_dataset_sintetico.py
python .\01_datos_sinteticos_amr\validar_calidad.py
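The checks that validar_calidad.py actually performs are not described here. The sketch below only illustrates the kind of row-level quality checks such a step might run, assuming hypothetical column names (`pathogen`, `antibiotic`, `mic`, `interpretation`) and an R/S interpretation column.

```python
import csv
import io

# Hypothetical column names; the real CSV schema may differ.
REQUIRED_COLUMNS = {"pathogen", "antibiotic", "mic", "interpretation"}


def validate_rows(rows: list[dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    if not rows:
        return ["no data rows"]
    missing = REQUIRED_COLUMNS - set(rows[0])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for i, row in enumerate(rows, start=1):
        try:
            mic = float(row["mic"])
        except (TypeError, ValueError):
            problems.append(f"row {i}: MIC is not numeric")
            continue
        if mic <= 0:
            problems.append(f"row {i}: MIC must be positive")
        if row["interpretation"] not in {"R", "S"}:
            problems.append(f"row {i}: unexpected interpretation {row['interpretation']!r}")
    return problems


sample = "pathogen,antibiotic,mic,interpretation\nCRE,meropenem,16,R\nCRPA,meropenem,abc,S\n"
print(validate_rows(list(csv.DictReader(io.StringIO(sample)))))
# → ['row 2: MIC is not numeric']
```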
Outputs
- salida/nodos/*.csv: one CSV per simulated node.
- salida/manifest.json: generation summary.
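A minimal sketch of writing this output layout, assuming the manifest records the seed and a per-node file name and row count. The manifest keys, the file-name slugs (such as `madrid_norte`), and the `write_outputs` helper are hypothetical; the real manifest.json may contain different fields.

```python
import csv
import json
from pathlib import Path


def write_outputs(nodes: dict[str, list[dict]], out_dir: Path, seed: int) -> dict:
    """Write one CSV per node under nodos/ plus a manifest.json summary (hypothetical keys)."""
    node_dir = out_dir / "nodos"
    node_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"seed": seed, "nodes": {}}
    for node, rows in nodes.items():
        path = node_dir / f"{node}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            # Assumes a non-empty row list; its first row defines the header.
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        manifest["nodes"][node] = {"file": path.name, "rows": len(rows)}
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Keeping row counts in the manifest lets a later step check that each node CSV was written completely without reopening every file.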
Simulated profiles
- Madrid Norte: higher proportion of CRE and greater carbapenem pressure.
- Barcelona Mar: higher volume and more genomics availability.
- Valencia Turia: more CRPA, higher simulated ICU rate, and greater resistance shift.
These profiles are pedagogical; they do not represent real hospitals.
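The qualitative profiles above could be encoded as per-node sampling parameters, as in the sketch below. All weights and rates are invented for illustration. The "other" pathogen bucket and the `sample_isolates` helper are hypothetical, and the carbapenem pressure and resistance shift are not modeled here.

```python
import random

# Illustrative parameters that only mirror the qualitative profiles above;
# all numbers are invented for this sketch.
PROFILES = {
    "Madrid Norte": {"pathogens": {"CRE": 0.5, "CRPA": 0.2, "other": 0.3}, "icu_rate": 0.15, "genomics_rate": 0.2},
    "Barcelona Mar": {"pathogens": {"CRE": 0.3, "CRPA": 0.3, "other": 0.4}, "icu_rate": 0.15, "genomics_rate": 0.6},
    "Valencia Turia": {"pathogens": {"CRE": 0.2, "CRPA": 0.5, "other": 0.3}, "icu_rate": 0.30, "genomics_rate": 0.2},
}


def sample_isolates(node: str, n: int, rng: random.Random) -> list[dict]:
    """Draw n isolate stubs whose pathogen mix and rates follow the node profile."""
    profile = PROFILES[node]
    names = list(profile["pathogens"])
    weights = list(profile["pathogens"].values())
    return [
        {
            "node": node,
            "pathogen": rng.choices(names, weights=weights)[0],  # weighted pathogen draw
            "icu": rng.random() < profile["icu_rate"],
            "has_genomics": rng.random() < profile["genomics_rate"],
        }
        for _ in range(n)
    ]
```

Passing the `random.Random` instance in explicitly keeps each draw tied to the fixed seed, so the same profile and seed always produce the same isolates.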