# STROBE Checklist: Cross-Sectional Studies (Methodological Demonstration)

**Manuscript:** Calcium Status in Pediatric Lower Respiratory Tract Infection: A Methodological Demonstration of an Automated Waterfall Research Pipeline on a Synthetic Teaching Dataset

**Reporting standard:** STROBE Statement, version 4 (cross-sectional checklist), applied to a methodology demonstration on a synthetic teaching dataset.

**Important:** No human subjects research was performed; the dataset is a synthetic teaching artefact (`LRTI_Hypothetical_Database.xlsx`, identifiers `Patient_1`…`Patient_100`). STROBE items that presuppose human-subject data collection (12d, 13b, 13c, 16c) are noted as such.

| Item No. | Recommendation | Reported on (manuscript section) |
|---|---|---|
| 1a | Indicate the study's design with a commonly used term in the title or the abstract | Title (“methodological demonstration … on a synthetic teaching dataset”); Abstract → Methods |
| 1b | Provide in the abstract an informative and balanced summary of what was done and what was found | Abstract (structured Background / Methods / Results / Conclusions; explicit synthetic-data disclosure) |
| 2 | Background/rationale | Introduction (paragraphs 1–3) |
| 3 | Objectives, including any prespecified hypotheses | Introduction (final paragraph: prevalence, total–ionised concordance, age stratification, demote-on-failure logistic) |
| 4 | Study design: present key elements of study design early in the paper | Methods → Data and dataset provenance; Methods → Statistical analysis |
| 5 | Setting: locations and relevant dates | Methods → Data and dataset provenance (synthetic dataset; no human setting; pipeline run dates listed in the public snapshot) |
| 6a | Eligibility criteria, sources and methods of selection of participants | Methods → Data and dataset provenance (no participants; 100 simulated records, 1–5 y, LRTI label, paired total + ionised calcium) |
| 7 | Variables: clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers; diagnostic criteria | Methods → Hypocalcaemia definitions and reference ranges (cut-offs and reference intervals defined with citations) |
| 8 | Data sources / measurement | Methods → Data and dataset provenance (synthetic source spreadsheet); Methods → Statistical analysis (calcium variable types and units) |
| 9 | Bias: describe any efforts to address potential sources of bias | Methods → Statistical analysis (Shapiro–Wilk-gated test selection, BH-FDR, 5-fold CV with demotion gate); Discussion → Limitations |
| 10 | Study size: explain how the study size was arrived at | Methods → Precision considerations (post-hoc); n = 100 fixed by the synthetic dataset |
| 11 | Quantitative variables: explain how quantitative variables were handled | Methods → Statistical analysis (continuous summaries; age banded into 1-y windows for stratified comparison; logistic on continuous age) |
| 12a | Statistical methods | Methods → Statistical analysis |
| 12b | Methods used to examine subgroups and interactions | Methods → Statistical analysis (4 age bands; Kruskal–Wallis with one-way ANOVA where each band n ≥ 5) |
| 12c | How missing data were addressed | Methods → Statistical analysis + Results (5/5 variables 100 % complete; no imputation needed) |
| 12d | Methods taking account of sampling strategy | Not applicable (synthetic dataset; no sampling) |
| 12e | Sensitivity analyses | Methods → Statistical analysis (in-sample vs CV-AUC optimism; Bland–Altman in mg/dL plus z-scored secondary; demotion gate at CV-AUC < 0.65) |
| 13a | Numbers of individuals at each stage of study | Results → Cohort characteristics (n = 100, 5/5 variables 100 % complete) |
| 13b | Reasons for non-participation at each stage | Not applicable (synthetic dataset) |
| 13c | Flow diagram | Not included; no exclusions on the synthetic dataset |
| 14a | Descriptive data: characteristics of participants, exposures, confounders | Results → Cohort characteristics; Table 1 |
| 14b | Number of participants with missing data for each variable | Supplementary Table S1 (all variables 100 % complete) |
| 15 | Outcome data: report numbers of outcome events or summary measures | Results → Hypocalcaemia prevalence and concordance |
| 16a | Main results: unadjusted estimates with precision (e.g. 95 % CI); confounder-adjusted estimates if applicable | Results → Hypocalcaemia prevalence and concordance (Wilson 95 % CIs); Total vs ionised calcium agreement (with explicit bias=0-by-construction caveat for the mg/dL Bland–Altman); Age-stratified analysis. No covariate adjustment by design. |
| 16b | Category boundaries when continuous variables were categorised | Methods → Statistical analysis (age bands 1–<2, 2–<3, 3–<4, 4–5 y); Results → Age-stratified analysis |
| 16c | Translate estimates of relative risk into absolute risk for a meaningful time period | Not applicable (no time-to-event component) |
| 17 | Other analyses: subgroups, interactions, sensitivity analyses | Results → Predictive modelling – negative result (logistic with 5-fold CV; both models demoted under pre-specified CV-AUC < 0.65 gate) |
| 18 | Key results: summarise with reference to study objectives | Discussion → Principal findings |
| 19 | Limitations: discuss limitations of the study, including sources of potential bias or imprecision; direction and magnitude of any potential bias | Discussion → Limitations (synthetic-data limits explicitly enumerated) |
| 20 | Interpretation: cautious overall interpretation considering objectives, limitations, multiplicity of analyses, results from similar studies | Discussion → all sub-sections; Conclusion (no clinical inference drawn) |
| 21 | Generalisability (external validity) | Discussion → Limitations + Implications for the pipeline (synthetic data; methodology generalises, biology does not) |
| 22 | Funding source | Declarations → Funding (no external funding) |

---

**Statement.** This manuscript is a methodological demonstration on a synthetic dataset and is reported in accordance with the STROBE Statement for cross-sectional studies, applied where each item is meaningful for synthetic data. All 22 items have been addressed; items not applicable on a synthetic dataset (12d, 13b, 13c, 16c) are explicitly justified above.

**Source.** STROBE Statement v4.
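
**Illustrative sketch.** Three of the checks cited in the table (Wilson 95 % CIs in item 16a, the age-band boundaries in item 16b, and the CV-AUC < 0.65 demotion gate in items 12e and 17) are simple enough to show in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the function names `wilson_ci`, `age_band`, and `is_demoted` are hypothetical, the handling of ages outside 1–5 y is an assumption, and the counts in the usage lines are illustrative rather than manuscript results.

```python
import math


def wilson_ci(events: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion (z defaults to the 95 % level)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def age_band(age_years: float) -> str:
    """Map age to the bands listed under item 16b: 1-<2, 2-<3, 3-<4, 4-5 y.

    Assumption: ages outside 1-5 y are rejected rather than clipped.
    """
    if not 1 <= age_years <= 5:
        raise ValueError("age outside the 1-5 y range of the dataset")
    if age_years < 2:
        return "1-<2"
    if age_years < 3:
        return "2-<3"
    if age_years < 4:
        return "3-<4"
    return "4-5"  # upper band is closed at 5 y


def is_demoted(cv_auc: float, threshold: float = 0.65) -> bool:
    """Demotion gate: a model is demoted when its CV-AUC is below the threshold."""
    return cv_auc < threshold


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the manuscript.
    low, high = wilson_ci(events=50, n=100)
    print(f"Wilson 95% CI for 50/100: {low:.3f} to {high:.3f}")
    print(age_band(3.5), is_demoted(0.60))
```

The strict inequality in `is_demoted` mirrors the "CV-AUC < 0.65" wording in the table, so a model scoring exactly 0.65 would be retained under this reading.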