<!-- markdownlint-disable MD013 MD041 -->
# STROBE Checklist — Cross-Sectional Studies (Methodological Demonstration)

**Manuscript:** Calcium Status in Pediatric Lower Respiratory Tract Infection: A Methodological Demonstration of an Automated Waterfall Research Pipeline on a Synthetic Teaching Dataset
**Reporting standard:** STROBE Statement, version 4 (cross-sectional checklist), applied to a methodology demonstration on a synthetic teaching dataset.
**Important:** No human-subjects research was performed; the dataset is a synthetic teaching artefact (`LRTI_Hypothetical_Database.xlsx`, identifiers `Patient_1`…`Patient_100`). STROBE items that presuppose human-subject data collection or a time-to-event component (12d, 13b, 13c, 16c) are marked in the table as not applicable or not included, each with its reason.

| Item No. | Recommendation | Reported on (manuscript section) |
|---|---|---|
| 1a | Indicate the study's design with a commonly used term in the title or the abstract | Title (“methodological demonstration … on a synthetic teaching dataset”); Abstract → Methods |
| 1b | Provide in the abstract an informative and balanced summary of what was done and what was found | Abstract (structured Background / Methods / Results / Conclusions; explicit synthetic-data disclosure) |
| 2 | Background/rationale | Introduction (paragraphs 1–3) |
| 3 | Objectives, including any prespecified hypotheses | Introduction (final paragraph: prevalence, total–ionised concordance, age stratification, demote-on-failure logistic) |
| 4 | Study design: present key elements of study design early in the paper | Methods → Data and dataset provenance; Methods → Statistical analysis |
| 5 | Setting: locations and relevant dates | Methods → Data and dataset provenance (synthetic dataset; no human setting; pipeline run dates listed in the public snapshot) |
| 6a | Eligibility criteria, sources and methods of selection of participants | Methods → Data and dataset provenance (no participants; 100 simulated records, 1–5 y, LRTI label, paired total + ionised calcium) |
| 7 | Variables: clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers; diagnostic criteria | Methods → Hypocalcaemia definitions and reference ranges (cut-offs and reference intervals defined with citations) |
| 8 | Data sources / measurement | Methods → Data and dataset provenance (synthetic source spreadsheet); Methods → Statistical analysis (calcium variable types and units) |
| 9 | Bias: describe any efforts to address potential sources of bias | Methods → Statistical analysis (Shapiro–Wilk-gated test selection, BH-FDR, 5-fold CV with demotion gate); Discussion → Limitations |
| 10 | Study size: explain how the study size was arrived at | Methods → Precision considerations (post-hoc) — n = 100 fixed by the synthetic dataset |
| 11 | Quantitative variables: explain how quantitative variables were handled | Methods → Statistical analysis (continuous summaries; age banded into 1-y windows for stratified comparison; logistic on continuous age) |
| 12a | Statistical methods | Methods → Statistical analysis |
| 12b | Methods used to examine subgroups and interactions | Methods → Statistical analysis (4 age bands; Kruskal–Wallis, with one-way ANOVA where each band has n ≥ 5) |
| 12c | How missing data were addressed | Methods → Statistical analysis + Results (5/5 variables 100 % complete; no imputation needed) |
| 12d | Methods taking account of sampling strategy | Not applicable (synthetic dataset; no sampling) |
| 12e | Sensitivity analyses | Methods → Statistical analysis (in-sample vs CV-AUC optimism; Bland–Altman in mg/dL plus z-scored secondary; demotion gate at CV-AUC < 0.65) |
| 13a | Numbers of individuals at each stage of study | Results → Cohort characteristics (n = 100, 5/5 variables 100 % complete) |
| 13b | Reasons for non-participation at each stage | Not applicable (synthetic dataset) |
| 13c | Flow diagram | Not included; no exclusions on the synthetic dataset |
| 14a | Descriptive data: characteristics of participants, exposures, confounders | Results → Cohort characteristics; Table 1 |
| 14b | Number of participants with missing data for each variable | Supplementary Table S1 (all variables 100 % complete) |
| 15 | Outcome data: report numbers of outcome events or summary measures | Results → Hypocalcaemia prevalence and concordance |
| 16a | Main results: unadjusted estimates with precision (e.g. 95 % CI); confounder-adjusted estimates if applicable | Results → Hypocalcaemia prevalence and concordance (Wilson 95 % CIs); Total vs ionised calcium agreement (with an explicit caveat that bias = 0 by construction for the mg/dL Bland–Altman); Age-stratified analysis. No covariate adjustment by design. |
| 16b | Category boundaries when continuous variables were categorised | Methods → Statistical analysis (age bands 1–<2, 2–<3, 3–<4, 4–5 y); Results → Age-stratified analysis |
| 16c | Translate estimates of relative risk into absolute risk for a meaningful time period | Not applicable (no time-to-event component) |
| 17 | Other analyses: subgroups, interactions, sensitivity analyses | Results → Predictive modelling – negative result (logistic with 5-fold CV; both models demoted under pre-specified CV-AUC < 0.65 gate) |
| 18 | Key results: summarise with reference to study objectives | Discussion → Principal findings |
| 19 | Limitations: discuss limitations of the study, including sources of potential bias or imprecision; direction and magnitude of any potential bias | Discussion → Limitations (synthetic-data limits explicitly enumerated) |
| 20 | Interpretation: cautious overall interpretation considering objectives, limitations, multiplicity of analyses, results from similar studies | Discussion → all sub-sections; Conclusion (no clinical inference drawn) |
| 21 | Generalisability (external validity) | Discussion → Limitations + Implications for the pipeline (synthetic data; methodology generalises, biology does not) |
| 22 | Funding source | Declarations → Funding (no external funding) |
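
Item 16a cites Wilson 95 % CIs for the prevalence estimates. The following is a minimal sketch of that interval, assuming the standard Wilson score formula without continuity correction; the manuscript's own implementation is not shown in this checklist and may differ. The function name `wilson_interval` and the example count (20 events out of 100 records) are hypothetical and are not results from the manuscript.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    `z` defaults to the two-sided 95 % standard-normal quantile.
    This is an illustrative helper, not the pipeline's actual code.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre of the interval is shrunk from p_hat towards 0.5.
    centre = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 20 events in n = 100 records (not a manuscript result).
lower, upper = wilson_interval(20, 100)
print(f"prevalence = {20 / 100:.2f}, Wilson 95% CI = ({lower:.3f}, {upper:.3f})")
```

Unlike the Wald interval, the Wilson interval stays inside [0, 1] and keeps a non-zero width when the observed proportion is 0 or 1, which matters at n = 100 when a category is rare.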

---

**Statement.** This manuscript is a methodological demonstration on a synthetic dataset and is reported in accordance with the STROBE Statement for cross-sectional studies, applied wherever an item is meaningful for synthetic data. All 22 items have been addressed; the items marked not applicable or not included (12d, 13b, 13c, 16c) are each justified in the table above.

**Source.** STROBE Statement v4 — <https://www.strobe-statement.org/checklists/>.