Quality Benchmarks

This page is for technical users who want to evaluate AUSynth's statistical fidelity before incorporating it into research or production workflows. If you are looking for a general overview of data quality, see the Methodology page.

Headline Metric

AUSynth has been validated against ABS Census 2021 source data using standard fidelity metrics. The median Standardised Root Mean Square Error (SRMSE) across validation tests is 0.04, well within published benchmarks for high-quality synthetic population data (values below 0.10 are typically considered a good fit, and values below 0.05 a strong fit).

This median is computed across all validated variable relationships and geographic units. It reflects how closely the synthetic data reproduces the conditional and marginal distributions observed in the source Census, which is the core measure of whether a synthetic dataset preserves the statistical structure of the real population.

What SRMSE Measures

SRMSE quantifies the typical (root-mean-square) deviation between a synthetic distribution and its target, normalised by the mean category proportion. A score of 0.04 means the typical deviation across categories is 4% of the expected proportion. For a variable with 10 equally likely categories (each at 10%), an SRMSE of 0.04 corresponds to typical deviations of about 0.4 percentage points per category, close enough to be indistinguishable in most analytical applications.

The normalisation makes SRMSE comparable across variables with different numbers of categories. Whether a variable has 2 categories (like sex) or 21 categories (like age group), a score of 0.04 indicates the same relative fidelity.
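The arithmetic above can be checked with a minimal sketch. It assumes the common formulation of SRMSE as the root-mean-square error between the two proportion vectors divided by the mean target proportion; the function name srmse and the example values are illustrative and not part of AUSynth.

```python
import math

def srmse(target, synthetic):
    """RMSE between two proportion vectors, divided by the mean target proportion."""
    if not target or len(target) != len(synthetic):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(target)
    rmse = math.sqrt(sum((s - t) ** 2 for t, s in zip(target, synthetic)) / n)
    mean_target = sum(target) / n
    return rmse / mean_target

# Ten equally likely categories (10% each); synthetic shares are off by
# 0.4 percentage points, alternating above and below the target.
target_10 = [0.10] * 10
synthetic_10 = [0.104, 0.096] * 5
print(round(srmse(target_10, synthetic_10), 4))  # 0.04

# Two categories (50% each), off by 2 percentage points: the same relative
# deviation (4% of the expected proportion), so the same score.
target_2 = [0.50, 0.50]
synthetic_2 = [0.52, 0.48]
print(round(srmse(target_2, synthetic_2), 4))  # 0.04
```

The two examples have very different absolute deviations (0.4 versus 2 percentage points) but the same SRMSE, which is the comparability property described above.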

How This Compares

Published synthetic population studies report a range of quality outcomes depending on the number of variables, geographic granularity, and generation method. For context:

Farooq et al. (2013), using Markov Chain Monte Carlo methods on Swiss travel survey data, report SRMSE values in the 0.02-0.08 range for core demographic variables. Casati et al. (2015), combining hierarchical simulation with generalised raking, achieve comparable fit for the Zurich metropolitan area. Beckman et al. (1996), using iterative proportional fitting on US Census data, report strong marginal fit but note degradation in conditional relationships, a challenge that MCMC-based approaches handle more naturally.

AUSynth's median SRMSE of 0.04 across 15,343 suburbs and 24 person-level variables places it in the strong-fit range of this literature. The validation covers not just marginal distributions (how well each variable's overall proportions match) but also conditional distributions (how well relationships between variables are preserved). This is a stricter test that many synthetic data products do not report.

What Drives Quality Variation

Not all suburbs achieve the same fidelity. Two factors dominate quality variation across geographic units.

Population size. Larger suburbs have richer source data, with more observations per cell in the conditional probability tables, which produces tighter synthetic distributions. Suburbs above 5,000 Census persons consistently achieve SRMSE below 0.05. Very small suburbs (below 100 Census persons, flagged with small_suburb_flag) show higher variance, reflecting genuine statistical sparsity in the source data rather than a generation deficiency.
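One way to inspect the population-size effect in your own evaluation is to band suburbs by Census population and compare median SRMSE per band. The sketch below is hypothetical: it takes the thresholds from the text (100 and 5,000 Census persons), but the (census_persons, srmse) input layout, the band names, and the sample values are assumptions made for illustration, not AUSynth's schema or results.

```python
from statistics import median

SMALL_MAX = 100    # below this, the text says suburbs carry small_suburb_flag
LARGE_MIN = 5000   # above this, the text reports SRMSE consistently below 0.05

def size_band(census_persons):
    """Assign a suburb to a population band using the thresholds above."""
    if census_persons < SMALL_MAX:
        return "small"
    if census_persons > LARGE_MIN:
        return "large"
    return "medium"

def median_srmse_by_band(rows):
    """rows: iterable of (census_persons, srmse) pairs, one per suburb."""
    bands = {}
    for persons, score in rows:
        bands.setdefault(size_band(persons), []).append(score)
    return {band: median(scores) for band, scores in bands.items()}

# Placeholder values for illustration only; these are NOT AUSynth results.
rows = [(60, 0.12), (85, 0.09), (1200, 0.05), (3400, 0.04), (7800, 0.03), (15000, 0.02)]
print(median_srmse_by_band(rows))
```

Reporting the small band separately keeps its higher variance from being mistaken for a problem in the larger suburbs.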

Variable complexity. Variables with fewer categories (sex, broad age groups) are reproduced with near-perfect fidelity. Variables with many categories that interact in complex ways (detailed occupation by industry by income) show more variation. This is inherent to reconstructing a high-dimensional joint distribution from lower-dimensional conditional inputs, a challenge shared by all synthetic population methods.

Validation Approach

Quality is assessed at two levels. Marginal validation checks whether the proportion of people in each category of each variable matches the ABS target for that suburb. Conditional validation checks whether the relationships between pairs and triples of variables are preserved: for example, whether the income distribution for 30-34-year-old women in Toorak matches the Census pattern.
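The difference between the two levels can be sketched as follows. This is an illustration under assumptions, not AUSynth's validation code: the record layout, the field names (age_group, sex, income), and the category labels are all hypothetical.

```python
from collections import Counter

def conditional_distribution(records, given, outcome, categories):
    """Share of each outcome category among records matching the `given` values.

    An empty `given` dict matches every record, which yields the marginal
    distribution of `outcome`.
    """
    matching = [r for r in records if all(r[k] == v for k, v in given.items())]
    if not matching:
        raise ValueError("no records match the conditioning values")
    counts = Counter(r[outcome] for r in matching)
    return [counts[c] / len(matching) for c in categories]

# Hypothetical synthetic records for a single suburb, for illustration only.
records = [
    {"age_group": "30-34", "sex": "F", "income": "low"},
    {"age_group": "30-34", "sex": "F", "income": "high"},
    {"age_group": "30-34", "sex": "F", "income": "high"},
    {"age_group": "30-34", "sex": "F", "income": "mid"},
    {"age_group": "30-34", "sex": "M", "income": "low"},
    {"age_group": "65-69", "sex": "F", "income": "low"},
]
income_levels = ["low", "mid", "high"]

# Conditional: income distribution among 30-34-year-old women.
conditional = conditional_distribution(
    records, {"age_group": "30-34", "sex": "F"}, "income", income_levels
)
print(conditional)  # [0.25, 0.25, 0.5]

# Marginal: income distribution across everyone in the suburb.
marginal = conditional_distribution(records, {}, "income", income_levels)
```

Each resulting vector would then be compared against the corresponding Census vector with the chosen fidelity metric. Conditional cells contain fewer records than marginals, which is one reason the conditional test is the more demanding of the two.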

Both validation levels use SRMSE as the primary metric, with Total Variation Distance (TVD) as a secondary check. The 0.04 headline reflects the conditional validation, which is the stricter of the two tests.
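For reference, TVD between two discrete distributions is half the sum of absolute differences in category proportions. A minimal sketch, with an illustrative function name and made-up example values:

```python
def total_variation_distance(target, synthetic):
    """Half the L1 distance between two proportion vectors of equal length."""
    if len(target) != len(synthetic):
        raise ValueError("inputs must be of equal length")
    return 0.5 * sum(abs(t - s) for t, s in zip(target, synthetic))

# Made-up proportions for illustration: 5 percentage points of probability
# mass sit in a different category in the synthetic distribution.
target = [0.25, 0.25, 0.50]
synthetic = [0.30, 0.25, 0.45]
print(round(total_variation_distance(target, synthetic), 4))  # 0.05
```

TVD reads directly as the share of probability mass that would have to move to make the two distributions identical. Unlike SRMSE, it is not normalised by the mean category proportion, which makes it a useful cross-check rather than a substitute.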

Limitations

SRMSE measures distributional fidelity, not individual-record accuracy. A synthetic dataset can achieve a perfect SRMSE while containing no record that matches any real individual. This is by design. Separately, because the headline figure is a median, high aggregate fidelity does not guarantee that every specific cross-tabulation in every small suburb will be precisely reproduced.

Dwelling-level data shows lower fidelity than person or family data in v1.0, because the three datasets are currently generated independently. The planned v1.1 release will introduce person-family-dwelling linking, which is expected to bring dwelling quality in line with the other datasets.

References

Farooq, B., Bierlaire, M., Hurtubia, R., & Flötteröd, G. (2013). Simulation based population synthesis. Transportation Research Part B: Methodological, 58, 243-263.

Beckman, R. J., Baggerly, K. A., & McKay, M. D. (1996). Creating synthetic baseline populations. Transportation Research Part A: Policy and Practice, 30(6), 415-429.

Casati, D., Müller, K., Fourie, P. J., Erath, A., & Axhausen, K. W. (2015). Synthetic population generation by combining a hierarchical, simulation-based approach with reweighting by generalized raking. Transportation Research Record, 2493(1), 107-116.


See also: Methodology · FAQ · Glossary