Population Profile Analysis -- Methodology
What Profiles Represent
AUSynth segments the Australian population into 8 demographic profiles based on a combination of age, sex, marital status, household relationship, income, education, occupation, industry, and labour force engagement.
These profiles are derived from statistical patterns in the data. They are not predefined categories but emerge from clustering analysis applied to real ABS Census data (adjusted to 2025-26 values).
How Profiles Were Created
Profiles are derived from Australian Census records using multiple correspondence analysis (MCA) combined with K-means clustering. MCA reduces 9 categorical variables into a continuous coordinate space while preserving the relationships between categories. K-means then groups people with similar MCA coordinates into distinct profiles.
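The two steps can be sketched on a toy dataset. This is a minimal illustration, not the AUSynth pipeline: it assumes NumPy is available, the function names and the six example records are invented for the example, and the farthest-point initialisation for K-means is an arbitrary choice made so that the result is reproducible. MCA is computed here as a correspondence analysis (via SVD) of the 0/1 indicator matrix.

```python
import numpy as np

def mca_row_coordinates(records, n_components=2):
    """MCA as correspondence analysis of the 0/1 indicator matrix."""
    n_vars = len(records[0])
    # One indicator column per (variable index, category) pair.
    columns = sorted({(j, rec[j]) for rec in records for j in range(n_vars)})
    col_index = {col: i for i, col in enumerate(columns)}
    Z = np.zeros((len(records), len(columns)))
    for r, rec in enumerate(records):
        for j, category in enumerate(rec):
            Z[r, col_index[(j, category)]] = 1.0

    P = Z / Z.sum()                          # correspondence matrix
    row_mass, col_mass = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(row_mass, col_mass)
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, singular_values, _ = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates on the leading axes.
    scaled = U[:, :n_components] * singular_values[:n_components]
    return scaled / np.sqrt(row_mass)[:, None]

def nearest_centroid(points, centroids):
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(points, k, n_iter=100):
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        gaps = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[gaps.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        updated = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(points, centroids), centroids

# Invented records: (age group, labour force status, income bracket).
records = [
    ("60+", "not working", "low"),
    ("60+", "not working", "low"),
    ("60+", "part-time", "low"),
    ("under 35", "full-time", "high"),
    ("under 35", "full-time", "high"),
    ("under 35", "self-employed", "high"),
]
coords = mca_row_coordinates(records, n_components=2)
labels, centroids = kmeans(coords, k=2)
print(labels)  # the three older records share one label, the three younger the other
```

The production analysis uses 9 variables and 8 clusters; the sketch keeps 3 variables and 2 clusters only so that the output can be checked by eye.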
Each of the 9 input variables is grouped into broad, commercially meaningful categories (for example, age is grouped into five life stages rather than 21 five-year brackets). This simplification improves the clarity of the resulting profiles without losing meaningful demographic distinctions.
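As an illustration of this grouping, the sketch below collapses 21 five-year age brackets into five life stages. The cut points and labels are assumptions made for the example: this page states that five life stages are used but does not list their boundaries.

```python
from bisect import bisect_right

# Hypothetical boundaries and labels; the real life-stage definitions are not given here.
STAGE_LOWER_BOUNDS = [0, 25, 35, 50, 60]
STAGE_LABELS = ["under 25", "25-34", "35-49", "50-59", "60+"]

def life_stage(age):
    """Map an age in years to one of five broad life stages."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return STAGE_LABELS[bisect_right(STAGE_LOWER_BOUNDS, age) - 1]

# Starting ages of the 21 five-year brackets: 0-4, 5-9, ..., 95-99, 100+.
bracket_starts = range(0, 105, 5)
stages = {start: life_stage(start) for start in bracket_starts}

print(len(stages), "brackets ->", len(set(stages.values())), "life stages")
```

Because every assumed boundary falls on a multiple of five, each five-year bracket maps wholly into a single life stage.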
Records with substantive responses across the profiling variables were used to fit the model; the fitted model then assigns a profile to every record in the synthetic population.
A small share of records have extensive non-response in the original Census data. These records are still assigned a profile by the fitted model, but with lower confidence: the model maps them to the nearest demographic cluster using whatever information is available.
No records are excluded from profile assignment. Profile composition in customer reports represents the full population of the selected geography.
How Many Profiles?
We evaluated solutions from 4 to 10 profiles using silhouette scores (a measure of cluster separation) and interpretability. Eight profiles were selected because they best balance statistical fit with practical usefulness: fewer profiles miss important distinctions (such as conflating labourers with trades workers); more profiles become too granular to be actionable.
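For readers unfamiliar with the silhouette score, the sketch below computes it from first principles for two candidate groupings of the same points. It is a plain-Python illustration with invented coordinates, not the evaluation code used for the profiles; in practice a library routine such as scikit-learn's `silhouette_score` would more likely be used, and the same comparison would be repeated for each candidate number of profiles.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette: (b - a) / max(a, b) averaged over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Invented coordinates: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_score(points, [0, 0, 1, 1])
badly_mixed = silhouette_score(points, [0, 1, 0, 1])
print(round(well_separated, 2), round(badly_mixed, 2))
```

Scores close to 1 indicate compact, well-separated clusters; scores near or below 0 indicate overlapping or misassigned groups.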
The 8 Profiles
- Retired and semi-retired (20.2%) -- Older Australians (70% aged 60+), predominantly not in employment, with low incomes. About two-thirds are partnered. This profile captures retirees, pensioners, and those winding down from the workforce.
- Non-earning dependants (5.2%) -- People reporting zero income (94%) across all ages but skewing older. Includes non-working spouses supported financially by a partner, carers, and some early retirees who do not yet have an income source.
- Young singles (21.1%) -- Younger Australians (58% under 35), overwhelmingly single (94%), split between part-time and full-time work. Includes students, early-career workers, and young adults living in shared or family households.
- Degree-qualified professionals (12.5%) -- University-educated (64% bachelor degree or higher), concentrated in managerial and professional occupations. Mostly full-time workers (78%) with high incomes. Skews male (61%).
- Trades and technical workers (9.1%) -- Overwhelmingly male (92%) with vocational qualifications (52% Certificate III-IV). Concentrated in trades occupations (41%) and full-time employment (71%). Spread across working ages.
- Established partnered households (19.8%) -- Predominantly female (80%) and partnered (86%), this is the third largest profile. Most are family heads working part-time or full-time in clerical, service, or professional roles. Spans mid-career to senior ages.
- Labourers and operators (8.7%) -- Mainly male (87%) in labouring and machinery operation roles (56%). Full-time workers (68%) with mid-range incomes. Spread across trade, construction, and logistics industries.
- Transport and logistics workers (3.4%) -- Concentrated in the transport and logistics industry (97%). Two-thirds male, spanning all working ages. A mix of full-time (56%) and part-time (30%) workers.
How To Use The Profile Composition Analysis
The grouped bar chart shows the percentage of your area's population in each profile (blue bars) alongside the national average (grey bars). Profiles are sorted by the size of the difference.
What to look for:
- Over-represented profiles (blue bar taller than grey): these define what makes your area distinctive. An area with 25% degree-qualified professionals versus 12.5% nationally has a markedly different service and housing profile.
- Under-represented profiles (blue bar shorter than grey): these indicate gaps relative to the national mix. An area with few retirees may have less demand for aged care services.
- Concentration vs diversity: areas where one or two profiles dominate have more predictable characteristics; areas with an even spread require broader planning approaches.
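The comparison behind the chart can be sketched as follows. The national shares are the ones listed on this page; the area shares are invented for the example, and sorting by the absolute size of the difference is an assumption about how the chart orders its bars.

```python
# National shares (%) as listed on this page.
NATIONAL = {
    "Retired and semi-retired": 20.2,
    "Non-earning dependants": 5.2,
    "Young singles": 21.1,
    "Degree-qualified professionals": 12.5,
    "Trades and technical workers": 9.1,
    "Established partnered households": 19.8,
    "Labourers and operators": 8.7,
    "Transport and logistics workers": 3.4,
}

# Hypothetical area shares (%), invented for illustration.
AREA = {
    "Retired and semi-retired": 12.0,
    "Non-earning dependants": 4.0,
    "Young singles": 27.0,
    "Degree-qualified professionals": 25.0,
    "Trades and technical workers": 6.0,
    "Established partnered households": 18.0,
    "Labourers and operators": 5.0,
    "Transport and logistics workers": 3.0,
}

def compare_to_national(area, national):
    """Return (profile, area %, national %, difference, status), largest gap first."""
    rows = []
    for profile, national_pct in national.items():
        diff = round(area[profile] - national_pct, 1)  # percentage points
        status = "over" if diff > 0 else "under" if diff < 0 else "equal"
        rows.append((profile, area[profile], national_pct, diff, status))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

rows = compare_to_national(AREA, NATIONAL)
for profile, area_pct, national_pct, diff, status in rows:
    print(f"{profile:<34} {area_pct:5.1f} {national_pct:5.1f} {diff:+6.1f} {status}")
```

In this invented area, degree-qualified professionals head the list at 12.5 percentage points above the national share, followed by retirees at 8.2 points below it.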
Variables Used
The 9 variables that define profiles are: age (5 life-stage groups), sex, marital status (partnered/single), relationship in household (4 roles), personal income (5 brackets), highest education (5 levels), occupation (5 groups), industry (5 sectors), and labour force status (4 categories). Geographic variables are deliberately excluded so that profiles describe who people are, not where they live.
Methodological Notes
Profile assignments are deterministic: each person is assigned to their nearest cluster centroid in MCA space. The distance to that centroid is also recorded, providing a measure of how typical each person is of their assigned profile. People near cluster boundaries could plausibly belong to adjacent profiles.
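A minimal sketch of nearest-centroid assignment is shown below. The centroid coordinates and profile names are placeholders, and the margin to the runner-up centroid is an illustrative addition for spotting people near a boundary; the source states only that the distance to the assigned centroid is recorded.

```python
from math import dist

# Hypothetical centroids in a two-dimensional MCA space.
CENTROIDS = {
    "profile_a": (0.0, 0.0),
    "profile_b": (4.0, 0.0),
    "profile_c": (0.0, 3.0),
}

def assign_profile(coords, centroids):
    """Return (profile, distance to its centroid, margin to the runner-up)."""
    ranked = sorted((dist(coords, centre), name) for name, centre in centroids.items())
    (best_distance, best_name), (second_distance, _) = ranked[0], ranked[1]
    return best_name, best_distance, second_distance - best_distance

typical = assign_profile((1.0, 0.0), CENTROIDS)     # well inside profile_a
borderline = assign_profile((2.1, 0.0), CENTROIDS)  # just on profile_b's side

print(typical)
print(borderline)
```

The same coordinates always yield the same profile, which is what makes the assignment deterministic. A small margin, as for the second person, signals a record that sits close to the boundary between two profiles.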
Records with high rates of non-response in the original Census data are still assigned a profile, but flagged with lower confidence. The model assigns them to the nearest demographic cluster based on the information that is available. This means profile composition in your reports always represents the full population -- no records are excluded.
This analysis uses real ABS Census data (adjusted to 2025-26) processed through statistical clustering. The profiles describe actual Australian population structure, not artefacts of the synthesis methodology.