Customer downloads carry integer index codes for every variable. Use the dictionary to map codes to human-readable labels.
Data Dictionary
What This Is
Every AUSynth record contains integer-coded variables. The integer values correspond to ABS Census 2021 category labels. For example, AGE5P = 3 means "15–19 years" and SEXP = 0 means "Male". The data dictionary maps every integer index to its human-readable ABS label for all 44 variables across the three datasets.
You will need the dictionary whenever you want to interpret, tabulate, or present AUSynth output. Raw integer codes are compact and efficient for computation, but they are meaningless without the mapping.
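As a minimal illustration of the mapping idea, the sketch below decodes one record using only the two example codes quoted above. The EXAMPLE_LABELS dict and the decode() helper are hypothetical names introduced here for illustration; they are not part of the AUSynth dictionary files.

```python
# Hypothetical two-entry excerpt built from the examples on this page.
# The real dictionary covers all 44 variables and every category.
EXAMPLE_LABELS = {
    "AGE5P": {3: "15–19 years"},
    "SEXP": {0: "Male"},
}


def decode(variable, code, labels=EXAMPLE_LABELS):
    """Return the label for an integer code, or None if it is not mapped."""
    return labels.get(variable, {}).get(code)


# One integer-coded record, decoded variable by variable.
record = {"AGE5P": 3, "SEXP": 0}
decoded = {var: decode(var, code) for var, code in record.items()}
print(decoded)
```

Returning None for an unmapped code is a choice made for this sketch; you may prefer to raise an error so that unexpected codes are noticed early.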
Three Formats
The dictionary is available in three formats. All contain identical mappings.
Excel (ausynth_dictionary.xlsx): a workbook with four sheets: Persons (21 variables), Families (9 variables), Dwellings (14 variables), and About (version and source metadata). Each sheet lists every variable code, its description, and all category index–label pairs. Use this for quick reference or to share with colleagues who work in spreadsheets.
Python (ausynth_dictionary.py): a standalone Python module you can drop into any project. Import VARIABLE_LABELS for a nested dictionary of {variable: {index: label}} mappings, or use the apply_labels() convenience function to add label columns to a pandas DataFrame in one call. No dependencies beyond pandas.
import pandas as pd
from ausynth_dictionary import apply_labels

# Load an AUSynth persons file and attach labels for three variables
df = pd.read_parquet("paddington_persons.parquet")
df = apply_labels(df, ["AGE5P", "SEXP", "INCP"])
# New columns: AGE5P_label, SEXP_label, INCP_label
R (ausynth_dictionary.R): an R script that defines variable_labels as a named list of named character vectors. Source it and use apply_label() for a single variable or apply_labels_df() to label multiple columns at once. Both return proper factors with levels ordered to match the ABS category sequence.
source("ausynth_dictionary.R")
df <- arrow::read_parquet("paddington_persons.parquet")
df <- apply_labels_df(df, c("AGE5P", "SEXP", "INCP"))
# New columns: AGE5P_label, SEXP_label, INCP_label
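If you want to see how label columns like these can be derived from a {variable: {index: label}} mapping, the following is a minimal pandas sketch. It is not the implementation of apply_labels(), which is not shown on this page. The add_label_columns() function and the two-entry labels dict are hypothetical stand-ins built from the examples quoted earlier.

```python
import pandas as pd

# Stand-in for VARIABLE_LABELS, holding only the two codes quoted on this page.
labels = {
    "AGE5P": {3: "15–19 years"},
    "SEXP": {0: "Male"},
}


def add_label_columns(df, variables, mapping):
    """Return a copy of df with one <variable>_label column per variable."""
    out = df.copy()
    for var in variables:
        # Series.map looks up each integer code; unmapped codes become NaN.
        out[f"{var}_label"] = out[var].map(mapping[var])
    return out


df = pd.DataFrame({"AGE5P": [3, 3], "SEXP": [0, 0]})
labelled = add_label_columns(df, ["AGE5P", "SEXP"], labels)
print(labelled)
```

Working on a copy leaves the original integer-coded frame untouched, which is convenient when you still need the compact codes for computation.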
Variable Coverage
The dictionary covers all variables delivered in AUSynth output:
Persons (21 variables): AGE5P, SEXP, INCP, LFSP, OCCP, INDP, HRWRP, MSTP, RLHP, BPLP, HEAP, INGP, QALFP, HSCP, TYPP, SIEMP, NPRD, LTHP, HLTHP, CLTHP, ANZS1P.
Families (9 variables): FMCF, CDCF, FINF, LFSF, HCFMF, INGF, SSCF, TRCF, RELF.
Dwellings (14 variables): DWTD, TEND, BEDRD, MRERD, RNTRD, STRD, HIND, VEHRD, NPRD_D, HHCD, HCFMD, INGDWTD, CPLTHRD, LANDLLD.
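The lists above can also be written down as a small lookup, for example to check which dataset a column belongs to or whether a file contains columns the dictionary does not cover. This is only a sketch: the COVERAGE dict and dataset_of() helper are hypothetical names, and the variable codes are copied from the lists above.

```python
# Variable codes copied from the coverage lists on this page.
COVERAGE = {
    "Persons": (
        "AGE5P SEXP INCP LFSP OCCP INDP HRWRP MSTP RLHP BPLP HEAP "
        "INGP QALFP HSCP TYPP SIEMP NPRD LTHP HLTHP CLTHP ANZS1P"
    ).split(),
    "Families": "FMCF CDCF FINF LFSF HCFMF INGF SSCF TRCF RELF".split(),
    "Dwellings": (
        "DWTD TEND BEDRD MRERD RNTRD STRD HIND VEHRD NPRD_D "
        "HHCD HCFMD INGDWTD CPLTHRD LANDLLD"
    ).split(),
}


def dataset_of(variable):
    """Return the dataset name that lists this variable, or None."""
    for dataset, variables in COVERAGE.items():
        if variable in variables:
            return dataset
    return None


total = sum(len(variables) for variables in COVERAGE.values())
print(total, dataset_of("TEND"), dataset_of("NPRD"))
```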
A Note on "Not Applicable" and "Not Stated"
Two category labels appear frequently and mean different things.
Not applicable is a structural code. It means the variable does not logically apply to this record. For example, a child aged 0–14 has OCCP = Not applicable because the ABS does not collect occupation data for children. This is not missing data; the value is correct by definition.
Not stated means the respondent did not answer the question, or the response could not be coded. The ABS applies this to a small proportion of records for most variables. In AUSynth, "Not stated" appears at roughly the same rate as in the Census cross-tabulations.
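The distinction matters when you tabulate. One common approach, sketched below, is to drop "Not applicable" records from the denominator while keeping "Not stated" as its own category. This is an illustration rather than an AUSynth recommendation, and "Category A" and "Category B" are placeholders, not real ABS occupation labels.

```python
from collections import Counter

# Already-labelled OCCP values for six hypothetical records.
# "Category A" and "Category B" are placeholders, not ABS labels.
occp_labels = [
    "Not applicable",
    "Category A",
    "Category B",
    "Not stated",
    "Category A",
    "Not applicable",
]

counts = Counter(occp_labels)

# "Not applicable" records are out of scope, so exclude them from the base.
in_scope = len(occp_labels) - counts["Not applicable"]

# Shares among in-scope records; "Not stated" stays visible as a category.
shares = {
    label: n / in_scope
    for label, n in counts.items()
    if label != "Not applicable"
}
print(in_scope, shares)
```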
Classification Source
AUSynth uses category labels derived from ABS Census 2021 classifications. Category granularity varies by variable. For example, RLHP (Relationship in Household) has 33 categories, while SEXP (Sex) has 2. If you are comparing AUSynth output to other ABS products, minor differences in category groupings may exist. The category labels in this dictionary are authoritative for AUSynth data.
Version
This dictionary corresponds to AUSynth v1.0. Category labels are sourced from ABS Census 2021 data. Variable codes follow ABS Census Dictionary 2021 (Catalogue 2901.0) conventions. If future ABS releases change category definitions, updated dictionaries will be published alongside the corresponding AUSynth version.
See also: Methodology · FAQ · Glossary