Evidence for the use of an algorithm in resolving inconsistent and missing Indigenous status in administrative data collections
Australian Journal of Social Issues
Australian Social Policy Association
Measures of the gap in living standards, life expectancy, education, health and employment between Indigenous and non‐Indigenous Australians are primarily derived from administrative data sources. However, Indigenous identification in these data sources is affected by administrative practices, missing data, inconsistency, and error. Because these factors have changed over time, assessing whether the gap has changed on the basis of data unadjusted for these sources of error can lead to misguided conclusions. Combining administrative data collected on the same individuals from different sources makes it possible to apply a more consistent, derived Indigenous status across all of an individual's records within a linked data environment. We used the Western Australian Data Linkage System to produce derived Indigenous statuses for individuals using a range of algorithms. We found that these algorithms reduced the amount of missing data and improved within‐individual consistency. Based on these findings, we recommend that our Multi‐Stage Median algorithm be used as the standard indicator of Indigenous status for any reporting based on administrative datasets when multiple datasets are available for linkage, and that algorithmic approaches also be considered for improving the quality of other demographic variables from administrative data sources.
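
The general idea of deriving one status per individual from several linked records can be illustrated with a minimal sketch. This is not the Multi‐Stage Median algorithm recommended above, whose stages are not described in this abstract; it is a single-stage median over an individual's non-missing flags, offered only as an assumption-laden illustration. The coding (1 = Indigenous, 0 = non‐Indigenous, None = missing), the function names, and the choice to leave exact ties unresolved are all hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

Flag = Optional[int]  # assumed coding: 1 = Indigenous, 0 = non-Indigenous, None = missing


def derive_status(flags: List[Flag]) -> Flag:
    """Derive one status from all linked records for a single individual.

    Illustrative single-stage median only; NOT the paper's Multi-Stage Median.
    """
    known = [f for f in flags if f is not None]  # drop missing values
    if not known:
        return None  # no information in any linked record
    m = median(known)  # 0, 0.5, or 1 for binary flags
    if m > 0.5:
        return 1
    if m < 0.5:
        return 0
    return None  # exact tie: left unresolved in this sketch (an assumption)


def derive_all(linked: Dict[str, List[Flag]]) -> Dict[str, Flag]:
    """Apply the derivation to every individual in a linked data set."""
    return {person_id: derive_status(flags) for person_id, flags in linked.items()}


# Hypothetical linked records keyed by a made-up person identifier.
example = {
    "p1": [1, None, 1, 0],   # majority Indigenous, one record missing
    "p2": [None, None],      # status missing everywhere
    "p3": [0, 0, 1],         # majority non-Indigenous
}
derived = derive_all(example)
```

Applying the derived value to every record for an individual removes within‐individual inconsistency by construction, and fills in missing values wherever at least one linked record carries a status.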