Frequently Asked Questions
We’ve compiled a list of frequently asked questions to provide you with quick and informative answers, whether you’re seeking details about data linkage generally, data collections available, application processes, or publishing research.
If you have a specific question which isn’t addressed on this page, please contact us directly and we will do our best to assist with your query.
Data Linkage
What is data linkage?
Data linkage is the creation of linkage keys that enable two or more datasets to be integrated. It connects information thought to relate to the same person, family, place or event so that it can be analysed together.
What is a linkage key?
A linkage key is a unique identifier generated using data linkage methodologies. Linkage keys are used to join the content data of records into an integrated dataset without releasing identifying information.
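To illustrate how a linkage key joins content data, the minimal Python sketch below merges two de-identified extracts on a shared key. It is illustrative only: the field names (`lkey`, `admission_date`, `death_date`), the records and the `integrate` helper are hypothetical and do not reflect the layout of any actual Data Linkage Services extract.

```python
# Hypothetical de-identified extracts: each row holds a linkage key ("lkey")
# and content fields only, with no names, dates of birth or addresses.
hospital = [
    {"lkey": 101, "admission_date": "2019-03-02", "diagnosis": "I21"},
    {"lkey": 102, "admission_date": "2020-07-15", "diagnosis": "J18"},
]
deaths = [
    {"lkey": 101, "death_date": "2021-01-09"},
]

def integrate(left, right, key="lkey"):
    """Left-join two lists of records on a shared linkage key."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    merged = []
    for rec in left:
        matches = index.get(rec[key], [])
        if not matches:
            merged.append(dict(rec))  # no match: keep the left record unchanged
        for other in matches:
            merged.append({**rec, **other})
    return merged

for row in integrate(hospital, deaths):
    print(row)
```

Because the integrated rows carry only the key and content fields, the combined dataset can be analysed without names, dates of birth or addresses being released.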
What is the separation principle?
The separation principle operates to protect the identities of individuals and organisations in datasets. This best practice protocol must be applied when producing an integrated dataset.
The principle mandates that staff who have access to demographic data for the purpose of linkage are prevented from accessing content data for the relevant service record. A person working with data in an integrated dataset is similarly prevented from accessing the demographic data used to generate linkage keys.
WA health entities performing data linkage and integration must ensure acceptable work practices are maintained with respect to confidentiality of patient records and other sensitive information by complying with the separation principle.
This protocol, developed by Kelman, Bass and Holman and first used in WA in 2001, is now used by many linkage centres around the world. You can find out more on our Privacy page.
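The separation principle can be pictured as a division of roles over the same source data. In the minimal Python sketch below, which is illustrative only and not a description of the Department's actual systems, a linkage step sees only demographic fields and returns a map from record ID to linkage key, while the analyst extract contains only content fields plus that key. The records, field names and helper functions are hypothetical, and exact matching on surname and date of birth stands in for real linkage methods.

```python
from itertools import count

# Hypothetical source rows held by a data provider: identifying (demographic)
# fields and content fields sit side by side.
SOURCE = [
    {"record_id": "A1", "surname": "SMITH", "dob": "1970-05-01", "diagnosis": "E11"},
    {"record_id": "A2", "surname": "SMITH", "dob": "1970-05-01", "diagnosis": "I10"},
    {"record_id": "A3", "surname": "NGUYEN", "dob": "1985-11-23", "diagnosis": "J45"},
]
DEMOGRAPHIC_FIELDS = ("surname", "dob")

def split(rows):
    """Provider step: separate each row into a demographic part and a content
    part that share only the record_id."""
    demographic = [{"record_id": r["record_id"], **{f: r[f] for f in DEMOGRAPHIC_FIELDS}}
                   for r in rows]
    content = [{k: v for k, v in r.items() if k not in DEMOGRAPHIC_FIELDS} for r in rows]
    return demographic, content

def assign_keys(demographic):
    """Linkage step: sees demographics only, never content. Exact matching on
    (surname, dob) stands in for real linkage; sequential integers stand in
    for linkage keys."""
    next_key = count(1)
    person_keys, key_map = {}, {}
    for r in demographic:
        person = (r["surname"], r["dob"])
        if person not in person_keys:
            person_keys[person] = next(next_key)
        key_map[r["record_id"]] = person_keys[person]
    return key_map

def analyst_extract(content, key_map):
    """Attach linkage keys to content and drop record_id, so the analyst sees
    neither demographics nor source record IDs."""
    return [{"lkey": key_map[r["record_id"]],
             **{k: v for k, v in r.items() if k != "record_id"}}
            for r in content]

demographic, content = split(SOURCE)
extract = analyst_extract(content, assign_keys(demographic))
print(extract)
```

The point of the structure is that no single function ever holds both the demographic fields and the content fields of the integrated output.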
What is linked data used for?
There are many applications for linked data, including:
- Population-based health research and policy development
- Investigating potential projects, e.g., testing hypotheses and running pilot studies
- Capture-recapture analysis, to improve the quality of datasets
- Follow-up and comparison of different treatment regimes
- Studying the aetiology, co-morbidities, and outcomes of disease
For more information, please see our document on the Limitations and Suitable Use of Linked Data (PDF, 493 KB).
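Capture-recapture is one example where linkage does the analytical work: linking two sources that record the same condition shows how many people appear in both, which allows the total number of cases, and hence the completeness of each source, to be estimated. The sketch below uses Chapman's bias-corrected form of the two-source Lincoln-Petersen estimator. It assumes the two sources are independent, the population is closed, and the linkage used to count the overlap is accurate; the counts are made up for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-source estimate of total population size: n1 * n2 / m."""
    if m <= 0:
        raise ValueError("need at least one person found in both sources")
    return n1 * n2 / m

def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected estimator; still defined when m == 0."""
    if not 0 <= m <= min(n1, n2):
        raise ValueError("overlap m must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts: 400 cases in one collection, 300 in another,
# 240 of whom link to the same person in both.
n1, n2, m = 400, 300, 240
estimate = chapman(n1, n2, m)
print(f"Estimated total cases: {estimate:.0f}")
print(f"Completeness of source 1: {n1 / estimate:.0%}")
```

If the sources are not independent (for example, if people captured by one are more likely to be captured by the other), the estimate will be biased, so this is a starting point rather than a finished method.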
How does Data Linkage Services ensure the validity of its links?
The Department of Health employs a variety of approaches and tools to ensure that the links we make between records and chains are of the highest quality.
For more detailed information, please refer to the linkage quality paper available for download from our Publications page.
What data items are required for ad-hoc data linkage?
Data linkage is most often performed using probabilistic techniques, but it can also be done deterministically or pseudo-deterministically.
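Probabilistic linkage is commonly based on the Fellegi-Sunter model: each compared field contributes log2(m/u) when two records agree on it and log2((1 - m)/(1 - u)) when they disagree, where m is the probability of agreement for records belonging to the same person and u the probability for records belonging to different people. The summed weight is then compared against upper and lower thresholds. The Python sketch below is a simplified illustration of that model, not a description of the Department's linkage engine: the m/u values, thresholds and field names are hypothetical, and exact string equality stands in for the approximate comparators and blocking that production systems use.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.92, 0.02),
    "dob": (0.97, 0.001),
    "postcode": (0.85, 0.05),
}

def match_weight(rec_a: dict, rec_b: dict, probs: dict = FIELD_PROBS) -> float:
    """Sum Fellegi-Sunter agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value adds no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def classify(weight: float, upper: float = 15.0, lower: float = 5.0) -> str:
    """Hypothetical thresholds: link, non-link, or send for clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"

a = {"surname": "SMITH", "first_name": "JANE", "dob": "1970-05-01", "postcode": "6000"}
b = {"surname": "SMITH", "first_name": "JAYNE", "dob": "1970-05-01", "postcode": None}
w = match_weight(a, b)
print(f"{w:.2f}", classify(w))
```

A deterministic approach would instead require exact agreement on a fixed set of fields. The probabilistic version lets a strong match on some fields outweigh a disagreement on another, such as a spelling variant of a first name.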
To achieve the maximum possible linkage rate, mandatory data fields include:
- Unique record ID
- First name(s)
- Surname(s)
- Date of birth
- Address
- Postcode
- Fields related to core linked datasets (e.g., UMRN)
If only some of these fields can be provided, linkage can still be done; however, the linkage rate is often much lower. Additionally, where limited demographic information is provided, a record might link equally well to two or more individuals. In such cases, the record will not be linked at all, to avoid the overprovision of data.
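The rule about records that link equally well to more than one person can be expressed as a simple decision over scored candidates. This minimal Python sketch assumes an earlier scoring step (not shown) has produced a weight for each candidate person; the `resolve` function, the acceptance threshold and the minimum margin are all hypothetical values chosen for illustration.

```python
def resolve(candidates, threshold=15.0, min_margin=2.0):
    """Choose one person for an incoming record, or None if it should stay unlinked.

    candidates: (person_key, weight) pairs from an earlier scoring step.
    threshold and min_margin are hypothetical values for illustration.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return None  # no candidate is a convincing match
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # links about equally well to two or more people
    return ranked[0][0]

print(resolve([("P1", 22.0), ("P2", 9.0)]))   # clear best candidate
print(resolve([("P1", 22.0), ("P2", 21.5)]))  # ambiguous, left unlinked
```

Leaving an ambiguous record unlinked costs some linkage rate, but it avoids attaching one person's records to another person's data.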
Other recommended data items are not compulsory for linkage but are valuable because they can increase linkage quality, completeness, and efficiency. These include, for example:
- Person ID
- Phone number
- Unit medical record number (UMRN)
- Medicare number
- Sex
- Date of event or service
- Middle name(s)
- Suburb
See our Information for Data Providers page for more details.
Our Data
What datasets are linked?
Data Linkage Services routinely links more than twenty health data collections. Electoral, birth and death records are also linked under special arrangements with the WA Electoral Commission and the WA Registry of Births, Deaths and Marriages.
Information on data collections routinely linked by the Department of Health is available on our Available Datasets page.
New demographic information is received on a regular basis to keep links in the WA Data Linkage System (WADLS) as up to date as possible.
What is "core" data?
We refer to the population health data collections managed within the Department of Health as the “core” data.
Electoral records, birth and death registrations are also considered to be “core” data.
Who do I contact to discuss data variables?
Queries related to data variables should be directed to the relevant Data Custodian(s).
Please refer to our Available Datasets page for Custodian contact information.
Where can I find SEIFA and RA codes?
Socio-Economic Indexes for Areas (SEIFA) and Remoteness Area (RA) mapping files are available for download from the Australian Bureau of Statistics. You can find links to these files in the ‘Geocoding Resources’ section of the Dataset Information page.
Data collections which are routinely geocoded to include Statistical Areas (SA1 or SA2) are compatible with these SEIFA/RA mapping files.
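Because SEIFA and RA files are keyed by statistical area, attaching them to a geocoded collection is a lookup on the SA code. The Python sketch below shows the idea with a hypothetical mapping from SA1 code to Index of Relative Socio-economic Disadvantage (IRSD) decile; the field names, codes and values are illustrative only, and the real ABS files have their own layouts. The mapping file should be for the same ASGS edition as the SA codes in your data.

```python
# Hypothetical mapping extract: SA1 code -> IRSD decile. Real ABS files have
# their own layout; load them with whatever reader suits the format.
IRSD_DECILE_BY_SA1 = {
    "50101100001": 7,
    "50101100002": 3,
}

records = [
    {"lkey": 101, "sa1": "50101100001"},
    {"lkey": 102, "sa1": "50101100002"},
    {"lkey": 103, "sa1": None},  # record that could not be geocoded
]

def attach_area_measure(rows, mapping, code_field="sa1", out_field="irsd_decile"):
    """Add an area-level measure to each row by looking up its SA code.
    Rows with a missing or unknown code get None rather than being dropped."""
    return [{**r, out_field: mapping.get(r.get(code_field))} for r in rows]

for row in attach_area_measure(records, IRSD_DECILE_BY_SA1):
    print(row)
```

Keeping unmatched rows with a `None` value, rather than dropping them, makes it easy to report how many records could not be assigned an area measure.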
How does geocode "match score" relate to the "radius" previously supplied?
Both “match score” and “radius” indicate the accuracy of the geocode data. Match score indicates the quality of the inputs (address details) to the geocoding algorithm, whilst radius gives a rough error or tolerance for the geocode, expressed as a distance (e.g., 5 km). See MatchCodes and MatchScores for more details.
Applying for Data
Who can access linked data?
Access to linked data is granted to applicants who have obtained approval from the relevant Data Custodians to ensure the data requested is appropriate for the purpose of the project.
For research projects, approval is also required from the WA Health Central Human Research Ethics Committee and from the Research Governance Office.
Other approvals may be required depending on the nature of the request and the data collection(s) being applied for. Strict protocols must be followed to ensure the confidentiality and security of linked data.
For more information on data applications for research purposes, please visit our Application Process page.
I want to apply for linked data - what are the next steps?
A comprehensive guide to applying for research projects is available on our Application Process page.
For non-research applications, contact us to discuss data application requirements with a Project Coordinator.
Once you are familiar with the application process, download relevant forms and variable lists from the Application Forms page to complete and submit to ISPD Client Services.
Why can't I access certain variables?
There are some variables contained in the data collections which are deemed to be identifiable or potentially identifiable (e.g. name, full date of birth, address).
The National Health and Medical Research Council (NHMRC) National Statement states that the public benefit of using personal health information must outweigh the risk to privacy; therefore, wherever possible, only non-identifiable data will be released for medical and health research.
How do I know the progress of my project?
Researchers can enter their project number into the Project Application Tracker to check the progression of their application.
Non-research applicants are advised to contact the ISPD Client Services team for status updates.
How much will my project cost?
In recognition of the system-wide benefits of providing data services to generate evidence and inform service planning and policy, Data Linkage Services has implemented a partial cost recovery of 15% for all projects, and a cap on the total cost of project delivery. Costs are determined according to set criteria, which are outlined in the Prioritisation and Costing Framework (PDF, 368 KB).
We recommend that potential applicants check our Price Estimate Calculator (found on page 4 of the Project Application Tracker) to see how charging principles are applied.
Please note that prices in the calculator are an approximation only – a more accurate estimate is provided to applicants once Data Custodians have given in-principle support.
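As a rough illustration of how a percentage recovery rate and a cap interact, the Python sketch below charges 15% of a project's full cost up to a ceiling. It is a simplification under stated assumptions: that the 15% applies to a single full-cost figure and that the cap is a fixed dollar amount supplied by the caller. The actual criteria are set out in the Prioritisation and Costing Framework, and the Price Estimate Calculator remains the reference for estimates.

```python
from decimal import Decimal, ROUND_HALF_UP

RECOVERY_RATE = Decimal("0.15")  # partial cost recovery rate stated on this page

def estimated_charge(full_cost: Decimal, cap: Decimal) -> Decimal:
    """Charge 15% of an assumed full project cost, but never more than the cap.
    Both inputs are hypothetical; real figures come from the Framework."""
    charge = min(full_cost * RECOVERY_RATE, cap)
    return charge.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical figures for illustration only.
print(estimated_charge(Decimal("40000"), cap=Decimal("10000")))   # below the cap
print(estimated_charge(Decimal("100000"), cap=Decimal("10000")))  # cap applies
```

`Decimal` is used instead of floating point so that currency amounts round predictably to the cent.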
How are projects prioritised?
A prioritisation framework is used to ensure consistency and transparency in actioning requests for data.
Considerations such as availability of resources, existing data request commitments, the complexity of the request, and the purpose for which the data is required, will affect prioritisation.
The criteria for prioritisation are outlined in detail in our Prioritisation and Costing Framework (PDF, 368 KB).
How long does it take to deliver data requests?
We endeavour to deliver data for each project within 6 months of the receipt of formal Data Custodian approval, subject to the applicant finalising research governance approvals in a timely manner (where applicable).
An estimated timeline is provided to applicants in line with the principles of the Prioritisation and Costing Framework (PDF, 368 KB).
Exceptional circumstances, such as an escalation for further review, failure of an applicant to respond to feedback, or delays in the provision of external data, may warrant a longer timeframe.
Receiving Data
How will my data be delivered?
All complete data files are delivered to the relevant analyst via a secure online file transfer system, e.g., SURE or MyFT. Files are encrypted and the password is sent separately.
There is something wrong with my data - who do I contact?
If you have identified an issue with data you have received from Data Linkage Services, please immediately contact the ISPD Client Services team.
Research
How do I add personnel to, or remove personnel from, my research project?
Change of project personnel requires review and approval from the WA Health Central Human Research Ethics Committee (HREC) and the Research Governance Office (RGO).
To add or remove personnel, submit an online Amendment Form to the Department of Health HREC Executive Officer via the WA Health Research Governance System (RGS).
For more information, see our Amendments page.
I have discovered a breach of protocol. What do I do?
Please immediately contact the ISPD Client Services team for advice.
Publications
What am I allowed to release in publications?
No information that will directly or indirectly identify individuals should be released in publications. When there are a small number of people in a study group, be careful about describing details (e.g., cause of death) in the text. The same applies to tables, graphs and maps.
Cell suppression for small cell counts is generally applied to all project outputs, unless you have obtained specific approval from the ethics committee to publish such information.
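Primary cell suppression can be applied mechanically before outputs are drafted. The Python sketch below masks counts from 1 to 5 as "<=5", matching the minimum standard of expressing small counts as <=5; whether zero counts must also be masked is treated here as an assumption and exposed as a parameter. The function name and example counts are hypothetical. Masking individual cells is not always sufficient: if row or column totals are published, a suppressed value may be recoverable by subtraction, so further suppression may be needed.

```python
def suppress_small_counts(table, threshold=5, suppress_zero=False):
    """Return a copy of a {category: count} table with counts of 1..threshold
    replaced by the string '<=threshold'. Whether zeros must also be masked is
    an assumption to confirm for your project (see suppress_zero)."""
    masked = {}
    for category, n in table.items():
        is_small = 0 < n <= threshold or (suppress_zero and n == 0)
        masked[category] = f"<={threshold}" if is_small else n
    return masked

counts = {"Metropolitan": 48, "Regional": 4, "Remote": 0}  # hypothetical counts
print(suppress_small_counts(counts))
```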
How do I acknowledge Data Linkage Services and the Department of Health WA in publications?
Acknowledging Data Linkage Services, the Department of Health WA and other data collections in publications is a requirement for secondary use of data or information under section 3.1.50 of the National Health and Medical Research Council’s Australian Code for the Responsible Conduct of Research, and is part of the undertakings signed by Principal Investigators.
The acknowledgement will vary according to the individual project, but here are some examples:
Acknowledgement 1: Standard Project | The authors wish to thank the staff at Data Linkage Services, the Department of Health WA, and [insert names of Data Collections involved]. |
Acknowledgement 2: More complex project | The authors wish to thank the Linkage, Data Outputs and Client Services teams at Data Linkage Services Western Australia, in particular [insert names of staff who provided extra help], as well as [insert names of Data Collections/Custodians involved]. |
Acknowledgement 3: Required where Cause of Death Unit Record File (COD URF) data has been used for analysis | The authors wish to thank the Australian Co-ordinating Registry, the Registries of Births, Deaths and Marriages, the Coroners, the National Coronial Information System and the Victorian Department of Justice and Community Safety for enabling COD URF data to be used for this publication. |
Acknowledgement 4: National study using data collections from multiple states | The authors wish to thank the staff of the data linkage units of the State and Territory health departments (WA, Victoria, SA-NT, NSW, QLD) for the linkage of the data. Further, we thank the data custodians for the provision of the following data: |
We also encourage Data Applicants to acknowledge the people of Western Australia, whose data is being used for these projects.
What things should I check in my draft output prior to publication?
Please review your draft outputs prior to publication to ensure:
- The draft output checklist has been completed with relevant information;
- Small cell counts have been suppressed (at a minimum, expressed as <=5), unless you have been granted approval from the WA Health Central HREC to publish smaller counts;
- Data Collections are appropriately named (refer to Available Datasets);
- Data Collections and the Department of Health are included in the acknowledgements section (not required for abstracts);
- The output is consistent with the aims and methodology detailed in your approved Application for Data and ethics application.
What should I include in a data availability statement in publications?
The following wording can be used for data availability statements in publications:
“The datasets generated and/or analysed during the current study are not publicly available due to the terms of the ethics approval granted by the WA Health Central Human Research Ethics Committee (HREC) and data disclosure policies of the Data Providers. The datasets may be available from the corresponding author upon request and subject to approval from the HREC and relevant custodians.”