ENSURE THE QUALITY OF YOUR DATA
Ensuring data quality involves considering what could go wrong during data handling and how risks can be managed.
For example:
- With interview data, one risk is that the content changes during transcription: the transcriber might accidentally skip part of the recording, so content goes missing.
- In a literature review, one risk is that key research articles are omitted because of an inadequate search strategy, which would compromise the reliability and consistency of the dataset.
Consider the following questions in advance:
- How will you ensure that the original raw data remains untouched?
- How will you keep the data immutable during processing, that is, protected from accidental changes?
- How will you ensure the data remains error-free throughout its lifecycle?
- How will you ensure your dataset is consistent and coherent?
- Are there risks that could compromise the reliability of the data content?
Best practices for ensuring data quality:
- Make backups of your data so you can revert to a previous version if something goes wrong.
- Store data and backups on the university’s U-drive, which is secure and automatically backed up.
- Make a copy of the raw data or initial state and work with the copy.
- Check that the original data content remains intact when transferring data between systems, software, file formats, or physical locations.
  - For example, when exporting survey results from a survey tool to analysis software, or when transcribing interview recordings.
- Review transcriptions of recorded (audio/video) materials yourself or together with your thesis partner.
  - Note: If the data contains personal data (e.g., a participant's voice), only people who are authorized to handle that data may review it.
  - In joint thesis projects, reviewing transcriptions together is good practice.
- Ensure that the interview structure and questions are as consistent as possible across all interviewees.
- Check the calibration of measurement devices.
- Use checksums, if your software provides them, to verify that a file has not changed after it was copied or transferred.
- Ensure that digitized data accurately reflects the original physical or analog material.
  - For example, when copying citations from a physical book into your word processor.
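
Two of the practices above, working on a copy of the raw data and using checksums, can be combined in a few lines of code. The following is a minimal Python sketch, not a required method: the function names `sha256_of` and `make_working_copy` are hypothetical, and the choice of SHA-256 is an assumption (any checksum that your tools support would serve the same purpose).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large recordings do not fill the memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(raw: Path, work_dir: Path) -> Path:
    """Copy the raw file into work_dir and check that the copy is identical.

    Hypothetical helper for illustration: the raw file is only read,
    never modified.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    copy = work_dir / raw.name
    shutil.copy2(raw, copy)
    # Different checksums would mean the content changed during the copy.
    if sha256_of(raw) != sha256_of(copy):
        raise IOError(f"Checksum mismatch after copying {raw}")
    return copy
```

For instance, calling `make_working_copy(Path("raw/survey.csv"), Path("work"))` (both paths are placeholders) would create `work/survey.csv` and raise an error if the copy differed from the original. Storing the checksum of the raw file alongside it also makes it possible to check later that the original has remained untouched.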
Source: University of Helsinki