Description and quality of data


Start your data management plan by introducing your research data and adopting good practices to ensure the quality, consistency, and immutability of the data.

INTRODUCE AND DESCRIBE YOUR DATA

  • What kind of data do you produce or use?
    • Do you produce data by conducting interviews, observations, or surveys? Or do you perhaps produce it through measurements or by writing code?
    • Does producing data involve human participants? How will you contact and recruit them in your research?
    • Are you using existing datasets that others have collected or compiled, or are you perhaps analysing artworks? How will you access the data you wish to use?
    • Are you creating new parts of the dataset based on raw data, such as transcriptions, tables, charts, or visualisations?
  • How much storage space will the data require?
    • Estimate the volume in gigabytes, or the physical space needed for non-digital material; a rough estimate is sufficient.
    • If the data requires an unusually large amount of space, arrange the necessary storage right away.
    • Usually, the personal U-drive provided by the University of Jyväskylä offers sufficient storage space.
  • Do you need special software tools for collecting, processing, or producing the data?
    • Familiarize yourself with the required software well in advance.

ENSURE THE QUALITY OF YOUR DATA

Ensuring data quality involves considering what could go wrong during data handling and how risks can be managed.

For example:

With interview data, one risk is that the data is altered during transcription: the transcriber might accidentally skip part of the recording, leaving content missing. In a literature review, a risk could be that key research articles are omitted because of an inadequate search strategy, which would compromise the reliability and consistency of the dataset.

Consider the following questions in advance:

  • How will you ensure that the original raw data remains untouched?
  • How will you prevent accidental changes to the data during processing, i.e. keep it immutable?
  • How will you ensure the data remains error-free throughout its lifecycle?
  • How will you ensure your dataset is consistent and coherent?
  • Are there risks that could compromise the reliability of the data content?

Best practices for ensuring data quality:

  • Make backups of your data so you can revert to a previous version if something goes wrong.
  • Store data and backups on the university’s U-drive, which is secure and automatically backed up.
  • Make a copy of the raw data or initial state and work with the copy.
  • Check that the original data content remains intact when transferring data between systems, software, file formats, or physical locations.
    • For example, when exporting survey results from survey software to analysis software, or when transcribing interview recordings.
  • Review transcriptions of recorded (audio/video) materials yourself or together with your thesis partner.
    • Note: If the data contains personal data (e.g., a participant's voice), only people authorised to process that data may review it.
    • In joint thesis projects, reviewing transcriptions together is a good practice.
  • Ensure that the interview structure and questions are as consistent as possible across all interviewees.
  • Check the calibration of measurement devices.
  • Use checksums, if your software provides them, to verify that files have not changed during transfer or storage.
  • Ensure that digitized data accurately reflects the original physical or analog material.
    • For example, when copying quotations from a printed book into your word processor.
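For readers comfortable with a little scripting, the practices of working on a copy and verifying it with a checksum can be sketched in Python. This is a minimal illustration, not a university-provided tool: the helper names sha256_of and copy_and_verify are hypothetical, and the sketch assumes your files sit on a drive that Python can read. It copies a raw data file, computes a SHA-256 checksum of both the original and the copy, and reports whether they match; a mismatch means the copy is not identical to the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read the file piece by piece so large recordings fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(original: Path, working_copy: Path) -> bool:
    """Copy the raw file (hypothetical helper) and check the copy is identical."""
    shutil.copy2(original, working_copy)  # copies contents and file metadata
    return sha256_of(original) == sha256_of(working_copy)
```

For example, copy_and_verify(Path("raw/interview_01.wav"), Path("work/interview_01.wav")) (hypothetical file names) returns True only if the working copy matches the original byte for byte. Recording the checksum of each raw file at collection time also lets you check later that the original has stayed untouched.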

Source: University of Helsinki

CHECKLIST

  • Write a text describing your research data.
    • This text will help you identify the essential characteristics of your dataset and plan your data management accordingly.
  • Ensure the quality, consistency, and immutability of your research data.
    • Think ahead about the stages of data collection and processing where there is a risk that data quality could be compromised.
    • Include in your plan practices that will help you manage these risks.

This section relates to the FAIR principles Findable, Accessible, and Reusable.