Data management plan - General description of the data

What types of research data do you use and produce? Do they contain sensitive or confidential information? Are you reusing existing data from other sources? In what file format are the data? How much data will be accumulated during the research? How is their quality ensured?

What kinds of data do you use in your research?

The concept of research data is very broad. Familiarise yourself with its definition before working on the first section of the data management plan.

The data management plan (DMP) begins with the identification and description of your data. Key aspects to cover include the origin of the data, usage rights, types of data, personal data, file formats, and what happens to the data at the end of their lifecycle.

Data should be classified and briefly described according to their different types (e.g., pre-existing base data that you reuse, raw data you collect yourself, processed analysis data). You can use a table or a list. It is advisable to name the types of data so that they can be easily referenced later in the DMP.

An example of a data overview

Data type	Source	Personal data/Confidentiality/secrecy	File format (open formats recommended)	Estimated size/accumulation	End of lifecycle and availability
Analysed DNA sample	Self-produced from DNA raw data	No	.xlsx, .csv	2 GB	Will be published openly
A statistical data set X	Ready-made data set from the FSD archive	No	SPSS (.por, .sav)		Personal copy destroyed
An e-survey	Collected from the study participants	Yes, contains information about the respondent's health	.csv	5 MB	Will be archived (restricted access)
Interview recording (video)	Collected from the study participants	Yes	.avi, .mp4		Destroyed after transcription
Interview transcription	Drawn from the original recordings	No	.csv, .txt, .xlsx	>10 MB	Will be archived (restricted access)
Photograph	Self-produced observation data	Yes	.tif, .jpeg, .gif, .raw

In smaller projects, this type of table is convenient. In larger projects, it provides a good foundation for a more thorough data inventory that enables good data management throughout the research project.

Consistency and quality of data

Ensuring the quality and consistency of data involves actions to prevent structural and content errors that could impair their readability, comprehensibility, and usability. Although this question is asked of all users of the DMP templates of the Research Council of Finland or Science Europe, it is understandably not relevant for all types of data.

The DMP should describe the methods used to ensure that data are obtained from the source intact and unaltered, and how their content and accuracy are maintained throughout their lifecycle. Quality-related issues can arise from errors in technical processing (such as device calibration), during data conversion from one format to another, or during contextual processing and analysis.
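One common way to show that files have stayed intact and unaltered is to record a checksum for each file when it is received and to compare it again later. The following Python sketch is an illustration only, not a prescribed method: the function names and the idea of keeping a manifest per data directory are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """List files from the manifest that are missing or no longer match."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In practice, the manifest would be built once when the data arrive, stored alongside the documentation, and checked with `changed_files` after each transfer or format conversion. An empty result means that no recorded file has changed.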

Examples of best practices:

Version control is used for the data. It starts with shared naming and documentation practices and, in some cases, extends to Git repositories, which make it possible to revert to earlier versions.

Measuring devices are always calibrated precisely according to the laboratory’s work protocol.

When converting analogue data to digital format, the highest possible resolution is used to maintain accuracy.
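The shared naming practice mentioned in the version control example can also be checked automatically. The sketch below assumes a purely hypothetical convention, `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>`; the pattern and the function names are illustrative, and a real project would substitute its own agreed rules.

```python
import re
from typing import Optional

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
# The date part is only checked for shape (eight digits), not for validity.
NAME_PATTERN = re.compile(
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<datatype>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)"
)


def parse_name(filename: str) -> Optional[dict]:
    """Split a conforming file name into its parts, or return None."""
    match = NAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


def nonconforming(filenames: list) -> list:
    """Return the file names that do not follow the convention."""
    return [name for name in filenames if parse_name(name) is None]
```

Running `nonconforming` over a directory listing before archiving gives a quick list of files that need renaming, and `parse_name` exposes the version number for files that already conform.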