Original research data are typically collected with, for example, questionnaires, interviews, video recordings and various devices and sensors. However, interpreting and responsibly using the resulting data requires additional context information, that is, documentation. Documentation is an up-to-date description of the methods, compilation, structure and handling of the material during the research. Depending on the study, it can include, for example:
- a written description of variables, key vocabulary and measurement units
- a tabular inventory of the basic information, layout and implementation of the interviews
- code books, field diaries and laboratory diaries
- technical metadata produced by the devices themselves, for example about device calibrations.
A README file (for example in .txt or .doc format) can be considered the minimum documentation. It should be located in the main folder of the data directory. It should briefly state the basic metadata of the data (name of the project, authors, owner of the data, link to the metadata published in JYX) and describe the folder structure and the naming conventions for files and folders. Other documentation related to the use of the data can be included in the same file if that is convenient. Often, however, separate documentation files are preferable, placed either in a clearly named subfolder (/DOCUMENTATION) or as their own files within the individual data subfolders (for example, the subfolder of each test contains the laboratory diaries of that test in addition to the measurement results).
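As an illustration of the README contents listed above, the following is a minimal sketch in Python that assembles such a file as plain text. The function name `build_readme`, the field labels and all example values (project name, folder names, naming pattern) are hypothetical placeholders chosen for this sketch, not a required format.

```python
def build_readme(project, authors, owner, metadata_link, folders, naming_convention):
    """Return README text with the basic metadata, folder structure and naming conventions."""
    lines = [
        f"Project: {project}",
        f"Authors: {', '.join(authors)}",
        f"Data owner: {owner}",
        f"Metadata record: {metadata_link}",
        "",
        "Folder structure:",
    ]
    # One line per folder: its name and a short description of what it holds.
    for name, description in folders.items():
        lines.append(f"  /{name}: {description}")
    lines += ["", "Naming conventions:", f"  {naming_convention}"]
    return "\n".join(lines) + "\n"


# All values below are placeholders (assumptions for illustration only).
readme_text = build_readme(
    project="Example project (placeholder)",
    authors=["First Author", "Second Author"],
    owner="Example research group (placeholder)",
    metadata_link="<link to the metadata record in JYX>",
    folders={
        "DOCUMENTATION": "code books, laboratory diaries, data inventory",
        "test_01": "measurement results and the laboratory diary of test 1",
    },
    naming_convention="<date>_<test>_<description>.<extension>",
)

print(readme_text)
```

The resulting text could be saved as README.txt in the main folder of the data directory and then completed or corrected by hand.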
The data inventory is also a key part of the documentation: it collects information about all the data used in the research project and their key characteristics. It is an important tool for the project itself and an invaluable aid when preparing other documentation.
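A first draft of a data inventory can be generated from the data directory itself. The sketch below is one possible approach, not a prescribed method: it lists every file with its format and size in a CSV table and leaves a description column to be filled in by hand. The column names and the helper names `inventory_rows` and `rows_to_csv` are assumptions made for this example, and the demonstration uses a temporary directory with made-up files. A real inventory would usually also cover characteristics that cannot be read from the file system, such as the origin of the data or the presence of personal data.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical column set; adapt it to the needs of the project.
FIELDS = ["file", "format", "size_bytes", "description"]


def inventory_rows(data_dir):
    """Return one inventory row per file found under data_dir."""
    root = Path(data_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(root).as_posix(),
                "format": path.suffix.lstrip(".").lower(),
                "size_bytes": path.stat().st_size,
                "description": "",  # to be written by the researcher
            })
    return rows


def rows_to_csv(rows):
    """Serialise the inventory rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Demonstration on a temporary directory containing made-up example files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test_01").mkdir()
    (Path(tmp) / "test_01" / "results.csv").write_text("t,value\n0,1.5\n", encoding="utf-8")
    (Path(tmp) / "README.txt").write_text("Project: example\n", encoding="utf-8")
    rows = inventory_rows(tmp)
    csv_text = rows_to_csv(rows)

print(csv_text)
```

The CSV text can be saved, for example, in the documentation subfolder and maintained as the project's data change.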
All published data should be accompanied by sufficient documentation to understand them. The documentation, or parts of it, can be published even when the data themselves cannot be published for justified reasons (e.g. because they contain personal data). In this way, researchers can tell the world more about their data and expertise.
The starting point for documentation should be the question of whether another researcher would understand what the data are about, what they contain, and how and for what purposes they can be used responsibly. What needs to accompany the data to make this possible? Examples of documentation from different fields, as well as a few external links on the subject, can be found below on this page.