Documentation and metadata

Documentation and metadata help keep your work smooth and organised, and they support the reproducibility and reliability of your research.

DOCUMENTATION

In scientific research, it is required that the entire research process is documented in such detail that the research design can be replicated later in the same way. This ensures that the validity of the research results can be verified.

The requirement for documentation also applies to research data, as it is an essential part of your research process. The production, structure, processing, and analysis of the data must therefore be described in a form that is understandable to others and in such detail that this process could (at least in principle) be replicated.

Documentation increases the reproducibility, transparency, and reliability of your research results. It also serves to demonstrate that the research data has been produced and processed ethically, in accordance with good scientific practice and the requirements of legislation. Therefore, documentation is an integral part of data management.

Decide in your data management plan how you intend to document the data management and analysis process so that you can describe it accurately and in detail in your thesis. In practice, this means keeping a record of what you did, when, how, why, and with whom.

Different disciplines may have their own practices related to documentation. If your field does not have a specific practice in place, you can use, for example, a formal research diary, laboratory notebook, text document, or Excel spreadsheet. Whatever method you choose, your diary/document must include a comprehensive description of the entire data management process and lifecycle.

Documentation also benefits your own work. For example:

  • When you record the themes of interviews, you can later easily find the specific interview where topic X was discussed.
  • When you record the variables you used, their definitions, and any changes you made (for example, if you originally asked for the background variable “age” in years but later decided to group ages into five-year intervals), you will remember to handle them consistently in your analysis.
  • When you record every step where you modified the data, you can always return to the correct version of the dataset.
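As an illustration of the last two points, the age-grouping change and its record in a processing log could be sketched as follows. This is only a minimal sketch: the function names, the log structure, the sample ages, and the stated reason for the change are hypothetical, and a research diary or spreadsheet serves the same purpose.

```python
from datetime import date
from typing import Optional


def age_group(age: int, width: int = 5) -> str:
    """Return a fixed-width interval label for an age, e.g. 23 -> '20-24'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical processing log: one entry per change made to the data.
processing_log = []


def log_step(what: str, why: str, when: Optional[date] = None) -> dict:
    """Record what was changed, why, and when (defaults to today)."""
    entry = {
        "date": (when or date.today()).isoformat(),
        "what": what,
        "why": why,
    }
    processing_log.append(entry)
    return entry


# Made-up ages, used only to show the recoding.
ages = [19, 23, 25, 41]
age_groups = [age_group(a) for a in ages]

log_step(
    what="Recoded 'age' (years) into five-year groups as 'age_group'",
    why="Illustrative reason: coarser groups are sufficient for the analysis",
    when=date(2024, 3, 1),
)

print(age_groups)  # ['15-19', '20-24', '25-29', '40-44']
```

Keeping the original variable and adding the grouped one as a new variable means you can still return to the earlier version of the data.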

Your work will become smoother, more systematic, and better organised. You will always know what you are doing, what your dataset contains, and what you need at any given time. 

  • If there is a break in your thesis process, you can return to the data without difficulty.
  • And if you are writing your thesis with a partner, as part of a research group, or in collaboration with a company, your partners will also easily understand what you have done and what you plan to do next.

METADATA

Recording, updating, and maintaining the basic descriptive metadata of your dataset is an essential part of data management documentation.

  • Careful description improves the quality of the dataset and is crucial for reproducibility.
  • Metadata makes your research data understandable and usable.
  • Imagine your dataset as a sealed package whose contents you do not know. Metadata is like the label on the package that tells you what’s inside.
  • Metadata is “data about data”: in this case, descriptive information about your research dataset.

Description is especially important if the dataset will be published, but it is also useful for supporting your own work.

  • If the research data is opened and/or published, it must be accompanied by descriptive information, prepared (if possible) in both human-readable and machine-readable formats, so that the dataset can be identified and discovered.
  • If your goal is to archive or publish the dataset for future reuse, you must pay special attention to recording metadata.
  • For Master's theses, research data is usually not prepared for reuse, so metadata primarily serves as your own working tool.
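To show what "human-readable and machine-readable" can mean in practice, the sketch below stores one set of descriptive metadata and renders it both as JSON and as plain text. The field names and values are invented for illustration and do not follow any particular metadata standard; a data archive will normally prescribe its own schema.

```python
import json

# Hypothetical descriptive metadata; field names are illustrative only.
metadata = {
    "title": "Example interview dataset",
    "creators": ["Student A"],
    "collected": {"start": "2024-01-15", "end": "2024-02-28"},
    "keywords": ["social workers", "online services"],
    "license": "CC BY 4.0",
}


def to_machine_readable(meta: dict) -> str:
    """Serialise the metadata as JSON so that software can parse it."""
    return json.dumps(meta, indent=2, ensure_ascii=False, sort_keys=True)


def to_human_readable(meta: dict) -> str:
    """Render the same metadata as plain text for a README."""
    lines = [
        f"Title: {meta['title']}",
        f"Creator(s): {', '.join(meta['creators'])}",
        f"Collected: {meta['collected']['start']} to {meta['collected']['end']}",
        f"Keywords: {', '.join(meta['keywords'])}",
        f"License: {meta['license']}",
    ]
    return "\n".join(lines)


print(to_machine_readable(metadata))
print(to_human_readable(metadata))
```

Both outputs come from the same source, so the two descriptions cannot drift apart when you update the metadata.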

Metadata acts as a table of contents for your dataset, helping keep files and folders organised.

  • With metadata, you ensure that you can find everything you need and interpret the dataset unambiguously, regardless of time or context.
    • If you only had the raw data without any explanatory information and took a break from your thesis, would you remember after a month what you were doing and what each file contains?

Example:

  • You have two photo collections – one of your own photos and another of photos taken by four research participants.
    • What basic information must you record to distinguish the collections and individual photos, so you can work efficiently and document your process?
    • You also made agreements with each participant specifying how you may use their photos. What information must you record so you can locate the correct agreement for the correct photo?
    • And if you agreed to share the dataset with a research group, what information must you record so that group members can understand your dataset?
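One possible answer to these questions is a small record per photo that names the collection, the photographer, and the agreement that covers the photo. The sketch below assumes hypothetical file names, pseudonyms, and agreement identifiers; the same information could equally be kept in a spreadsheet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhotoRecord:
    file_name: str
    collection: str              # "own" or "participant"
    photographer: str            # pseudonym, not a real name
    agreement_id: Optional[str]  # None for the researcher's own photos


# Hypothetical records for illustration.
records = [
    PhotoRecord("own_001.jpg", "own", "researcher", None),
    PhotoRecord("p01_001.jpg", "participant", "P01", "AGR-P01"),
    PhotoRecord("p02_001.jpg", "participant", "P02", "AGR-P02"),
]


def agreement_for(file_name: str, photo_records: List[PhotoRecord]) -> Optional[str]:
    """Return the agreement that governs a photo, or None if none is needed."""
    for record in photo_records:
        if record.file_name == file_name:
            return record.agreement_id
    raise KeyError(f"No metadata recorded for {file_name}")


def photos_in_collection(collection: str, photo_records: List[PhotoRecord]) -> List[str]:
    """List the file names that belong to one collection."""
    return [r.file_name for r in photo_records if r.collection == collection]


print(agreement_for("p02_001.jpg", records))         # AGR-P02
print(photos_in_collection("participant", records))  # ['p01_001.jpg', 'p02_001.jpg']
```

A photo without a record raises an error instead of returning nothing, so gaps in the metadata show up as soon as you look for them.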

At the start of your research, you may not be able to define all relevant metadata precisely. Record what you can in advance and update as your work progresses.

What if you use pre-compiled, archived, or published data?

For existing datasets, the archive or publishing entity has created the metadata, which you can find in the archive’s catalogue or dataset details.

  • However, any parts of the dataset you produce – such as tables – must be described by you.
  • You can store both the research data and its metadata files in the same location.
  • Metadata combined with documentation of the data management process serves as a user guide for your dataset.

Common types of metadata

A widely used method for recording metadata is a README file.

A README is a text file that contains basic descriptive information about the dataset and detailed instructions on how to interpret and handle it.

There are many types of metadata. Here are the most common ones:

General descriptive information

  • Name of the dataset – use a descriptive name
  • Name(s) of the researcher(s)
  • Other participants in the research process and their roles (e.g., transcriber)
  • When the data was collected (dd.mm.yyyy or date range)
  • Where the data was collected (e.g., “samples taken at location X” if relevant)
  • Keywords (e.g., social workers, parliamentary elections, online services)
  • How you document(ed) data collection and processing (e.g., formal research diary or Excel sheet)

Files and folders

  • Logic for naming folders and files (folder structure, version control)
    • Name files and folders consistently and descriptively
    • Short description of folder contents (e.g., .txt file saved in the folder)
    • If needed, description of naming conventions (e.g., .txt file saved in the folder)
  • Create a consistent folder structure and note where the data is stored
    • Example path: U:\Documents\Studies\Thesis\Data
  • If data or parts of it are stored elsewhere, note the location
  • For large datasets, you may need an index to help locate information
  • When folders and files were created (dd.mm.yyyy or date range)
  • When different versions were created (dd.mm.yyyy or date range)
  • For major changes or new versions, describe the changes in the file name or a separate text file, and in the research diary
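A naming convention is easier to follow when file names are generated and checked in the same way every time. The sketch below assumes one possible convention: date, short description, and two-digit version number. This is not a required format. The ISO-style date (yyyy-mm-dd) in the file name is an assumption chosen because it sorts chronologically, and the helper names are hypothetical.

```python
import re
from datetime import date

# Assumed convention: <yyyy-mm-dd>_<description>_v<NN>.<extension>
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_v\d{2}\.\w+$")


def build_file_name(description: str, created: date, version: int, extension: str) -> str:
    """Build a file name from a description, creation date, and version number."""
    # Lower-case the description and replace anything else with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{created:%Y-%m-%d}_{slug}_v{version:02d}.{extension}"


def follows_convention(file_name: str) -> bool:
    """Check whether an existing file name matches the assumed convention."""
    return NAME_PATTERN.match(file_name) is not None


name = build_file_name("Interview 3 transcript", date(2024, 3, 1), 2, "txt")
print(name)                                    # 2024-03-01_interview-3-transcript_v02.txt
print(follows_convention(name))                # True
print(follows_convention("final_FINAL2.docx")) # False
```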

Methods

  • Describe the methods used
  • How was the data collected?
  • What devices or software were used? With what settings? How were instruments calibrated?
  • How was the data processed and prepared for analysis?
  • Research and analysis methods are usually described in the thesis research plan, along with justification for sample selection

Instructions for using and interpreting the data

  • How is the data used in your research?
  • What devices or software are needed to view the data?
  • Are additional instructions needed for installing software and processing data?
  • Variables and their definitions
  • Explanations of codes, symbols, and abbreviations
  • Indication of missing or incomplete data
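The last three items are often collected in a codebook. As a minimal sketch, a codebook can be written as a simple mapping that also records how missing values are marked. The variable names, codes, and the missing-value code -9 are hypothetical examples.

```python
MISSING = -9  # hypothetical code meaning "no answer given"

# Hypothetical codebook: definition and value codes for each variable.
codebook = {
    "age_group": {
        "definition": "Respondent's age in five-year intervals",
        "codes": {1: "15-19", 2: "20-24", 3: "25-29"},
    },
    "uses_online_services": {
        "definition": "Whether the respondent uses online services",
        "codes": {0: "no", 1: "yes"},
    },
}


def decode(variable: str, value: int) -> str:
    """Translate a stored code into its documented meaning."""
    if value == MISSING:
        return "missing"
    return codebook[variable]["codes"][value]


row = {"age_group": 2, "uses_online_services": MISSING}
decoded = {name: decode(name, value) for name, value in row.items()}
print(decoded)  # {'age_group': '20-24', 'uses_online_services': 'missing'}
```

An undocumented variable or code raises a KeyError, which signals that the codebook needs updating.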

Data sharing and access information

  • Licenses governing data use (e.g., Creative Commons licenses)
  • Agreements defining copyright or related rights
  • Other restrictions on distribution and use (e.g., personal data)
  • Has the data already been published somewhere? How can it be accessed?
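The metadata types above can be turned into a README skeleton that you fill in as the work progresses. The following is a rough sketch only: the selection of fields, the layout, and the placeholder text are assumptions, not a prescribed template.

```python
# Section headings follow the metadata types listed above; the fields under
# each heading are an illustrative selection.
README_FIELDS = {
    "General descriptive information": [
        "Name of the dataset", "Researcher(s)", "Collection dates", "Keywords",
    ],
    "Files and folders": [
        "Folder structure", "Naming convention", "Storage location",
    ],
    "Methods": [
        "Data collection", "Devices and software", "Processing steps",
    ],
    "Instructions for using and interpreting the data": [
        "Variables and definitions", "Codes and abbreviations", "Missing data",
    ],
    "Data sharing and access information": [
        "License", "Agreements", "Restrictions",
    ],
}


def build_readme(values: dict) -> str:
    """Render a README, marking every field that has not been filled in yet."""
    lines = []
    for section, fields in README_FIELDS.items():
        lines.append(section.upper())
        for field in fields:
            lines.append(f"  {field}: {values.get(field, 'TO BE COMPLETED')}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


readme_text = build_readme({"Keywords": "social workers, online services"})
print(readme_text)
```

The resulting text can be saved as a plain-text file next to the data, and the remaining "TO BE COMPLETED" markers show which information is still missing.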

CHECKLIST

  • Plan ahead:
    • How will you organise your work?
    • How will you document the data management process and its lifecycle?
    • What descriptive metadata must you record to make your dataset understandable and usable?
  • Complete and update the information throughout the entire research process.
  • Is the dataset intended for reuse? Will you share it, for example, with a research group or deposit it in a data archive?
    • Pay special attention to the quality of your metadata.
      • It is not enough that only you understand how the data was produced, processed, and interpreted, and what tools are needed to use it in the future.
    • Find out what requirements your chosen data repository or archiving service has for metadata.

This section relates to all of the FAIR principles: Findable, Accessible, Interoperable, and Reusable.