Personal data

Do you study people?

If you collect data from or about a person, start from the assumption that you are processing personal data. You must handle personal data responsibly and comply with the EU General Data Protection Regulation (GDPR) and Finland’s national data protection legislation. Responsible processing of personal data is also one of the fundamental principles of ethically conducted research.

PERSONAL DATA

The definition of personal data is broad.

Personal data can include any information or characteristics related to a person that could make them identifiable. Precise details are not always necessary for identification. Identification can occur, for example, by combining information contained in the research data with additional information found on the internet. In such cases, the information in the research data is also considered personal data, even if it cannot, on its own, be linked to a specific individual or is not sufficient to identify a person.

If your research involves human participants or you collect data from or about people, it is likely that your research data contains personal data.

Personal data includes direct identifiers, which alone are sufficient for identification (such as a person’s name or social security number), and indirect identifiers, which may not be enough on their own but can lead to identification when combined with other data. Indirect identifiers—such as age or place of residence—are often collected as background information or variables.

In other words, some types of personal data alone are enough to identify a person, but all data that can be used to identify someone is considered personal data.

Certain types of research data are likely to involve personal data. For example, answers to open-ended survey questions may contain direct or indirect identifiers, and a recorded interview includes the participant's voice as a direct identifier. If the interview is recorded on video, the facial image becomes another direct identifier.

Belonging to a specific target group can be considered personal data if there is sufficient information to identify the individual. For example, if you ask secondary school physical education teachers about their personal exercise habits, their profession and exercise preferences/hobbies already serve as indirect identifiers. These indirect identifiers become personal data if the dataset includes other information that enables identification. Such information could be, for instance, place of residence—if there is only one physical education teacher in that area who practices mountaineering. Or, if the combination of hobbies is sufficiently unique (e.g., rock climbing, canoe polo, and Formula 1 racing), identification may be possible even without information on where the teacher lives.

Personal data does not need to be particularly secret or intimate. What matters is whether the person can be identified. Identifiability does not mean that anyone could identify the person—it is sufficient if only a family member or a colleague could do so.

Source: Data protection guidelines for researchers

NOTE: Guidelines for processing personal data do not apply to deceased or fictional individuals.

Justify the collection and processing of personal data

In your research plan, you describe the research design, research questions and methods, as well as the objectives of the study. Based on these, you must be able to justify the collection and processing of personal data.

Your dataset must not contain information that is unnecessary for your research. For example, if it makes no difference whether the participant has a cat or a dog as a pet, do not ask about it. Formulate the questions so that you do not collect unnecessary data by accident.

The purpose of data protection is not to prevent research, but to protect the participants. What matters is that the processing of personal data has a research-based and legal justification, and that the participant is aware of what will be done with their data.

It is common for research data to contain personal data. The goal is not to avoid personal data, but to be aware of the guidelines that must be followed when handling it.

When is a person identifiable?

This is a question you need to consider from the perspective of your own dataset. Remember that identification can occur by combining information from different sources—including sources outside your dataset. When reflecting on this question, you may want to reread this page up to this point.

Examples:

If the town in which the participant resides is mentioned, along with their profession, and the town is small and the profession is relatively rare, these two details could be enough to identify the person.

If the participant holds a position that only one person at a time can hold, such as the President of Finland, the person is easily identified by this detail alone. This does not mean that you cannot interview the President, only that you need to inform them that they are identifiable.
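The reasoning in the examples above can be sketched as a simple check that counts how many records share the same combination of indirect identifiers. This is only an illustration: the function name `rare_combinations`, the field names, the threshold `k`, and the records are all made up for this sketch, and a check like this cannot see information from outside the dataset, so it does not replace your own judgement.

```python
from collections import Counter


def rare_combinations(records, indirect_identifiers, k=2):
    """Return identifier combinations that occur in fewer than k records.

    The threshold k is an assumption chosen for illustration, not a legal limit.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, invented records (not real data).
participants = [
    {"town": "Smalltown", "profession": "teacher"},
    {"town": "Smalltown", "profession": "teacher"},
    {"town": "Smalltown", "profession": "glassblower"},
    {"town": "Bigcity", "profession": "teacher"},
    {"town": "Bigcity", "profession": "teacher"},
]

flagged = rare_combinations(participants, ["town", "profession"])
print(flagged)  # {('Smalltown', 'glassblower'): 1}
```

A combination that appears only once inside the dataset is a warning sign, but a combination that appears several times can still be identifying when combined with outside sources.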

Assess risks

Assessing whether the participants could be identified in the dataset is part of risk management.

Informal risk assessment is a part of all data collection. You should consider whether the collection and/or processing of data could pose risks to the participants, to yourself, or to third parties.

If the processing of personal data poses a high risk to the rights and freedoms of a person, a Data Protection Impact Assessment (DPIA) must be conducted before collecting or processing the data.

Such high-risk situations include, for example:

  • the use of new technologies (such as artificial intelligence) to process personal data
  • large-scale processing of personal data
  • the collection of location and geolocation data.

The need for a DPIA is assessed by conducting an initial mapping available on the University of Jyväskylä intranet.

The initial mapping includes a list of situations that may lead to a DPIA. If enough of the listed conditions are met, the DPIA becomes mandatory.

If necessary, complete the DPIA together with your supervisor.

Links to the assessment form (the initial mapping) and the DPIA template are available on the web page “Assessment tools to support data protection implementation.”

SPECIAL CATEGORIES OF PERSONAL DATA

If your dataset contains special categories of personal data, it is especially important to handle the data responsibly.

The privacy notice includes a specific section where you must indicate whether your dataset contains special categories of personal data.

Follow these protective measures (more details below):

  • Stricter data security requirements
  • Data minimisation: you may only collect special categories of personal data that are essential for conducting the research. The collection and processing must be proportionate to the research objectives.
  • Pseudonymisation should always be applied if it is feasible within the research design.

For example, if you study fatigue among university students, you might receive information about a participant's depression, anemia, or other health-related issues. This means your dataset would then contain special categories of personal data.

If you do not intend to collect special categories of personal data, plan the data collection so that such data does not end up in the dataset by accident.

If there is a risk that your dataset may include special categories of personal data, you must plan the data collection and processing assuming that such data will be present.

TEST: DOES MY DATA CONTAIN PERSONAL DATA?

Test whether your research data contains personal data or special categories of personal data:

“Do you process personal data” tool

  • You can change the language of the test in the top right-hand corner.
  • Remember that the test results are indicative.

OTHER SENSITIVE OR CONFIDENTIAL INFORMATION

Even if your dataset does not contain special categories of personal data, it may still include sensitive information or data that is legally confidential.

Sensitive topics may include, for example:

  • school bullying
  • domestic violence
  • financial difficulties
  • criminal convictions
  • substance use

Reflect on how sensitive your research topic is. Could sensitive issues arise in relation to the topic, even if they are not the actual focus of the study? For example, is there a risk that an interviewee might share sensitive information while answering your questions?

Apply the same protective measures to sensitive data as you would for special categories of personal data—for example, stricter data security requirements.

Legally confidential data may include, for example:

  • business trade secrets
  • information about endangered animal or plant species (location and protection measures)
  • data related to national security or defence

ETHICAL REVIEW

In some instances, a thesis must undergo an ethical review in advance.

This means submitting a request for the JYU ethics committee to review your research. In the case of a thesis, the thesis supervisor submits the request, and you prepare it together.

If your research meets even one of the criteria set by the Finnish National Board on Research Integrity (TENK), you must request an ethical review from the Ethics Committee for Human Sciences at the University of Jyväskylä. 

Criteria for an ethical review:

  • The principle of informed consent is not followed.
  • The research intervenes in the physical integrity of the participants.
  • The research targets individuals under the age of 15 without separate guardian consent, or without informing the guardian in a way that allows them to refuse the child's participation.
  • The research exposes participants to exceptionally strong stimuli.
  • There is a risk of causing mental harm to participants (or people close to them) beyond what is typical in everyday life.
  • The research may pose a safety risk to participants, the researcher, or people close to them.

These criteria are explained in more detail on the JYU ethics committee’s website.

The ethical review must be conducted before the research begins!

It is recommended to avoid topics that require a Data Protection Impact Assessment or an ethical review.

Also, in a bachelor's or master's thesis, it is generally not advisable to address highly sensitive topics, as they may be ethically too complex. Similarly, handling confidential data can be challenging—especially since the thesis itself is a public document. This is not necessarily about the student’s ability to handle ethically demanding topics or confidential data, but about what is feasible and appropriate within the scope of a thesis.

INFORMING RESEARCH PARTICIPANTS
 

By default, research participants (data subjects) have the right to know that their personal data is being processed, in other words, that your research data contains their personal data.

Informing the participant means that you explain the processing of personal data using a privacy notice. In addition, participants must be informed that they are subjects of research and what the objectives of the study are. For this, you use a research notification. The consent form ensures that the participant has decided to give their consent to participate in the study based on the information you have provided. The consent form should be given even if no signature is requested, as it provides additional details specifically related to consent.

According to law, a privacy notice must always be provided if the research material contains personal data. In addition to the privacy notice, the participant must be given a research notification and a consent form.

The University of Jyväskylä instructions are available on the web page “Instructions for students.” This educational resource summarises the essential parts of those instructions.

How to draft a privacy notice?

A privacy notice must be provided to the research participants.

  • The privacy notice is a form in which you explain to the participant, for example, what personal data is collected, who collects it, how it is processed, and how it is protected. This is part of informing the participant (i.e., the “data subject”).
  • The JYU data protection instructions include template forms (privacy notice, research notification, consent form).
    • Ask your supervisor for help if needed.
  • The thesis author is the data controller, meaning they are responsible for the processing of personal data.

Legal basis for processing personal data and template forms

  • The privacy notice must state the legal basis for processing personal data (in the section “Legal basis for processing personal data”).
  • In scientific research, the legal basis is typically public interest.
  • If the research plan of your thesis meets the criteria for scientific research in your field, as assessed by your supervisor, public interest may be used as the legal basis. In that case, use the template forms for Scientific research participants or subjects: privacy notice, research notification, and consent form.
  • In theses, consent may also be used as a legal basis. This depends on whether the thesis is considered scientific research.
  • Different faculties may have different practices. According to the university’s general data protection guidelines, Bachelor’s theses are generally not considered scientific research, whereas the supervisor may consider a Master’s thesis to be scientific research.
  • It is recommended to use public interest as the legal basis whenever possible.
  • If the legal basis is the participant’s consent, use the template forms for the privacy notice, research notification, and consent form found under “Participants in coursework or theses.”
    • If the data contains special categories of personal data, the legal basis must be explicit consent, which is a separate section in both the privacy notice and the consent form.
    • The template also includes legitimate interest as an option for legal basis, which is only suitable in specific situations and requires a so-called balancing test.

See the university’s data protection instructions for more information.

Why does the legal basis matter?

The participant’s rights are based on the legal basis stated in the privacy notice.
For example, if the legal basis is consent and the participant withdraws their participation, all collected data must be deleted, even if it is inconvenient. If the legal basis is public interest, previously collected data is not deleted, but further data collection is stopped.

If you are writing your thesis as part of a research group or using existing data, the data controller is often the university or another research organisation.

  • In such cases, you usually do not prepare the privacy notice yourself.
  • The research project may already have a privacy notice, and you may be listed in it as a data processor. You will then receive the data or part of it confidentially.
  • You sign a commitment (a contract on the processing of personal data) with the project.

If the participant is under 15 years old, guardian consent is usually required.

  • The privacy notice, research notification, and consent form must be provided to both the guardian and the child.
  • The child must be informed about the research in an age-appropriate way to ensure they understand what it involves.

Providing a privacy notice to research participants

The privacy notice can be, for example, attached to an email, handed out in paper form, or linked at the beginning of a Webropol survey.

If you want to include a link to the privacy notice in a Webropol survey, you can, for instance:

  • Share it via SharePoint.
  • Publish it on your personal JYU website, which the university makes available to students, and link to it from the beginning of the Webropol survey.
  • In Webropol, you need to enable the Text Editor to insert a link (two adjacent “T” letters in the top right corner).

If delivering the privacy notice to each participant would require unreasonable effort, publish it instead. Everyone has access to a personal JYU website, which can be used for this purpose.

For example, if you are studying comments on a public social media channel, you should then link the published privacy notice in the comments.

Consent to participate in research

Participants must always be asked for their consent to take part in the research. Consent must be documented, meaning it must be verifiable afterwards.

You can document consent in the following ways:

  • By requesting a signature on the consent form.
  • By asking for verbal consent at the beginning of the interview recording.
  • By including a mandatory checkbox at the beginning of a survey, e.g., “I have read the privacy notice, research notification, and consent form, and I give my consent.”

Consent to participate must be requested regardless of whether it is listed as the legal basis for processing personal data in the privacy notice. The participant must have sufficient information about the research before they can agree to take part. For example, they should know what data will be collected about them and why, and what participation in the research requires from them (e.g., how long it takes to complete the survey or how interviews are scheduled). These details are described in the research notification.

When consent is the legal basis for processing (i.e., the study is not considered scientific research), special attention must be paid to the content of the consent and how it is requested, because giving consent must be an active decision.

The consent form templates are included in the JYU data protection instructions.

PROTECTING PARTICIPANTS

As a researcher, your task is to protect the participants and the research data containing their personal data.
Practices that help protect participants include data minimisation, storing and processing the data securely and in compliance with data protection regulations, and anonymisation or pseudonymisation of personal data.

Data security

  • If your dataset contains personal data, the information cannot be stored just anywhere. For example, you cannot use personal, commercial cloud services like Google Drive or iCloud.
  • You also need secure software and devices for collecting and processing the data.
  • More information on data security can be found in the “Data security” section of this educational resource.

Data minimisation

  • Only collect personal data that is truly necessary for your research. Do not collect extra or irrelevant information.
    • For example, if the participant’s age is not relevant to your study, do not ask for it. It is usually better to collect information such as age or years of work experience in ranges, e.g., “5–10 years of work experience.”
  • Sometimes an interviewee may share more information than the interviewer asked for. In such cases, the extra data should be removed from the dataset.
  • Avoid situations where you collect special categories of personal data (such as health information) or sensitive data (such as personal experiences of domestic violence or bullying) combined with direct identifiers (such as voice or name).
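Collecting values in ranges, as suggested above, can be sketched as follows. This is a minimal illustration, assuming whole-number values and a range width of five; the function name `to_range` and the sample values are made up. Ideally the ranges are offered directly as answer options in the questionnaire, so that exact values are never collected.

```python
def to_range(value, width=5):
    """Map a non-negative whole number to a non-overlapping range label.

    The width of 5 is an assumption for illustration; choose ranges that
    are coarse enough for your own dataset.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical years of work experience.
years_of_experience = [0, 4, 5, 7, 12]
ranges = [to_range(years) for years in years_of_experience]
print(ranges)  # ['0-4', '0-4', '5-9', '5-9', '10-14']
```

The ranges do not overlap, so every value belongs to exactly one group.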

Pseudonymisation

Personal data must be pseudonymised whenever possible.

  • Names, place of residence, and other personal data of participants are replaced with codes.
    • The code key must be stored separately from the dataset in a secure location, such as a locked desk drawer. With the code key, it is still possible to identify individual participants in the dataset.
  • A code key is typically a list that includes, for example, the participant’s name and the corresponding alias or number. An alias might be “Interviewee 1,” “Interviewee 2,” etc.
  • In addition to coding direct identifiers, pseudonymisation also requires removing indirect identifiers from open-ended survey responses or interview transcripts.
  • Indirect identifiers can be removed by:
    • Classifying values into non-overlapping groups, e.g., age groups: “15–19 years,” “20–24 years,” etc.
    • Generalising details, e.g., “Viitasaari” → “a municipality in Central Finland”; “Cygnaeus School” → “a comprehensive school in Jyväskylä.”
  • Note: not all background variables are identifiers!
  • Only data that could help deduce the identity of a participant needs to be modified for pseudonymisation.

Pseudonymised data is personal data.

Pseudonymised data becomes anonymous after the destruction of the code key and consent forms.
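The pseudonymisation steps described above (replacing names with aliases, keeping a separate code key, and generalising details) might look like this in a small script. It is a sketch under stated assumptions: the function names `pseudonymise` and `generalise`, the field names, and the interview records are hypothetical, and the two generalisations are the examples given in the list above. Plain text replacement only catches the exact spellings you list, so transcripts still need to be read through manually.

```python
def pseudonymise(records, name_field="name", prefix="Interviewee"):
    """Replace names with aliases and return (records, code_key).

    The code key maps each name to its alias. Store it separately from
    the dataset, because it allows participants to be re-identified.
    """
    code_key = {}
    pseudonymised = []
    for record in records:
        name = record[name_field]
        if name not in code_key:
            code_key[name] = f"{prefix} {len(code_key) + 1}"
        cleaned = {k: v for k, v in record.items() if k != name_field}
        cleaned["alias"] = code_key[name]
        pseudonymised.append(cleaned)
    return pseudonymised, code_key


# Generalisations taken from the examples in the list above.
GENERALISATIONS = {
    "Viitasaari": "a municipality in Central Finland",
    "Cygnaeus School": "a comprehensive school in Jyväskylä",
}


def generalise(text, replacements=GENERALISATIONS):
    """Replace specific details with more general descriptions."""
    for specific, general in replacements.items():
        text = text.replace(specific, general)
    return text


# Hypothetical, invented interview excerpts (not real data).
interviews = [
    {"name": "Example Person A", "quote": "I moved to Viitasaari in 2010."},
    {"name": "Example Person B", "quote": "I teach at Cygnaeus School."},
    {"name": "Example Person A", "quote": "The move went well."},
]

data, code_key = pseudonymise(interviews)
for row in data:
    row["quote"] = generalise(row["quote"])

print(data[0])
# {'quote': 'I moved to a municipality in Central Finland in 2010.', 'alias': 'Interviewee 1'}
```

The same participant always receives the same alias, and the returned `code_key` is the only link back to the names, which is why it must not be stored alongside the pseudonymised data.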

Anonymisation

  • The dataset is modified so that all personal data is removed, including indirect identifiers such as place of residence or occupation.
    • The participant can no longer be identified in any way.
  • Anonymisation is one of the options mentioned in the regulation for making datasets available for secondary use.
  • However, true anonymisation is challenging.
    • It may even be impossible if removing identifiers would strip so much content that the remaining data becomes meaningless and useless.
    • Additionally, as technology evolves, new ways of combining data may emerge.
  • If you are considering anonymisation, carefully assess whether it is realistic.
    • Do not promise participants that the data is anonymous if that is not truly possible.
  • Note: This refers to the anonymity of the participants within the dataset. This has nothing to do with the anonymity of participants in the published thesis.

If the data is truly anonymised, it is no longer considered personal data.

CHECKLIST

  • Identify what personal data you are collecting and/or processing.
  • Identify who the data controller is.
  • Always conduct an informal risk assessment.
    • If the processing of personal data may involve high risks, assess the need for a Data Protection Impact Assessment (DPIA) by conducting an initial mapping. If necessary, complete the DPIA together with your supervisor.
    • Avoid conducting research in your thesis that would require a DPIA.
  • Identify whether your dataset contains special categories of personal data or otherwise particularly sensitive topics.
  • If needed, initiate the ethical review process together with your supervisor.
  • Provide the participant with a privacy notice, research notification, and consent form (= informing the participants).
    • If consent is listed as the legal basis for data processing in the privacy notice, and the dataset contains special categories of personal data, explicit consent is required (a separate section in both the privacy notice and the consent form).
    • If you receive a pre-existing dataset, e.g., from a research project, you do not need to prepare a privacy notice yourself. Instead, you will sign a commitment (template available on the University of Jyväskylä intranet).
  • Ensure the participant’s consent and document it.
  • Only collect personal data that is relevant to your research (= data minimisation).
  • Document the processing of personal data.
  • Apply pseudonymisation or anonymisation if possible, and avoid collecting direct identifiers from the start.
  • Ensure data security, such as using secure interview and survey software and storage locations.