Personal data
Do you study people?
If you collect data from or about a person, start from the assumption that you are processing personal data. You must handle personal data responsibly and comply with the EU General Data Protection Regulation (GDPR) and Finland’s national data protection legislation. Responsible processing of personal data is also one of the fundamental principles of ethically conducted research.
PERSONAL DATA
The definition of personal data is broad.
Personal data can include any information or characteristics related to a person that could make them identifiable. Precise details are not always necessary for identification. Identification can occur, for example, by combining information contained in the research data with additional information found on the internet. In such cases, the information in the research data is also considered personal data, even if it cannot, on its own, be linked to a specific individual or is not sufficient to identify a person.
If your research involves human participants or you collect data from or about people, your research data likely contains personal data.
Examples of personal data include:
- Age
- Educational background
- Profession or workplace
- Area of residence
- (Facial) image
- Voice
- Statements and opinions characteristic of the individual
- Email address
- Nationality
- Ethnic background (special category of personal data)
- Health information (special category of personal data)
- Income
- Exercise habits
- Fingerprints
- Walking style
- Distinctive physical feature
- Information about the participant’s family
- Information about the participant’s friends, colleagues, or other individuals related to the participant, such as a teacher’s students
Personal data includes direct identifiers, which alone are sufficient for identification (such as a person’s name or social security number), and indirect identifiers, which may not be enough on their own but can lead to identification when combined with other data. Indirect identifiers—such as age or place of residence—are often collected as background information or variables.
In other words, some types of personal data alone are enough to identify a person, but all data that can be used to identify someone is considered personal data.
Certain types of research data are likely to involve personal data: for example, surveys with open-ended questions may contain direct or indirect identifiers, and a recorded interview includes the participant's voice as a direct identifier. If the interview is recorded on video, the facial image becomes another direct identifier.
Belonging to a specific target group can be considered personal data if there is sufficient information to identify the individual. For example, if you ask secondary school physical education teachers about their personal exercise habits, their profession and exercise preferences/hobbies already serve as indirect identifiers. These indirect identifiers become personal data if the dataset includes other information that enables identification. Such information could be, for instance, place of residence—if there is only one physical education teacher in that area who practices mountaineering. Or, if the combination of hobbies is sufficiently unique (e.g., rock climbing, canoe polo, and Formula 1 racing), identification may be possible even without information on where the teacher lives.
Personal data does not need to be particularly secret or intimate. What matters is whether the person can be identified. Identifiability does not mean that anyone could identify the person—it is sufficient if only a family member or a colleague could do so.
Source: Data protection guidelines for researchers
NOTE: Guidelines for processing personal data do not apply to deceased or fictional individuals.
You are interviewing a participant about balancing work and family life. What kinds of personal data might such an interview contain?
In their responses, the interviewee might mention their profession, previous workplace, current workplace, their children's birth years, their spouse’s name and profession, and describe their personal experiences. None of this information alone would likely be sufficient to identify the person, but when combined, the individual could be easily identified. Therefore, all the identifiers mentioned are considered personal data.
In addition, the interview recording contains the person’s voice, which alone is sufficient for identifying a person. If the interview is also recorded on video, the facial image becomes another direct identifier.
Justify the collection and processing of personal data
In your research plan, you describe the research design, research questions and methods, as well as the objectives of the study. Based on these, you must be able to justify the collection and processing of personal data.
Your dataset must not contain information that is unnecessary for your research. For example, if it makes no difference whether the participant has a cat or a dog as a pet, that question should not be asked, and the questions should be formulated in a way that avoids collecting unnecessary data by accident.
The purpose of data protection is not to prevent research, but to protect the participants. What matters is that the processing of personal data has a research-based and legal justification, and that the participant is aware of what will be done with their data.
It is common for research data to contain personal data. The goal is not to avoid personal data, but to be aware of the guidelines that must be followed when handling it.
When is a person identifiable?
This is a question you need to consider from the perspective of your own dataset. Remember that identification can occur by combining information from different sources—including sources outside your dataset. When reflecting on this question, you may want to reread this page up to this point.
Examples:
If the town in which the participant resides is mentioned, along with their profession, and the town is small and the profession is relatively rare, these two details could be enough to identify the person.
If the participant holds a position that only one person at a time can hold, such as the President of Finland, the person is easily identified by this detail alone. This does not mean that you cannot interview the President, only that you need to inform them that they are identifiable.
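The town-and-profession example above can be checked roughly within a dataset. The following is a minimal sketch, assuming the data is a list of records with hypothetical field names (`town`, `profession`) and made-up values. It counts how many participants share each combination of indirect identifiers and flags the combinations that occur only once.

```python
from collections import Counter


def rare_combinations(records, identifier_fields, threshold=2):
    """Return identifier combinations shared by fewer than `threshold` records."""
    combos = Counter(
        tuple(record[field] for field in identifier_fields) for record in records
    )
    return {combo: count for combo, count in combos.items() if count < threshold}


# Hypothetical example data, not taken from any real study.
participants = [
    {"town": "Small town A", "profession": "teacher"},
    {"town": "Small town A", "profession": "teacher"},
    {"town": "Small town A", "profession": "lighthouse keeper"},
]

flagged = rare_combinations(participants, ["town", "profession"])
print(flagged)  # {('Small town A', 'lighthouse keeper'): 1}
```

A check like this only looks inside the dataset. Because identification can also happen by combining the data with outside sources, an empty result does not show that participants are unidentifiable.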
Assess risks
Assessing whether the participants could be identified in the dataset is part of risk management.
Informal risk assessment is a part of all data collection. You should consider whether the collection and/or processing of data could pose risks to the participants, to yourself, or to third parties.
If the processing of personal data poses a high risk to the rights and freedoms of a person, a Data Protection Impact Assessment (DPIA) must be conducted before collecting or processing the data.
Such high-risk situations include, for example:
- the use of new technologies (such as artificial intelligence) to process personal data
- large-scale processing of personal data
- the collection of location and geolocation data.
The need for a DPIA is assessed by conducting an initial mapping available on the University of Jyväskylä intranet.
The initial mapping includes a list of situations that may lead to a DPIA. If enough of the listed conditions are met, the DPIA becomes mandatory.
If necessary, complete the DPIA together with your supervisor.
You will find links to the assessment form (= initial mapping) and the DPIA template on the page Assessment tools to support data protection implementation.
If enough of the conditions listed in the initial mapping are met, a Data Protection Impact Assessment (DPIA) is mandatory.
For example, a DPIA is likely to be required if the research involves TWO of the following conditions, and it is obligatory if the research involves all THREE:
- Information on the participant's sexual orientation (special category of personal data)
- Lack of informed consent (deviation from participants’ rights, e.g. in social media research)
- Asylum seekers (a vulnerable target group).
- General description of risk assessment: Risk assessment and data protection planning
- Criteria for assessing the likelihood of high risk and the need for a DPIA: Impact assessment
SPECIAL CATEGORIES OF PERSONAL DATA
If your dataset contains special categories of personal data, it is especially important to handle the data responsibly.
The privacy notice includes a specific section where you must indicate whether your dataset contains special categories of personal data.
Follow these protective measures (more details below):
- Stricter data security requirements
- Data minimisation: you may only collect special categories of personal data that are essential for conducting the research. The collection and processing must be proportionate to the research objectives.
- Pseudonymisation should always be applied, if it is feasible within the research design.
For example, if you study fatigue among university students, you might receive information about a participant's depression, anemia, or other health-related issues. This means your dataset would then contain special categories of personal data.
Data collection must be planned in such a way that, if you do not intend to collect special categories of personal data, they do not end up in the dataset accidentally.
If there is a risk that your dataset may include special categories of personal data, you must plan the data collection and processing assuming that such data will be present.
Special categories of personal data include:
- Ethnicity
- Political opinions
- Religious or philosophical beliefs
- Trade union membership
- Health information
- Sexual orientation or behavior
- Genetic and biometric data for the purpose of identifying a person
You are interviewing a participant on pauses and disruptions in their career.
The participant might mention, for instance, that they were temporarily laid off and received support from their trade union for job seeking. They might also share their background as an immigrant, if they feel it has affected their career.
In this case, your dataset would include information about trade union membership and racial or ethnic origin, both of which are considered special categories of personal data.
TEST: DOES MY DATA CONTAIN PERSONAL DATA?
Test whether your research data contains personal data or special categories of personal data:
Tool: Do you process personal data?
- You can change the language of the test in the top right-hand corner.
- Remember that the test results are indicative.
OTHER SENSITIVE OR CONFIDENTIAL INFORMATION
Even if your dataset does not contain special categories of personal data, it may still include sensitive information or data that is legally confidential.
Sensitive topics may include, for example:
- school bullying
- domestic violence
- financial difficulties
- criminal convictions
- substance use
Reflect on how sensitive your research topic is. Could sensitive issues arise in relation to the topic, even if they are not the actual focus of the study? For example, is there a risk that an interviewee might share sensitive information while answering your questions?
Apply the same protective measures to sensitive data as you would for special categories of personal data—for example, stricter data security requirements.
Legally confidential data may include, for example:
- business trade secrets
- information about endangered animal or plant species (location and protection measures)
- data related to national security or defence
ETHICAL REVIEW
In some instances, a thesis must undergo an ethical review in advance.
This means submitting a request for the JYU ethics committee to review your research. For a thesis, the student and the supervisor prepare the request together, and the supervisor submits it.
If your research meets even one of the criteria set by the Finnish National Board on Research Integrity (TENK), you must request an ethical review from the Ethics Committee for Human Sciences at the University of Jyväskylä.
Criteria for an ethical review:
- The principle of informed consent is not followed.
- The research intervenes in the physical integrity of the participants.
- The research involves participants under the age of 15 without the guardian’s separate consent, or without informing the guardian in a way that would allow them to refuse the child’s participation.
- The research exposes participants to exceptionally strong stimuli.
- There is a risk of causing mental harm to participants (or people close to them) beyond what is typical in everyday life.
- The research may pose a safety risk to participants, the researcher, or people close to them.
These criteria are explained in more detail on the JYU ethics committee’s website.
The ethical review must be conducted before the research begins!
It is recommended to avoid topics that require a Data Protection Impact Assessment or an ethical review.
Also, in a bachelor's or master's thesis, it is generally not advisable to address highly sensitive topics, as they may be ethically too complex. Similarly, handling confidential data can be challenging—especially since the thesis itself is a public document. This is not necessarily about the student’s ability to handle ethically demanding topics or confidential data, but about what is feasible and appropriate within the scope of a thesis.
You are interviewing young adults who have been bullied at school, in order to study the effects of bullying.
As the interviews would involve recalling and reliving traumatic experiences, your research would meet one of the TENK criteria: "Research that involves a risk of causing mental harm that exceeds the limits of normal daily life." A prior ethical review should therefore be requested.
Please note: Bachelor's and Master's students should avoid research topics that require an ethical review. Instead, you could reconsider your research design, and opt to study the phenomenon by interviewing, for example, school psychologists about the effects of school bullying.
- Read about the TENK criteria and the JYU Ethics Committee on the website: Does your research need to be reviewed by Ethics Committee?
INFORMING RESEARCH PARTICIPANTS
By default, research participants / data subjects have the right to know that their personal data is being processed—in other words, that your research data contains their personal information.
Informing the participant means that you explain the processing of personal data using a privacy notice. In addition, participants must be informed that they are subjects of research and what the objectives of the study are. For this, you use a research notification. The consent form ensures that the participant has decided to give their consent to participate in the study based on the information you have provided. The consent form should be given even if no signature is requested, as it provides additional details specifically related to consent.
According to law, a privacy notice must always be provided if the research material contains personal data. In addition to the privacy notice, the participant must be given a research notification and a consent form.
You will find the University of Jyväskylä instructions on the website: Instructions for students. In this educational resource, we will summarise the essential parts of the JYU instructions.
Templates
The university's website contains templates for the privacy notice, research notification and consent forms.
How to draft a privacy notice?
A privacy notice must be provided to the research participants.
- The privacy notice is a form in which you explain to the participant, for example, what personal data is collected, who collects it, how it is processed, and how it is protected. This is part of informing the participant (i.e., the “data subject”).
- The JYU data protection instructions include template forms (privacy notice, research notification, consent form).
- Ask your supervisor for help if needed.
- The thesis author is the data controller, meaning they are responsible for the processing of personal data.
Legal basis for processing personal data and template forms
- The privacy notice must state the legal basis for processing personal data (“Legal basis for processing personal data”).
- In scientific research, the legal basis is typically public interest.
- If the research plan of your thesis meets the criteria for scientific research in your field, as assessed by your supervisor, public interest may be used as the legal basis. In that case, use the template forms for Scientific research participants or subjects: privacy notice, research notification, and consent form.
- In theses, consent may also be used as a legal basis. This depends on whether the thesis is considered scientific research.
- Different faculties may have different practices. According to the university’s general data protection guidelines, Bachelor’s theses are generally not considered scientific research, while the supervisor may consider Master’s theses to be scientific research.
- It is recommended to use public interest as the legal basis whenever possible.
- If the legal basis is the participant’s consent, use the template forms for the privacy notice, research notification, and consent form found under “Participants in coursework or theses.”
- If the legal basis is consent and the data contains special categories of personal data, explicit consent is required. It is a separate section in both the privacy notice and the consent form.
- The template also includes legitimate interest as an option for legal basis, which is only suitable in specific situations and requires a so-called balancing test.
See the university’s data protection instructions for more information.
Why does the legal basis matter?
The participant’s rights are based on the legal basis stated in the privacy notice.
For example, if the legal basis is consent and the participant withdraws their participation, all collected data must be deleted, even if it is inconvenient. If the legal basis is public interest, previously collected data is not deleted, but further data collection is stopped.
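The difference between the two legal bases can be summarised as a small lookup. This is only an illustrative sketch of the rule described above; the function name and the returned keys are hypothetical, and other legal bases (such as legitimate interest) are deliberately left unhandled because the text does not describe them.

```python
def handle_withdrawal(legal_basis):
    """Return the actions to take when a participant withdraws from the study."""
    if legal_basis == "consent":
        # All data collected so far must be deleted.
        return {"delete_collected_data": True, "stop_further_collection": True}
    if legal_basis == "public interest":
        # Data collected so far is kept, but no further data is collected.
        return {"delete_collected_data": False, "stop_further_collection": True}
    raise ValueError(f"Unhandled legal basis: {legal_basis!r}")


print(handle_withdrawal("consent"))
print(handle_withdrawal("public interest"))
```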
If you are writing your thesis as part of a research group or using existing data, the data controller is often the university or another research organization.
- In such cases, you usually do not prepare the privacy notice yourself.
- The research project may already have a privacy notice, and you may be listed in it as a data processor. You will then receive the data or part of it confidentially.
- You sign a commitment (= an agreement on the processing of personal data) with the project.
If the participant is under 15 years old, guardian consent is usually required.
- The privacy notice, research notification, and consent form must be provided to both the guardian and the child.
- The child must be informed about the research in an age-appropriate way to ensure they understand what it involves.
Providing a privacy notice to research participants
The privacy notice can be, for example, attached to an email, handed out in paper form, or linked at the beginning of a Webropol survey.
If you want to include a link to the privacy notice in a Webropol survey, you can, for instance:
- Share it via SharePoint.
- Publish it on your personal JYU website, which the university provides to students, and link to it at the beginning of the Webropol survey.
- In Webropol, you need to enable the Text Editor to insert a link (two adjacent “T” letters in the top right corner).
If delivering the privacy notice to each participant would require unreasonable effort, publish it instead. Everyone has access to a personal JYU website, which can be used for this purpose.
For example, if you are studying comments on a public social media channel, you should then link the published privacy notice in the comments.
What if I don’t intend to publish the personal data I collect? Do I still need to provide a privacy notice?
It does not matter whether you publish the data you collect. What matters is that you process this data. So yes, you must provide a privacy notice.
You are conducting an interview study. Your department has agreed that master’s theses are considered scientific research, so you will use the templates designed for scientific research.
In the privacy notice, you will describe, among other things, what personal data will be collected from the participants. You have marked “public interest” as the legal basis for processing personal data.
You will send the privacy notice, a research notification, and a consent form to the participants in advance and ask them to review the documents before the interview. You will confirm the participant’s consent at the beginning of the interview recording.
Consent to participate in research
Participants must always be asked for their consent to take part in the research. Consent must be documented, meaning it must be verifiable afterwards.
You can document consent in the following ways:
- By requesting a signature on the consent form.
- By asking for verbal consent at the beginning of the interview recording.
- By including a mandatory checkbox at the beginning of a survey, e.g., “I have read the privacy notice, research notification, and consent form, and I give my consent.”
Consent to participate must be requested regardless of whether it is listed as the legal basis for processing personal data in the privacy notice. The participant must have sufficient information about the research before they can agree to take part. For example, they should know what data will be collected about them and why, and what participation in the research requires from them (e.g., how long it takes to complete the survey or how interviews are scheduled). These details are described in the research notification.
When consent is the legal basis for processing (i.e., the study is not considered scientific research), special attention must be paid to the content of the consent and how it is requested, because giving consent must be an active decision.
Here are links to the consent form templates:
- Consent as the lawful basis for processing personal data: Consent to personal data processing and participation
- Public interest as the lawful basis for processing personal data (= scientific research): Appendix 7. Consent form for research subjects (the informed consent process 2/2)
How do I confirm and verify consent in a survey?
Attach the research notification, privacy notice, and consent form at the beginning of the survey.
Include a mandatory checkbox where the participant confirms that they have read and understood the documents.
It is advisable to provide the consent form (or equivalent information) to the participant even if no signature is required. This ensures that the participant has all the necessary information they need to make an informed decision.
If the participant is under 15 years old, the consent of the guardian is usually also required for participation in the study.
PROTECTING PARTICIPANTS
As a researcher, your task is to protect the participants and the research data containing their personal data.
Practices that help protect participants include data minimisation, secure and data protection regulation–compliant storage and processing of the data, as well as anonymisation or pseudonymisation of personal data.
Data security
- If your dataset contains personal data, the information cannot be stored just anywhere. For example, you cannot use personal, commercial cloud services like Google Drive or iCloud.
- You also need secure software and devices for collecting and processing the data.
- More information on data security can be found in this educational resource section: Data security.
Data minimisation
- Only collect personal data that is truly necessary for your research. Do not collect extra or irrelevant information.
- For example, if the participant’s age is not relevant to your study, do not ask for it. It is usually better to collect information such as age or years of work experience in ranges, e.g., “5–10 years of work experience.”
- Sometimes an interviewee may share more information than the interviewer asked for. In such cases, the extra data should be removed from the dataset.
- Avoid situations where you collect special categories of personal data (such as health information) or sensitive data (such as personal experiences of domestic violence or bullying) combined with direct identifiers (such as voice or name).
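Asking for ranges at the collection stage is the preferred approach described above. If exact values have already been collected, they can be coarsened afterwards. The following is a minimal sketch, assuming whole-number values; the function name `to_range` and the default width of 10 are illustrative choices.

```python
def to_range(value, width=10):
    """Return a non-overlapping range label such as '20–29' for a whole number."""
    if value < 0:
        raise ValueError("value must be non-negative")
    lower = (value // width) * width
    return f"{lower}–{lower + width - 1}"


print(to_range(27))          # 20–29
print(to_range(7, width=5))  # 5–9
```

The ranges do not overlap, so each value falls into exactly one group. After converting, the exact values should be removed from the dataset.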
Pseudonymisation
Personal data must be pseudonymised whenever possible.
- Names, place of residence, and other personal data of participants are replaced with codes.
- The code key must be stored separately from the dataset in a secure location, such as a locked desk drawer. With the code key, it is still possible to identify individual participants in the dataset.
- A code key is typically a list that includes, for example, the participant’s name and the corresponding alias or number. An alias might be “Interviewee 1,” “Interviewee 2,” etc.
- In addition to coding direct identifiers, pseudonymisation also requires removing indirect identifiers from open-ended survey responses or interview transcripts.
- Indirect identifiers can be removed by:
- Classifying values, e.g., non-overlapping age groups: “15–19 years,” “20–24 years,” etc.
- Generalising details, e.g., “Viitasaari” → “a municipality in Central Finland”; “Cygnaeus School” → “a comprehensive school in Jyväskylä.”
- Note: not all background variables are identifiers!
- Only data that could help deduce the identity of a participant needs to be modified for pseudonymisation.
Pseudonymised data is personal data.
Pseudonymised data becomes anonymous after the destruction of the code key and consent forms.
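The steps above (aliases, a separate code key, generalised details) can be sketched in a few lines. This is a simplified illustration rather than a complete procedure: the function name, the data layout (a dictionary of transcripts keyed by participant name), and the example names are assumptions. Plain text replacement only catches exact matches, so transcripts still need to be read through manually for remaining identifiers.

```python
def pseudonymise(transcripts, generalisations=None):
    """Replace participant names with aliases and return (pseudonymised, code_key).

    transcripts: dict mapping participant name -> transcript text.
    generalisations: optional dict mapping an identifying detail -> broader description.
    """
    generalisations = generalisations or {}
    code_key = {}
    pseudonymised = {}
    for number, (name, text) in enumerate(transcripts.items(), start=1):
        alias = f"Interviewee {number}"
        code_key[alias] = name  # store the key separately from the dataset
        text = text.replace(name, alias)
        for detail, broader in generalisations.items():
            text = text.replace(detail, broader)
        pseudonymised[alias] = text
    return pseudonymised, code_key


# Hypothetical example data.
transcripts = {
    "Maija Example": "Maija Example works at Cygnaeus School.",
    "Matti Example": "Matti Example lives in Viitasaari.",
}
generalisations = {
    "Cygnaeus School": "a comprehensive school in Jyväskylä",
    "Viitasaari": "a municipality in Central Finland",
}

data, code_key = pseudonymise(transcripts, generalisations)
print(data["Interviewee 1"])
# Interviewee 1 works at a comprehensive school in Jyväskylä.
print(code_key)
# {'Interviewee 1': 'Maija Example', 'Interviewee 2': 'Matti Example'}
```

In practice, the code key returned here would be saved in a different, secure location from the pseudonymised data, since anyone holding both can re-identify the participants.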
Anonymisation
- The dataset is modified so that all personal data is removed, including indirect identifiers such as place of residence or occupation.
- The participant can no longer be identified in any way.
- Anonymisation is one of the options mentioned in the regulation for making datasets available for secondary use.
- However, true anonymisation is challenging.
- It may even be impossible if removing identifiers would strip so much content that the remaining data becomes meaningless and useless.
- Additionally, as technology evolves, new ways of combining data may emerge.
- If you are considering anonymisation, carefully assess whether it is realistic.
- Do not promise participants that the data is anonymous if that is not truly possible.
- Note: This refers to the anonymity of the participants within the dataset. This has nothing to do with the anonymity of participants in the published thesis.
If the data is truly anonymised, it is no longer considered personal data.
I’m conducting a survey that asks participants for age and place of residence. However, the data cannot be linked to individual respondents. Do I need to provide a privacy notice? Is the data anonymous or personal data?
A survey can be considered anonymous only if it is limited to, for example, questions answered on a scale (e.g., 1–5) and a few categorised background variables.
- Ask for age in ranges, such as “20–30 years.”
- Use Webropol’s Public link option to distribute the survey.
If the survey is anonymous—meaning no participant can be identified directly from the data, or indirectly by combining the data with information from other sources—a privacy notice is not required. However, you still need to provide a research notification and obtain consent to participate!
However, if there is even a small possibility that a respondent could be identified, you must also prepare a privacy notice.
- For example, if the survey includes open-ended questions, the risk of collecting personal data is higher.
Always ensure that you use a secure survey platform, such as Webropol or REDCap. Use of platforms like Google Forms is prohibited. Also remember to delete the survey responses from the platform once data processing is complete.
Read more about personal data, identifiers, and anonymisation on the Finnish Social Science Data Archive website: Anonymisation and personal data.
CHECKLIST
- Identify what personal data you are collecting and/or processing.
- Identify who the data controller is.
- Always conduct an informal risk assessment.
- If the processing of personal data may involve high risks, assess the need for a Data Protection Impact Assessment (DPIA) by conducting an initial mapping. If necessary, complete the DPIA together with your supervisor.
- Avoid conducting research in your thesis that would require a DPIA.
- Identify whether your dataset contains special categories of personal data or otherwise particularly sensitive topics.
- If needed, initiate the ethical review process together with your supervisor.
- Check when an ethical review is required.
- Avoid conducting research in your thesis that would require an ethical review.
- Provide the participant with a privacy notice, research notification, and consent form (= informing the participants).
- If consent is listed as the legal basis for data processing in the privacy notice, and the dataset contains special categories of personal data, explicit consent is required (a separate section in both the privacy notice and the consent form).
- If you receive a pre-existing dataset, e.g., from a research project, you do not need to prepare a privacy notice yourself. Instead, you will sign a commitment (template available on the University of Jyväskylä intranet).
- Ensure the participant’s consent and document it.
- Only collect personal data that is relevant to your research (= data minimisation).
- Document the processing of personal data.
- Apply pseudonymisation or anonymisation if possible, and avoid collecting direct identifiers from the start.
- Ensure data security, such as using secure interview and survey software and storage locations.