Research data management
Research data management (RDM) is the process of systematically planning, organizing, storing, sharing, and preserving information or data obtained during research. The goal of RDM is to ensure that data is secure, reusable, transparent, and compliant with both ethical and legal requirements.
RDM applies to activities related to research data throughout the research life cycle—from research planning, data collection, secure storage, processing, and analysis to data sharing at the end of the study.
Research planning
Good research practice includes early planning of research projects. This is essential to ensure that activities are carefully organized and implemented, thus guaranteeing efficiency and successful completion of the work. Particular attention should be paid to how research-related data will be managed during and after the study. This section summarizes the most important aspects of data management that should be considered when planning a study.
During the planning stage, it is essential to clearly define not only the methodological aspects in the study plan or protocol, but also the data management strategy, which is described in the data management plan. If you plan to work with sensitive data, special attention should be paid to the ethical aspects of the study.
Topics in the research planning section:
- Research plan or protocol
- Research data
- FAIR principles
- Research data management
- Data management plan
- Opinion of the research ethics committee
RESEARCH PLAN OR PROTOCOL
Before starting the research, it is recommended to create a research plan (also known as a research protocol or research program).
This is a detailed plan that describes the information related to the research. It includes the following:
- Research objectives, questions, and hypotheses
- Research design, e.g., quantitative or qualitative, using experiments, surveys, etc.
- Selected methods and instruments, e.g., measurement tools, equipment, etc.
- Data collection strategy
- Planned analysis, e.g., what statistical methods and/or tests will be used
The research protocol helps to clearly formulate what exactly will be done within the framework of the research, thus ensuring that the research is conducted in accordance with scientific and ethical standards. From a data management perspective, the study plan helps to understand what type of data will be generated and/or processed during the study. Research data is described in more detail in the data management plan.
RESEARCH DATA
What is research data?
Research data is any information collected, observed, or created during a research project and used as a basis for obtaining research results and drawing conclusions.
| Types of research data | Examples |
| --- | --- |
| Numerical measurements | Temperature measurements in laboratory experiments |
| Text | Literature analysis notes |
| Images | Microscopy images |
| Video | Experiment records |
| Audio recordings | Interview recordings |
| Program code | Data analysis software code |
| Other data formats | Geographic information system data |
What is not research data?
Data that is not directly related to scientific research and does not contribute to the scientific analysis or evidence base of a research project is not considered research data.
| Research data is not | Examples |
| --- | --- |
| Administrative records of the study | Financial reports or personnel documents |
| Commercial or private communications | Emails or correspondence documents |
| Legal documents | Employment contracts or cooperation agreements |
| Marketing materials | Advertising brochures |
What is a dataset?
A dataset is a systematically structured collection of data, usually arranged in tables or other structured forms, consisting of multiple data elements or values that have been collected and prepared for analysis.
In everyday language, the terms "data" and "dataset" are often used as synonyms, but they have slightly different meanings: data refers to individual values or pieces of information, while a dataset is a systematically organized collection of such data.
Example:
- Data: "25", "Latvia", "Woman"
- Dataset: a table with data collected from several study participants; each row of the table corresponds to a specific participant, and the columns indicate their age, country of origin, and gender.
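To make the distinction concrete, here is a minimal Python sketch (assuming pandas is installed; the participant values are hypothetical) contrasting individual data values with a dataset that organizes them:

```python
# A minimal sketch (assuming pandas) contrasting individual data values
# with a dataset that organizes them; all values are hypothetical.
import pandas as pd

# Data: individual values
age, country, gender = 25, "Latvia", "Woman"

# Dataset: one row per participant, one column per variable
dataset = pd.DataFrame(
    {
        "age": [25, 34, 47],
        "country": ["Latvia", "Latvia", "Estonia"],
        "gender": ["Woman", "Man", "Woman"],
    }
)
print(dataset)
```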
FAIR DATA PRINCIPLES
The FAIR principles (findable, accessible, interoperable, reusable) are an internationally recognized set of guidelines developed to promote the management and use of high-quality research data. They form the basis for open science, promoting data transparency, reliability, and sustainable use in both academic and wider society.
1. Findable
In order for data to be easily findable by both humans and automated systems, it must be:
- described with rich and structured metadata;
- registered in internationally recognized repositories (e.g., Zenodo, Dataverse);
- assigned a unique and persistent identifier (e.g., DOI).
Example: Climate research data with DOI and descriptive metadata available on OpenAIRE or Google Dataset Search.
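As an illustration, a descriptive metadata record for such a dataset might look like the sketch below, which loosely borrows Dublin Core field names; the DOI and all values are hypothetical placeholders, not a real record:

```python
# A hedged sketch of a descriptive metadata record, loosely borrowing
# Dublin Core field names; the DOI and all values are hypothetical.
import json

metadata = {
    "title": "Air temperature measurements, 2020-2023",
    "creator": "Example Climate Research Group",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # hypothetical DOI
    "subject": ["climate", "temperature", "time series"],
    "description": "Hourly air temperature readings from three stations.",
    "format": "text/csv",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(json.dumps(metadata, indent=2))
```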
2. Accessible
Research data should be made accessible under clearly defined conditions:
- using secure, standardized protocols (HTTPS, API);
- by applying licenses (e.g., Creative Commons) that define the terms of use;
- by maintaining the availability of metadata even when the data itself is protected (e.g., in sensitive studies).
Example: Open statistical data with a CC-BY license that allows free analysis.
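For programmatic access, a metadata record can typically be retrieved over HTTPS. The sketch below assumes the requests library and uses Zenodo's public records API; the record ID is a hypothetical placeholder:

```python
# A minimal sketch of access over a standardized HTTPS API, assuming the
# requests library and Zenodo's public records API; the record ID is a
# hypothetical placeholder.
import requests

RECORD_ID = "0000000"  # hypothetical record ID
resp = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}", timeout=30)
resp.raise_for_status()
record = resp.json()

# Metadata remains readable even when the data files themselves are restricted
print(record["metadata"]["title"])
```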
3. Interoperable
Data must be technically and semantically compatible with other systems:
- using open data formats (CSV, JSON, RDF);
- by complying with internationally recognized metadata schemas (Dublin Core, Schema.org);
- linking data to other resources using hyperlinks and references.
Example: Healthcare data in different countries, structured uniformly and linked to ontologies, allows for comparative research.
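As a small illustration, open tabular formats can be converted into one another with standard tooling, which is what makes them interoperable in practice. The sketch below uses only the Python standard library; the file names are hypothetical:

```python
# A small sketch converting tabular data between two open formats (CSV and
# JSON) using only the standard library; the file names are hypothetical.
import csv
import json

with open("measurements.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

with open("measurements.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=2)
```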
4. Reusable
Data must be documented and prepared for reuse:
- including complete documentation (data origin, methodology, limitations);
- ensuring reproducibility (published analysis codes, scripts);
- complying with legal and ethical standards (e.g., GDPR requirements).
Example: COVID-19 vaccine research data with analysis scripts and README files.
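One practical habit is to ship a README alongside every dataset. The sketch below writes a minimal skeleton covering origin, methodology, and limitations; the headings are suggestions, not a formal standard:

```python
# A hedged sketch that writes a minimal README skeleton documenting data
# origin, methodology, and limitations; the headings are suggestions, not
# a formal standard.
from pathlib import Path

readme = """# Dataset: <title>

## Origin
Who collected the data, when, and with which instruments.

## Methodology
Sampling strategy, processing steps, and the analysis scripts used.

## Limitations
Known gaps, biases, and conditions of reuse (license, GDPR constraints).
"""
Path("README.md").write_text(readme, encoding="utf-8")
```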
The importance and benefits of FAIR principles for higher education
FAIR principles:
- promote open, reliable, and reproducible science;
- reduce data duplication and optimize resource use;
- facilitate collaboration in interdisciplinary research and cooperation with industry;
- increase the impact of research results on society and the economy.
The implementation of FAIR in Latvia is supported by initiatives such as EOSC (European Open Science Cloud) and GO FAIR.
FAIR principles are an essential part of open science, promoting sustainable, efficient, and reliable research data circulation. The ability of higher education institutions to comply with them is not only an indicator of quality but also a prerequisite for international competitiveness and cooperation.
By implementing the FAIR principles, a higher education institution gains:
- greater data visibility and impact;
- stronger cooperation with industry and society;
- a higher reputation in the international scientific community.
Additional resources:
- Turning FAIR into Reality – European Commission report
- OpenAIRE guidelines
- Zenodo repository
- https://digital-strategy.ec.europa.eu/lv/policies/open-science-cloud
RESEARCH DATA MANAGEMENT
Benefits of good research data management
Good and clearly understandable research data management is essential for responsible, high-quality research. Good data management practices benefit researchers, institutions, and the wider society.
- Improved data security and reduced risk of data loss
- Improved research quality and reproducibility
- Ensured compliance with funder requirements
- Promoted data sharing and collaboration with other researchers
- Reuse of data
- Effective use of time and resources
Data management plan (DMP)
The Data Management Plan (DMP) describes how research data will be managed throughout the research project and after its completion. The DMP includes information on how data will be collected, stored, protected, and shared. The Data Management Plan summarizes information about data formats, version control, security, and data submission to a repository, as well as the tools needed for reuse.
The DMP is an important element that must be developed before the start of the study and supplemented during its course. A well-prepared DMP promotes research integrity, as the data set is understandable not only to the research group working with it, but also to other parties involved, such as funders, research support staff, and institutional information system administrators.
The requirements for completing a DMP may be specified in the funder's regulations and may also be governed by the research institution's internal rules. It is therefore important to check the DMP requirements of both the researcher's institution and the funder. Funders and research institutions often offer DMP templates that already include the main aspects of data management, giving researchers a better idea of how to proceed.
To begin work on a DMP, it is necessary to familiarize yourself with the main aspects to be described in the plan, as well as to choose a platform on which to complete it and ensure that it complies with the funder's requirements.
Sections of the DMP
The sections included in the DMP may vary depending on institutional or funder requirements, but most data management plans include the information listed below; a simplified machine-readable sketch follows the list.
- What data will be generated and/or used during the study?
  - Describe the data source, type, format, and expected volume.
- Have ethical and intellectual property issues been addressed?
  - Indicate how ethical principles will be observed.
- How will the data be organized, securely stored, and protected during the project?
  - Describe the data storage locations, data organization principles, and security measures for data protection.
- How will the research data sets be documented?
  - Indicate what metadata will be described and what documentation will be created, e.g., README files or code books containing variable definitions and descriptions.
- How will the data be processed?
  - Indicate what software is needed to work with this data.
- How will the data be stored in the long term and under what conditions will it be shared with others?
  - Describe the choice of research data repository or platform, the duration of data set storage, and solutions for data access (open access, restricted access, closed data). Include the terms of use of the data by granting a license.
- Who is responsible for the various data management tasks?
  - Describe the roles involved in working with data—data managers, data users, etc.
- What budget and resource considerations need to be taken into account when implementing the project?
  - Indicate the costs associated with data management—data storage, backup solutions, software, and personnel costs. Indicate how these costs will be covered.
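The questions above can also be captured in a simple machine-readable structure, which makes the plan easier to version and review. The sketch below is a hypothetical, simplified outline in Python; real plans should follow the funder's template (or a standard such as the RDA DMP Common Standard) where one is prescribed:

```python
# A hypothetical, simplified machine-readable DMP outline mirroring the
# sections above; real plans should follow the funder's template.
import json

dmp = {
    "data": {"source": "survey", "type": "quantitative", "format": "CSV", "volume": "~5 MB"},
    "ethics": {"ethics_approval": "pending", "personal_data": True},
    "storage": {"location": "institutional server", "backup": "daily", "access": "project team only"},
    "documentation": ["README.md", "codebook.csv"],
    "processing": {"software": ["R", "RStudio"]},
    "sharing": {"repository": "Zenodo", "access": "open", "license": "CC-BY-4.0", "retention": "10 years"},
    "roles": {"data_manager": "principal investigator", "data_users": "project team"},
    "budget": {"storage_and_backup": "covered by the project grant"},
}
print(json.dumps(dmp, indent=2))
```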
Opinion of the research ethics committee
The research ethics committee is an independent body that assesses the ethical issues of the research and points out any shortcomings that need to be addressed for the research to be conducted in accordance with ethical standards. Before starting to collect data, researchers must submit their research plan to the ethics committee and obtain approval. The committee checks whether the research ensures the safety, privacy, and voluntariness of participants, as well as whether other ethical requirements are met, such as obtaining informed consent and ensuring appropriate data processing.
Submission to the research ethics committee is required in cases where the research project involves ethically sensitive issues, particularly regarding humans, animals, or the environment. The main situations and research areas where this is required are:
Research involving humans
- Medical and clinical research (clinical trials, psychological experiments, etc.)
- Sociological and anthropological research involving the collection of data from humans (interviews, questionnaires, focus groups, etc.)
- Research involving sensitive or private personal data (health information, ethnicity, political views, etc.)
- Research involving vulnerable groups (children, elderly people, people with disabilities, prisoners, etc.)
Research involving animals
- Biomedical experiments with animals
- Research that may cause pain, stress, or suffering to animals
- Research involving genetic modification of animals
Research involving the environment and ecosystems
- Experiments that may affect biodiversity or ecosystems
- Research involving toxic substances or pollution
- Research on genetically modified organisms (GMOs) in the natural environment
New technologies and data processing
- Artificial intelligence and machine learning research that analyzes human behavior or health
- Collection and processing of biometric data
- DNA analysis and genetic research
If a researcher works in one of these areas or their research has potential ethical risks, submission to the research ethics committee is mandatory.
RESEARCH ETHICS COMMITTEE
The Turiba University (BAT) Research Ethics Committee assists BAT lecturers and students by providing advice and evaluating the ethical aspects of research. The Research Ethics Committee operates in accordance with its own regulations, the laws of the Republic of Latvia, and international law, and is independent in its assessments and decisions.
Committee members:
Dr. Evija Kļave, Vice-Rector for Studies and Academic Affairs;
Dr. Agita Doniņa, Head of the Department of Tourism and Hospitality;
Dr. Jānis Pekša, Head of the Information Technology Department;
Dr. Ingrīda Veikša, Professor of the Department of Law;
Dr. Vitālijs Romanovs, Associate Professor of the Department of Health Care.
DATA COLLECTION
Research data collection is a systematic process in which a researcher obtains information to answer research questions or test hypotheses.
It is important to choose an appropriate data collection method that corresponds to the research objectives and the type of data. Timely planning and data quality control at this stage significantly affect the results of the entire study, its accuracy and reliability.
To ensure the integrity of the study and the quality of the data set, it is advisable to consider data storage solutions and organizational principles in advance, as well as to document data collection methods in a data management plan.
If sensitive data is collected in the study, additional ethical aspects must be taken into account and appropriate data security measures must be observed.
TYPES OF DATA
When starting data collection, it is important to understand what type of data will be obtained during the study so that it can be managed well in a timely manner. Research data can be divided into different groups. Each type of data may have specific good management principles.
Quantitative or qualitative data
One of the most common ways to categorize data is by the type of information it reflects—quantitative or qualitative data.
Quantitative research data is data expressed in numerical form that is obtained using systematic methods such as surveys, experiments, measurements, observations, or secondary data analysis. This data is analyzed using statistical methods to identify trends, relationships, and patterns.
Qualitative research data is usually expressed in verbal, visual, or other non-numerical formats and is analyzed to understand meanings, experiences, attitudes, and social phenomena. Qualitative data is obtained using qualitative research methods such as in-depth interviews, focus group discussions, observations, and document analysis.
Primary or secondary data
Data types can also be distinguished by their source of origin — primary data or secondary data.
Primary data is original data or information collected by the researcher (or group of researchers) specifically for a particular study. Primary data can be collected using various methods, such as surveys, interviews, experiments, observations, etc.
Secondary data is data that has already been collected and stored, which the researcher uses but does not collect directly during the study. Sources of secondary data can be public or private data repositories, data collected by various institutions and organizations (e.g., statistical offices, monitoring programs, government agencies, commercial enterprises, or healthcare systems). The process of obtaining such data involves researching various data sources and ensuring access to the relevant data sets.
DATA PROCESSING AND ANALYSIS
Data processing and analysis is one of the most important stages in research work, as it is at this stage that the findings and conclusions that form the results of the study are obtained.
To ensure the reliability and reproducibility of results, researchers must pay attention to the implementation of good data management principles when processing and analyzing data. This section of the guide summarizes various principles and examples of good practice that will help to make data processing and analysis reproducible.
Researchers should ensure that their approach to data processing and analysis is structured, understandable, and in accordance with established regulations and ethical principles. This includes documenting data processing and analysis so that anyone can clearly understand and trace what actions have been taken with the data from the moment it was collected to the moment the results were obtained and interpreted.
Special attention should be paid to the processing and management of sensitive data, as the security of sensitive data is not only a legal issue but also an ethical obligation that builds trust between researchers and research participants from whom sensitive data has been collected. Researchers must process such data in accordance with the requirements of the General Data Protection Regulation (GDPR).
DOCUMENTATION OF DATA PROCESSING AND ANALYSIS
To promote the reproducibility or repeatability of the study, special care must be taken in documenting the steps involved in data processing and analysis. This includes detailed recording and documentation of all processes and activities related to data processing and analysis. Properly described and documented data ensures that data sets can be understood and used by both the researchers themselves and others who wish to replicate the research or use the data for further analysis.
Research data processing is a process in which raw data is structured, transformed, and prepared for analysis.
Often, when data is collected, it is not in a suitable format or organized in such a way that analysis can begin immediately. Therefore, data processing is an integral and important stage in research. It involves various activities with data, which may vary depending on the type and complexity of the data.
Examples:
Quantitative data processing: survey variables are recoded (e.g., responses on a scale from "strongly disagree" to "strongly agree" are converted into numbers from 1 to 5); missing values are replaced with "NA."
Qualitative data processing: transcription of interview recordings – audio or video recordings are converted into text format; analog research materials are digitized.
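The quantitative processing example above can be expressed directly in code. The following pandas sketch recodes Likert-scale responses to numbers and leaves unmapped or missing responses as NA; the column name and values are hypothetical:

```python
# A minimal pandas sketch of the quantitative example above: Likert-scale
# responses are recoded to numbers 1-5, and unmapped or missing responses
# become NA; the column name and values are hypothetical.
import pandas as pd

df = pd.DataFrame({"q1": ["strongly disagree", "agree", None, "strongly agree"]})

likert = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
df["q1_coded"] = df["q1"].map(likert)
print(df)
```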
Research data analysis is the stage at which the processed data is analyzed to answer the research questions and/or test the hypotheses. At this stage, the researcher uses methods and analysis techniques appropriate to their field of research and the objectives of the study. Various tools and computer programs are used for data analysis.
Recommendations for reproducible data processing and analysis
Reproducible research data processing and analysis means that another researcher, using the same data and clearly documented procedures, can repeat the steps taken and obtain identical results. Reproducibility is essential for scientific transparency and credibility—it helps prevent accidental errors, allows the validity of results to be verified, and promotes collaboration among researchers.
Recommendations for promoting reproducibility in research
Create detailed documentation: make clear notes about each step of data processing and analysis. Describe how the data was transformed from raw data into a data set ready for analysis and what methods, parameters, and software were used.
Keep the folders where the data is stored in order: organize folders and files in a clear, logical way and ensure version control in accordance with the data management plan.
Use tools that promote reproducibility: where possible, choose data processing and analysis tools that support scripting, so that the entire processing and analysis workflow is documented.
Implement reproducibility checks: ask colleagues to verify that they can reproduce the results using the documentation created for the data processing and analysis steps.
Quantitative data processing and analysis
Data review and preparation: describe all steps taken to review, clean, transform, and systematize the data. This process can be done in various ways. For example, if a programming language is used for data processing, the code or script can be saved with descriptive comments. If data processing is performed in so-called point-and-click programs, such as Excel, SPSS, and Stata, it is advisable to document the processing steps in code books, data dictionaries, README files, or other types of documentation.
Processes that are important to record (a short code sketch follows the list):
- Processing of missing values: how did you handle missing values, for example, did you fill them in with the average value or NA, etc.?
- Handling outliers: how did you identify and handle values that are illogical or significantly different from other values, e.g., did you omit them, transform them, etc.?
- Data transformations: did you perform data transformations, such as normalization or logarithmic transformation? Why and how?
- Coding and categorization: describe how you coded or categorized the data, e.g., did you create age groups?
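A hedged pandas sketch of these processing steps, with hypothetical column names, values, and thresholds, might look like this:

```python
# A hedged pandas sketch of the processing steps listed above; column
# names, values, and thresholds are all hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 31, 47, 62, None], "income": [1200, 1500, 900, 30000, 1100]}
)

# Missing values: impute with the mean (document the choice)
df["age"] = df["age"].fillna(df["age"].mean())

# Outliers: flag values outside 1.5 * IQR and set them to missing
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
df.loc[outliers, "income"] = np.nan

# Transformation: logarithm to reduce skewness
df["log_income"] = np.log(df["income"])

# Categorization: derive age groups
df["age_group"] = pd.cut(df["age"], bins=[0, 29, 49, 120], labels=["<30", "30-49", "50+"])
print(df)
```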
Data analysis methods: describe in detail all statistical methods and tests used (a short example follows the list below).
- Descriptive statistics: indicate which descriptive statistical indicators you calculated, e.g., arithmetic mean, median, standard deviation, frequency.
- Inferential statistics: if you used inferential statistical methods, such as t-tests, ANOVA, or regression analysis, provide more details.
- Specific tests: name the tests precisely, e.g., a t-test for two independent samples or Pearson's correlation coefficient.
- Assumptions: check and document whether the data meets the assumptions of the tests used, such as normal distribution and homogeneity of variances.
- Statistical significance: indicate p values and significance levels, e.g., p < 0.05.
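As an illustration, the sketch below documents a t-test for two independent samples together with its assumption checks, assuming SciPy and two small hypothetical samples:

```python
# A minimal sketch, assuming SciPy, of documenting an inferential test
# together with its assumption checks; the two samples are hypothetical.
from scipy import stats

group_a = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9]
group_b = [3.2, 3.6, 3.1, 3.8, 3.4, 3.0]

# Assumption checks: normality (Shapiro-Wilk) and homogeneity of variances (Levene)
print(stats.shapiro(group_a), stats.shapiro(group_b))
print(stats.levene(group_a, group_b))

# t-test for two independent samples; report t and p (e.g., p < 0.05)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```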
Software and tools: in your data management plan, indicate the software and tools you used for data processing and analysis. For quantitative data processing and analysis, it is recommended to use tools that allow you to document data processing, transformation, and analysis sequentially in the form of scripts. Programming languages such as R and Python are increasingly used for this purpose, along with environments built around them, such as RStudio and JupyterLab, which offer a wide variety of packages for data processing, analysis, and visualization. Save the scripts you create so that the data processing and analysis steps can easily be repeated if necessary. To promote the implementation of open science principles in practice, it is recommended to publish scripts together with the data set.
Data processing and analysis tools such as Excel, SPSS, and Stata are popular among researchers. If data is processed using these programs and no syntax files are created to store processing and analysis scripts, special attention should be paid to documenting the actions performed on the data. This can be done in a README file, code book, or other type of documentation.
Visualization and interpretation of results: provide clear and concise descriptions of the results of the statistical analysis. Include tables, graphs, and diagrams to visualize the data. Interpret the results and explain what they mean in the context of the study.
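For example, a simple chart for reporting group results could be produced as follows (assuming matplotlib; the groups and values are hypothetical):

```python
# A small sketch, assuming matplotlib, of a simple chart for reporting
# results; the groups and values are hypothetical.
import matplotlib.pyplot as plt

groups = ["Control", "Intervention"]
means = [3.4, 4.1]

plt.bar(groups, means)
plt.ylabel("Mean score")
plt.title("Mean scores by group")
plt.savefig("mean_scores.png", dpi=150)
```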
Qualitative data processing and analysis
Data preparation – transcription (if necessary): often, if data is collected in audio or video format, it is advisable to transcribe it into text format for analysis. Transcription can be done by the researchers themselves or by an external service provider. Transcription is increasingly being done automatically with the help of various tools, but these tools are not always able to transcribe Latvian without errors. As a result, researchers have to review and correct automatically transcribed texts later.
It is advisable to document the transcription process in the data management plan. It is recommended to indicate whether the transcription is verbatim or edited (i.e., whether any specific language deficiencies have been corrected).
Special attention should be paid when transcribing recordings that contain sensitive information. In this case, it is advisable to avoid uploading audio or video files to online tools that do not have a clear data management policy. If the transcription is carried out by an external service provider, it is advisable to conclude a data processing agreement that regulates confidentiality and privacy issues.
During the transcription process, the personal data of participants (and other people mentioned) is often replaced with pseudonyms or other information that does not reveal personal data. From a personal data protection perspective, such transcripts (transcriptions of audio or video recordings) are considered less sensitive than audio or video recordings in which participants can be identified by their voice or appearance.
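A minimal sketch of such pseudonymization is shown below; the names and mapping are hypothetical, and in a real project the key linking pseudonyms to identities should be stored securely and separately from the transcripts:

```python
# A hedged sketch of pseudonymizing a transcript; the names and mapping are
# hypothetical, and the key linking pseudonyms to identities should be kept
# securely and separately from the transcripts.
pseudonyms = {
    "Anna Ozola": "Participant_01",
    "Janis Berzins": "Participant_02",
}

transcript = "Interviewer: Anna Ozola, how did you first meet Janis Berzins?"
for name, alias in pseudonyms.items():
    transcript = transcript.replace(name, alias)
print(transcript)
```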
Software and tools: indicate the software and tools you used for qualitative data processing, such as Atlas.ti, MAXQDA, NVivo. These are qualitative data analysis software programs that help researchers code, structure, and interpret qualitative data.
Data can also be processed manually by coding and analyzing it in Word/Excel documents. It is recommended to choose tools that are available to the entire research team and from which data can be easily extracted in interoperable formats.
Document the actions performed with the data – in a README file, code book, or other type of documentation.
Analysis process: provide detailed descriptions of the analysis process.
- Indicate which qualitative analysis method you used, e.g., thematic, discourse, phenomenological, or narrative analysis, and describe the specific analysis steps and principles that were followed.
- Describe the coding scheme or categories you used for data analysis and explain how the codes, categories, or themes were identified (e.g., inductively from the data or deductively based on theory).
Reliability of analysis: use strategies appropriate to the chosen method of qualitative data analysis to promote reliability of analysis.
- Keep reflexivity notes to critically evaluate how the researcher's personal experience, values, perceptions, and sociocultural background may influence the course of the study, the interpretation of results, and conclusions.
- If the data was coded by several people, describe whether and how inter-coder reliability was ensured, e.g., whether Cohen's kappa coefficient was calculated, whether discussions took place about code discrepancies, etc.
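For instance, inter-coder agreement can be quantified as sketched below, assuming scikit-learn and two hypothetical coders' labels for the same five text segments:

```python
# A minimal sketch, assuming scikit-learn, of quantifying inter-coder
# reliability with Cohen's kappa; the codes assigned by two hypothetical
# coders to the same five text segments.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["theme_A", "theme_B", "theme_A", "theme_C", "theme_B"]
coder_2 = ["theme_A", "theme_B", "theme_B", "theme_C", "theme_B"]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 1 indicate strong agreement
```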
Results and interpretation: it is important to remember that when presenting results with quotations, it is necessary to double-check that participants cannot be identified.
LONG-TERM STORAGE AND SHARING OF DATA
At the end of a research project, it is important to ensure the long-term storage or archiving of data sets in a reliable environment, as well as their sharing or publication in a research data repository, if possible.
Publishing data in a repository means that a description (metadata record) of the dataset is created in the data catalog and the data itself is added – files with research data or, in some cases, a link to the website where they are located. This metadata record allows other users to easily find the dataset, obtain concise information about it, learn about the conditions of access, and reuse the data even after the research project has ended.
Data in the repository, with the appropriate information in the metadata record, can be published in various ways, for example:
- Open data: data that is freely available to anyone without restrictions. It can be downloaded, used, and distributed immediately and free of charge, subject to the specified license terms.
- Semi-closed data: data that can be accessed under specific conditions, such as by registering or requesting access permission by contacting the owner or manager of the data set.
- Closed data: data that is not publicly available and is only accessible to a limited group of people, for example, for internal use by an organization or to protect sensitive information. Closed data can be used to create a metadata record in a repository.
Sometimes researchers choose to make data openly available after an embargo period. This means that the data is not publicly available for a certain period of time, although it will be made open in the future. Such restrictions may be related to copyright, intellectual property protection, publication requirements, or commercial considerations.
During the embargo period, metadata is made available to provide information about the data and its future availability. This helps researchers and interested parties learn about the dataset, its content, and the possible access time after the embargo period ends. However, metadata availability may vary depending on the repository policy and data type.
Why is archiving research data important?
- Data sets can also be checked after the project has ended
- Data sets can be reused in the future (e.g., for teaching purposes or new research)
- Compliance with the requirements of funders, publishers, institutions, or organizations regarding specific data retention periods is ensured
- Data that is of significant value at the organizational, national, or societal level is retained for the long term
Source: Research Data Management Guide. https://dataverse.lv/par-celvedi/