Authors: Romin Pajouheshnia, Rosa Gini, Gillian Hall, Robert Platt, Soko Setoguchi, Shahab Abtahi, Suzanne Landi, Carrie Nielson, Lin Li
With the growing consideration and acceptance of real-world evidence (RWE) by regulatory agencies and health technology assessment bodies in decision-making, the use of real-world data sources (RWDS) has become of ever-increasing interest to the ISPE community. A diverse array of RWDS is being used in pharmacoepidemiologic research. With new approaches to generate and harness different types of health-related data, this diversity will only continue to increase, bringing greater challenges and opportunities for our field.
Each data source comes with its own complexities and idiosyncrasies. For example, whereas the date and reason for an individual's entry into or exit from the source population may be explicitly recorded in some RWDS, this information is not trivial to obtain in others. If this information is unavailable for a study, there can be major implications for the validity and interpretation of results. To fully realize the potential of real-world data, we need to continue developing tools that leverage expert knowledge about individual data sources, both to understand their strengths and limitations and to determine which designs and methods are needed to appropriately analyze data from a given source in a specific study. To this end, the Databases for Pharmacoepidemiological Research Special Interest Group (or Databases SIG) of ISPE has collaborated on several projects to provide such tools.
In 2020, the ISPE-funded DIVERSE initiative was launched to identify existing guidance, tools, or recommendations for how to capture, report, or leverage the differences between data sources and to provide updated guidance. Results of the first stage of the initiative were published in May 2024 in a special issue of Pharmacoepidemiology and Drug Safety focused on Research Reproducibility in Pharmacoepidemiology [1]. Based on the findings of this scoping review, a framework for describing RWDS was developed, consisting of nine dimensions: organization accessing the data source, data originator, prompt, inclusion of population, content, data dictionary, time span, healthcare system and culture, and data quality. By explicitly reporting on each proposed dimension, researchers can provide a complete picture of the strengths and limitations of a data source for their specific purposes and support interpretation of results from a specific study in light of possible biases induced by the nature of the data.
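As an illustration only, the nine dimensions could be captured as a simple structured record that flags which dimensions a study report has not yet described. The sketch below is not part of the DIVERSE framework or any published tool: the class name, field names, and example values are hypothetical, and each dimension is assumed to be reported as free text.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataSourceDescription:
    """One free-text entry per DIVERSE dimension (field names are illustrative)."""
    organization_accessing: Optional[str] = None
    data_originator: Optional[str] = None
    prompt: Optional[str] = None
    inclusion_of_population: Optional[str] = None
    content: Optional[str] = None
    data_dictionary: Optional[str] = None
    time_span: Optional[str] = None
    healthcare_system_and_culture: Optional[str] = None
    data_quality: Optional[str] = None

    def missing_dimensions(self) -> List[str]:
        """Return the names of dimensions that are unset or blank."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]


# Hypothetical, partially completed description of a data source.
example = DataSourceDescription(
    data_originator="Hypothetical regional claims database",
    time_span="2010-2022",
)

# List the dimensions that still need to be reported.
print(example.missing_dimensions())
```

A record like this could be filled in once per data source in a multi-database study, so that gaps in reporting are visible before a manuscript is submitted.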
The DIVERSE framework complements existing guidance, such as RECORD-PE [2] and the HARPER protocol template [3], by providing a standardized approach to describe data sources in either single- or multi-database studies. The nine dimensions also align with the information collected within the recently updated HMA-EMA Catalogues of Real-World Data Sources and Studies [4], providing a foundation for the development of an ecosystem of catalogues to describe RWDS [5].
Currently, the DIVERSE initiative is developing more formal guidance on how to address data source diversity in pharmacoepidemiologic studies. We encourage journals to include the nine dimensions of the DIVERSE framework in their guidelines for authors submitting RWE studies. Doing so would give authors clear direction on how to report on RWDS, facilitating reproducibility and the interpretation of studies.
Beyond this initiative, the Databases SIG works to develop and exchange knowledge around the use of real-world data in pharmacoepidemiology. The Data Strategy for RWE educational initiative, funded by ISPE in 2022, is disseminating data strategy principles, with a focus on aligning data strategy with organizational goals. As part of this, the four-part ISPE Data Strategy Series of educational videos will present practical best practices for data strategy, to promote productive partnerships among pharmacoepidemiologists, data architects, and engineers. The topics of the videos include (1) introduction to data strategy, (2) aligning data strategy and data architecture, (3) data governance and data strategy, and (4) creating a roadmap for data strategy execution.
In parallel, the Databases SIG started another ISPE-funded initiative in 2023, "Guidance and best practices for data quality checking and benchmarking of results in large multi-database pharmacoepidemiologic studies using heterogenous data sources," which follows and complements the DIVERSE initiative. It consists of a scoping review of the literature and a qualitative analysis of interviews with experts to understand the current status of data quality assessment (DQA) in multi-database studies. The initiative will also develop new tools (e.g., a checklist or dashboard) to assist with DQA and provide a set of recommendations for a comprehensive approach to DQA in multi-database pharmacoepidemiologic studies. Together, these outputs will provide insights into how our community handles differences in data structure across diverse RWDS, performs quality control of data harmonization in distributed analyses, assesses "fitness for purpose," and examines the internal and external validity of studies using multiple data sources.
As we move away from the perspective that data source diversity is only a problem to mitigate and acknowledge that it is also an opportunity, we will continue to find new ways to leverage the potential of real-world data to provide more robust information on the safety, effectiveness, and utilization of drugs and vaccines.
Funding statement: The DIVERSE initiative, Data Strategy for RWE educational initiative, and ISPE data quality initiative received funding from ISPE.
Declarations of interest: RPa, RG, RP, SS, and LL are coauthors of the DIVERSE scoping review.
References
1. Gini R, Pajouheshnia R, Gardarsdottir H, et al. Describing diversity of real world data sources in pharmacoepidemiologic studies: the DIVERSE scoping review. Pharmacoepidemiol Drug Saf. 2024;33(5):e5787. doi:10.1002/pds.5787.
2. Langan SM, Schmidt SA, Wing K, et al. The reporting of studies conducted using observational routinely collected health data statement for pharmacoepidemiology (RECORD-PE). BMJ. 2018;363:k3532. doi:10.1136/bmj.k3532.
3. Wang SV, Pottegård A, Crown W, et al. Harmonized protocol template to enhance reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: a good practices report of a joint ISPE/ISPOR task force. Pharmacoepidemiol Drug Saf. 2023;32(1):44-55. doi:10.1002/pds.5507.
4. Heads of Medicines Agencies and European Medicines Agency. HMA-EMA catalogues of real-world data sources and studies. [cited 6 Jun 2024] https://catalogues.ema.europa.eu/
5. Swertz M, van Enckevort E, Oliveira JL, et al. Towards an interoperable ecosystem of research cohort and real-world data catalogues enabling multi-center studies. Yearb Med Inform. 2022;31(1):262-272. doi:10.1055/s-0042-1742522.
Corresponding author:
Romin Pajouheshnia
Department of Epidemiology, RTI Health Solutions, Barcelona, Spain.
Email: rpajouheshnia@rti.org