68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17-21 September 2023, Heilbronn

Identifying the risk of sample overlap in meta-analysis of registry-based studies

Meeting Abstract

Zhentian Zhang - Institut für Medizinische Statistik, Universitätsmedizin Göttingen, Göttingen, Germany
Tim Mathes - Institut für Medizinische Statistik, Universitätsmedizin Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 248

doi: 10.3205/23gmds080, urn:nbn:de:0183-23gmds0803

Published: 15 September 2023

© 2023 Zhang et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.

Text

There is a risk of sample overlap when observational studies based on existing registries are included in a meta-analysis. More specifically, when the study results are combined, some observations may be counted multiple times because they appear in the samples of several study reports, e.g., when different articles use the data from the same trial. This could inflate the alpha (type I) error and potentially lead to wrong conclusions.
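To illustrate why double counting matters, the following sketch works through a deliberately simplified case that is not taken from the abstract. It assumes two equally sized studies estimating a mean with known unit variance, sharing a fraction p of their observations, and pooled with equal weights as if they were independent. Under these assumptions the true variance of the pooled estimate is (1 + p) times the naive variance, so a nominal 5% two-sided test rejects more often than intended. The function name `actual_alpha` and the whole setup are hypothetical.

```python
from statistics import NormalDist


def actual_alpha(p_shared, nominal_alpha=0.05):
    """Actual two-sided type I error when overlap is ignored.

    Assumption (illustrative only): two equal-size studies share a
    fraction p_shared of their observations, so their estimates have
    correlation p_shared and the naive pooled z-statistic has
    variance 1 + p_shared instead of 1.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - nominal_alpha / 2)
    inflation = (1 + p_shared) ** 0.5  # true SD of the naive z-statistic
    return 2 * (1 - std_normal.cdf(z_crit / inflation))


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        print(f"shared fraction {p:.2f}: actual type I error {actual_alpha(p):.3f}")
```

With no shared observations the function returns the nominal level; the error rate grows as the shared fraction grows.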

Although sample overlap could in theory be resolved by simply excluding all overlapping observations, this approach would require identifying information such as personal IDs. In already aggregated data, such as journal articles and study reports, this information is usually not available. Thus, in this presentation we focus on overlap analysis using aggregated data, which can of course also be applied to individual participant data (IPD).

As a first step in handling overlapping studies, we consider it necessary to quantify the degree of overlap. Our approach transforms the task of finding overlapping observations between study samples into that of identifying overlaps in the samples’ key characteristics, e.g., the time range and region of the observations. It combines common concepts from set theory and linear algebra with a coding of the basic information about the study samples. More specifically, let Ω denote the set of studies included in a meta-analysis. We aim to find all subsets of Ω for which no observation is shared by all studies in the subset. Furthermore, we construct estimators for the degree of overlap in each of the identified subsets. To do this, we first define the key characteristics of a sample as those whose mutual exclusiveness rules out overlap among study samples. Then, we code the samples’ key characteristics into a collection of binary vectors and use the product of a normalized multilinear function of the binary vectors (for pairs of studies, the product of the normalized dot products) to estimate the degree of overlap.
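The pairwise case described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, so the sketch divides each dot product by the smaller vector's number of ones. The coding grids (`YEARS`, `REGIONS`), the three example studies, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical coding grids for two key characteristics.
YEARS = list(range(2010, 2020))
REGIONS = ["north", "south", "east", "west"]


def encode(values, grid):
    """Binary vector with 1 where a grid category is covered by the sample."""
    covered = set(values)
    return [1 if g in covered else 0 for g in grid]


def normalized_dot(a, b):
    """Dot product divided by the smaller count of ones (assumed normalization)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = min(sum(a), sum(b))
    return dot / denom if denom else 0.0


def pairwise_overlap(s1, s2):
    """Product of normalized dot products over all key characteristics.

    If the studies are mutually exclusive in any one characteristic,
    that factor is 0 and so is the whole product.
    """
    score = 1.0
    for key in s1:
        score *= normalized_dot(s1[key], s2[key])
    return score


# Fictional studies, coded by observation years and regions.
studies = {
    "A": {"years": encode(range(2010, 2015), YEARS),
          "region": encode(["north", "south"], REGIONS)},
    "B": {"years": encode(range(2013, 2018), YEARS),
          "region": encode(["south"], REGIONS)},
    "C": {"years": encode(range(2016, 2020), YEARS),
          "region": encode(["south", "east"], REGIONS)},
}

overlap = {
    (a, b): pairwise_overlap(studies[a], studies[b])
    for a, b in combinations(sorted(studies), 2)
}

if __name__ == "__main__":
    for pair, score in overlap.items():
        print(pair, round(score, 3))
```

In this fictional example, studies A and C cover disjoint years, so their score is zero regardless of region; a nonzero score only flags that overlap is possible, not that it occurred.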

We also suggest several ways to visualize the overlap structure among the studies within a meta-analysis. We use both fictional and real examples to show the viability and value of these visualizations.

We applied our methods to existing meta-analyses and were able to confirm their viability in real-world scenarios. The results also illustrate the need for an overlap analysis: in the cases investigated, our method revealed a high risk of overlap.

So far, little research has addressed this challenge. In our opinion, it will become a growing issue because of the increasing use of registry and other real-world data in evidence synthesis.

We believe that our approach could be a good starting point for further research on the sample overlap problem and could improve the quality of evidence synthesis in the future.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.
