Identifying the risk of sample overlap in meta-analysis of registry-based studies
| Published: | September 15, 2023 |
|---|
Text
There is a risk of sample overlap when observational studies that use existing registries are included in a meta-analysis. More specifically, when the results of the studies are combined in the meta-analysis, some observations could be counted multiple times because they appear in the samples of several study reports, e.g., when different articles use data from the same trial. This could inflate the type I (alpha) error, potentially leading to wrong conclusions.
Although sample overlap could in theory be resolved by simply excluding all overlapping observations, this approach would require identifying information such as personal IDs. In already aggregated data, such as journal articles and study reports, this information is usually not available. Thus, in this presentation we focus on overlap analysis using aggregated data, which could of course also be applied to individual participant data (IPD).
As a first step in handling overlapping studies, we find it necessary to quantify the degree of overlap. Our approach transforms the task of finding overlapping observations between study samples into identifying overlaps in the samples’ key characteristics, e.g., the time range and region of the observations. It combines common concepts from set theory and linear algebra with a coding of basic information about the study samples. More specifically, let Ω denote the set of studies included in a meta-analysis; we aim to find all subsets of Ω for which no observation is shared by all studies in the subset. Furthermore, we construct estimators for the degree of overlap in each of the identified subsets. To do this, we first define the key characteristics of a sample as those whose mutual exclusiveness rules out overlap among study samples. Then, we code the samples’ key characteristics into a collection of binary vectors and use the product of a normalized multilinear function of the binary vectors (the product of normalized dot products in the case of pairs) to estimate the degree of overlap.
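The pairwise case described above can be illustrated with a minimal sketch. It is not the authors' implementation: the study names, the year and region codings, and in particular the choice of normalization (dividing the dot product by the smaller of the two vector sums) are assumptions made here for illustration, since the abstract does not specify them.

```python
from itertools import combinations
from math import prod

# Hypothetical coding of key characteristics as binary vectors.
# "time" has one entry per calendar year, "region" one entry per region;
# both codings are illustrative and not taken from the abstract.
studies = {
    "S1": {"time": [1, 1, 1, 0, 0, 0], "region": [1, 1, 0, 0]},
    "S2": {"time": [0, 0, 1, 1, 1, 0], "region": [0, 1, 1, 0]},
    "S3": {"time": [0, 0, 0, 0, 1, 1], "region": [1, 0, 0, 0]},
}


def normalized_dot(u, v):
    """Dot product of two binary vectors, divided by the smaller vector sum.

    The normalization is an assumption; other choices are possible.
    """
    shared = sum(a * b for a, b in zip(u, v))
    smaller = min(sum(u), sum(v))
    return shared / smaller if smaller else 0.0


def pairwise_overlap(s, t):
    """Product of the normalized dot products over all key characteristics.

    The product is zero as soon as one characteristic is mutually exclusive.
    """
    return prod(normalized_dot(s[k], t[k]) for k in s)


# Estimated degree of overlap for every pair of studies.
overlap = {
    (a, b): pairwise_overlap(studies[a], studies[b])
    for a, b in combinations(studies, 2)
}

for pair, value in overlap.items():
    print(pair, round(value, 3))
```

In this toy example only S1 and S2 share both a year and a region, so only that pair receives a non-zero estimate; the other two pairs are mutually exclusive in at least one characteristic and are therefore ruled out as overlapping.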
We also suggested several ways to visualize the overlap structure among the studies within the meta-analysis. We used both fictional and real examples to show the viability and the value of the visualization.
We applied our methods to existing meta-analyses and were able to confirm their applicability to real-world scenarios. The results also illustrate the necessity of an overlap analysis, given the high risk of overlap that our method revealed in the investigated cases.
Little research has been done to address this challenge so far. In our opinion, this will be a growing issue because of the increasing use of registry and other real-world data in evidence synthesis.
We believe that our approach could be a good starting point for further research on the sample overlap problem and could improve the quality of evidence synthesis in the future.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
