Appropriate statistical model for count data in central statistical monitoring and application on German Multiple Sclerosis Registry
Published: 15 September 2023
Central statistical monitoring (CSM) is a commonly practiced monitoring approach in clinical trials and one of the approaches available to improve data quality in research. CSM uses various statistical methods to identify data-related issues that can indicate problems in a trial’s conduct. Several research studies have applied different statistical concepts to CSM. In previous research, we demonstrated the benefit of comparing centers to the grand mean of the data [1]. This universal approach can be applied to different data types. We previously investigated different statistical models that can implement this type of comparison for binomial, ordinal, and continuous endpoints. In this research, we further investigate whether this comparison can be implemented with different subtypes of generalized linear models for count data. We demonstrate this approach on real-world data (RWD) from the German Multiple Sclerosis Registry.
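The comparison of each center to the grand mean can be written as a set of linear contrasts on the center-level estimates. The following is a minimal sketch, assuming k centers with estimates on the model (link) scale and an unweighted grand mean; a grand mean weighted by center size would be an equally plausible choice. The function names and the example values are illustrative and are not taken from the original analysis.

```python
def grand_mean_contrasts(k):
    """Return a k x k contrast matrix: row i compares center i with the
    unweighted mean of all k centers (1 - 1/k on the diagonal, -1/k elsewhere)."""
    return [[(1.0 if i == j else 0.0) - 1.0 / k for j in range(k)]
            for i in range(k)]


def apply_contrasts(contrasts, estimates):
    """Multiply the contrast matrix by the vector of center estimates."""
    return [sum(c * e for c, e in zip(row, estimates)) for row in contrasts]


# Hypothetical center estimates on the log scale (illustrative values only).
log_rates = [0.10, 0.25, -0.05, 0.90]
contrasts = grand_mean_contrasts(len(log_rates))
deviations = apply_contrasts(contrasts, log_rates)

for center, dev in enumerate(deviations, start=1):
    print(f"center {center}: deviation from grand mean = {dev:+.3f}")
```

Each row of the contrast matrix sums to zero, so the deviations also sum to zero. A center is flagged when its deviation is large relative to its standard error after a multiplicity adjustment.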
The state-of-the-art generalized linear model (GLM) would be the first choice for implementing the comparison of centers to the grand mean. However, a common shortcoming of the GLM arises when centers report zero events, which happens in particular when the investigated event is rare. Count data are commonly analyzed with a log link, so the multiple comparisons are performed on the log scale, where observed zeros lead to extreme estimated differences and extreme standard errors. The Bayesian generalized linear model (BayesGLM) is one alternative that counteracts the zero problem and can account for overdispersion in count data. We additionally consider a negative binomial model (Negbin), a common approach to account for overdispersion in count data. In a simulation study, we investigate whether these models control the type I error when comparing centers to the grand mean. Simulations were run for balanced and unbalanced scenarios covering a range of settings that could be found in clinical trials across different centers. In addition, the exposure period for the reported counts was included as an offset. For each scenario, 1000 data sets were generated and analyzed with each model. The simulation results indicate which model has better control of the type I error for the settings considered.
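The zero problem on the log scale can be illustrated with a small simulation. The sketch below generates overdispersed counts under the null hypothesis (all centers share one event rate) as a gamma-Poisson mixture with a per-subject exposure time. It then computes each center's log rate as total events divided by total exposure, which is the fitted center rate of a Poisson model with a center factor and a log-exposure offset. All settings (10 centers, 5 subjects per center, rate 0.1, dispersion `theta = 1.0`, exposure between 0.5 and 2 years) and all function names are assumptions chosen for illustration; they are not the scenarios of the simulation study.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def draw_negbin(rng, mean, theta):
    """Gamma-Poisson mixture: variance = mean + mean**2 / theta."""
    lam = rng.gammavariate(theta, mean / theta)  # shape theta, scale mean/theta
    return draw_poisson(rng, lam)


def simulate_null(rng, n_centers, n_per_center, rate, theta):
    """All centers share the same event rate; exposure differs per subject."""
    data = []
    for center in range(n_centers):
        for _ in range(n_per_center):
            exposure = rng.uniform(0.5, 2.0)  # assumed follow-up time in years
            count = draw_negbin(rng, rate * exposure, theta)
            data.append((center, exposure, count))
    return data


def center_log_rates(data, n_centers):
    """Log of (total events / total exposure) per center; -inf for zero events."""
    events = [0] * n_centers
    time_at_risk = [0.0] * n_centers
    for center, exposure, count in data:
        events[center] += count
        time_at_risk[center] += exposure
    return [math.log(e / t) if e > 0 else float("-inf")
            for e, t in zip(events, time_at_risk)]


rng = random.Random(2023)
data = simulate_null(rng, n_centers=10, n_per_center=5, rate=0.1, theta=1.0)
rates = center_log_rates(data, 10)
zero_centers = [c for c, r in enumerate(rates) if r == float("-inf")]

print("log rates per center:", [f"{r:.2f}" for r in rates])
print("centers with zero events:", zero_centers)
```

A center without events has a log rate of negative infinity, so its estimated difference to the grand mean and the corresponding standard error become extreme. This is the situation a weakly informative prior is meant to stabilize.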
The simulations indicate that the Negbin model is superior to the GLM and BayesGLM in controlling the type I error for relatively small means (μ < 1) and small sample sizes (n_i < 20) while accounting for overdispersion in balanced scenarios. For larger means (μ > 1) and increasing variance, all models perform similarly in controlling the type I error. Ideal type I error control is achieved in settings where n_i > 20. Unbalanced scenarios, in which centers have different n_i, yield results similar to those of balanced scenarios. The approach is applied to adverse events reported in the German Multiple Sclerosis Registry (GMSR) data. The models identify centers that deviate from the grand mean (GM) and thus indicate which centers need to justify such deviations.
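When judging whether an empirical type I error rate is close to the nominal level, the Monte Carlo uncertainty from 1000 simulated data sets should be kept in mind. The sketch below computes an approximate 95% range for the empirical rejection rate of a test that holds its level exactly. The nominal level of 0.05 is an assumption (the text does not state it), and the range relies on the normal approximation to the binomial distribution.

```python
import math


def monte_carlo_range(alpha, n_sim, z=1.96):
    """Approximate 95% range of the empirical rejection rate when the true
    type I error equals alpha and n_sim independent data sets are simulated."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return alpha - z * se, alpha + z * se


# Assumed nominal level 0.05; 1000 data sets per scenario as in the text.
low, high = monte_carlo_range(alpha=0.05, n_sim=1000)
print(f"expected range of empirical type I error: {low:.4f} to {high:.4f}")
```

Under these assumptions, empirical rates between roughly 0.036 and 0.064 are compatible with exact control at the 0.05 level.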
We recommend the negative binomial model for comparisons to the grand mean for count data in the absence of zero counts, and the BayesGLM in the presence of zero counts.
Competing interests
- Firas Fneish is an employee of the German MS Registry.
- David Ellenberger has no personal financial interests to disclose other than being an employee of the German MS Registry.
- Niklas Frahm is an employee of the German MS Registry. Moreover, he received travel funds for research meetings from Novartis.
- Alexander Stahmann has no personal financial interests to disclose, other than being the leader of the German MS Registry, which receives (project) funding from a range of public and corporate sponsors, recently including G-BA, The German MS Trust, German MS Society, Biogen, Celgene (Bristol Myers Squibb), Merck, Novartis, Roche, Sanofi and Viatris.
- Frank Schaarschmidt has nothing to disclose.
The authors declare that an ethics committee vote is not required.
References
1. European Medicines Agency. EMA E6 (R2) Good Clinical Practice: Integrated Addendum to ICH E6 (R1) - Guidance for Industry - Guideline for Good Clinical Practice - E6 (R2). 2018.
2. Fneish F, Ellenberger D, Frahm N, Stahmann A, Fortwengel G, Schaarschmidt F. Central Statistical Monitoring approach and implementations on the German MS Registry. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, editor. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). 21.-25.08.2022. Düsseldorf: GMS; 2022. DocAbstr. 82. DOI: 10.3205/22gmds084
3. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biom J. 2008 Jun;50(3):346–63.
4. Gelman A, Jakulin A, Pittau MG, Su YS. A weakly informative default prior distribution for logistic and other regression models. Ann Appl Stat. 2008.
5. Agresti A. Categorical Data Analysis. Third Edition. Hoboken, New Jersey: John Wiley and Sons; 2013.
6. Ohle LM, Ellenberger D, Flachenecker P, Friede T, Haas J, Hellwig K, Parciak T, Warnke C, Paul F, Zettl UK, Stahmann A. Chances and challenges of a long-term data repository in multiple sclerosis: 20th birthday of the German MS registry. Sci Rep. 2021 Jun 25;11(1):13340. DOI: 10.1038/s41598-021-92722-x