Building a data warehouse for SQB reports for the epidemiologic cancer registry of Lower Saxony
Published: September 15, 2023
Text
Introduction: German cancer registries were established to collect, process, and analyze data on cancer cases for internal and external researchers. The data collected by cancer registries are primarily based on mandatory reports, e.g. from physicians, clinics or pathologists [1]. Cancer registries are also interested in secondary data sources such as the structured quality reports of hospitals (SQBs), which can provide additional insights into cancer cases. SQBs contain general data on individual hospitals, such as the number of beds and the location, as well as more detailed information, such as the structure of individual departments and the number of treatments performed, coded according to medical classification systems such as OPS codes. These reports are provided as XML files by the Federal Joint Committee (Gemeinsamer Bundesausschuss) [2]. To make it easier for staff of the epidemiological cancer registry of Lower Saxony (EKN) to work with SQBs, we created a data warehouse (DWH). It allows EKN staff to perform online analytical processing (OLAP) on the SQBs, e.g. to group and filter data along multiple dimensions and at different hierarchy levels and to calculate predefined measures.
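To make the idea of grouping and filtering along dimensions concrete, the following is a minimal sketch in plain Python, not the DWH described here. The rows, the field names and the `roll_up` helper are hypothetical and serve only to illustrate a roll-up to a coarser hierarchy level and a drill-down into one region.

```python
from collections import defaultdict

# Hypothetical, made-up rows for illustration only:
# (year, region, hospital, ops_code, procedure_count)
ROWS = [
    (2019, "Region A", "Hospital 1", "5-524", 12),
    (2019, "Region A", "Hospital 2", "5-524", 7),
    (2019, "Region B", "Hospital 3", "5-525", 4),
    (2020, "Region A", "Hospital 1", "5-524", 15),
    (2020, "Region B", "Hospital 3", "5-524", 3),
]
FIELDS = ("year", "region", "hospital", "ops_code")


def roll_up(rows, group_by, where=None):
    """Sum the procedure_count measure per combination of the group_by fields.

    `where` is an optional {field: value} filter applied before grouping.
    """
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(FIELDS, row[:4]))
        # Skip rows that do not match every filter condition.
        if any(record[field] != value for field, value in where.items()):
            continue
        key = tuple(record[field] for field in group_by)
        totals[key] += row[4]
    return dict(totals)


# Roll-up: hospitals are aggregated to the region level, per year.
by_region_year = roll_up(ROWS, ("year", "region"))

# Drill-down: hospitals within one region and year.
drill_down = roll_up(ROWS, ("hospital",), where={"year": 2019, "region": "Region A"})
```

An OLAP server evaluates such aggregations over precomputed structures rather than by scanning rows, but the grouping and filtering logic is the same.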
State of the art: Kraska et al. list various data quality problems in SQBs and propose solutions [3]. This highlights the importance of ensuring data quality when preparing data for analysis.
Building a DWH for the analysis of SQBs does not appear to be a widely adopted approach. According to a brief online post, a research group led by Prof. Schöffski is investigating this topic [4]. However, the post provides no further details on the implementation or purpose of their DWH.
Concept: In collaboration with the EKN, we analyzed the SQB format to identify relevant data for subsequent analyses. On this basis, we derived appropriate dimensions and measures for a multidimensional OLAP model.
While preparing the data, we faced several quality issues. For instance, some XML files were not well-formed or did not conform to the XSD schema, which made them unreadable without modification. We also had to resolve content-related issues, such as ambiguity about whether a report covered a hospital association or a single hospital. Another challenge was the integration of multiple reporting years, since the format of SQBs changes regularly.
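A first triage step for the well-formedness problem could look like the sketch below. It is an illustration under assumptions, not the tool described here: the element names in `SAMPLES` are hypothetical and do not follow the real SQB schema, and only well-formedness is checked. Validating against the XSD schema would need a third-party library such as lxml, because the Python standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


def triage(reports):
    """Split {name: xml_text} into loadable names and {name: error} for repair."""
    ok, broken = [], {}
    for name, text in reports.items():
        well_formed, error = check_well_formed(text)
        if well_formed:
            ok.append(name)
        else:
            broken[name] = error
    return ok, broken


# Hypothetical miniature documents; the second one has an unclosed <Beds> tag.
SAMPLES = {
    "good.xml": "<Report><Beds>120</Beds></Report>",
    "bad.xml": "<Report><Beds>120</Report>",
}

ok, broken = triage(SAMPLES)
```

Collecting the parser message per file makes it possible to review and repair the affected reports before loading, instead of silently dropping them.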
Implementation: We designed and implemented a tool that performs extraction, transformation and loading of the SQBs into a star schema in a Microsoft SQL Server database. Subsequently, the tool generates and deploys a multidimensional model on a Microsoft Analysis Services server.
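The extract, transform and load step into a star schema can be sketched as follows. This is a simplified illustration, not the actual tool: it uses an in-memory SQLite database as a stand-in for Microsoft SQL Server, and the XML layout, table names and column names are all hypothetical. The deployment of the multidimensional model to Analysis Services is not covered.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified report; real SQB XML uses a different, richer schema.
SAMPLE = """
<Report year="2019">
  <Hospital id="H1" name="Hospital 1" beds="120"/>
  <Procedure ops="5-524" count="12"/>
  <Procedure ops="5-525" count="4"/>
</Report>
"""


def create_schema(con):
    """Create two dimension tables and one fact table (a minimal star schema)."""
    con.executescript("""
        CREATE TABLE dim_hospital (
            hospital_key INTEGER PRIMARY KEY,
            hospital_id  TEXT UNIQUE,
            name         TEXT,
            beds         INTEGER
        );
        CREATE TABLE dim_ops (
            ops_key  INTEGER PRIMARY KEY,
            ops_code TEXT UNIQUE
        );
        CREATE TABLE fact_procedures (
            year            INTEGER,
            hospital_key    INTEGER REFERENCES dim_hospital(hospital_key),
            ops_key         INTEGER REFERENCES dim_ops(ops_key),
            procedure_count INTEGER
        );
    """)


def load_report(con, xml_text):
    """Extract one report, look up or create dimension rows, insert the facts."""
    root = ET.fromstring(xml_text.strip())
    year = int(root.get("year"))
    hosp = root.find("Hospital")
    con.execute(
        "INSERT OR IGNORE INTO dim_hospital (hospital_id, name, beds) VALUES (?, ?, ?)",
        (hosp.get("id"), hosp.get("name"), int(hosp.get("beds"))),
    )
    hospital_key = con.execute(
        "SELECT hospital_key FROM dim_hospital WHERE hospital_id = ?",
        (hosp.get("id"),),
    ).fetchone()[0]
    for proc in root.findall("Procedure"):
        con.execute("INSERT OR IGNORE INTO dim_ops (ops_code) VALUES (?)", (proc.get("ops"),))
        ops_key = con.execute(
            "SELECT ops_key FROM dim_ops WHERE ops_code = ?", (proc.get("ops"),)
        ).fetchone()[0]
        con.execute(
            "INSERT INTO fact_procedures VALUES (?, ?, ?, ?)",
            (year, hospital_key, ops_key, int(proc.get("count"))),
        )


con = sqlite3.connect(":memory:")
create_schema(con)
load_report(con, SAMPLE)

# Example measure: total procedures per hospital.
totals = con.execute("""
    SELECT h.name, SUM(f.procedure_count)
    FROM fact_procedures AS f
    JOIN dim_hospital AS h ON h.hospital_key = f.hospital_key
    GROUP BY h.name
""").fetchall()
```

Keeping descriptive attributes in the dimension tables and only keys and counts in the fact table is what lets a multidimensional model aggregate the same facts along different dimensions.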
Lessons learned: This project showed us that SQBs are a useful secondary data source for analyzing the health care situation in Lower Saxony. By transforming SQBs into a multidimensional OLAP DWH, we were able to give the staff of the EKN easy and flexible access to the data. One outcome of this project was a publication analyzing pancreatic cancer care in northwestern Lower Saxony [5]. Currently, the SQBs can only be analyzed independently of the cancer registry data. Linking them with the mandatory reports is an interesting aspect that we want to examine more closely in the future.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Stegmaier C, Hentschel S, Hofstädter F, Katalinic A, Tillack A, Klinkhammer-Schalke M. Das Manual der Krebsregistrierung. Germering: W. Zuckschwerdt Verlag; 2019.
2. Qualitätsberichte der Krankenhäuser [Internet]. Gemeinsamer Bundesausschuss; [cited 2023 Apr 28]. Available from: https://www.g-ba.de/themen/qualitaetssicherung/datenerhebung-zur-qualitaetssicherung/datenerhebung-qualitaetsbericht/
3. Kraska RA, de Cruppe W, Geraedts M. Probleme bei der Verwendung von Qualitätsberichtsdaten für die Versorgungsforschung [Problems with Using Hospital Quality Reports as a Secondary Data Source for Health Services Research in Germany]. Gesundheitswesen. 2017 Jul;79(7):542-547. DOI: 10.1055/s-0035-1555953
4. Schöffski O. Data Mining [Internet]. Bayerisches Landesamt für Gesundheit und Lebensmittelsicherheit; 2020 Sep 03 [cited 2023 Apr 28]. Available from: https://www.lgl.bayern.de/gesundheit/gesundheitsversorgung/gesundheitsversorgungsforschung/informationsplattform_versorgungsforschung/faugesman_schwpkt2_projekt2.htm
5. Vohmann C, Uslar V, Weyhe D, Kieschke J. Bauchspeicheldrüsenkrebs – Inzidenz und Versorgungszahlen in Nordwest-Niedersachsen unter Betrachtung von routinemäßig erhobenen Sekundärdaten. In: 18. Deutscher Kongress für Versorgungsforschung. Berlin, 09.-11.10.2019. Düsseldorf: GMS; 2019. Doc19dkvf424. DOI: 10.3205/19dkvf424