gms | German Medical Science

24th Annual Meeting of the German Drug Utilisation Research Group (GAA)

Gesellschaft für Arzneimittelanwendungsforschung und Arzneimittelepidemiologie

30.11. - 01.12.2017, Erfurt

Utilizing health insurance routine data: an overview of methods, algorithms and fields of application in the federal state of Schleswig-Holstein

Meeting Abstract

Gesellschaft für Arzneimittelanwendungsforschung und Arzneimittelepidemiologie e.V. (GAA). 24. Jahrestagung der Gesellschaft für Arzneimittelanwendungsforschung und Arzneimittelepidemiologie. Erfurt, 30.11.-01.12.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. Doc17gaa90

doi: 10.3205/17gaa90, urn:nbn:de:0183-17gaa901

Published: December 5, 2017

© 2017 Schuster et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Background: Data on outpatient drug prescriptions (structured according to § 300 SGB V, Volume V of the German Social Insurance Code), outpatient diagnoses (coded in ICD-10), remedy prescriptions (§ 302 SGB V) and inpatient DRG (Diagnosis Related Groups) data (§ 301 SGB V) are nowadays used not only for administrative purposes but also for negotiations between the contracting parties (statutory health insurance (SHI) funds, the association of SHI-accredited physicians, and hospitals), for contract controlling and for the consultation of physicians. Medical, pharmaceutical and economic research is a new and steadily growing field of application.

Health-related data can now be captured, stored and analyzed in the form of very large data sets, so-called “big data”. In health research, big data helps to identify and, moreover, to forecast potential risk factors, causalities or hazards in order to improve the quality of primary care. However, because of their volume, the variety of data types and the high velocity of data flow, big data are often difficult to process. In particular, the generation and transmission processes of the data sets, including media breaks and large differences in documentation accuracy, have to be considered. Statistical noise caused by missing and wrong values has to be addressed by adequate filtering and controlling.

Materials and Methods: The drug prescription data of Schleswig-Holstein (SH) cover 1.40m patients in quarter 2/2016 and 2.17m patients in the year from quarter 3/2015 to quarter 2/2016. All patients of all physicians in the region are included, regardless of their place of residence. SH has 2.43m SHI-insured individuals and about 4,600 physicians in 2,600 permanent establishments (“Betriebsstätten”). The ICD-10 diagnostic data cover 2.62m patients per year (1.94m in quarter 2/2016) because cases without drug prescriptions are included. There are 3.71m patient-physician encounters in quarter 2/2016. The drug prescription data comprise 6.16m records per quarter and 24.95m records per year. The remedy prescription data comprise 2.05m records for 0.56m patients in the observation period (0.22m in one quarter).

The five resolution levels of the international Anatomical Therapeutic Chemical (ATC) classification system are used in our prescription analysis. In [14], [39], [40] the concept of the Morbidity Related Group (MRG) was introduced in order to determine a main drug class for each patient with respect to physician and quarter. The MRG was developed in analogy to the Diagnosis Related Group (DRG), which is used in the hospital setting and is mainly based on diagnoses. The basic MRG is the drug group with the highest cost on the third level of the ATC. Only the cost rank, not the cost value, is relevant for this determination. Like the DRG, the MRG is further specified by degrees of severity defined by age, prescription intensity and multimorbidity. In contrast to the formerly applied prescription limits (“Richtgrößen”), the new drug controlling approach is patient centered. For remedy prescriptions, a modified MRG concept was introduced that groups by a combination of the first three characters of the ICD-10 diagnosis and the three-digit remedy indication.

The large amounts of data can be reorganized, joined and analyzed with scripting languages (perl, gawk). The powerful concept of associative arrays can be used for aggregation and matching. Test calculations have shown that even the data of German federal states larger than SH can be processed in the same way with only minor changes regarding parallelization. We strongly recommend 64-bit Linux systems because array calculations with memory requirements above 2 GB became unstable on equivalent Windows systems. We used approximately 20 GB of main memory. The hardware requirements can be reduced by using external sort procedures, but this increases the programming effort considerably while the program running times remain more or less the same.
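
The determination of the basic MRG with associative arrays can be illustrated by a minimal sketch. It is written in Python rather than perl or gawk, and the record layout, the function name `basic_mrg` and the alphabetical tie-breaking rule are assumptions made for illustration only; severity levels are not modeled.

```python
from collections import defaultdict


def basic_mrg(prescriptions):
    """Return the basic MRG for each (patient, physician, quarter).

    prescriptions: iterable of (patient, physician, quarter, atc_code, cost).
    The basic MRG is the ATC third-level group (first four characters of
    the ATC code) with the highest summed cost within that key.
    """
    # Associative array: (patient, physician, quarter) -> {ATC level 3 -> cost}
    cost_by_group = defaultdict(lambda: defaultdict(float))
    for patient, physician, quarter, atc_code, cost in prescriptions:
        atc3 = atc_code[:4]  # e.g. "M01AE01" -> "M01A"
        cost_by_group[(patient, physician, quarter)][atc3] += cost

    result = {}
    for key, groups in cost_by_group.items():
        # Only the cost rank matters. Ties are broken alphabetically,
        # which is an assumption of this sketch.
        result[key] = min(groups, key=lambda g: (-groups[g], g))
    return result


# Hypothetical toy records, not taken from the SH data
sample = [
    ("p1", "d1", "2016Q2", "M01AE01", 12.50),
    ("p1", "d1", "2016Q2", "A02BC01", 20.00),
    ("p1", "d1", "2016Q2", "M01AB05", 9.00),
    ("p2", "d1", "2016Q2", "A10AB01", 80.00),
]
mrg = basic_mrg(sample)
```

In this toy input the two M01A prescriptions of patient p1 together cost more than the single A02B prescription, so the group is chosen by summed cost and not by the most expensive single package.
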
In order to create advisory documentation (“Beratungsunterlagen”) for economic, pharmaceutical and medical consultation, universally readable files have to be provided. More than 30 individual Excel spreadsheets are created on a Linux system with the perl module “write excel” from pre-aggregated data in a MySQL database. These documents contain information about drug expenditures compared with the prospective and retrospective MRG budgets, in-depth analyses of MRGs with potential problems, aggregated tables on different ATC levels, the status quo of PRISCUS drugs (potentially inappropriate medication in the elderly), polypharmacy and detailed results for drug target values. Comparative values of suitable specialist groups are provided. There are similar documents for remedy audits of physicians. Each year there are more than 200 physician audits regarding drug and remedy prescriptions.

Results: In Schleswig-Holstein, daily treatment expenses are calculated using prescribed daily doses (PDD) instead of defined daily doses (DDD). The PDD are determined by statistical measurements of regional prescription behaviour. For this purpose, pharmaceutical databases provide additional information for each PZN (central pharmaceutical number) code of a drug, e.g. the ATC code and more detailed information such as potency and dosage form. The PDD is a regional statistical value and should not be interpreted as an individual treatment recommendation. The measurements use successive prescription dates, whereas other approaches are only applicable to continuous prescriptions over a period of at least one year. Sufficiently large prescription numbers are necessary to obtain stable statistics. Next, we apply a linear regression with respect to drug strength in order to obtain results for all drugs within the considered group, which smooths the values further.

It has to be noted that the MRG is based solely on drug prescriptions. Because of the strong correlation between the most expensive drug group and the morbidity of a patient, there should also be a relation to the diagnostic structure. Indeed, the MRG can be used to determine a physician-based main diagnosis out of all ICD-10 codes assigned to a patient in a specific quarter. For example, after age and gender standardization, 33 % of all patients with the basic MRG M01A (antiinflammatory and antirheumatic products) are documented as suffering from “dorsalgia” (ICD-10: M54). A failed mapping is a strong indicator of missing diagnoses in the documentation, for instance if insulin-treated patients have no diabetes diagnosis.

Considering the communication between physicians, one has to take into account limited resources as well as potential benefits. In principle, the same problems were stated concerning the construction of computer networks [2], [13], [17], [20], [22], [26], [28].
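
The relation between MRG and documented diagnoses described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: each patient carries a single basic MRG, no age and gender standardization is applied, and the function names, the toy data and the use of ICD-10 codes E10 to E14 as the diabetes range for the insulin group A10A are choices of this sketch, not the authors' implementation.

```python
from collections import Counter, defaultdict

# ICD-10 three-character categories for diabetes mellitus
DIABETES_CODES = {"E10", "E11", "E12", "E13", "E14"}


def diagnosis_shares(mrg_of_patient, diagnoses_of_patient):
    """For each basic MRG, return the share of patients per 3-character ICD-10 code."""
    patients_per_mrg = Counter(mrg_of_patient.values())
    counts = defaultdict(Counter)
    for patient, mrg in mrg_of_patient.items():
        # Count every three-character code at most once per patient
        codes = {code[:3] for code in diagnoses_of_patient.get(patient, ())}
        for icd3 in codes:
            counts[mrg][icd3] += 1
    return {
        mrg: {icd3: n / patients_per_mrg[mrg] for icd3, n in by_code.items()}
        for mrg, by_code in counts.items()
    }


def insulin_without_diabetes(mrg_of_patient, diagnoses_of_patient):
    """List patients with basic MRG A10A (insulins) but no diabetes diagnosis."""
    flagged = []
    for patient, mrg in mrg_of_patient.items():
        if mrg != "A10A":
            continue
        codes = {code[:3] for code in diagnoses_of_patient.get(patient, ())}
        if not codes & DIABETES_CODES:
            flagged.append(patient)
    return sorted(flagged)


# Hypothetical toy data
mrg_of_patient = {"p1": "M01A", "p2": "M01A", "p3": "M01A", "p4": "A10A", "p5": "A10A"}
diagnoses_of_patient = {
    "p1": ["M54.5", "I10"],
    "p2": ["M17.1"],
    "p3": ["M54.2"],
    "p4": ["E11.9"],
    "p5": ["I10"],
}
shares = diagnosis_shares(mrg_of_patient, diagnoses_of_patient)
flagged = insulin_without_diabetes(mrg_of_patient, diagnoses_of_patient)
```

The diagnosis with the largest share within an MRG is a candidate for the physician-based main diagnosis, and the flagged patients are candidates for missing diagnoses in the documentation.
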
The majority of physician activities, and not just the special case of actual treatment, should be patient oriented. If we consider the common patients of pairs of the aforementioned 2,600 physicians, we get 2,600 x 2,600 = 6,760,000 pairs. Two nested for loops can be used, or just the elements above the diagonal would be sufficient. Both approaches fail in practice because of computational limitations. For a region larger than Schleswig-Holstein by a scale factor of 10, a 676 million step procedure would be quite unrealistic for practical computations on an advanced standard PC. The algorithmic organization is therefore optimized by iterating over patients. The amount of computation then increases only linearly and not quadratically. The determination of the pairs of physicians of each patient is cheap in computing time because standard sort procedures can be parallelized using all processor cores. This linear computation gives a sparse list of pairs of physicians. This approach to reducing algorithmic complexity is a standard element of big data analysis in the field of health insurance.

In this way the top n (n=1, 2, 3) physicians (those with the most common patients) can be determined for every physician, resulting in a hierarchical structure which can be used to construct further graphs. Additionally, a balanced graph can be constructed in which every physician is connected to three other physicians (cubic). It is well known that, in general, the determination of a cubic subgraph of a given graph is an NP-hard problem. The network structure between physicians in outpatient treatment is thus a special graph structure which should be analyzed in detail.

Considering the MRG of all patients or of subgroups, we analyze the group changes from one quarter to another. This gives an n by n matrix of transitions. The question arises whether this transition is a Markov process, which does not require long-term information about the past for reliable predictions.
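
The patient-centered computation of physician pairs described above can be sketched as follows. The Python code, the function names and the toy encounter list are assumptions for illustration; the original calculations were done with gawk and sort procedures. The sketch iterates over patients and only touches pairs of physicians that actually share a patient, so the result is a sparse list of pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations


def common_patient_counts(encounters):
    """Count shared patients per physician pair by iterating over patients.

    encounters: iterable of (patient, physician) tuples.
    Returns a Counter keyed by sorted physician pairs (a, b) with a < b.
    """
    physicians_of = defaultdict(set)
    for patient, physician in encounters:
        physicians_of[patient].add(physician)

    pair_counts = Counter()
    for physicians in physicians_of.values():
        # Each patient contributes only the pairs among his or her own physicians
        for a, b in combinations(sorted(physicians), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def top_partners(pair_counts, n=3):
    """For every physician, return the n physicians with the most common patients."""
    partners = defaultdict(list)
    for (a, b), count in pair_counts.items():
        partners[a].append((count, b))
        partners[b].append((count, a))
    return {
        physician: [p for _, p in sorted(lst, key=lambda t: (-t[0], t[1]))[:n]]
        for physician, lst in partners.items()
    }


# Hypothetical toy encounters
encounters = [
    ("p1", "A"), ("p1", "B"),
    ("p2", "A"), ("p2", "B"), ("p2", "C"),
    ("p3", "B"), ("p3", "C"),
    ("p4", "D"),
]
pairs = common_patient_counts(encounters)
top1 = top_partners(pairs, n=1)
```

The work per patient grows with the square of the number of physicians that this patient visits, which is small in practice, so the total effort grows roughly linearly with the number of patients. Physician D shares no patient with anyone and therefore does not appear in the pair list at all.
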
Is the transition from quarter 1 to 2, 2 to 3 and 3 to 4 the same as the direct transition from quarter 1 to 4? This can be confirmed for the underlying problem. Another interesting point is whether the situation has reached an equilibrium or is still moving in some direction. Instabilities would lead to a different care situation in the future. The current state is nearly stationary. The observed differences may be induced by a yearly rhythm. Furthermore, the related graph is strongly connected. The calculations are primarily done in gawk (scripting language). In order to examine the Markov properties and the stability of the transition, we have to compute the eigenvalues and eigenvectors of a 230 by 230 matrix. Like the visualization of graphs, this can easily be done with Mathematica by Wolfram Research.
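
The two checks described above, composition of quarterly transitions versus the direct transition and the search for an equilibrium, can be illustrated with a small pure-Python sketch. The 3 by 3 matrices are hypothetical toy values, not MRG data, and the power iteration used here is a simple stand-in for the eigenvector computation that the authors carried out in Mathematica.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def max_abs_diff(a, b):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def stationary_distribution(p, iterations=1000):
    """Approximate the left eigenvector for eigenvalue 1 by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical quarterly transition matrix between three groups (rows sum to 1)
P = [
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]
# Hypothetical observed direct transition from quarter 1 to quarter 4
direct_q1_q4 = [
    [0.56, 0.31, 0.13],
    [0.21, 0.58, 0.21],
    [0.13, 0.31, 0.56],
]

three_step = matmul(matmul(P, P), P)         # quarter 1 -> 2 -> 3 -> 4
gap = max_abs_diff(three_step, direct_q1_q4)  # small gap supports the Markov assumption
pi = stationary_distribution(P)               # equilibrium shares of the groups
```

A small `gap` means that composing the quarterly transitions reproduces the direct transition, and comparing `pi` with the currently observed group shares indicates how far the system is from equilibrium.
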

Conclusion: The advancement from the drug benchmark system “Richtgrößen” to the MRG is a large step towards fairness for patients, physicians and statutory health insurance funds in realizing optimal treatment with scarce resources. It is based on a patient-centered multidimensional analysis of prescription data. Adoption in other federal states would be desirable, and preliminary computations have already been made. The communication, and therefore the networks, between physicians and other players in the health care system can be optimized using tools from graph theory. Markov processes and Shannon entropy still have large potential. The MRG system can be extended in order to improve the care of special groups such as elderly people, patients receiving benefits from care insurance and patients with psychiatric diseases. In order to achieve this, other data sets have to be included, e.g. ICD-10 diagnostic data, care insurance data and MRG data. This increases the need for multidimensional analysis with the tools considered here.


References

1.
Amann U, Schmedt N, Garbe E. Prescribing of potentially inappropriate medications for the elderly. Age. 2012; 65(69): 70-74.
2.
Begoli E. A short survey on the state of the art in architectures and platforms for large scale data analysis and knowledge discovery from data. Proceedings of the WICSA/ECSA. 2012: 177-183.
3.
Berman A, Plemmons JR. Nonnegative Matrices in the Mathematical Sciences. SIAM; 1994.
4.
Bharathi R, Keswani NN, Shinde SD. An Approach to mining massive Data. Proceedings of the MPGI National Multi Conference. International Journal of Computer Applications. 2012: 32-36.
5.
Bharathi R, Keswani NN, Shinde SD. An Approach to mining massive Data. Proceedings of the MPGI National Multi Conference. International Journal of Computer Applications. 2012: 32-36.
6.
Bickel PJ, Hammel EA, O'Connell JW. Sex Bias in Graduate Admissions: Data from Berkeley. Science. 1975; 187 (4175): 3.
7.
Blyth CR. On Simpson's Paradox and the Sure-Thing Principle. Journal of the American Statistical Association. 1972; 67(338): 364-366.
8.
Bratzke B, Spies KP, Krebs S. Morbiditätskomponente bei Arznei- und Heilmittelbudgets einführen. Deutscher Ärztetag, Drucksache V I-37. 2012.
9.
Busse R, Panteli D, Krebs S. Arzneimittelversorgung in der GKV und 15 anderen europäischen Gesundheitssystemen: Ein systematischer Vergleich. Universitätsverlag der TU Berlin; 2015. (Working papers in health policy and management; 11).
10.
Cao L. Data science: nature and pitfalls. IEEE Intelligent Systems. 2016; 31(5): 66-75.
11.
Cao L, Fayyad U. Data science: Challenges and directions. Commun ACM. 2016: 1-9.
12.
Donetti L, Hurtado PI, Munoz MA. Entangled networks, synchronization, and optimal network topology. Physical Review Letters. 2005;95(18):188701.
13.
Dorogovtsev SN, Mendes JFF. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford: Oxford University Press; 2003.
14.
Emcke T, Ostermann T, Heidbreder M, Schuster R. Comparison of Different Implementations of a Process Limiting Pharmaceutical Expenditures Required by German Law, Proceedings of the 10th International Joint Conference on Biomedical Engineering and Technologies (BIOSTEC). HealthInf. 2017; 5: 35-40.
15.
Friedman J. On the second eigenvalue and random walks in random d-regular graphs. Combinatorica. 1991; 11: 331-362.
16.
GKV Versorgungsstärkungsgesetz (GKV-VSG). BGBl. 2015; I:1211.
17.
Gupta S, Venkatesh R, Saurabh S. Fast Exponential Algorithms for Maximum r-Regular Induced Subgraph Problems. Lecture Notes in Computer Science. 2006; 4337.
18.
Hassani H, Silva E S. Forecasting with big data: A review. Annals of Data Science. 2015; 2(1): 5-19.
19.
Holt GB. Potential Simpson's paradox in multicenter study of intraperitoneal chemotherapy for ovarian cancer. Journal of Clinical Oncology. 2016; 34(9): 1016-1016.
20.
Hu H, Wen Y, Chua T S, Li X. Toward scalable systems for big data analytics: A technology tutorial. IEEE Access. 2014; 2: 652-687.
21.
Julious SA, Mullee MA. Confounding and Simpson's paradox. BMJ. 1994;309(6967):1480-1. DOI: 10.1136/bmj.309.6967.1480
22.
Lubotzky A. Discrete Groups, Expanding Graphs and Invariant Measures. Birkhäuser Verlag; 1994.
23.
Lubotzky A, Phillips R, Sarnak P. Explicit expanders and the Ramanujan conjectures, Proc. of the Eighteenth Annual ACM Sympos. On Theory of Computing. 1986; 18: 240-246.
24.
Markov AA. Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Reprinted in Appendix B of: R. Howard. Dynamic Probabilistic Systems, volume 1: Markov Chains. John Wiley and Sons; 1971.
25.
Mathieson L, Szeider S. The Parameterized Complexity of Regular Subgraph Problems and Generalizations. Proceedings of the 2nd International Conference on Combinatorial Optimization and Applications; 2008.
26.
McCubbins MD, Paturi R, Weller N. Connected coordination network structure and group coordination. American Politics Research. 2009;37(5):899-920.
27.
Meyn SP, Tweedie RL. Markov Chains and Stochastic Stability. London: Springer; 1993.
28.
Milgram S. The Small World Problem. Psychology Today. 1967 May; 60-67. ISSN 0033-3107.
29.
Nummelin E. General irreducible Markov chains and non-negative operators. Cambridge University Press; 1984, 2004.
30.
Ostermann T, Schuster R. An Informationtheoretical Approach to Classify Hospitals with Respect to Their Diagnostic Diversity using Shannon’s Entropy. Proceedings of the International Conference on Health Informatics (HealthInf). 2015: 325-329.
31.
Pike R, Dorward S, Griesemer R, Quinlan S. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming. 2005; 13(4): 277-298.
32.
Pohl-Dernick K, Meier F, Maas R, Schöffski O, Emmert M. Potentially inappropriate medication in the elderly in Germany: an economic appraisal of the PRISCUS list. BMC health services research. 2016; 16(1): 109.
33.
Press G. A very short history of big data. Forbes Tech Magazine. 2013 May 9.
34.
Pretti M, Weigt M. Sudden emergence of q-regular subgraphs in random graphs. Europhysics Letters. 2006; 75: 8.
35.
Robbins A. GNU awk 4.0: teaching an old bird some new tricks. Linux Journal. 2011; 209: 5.
36.
Satorras RP, Vespignani A. Evolution and Structure of the Internet: A Statistical Physics approach. Cambridge: University Press; 2004.
37.
Schuster R. Biomathematik. Stuttgart: Teubner-Verlag; 2009.
38.
Schuster R. Graphentheoretische Analyse von Vernetzungsstrukturen zwischen Wirkstoffen und Wirkstoffgruppen in Bezug auf gleichzeitige Verordnung beim Patienten. GAA; 2015.
39.
Schuster R. Morbidity Related Groups (MRG) and drug economic index - a new concept after the age of Richtgrößen benchmarks in Germany. GAA; 2015.
40.
Schuster R, Emcke T, von Arnstedt E, Heidbreder M. Morbidity Related Groups (MRG) for epidemiological analysis in outpatient treatment. IOS Press; 2016. p. 783-787.
41.
Schuster R, Schuster M. Graphentheoretische Analyse von Vernetzungsstrukturen im vertragsärztlichen Sektor einer Region der kassenärztlichen Vereinigung. GAA; 2015. DocAbstr. 202
42.
Schuster R, von Arnstedt E. Aspekte der Dynamik der Multimedikation in der Vertragsärztlichen Versorgung. GAA; 2012.
43.
Seneta E. Non-negative matrices and Markov chains. 2nd rev. ed. 1981. (Springer Series in Statistics).
44.
Shimono T. A hacking toolset for big tabular files (Codenames: Bin4tsv, Kabutomushi). Proceedings of the IEEE International Conference on Big Data; 2016. p. 2902-2910.
45.
Simpson EH. The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Ser. B. 1951; 13: 238-241.
46.
Spinellis D. A repository of Unix history and evolution. Empirical Software Engineering. 2017: 1-33.
47.
Wagner CH. Simpson's Paradox in Real Life. The American Statistician. 1982 Feb; 36 (1): 46-48. DOI: 10.2307/2684093
48.
Wang W, Krishnan E. Big data and clinicians: a review on the state of the science. JMIR medical informatics. 2014; 2(1): e1.
49.
Wersborg T. Morbiditätsbezogene Richtgrößen zur Steuerung einer bedarfsgerechten und wirtschaftlichen Arzneimittelversorgung innerhalb der gesetzlichen Krankenversicherung in Deutschland [Dissertation]. München: LMU, Medizinische Fakultät; 2006.
50.
Wielandt H. Unzerlegbare, nicht negative Matrizen. Mathematische Zeitschrift. 1950; 52 (1): 642-648.
51.
Wünschiers R. Awk. In: Computational Biology. Berlin Heidelberg: Springer; 2013. p. 197-254.
52.
Wyber R, Vaillancourt S, Perry W, Mannava P, Folaranmi T, Celi L A. Big data in global health: improving health in low-and middle-income countries. Bulletin of the World Health Organization. 2015; 93(3): 203-208.