gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

02. - 06.09.2018, Osnabrück

Quantifying readability and vocabulary metrics of the Austrian National Health Portal

Meeting Abstract

  • Richard Zowalla - Hochschule Heilbronn - Medizinische Informatik, Heilbronn, Deutschland
  • Martin Wiesner - Hochschule Heilbronn - Medizinische Informatik, Heilbronn, Deutschland

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 230

doi: 10.3205/18gmds123, urn:nbn:de:0183-18gmds1234

Published: August 27, 2018

© 2018 Zowalla et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at



Introduction: Even with a higher level of health literacy, the vocabulary and concepts of diagnosis and treatment may not be easy for lay people to understand [1]. A low level of patient health literacy is a major reason for worse prognosis, fewer preventive actions, or reduced therapy adherence [2]. In this context, the German Federal Ministry of Health outlined the concept for a “National Health Portal” in 2018 [3]. In their analysis of existing work, experts of the IQWiG reviewed several existing portals of other official sources, e.g., the ‘Public Health Portal of Austria’ (P-HPA,

Yet, to the best knowledge of the authors, no full-scale assessment of the health-related articles published at the P-HPA is publicly available. The aim of this study is to fill this gap. We present the results of a computer-based readability and vocabulary analysis of the P-HPA content, as published in 2018.

Methods: Readability describes properties of written text and can be assessed with different instruments [4]. Several metrics have been adapted for the German language, e.g., the Flesch-Reading-Ease (FRE) or the 4th Vienna formula (WSTF) [5]. Beyond readability, modern methods from the field of Machine Learning can be leveraged [6] to compute an expert level L ∈ {1, …, 10} [7], which relates to the vocabulary used.
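The two formulas can be sketched as follows. Since the study's tooling is not publicly available, the helper names and the crude vowel-cluster syllable counter below are our own illustrative assumptions, not the authors' implementation; the coefficients are those of Amstad's German FRE adaptation and the 4th Wiener Sachtextformel.

```java
// Sketch of the two German readability formulas referenced above.
// The syllable counter is a rough vowel-group heuristic for
// illustration only; real tooling would use a proper hyphenation
// or pronunciation dictionary.
public class Readability {

    // Approximate syllables by counting groups of vowels (incl. umlauts).
    static int syllables(String word) {
        int count = 0;
        boolean inVowelGroup = false;
        for (char c : word.toLowerCase().toCharArray()) {
            boolean vowel = "aeiouäöüy".indexOf(c) >= 0;
            if (vowel && !inVowelGroup) count++;
            inVowelGroup = vowel;
        }
        return Math.max(count, 1);
    }

    // German Flesch-Reading-Ease (Amstad adaptation):
    //   FRE = 180 - ASL - 58.5 * ASW
    // ASL = average sentence length in words, ASW = average syllables per word.
    static double fleschGerman(double avgSentenceLen, double avgSyllablesPerWord) {
        return 180.0 - avgSentenceLen - 58.5 * avgSyllablesPerWord;
    }

    // 4th Vienna formula (4. Wiener Sachtextformel):
    //   WSTF4 = 0.2656 * SL + 0.2744 * MS - 1.693
    // SL = mean sentence length in words, MS = percentage of words
    // with three or more syllables. The result approximates a school grade.
    static double wstf4(double avgSentenceLen, double pctPolysyllabic) {
        return 0.2656 * avgSentenceLen + 0.2744 * pctPolysyllabic - 1.693;
    }
}
```

Longer sentences and a higher share of polysyllabic words lower the FRE score (harder) and raise the WSTF grade, which is how the averages reported below map to years of schooling.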

All articles were downloaded via the crawler4j framework; text content was extracted and HTML-sanitized via JSoup and cleaned of disturbing artifacts. For each article, the readability metrics were computed with analysis software written in Java. For sentence detection, we relied on OpenNLP and its sentence models for the German language.
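The shape of this extraction pipeline can be sketched as below. The study's actual code is not published, so this is a self-contained stand-in: a naive regex tag stripper takes the place of JSoup's sanitizing, and `java.text.BreakIterator` takes the place of the OpenNLP German sentence model; class and method names are our own.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch of the text-extraction steps. The study used
// JSoup for HTML sanitizing and OpenNLP sentence models; the
// stdlib stand-ins below only show the shape of the pipeline.
public class PipelineSketch {

    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    // Crude HTML-to-text step (the study used JSoup here).
    static String stripTags(String html) {
        return TAGS.matcher(html).replaceAll("")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Sentence detection (the study used OpenNLP's German model).
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.GERMAN);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }
}
```

Per-article sentence and word counts from such a pipeline are the direct inputs to the FRE and WSTF formulas.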

Results: The analysis included n=2931 articles found in the sub-categories (i) diseases (n=914), (ii) laboratory and diagnosis (n=993), and (iii) life (n=1024), mainly on prevention and aging. Data acquisition was carried out on April 5th, 2018.

For the WSTF, the analysis yields an average of 12.42, corresponding to 13 years of schooling. The mean FRE score was 22.74. The mean expert level was L=5.93, which corresponds to a moderate level of vocabulary difficulty.

Articles in the sub-category ‘life’ use easier vocabulary (L=3.31) and require 11 years of schooling (WSTF=11.55; FRE=31.21). Information on ‘diseases’ scores a moderate L (L=6.18) and requires 13 years of schooling (WSTF=12.32; FRE=22.77). By contrast, ‘laboratory’ articles use complex sentence structures (WSTF=13.35; FRE=14.44) and difficult vocabulary (L=8.26).

Note: The full results of the analysis are published at as supplemental material.

Discussion: As our analysis shows, the expert level, i.e. the vocabulary demanded of readers, is moderate on average (L≤6) but varies widely depending on the topic. Article authors should therefore carefully check written material and reduce expert-centric terms. Automatic tooling, as used for this study, can support the process of text production.

Yet, computing readability scores alone does not reflect the individual knowledge or motivation of patients or family members. Moreover, aspects such as illustration and typesetting were not assessed in this study.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


1. Friedman DB, Hoffman-Goetz L. A systematic review of readability and comprehension instruments used for print and web-based cancer information. Health Education & Behavior. 2006;33(3):352–373.
2. Berkman ND, Sheridan SL, Donahue KE, Halpern DJ, Crotty K. Low health literacy and health outcomes: An updated systematic review. Annals of Internal Medicine. 2011;155(2):97–107.
3. Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG), editor. Konzept für ein nationales Gesundheitsportal - Version 2.0 [Concept for a National Health Portal]. Köln: Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG); 2018 [last accessed 7.4.2018].
4. Walsh TM, Volsko TA. Readability Assessment of Internet-Based Consumer Health Information. Respiratory Care. 2008;53(10):1310–1315.
5. Bamberger R, Vanacek E. Lesen - Verstehen - Lernen - Schreiben [Reading - Comprehension - Learning - Writing]. Diesterweg; 1984.
6. Leroy G, Miller T, Rosemblat G, Browne A. A balanced approach to health information evaluation: A vocabulary-based naïve Bayes classifier and readability formulas. J Am Soc Inf Sci Technol. 2008;59:1409–1419. DOI: 10.1002/asi.20837.
7. Zowalla R, Wiesner M, Pfeifer D. Automatically Assessing the Expert Degree of Online Health Content using SVMs. Stud Health Technol Inform. 2014;202:48–51.