gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Communicating distributional regression results to applied scientists

Meeting Abstract

  • Fabian Otto-Sobotka - Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 254

doi: 10.3205/19gmds058, urn:nbn:de:0183-19gmds0582

Published: 6 September 2019

© 2019 Otto-Sobotka.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Multiple linear regression is a standard tool for multivariate analysis. However, its usual assumptions of normally distributed and homoscedastic residuals seldom hold in a real data set. Moreover, the main goal of a multivariate analysis is not always to model the expectation of the response variable. Distributional regression methods allow a more complete analysis of the entire distribution of the response. The strict assumptions of classical regression models are dropped, and changes in the distribution of the residuals are accounted for by design. However, because of the additional complexity, explaining distributional regression results is often not straightforward, and interpreting results beyond the mean can be very difficult for applied collaboration partners. This presentation aims to provide some advice on communicating distributional regression estimates and to facilitate a discussion about different experiences and hurdles with these analyses.

Methods: There are three main methods in distributional regression. Generalised Additive Models for Location, Scale and Shape (GAMLSS [1]) extend classical generalised regression models by introducing additional predictors for all parameters of the assumed distribution instead of modelling only the expectation parameter with covariates. All regression coefficients are then estimated by maximising the likelihood componentwise. The two alternatives offer nonparametric estimates of the conditional distribution of the response: its empirical distribution function is constructed from a dense set of either regression quantiles [2] or expectiles [3], [4]. The latter are quantile-like generalisations of the mean, estimated by least asymmetrically weighted squares. With quantiles and expectiles, the response is usually modelled with a separate regression model for each quantile or expectile level. While a single quantile is relatively easy to explain, an expectile other than the mean does not have a straightforward explanation.
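To make "least asymmetrically weighted squares" concrete, the following is a minimal Python sketch of a linear expectile regression fitted by iteratively reweighted least squares. It is an illustration only, not the estimation code used for this abstract: the function name `expectile_regression`, the simulated data and all numerical settings are assumptions made for the example, and the models discussed in [3], [4] additionally allow penalised nonlinear and spatial effects that are omitted here.

```python
import numpy as np


def expectile_regression(X, y, tau=0.5, max_iter=100, tol=1e-10):
    """Linear expectile regression via iteratively reweighted least squares.

    Minimises sum_i w_i * (y_i - x_i' beta)^2 with w_i = tau for positive
    residuals and w_i = 1 - tau otherwise. The name and defaults are
    assumptions for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from ordinary least squares, which is the tau = 0.5 expectile.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        resid = y - X @ beta
        # Asymmetric weights: tau above the current fit, 1 - tau below it.
        w = np.where(resid > 0, tau, 1.0 - tau)
        Xw = X * w[:, None]
        # Weighted normal equations: (X' W X) beta = X' W y.
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated heteroscedastic data, purely illustrative (not from the abstract).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=500)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + 0.2 * x)
X = np.column_stack([np.ones_like(x), x])

# One separate model per asymmetry level, as described above.
fits = {tau: expectile_regression(X, y, tau) for tau in (0.1, 0.5, 0.9)}
```

Because the spread of the simulated response grows with `x`, the fitted slopes in `fits` should fan out across the asymmetry levels, which is the kind of effect a mean regression alone cannot show.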

Results: The practical use of distributional regression methods is shown with two example analyses: hearing scores in a representative sample of the German population and patient satisfaction scores from young people with inflammatory bowel disease. The GAMLSS results can be made more accessible by showing, for example, a set of quantiles from the fitted distribution instead of the estimated effects on variation or skewness. Expectiles can be transformed into tail means to represent the location of the extreme observations.
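Both presentation ideas can be sketched in a few lines of Python. The sketch rests on assumptions that are not part of the abstract: the first function assumes a Gaussian location-scale model with a linear predictor for the mean and for the log standard deviation, and the coefficient values passed to it would be hypothetical; the second uses one known identity that links a lower-tail expectile to the mean of the observations below it, which may differ from the transformation used in the presentation. All function names are illustrative.

```python
import math
from statistics import NormalDist

import numpy as np


def fitted_quantiles(x, probs, loc_coef, log_scale_coef):
    """Quantiles of an assumed Gaussian location-scale model at covariate x.

    loc_coef and log_scale_coef are (intercept, slope) pairs; the model
    form itself is an assumption for this sketch.
    """
    mu = loc_coef[0] + loc_coef[1] * x
    sigma = math.exp(log_scale_coef[0] + log_scale_coef[1] * x)
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(p) for p in probs]


def sample_expectile(y, tau, max_iter=200):
    """Sample expectile by iterating the asymmetrically weighted mean."""
    y = np.asarray(y, dtype=float)
    e = y.mean()
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < 1e-12:
            return e_new
        e = e_new
    return e


def lower_tail_mean(y, tau):
    """Mean of the observations below the tau-expectile (for tau < 0.5).

    Uses the identity
        tail mean = (1 + c) * e_tau - c * mean(y),
        c = tau / ((1 - 2 * tau) * alpha),
    where alpha is the share of observations below e_tau.
    """
    y = np.asarray(y, dtype=float)
    e = sample_expectile(y, tau)
    alpha = np.mean(y < e)
    c = tau / ((1.0 - 2.0 * tau) * alpha)
    return (1.0 + c) * e - c * y.mean()
```

For instance, calling `fitted_quantiles` at several covariate values with probabilities such as 0.1, 0.5 and 0.9 gives curves that show location and spread on the scale of the response, and `lower_tail_mean` turns an abstract expectile into "the average of the lowest observations", which is usually easier to discuss with applied partners.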

Discussion: A large part of the additional information obtained with distributional regression is certainly due to the great flexibility of the regression model, starting with the inclusion of covariates. While many users routinely estimate multiple linear models, metric covariates can also be modelled to have nonlinear effects on the distribution of the response variable. This seems to be a first problem, since no single regression coefficient can by itself explain the whole effect of a covariate. A second open question seems to be how exactly the estimated effects modify the response when the quantity being modelled is not the mean.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Rigby RA, Stasinopoulos DM. Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2005 Jun 1;54(3):507-54.
2. Koenker R, Bassett Jr G. Regression quantiles. Econometrica: Journal of the Econometric Society. 1978 Jan 1;46(1):33-50.
3. Newey WK, Powell JL. Asymmetric least squares estimation and testing. Econometrica: Journal of the Econometric Society. 1987 Jul 1;55(4):819-47.
4. Sobotka F, Kneib T. Geoadditive expectile regression. Computational Statistics & Data Analysis. 2012 Apr 1;56(4):755-67.