gms | German Medical Science

GMDS 2012: 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

16. - 20.09.2012, Braunschweig

On the possibility of a holistic class model for the clinical bioinformatics domain

Meeting Abstract


  • Markus Gumbel - Hochschule Mannheim, Institut für Medizinische Informatik, Mannheim, Germany
  • Patrick Sturm - Hochschule Mannheim, Institut für Medizinische Informatik, Mannheim, Germany
  • Florian Meyerer - Hochschule Mannheim, Institut für Medizinische Informatik, Mannheim, Germany
  • Amelie Bauer - Hochschule Mannheim, Institut für Medizinische Informatik, Mannheim, Germany

GMDS 2012. 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Braunschweig, 16.-20.09.2012. Düsseldorf: German Medical Science GMS Publishing House; 2012. Doc12gmds091

doi: 10.3205/12gmds091, urn:nbn:de:0183-12gmds0919

Published: September 13, 2012

© 2012 Gumbel et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.



Text

Introduction: Methods based on gene technology, together with the associated bioinformatics methods, are increasingly applied in clinical medicine. Examples are diagnostic support and pharmacogenomics, including personalized medicine [1]. Many databases with molecular-biological content are available. However, databases that also consider individual patients are rare. Consequently, no universal data model exists that could serve as a template for a biomedical information system. Instead, molecular entities are modeled with ontologies, markup languages, terminologies, or HL7v3 ([2], [3] and below). When different data sources need to be integrated, interoperability issues arise [4]. These can only be solved if the entities, their relationships, and thus their semantics are known and specified globally. We analyzed whether such a holistic data model can be formulated in UML [5].
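The following minimal Python sketch makes the idea of one class model spanning patients and molecular entities more concrete. The classes `Gene`, `Variant` and `Patient`, their attributes, and the method `carries_variant_in` are hypothetical illustrations. They are not part of the authors' 30-class prototype, and the variant string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes for illustration only; not the authors' prototype model.

@dataclass(frozen=True)
class Gene:
    symbol: str  # gene symbol, e.g. "CYP2D6"

@dataclass(frozen=True)
class Variant:
    gene: Gene   # association: each variant lies in exactly one gene
    change: str  # placeholder description of the sequence change

@dataclass
class Patient:
    patient_id: str
    # association: a patient carries zero or more variants
    variants: List[Variant] = field(default_factory=list)

    def carries_variant_in(self, symbol: str) -> bool:
        """Return True if any of the patient's variants lies in the given gene."""
        return any(v.gene.symbol == symbol for v in self.variants)

cyp2d6 = Gene("CYP2D6")
patient = Patient("P-001", [Variant(cyp2d6, "c.100A>G")])  # placeholder variant
print(patient.carries_variant_in("CYP2D6"))  # True
print(patient.carries_variant_in("BRCA1"))   # False
```

The sketch is only meant to show patient-level and molecular-level entities sharing one explicitly specified set of classes and associations. The interoperability argument above depends on exactly that property.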

Material and Methods: Data integration methods in the life sciences have a long history. We analyzed a) which solutions for data integration already exist, b) which public databases could be integrated, c) which existing sources could be used for a (new) domain model, and d) which techniques could be useful for developing a model de novo. Based on the outcome of this analysis, a UML prototype was created. The class diagram was also influenced by an object-oriented database that had been created in another project to demonstrate that biomedical data can naturally be stored in object-oriented databases [6].

Results: a) BioMart [7] and BioRS [8] were evaluated as tools for data integration. These data integration approaches appear to postpone the creation of a domain model until it is needed in a concrete, use-case-dependent scenario at a customer or organization. Consequently, public and reusable domain models are not available. b) The public APIs of 30 key data sources, selected from a list of about 1,300 databases [9], were examined. Many of these APIs turned out to be difficult to locate and to use. Their data structures are often text-based and mostly semi-structured. c) The following were considered as input for a domain model: the Gene Ontology (GO) [10], the markup languages SBML [11] and GSVML [12], HL7's "Clinical Genomics" model [2], and, to some extent, the terminology UMLS [13]. We found that markup languages and HL7 are less suitable because of their hierarchical data structures. Ontologies, in contrast, are a good source, but a framework like GO is far too complex to be converted into UML directly. d) UML, like other modeling approaches, is constrained to only two meta levels [14]. Nevertheless, we did not encounter any scenario in which UML appeared inappropriate. Overall, our prototype model consists of 30 classes.
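The following Python sketch illustrates one reason why hierarchical formats can fit a domain model less well. In an object model, a single variant object can be referenced by several patients. A nested document, by contrast, stores it once per parent unless an extra identifier/reference mechanism is layered on top. The sketch uses the standard `xml.etree.ElementTree` module; the element and attribute names (`patients`, `patient`, `variant`, `gene`, `change`) and the data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Object graph: one shared variant object referenced by two patients (hypothetical data).
variant = {"id": "var1", "gene": "CYP2D6", "change": "c.100A>G"}
patients = [
    {"id": "p1", "variants": [variant]},
    {"id": "p2", "variants": [variant]},
]

def to_xml(patients):
    """Serialize the graph as a tree: each patient element nests its own variant elements."""
    root = ET.Element("patients")
    for p in patients:
        patient_el = ET.SubElement(root, "patient", id=p["id"])
        for v in p["variants"]:
            ET.SubElement(patient_el, "variant", id=v["id"], gene=v["gene"], change=v["change"])
    return root

root = to_xml(patients)
copies = root.findall(".//variant")
print(len(copies))                                                # 2: written once per patient
print(patients[0]["variants"][0] is patients[1]["variants"][0])  # True: one object in memory
```

Markup languages can work around this with identifiers and references. The reference semantics then live alongside the tree structure rather than in it.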

Discussion: Surprisingly, few methods are available for developing a domain model from scratch; modeling still seems to be an art. Methods for transforming ontologies into UML and back should be investigated further ([15], [16]). We believe it is possible to formulate a holistic domain model for clinical bioinformatics in UML. Such a model could be used to integrate and query data from different sources, for message exchange, and as a template for information systems.
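The following Python sketch gives a minimal illustration of what an ontology-to-UML transformation involves. It reads a small hand-written excerpt in OBO syntax, one of the flat-file formats in which GO is distributed. It then maps each term to a PlantUML class and each `is_a` link to a generalization.

This is a naive sketch under stated assumptions, not the method of [15] or [16]:

- Only `id`, `name` and `is_a` are handled.
- GO also uses relations such as `part_of` and `regulates`, which have no direct counterpart in UML generalization. This hints at why converting GO directly is impractical.
- The function names `parse_obo_terms` and `to_plantuml` are hypothetical.

```python
from typing import Dict

# Hand-written two-term excerpt in OBO syntax (illustrative, not a full GO release).
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process
"""

def parse_obo_terms(text: str) -> Dict[str, dict]:
    """Collect id, name and is_a tags from [Term] stanzas; all other tags are ignored."""
    terms: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start of a new stanza; only [Term] stanzas are kept.
            current = {"name": "", "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, skipped stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value.split("!", 1)[0].strip())  # drop "! comment"
    return terms

def to_plantuml(terms: Dict[str, dict]) -> str:
    """Emit one class per term and one generalization per is_a link."""
    def alias(term_id: str) -> str:
        return term_id.replace(":", "_")  # keep aliases to plain identifier characters
    lines = ["@startuml"]
    for term_id, term in terms.items():
        lines.append(f'class "{term["name"]}" as {alias(term_id)}')
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            lines.append(f"{alias(parent)} <|-- {alias(term_id)}")
    lines.append("@enduml")
    return "\n".join(lines)

print(to_plantuml(parse_obo_terms(SAMPLE_OBO)))
```

For the excerpt, this prints two class declarations and the single generalization `GO_0008150 <|-- GO_0009987`. Every other kind of relation would need its own mapping decision, for example to a UML association or composition.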


References

1.
Coleman W, Tsongalis G. Molecular Diagnostics. Humana Press; 2010.
2.
Hiroi K, Ido K, Yang W, Nakaya J. Interface analysis between GSVML and HL7 version 3. J Biomed Inform. 2007;40(5):527-38. DOI: 10.1016/j.jbi.2006.12.006
3.
Benson T. Principles of Health Interoperability HL7 and SNOMED (Health Informatics). Springer; 2009.
4.
Katayama T, Arakawa K, Nakao M. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. J Biomed Semantics. 2010;1.
5.
Object Management Group. UML 2.3 Specification. Available from: http://www.omg.org/spec/UML/2.3/
6.
von den Berken R. OODB4Genomics: Eine objektorientierte Datenbank zur Speicherung biomedizinischer Daten [Bachelor's thesis]. Mannheim: University of Applied Sciences; 2011.
7.
Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. BioMart - biological queries made easy. BMC Genomics. 2009;10:22. DOI: 10.1186/1471-2164-10-22
8.
Kaps A, Dyshlevoi K, Heumann K, Jost R, Kontodinas I, Wolff M, Hani J. The BioRS(TM) Integration and Retrieval System: An open system for distributed data integration. J Integrative Bioinformatics. 2006;3(2):44.
9.
Galperin MY, Fernández-Suárez XM. The 2012 Nucleic Acids Research database issue and the online Molecular Biology Database Collection. Nucleic Acids Res. 2012;40:D1-D8.
10.
GO – The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25:25-9. Available from: http://www.geneontology.org/-GO_nature_genetics_2000.pdf
11.
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524-31.
12.
Nakaya J, Kimura M, Hiroi K, Ido K, Yang W, Tanaka H. Genomic Sequence Variation Markup Language (GSVML). Int J Med Inform. 2009. DOI: 10.1016/j.ijmedinf.2009.11.003
13.
Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267-D270. DOI: 10.1093/nar/gkh061
14.
Atkinson C, Kühne T. Reducing accidental complexity in domain models. Softw Syst Model. 2008;7:345-59.
15.
Gasevic D, Djuric D, Devedzic V, Damjanovic V. Converting UML to OWL ontologies. In: 13th International World Wide Web Conference on Alternate Track Papers & Posters, ser. WWW Alt; 2004. New York, NY, USA. pp. 488-9.
16.
Kalibatiene D, Vasilecas O. On OWL/SWRL mapping to UML/OCL. In: 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing on International Conference on Computer Systems and Technologies, ser. CompSysTech '10; 2010; New York, NY, USA. pp. 58-63.