gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

A general data schema and robust processing package for gait data analytics and their evaluation with six publicly available data sets

Meeting Abstract

  • Fabian Schmidt - Medizinische Informatik, Hochschule Heilbronn, Heilbronn, Germany; Medizinische Informatik, Universität Heidelberg, Heidelberg, Germany
  • Christoph Maier - Medizinische Informatik, Hochschule Heilbronn, Heilbronn, Germany
  • Alexandra Reichenbach - Zentrum für Maschinelles Lernen, Hochschule Heilbronn, Heilbronn, Germany; Medizinische Fakultät Heidelberg, Universität Heidelberg, Heidelberg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 90

doi: 10.3205/22gmds009, urn:nbn:de:0183-22gmds0098

Published: August 19, 2022

© 2022 Schmidt et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.

Introduction: Human gait analysis is used in a variety of domains [1], e.g., for assessing patients with movement disorders [2]. Gait tracking with sensors built into smartphones has become increasingly popular because smartphones are so widely available [3]. The goal of the current project is threefold: to design a general data schema optimized for data analysis, to develop robust preprocessing and feature extraction, and to evaluate these features across data sets, sensor types, and sensor locations.

State of the art: Current literature on gait analysis shows that methods are rather diverse [1]. Studies use different sensors, either dedicated sensors attached directly to the body or built-in smartphone sensors. Sensor locations on the body range from the ankle through trouser pockets to the back. Data formats, preprocessing, and feature extraction also vary widely. This diversity makes it hard to compare results from different machine learning models trained for a task such as diagnosis. Research is based either on private data or on publicly available data sets [4], [5], [6], [7], [8], [9]. However, we found the public data sets to be mutually incompatible in terms of, e.g., sensor model, sensor location/orientation, data format, and performed activities.

Concept: A database that consolidates all data sources was designed as a star schema, with raw accelerometer data as the initial fact table. Additional fact tables contain preprocessed data and features. The main dimensions for filtering are sensor location, activity, and demographic subject information. Six publicly available data sets [4], [5], [6], [7], [8], [9] were selected for the initial database. Because the formats of the original data sources differ vastly, an individual extract-transform-load (ETL) script was necessary for each source. Including a new data source therefore requires writing a new ETL script. Preprocessing and feature extraction are standardized for all data in the database. Quality control processes, including visualization of aggregated data, make checks easy.
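To make the star schema concrete, here is a minimal sketch. It uses SQLite from the Python standard library instead of the PostgreSQL database the authors use. All table, column, and function names, as well as the record layout of the example source, are hypothetical.

The sketch shows the following elements:
- raw samples as the initial fact table,
- a second fact table for features (created but not populated here),
- dimensions for subject, sensor location, and activity,
- one source-specific ETL function,
- a query that filters facts through the dimensions.

```python
import sqlite3

# Hypothetical star schema: dimension tables plus two fact tables.
SCHEMA = """
CREATE TABLE dim_subject  (subject_id INTEGER PRIMARY KEY, data_source TEXT, age INTEGER, sex TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_activity (activity_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Initial fact table: one row per raw accelerometer sample.
CREATE TABLE fact_raw_acc (
    subject_id  INTEGER REFERENCES dim_subject(subject_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    activity_id INTEGER REFERENCES dim_activity(activity_id),
    t REAL, ax REAL, ay REAL, az REAL
);
-- Additional fact table: one row per extracted feature value.
CREATE TABLE fact_feature (
    subject_id INTEGER, location_id INTEGER, activity_id INTEGER,
    feature_name TEXT, value REAL
);
"""


def get_or_create(conn, table, key_col, name):
    """Return the surrogate key of a dimension member, inserting it if needed.

    table and key_col are fixed internal constants, never user input.
    """
    row = conn.execute(f"SELECT {key_col} FROM {table} WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,)).lastrowid


def etl_example_source(conn, records):
    """Hypothetical ETL for one source whose records look like
    (subject, age, sex, location, activity, t, ax, ay, az)."""
    for subj, age, sex, loc, act, t, ax, ay, az in records:
        conn.execute("INSERT OR IGNORE INTO dim_subject VALUES (?, ?, ?, ?)",
                     (subj, "example_source", age, sex))
        loc_id = get_or_create(conn, "dim_location", "location_id", loc)
        act_id = get_or_create(conn, "dim_activity", "activity_id", act)
        conn.execute("INSERT INTO fact_raw_acc VALUES (?, ?, ?, ?, ?, ?, ?)",
                     (subj, loc_id, act_id, t, ax, ay, az))
    conn.commit()


def raw_samples(conn, location, activity):
    """Filter the raw fact table through the location and activity dimensions."""
    return conn.execute("""
        SELECT f.subject_id, f.t, f.ax, f.ay, f.az
        FROM fact_raw_acc f
        JOIN dim_location l ON l.location_id = f.location_id
        JOIN dim_activity a ON a.activity_id = f.activity_id
        WHERE l.name = ? AND a.name = ?
        ORDER BY f.subject_id, f.t
    """, (location, activity)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
etl_example_source(conn, [
    (1, 30, "f", "ankle", "walking", 0.00, 0.1, 0.0, 9.8),
    (1, 30, "f", "ankle", "walking", 0.01, 0.2, 0.1, 9.7),
    (2, 45, "m", "pocket", "walking", 0.00, 0.0, 0.3, 9.9),
])
print(raw_samples(conn, "ankle", "walking"))
```

Each additional data source would get its own ETL function that maps its native format onto the same dimension and fact tables. A real multi-source database would also need subject keys that are unique across sources, e.g., a composite of source and original ID. The sketch simply assumes the IDs do not collide.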

Implementation: Data are stored in a PostgreSQL database. The processing steps were implemented in Python using packages for database interaction, data handling, and processing. General preprocessing steps include upsampling and transformation into a world coordinate system. We developed a robust gait cycle detection method that copes with different noise levels and with initial misclassification. It also handles different sensor locations and the resulting differences in gait characteristics. Three types of features are extracted: frequency features, the normalized gait cycle [10], and statistical metrics of the gait cycle. After project completion, we plan to publish a Python package for database creation, preprocessing, and feature extraction.
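The abstract does not detail these processing steps. The following NumPy sketch shows one simple way they could fit together. It is not the authors' robust detector, and it rests on several assumptions:
- The input is a vertical acceleration axis already aligned with gravity, so the world-coordinate transformation is omitted.
- The signal has one dominant peak per gait cycle.
- The function names, the 100 Hz target rate, the peak threshold, and the 101-point cycle length are all hypothetical choices.

```python
import numpy as np


def upsample(t, acc, fs_target=100.0):
    """Linearly interpolate irregularly sampled (n, 3) data onto a uniform grid at fs_target Hz."""
    t_new = np.arange(t[0], t[-1], 1.0 / fs_target)
    acc_new = np.column_stack([np.interp(t_new, t, acc[:, k]) for k in range(acc.shape[1])])
    return t_new, acc_new


def detect_cycle_starts(signal, fs, min_cycle_s=0.6):
    """Naive peak-based cycle detection (hypothetical; far simpler than a robust detector).

    Keeps local maxima above mean + 0.5 * std. When two maxima are closer than
    min_cycle_s, only the higher one is kept.
    """
    min_dist = int(min_cycle_s * fs)
    threshold = signal.mean() + 0.5 * signal.std()
    peaks = []
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] > threshold and signal[i - 1] <= signal[i] > signal[i + 1]
        if not is_peak:
            continue
        if not peaks or i - peaks[-1] >= min_dist:
            peaks.append(i)
        elif signal[i] > signal[peaks[-1]]:
            peaks[-1] = i  # a higher maximum within the same cycle replaces the previous one
    return np.array(peaks, dtype=int)


def normalize_cycles(signal, starts, n_points=101):
    """Time-normalize each cycle (one start to the next) to n_points samples, i.e. 0-100 %."""
    x_new = np.linspace(0.0, 1.0, n_points)
    cycles = []
    for a, b in zip(starts[:-1], starts[1:]):
        segment = signal[a:b + 1]
        x_old = np.linspace(0.0, 1.0, len(segment))
        cycles.append(np.interp(x_new, x_old, segment))
    return np.array(cycles)


def extract_features(signal, fs, cycles):
    """One frequency feature plus simple statistics of the mean normalized cycle."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mean_cycle = cycles.mean(axis=0)
    features = {
        "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),
        "cycle_mean": float(mean_cycle.mean()),
        "cycle_std": float(mean_cycle.std()),
        "cycle_range": float(mean_cycle.max() - mean_cycle.min()),
    }
    return features, mean_cycle


# Demo on synthetic data: about 50 Hz irregular sampling, 1 Hz "gait" on the vertical axis.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 500))
vertical = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(t.size)
acc = np.column_stack([np.zeros_like(t), np.zeros_like(t), vertical])

t_u, acc_u = upsample(t, acc)
starts = detect_cycle_starts(acc_u[:, 2], fs=100.0)
cycles = normalize_cycles(acc_u[:, 2], starts)
features, mean_cycle = extract_features(acc_u[:, 2], 100.0, cycles)
print(len(starts), cycles.shape, features)
```

Time-normalizing every cycle to the same number of points lets cycles of different duration be averaged point by point. That averaging is what makes a mean gait cycle comparable across subjects and sources.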

Lessons learned: Data characteristics and quality varied strongly across data sources. Nevertheless, we implemented robust gait cycle detection for walking as well as the subsequent feature extraction algorithms. We compared mean gait cycle patterns in two ways: across data sources (for the same or a similar sensor location) and across sensors at similar locations (within the same data source). These patterns proved surprisingly sensitive to every variation investigated. The database, together with the standardized processing, is well suited for a variety of data sources containing wearable sensor data of human gait. However, combining data sets from different sources will require features that are more robust than the gait cycle pattern.
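The abstract does not say how the mean gait cycle patterns were compared. For illustration only, the sketch below assumes two common similarity measures for time-normalized curves:
- the Pearson correlation, which captures shape,
- the root-mean-square (RMS) difference, which captures amplitude.

The function name and the synthetic cycles are hypothetical. In this synthetic example, shifting the pattern by a quarter cycle is enough to drive the correlation to about zero. This is the kind of sensitivity described above.

```python
import numpy as np


def compare_mean_cycles(cycles_a, cycles_b):
    """Compare two groups of time-normalized gait cycles (rows = cycles, columns = % of cycle).

    Returns the Pearson correlation and the RMS difference of the two mean cycles.
    """
    mean_a = cycles_a.mean(axis=0)
    mean_b = cycles_b.mean(axis=0)
    r = float(np.corrcoef(mean_a, mean_b)[0, 1])
    rms_diff = float(np.sqrt(np.mean((mean_a - mean_b) ** 2)))
    return r, rms_diff


# Hypothetical cycles from two sources: same shape, but source B is shifted by 25 % of the cycle.
phase = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(1)
source_a = np.sin(2 * np.pi * phase) + 0.05 * rng.standard_normal((20, phase.size))
source_b = np.sin(2 * np.pi * (phase + 0.25)) + 0.05 * rng.standard_normal((20, phase.size))

print(compare_mean_cycles(source_a, source_a))  # identical groups
print(compare_mean_cycles(source_a, source_b))  # quarter-cycle shift
```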

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

References

[1] Prakash C, Kumar R, Mittal N. Recent developments in human gait research: parameters, approaches, applications, machine learning techniques, datasets and challenges. Artificial Intelligence Review. 2018;49(1):1-40.
[2] Celik Y, Stuart S, Woo WL, Godfrey A. Gait analysis in neurological populations: Progression in the use of wearables. Medical Engineering & Physics. 2021;87:9-29.
[3] Schneider B, Banerjee T, Grover F, Riley M. Comparison of gait speeds from wearable camera and accelerometer in structured and semi-structured environments. Healthcare Technology Letters. 2020;7(1):25-8.
[4] Chereshnev R, Kertész-Farkas A. HuGaDB: Human gait database for activity recognition from wearable inertial sensor networks. In: International Conference on Analysis of Images, Social Networks and Texts. Springer; 2017.
[5] Frank J, Mannor S, Precup D. Data sets: Mobile phone gait recognition data. 2010.
[6] Khandelwal S, Wickström N. Evaluation of the performance of accelerometer-based gait event detection algorithms in different real-world scenarios using the MAREA gait database. Gait & Posture. 2017;51:84-90.
[7] Luo Y, Coppola SM, Dixon PC, Li S, Dennerlein JT, Hu B. A database of human gait performance on irregular and uneven surfaces collected by wearable sensors. Scientific Data. 2020;7(1):1-9.
[8] Ngo TT, Makihara Y, Nagahara H, Mukaigawa Y, Yagi Y. The largest inertial sensor-based gait database and performance evaluation of gait-based personal authentication. Pattern Recognition. 2014;47(1):228-37.
[9] Vajdi A, Zaghian MR, Farahmand S, Rastegar E, Maroofi K, Jia S, et al. Human gait database for normal walk collected by smart phone accelerometer [Preprint]. arXiv. 2019. arXiv:1905.03109
[10] Choi S, Youn IH, LeMay R, Burns S, Youn JH. Biometric gait recognition based on wireless acceleration sensor using k-nearest neighbor classification. In: 2014 International Conference on Computing, Networking and Communications (ICNC). IEEE; 2014.