Multicentric Outlier Detection Algorithm for Healthcare Laboratory Data
Published: September 6, 2024
Introduction: Precise laboratory results are critical in healthcare, as they often provide the initial insights into a patient's medical condition and guide subsequent diagnostic and treatment decisions [1]. Technical errors in laboratory results can lead to false diagnoses and suboptimal treatment strategies. It is therefore crucial to differentiate between implausible values resulting from technical errors and realistic values that reflect true clinical conditions. Manually setting clinically plausible ranges is impractical in the Big Data domain, so automated and scalable solutions are required to ensure consistency across diverse laboratory datasets. Using multicentric clinical data enhances outlier detection because it makes it possible to pinpoint site-specific variances, which indicate a greater likelihood of unrealistic laboratory results. Our multicentric outlier detection algorithm systematically evaluates laboratory results standardized according to LOINC (Logical Observation Identifiers Names and Codes) from 68 medical sites, creating robust reference ranges across a large set of laboratory results.
Methods: We developed an outlier detection algorithm that considers statistics from individual sites as well as statistics aggregated across all sites. Specifically, we extract the minimum and maximum values from each site and calculate the median M of these values across all sites. We then define the lower bound of the range as M minus the 33rd percentile and the upper bound as M plus the 66th percentile. This makes the ranges less sensitive to sites prone to producing extreme values while still yielding ranges that capture all plausible values. Values outside the ranges are flagged as outliers. We validated this approach using data from 68 sites, encompassing a total of 1630 LOINC laboratory codes. We excluded LOINC tests represented at fewer than three sites, leaving 530 codes for analysis. The algorithm was developed and tested within a PySpark environment.
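The range construction described in the Methods can be illustrated with a minimal sketch. The abstract does not say which values the percentiles are taken over, so the code assumes one possible reading: the lower bound is the median of the per-site minima minus the 33rd percentile of those minima, and the upper bound is the median of the per-site maxima plus the 66th percentile of those maxima. The function names (`percentile`, `compute_ranges`, `flag_outliers`), the record layout, and the linear-interpolation percentile are all hypothetical choices for illustration. The sketch uses plain Python rather than the PySpark environment mentioned above and is not the authors' implementation.

```python
from collections import defaultdict
from statistics import median


def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def compute_ranges(records, min_sites=3):
    """Build a plausible range per LOINC code.

    records: iterable of (site, loinc_code, value) tuples (hypothetical layout).
    Returns {loinc_code: (lower, upper)} for codes seen at >= min_sites sites.
    """
    # Track the minimum and maximum value of every code at every site.
    extremes = defaultdict(dict)  # code -> site -> (site_min, site_max)
    for site, code, value in records:
        lo, hi = extremes[code].get(site, (value, value))
        extremes[code][site] = (min(lo, value), max(hi, value))

    ranges = {}
    for code, per_site in extremes.items():
        if len(per_site) < min_sites:
            continue  # codes represented at fewer than three sites are excluded
        site_mins = [lo for lo, _ in per_site.values()]
        site_maxs = [hi for _, hi in per_site.values()]
        # ASSUMPTION: the percentiles are taken over the same per-site extremes
        # as the medians; the abstract leaves this detail open.
        lower = median(site_mins) - percentile(site_mins, 33)
        upper = median(site_maxs) + percentile(site_maxs, 66)
        ranges[code] = (lower, upper)
    return ranges


def flag_outliers(records, ranges):
    """Return the records whose value falls outside the range of their code."""
    return [
        (site, code, value)
        for site, code, value in records
        if code in ranges and not (ranges[code][0] <= value <= ranges[code][1])
    ]
```

Under this reading, a single site reporting an extreme maximum shifts the upper bound only through the interpolated percentile, so the bound stays well below that site's extreme value and the value is flagged.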
Results: The samples provided enabled us to validate the effectiveness of our novel outlier detection algorithm by manually evaluating the computed boundaries. All LOINC laboratory tests were categorized into two groups based on their value distributions: 1) those where only positive values are valid, and 2) those where both positive and negative values are possible. Our algorithm successfully distinguished between implausible and real values for both groups of laboratory tests.
Discussion: While numerous studies have explored outlier detection algorithms for laboratory results [2,3], our novel algorithm distinctively incorporates both the distribution of laboratory results from each site and the aggregate data. This dual approach reduces sensitivity to extreme values from sites prone to producing unrealistic values. Consequently, our method provides a general solution that can be applied to each site. Furthermore, the computational efficiency of our algorithm allows immediate implementation in a real-world setting.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Miriovsky BJ, Shulman LN, Abernethy AP. Importance of health information technology, electronic health records, and continuously aggregating data to comparative effectiveness research and learning health care. J Clin Oncol. 2012;30(34):4243-8. DOI: 10.1200/JCO.2012.42.8011
2. Monjas AM, Ruiz DR, Pérez-Rey D, Palchuk M. Automatic Outlier Detection in Laboratory Result Distributions Within a Real World Data Network. Stud Health Technol Inform. 2023;302:88-92. DOI: 10.3233/SHTI230070
3. Estiri H, Klann JG, Murphy SN. A Clustering Approach for Detecting Implausible Observation Values in Electronic Health Records Data. BMC Med Inform Decis Mak. 2019;19:142. DOI: 10.1186/s12911-019-0852-6