Improving clinical documentation: automatic inference of ICD-10 codes from patient notes using BERT model

Journal
The Journal of Supercomputing
Date Issued
2023
Author(s)
Abu Zitar, Raed (Physics, Mathematics, Computer science)
Al-Bashabsheh, Emran
Alaiad, Ahmad
Al-Ayyoub, Mahmoud
Beni-Yonis, Othman
Abualigah, Laith
DOI
10.1007/s11227-023-05160-z
URI
https://depot.sorbonne.ae/handle/20.500.12458/1391
Abstract
Electronic health records provide a vast amount of textual health data written by physicians as patient clinical notes. The World Health Organization released the International Classification of Diseases, version 10 (ICD-10) system to monitor and analyze clinical notes. ICD-10 is the system physicians and other healthcare providers use to classify and code all diagnoses and symptom records in conjunction with hospital care, so that the data can be easily stored, retrieved, and analyzed for decision-making. To address this problem, this paper introduces a system that classifies clinical notes into ICD-10 codes. The paper examines 7541 clinical notes collected from a health institute in Jordan and annotated by ICD-10 coders. In addition, the research uses an external dataset to augment the actual dataset. The research presents several approaches, namely a Baseline model and a Pipeline model. The Baseline model employs Word2vec embeddings to represent the text; its structure also involves long short-term memory, a convolutional neural network, and two fully connected layers. The Pipeline approach adopts a transformer model, Bidirectional Encoder Representations from Transformers (BERT), pre-trained on a similar health domain. The Pipeline model builds on two BERT models: the first classifies the category codes representing the first three characters of ICD-10, and the second takes the outputs of the first (general) BERT model as input to classify the clinical notes into full ICD-10 codes. Moreover, both the Baseline and Pipeline models apply the focal loss function to mitigate class imbalance. The Pipeline model demonstrates significant performance when evaluated on the F1 score, recall, precision, and accuracy metrics, scoring 92.5%, 84.9%, 91.8%, and 84.97%, respectively.
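The two-stage design described in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the stand-in lookup functions, the example codes (J18.9, R07.4), and the function names are hypothetical placeholders for the paper's two fine-tuned BERT classifiers, which are not reproduced here.

```python
# Hypothetical sketch of the two-stage ICD-10 pipeline: stage 1 predicts
# the 3-character category, stage 2 refines it into a full code. The
# paper uses two BERT models for these stages; toy lookup functions
# stand in for them here.

def predict_category(note: str) -> str:
    # Stand-in for the first BERT model (3-character ICD-10 category).
    return "J18" if "pneumonia" in note.lower() else "R07"

def predict_full_code(note: str, category: str) -> str:
    # Stand-in for the second BERT model, conditioned on stage-1 output.
    refinements = {"J18": "J18.9", "R07": "R07.4"}
    return refinements.get(category, category)

def classify(note: str) -> str:
    category = predict_category(note)         # stage 1: coarse category
    return predict_full_code(note, category)  # stage 2: full ICD-10 code
```

The key design choice this mirrors is hierarchical classification: constraining the second model with the first model's category prediction shrinks the label space each stage must handle, which is helpful given the long tail of rare ICD-10 codes that also motivates the focal loss.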
Subjects
  • ICD-10
  • Deep learning
  • BERT
  • Long short-term memory
  • Convolutional neural network
