Abu Zitar, Raed
Comparative Study on Arabic Text Classification: Challenges and Opportunities
2023, Abu Zitar, Raed, Abualigah, Laith, Oliva, Diego, Hussien, Abdelazim G., Melhem, Mohammed K. Bani
Web technology has improved greatly in recent years, loading the Internet with digital content from many different fields. This has made it difficult for researchers to find text classification algorithms that fit a specific language or set of languages. Text classification, or categorization, is the practice of assigning a given text document to one or more predefined labels or categories; it aims to extract valuable information from unstructured text documents. This paper presents a comparative study of selected published papers that focus on improving Arabic text classification. It highlights the proposed models and the classifiers used, discusses the challenges faced in this type of research, and then proposes research opportunities in the field of text classification. Based on the reviewed studies, SVM and Naive Bayes were the most widely used classifiers for Arabic text classification, while more effort is needed to develop and implement flexible Arabic text classification methods and classifiers.
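As an illustration of one of the two classifier families named above, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not taken from any of the reviewed papers: the toy corpus, the labels, the function names `train_naive_bayes` and `predict`, and the whitespace tokenization are all assumptions made for the example, and real Arabic pipelines would normally add normalization and stemming.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per label and words per label.

    docs is a list of (tokens, label) pairs (hypothetical toy format).
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothed log likelihood
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, assumed for illustration only; tokens are split on spaces.
TOY_DOCS = [
    ("الفريق فاز في المباراة".split(), "sports"),
    ("اللاعب سجل هدفا في المباراة".split(), "sports"),
    ("البنك رفع سعر الفائدة".split(), "economy"),
    ("السوق شهد ارتفاع سعر النفط".split(), "economy"),
]
MODEL = train_naive_bayes(TOY_DOCS)
```

Calling `predict(MODEL, ["المباراة"])` on this toy model selects the label whose training documents contain that word most often.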
Review on COVID-19 diagnosis models based on machine learning and deep learning approaches
2022, Zitar, Raed
COVID-19 is the disease caused by a new strain of coronavirus called the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Recently, COVID-19 has become a pandemic, infecting more than 152 million people in over 216 countries and territories. The exponential increase in the number of infections has rendered traditional diagnosis techniques inefficient. Many researchers have therefore developed intelligent techniques based on deep learning (DL) and machine learning (ML) that can assist the healthcare sector in providing quick and precise COVID-19 diagnosis. This paper provides a comprehensive review of the most recent DL and ML techniques for COVID-19 diagnosis, covering studies published from December 2019 until April 2021. In general, this paper includes more than 200 studies that have been carefully selected from several publishers, such as IEEE, Springer and Elsevier. We classify the research tracks into two categories, DL and ML, and present public COVID-19 datasets established and extracted from different countries. The measures used to evaluate diagnosis methods are comparatively analysed and discussed. In conclusion, for COVID-19 diagnosis and outbreak prediction, SVM is the most widely used machine learning mechanism, and CNN is the most widely used deep learning mechanism. Accuracy, sensitivity, and specificity are the most widely used measurements in previous studies. Finally, this review paper will guide the research community on the upcoming development of ML and DL for COVID-19 and inspire their future work.
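The three measurements named above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `diagnosis_metrics` and the example counts are assumptions for illustration and do not come from the reviewed studies.

```python
def diagnosis_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fn: infected cases predicted positive/negative.
    tn/fp: healthy cases predicted negative/positive.
    Assumes at least one positive and one negative case are present.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of infected cases detected
        "specificity": tn / (tn + fp),    # share of healthy cases cleared
    }


# Hypothetical counts: 100 infected and 100 healthy test cases.
EXAMPLE = diagnosis_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting sensitivity and specificity alongside accuracy matters for diagnosis because accuracy alone can look high on a dataset dominated by one class.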
Improving clinical documentation: automatic inference of ICD-10 codes from patient notes using BERT model
2023, Abu Zitar, Raed, Al-Bashabsheh, Emran, Alaiad, Ahmad, Al-Ayyoub, Mahmoud, Beni-Yonis, Othman, Abualigah, Laith
Electronic health records provide a vast amount of health text data written by physicians as patient clinical notes. The World Health Organization released the International Classification of Diseases version 10 (ICD-10) system to monitor and analyze clinical notes. ICD-10 is a system that physicians and other healthcare providers use to classify and code all diagnoses and symptom records in conjunction with hospital care, so that the data can be easily stored, retrieved, and analyzed for decision-making. To address this problem, this paper introduces a system that classifies clinical notes into ICD-10 codes. The paper examines 7541 clinical notes collected from a health institute in Jordan and annotated by ICD-10 coders. In addition, the research uses another external dataset to augment the actual dataset. The research presents two approaches, a Baseline model and a Pipeline model. The Baseline model employs several methods, such as Word2vec embeddings, for representing the text. Its structure also involves a long short-term memory layer, a convolutional neural network, and two fully-connected layers. The Pipeline approach adopts a transformer model, Bidirectional Encoder Representations from Transformers (BERT), pre-trained on a similar health domain, and builds on two BERT models. The first (general) BERT model classifies the category codes, which are the first three characters of an ICD-10 code. The second (special) BERT model takes the outputs of the first model as input and classifies the clinical notes into full ICD-10 codes. Moreover, both the Baseline and Pipeline models apply the focal loss function to address class imbalance. The Pipeline model demonstrates strong performance, with an F1 score, recall, precision, and accuracy of 92.5%, 84.9%, 91.8%, and 84.97%, respectively.
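For readers unfamiliar with the focal loss mentioned above, the following is a minimal sketch of its standard per-example form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The abstract does not give the hyperparameters or the implementation used; the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=1.0` are assumptions for illustration.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0, eps=1e-12):
    """Focal loss for one example.

    probs:  predicted class probabilities (list of floats summing to 1).
    target: index of the true class.
    gamma:  focusing parameter; gamma=0 reduces to cross-entropy (assumed default 2.0).
    alpha:  constant weighting factor (assumed default 1.0).
    """
    # Clamp to avoid log(0) when the true class gets zero probability.
    p_t = min(max(probs[target], eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of examples that are already well classified.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct example contributes far less than an uncertain one.
EASY = focal_loss([0.9, 0.1], target=0)
HARD = focal_loss([0.3, 0.7], target=0)
```

Because well-classified examples are down-weighted, frequent codes that the model already predicts confidently contribute less to training than rare, poorly predicted ones.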
Salak Image Classification Method Based Deep Learning Technique Using Two Transfer Learning Models
2023, Abu Zitar, Raed, Theng, Lau Wei, San, Moo Mei, Cheng, Ong Zhi, Shen, Wong Wei, Sumari, Putra, Abualigah, Laith, Izci, Davut, Jamei, Mehdi, Al-Zu’bi, Shadi
Salak is a fruit plant found in Southeast Asia, with at least 30 cultivars. The size, shape, skin color, sweetness, and even flesh color differ depending on the cultivar. Thus, classifying salak by cultivar has become a daily job for fruit farmers. Many techniques can be used for fruit classification using computer vision technology. Deep learning is the most promising approach compared with other Machine Learning (ML) algorithms. This paper presents an image classification method for 4 types of salak (salak pondoh, salak gading, salak sideempuan and salak affinis) using a Convolutional Neural Network (CNN), VGG16 and ResNet50. The dataset consists of 1000 images, with 250 images for each type of salak. The dataset is pre-processed to standardize it by resizing the images to 224 × 224 pixels, converting them to jpg format, and applying augmentation. Based on the accuracy results, the best model for salak classification is ResNet50, with an accuracy of 84%, followed by VGG16 with 77% and CNN with 31%.
Markisa/Passion Fruit Image Classification Based Improved Deep Learning Approach Using Transfer Learning
2023, Abu Zitar, Raed, Abdo, Ahmed, Hong, Chin Jun, Kuan, Lee Meng, Pauzi, Maisarah Mohamed, Sumari, Putra, Abualigah, Laith
Fruit recognition is becoming more and more important in the agricultural industry. Traditionally, all the fruits in the production line must be identified and labeled manually, which is labor-intensive, error-prone, and ineffective. Many fruit recognition systems have therefore been created to automate the process, but fruit recognition systems for local Malaysian fruits are limited. Thus, this project focuses on classifying one local Malaysian fruit, markisa/passion fruit. We propose two CNN models for markisa classification. The proposed models are evaluated on our own dataset collection and produce accuracies of 97% and 65%, respectively. The results indicate that the architecture of a CNN model is very important, because different architectures can produce different results. The first CNN model is therefore selected because it can classify 4 types of markisa with higher accuracy. In the proposed work, we also inspect two transfer learning methods for markisa classification, VGG-16 and InceptionV3. The results show that the first proposed CNN model outperforms VGG-16 (95% accuracy) and InceptionV3 (65% accuracy).