Ensemble learning-based prediction of COVID-19 positive patient groups determined by IL-6 levels and control individuals based on the proteomics data
Seyma Yasar, Zeynep Kucukakcali, Adem Doganer
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a newly identified coronavirus. Its effects vary widely between individuals, so proteomic analysis is an important approach for developing early diagnosis and treatment strategies. This study aimed to classify COVID-19-positive patient groups defined by interleukin 6 (IL-6) levels (low, medium, high) and control individuals from proteomic data using ensemble learning methods (Adaboost, Bagging, Stacking, and Voting). The publicly available dataset comprises 49 subjects (31 COVID-19 positive and 18 controls) and 493 proteins measured in blood samples. The relation between disease severity and protein levels was modeled with the four ensemble learning methods and assessed by ten-fold cross-validation. Predictions were evaluated with accuracy, sensitivity, and other performance metrics. The accuracy of Adaboost (96.00%) was higher than that of Voting (93.88%) and Bagging (91.84%), while the Stacking ensemble learning method produced the highest accuracy (97.92%). IL6, SERPINA3, SERPING1, SERPINA1, and GSN were the five most important proteins associated with disease severity. Of the methods compared, the Stacking ensemble model gave the best estimation of disease severity based on protein levels. The results indicate that changes in blood protein levels that correlate with COVID-19 severity may be useful for guiding early diagnosis and treatment of the disease.
Key words: Adaboost, bagging, COVID-19 severity, ensemble learning, stacking, voting
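As an illustration of two ideas named in the abstract, the following minimal Python sketch shows hard majority voting across base classifiers and the computation of accuracy and per-class sensitivity for the four groups (control, low, medium, high). It is not the authors' pipeline: the three "models", their predictions, and the function names are hypothetical toy values chosen only to make the arithmetic visible, and the real study may have used a different voting rule.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard voting: for each sample, keep the label most base models chose.
    On a tie, the label that appears first among the models wins."""
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        # max() returns the first label reaching the highest count
        voted.append(max(counts, key=counts.get))
    return voted

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred, label):
    """Recall for one class: TP / (TP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Hypothetical toy labels; NOT data from the study.
y_true  = ["control", "control", "low", "medium", "high", "high"]
model_a = ["control", "control", "low", "medium", "high", "medium"]
model_b = ["control", "low",     "low", "medium", "high", "high"]
model_c = ["control", "control", "medium", "medium", "high", "medium"]

voted = majority_vote([model_a, model_b, model_c])
print("voted:", voted)
print("accuracy:", round(accuracy(y_true, voted), 4))
for label in ["control", "low", "medium", "high"]:
    print(label, "sensitivity:", sensitivity(y_true, voted, label))
```

In a multiclass setting such as this one, sensitivity is reported per group, so a single overall accuracy can hide a class (here, the toy "high" group) that the ensemble recognizes poorly.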