An Improved Deep Neural Network Algorithm for the Prediction of Limited Proteolysis in Native Protein
Keywords: Protease Nick Sites, Random Forest (RF), Support Vector Machine (SVM), Deep Neural Network (DNN), Hybrid Model of Random Forest and Deep Neural Network (Hybrid RF+DNN)
A protease is a proteolytic enzyme that hydrolyzes the peptide bonds of its substrate, and cleavage occurs only at specific sites in the substrate's amino acid sequence. Identifying these nick sites helps predict protease function and makes it possible to control the hydrolysis of a protein by its corresponding protease. Controlling this process is important because it can help control protein replication, especially of viral proteins. With the rise of computational methods, and in particular the successful application of deep learning across many domains, applying these methods to biological data can improve the predictions that support experimental work. Conventional techniques such as mass spectrometry and two-dimensional gel electrophoresis can be complemented by computational predictions, reducing the cost and time needed to identify protein proteolysis sites. This study improves on a standard deep learning approach by proposing a hybrid model of Random Forest and Deep Neural Network (Hybrid RF+DNN) to classify proteolysis (nick) sites. Its classification performance is compared with that of other machine learning algorithms: Random Forest (RF), Support Vector Machine (SVM), and Deep Neural Network (DNN). The proposed method enhances the classification of positive and negative nick sites. The RF acts as a feature selector that retains the most important features before they enter the DNN classifier, which reduces data dimensionality and shortens the training time. Model performance was measured using the confusion matrix, specificity, sensitivity, and related metrics. However, the proposed method is not the best performer among the compared classifiers, with scores of 0.64, 0.65, and 0.58 recorded for Datasets A, B, and C, respectively.
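For readers less familiar with the evaluation measures named above, the short sketch below shows how sensitivity and specificity follow from the four cells of a binary confusion matrix. It assumes labels are encoded as 1 for a nick site and 0 for a non-site; the function names, the toy labels, and the inclusion of accuracy are illustrative choices, not details taken from this study.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, int]:
    """Count TP, FP, TN, FN for a binary task (assumed: 1 = nick site, 0 = non-site)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted nick site: correct if it really is one, otherwise a false alarm.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted non-site: a missed nick site is a false negative.
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive sensitivity, specificity, and accuracy from confusion-matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]

    def ratio(num: int, den: int) -> float:
        # Guard against empty classes so the function never divides by zero.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true nick sites recovered
        "specificity": ratio(tn, tn + fp),  # share of non-sites correctly rejected
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(counts)             # {'tp': 2, 'fp': 1, 'tn': 3, 'fn': 2}
    print(summarize(counts))  # {'sensitivity': 0.5, 'specificity': 0.75, 'accuracy': 0.625}
```

Reporting sensitivity and specificity separately matters here because nick-site datasets can be imbalanced, and a single accuracy figure can hide a classifier that rarely predicts the positive class.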
The proposed method may become the best performer with more precise parameter tuning, even after feature selection by the RF algorithm. Thus, the enhanced method appears to offer an alternative for researchers seeking to identify limited proteolysis or nick sites.
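The RF-then-DNN design, together with the further parameter tuning suggested above, can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: `MLPClassifier` stands in for the DNN, `SelectFromModel` with random-forest importances stands in for the RF feature selector, the data come from `make_classification` as a hypothetical placeholder for encoded peptide windows, and the threshold and grid values are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for encoded nick-site windows (1 = nick site, 0 = non-site).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

hybrid = Pipeline([
    # Stage 1: random-forest importances decide which features reach the network.
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    # Scale the retained features; multilayer perceptrons train poorly on unscaled inputs.
    ("scale", StandardScaler()),
    # Stage 2: feed-forward network (MLPClassifier used here as a stand-in for the DNN).
    ("dnn", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)),
])

# Tune the selection threshold and the network's L2 penalty jointly,
# so tuning continues after (and interacts with) feature selection.
param_grid = {
    "select__threshold": ["mean", "median"],
    "dnn__alpha": [1e-4, 1e-2],
}
# Balanced accuracy is the mean of sensitivity and specificity for a binary task.
search = GridSearchCV(hybrid, param_grid, cv=3, scoring="balanced_accuracy")
search.fit(X_train, y_train)

n_kept = int(search.best_estimator_.named_steps["select"].get_support().sum())
test_score = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("features kept:", n_kept, "of", X.shape[1])
print("held-out balanced accuracy:", round(test_score, 3))
```

Putting the selector inside the pipeline means each cross-validation fold recomputes feature importances on its own training split, which avoids leaking test information into the selection step.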