IPHOSS(DEEP)-PSEAAC: IDENTIFICATION OF PHOSPHOSERINE SITES IN PROTEINS USING DEEP LEARNING ON GENERAL PSEUDO AMINO ACID COMPOSITIONS

Abstract

Phosphoaspartate is a major component of prokaryotic and eukaryotic two-component signaling pathways. It relays the signal from the sensor histidine kinase, through the response regulator, to the DNA alongside transcription factors, and initiates transcription of the appropriate response genes. Identifying phosphoaspartate sites is therefore important, but experimental identification is expensive, time-consuming, and tedious. For this purpose, we propose iPhosD-PseAAC, a new computational model for predicting phosphoaspartate sites in a given protein sequence following Chou's 5-step rule: (1) construction of a benchmark dataset; (2) feature extraction using pseudo amino acid composition (PseAAC), statistical moments, and position-relative features; (3) classification with an artificial neural network (ANN); (4) validation with 10-fold cross-validation and self-consistency testing; and (5) development of a user-friendly web server for ease of use. Self-consistency testing achieves 100% accuracy (Acc), whereas 10-fold cross-validation yields 95.14% Acc, 95.58% sensitivity (Sn), 94.70% specificity (Sp), and a Matthews correlation coefficient (MCC) of 0.95. iPhosD-PseAAC is thus the first predictor for the accurate and efficient identification of phosphoaspartate sites.
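
The abstract reports Acc, Sn, Sp, and MCC without defining them in this section. As a minimal sketch, assuming the standard confusion-matrix definitions (the paper's own equation is not reproduced here), the four metrics can be computed as follows. The function name `evaluate` and the label convention (1 = phosphoaspartate site, 0 = non-site) are illustrative assumptions.

```python
import math


def evaluate(y_true, y_pred):
    """Acc, Sn, Sp and MCC from binary labels.

    Assumed convention (hypothetical): 1 = phosphoaspartate site, 0 = non-site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sites predicted as sites
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-sites predicted as non-sites
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-sites predicted as sites
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sites that were missed

    acc = (tp + tn) / len(pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on true sites
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on non-sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}
```

For example, with three true sites and three non-sites, of which two of each are predicted correctly, Acc, Sn, and Sp are all 2/3 and MCC is 1/3. MCC is included because, unlike Acc, it stays informative when positive and negative samples are imbalanced.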

Existing System

• The feature vector was formulated from both positive and negative samples (Cheng et al., 2018b). In 2018, Khan et al. (2018) proposed iPhosT-PseAAC for predicting phosphothreonine sites using PseAAC, statistical moments, and various position-relative features. An ANN was used for classification, and the model was evaluated with 10-fold cross-validation and jackknife testing.
• Formulating a predictor according to the 5-step rule pays a large dividend: it gives the model a clear rationale, sets a benchmark for future improvements, and makes the model easily accessible to the wider scientific community.
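
The exact definitions of the features listed in the abstract (PseAAC, statistical moments, and position-relative features) are not given in this section. The sketch below only illustrates the general idea under assumed choices: plain amino acid composition (without the sequence-order correlation terms that make PseAAC "pseudo"), a hypothetical mean-relative-position feature per residue type, and raw and central moments up to order 2, computed after folding the sequence into a zero-padded square matrix. The residue encoding, matrix layout, and moment order are all assumptions, not the published feature set.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Hypothetical numeric encoding (1..20); unknown residues and padding map to 0.
AA_CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def composition(seq):
    """Normalized frequency of each standard residue (20 values)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def mean_relative_position(seq):
    """Hypothetical position-relative feature: mean relative position in (0, 1]
    of each residue type, 0.0 if the residue is absent (20 values)."""
    n = len(seq)
    feats = []
    for aa in AMINO_ACIDS:
        pos = [(i + 1) / n for i, c in enumerate(seq) if c == aa]
        feats.append(sum(pos) / len(pos) if pos else 0.0)
    return feats


def to_matrix(seq):
    """Fold the sequence row by row into a k x k matrix of residue codes, zero-padded."""
    k = math.ceil(math.sqrt(len(seq)))
    codes = [AA_CODE.get(c, 0) for c in seq] + [0] * (k * k - len(seq))
    return [codes[r * k:(r + 1) * k] for r in range(k)]


def moments(matrix, max_order=2):
    """Raw moments M_pq and central moments mu_pq (about the centroid), p + q <= max_order."""
    cells = [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]

    def raw(p, q):
        return sum(i ** p * j ** q * v for i, j, v in cells)

    m00 = raw(0, 0)
    ci, cj = raw(1, 0) / m00, raw(0, 1) / m00  # centroid of the code matrix
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            feats.append(raw(p, q))
            feats.append(sum((i - ci) ** p * (j - cj) ** q * v for i, j, v in cells))
    return feats


def feature_vector(seq):
    """Concatenate composition (20), mean positions (20) and moments (12 for order 2)."""
    return composition(seq) + mean_relative_position(seq) + moments(to_matrix(seq))
```

Folding the sequence into a matrix lets two-dimensional moments reflect where residues occur, which composition alone ignores; the position-relative feature adds the same kind of information in one dimension.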

Disadvantages

• Prediction models are commonly built from experimentally verified datasets, but an independent experimentally verified dataset is often unavailable for testing the model against real data. Even when such data are available, they may be too few to assess the model's accuracy. For a biological problem like this one, only samples that exist in nature can be included, so hypothetical datasets cannot be constructed. This raises the question of what kind of testing is needed to score the four metrics of Eq. with sufficient reliability. Usually, the dataset is split into three partitions: one for training, one for testing, and the remaining one for validation.
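
The three-partition scheme described above can be sketched as a stratified split, so that the positive/negative ratio is roughly preserved in each partition. The 70/15/15 proportions, the seed, and the function name `three_way_split` are hypothetical choices for illustration; the source does not state the proportions.

```python
import random


def three_way_split(samples, labels, train_pct=70, test_pct=15, seed=0):
    """Stratified split into (train, test, validation) lists of (sample, label) pairs.

    Each class is shuffled and partitioned separately; whatever remains after
    the training and testing shares becomes the validation partition.
    Percentages are integers so the partition sizes are computed exactly.
    """
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append((s, y))

    train, test, val = [], [], []
    for y in sorted(by_class):
        items = by_class[y]
        rng.shuffle(items)
        n_train = len(items) * train_pct // 100
        n_test = len(items) * test_pct // 100
        train += items[:n_train]
        test += items[n_train:n_train + n_test]
        val += items[n_train + n_test:]
    return train, test, val
```

With 20 positive and 20 negative samples, this yields 28 training, 6 testing, and 6 validation samples, each partition containing equal numbers of both classes. The drawback noted above remains: every sample held out for testing or validation is one fewer for training, which is costly when data are scarce.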

Proposed System

• In a comparative analysis, the metrics obtained by iPhosD-PseAAC are compared with those of existing post-translational modification (PTM) site prediction models, namely iPhosT-PseAAC (Khan et al., 2018) and PhosphoSVM (Dou et al., 2014). These two models serve only as benchmarks for comparing accuracy metrics, since no earlier model for identifying phosphorylation sites on aspartic acid has been reported in the literature. Against these benchmarks, iPhosD-PseAAC achieves higher Acc, Sp, Sn, and MCC than both iPhosT-PseAAC and PhosphoSVM, indicating better prediction performance.

Advantages

• When no independent dataset is available to validate the model's predictions, cross-validation is the best option for confirming that the developed model works well. Herein, we performed 10-fold cross-validation and computed the overall accuracy by summing the accuracy of each fold and averaging over the 10 folds. The average accuracy was 95.14%, as shown in Tab. 2 and Fig. 3. We also validated the prediction model with the jackknife test to verify the quality of iPhosD-PseAAC. In jackknife validation, every instance of both datasets is used in turn for testing, with the model trained on the remaining instances, giving a unique output for each instance; this yielded a prediction accuracy of 94.46%.
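
The two validation schemes described above can be sketched as follows. Because the ANN configuration is not given in this section, a simple nearest-centroid classifier is used purely as a hypothetical stand-in; the function names, the round-robin fold assignment after a seeded shuffle, and the seed are assumptions. `k_fold_accuracy` averages the per-fold accuracies, as described for the 10-fold test, and `jackknife_accuracy` is the leave-one-out scheme, in which each sample is predicted once by a model trained on all the other samples.

```python
import math
import random


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier (NOT the paper's ANN): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def accuracy_on(train_idx, test_idx, X, y):
    """Train on train_idx, return the fraction of test_idx predicted correctly."""
    model = nearest_centroid_fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def k_fold_accuracy(X, y, k=10, seed=0):
    """Sum the accuracy of each of the k folds, then average over k."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin fold assignment
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        total += accuracy_on(train, fold, X, y)
    return total / k


def jackknife_accuracy(X, y):
    """Leave-one-out: each sample is tested once by a model trained on all others."""
    n = len(X)
    hits = sum(accuracy_on([j for j in range(n) if j != i], [i], X, y) for i in range(n))
    return hits / n
```

The jackknife needs one model fit per sample, so it is far more expensive than 10-fold cross-validation for large datasets, but its result does not depend on how the folds are drawn.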
