Bhat, Heena Farooq

PSSM Amino-Acid Composition Based Gene Identification Using Support Vector Machines - Vol 6 (1), Jan-Apr - New Delhi STM Journals 2019 - 50-58p.

The main aspect of identifying the molecular mechanisms of the cell is understanding the function of each protein encoded in the genome. Genome annotation is very helpful for this purpose, and gene prediction is one of its most essential phases. Several methods have been developed to locate or predict genes in genome sequences; however, gene recognition remains a very complicated problem, and identifying the gene that corresponds to a given protein sequence with conventional tools is error-prone. Gene recognition is therefore a demanding task. In this paper, we first discuss the problem of gene prediction and its challenges. We then present a new gene identification method that follows a two-step procedure. First, new features are extracted from protein sequences; these features are derived from a position-specific scoring matrix (PSSM), and the PSSM profiles are converted into a uniform numeric representation. Second, the resulting PSSM vectors are given as input to a support vector machine (SVM) for classification. The method has been demonstrated on a genomic DNA dataset, and the experimental results show that the new approach produces better results.
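The two-step procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the PSSM is made uniform or which SVM kernel and solver are used. Here we assume the common PSSM amino-acid composition (PSSM-AAC) encoding, in which an L x 20 PSSM (one row per residue, one column per amino acid, in PSI-BLAST column order) is reduced to its 20 column means, and we assume a binary gene vs. non-gene labeling. For the second step we swap in a linear SVM trained with the Pegasos stochastic sub-gradient method in pure Python. The names `pssm_aac`, `train_linear_svm` and `predict`, the parameters `lam`, `epochs` and `seed`, and the toy matrices are all hypothetical.

```python
import random
from typing import List, Sequence

# Column order of a PSI-BLAST ASCII PSSM (assumed input layout: L rows x 20 columns).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_aac(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Step 1 (assumed PSSM-AAC encoding): average each amino-acid column over
    all L residues, so a protein of any length maps to a fixed 20-dim vector."""
    n = len(AMINO_ACIDS)
    if not pssm or any(len(row) != n for row in pssm):
        raise ValueError(f"expected a non-empty PSSM with {n} scores per row")
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(n)]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Step 2 (stand-in classifier): linear SVM trained with Pegasos, i.e.
    stochastic sub-gradient descent on the primal hinge-loss objective.
    Labels must be +1 / -1; a constant 1.0 feature is appended as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            violated = label * _dot(w, x) < 1.0        # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if violated:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (gene) or -1 (non-gene) from the sign of the decision value."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    # Hypothetical toy PSSMs of different lengths; real ones come from PSI-BLAST.
    gene_like = [[2.0] * 20, [3.0] * 20, [2.5] * 20]
    non_gene = [[-2.0] * 20, [-1.0] * 20]
    X = [pssm_aac(gene_like), pssm_aac(non_gene)]
    w = train_linear_svm(X, [1, -1])
    print([predict(w, x) for x in X])
```

Averaging over rows is what makes the representation uniform: proteins of any length L map to the same 20 dimensions. In practice a library SVM (for example LIBSVM or scikit-learn's `SVC`, possibly with a nonlinear kernel) would typically replace the toy solver. Folding the bias into the weight vector, as done here, also regularizes the bias, which is a simplification.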


Computer Engineering