
Native Language Identification from Spoken Indian English (Record no. 10846)

MARC details
000 - LEADER
fixed length control field	a
003 - CONTROL NUMBER IDENTIFIER
control field	OSt
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20200108141851.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	200108b xxu||||| |||| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency	AIKTC-KRRC
Transcribing agency	AIKTC-KRRC
100 ## - MAIN ENTRY--PERSONAL NAME
9 (RLIN)	11586
Author	Imani, Siddika
245 ## - TITLE STATEMENT
Title	Native Language Identification from Spoken Indian English
250 ## - EDITION STATEMENT
Volume, Issue number	Vol. 9(2), May-Aug
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc.	New Delhi
Name of publisher, distributor, etc.	STM Journals
Year	2019
300 ## - PHYSICAL DESCRIPTION
Pagination	1-6p.
520 ## - SUMMARY, ETC.
Summary, etc.	Automatic speech recognition (ASR) systems that facilitate voice-based search and information retrieval have recently gained importance. While the performance of ASR systems for Indian languages has improved in the recent past, they have yet to gain as wide acceptance as ASR systems for English spoken by Indians. Almost all Indians learn English as a second or third language, so the phoneme set and prosody of a speaker's native language influence the acoustic characteristics of the English they speak. Since Indians speak a wide variety of languages, the acoustic characteristics of English spoken by Indians vary considerably. The recognition accuracy for Indian English could therefore be improved by employing native-language-dependent English ASR systems. This approach requires automatic identification of the speaker's native language. Here, we report the performance of an automatic Native Language Identification (NLI) system that recognises the speaker's native language as Assamese, Bengali or Bodo by analysing an English sentence spoken by that speaker. Training and evaluating an NLI system requires appropriate linguistic resources: (a) English speech data from several speakers of each of the three languages, (b) the corresponding word-level transcriptions and (c) a pronunciation dictionary. While English pronunciation dictionaries are freely available, English speech by speakers of these three languages, and its transcriptions, are not publicly available. We therefore created a suitable speech database. We recorded English spoken by male and female native speakers of these three scheduled languages. Each speaker read 100 sentences drawn from a set of 700 English sentences, each of which was either a proverb or a digit sequence and contained 5 to 8 words. The speech was recorded under ambient conditions using a laptop and digitised at 16000 Hz, 16 bit, mono.
The database contains spoken English from 35 native Assamese, 33 Bengali and 30 Bodo speakers. To carry out a threefold evaluation of the system, the speakers of each language were grouped into 3 subsets containing nearly equal numbers of speakers. In each fold, one subset was designated as test data and the remaining two subsets were used to train the system. We used Kaldi, an open-source ASR toolkit, to implement the NLI system. As a first step, we implemented three English ASR systems, each trained on the training data of one of the three languages: Assamese, Bengali and Bodo. Each phone was represented by a three-state hidden Markov model (HMM), and each HMM state was associated with a Gaussian mixture model (GMM). We used Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives as features, and a bigram language model. To identify the native language of a speaker, the test speech file was fed to each of the three ASR systems. Each ASR system generates not only the decoded word sequence but also the corresponding log likelihood. The NLI system follows the maximum likelihood criterion: the language corresponding to the ASR system that yielded the highest likelihood for the test speech was declared the native language of the speaker. The overall accuracy of the NLI system was computed as the unweighted average recall from the confusion matrix. The NLI accuracy, averaged over the three folds, was 59% for test speech only 3 seconds long. Confusion was greatest between Assamese and Bengali, as both are closely related members of the Indo-Aryan language family. In contrast, Bodo belongs to the Sino-Tibetan language family.
We discuss the performance of the NLI system using different models, such as context-dependent and context-independent HMMs, employing either a Gaussian mixture model or a deep neural network to estimate the likelihood of a feature vector emitted from an HMM state.
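The speaker-wise threefold split, the maximum likelihood decision rule and the unweighted average recall described in the summary can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each of the three per-language ASR systems (built with Kaldi in the article) has already produced one total log likelihood per test utterance. The function names, speaker IDs and log-likelihood scores are hypothetical; the round-robin speaker assignment is one assumed way to obtain subsets of nearly equal size.

```python
from typing import Dict, List, Sequence

LANGUAGES = ("Assamese", "Bengali", "Bodo")


def speaker_folds(speakers_by_language: Dict[str, List[str]],
                  n_folds: int = 3) -> List[List[str]]:
    """Split each language's speakers round-robin into n_folds subsets of
    nearly equal size; fold k gathers subset k from every language.
    In fold k, folds[k] is the test set and the other folds are training data."""
    folds: List[List[str]] = [[] for _ in range(n_folds)]
    for speakers in speakers_by_language.values():
        for k in range(n_folds):
            folds[k].extend(speakers[k::n_folds])
    return folds


def identify_native_language(log_likelihoods: Dict[str, float]) -> str:
    """Maximum likelihood rule: pick the language whose ASR system assigned
    the highest log likelihood to the test utterance."""
    return max(log_likelihoods, key=log_likelihoods.get)


def confusion_matrix(true_labels: Sequence[str], predicted: Sequence[str],
                     languages: Sequence[str] = LANGUAGES) -> Dict[str, Dict[str, int]]:
    """matrix[true][predicted] counts test utterances."""
    matrix = {t: {p: 0 for p in languages} for t in languages}
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix


def unweighted_average_recall(matrix: Dict[str, Dict[str, int]]) -> float:
    """Mean of per-language recall, so each language counts equally
    regardless of how many test utterances it has."""
    recalls = [row[lang] / sum(row.values())
               for lang, row in matrix.items() if sum(row.values()) > 0]
    return sum(recalls) / len(recalls)


# Hypothetical speaker IDs matching the speaker counts given in the summary.
speakers = {
    "Assamese": [f"as{i:02d}" for i in range(35)],
    "Bengali": [f"bn{i:02d}" for i in range(33)],
    "Bodo": [f"bd{i:02d}" for i in range(30)],
}
folds = speaker_folds(speakers)
print([len(f) for f in folds])  # [33, 33, 32]

# Hypothetical log likelihoods from the three ASR systems for four test utterances.
scores = [
    {"Assamese": -100.0, "Bengali": -105.0, "Bodo": -120.0},
    {"Assamese": -110.0, "Bengali": -102.0, "Bodo": -130.0},
    {"Assamese": -98.0, "Bengali": -95.0, "Bodo": -115.0},
    {"Assamese": -140.0, "Bengali": -138.0, "Bodo": -125.0},
]
true_labels = ["Assamese", "Assamese", "Bengali", "Bodo"]
predicted = [identify_native_language(s) for s in scores]
uar = unweighted_average_recall(confusion_matrix(true_labels, predicted))
print(round(uar, 3))  # 0.833
```

With these toy scores one Assamese utterance is misclassified as Bengali, so Assamese recall is 0.5 while Bengali and Bodo recall are 1.0; averaging per-language recall rather than pooling all utterances keeps the larger Assamese group from dominating the overall figure.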
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN)	4623
Topical term or geographic name entry element	Electrical Engineering
700 ## - ADDED ENTRY--PERSONAL NAME
9 (RLIN)	11587
Co-Author	Sarma, Parismita
773 0# - HOST ITEM ENTRY
Place, publisher, and date of publication	Noida: STM Journals
International Standard Serial Number	2321-4260
Title	Trends in electrical engineering (TEE)
856 ## - ELECTRONIC LOCATION AND ACCESS
URL	http://engineeringjournals.stmjournals.in/index.php/TEE/article/view/3253
Link text	Click here
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Articles Abstract Database

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Home library	Current library	Shelving location	Date acquired	Total Checkouts	Barcode	Date last seen	Price effective from	Koha item type
		Dewey Decimal Classification			School of Engineering & Technology	School of Engineering & Technology	Archival Section	08/01/2020		2020629	08/01/2020	08/01/2020	Articles Abstract Database