
Automatic Language Identification using Basic Signal Class

By: Panchanan, Supaarna.

Contributor(s): Arup, Saha.

Publisher: New Delhi: STM Journals, 2019

Edition: Vol.9(2), May-Aug.

Description: 12-19p.

Subject(s): Electrical Engineering

In: Trends in electrical engineering (TEE)

Summary: Automatic language identification (ASLID) is the problem of having a computer identify an unknown language from a spoken utterance. This paper presents a segmental approach to ASLID, based on the assumption that the acoustic structure of a language can be estimated by segmenting speech into three basic classes of speech signal. It describes the procedure, the methodology and the results. The approach does not recognize words; it relies instead on the lengths of segments belonging to three basic signal classes, namely quasi-periodic (free voice vowels and obstructed voice, e.g., murmurs, laterals), quasi-random (noise segments, sibilants, friction in affricates) and quiescent (plosives and affricates, silent periods, occlusions, and silences caused by breath pauses). The quasi-periodic class is further divided into fully voiced signals and obstructed vocalic signals. The classifier uses features from the resulting four classes, which are extracted with more than 98.6% accuracy. The study covers standard dialects of sixteen spoken languages, namely Assamese, Bengali, Hindi, Marathi, Gujarati, Panjabi, Urdu, Malayalam, Odia, Konkani, Maithili, Kannada, Manipuri, Nepali and Telugu. The sixteen languages were chosen so that together they cover almost all the states of India. The corpus mainly contains spontaneous conversational speech on various topics, viz. agriculture, social welfare, personal interviews, etc., spoken by both sexes. The database holds more than 30 minutes of spoken data for each of these dialects and was collected from regional radio broadcasts. The relative abundance of the aforesaid signal classes is expected to differ between languages, so a distinct pattern is expected for each language. The collected database is therefore evaluated with the Relative Abundance Model (RAM) using a weighted Euclidean distance classifier.
Here, we propose a model that explores the spoken data using time-domain parameters. The model is distinctive in that it does not use any of the linguistic information normally used. The segmental durations of the aforesaid signal types are observed to vary across languages, and RAM was developed to exploit this phenomenon. With these sixteen languages from three language families, viz. Indo-Aryan, Dravidian and Tibeto-Burman, a recognition rate of 70% was achieved.
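
The summary does not give the details of RAM, so the following is only a minimal sketch of the general idea under stated assumptions: each utterance is reduced to the fraction of its total duration spent in each of the four signal classes, and it is assigned to the language whose stored template is nearest under a weighted Euclidean distance. The class labels, function names, weights and all numbers are hypothetical illustrations, not values or code from the paper.

```python
import math

# Hypothetical labels for the four signal classes named in the summary.
CLASSES = ("voiced", "obstructed", "quasi_random", "quiescent")


def relative_abundance(segments):
    """Return the share of total duration spent in each class.

    segments: list of (class_label, duration_in_seconds) pairs,
    assumed to come from an earlier segmentation step.
    """
    totals = {label: 0.0 for label in CLASSES}
    for label, duration in segments:
        totals[label] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("utterance contains no segments")
    return [totals[label] / grand_total for label in CLASSES]


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def classify(vector, templates, weights):
    """Pick the language whose template vector is closest to `vector`."""
    return min(
        templates,
        key=lambda lang: weighted_euclidean(vector, templates[lang], weights),
    )


# Toy data, invented purely for illustration.
templates = {
    "lang_a": [0.50, 0.10, 0.15, 0.25],
    "lang_b": [0.40, 0.15, 0.20, 0.25],
}
weights = [1.0, 1.0, 1.0, 1.0]  # assumption: equal weights
utterance = [
    ("voiced", 4.8),
    ("obstructed", 1.1),
    ("quasi_random", 1.6),
    ("quiescent", 2.5),
]

print(classify(relative_abundance(utterance), templates, weights))
```

In practice the weights would have to be estimated from training data (for example, from the per-class variance within each language); the summary does not say how the authors chose them.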

Holdings ( 1 )

Item type	Current location	Call number	Status	Date due	Barcode	Item holds
Articles Abstract Database	School of Engineering & Technology Archival Section		Not for loan		2020631

