
Automatic Language Identification using Basic Signal Class

Panchanan Suparna, Saha Arup, Datta Asoke Kr

Abstract


Automatic language identification (ASLID) is the problem of having a computer identify an unknown language from a spoken utterance. A segmental approach to ASLID is based on the assumption that the acoustic structure of languages can be estimated by segmenting speech into three basic classes of speech signals. This paper presents an ASLID procedure, with details of the methodology and the results, that does not recognize words but instead uses the lengths of segments of three basic classes of signals, namely quasi-periodic (free voice vowels, obstructed voice, e.g., murmurs, laterals), quasi-random (noise segments, sibilants, frication in affricates) and quiescent (plosives and affricates, silent periods, occlusions, as well as silences caused by breath pauses). The quasi-periodic class is further divided into fully voiced signals and obstructed vocalic signals. The classifier uses features from these four classes, which are extracted with more than 98.6% accuracy. The study is conducted with standard dialects of sixteen spoken languages, namely Assamese, Bengali, Hindi, Marathi, Gujarati, Panjabi, Urdu, Malayalam, Odia, Konkani, Maithili, Kannada, Manipuri, Nepali and Telugu. The sixteen languages were chosen so that they cover almost all the states of India. The corpus mainly contains spontaneous speech in conversational mode on various topics, viz. agriculture, social welfare, personal interviews, etc., spoken by both sexes. The database consists of more than 30 minutes of spoken data for each of these dialects. The corpus was collected from regional radio broadcasts. The relative abundance of the aforesaid signal classes is expected to differ between languages, so a unique pattern is expected for each language. The collected database is therefore evaluated with the Relative Abundance Model (RAM) using a weighted Euclidean distance classifier.
Here, we propose a model that explores the spoken data using time-domain parameters. The uniqueness of this model is that it does not use any of the linguistic information normally used. The segmental durations of the aforesaid signal types are observed to vary across languages, and RAM has been developed to exploit this phenomenon. With these sixteen languages from three language families, viz. Indo-Aryan, Dravidian and Tibeto-Burman, a recognition rate of 70% has been achieved.
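
The idea of classifying by relative abundance can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact features, weights or training procedure, so the class labels, the function names (`relative_abundance`, `weighted_euclidean`, `classify`), the per-language mean vectors and the weights below are all hypothetical. The sketch assumes each utterance has already been segmented into labelled segments with durations, takes the share of total duration in each of the four classes as the feature vector, and picks the language whose reference vector is nearest under a weighted Euclidean distance.

```python
import math

# Hypothetical labels for the four signal classes named in the abstract.
CLASSES = ("voiced", "obstructed", "quasi_random", "quiescent")


def relative_abundance(segments):
    """Return the share of total duration spent in each class.

    segments: iterable of (class_label, duration_in_seconds) pairs.
    """
    totals = {c: 0.0 for c in CLASSES}
    for label, duration in segments:
        totals[label] += duration
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no segment duration to normalise")
    return [totals[c] / grand_total for c in CLASSES]


def weighted_euclidean(x, mean, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - mean_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, mean, weights)))


def classify(x, models):
    """Return the language whose reference vector is closest to x.

    models: {language: (mean_vector, weight_vector)}. The weights are an
    assumption here; inverse per-feature variance is one common choice.
    """
    return min(models, key=lambda lang: weighted_euclidean(x, *models[lang]))


# Illustrative numbers only; these are not values from the paper.
example_models = {
    "language_a": ([0.50, 0.00, 0.25, 0.25], [1.0, 1.0, 1.0, 1.0]),
    "language_b": ([0.25, 0.25, 0.25, 0.25], [1.0, 1.0, 1.0, 1.0]),
}
example_segments = [("voiced", 0.5), ("quiescent", 0.25), ("quasi_random", 0.25)]
example_features = relative_abundance(example_segments)
print(classify(example_features, example_models))
```

Because the features are duration shares rather than recognized words or phonemes, a sketch of this kind needs no linguistic information, which matches the property the abstract emphasizes.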

 

Keywords: Automatic language identification, basic signal class, relative abundance model, speech, acoustic, phonetics, equal error rate, Euclidean distance classifier

 



DOI: https://doi.org/10.37591/.v9i2.3252


Copyright (c) 2019 Trends in Electrical Engineering

eISSN: 2249-4774