R&D in Speech and Language Processing in KIIT
Over the past three decades digital signal processing has emerged as a recognized discipline. Much of the impetus for this advance stems from research in representation, coding, transmission, storage and reproduction of speech and image information. In particular, interest in voice communication has stimulated central contributions to digital filtering and discrete-time spectral transforms.
This dynamic development was built upon the convergence of three then-evolving technologies:
(i) sampled-data theory and representation of information signals (which led directly to digital telecommunication that provides signal quality independent of transmission distance);
(ii) electronic binary computation (aided in early implementation by pulse-circuit techniques from radar design); and
(iii) invention of solid-state devices for exquisite control of electronic current (transistors, which now, through microelectronic materials, scale to systems of enormous size and complexity).
This timely convergence was soon followed by optical fiber methods for broadband information transport.
These advances affect an important aspect of human activity: information exchange. Throughout human existence, speech has played a principal role in communication. Now speech is playing an increasing role in human interaction with complex information systems. Automatic services of great variety exploit the comfort of voice exchange, and, in the corporate sector, sophisticated audio/video teleconferencing is reducing the need for expensive, time-consuming business travel. In each instance, an overarching target is a user environment that captures some of the naturalness and spatial realism of face-to-face communication. Again, speech is a core element, and new understanding from diverse research sectors can be brought to bear. Unlike research in a typical corporation, research at the KIIT Research Lab is spread across the entire organization and is not limited to a few pockets of excellence.
Departments such as DRDO and DIT provide fellowships to our R&D Lab in return for research output. On average, KIIT’s total revenue from these funded activities is estimated at Rs. annually. Depending on the nature of the research, the faculty involved receive a percentage share of the income.
Low cost, lower cost and lowest cost: this is the very essence of the innovation that researchers work on day in and day out. KIIT, Gurgaon acts as a catalyst through its Research & Development Lab in the field of Speech Processing. Research here is usually honorary, meaning neither the faculty nor the institution makes any money from it. We have devoted resources to speech and language processing and employed Research Scholars, giving them room to explore natural language and speech recognition.
Today, KIIT employs fellows who work directly on speech engines and platforms. Our team has grown into an incubation lab, where natural language and speech findings can be applied to make our lives easier, better, more fun and more productive. The approach is “Say it. Get it”: speech recognition and synthesis capabilities developed in an interdisciplinary and encouraging environment.
KIIT College of Engineering has established a world-class Research & Development Lab in which considerable advanced work has been done in several areas of Natural Language and Speech Processing related to Indian spoken languages. The work is carried out in collaboration with Indian and foreign laboratories. The major areas of research include Automatic Speech Recognition, Speaker Identification, Language Identification, creation of pronunciation lexicons for the Punjabi language, and development of text corpora and speech databases for Indian languages. The major achievements of the KIIT Research Lab can be summarized as follows:
Development of mobile databases for Indian spoken languages
A text corpus of 2 million words of natural messages in 12 different domains in Hindi and Indian English has been created, together with a speech corpus of 100 speakers, each speaking 630 phonetically rich sentences. The speech utterances were recorded at 16 kHz through 3 recording channels: a mobile phone, a headset and a desktop-mounted microphone. This project was sponsored by Nokia Research Centre China.
Emotional Speech Database
A Hindi database was created for the analysis of isolated words: the ten Hindi digits (0-9) were recorded in all six emotions, i.e. happy, fear, anger, sad, surprise and neutral, giving a large corpus of 3000 utterances. We are in the process of developing Punjabi, Nepali and Indian English databases for DEIT and DRDO as per their requirements.
Automatic recognition of isolated words
Isolated words have been recognized using a Neural Network and Dynamic Time Warping, with MATLAB and the PRAAT tool.
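Dynamic time warping, one of the two techniques named above, can be illustrated with a minimal sketch. This is not the lab's MATLAB implementation: the function names, the word labels "ek" and "do", and the one-dimensional toy feature tracks are all hypothetical, and a real recognizer would compare frames of acoustic features rather than single numbers.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame-to-frame distance
            # extend the cheapest of: insertion, deletion, match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical reference templates, one per word (toy 1-D features).
templates = {
    "ek": [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)],
    "do": [(2.0,), (2.0,), (0.0,), (0.0,), (2.0,)],
}
# A slower rendition of "ek": same contour, stretched in time.
spoken = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
print(recognize(spoken, templates))  # prints "ek"
```

Because the warping path may repeat frames, the stretched utterance aligns with its template at zero cost even though the two sequences differ in length, which is why DTW suits isolated words spoken at varying speeds.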
Automatic speaker verification and identification using mobile communication data
For this experiment, a multilayer error back-propagation feed-forward neural network with associative memory was considered for speaker identification. 70% of the samples were used to train the network, 15% for validation and 15% for testing. With this specification, 20 neurons were used in the hidden layer. An HMM-based Speaker Identification System has also been built using a two-channel (Mobile & Head Held) database of NATO words.
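The 70/15/15 partition described above can be sketched as follows. The helper name, the fixed random seed and the sample count of 200 are assumptions made for illustration only; the source does not state how many samples were used or how they were shuffled.

```python
import random


def split_dataset(samples, train=0.70, val=0.15, seed=0):
    """Shuffle the samples and split them into train/validation/test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    # whatever remains after train and validation becomes the test set
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


samples = list(range(200))  # hypothetical: ids of 200 feature vectors
train_set, val_set, test_set = split_dataset(samples)
print(len(train_set), len(val_set), len(test_set))  # 140 30 30
```

Keeping the validation set separate from the test set lets training be stopped or tuned on the former while the latter remains an untouched measure of identification accuracy.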
Recognition of emotions by Human and Machine
For this experiment, six emotions, i.e. neutral, happy, sad, fear, anger and surprise, were considered for recognition by human and machine (Neural Network). The acoustic-prosodic features corresponding to each of these emotions, such as changes in intensity, duration and intonation, were analyzed using the PRAAT speech processing software tool. It has been observed that the machine performs better than the human listeners.
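To make the three prosodic features concrete, the sketch below computes duration, intensity and a crude pitch estimate from raw samples. It does not reproduce PRAAT's algorithms: the function names, the autocorrelation-peak pitch estimate and the synthetic 200 Hz test tone are assumptions chosen only to illustrate what each feature measures.

```python
import math


def duration_seconds(samples, sample_rate):
    """Duration of the segment in seconds."""
    return len(samples) / sample_rate


def intensity_db(samples):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_f0(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Crude pitch estimate: the lag with the largest autocorrelation,
    searched over the lags that correspond to fmin..fmax."""
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voiced segment: 0.2 s of a 200 Hz tone at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
print(duration_seconds(tone, rate), intensity_db(tone), estimate_f0(tone, rate))
```

In an emotion study, values like these would be computed per utterance and compared across the emotion classes; intonation would be tracked as a pitch contour over time rather than as the single value estimated here.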
Language identification using a trigram language model
A Trigram Language Model has been developed on both the MATLAB and JAVA platforms. The trigram model has proved to be an effective way to differentiate between two languages that share the same script, such as English and French, especially for web content searching. The JAVA platform has provided additional efficiency to the trigram model. A trigram model for the language identification of Indian English is being developed.
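A minimal sketch of trigram-based language identification follows. It assumes character trigrams with add-one smoothing, which the source does not specify (the lab's model may equally be word-based), and the class name, helper names and tiny training sentences are hypothetical stand-ins for a real corpus.

```python
import math
from collections import Counter


def char_trigrams(text):
    """All overlapping character trigrams of the padded, lower-cased text."""
    padded = "  " + text.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramModel:
    def __init__(self, text):
        grams = char_trigrams(text)
        self.trigram_counts = Counter(grams)
        self.context_counts = Counter(g[:2] for g in grams)
        self.alphabet = set(text.lower()) | {" "}

    def log_prob(self, text, vocab_size):
        """Log-likelihood of the text under add-one smoothed trigram estimates."""
        total = 0.0
        for g in char_trigrams(text):
            num = self.trigram_counts[g] + 1
            den = self.context_counts[g[:2]] + vocab_size
            total += math.log(num / den)
        return total


def identify(text, models):
    """Pick the language whose model gives the text the highest likelihood."""
    vocab = set(text.lower()) | {" "}
    for model in models.values():
        vocab |= model.alphabet
    size = len(vocab)  # shared vocabulary size keeps the scores comparable
    return max(models, key=lambda lang: models[lang].log_prob(text, size))


# Hypothetical toy training data; a real system would train on large corpora.
models = {
    "english": TrigramModel(
        "the quick brown fox jumps over the lazy dog and then the cat sat on "
        "the mat while the children were reading their books in the garden"
    ),
    "french": TrigramModel(
        "le renard brun saute par dessus le chien paresseux et puis le chat "
        "dort sur le tapis pendant que les enfants lisent leurs livres dans le jardin"
    ),
}
print(identify("the children are in the garden", models))
print(identify("les enfants sont dans le jardin", models))
```

Both languages use the same Latin script, so individual characters say little, but sequences such as "the" or "les" occur with very different frequencies, and that difference is what the model scores.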
Pronunciation Lexicon Specification
Under the guidance of CDAC Kolkata, as part of the project “PLS Creation for Indian Languages”, KIIT is developing the PLS for the Punjabi language. Under this project, KIIT is carrying out acoustic and articulatory studies of Punjabi words, covering the words of the Punjabi dictionary.
Acoustic analysis of words at phoneme level
Acoustic correlation of emotions
Spectrum verification of vowel segments for Indian English, American and Chinese
Comparison of prosodic features.
Development of pronunciation lexicon and experimental study of phonetics and phonemics for Punjabi language.
Ability to meet Challenges
1. Natural Language Processing (NLP) is the new area in which major developments will be undertaken. To ensure that Indian languages are on this new platform, exciting new technologies are being developed. The KIIT research group is focused on developing efficient algorithms to process texts and to make their information accessible to computer applications. The goal of the group is to design and build software that will analyze, understand, and reproduce the speech that humans produce naturally, so that eventually a user will be able to communicate with the computer as though addressing another person.
2. This goal is not easy to reach, because India has 22 official languages and many scripts. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. Recognition and synthesis of these languages are very difficult. We are dealing with 3 Indian languages, i.e. Punjabi, Nepali and Indian English, which is not an easy task. In spite of this, we have had considerable success in the analysis of these languages. At present, we are working with several software tools, such as Cool Edit and WaveSurfer, among others. Day by day, we are moving closer to realizing the potential of Language and Speech Processing.