Www.erpublications.com



Issues In English to Indian Sign Generation and Translation model and developed Corpus based Translation System to tackle those issues

Syed Faraz Ali
Computer Science and Engineering
Sharda University
Greater Noida, India
syedifaraz@

Abstract: Sign language is used by deaf and hard of hearing people throughout the world. The sign language used in India is Indian Sign Language (ISL). This paper explores the application of a data-driven sign generation model to ISL. An ISL generation system can facilitate communication with deaf and hard of hearing people by translating information into their native and preferred language. We have developed an ISL generation system that displays animated signs corresponding to the input text, so that a user with no knowledge of sign language can generate signs simply by typing. The paper describes the system and its modules in detail and presents our approach to developing the ISL translation system.

Introduction

Human interaction is not possible without communication. Hearing people communicate through spoken languages, but deaf and hard of hearing people cannot interact in this way and rely on sign language instead. Sign language uses manual communication (physical body movements) and facial gestures to convey messages and thoughts. It is also used by people who can hear but cannot physically speak. Sign language exists wherever there are deaf people or deaf communities.
On the basis of the deaf population in a region, sign languages can be categorized as follows:

Home sign language: Where only one person in a family is deaf or hard of hearing, the language that person uses to communicate with other family members is the home sign language.
Village sign language: Where more than one person in a village is deaf or hard of hearing, the sign language these people use to communicate is the village sign language.
Deaf community sign language: When deaf people from different places come together and formulate standardized signs for communicating, the resulting language is the deaf community sign language, since the deaf community itself has developed it.

The signs of a sign language consist of two kinds of features:

Manual features: Manual communication involves the movement of hands and fingers. The signer conveys the message through these movements.
Non-manual features: This communication involves facial expressions and body gestures. The facial expression of the signer tells the viewer what the signer is trying to say.

We have made an effort to help deaf and hard of hearing people in India by developing a system that supports their communication with the hearing world. This paper discusses our approach to English text to Indian Sign Language translation and sign generation. It gives an overview of previously proposed systems and the detailed structure and interface of our developed system. It also discusses the issues that came up while developing the system and the obstacles that must be tackled to build a complete system.

INDIAN SIGN LANGUAGE

There are different communities of deaf people all around the world, and the languages of these communities differ.
Just as there are many spoken languages in the world, such as English, French and Urdu, there are different sign languages and different expressions used by deaf and hard of hearing people worldwide. American Sign Language (ASL) is used in the USA, British Sign Language (BSL) is used in Britain, and Indian Sign Language (ISL) is used in India for expressing thoughts and communicating with each other.

Interactive systems have already been developed for several sign languages, e.g. ASL and BSL. To help hearing impaired people in India interact with others, we are developing a system that translates English text into Indian Sign Language text, which can then be rendered in ISL. Since it is difficult to generate signs for every verb or phrase in the vocabulary, we limit our experiments to one domain: railways. We take the conversations that occur at railway enquiry and reservation counters, analyse them, and find the corresponding signs used in ISL.

India is a large country with a population of 1,241,491,960 (Google Public Data). It has about 30 states, and most states use their own local language, e.g. Kashmiri is spoken in Kashmir and Punjabi in Punjab. Similarly, there are slight differences in the sign language used in different parts of India.

Related work

Since we are dealing with a translation model for Indian Sign Language, we discuss the models proposed for it. For spoken languages, Machine Translation is a booming area of research and development.
This can be inferred from the proliferation of Machine Translation products for sale, such as Systran and Language Weaver, as well as freely available on-line Machine Translation tools such as AltaVista's Babel Fish and Google Translate. The funding of large research projects such as the Global Autonomous Language Exploitation (GALE) project, the TCSTAR project and the more recent Centre for Science, Engineering and Technology (CSET) project on Next Generation Localisation further demonstrates the importance given to this area of research in Europe and the US. The same level of activity cannot be claimed for Sign Language Machine Translation, where little more than a dozen systems have tackled the task. Most papers describe prototype systems that focus primarily on sign language generation rather than on applying Machine Translation techniques to these visual-gestural languages.

According to Dorr et al. [1], machine translation systems can be grouped into three basic designs:

Direct
Transfer
Inter-lingual

In the direct design there is word-to-word conversion, and no other aspect of the sentence is taken into consideration. The transfer rules that perform this type of conversion therefore depend entirely on the source language. Transfer systems analyse the input text at the syntactic and semantic level; here the transfer rules that perform the conversion depend on both the source and the target language. In the interlingual architecture, the analysis of the source language text should result in a representation of the text that is independent of the source language.

The systems are categorised on two bases.

RULE BASED APPROACHES

The rule based approaches came into existence in 1976 and gained their position in the research field.
Rule-based approaches may be sub-classified into transfer and interlingua based methodologies. In transfer, syntactic and semantic analysis takes place first and the translation follows. The interlingua is the top level of the machine translation pyramid of Dorr et al. [1]. In a transfer approach, analysis of the source language input sentence is usually shallow (when compared with interlingual approaches rather than with a direct methodology) and takes place on a syntactic level, often producing constituent structure-based parse trees. Interlingual approaches tend to carry out a deeper analysis of the source language sentence that creates structures of a more semantic nature.

One of the rule based systems is summarised below.

Purushottam Kar et al. [2] developed a system named INGIT. It is a cross-modal translation system from Hindi strings to Indian Sign Language for possible use at Indian Railways reservation counters. The system translates input from the reservation clerk into Indian Sign Language, which can then be displayed to the ISL user. They used Fluid Construction Grammar (FCG) [3] to construct the grammar for sign language: a domain-specific construction grammar for Hindi is implemented in FCG. This grammar converts the input into a thin semantic structure, which is the input to ellipsis resolution, after which a saturated semantic structure is obtained. Depending on the type of utterance (statement, query, negation, etc.), a suitable ISL-tag structure is generated by the ISL generator.
This is then passed to a HamNoSys [4] [5] converter to generate the graphical simulation.

To validate the system, they collected a small corpus on six different days. The corpus was based on interactions with speaking clients at a computer reservation counter. After evaluation they found that the interactions comprised 230 words, many of which were repeated. The vocabulary of 90 words included 10 verbs in various morphological forms (e.g. work, worked, working), 9 words related to time and 12 words specific to the domain (e.g. ticket, tatkal). The other words were numerals (15), names of months (12), cities (4) and trains (4), as well as digits, particles, etc.

The INGIT system has three main modules:

Input parser
Ellipsis Resolution Module
ISL Generator (including an ISL lexicon with HamNoSys [4] [5] phonetic descriptions)

Their system cannot show non-manual features such as facial expressions and gestures. It also has a restricted domain, i.e. it is applicable only to railway systems, so its sign vocabulary is very small.

DATA BASED APPROACH

This is also known as the corpus based or example based approach. Here direct mapping takes place, as at the lowest level of the machine translation pyramid in Dorr et al. [1]. Data-driven approaches came into existence in the 1990s and now dominate the research field. This approach, often termed 'corpus-based', can be sub-divided into statistical machine translation and example-based machine translation. The two data-driven processes differ fundamentally from rule-based approaches, yet they remain inherently similar to each other. In general, linguistic information and rules are eschewed in favour of probabilistic models collected from a large parallel corpus.

In data based approaches a large dataset is built and direct mapping between words takes place.
This approach is gaining ground: since there is no dictionary for sign languages, it might help to produce one, and if the corpora collected for all the languages are taken into account, it might help to build a standardised sign language in future.

No system using this technique has so far been developed for Indian Sign Language, so we have taken this approach for the development of our system.

Issues For Sign Language Translation

In India only one system has been developed for Indian Sign Language, i.e. INGIT. Many other countries have ongoing work on sign language to help their deaf and hard of hearing people. To help the deaf and hard of hearing people of our country, we are taking an initiative towards building this system. It will help people who have been left behind by the fast-growing world to communicate with us. As mentioned above, India is a very large country with the 2nd largest population. In proportion to its population it may therefore have one of the largest numbers of deaf and hard of hearing people, and it is for these people that we are developing this system. Some of the issues in developing the system are described here.

LOCAL VARIATION

India is a large country with a population of 1,241,491,960 (Google Public Data). It has about 30 states, and most states use their own local language, e.g. Kashmiri is spoken in Kashmir and Punjabi in Punjab. The language thus varies from one state to another, and in some states it varies within the state as well: in Jammu and Kashmir, Kashmiri is spoken in Kashmir and Dogri is spoken in Jammu. This variation is not confined to spoken languages; it can also be seen in sign languages.
Sign language, just like spoken language, varies from place to place, and sign languages can be grouped into the three categories described in the Introduction. This variation creates a barrier to building an efficient translation system. There is also no standard for Indian Sign Language. This is a very important issue, because without a standard it is difficult to design the system: a single word might have several different signs, which leads to confusion.

To tackle this problem we use the most commonly used sign for the word to be translated, and then use that same sign consistently to depict the translation of the word.

DATA

Very little data is available for Indian Sign Language. The lack of a standardised writing system contributes to the limited availability of sign language (SL) data, in terms of both the quantity and the quality needed for a data-driven Indian Sign Language Machine Translation system. Finding a corpus that suits the data needs of an SL MT system is a difficult task. A spoken language has a written dictionary, which is very efficient, but a sign language does not. Building a dictionary for a sign language is very difficult, time consuming and expensive.

A spoken language has a proper standard and can be written, which makes it cost effective and easy to build a dictionary dataset. This is not the case for a sign language. Firstly, the notation for sign language is not universally standardised and thus cannot cater for all the signs of different sign languages; local and national variation limits this further. Secondly, since sign language is not written, it needs a visual representation, which is provided by videos. Collecting these videos is time consuming and expensive, and storing them requires a lot of space, which is not the case with spoken languages.
Some online organizations have made considerable efforts to build datasets of signs for Indian Sign Language. They have recorded videos of the signs for different words and uploaded them to their databases. Even there, the number of words for which a sign is available is much smaller than the vocabulary of a spoken language like English. This creates a barrier for English to Indian Sign Language translation, because some English words may have no sign. To counter this problem we can use a synonym of the word in question, which again takes a lot of effort and time.

Building a dataset for Indian Sign Language is thus a very tedious task that needs a lot of time and space. Zeshan [6] carried out a morphological analysis of the Indo-Pak sign languages and developed a HamNoSys notation for Indo-Pak sign language, which could be used for the development of the system. She showed how the Indo-Pak sign languages are similar and used the survey to develop a way of writing Indo-Pak sign language in HamNoSys notation. This dataset offers the potential for building data-driven machine translation systems between sign language and English.

LACK OF GRAMMAR

Syed Faraz Ali et al. [7], on visits to various schools for the deaf, found that there is no standardised grammar for Indian Sign Language. This is a hindrance to developing a rule based system for Indian Sign Language translation: without a standardised grammar, no semantic or syntactic transfer is possible. To develop a rule based system, a grammar must first be built and then verified to check whether it handles all the signs correctly; only then can the rules be applied.
Standardizing the grammar rules needs the help of large organizations. This is the major drawback of rule based approaches for the Indian Sign Language translation model.

The lack of grammar places many limitations on the translation model. According to the translation pyramid, semantic and syntactic transfer is not possible without a defined grammar. Compared with BSL (British Sign Language) and ASL (American Sign Language), which do have defined grammars, Indian Sign Language falls behind, and a rule based approach is not feasible for this system. The best way to translate is therefore the direct mapping method.

SIGNING HOMONYMS AND HOMOGRAPHS

Homonyms are words that have the same spelling and the same pronunciation but different meanings; homographs are words with the same spelling but different meanings. Such words have different signs for their different meanings, which creates a problem for translation. Consider the word 'may': the month May has one sign and the verb may has another. To display the sign for 'may', the computer has to determine whether the month or the verb is meant, so some form of intelligence has to be integrated for such situations.

The approach we could use is to finger spell such words instead of signing them. For this, all the possible homonyms have to be identified, and these words are then finger spelled as required.

SINGLE SIGN FOR TWO-WORD PHRASES

Signs are generated per word, and the input is split on the space between words. When a two-word phrase appears, the system therefore recognises each word individually instead of treating the phrase as a unit.
E.g., for the phrase “good morning” the system recognises “good” as one word and “morning” as another, although the phrase usually has a single sign of its own.

To handle this we have to build a dataset of the two-word phrases that can occur. The two-word phrase translation runs first, before the single-word translation.

Developed System

Architecture

An interface for Indian Sign Language translation has been developed in which the system accepts input text and translates the given words in sequence by making an avatar display the sign for each word. The translation is corpus based: there is a direct mapping between the English and ISL text. Since it is very inefficient to make signs for every word, our domain is bounded, and the translator translates text within that domain. The system we propose is for enquiries at railway reservation counters.

The architecture of the system is shown in Figure 1. It consists of the following modules.

Figure 1: System Architecture

Input Module: A text box that takes the text or sentence to be translated as input. It accepts any input, whether a scrambled word or a single letter.
Tokenizer: It splits the English text or sentence entered in the input module into individual words.
Resource: It contains the ISL signs for the English words. Since the domain is railway enquiry, it contains the signs for the words used in enquiries at a railway reservation counter.
If an entered word has no sign representation, a synonym of that word is used to represent it in ISL.
Translator: It checks the resource for the sign of each word entered for translation and helps the accumulator to filter the entered text.
Accumulator: It filters the words to be translated by ignoring the words that have no sign in the resource, and then accumulates the remaining words in the sequence in which they were entered in the input module.
Display: The 3D character that displays the sign for each word through hand movements.

Indian Sign Language has no grammar of the kind other languages have, so a rule based system is not feasible for the translation; there is no syntax against which to compare sentences. The system therefore translates by direct word-to-word mapping, with no checking of tenses. It translates the entered text if the entered words are present in the resource directory.

SYSTEM INTERFACE

The interface of the system is shown in Figure 2.

Figure 2: System Interface (labelled parts: Avatar, Translate Button, Text Box)

The avatar shown in this figure translates the text entered in the text box. There is a translate button below the text box. The steps to translate a given sentence into Indian Sign Language are:

Step1: Enter the text into the text box
Step2: Click the translate button

The sign for the word “hello” is shown in Figure 3.

Figure 3: Avatar displaying the “Hello” sign in Indian Sign Language

The avatar shows the signs of the entered words in the order in which they were entered.

FEATURES

The features of the system are discussed below.
The system is:

Domain Bounded
Sequential in its display
Animation Based

Domain Bounded

The system has been developed for the railway reservation counter. Approximately 100 signs have been developed for it. Making signs for a complete translation model would be a very heavy task: the English vocabulary is so vast that making a sign for every word is not feasible. Limiting the domain to a specific scenario helps, because the set of words for which signs must be generated is then bounded.

To this end we considered the questions that commonly come up at a railway reservation counter, i.e. the general queries a person raises in railway communication. These conversations were collected over 3 days at the Old Delhi reservation counter (signs have been generated for some of the words from these conversations).

Sequential display

As discussed above, the system is designed so that the signer, i.e. the avatar, displays the signs for the entered text in the order in which it was typed into the text box. An example is given in Figure 4 (the sign for “What”), Figure 5 (the sign for “Time” after “What”), Figure 6 (the sign for “your” after “Time”) and Figure 7 (the sign for “Train” after “your”), where the entered text is translated into Indian Sign Language in the sequence in which it was entered.
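The module chain described under Architecture (tokenizer, resource, translator, accumulator) and the sequential ordering described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the developed system: the names `RESOURCE`, `SYNONYMS`, `tokenize`, `merge_phrases` and `translate`, the `.anim` clip names and the sample entries are all hypothetical, and the real resource holds roughly 100 avatar animations rather than strings.

```python
"""Illustrative sketch of a corpus-based English-to-ISL lookup pipeline."""

# Hypothetical resource: maps an English word or two-word phrase to the
# name of a stored avatar animation (clip names are made up for this sketch).
RESOURCE = {
    "what": "what.anim",
    "time": "time.anim",
    "your": "your.anim",
    "train": "train.anim",
    "hello": "hello.anim",
    "good morning": "good_morning.anim",
}

# Hypothetical synonym table, used when a word has no sign of its own.
SYNONYMS = {"hi": "hello"}


def tokenize(text):
    """Tokenizer: split on whitespace, strip punctuation, lower-case."""
    words = [w.strip(".,?!;:\"'").lower() for w in text.split()]
    return [w for w in words if w]


def merge_phrases(tokens):
    """Run the two-word phrase lookup before the single-word lookup."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in RESOURCE:
            merged.append(pair)  # the phrase has a single sign of its own
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def translate(text):
    """Translator and accumulator: clips in input order, unknown words dropped."""
    clips = []
    for token in merge_phrases(tokenize(text)):
        if token not in RESOURCE:
            token = SYNONYMS.get(token, token)  # fall back to a synonym
        if token in RESOURCE:
            clips.append(RESOURCE[token])
    return clips


if __name__ == "__main__":
    # "is" has no entry in the sketch resource, so it is skipped.
    print(translate("What Time is your train"))
    # -> ['what.anim', 'time.anim', 'your.anim', 'train.anim']
```

The returned list preserves input order, so a display module could play the clips one after another. Running the phrase lookup first is what lets “good morning” resolve to a single sign instead of two separate lookups.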
Figure 4: Avatar displaying the sign for “What”
Figure 5: Avatar displaying the sign for “Time” after “What” in sequence
Figure 6: Avatar displaying the sign for “your” after “Time” in sequence
Figure 7: Avatar displaying the sign for “Train” after “your”

The entered text here is “What Time is your train”. After the text is typed into the text box and the “Translate” button is clicked, the avatar starts showing the signs sequentially. The sign for “What” is shown first (Figure 4), then the sign for “Time” (Figure 5), followed by the signs for “your” and “Train” (Figure 6 and Figure 7).

Animation Based

The system is animation based, i.e. the avatar shows the signs in Indian Sign Language for the text given in the text box. An avatar is a personalized graphic file or rendering that represents a computer user; this is a form of computer-generated imagery. The avatar in our system was created using 3ds Max for animation and sign generation. It responds to the text entered in the text box when the translate button is clicked and performs the corresponding animation, which is stored in the database of our system. Like a real human being, the avatar character consists of bones and joints, and its joints move as human joints do; it is made to look as realistic as possible.

Future Work

The application of this system can be extended by changing the dataset, i.e. by developing signing avatar data for a different specified domain, such as a hospital counter. By changing the translation domain in this way, the system can be used in different scenarios.
This requires only a change in the data of the signing avatar. The system will also need some changes in its code, but it will be helpful for the deaf community in the long run. Similarly, we can change the domain to an air ticketing counter to help deaf and hard of hearing people interact with airline services.

References

[1] P. W. J. W. Bonnie J. Dorr, "A Survey of Current Paradigms in Machine Translation," Advances in Computers, vol. 49, pp. 1-68, 1999.
[2] M. R. A. M., A. M. R. Purushottam Kar, "INGIT: Limited Domain Formulaic Translation from Hindi Strings to Indian Sign Language," in Multilingual Europe Technology Alliance (META), Hyderabad, Jan 4, 2007 - Jan 6, 2007.
[3] J. D. B. Luc Steels, "Unify and Merge in Fluid Construction Grammar," in Symbol Grounding and Beyond: Proceedings of the Third International Workshop on the Emergence and Evolution of Linguistic Communication, 2006.
[4] T. Hanke, "HamNoSys – Representing Sign Language Data in Language," in Workshop on the Representation and Processing of Sign Languages, Lisbon, 2004.
[5] R. L. H. Z. T. H. a. J. H. Siegmund Prillwitz, "HamNoSys Version 2.0: Hamburg Notation System for Sign Languages: An Introductory Guide," in Proceedings of International Studies on Sign Language and Communication of the Deaf, Hamburg, Germany, 1989.
[6] U. Zeshan, Sign Language in Indo-Pakistan, Amsterdam: John Benjamins Publishing Company, 2000.
[7] G. S. M. A. K. S. Syed Faraz Ali, "Domain Bounded English to Indian Sign Language Translation Model," in International Conference on Electrical Engineering and Computer Science (ICEECS), Coimbatore, 2013.