A Review on Machine Translation in Indian Languages

Abstract: Machine translation (MT) is an important task that can be used to obtain information from documents written in different languages. In this paper, we discuss different approaches to MT, the problems faced in MT for Indian languages, and the limitations of some existing MT systems.


I. INTRODUCTION
Machine translation may be defined as the task of converting text from one language, called the source language, to another language, called the target language. There are two kinds of MT: metaphrase and paraphrase. In metaphrase, an exact word-to-word translation takes place, but the translated text may or may not have the same semantics as the source text. In paraphrase, translation is performed not at the word level but at the sentence level, so the semantics of the source text are preserved in the translated text. MT is one application of natural language processing. MT involves five major goals:
1. Morphological analysis is the process of generating all possible roots from word-level information.
2. Part-of-speech tagging is the process of assigning part-of-speech tags to every word in a given sentence.
3. Chunking is the process of identifying phrases such as noun phrase (NP), adjective phrase (JJP), verb phrase (VP), etc. in a given sentence.
4. Parsing is the process of generating a parse tree with the help of the information obtained from part-of-speech tagging and chunking.
5. Word sense disambiguation is the process of identifying the meaning of a word in a particular sentence when that word has multiple meanings.
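
The difference between metaphrase and paraphrase described above can be illustrated with a minimal sketch. The three-entry lexicon, the example sentence, and the function name `metaphrase` are assumptions made for illustration and do not come from any existing MT system. The sketch translates word by word, keeps the English word order, and compares the result with a natural Hindi rendering in which the verb comes last.

```python
# Hypothetical toy lexicon; a real system would use a full bilingual dictionary.
TOY_LEXICON = {
    "ram": "राम",
    "eats": "खाता है",
    "mangoes": "आम",
}


def metaphrase(sentence, lexicon):
    """Translate word by word, keeping the source word order.

    Words missing from the lexicon are copied through unchanged.
    """
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


source = "Ram eats mangoes"
word_by_word = metaphrase(source, TOY_LEXICON)

# A natural Hindi rendering places the verb last (subject, object, verb).
reference = "राम आम खाता है"

print(word_by_word)               # keeps the English subject-verb-object order
print(reference)
print(word_by_word == reference)  # the two renderings differ
```

Every word is translated correctly in isolation, yet the output does not match the natural rendering, which is why a sentence-level (paraphrase) treatment is needed to handle word order.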

II. PROBLEMS FACED IN MT IN INDIAN LANGUAGES
Problems faced in MT in Indian languages include:
1. Indian languages are free word order languages.
2. They are morphologically and inflectionally rich languages.
3. Named entity recognition (NER) can be used to improve MT. However, NER in Indian languages is not an easy task, since these languages have no capitalization to help identify named entities.
4. Many common nouns also occur as proper nouns, so these languages involve a large amount of semantic ambiguity.
5. There is a scarcity of resources pertaining to Indian languages on the web.

Today, many machine translators are available for Indian languages, but they still do not produce translations with very high accuracy. Consider the following source text: "Jammu and Kashmir, India's one of the most picturesque state lies on the peaks of Himalayan Ranges with varying topography and culture. Jammu was the stronghold of Hindu Dogra kings and abounds with popular temples and secluded forest retreats. Kashmir's capital city, Srinagar offers delightful holidays on the lakes with their shikaras and houseboats". This source text in English is translated into Hindi using different machine translators. The translations are shown in Figure 1. The translated texts obtained from existing machine translators are clearly not of good quality: some words in the translated text appear in English, and some words are transliterated instead of translated. So, there is a need to develop a machine translator that can produce good translations.
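
One of the defects noted above, English words left in the Hindi output, can be checked for mechanically. The following is a minimal sketch under stated assumptions: the sample sentence is hypothetical and written only for illustration (it is not the output of any translator in Figure 1), and the function name `untranslated_tokens` is ours. Transliterated words are written in Devanagari, so this check does not detect them.

```python
import re

# The Devanagari Unicode block (U+0900 to U+097F) covers Hindi text.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN = re.compile(r"[A-Za-z]")


def untranslated_tokens(hindi_output):
    """Return tokens that contain Latin letters and no Devanagari characters."""
    return [
        token
        for token in hindi_output.split()
        if LATIN.search(token) and not DEVANAGARI.search(token)
    ]


# Hypothetical output, written for illustration; not taken from a real system.
sample_output = "जम्मू और कश्मीर Himalayan पर्वतमाला की चोटियों पर स्थित है"

leftover = untranslated_tokens(sample_output)
print(leftover)
print(len(leftover) / len(sample_output.split()))  # share of untranslated tokens
```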

III. APPROACHES OF MT
Various approaches of MT are depicted in Figure 2. These include the following:
• Direct machine translation
• Rule based machine translation
• Corpus based machine translation, including statistical machine translation and example based machine translation.
The description, advantages, and disadvantages of the different approaches are shown in Table I. A hybrid approach involves a combination of the approaches listed above. MT quality is expected to improve if a hybrid approach is used to perform MT in Indian languages.

Entries from Table I:
-Good quality translation.
-Complex rules need to be constructed.
-It involves tedious tasks and is time consuming.
Corpus based MT: Rules are constructed by analysis of parallel corpus.

V. EVALUATION
For evaluation of an MT system, automatic evaluation metrics or human evaluation metrics may be used.

A. Automatic Evaluation Metrics

1) Precision, Recall and F-Measure
Precision is the number of matches between the system output and the human output divided by the total number of system outputs. Recall is the number of matches between the two outputs divided by the total number of human outputs. F-Measure combines the two. Apart from these automatic evaluation metrics, BLEU, METEOR, etc. can also be used for evaluation of MT output.
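
The precision, recall, and F-Measure definitions given for system and human outputs can be sketched as follows. This is a minimal illustration under two assumptions that the text does not fix: matches are counted at the token level (each human token can be matched at most once), and F-Measure is taken as the balanced harmonic mean of precision and recall. The function name and example sentences are hypothetical.

```python
from collections import Counter


def precision_recall_f(system_output, human_output):
    """Token-level precision, recall and balanced F-Measure."""
    system_tokens = Counter(system_output.split())
    human_tokens = Counter(human_output.split())

    # Counter intersection keeps the minimum count of each shared token,
    # so a token is never matched more often than it occurs in either text.
    matches = sum((system_tokens & human_tokens).values())
    system_total = sum(system_tokens.values())
    human_total = sum(human_tokens.values())

    precision = matches / system_total if system_total else 0.0
    recall = matches / human_total if human_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = precision_recall_f("the cat sat on mat", "the cat sat on the mat")
print(p, r, f)
```

In this example every system token is found in the human output, so precision is perfect, while recall is lower because one human token is missing from the system output.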

2) Bilingual Evaluation Understudy (BLEU)
Its value lies between 0 and 1. It indicates how close a machine-translated text is to the expected translation. The BLEU scores of all sentences are averaged to obtain the overall score for the whole corpus.
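
A simplified sentence-level BLEU can be sketched as below. It is an illustration rather than a reference implementation: it assumes a single expected translation per sentence, whitespace tokenization, uniform n-gram weights, and no smoothing, so a sentence with no matching n-gram of some order scores 0. The corpus score is taken as the average of sentence scores, following the description above. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Clipped n-gram precision (geometric mean) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(matched / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)


def corpus_score(candidates, references, max_n=4):
    """Average of sentence-level scores over the corpus."""
    scores = [sentence_bleu(c, r, max_n) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)


score = sentence_bleu("the cat sat on mat", "the cat sat on the mat", max_n=2)
print(score)
```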

3) NIST
Apart from calculating n-gram precision, it also assigns weights to n-grams according to how informative they are. A matched n-gram that occurs rarely in the reference translations receives a high weight, whereas a frequently occurring n-gram receives a low weight.
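
In the usual NIST formulation, the weight of an n-gram is an information value computed from reference counts: the base-2 logarithm of the count of its (n-1)-word prefix divided by the count of the full n-gram, with the total word count used as the prefix count for unigrams. The sketch below computes only these weights, not the full NIST score; the function name and the two-sentence reference set are hypothetical.

```python
import math
from collections import Counter


def info_weights(reference_sentences, max_n=2):
    """Information weight of every n-gram seen in the reference sentences."""
    counts = Counter()
    total_words = 0
    for sentence in reference_sentences:
        tokens = sentence.split()
        total_words += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1

    weights = {}
    for gram, count in counts.items():
        # For a unigram, the "prefix" count is the total number of words.
        prefix_count = total_words if len(gram) == 1 else counts[gram[:-1]]
        weights[gram] = math.log2(prefix_count / count)
    return weights


weights = info_weights(["the cat sat", "the dog sat"])
print(weights[("the",)], weights[("cat",)], weights[("the", "cat")])
```

The rarer word "cat" receives a higher weight than the more frequent word "the", which is the behaviour that distinguishes NIST from plain n-gram precision.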

4) Word Error Rate (WER)
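Estimates the proportion of tokens that differ between the machine-translated text and the expected translation. It is computed as the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the expected translation.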
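
A minimal sketch of WER, assuming whitespace tokenization and a single expected translation, is given below. It uses the standard dynamic-programming edit distance over words; the function name is hypothetical. Note that the value can exceed 1 when the machine-translated text is much longer than the expected translation.

```python
def word_error_rate(hypothesis, reference):
    """Word-level edit distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j]: edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing in hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate("the cat sit on mat", "the cat sat on the mat")
print(wer)  # one substitution and one deletion over six reference words
```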

5) METEOR
Computes a weighted harmonic mean of unigram precision and recall. It also matches synonyms and lemmatized forms.
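
The weighted harmonic mean can be sketched as follows, using the recall-weighted form of the original METEOR formulation, Fmean = 10PR / (R + 9P). This is a deliberately reduced illustration: it assumes exact unigram matching only, and it omits synonym and lemma matching as well as the fragmentation penalty applied by the full metric. The function name is hypothetical.

```python
from collections import Counter


def meteor_fmean(candidate, reference):
    """Recall-weighted harmonic mean of unigram precision and recall."""
    cand_tokens = Counter(candidate.split())
    ref_tokens = Counter(reference.split())

    # Exact unigram matches only; synonyms and lemmas are not handled here.
    matches = sum((cand_tokens & ref_tokens).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(cand_tokens.values())
    recall = matches / sum(ref_tokens.values())
    return 10 * precision * recall / (recall + 9 * precision)


fmean = meteor_fmean("the cat sat on mat", "the cat sat on the mat")
print(fmean)
```

Because recall is weighted more heavily than precision, a candidate that covers the whole expected translation with some extra words scores higher than one that is precise but misses half of it.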

6) LEPOR
Combines several evaluation factors, such as precision, recall, a sentence length penalty, and an n-gram based word order penalty.

B. Human Evaluation Metrics
For human evaluation, the authors in [10] used linguistic features that include:
• Translation of gender and number of noun(s).
• Translation of voice in the sentence.
• Translation of tense in the sentence.
• Identification of the proper noun(s).
• Use of adjectives and adverbs corresponding to nouns and verbs.
• Selection of proper words/synonyms (lexical choice).
• Sequence of phrases and clauses in the translation.
• Use of punctuation marks in the translation.
• Fluency of translated text and translator's proficiency.
• Maintaining semantics of source sentence in the translation.
• Evaluating the translation of source sentence (with respect to syntax and intended meaning).
In order to assess the translation quality, a five-point scale is used, which is shown in Table III. Similarly, adequacy and fluency scores may be calculated using five-point scales, as represented in Tables IV and V.

VI. CONCLUSION
In this paper, we discussed MT, the problems faced in MT in the Indian language context, the problems with existing machine translators, the approaches of MT, and the work that has been done so far in MT for Indian languages. As we have seen, the quality of existing MT systems is not good, so there is a need to develop machine translators that can provide good translations with high accuracy. We have also discussed automatic evaluation metrics and human evaluation metrics that can be used to assess translation quality.

TABLE I.
Direct MT
No parallel corpus is used. It makes use of bilingual dictionary, target language and source language corpus.
-Produces good accuracy.
-Less tedious and less time consuming.

TABLE II.
DESCRIPTION OF MTS OF DIFFERENT LANGUAGE PAIRS

TABLE III.
FIVE-POINT SCALE TO ASSESS TRANSLATION QUALITY