JEMAP outline1 – エイシーティ株式会社

JEMAP: Japanese English Mapping Computer Aided Translator

Introduction

The ability to translate written text from one language to another can satisfy many needs. For example, for someone with little or no knowledge of Japanese, it can be very useful for acquiring the meaning of a Japanese document. It can open a window onto the extensive scientific research published in Japan but not translated into English.

For writers of Japanese who are not well versed in English, it can be a means of publishing in English translation, often in learned publications.

Features of JEMAP

Most commercial Japanese-to-English translators are based on some form of semantic parsing (see, for example, Bond [1]). Semantic natural language parsing is in its infancy and, though quite powerful, still has problems handling long compound sentences.

On the other hand, context-free parsing is a well-established technique. Church [2] has demonstrated a general method of performing statistical syntax-free parsing.

The origins of JEMAP are based on the following observations:

Japanese documents have a tendency to use long sentences.

Written Japanese is predominantly composed of words with a single meaning, in contrast to non-Kanji-based languages, in which a word often has many senses.

The availability of Japanese Wordnet [3] helps distinguish the word senses in Japanese.

The author has the capability to do good syntax-free parsing in Python, a legacy of other computational linguistics projects.

Taking Advantage of Other Available Code

A translator is a complex piece of work. To do a good job, it is important to take advantage of any other available code that can be of use. This is the reason for calling JEMAP a Computer Aided Translator. The author makes no apology for including the necessary pieces of software as an integral part of JEMAP.
It takes advantage of the following software:

A Japanese word processor, in the author's case NJStar. The original text and all the dictionaries are stored as *.txt files in UTF-8 encoded Unicode.

The Google Japanese-English translator, which does a good job of translating phrases, especially those that result from verb conjugations not readily available in the dictionaries used. This translator appears to be based on the Honyaku translator by Toshiba [4]. It does not do a good job of translating compound sentences but serves well at the phrase level.

The executive writing version of WhiteSmoke [5], which helps correct the grammar and spelling of the resulting English. It does a fair job of supplying the missing articles and plural forms in English. This is supplemented by the spell checker in Microsoft Word. A better method of placing articles and plurals is given in [1]; it is envisaged that the algorithm described in [1] will be implemented in JEMAP in the future.
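
Because the source text is kept as UTF-8 *.txt files and the external translator works best at the phrase level, one plausible preprocessing step is to break each long sentence into shorter phrases before translating it. The following is a minimal Python sketch of that idea, not JEMAP's actual code: the function names are hypothetical, and splitting after the Japanese comma (、) and full stop (。) is an assumed rule chosen only for illustration.

```python
import re
from pathlib import Path

# One phrase = a run of characters ending at an optional 、 or 。
# (assumed split points; JEMAP's real segmentation rules are not described).
PHRASE = re.compile(r"[^、。]+[、。]?")


def read_utf8_text(path):
    """Read a *.txt file stored as UTF-8 and return its contents."""
    return Path(path).read_text(encoding="utf-8")


def split_into_phrases(text):
    """Split Japanese text into phrase-level chunks after each 、 or 。."""
    return [m.strip() for m in PHRASE.findall(text) if m.strip()]
```

Each chunk returned by split_into_phrases could then be translated on its own, which avoids handing a whole compound sentence to the phrase-level translator.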

[References]

1. Bond, Francis, "Translating the Untranslatable: A Solution to the Problem of Generating English Determiners," CSLI Publications, Stanford, 2005.
2. Church, K. W., "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text," Proc. Second Conf. on Applied Natural Language Processing, Austin, TX, 1988, pp. 136-143.
3. Bond, Francis, Hitoshi Isahara, Sanae Fujita, Kiyotaka Uchimoto, Takayuki Kuribayashi and Kyoko Kanzaki, "Enhancing the Japanese Wordnet," 7th Workshop on Asian Language Resources, in conjunction with ACL-IJCNLP 2009, Singapore.
4. The Honyaku Japanese Translator, XLsoft Corp., www.xlsoft.com.
5. WhiteSmoke, Writing Software | Grammar Software | Grammar Checker, www.whitesmoke.com.