OSMaTran: Open-Source Machine Translation
A half-day workshop at MT Summit X
September 16, 2005
The Hilton Phuket Arcadia Resort and Spa
Phuket, Thailand


Machine translation has become a key technology in our globalized society; as a result, machine translation software is available for major language pairs and for major computer platforms, including web-based machine translation. At the same time, recent years have witnessed a boom in open-source software; among the most successful examples are the Linux operating system, web browsers such as Mozilla, web servers such as Apache, and full-fledged office suites such as OpenOffice.org. However, almost all "real-life" machine translation software, even when available at no cost, is "closed" rather than open. This is especially surprising given the large number of publicly funded groups working in machine translation.

Open-source machine translation would, however, have distinct advantages. If it were freely available, as most open-source software is, more users would have access to this technology; more importantly, institutions or businesses adopting an open-source machine translation system would be able to customize it to their needs in many more ways: developing new linguistic data (vocabularies, rules, corpora), integrating it with other packages, and so on.

But machine translation software is special in that it relies on the availability of extensive linguistic resources; for an open-source machine translation architecture to succeed, clearly defined and documented standards for representing linguistic data are absolutely necessary. Data standardization would enable interoperability and data interchange, which would in turn greatly benefit the creation of new machine translation systems. Proprietary data could also be converted into these formats and used together with open-source architectures, leading to hybrid systems.

The existence of an open-source machine translation architecture would also be especially important for building systems for language pairs involving small or neglected languages. Such languages are usually not targeted by commercial programs, yet these systems would serve the goals of administrations and non-governmental organizations working with them, and could even contribute to the languages' promotion or revival.

Open-source software is associated with a change in the business model. In machine translation, it would mean a shift from license-based or charge-per-word models to a service model in which companies would offer users a variety of services: consulting, customization, linguistic data development, integration into multilingual document management systems, etc.

Machine translation is only one of the language technologies that can be applied to translation; the effects of having open-source software available for other translation applications, such as translation memory, or even for natural language processing applications unrelated to translation, would therefore also be worth examining.

Schedule and Venue

This half-day workshop will take place on September 16, 2005, after the regular conference sessions have ended. Please visit the workshop website at http://www.torsimany.ua.es/OSMaTran/ for updates.


08:30-09:00 Opening remarks (Mikel L. Forcada)
09:00-09:30 Paper 1: The Open A.I. Kit: General Machine Learning Modules from Statistical Machine Translation (Daniel J. Walker)
09:30-10:00 Paper 2: An Open Architecture for Transfer-based Machine Translation between Spanish and Basque (Iñaki Alegria, Arantza Diaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Aingeru Mayor, Kepa Sarasola, Mikel L. Forcada, Sergio Ortiz-Rojas, Lluís Padró)
10:00-10:30 Coffee Break
10:30-11:00 Paper 3: Open Source Machine Translation with DELPH-IN (Francis Bond, Stephan Oepen, Melanie Siegel, Ann Copestake, Dan Flickinger)
11:00-11:30 Paper 4: An open-source shallow-transfer machine translation toolbox: consequences of its release and availability (Carme Armentano-Oller, Antonio M. Corbí-Bellot, Mikel L. Forcada, Mireia Ginestí-Rosell, Boyan Bonev, Sergio Ortiz-Rojas, Juan Antonio Pérez-Ortiz, Gema Ramírez-Sánchez, Felipe Sánchez-Martínez)
11:30-12:00 Round table and closing address

The working language of the workshop will be English.


