machine translation / swarming in between.

alm chung

Table of Contents

1. error messages, in between
2. generatable messages
3. searchable messages
4. symbiosis with existing information
5. i18n for creating new opportunities

Reflecting on how translation has long served as “bridges” between communities, here we will think about how the internationalization of error messages could contribute to building an infrastructure for wider collaboration.

The word translation generally means “to bring across.”⊕From More Than One Language by Barbara Cassin. What happens when you “bring” text from one language to another? The boundary between two linguistic groups becomes permeable to cultural exchange. It is easy to see why we so often use the metaphor of “building bridges” for translation.

Would it be fair to see our internationalization work as making two cultures more porous to each other: the origin culture (where the technology was developed) and the destination culture (where the technology is transplanted)? Setting this question aside for now, it may be helpful to revisit the main function of an error message.

During the debugging⊕Debugging refers to the process of discovering errors that occur during software development. This is done with the help of tools such as a debugger or an interpreter, and other tools that support them (including FES). process, a computer runs the input code, and when it encounters a problem during execution, it throws an error message. In other words, an invisible, abstract (and mechanical) phenomenon that occurs inside the computer is translated into a human-readable message.

Since an error message is generated on the spot by a machine replying to our input⊕The current design of FES does not generate messages improvisationally; however, we can expect this improvisational aspect to become more apparent as next-generation AI technology is applied., how does this human-computer communication differ from the communication that happens during, let’s say, live interpretation? Keeping that question as an inspiration, we will examine the place of error messages in “collaboration with machines,” where they act as a medium within the larger “human-machine” network.

error messages, in between

As mentioned earlier, translation connects two groups or systems that have been disconnected, so that they can communicate. Which groups, then, do we connect by translating error messages?

First of all, it may be clear that “translation for collaboration” is a prerequisite for communicating and working with people outside one’s linguistic bubble. Since most new information in the tech field is written and exchanged in English, people outside the English-speaking culture cannot directly access this newly opened territory; they cannot reach the new information until it has been translated. Here, translation acts, to some extent, as a bridge between the two isolated areas. However, it is still one-way communication that flows from the English-speaking world, the producer of information, to the non-English-speaking world, the recipient of that information.⊕If you are interested in reading more about practical issues arising from this process, please take a look at the previous chapters [3. Translating Technical Terms] and [4. 번역가 간의 약속 (Agreements Between Translators), Korean only].

Let's go back to our opening question. Within the hybrid human-machine labor structure we live in, our collaborators can extend beyond human colleagues to robots, crowds, software, or systems. Instead of one expert worker building a system from start to finish, it has become much more common to work “in the middle,” weaving together information provided by other workers as well as by various machine agents and systems. The error messages produced during debugging also operate within this web of human-machine collaboration. Given this context, let's think about what specific considerations should go into the design of error messages and the translation process.

generatable messages

Through the debugging process, code written by a (human) programmer is rewritten into code that the computer can read and execute without problems. During this process, an error message, which reports a problem that has occurred inside the machine itself, completes the feedback structure⊕Feedback starts with a system returning an output based on an input that was fed into it. A feedback loop is a fully cyclical form of this interaction, in which the output is fed back as a new input and restarts the process. Here we describe a situation where the user enters code into a computer, the computer runs the code, finds an error, and outputs an error message, and the user enters updated code again, so the loop continues. between the user and the computer. Since this error message is expected to appear instantly when an error is encountered, while the computer is still executing the code, it should be written so that it can be printed easily in most programming environments. No matter how clever the output sentence, technique, or design, an error message algorithm that uses a lot of computational power (and slows down the process) will make the debugging session much more laborious.

FES currently uses [i18next], one of the widely used open-source⊕Open-source refers to software whose original code is publicly available and can be freely modified and redistributed, mainly through the web. internationalization (i18n)⊕From W3C: “Definitions of internationalization vary. This is a high-level working definition for use with W3C Internationalization Activity material. Some people use other terms, such as globalization to refer to the same concept. Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.” https://www.w3.org/International/questions/qa-i18n#i18n tools, to implement error messages in multiple languages. i18next was developed around modern English, which can be broadly viewed as an isolating language⊕We say this in the sense of an “isolating language whose morphemes equate to individual words,” with a “morpheme-per-word ratio close to one.” https://en.wikipedia.org/wiki/Isolating_language, and it implements [interpolation]⊕This name is probably inspired by an approximation method used in numerical analysis: “interpolation is a type of estimation, a method of constructing (finding) new data points based on the range of a discrete set of known data points.” https://en.wikipedia.org/wiki/Interpolation for “integrating dynamic values” efficiently. Put more simply, interpolation is “filling in the blanks to generate a sentence.” You create message templates in advance, leaving blank placeholders for keywords. Then, when an error occurs, the computer fills in the blanks with the relevant information.
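
To make the idea of “filling in the blanks” concrete, here is a minimal sketch in TypeScript. It imitates the default `{{placeholder}}` syntax of i18next interpolation, but it is not i18next's implementation and omits many features of a real library. The `interpolate` function, the `enTemplates` object, and the message wording are hypothetical. They are written for illustration only and are not taken from FES.

```typescript
// A minimal sketch of template interpolation, assuming the {{placeholder}}
// syntax that i18next uses by default. Not i18next's actual implementation;
// `interpolate` and `enTemplates` are hypothetical names for illustration.

type Params = Record<string, string | number>;

// Message templates written in advance; the wording is invented, not from FES.
const enTemplates = {
  wrongType:
    "{{func}}() was expecting {{expected}} for the first parameter, received {{received}} instead.",
};

// Replace each {{key}} with params[key]; unknown keys are left as they are.
function interpolate(template: string, params: Params): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string): string =>
    Object.prototype.hasOwnProperty.call(params, key) ? String(params[key]) : match,
  );
}

// When an error occurs, the computer fills in the blanks.
const message = interpolate(enTemplates.wrongType, {
  func: "ellipse",
  expected: "Number",
  received: "string",
});

console.log(message);
// ellipse() was expecting Number for the first parameter, received string instead.
```

The template fixes the sentence structure in advance. Only the values placed in the blanks change from one error to the next.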

A computer can easily break sentences into modules like Lego blocks and assemble new sentences from them. However, while a computer can efficiently modularize written words and assemble them one after another, the resulting messages may have sentence structures that are difficult to understand or feel awkward to human readers. It's also important to remember that not all languages share the grammar of English, an isolating language whose morphemes equate to individual words. For example, Korean is an agglutinative language⊕“An agglutinative language is a form of synthetic language in which each affix typically represents one unit of meaning (such as "diminutive," "past tense," "plural," etc.), and bound morphemes are expressed by affixes (and not by internal changes of the root of the word, or changes in stress or tone).” - University of Pittsburgh in which a central morpheme is attached to other morphemes, including prefixes and suffixes, to form a meaning. For such languages, treating each word in a sentence as a basic unit only goes so far. Output messages generated by simply swapping out local keywords, while ignoring the global structure of the sentence or the conventions of a particular language, may quickly become unclear or “unnatural.”
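
A well-known case in Korean shows how keyword swapping can break. Subject particles depend on the preceding word: 이 follows a syllable that ends in a final consonant (batchim), and 가 follows one that does not. A template that fixes the particle next to the blank will therefore be wrong for some keywords. The TypeScript sketch below is an illustration under stated assumptions and is not part of FES or i18next. `hasFinalConsonant`, `subjectParticle`, `naive`, and `adaptive` are hypothetical helpers. The check relies on the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3), in which each code point encodes whether the syllable has a final consonant. The example keywords are 문자열 (“string”) and 숫자 (“number”), and 필요합니다 means “is needed.”

```typescript
// A minimal sketch (hypothetical helpers, not part of FES or i18next) of why
// swapping keywords into a fixed Korean template can produce broken grammar.
// The Korean subject particle is 이 after a syllable with a final consonant
// (batchim) and 가 after a syllable without one.

const HANGUL_FIRST = 0xac00; // 가, first precomposed Hangul syllable
const HANGUL_LAST = 0xd7a3; // 힣, last precomposed Hangul syllable
const FINAL_COUNT = 28; // 27 final consonants plus "no final consonant"

// In Unicode, syllable = HANGUL_FIRST + (initial * 21 + medial) * 28 + final,
// so a final index of 0 means the syllable has no final consonant.
function hasFinalConsonant(word: string): boolean {
  if (word.length === 0) {
    return false;
  }
  const last = word.charCodeAt(word.length - 1);
  if (last < HANGUL_FIRST || last > HANGUL_LAST) {
    // Simplification: non-Hangul endings (e.g. English function names)
    // are treated as having no final consonant.
    return false;
  }
  return (last - HANGUL_FIRST) % FINAL_COUNT !== 0;
}

function subjectParticle(word: string): string {
  return hasFinalConsonant(word) ? "이" : "가";
}

// Naive template: the particle is fixed in advance, next to the blank.
const naive = (keyword: string): string => `${keyword}가 필요합니다.`;

// Adaptive template: the particle is chosen after the keyword is known.
const adaptive = (keyword: string): string =>
  `${keyword}${subjectParticle(keyword)} 필요합니다.`;

console.log(naive("문자열")); // 문자열가 필요합니다. (ungrammatical)
console.log(adaptive("문자열")); // 문자열이 필요합니다.
console.log(adaptive("숫자")); // 숫자가 필요합니다.
```

The naive template happens to work for 숫자 but fails for 문자열. The sentence has to be assembled with knowledge of the keyword, not just around it. Real Korean localization must also handle keywords written in Latin letters, which this sketch does not attempt.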

Despite advances in natural language generation, including deep learning models, interpolation will likely remain useful “enough” for generating error messages in many cases, since it allows messages to be generated efficiently from parameters with few resources. As long as we use i18next, we will need to craft the best possible error messages within the structure that interpolation provides. Would this be an opportunity to embark on some poetic expeditions, assembling and reassembling our error messages like Lego blocks, to discover sentence structures that are readable (for both humans and machines), versatile, and even beautiful?

searchable messages

The space in which error messages operate extends beyond the exchange between computer and programmer to the global information space that we access through Internet search. We all search online while debugging, and the search terms are none other than error messages.⊕Many developers, including the author of the following article, recommend searching the web for error messages as a debugging tip:
Spinellis, Diomidis. "Modern debugging: the art of finding a needle in a haystack." Communications of the ACM 61.11 (2018): 124-134. In fact, many people try to solve a bug by simply copy-pasting the entire error message without even reading it.

Error messages therefore also serve as a link between the programmer and information on the Internet. At first glance, error messages written in plain language may seem more readable and easier to use. On the other hand, without precise terms we are less likely to get the search results we want, unless the search engine is quite sophisticated at processing natural language. It would likely be harder to find a relevant case from a description written in common (nonspecific) words. Plain language also has a readability problem for human readers: writing with a limited vocabulary risks making sentence structure more complex, lengthy, and repetitive, which makes the meaning of the text harder to process.

Then, shouldn’t a good error message provide effective search terms that lead people to the information they need to solve their specific problem faster? How should we design such a message? If we aim to introduce modern computing to a wide audience, the design and translation process of error messages should reflect the context of the modern programming workflow.

symbiosis with existing information

While researching technical translation practice in Korea, I learned that professional translators often incorporate translation software and machine translation into their workflow. Machine translation tools are especially helpful for drafting rough translations of technical documents from a specialized domain. They often show high accuracy for technical terms that allow one-to-one mapping, having learned from existing high-quality, standardized technical translations.

Technical domains are increasingly specialized, and new terms are created every day. No matter how many translators hone their expertise daily, it is physically difficult to keep up with these changes in real time. In this context, machine translation is helpful for translating technical documents if the goal is to match the domain’s established conventions. Machine translation is followed by a (human) editing process, which prioritizes efficiency over artistic standards to meet the narrow timelines requested by clients. The finished document is then fed back into the machine translation tool, which is retrained along with other documents created through a similar process, completing a feedback structure. Can we pause for a moment to imagine what qualities the translations generated in this loop would have?

If most of the information on the Internet comes to be written or translated using machine translation, where does our Friendly Errors i18n discussion stand in this context, and what does it mean to us? What would be the best way to use available machine translation tools for p5js, with respect to our philosophy of software development?⊕Perhaps the interview with Kenneth Lim, our current steward for p5js and the p5js website, [Challenges in Open Source Software Translation], might be a good starting point.

i18n for creating new opportunities

I want to close this chapter by thinking of translation projects as a tool to create opportunities for collaboration. This discussion of internationalization and translation issues started as a co-translation project for a small part of p5js. As we started gathering resources for our project, there were many aspects of internationalization that I was excited and concerned about at the same time. One of them was that most of the available resources were based on practices established by global companies.

Internationalization strategies are actively discussed and promoted by several Silicon Valley companies as part of building a global economy. In most cases, the first step in internationalization is translation. In particular, many resources focus on how to use translation tools that prioritize a low-cost, “convenient” process. In the case of p5js, I hoped there would be an opportunity to think about the different priorities and purposes we could have for the internationalization task.

Therefore, I wanted to imagine a different process for us, too. The idea was to bring the internationalization task forward as a way to create a new participatory space. Diverging from one-way, handed-down tools and translations, it would be a space in which communities and individuals around the world can participate and continue discussions on translation. Could it become a “porous” site where information, tools, ideas, and insights flow and swarm in every direction? I am looking forward to seeing the direction in which internationalization will unfold for p5js in the coming days.