Challenges in Open Source Software Translation

Kenneth Lim & Alm Chung

Table of Contents

  1. Beginning of the p5js Website Translation Project
  2. “Our” Translation Tools
  3. Contributors and Maintainers Experience
  4. Between Languages
  5. Invitation to Future Contributors

Here we present an interview with Kenneth Lim, the steward of the p5js core library and website. Kenneth shared how the translation project of the p5js website and technical documentation started and his experience in translating and maintaining open-source software.

Beginning of the p5js Website Translation Project

A (Alm Chung): How did you come across the p5.js project, and how did you get involved in the open-source project?

K (Kenneth Lim): How I got involved with p5 is similar to many other people: through Processing, which I was introduced to in university. I had done some coding before, but I was just starting with Processing, nothing super serious. I came across p5 in either late 2015 or 2016. It basically felt like Processing, but in JavaScript, which is a language I was more familiar with, so right away I started using it.

I got involved with the p5.js project by going through the documentation, which I encourage many people to do. I saw typos and minor mistakes in the documentation and thought, how can I fix it?

Gradually I moved on to larger, more technical contributions. At one point, I contributed fixes for smartphone sensor data (for example, accelerometer data), where the existing implementation wasn’t working well. The inspiration came from my own project using a similar API, and I felt I could fix this. So I worked on the fixes, and one thing led to another, and I got to where I am now.

A: It sounds like your involvement grew out of your personal projects using p5.js. Finding bugs or features you wanted to improve was natural, and then you jumped in.

K: Yeah, a lot of it is like that. It was not the case where I already knew very well about the topic or issue I wanted to work on. It was more like the case where I knew a bit about the issue and got a sense of it, but I still needed to spend the time on resources such as the official MDN documentation and others to see how the API actually works before I tried things out.

I think the p5 project is great in the sense that you don’t have to be the expert in any particular part to start contributing. You can just spend the time fixing issues of your choice. We’re not going to rush you. We’re not going to say, “oh are you done yet?”

You can take your time to learn and fix things. That’s one of the most important parts of the contribution process.

A: Your next big contribution was translating the p5js website. Would you share how the project began?

K: I first came across the Processing Foundation Fellowships around mid-2016. By that point, the Spanish translation had already finished, so the website was already in Spanish. That gave me the idea. Since I’m fluent in English and Chinese, I could apply to the fellowship open call with a translation project. So I put in the application, but the thing is, my proposal was not accepted immediately because the foundation didn’t have the right mentor for the project at the time.

The project was put on hold for about a year, and that’s why the translation project started in 2018 instead. I felt good about the translation project but had to submit another application for the following year. At the same time, Lauren was applying for a grant from UCLA for East Asian technology research, which fit well with my original proposal. So she asked whether she could take my proposal and submit it as part of the grant application, and I said sure, go ahead. We got the funding, and we found a mentor, Xin. The whole process happened in 2018, and we managed to get the entire thing set up and running.

A: Thank you for sharing details of the open call process because many people outside of it would assume it is a rigid, linear, and hard-cut process.

K: Sometimes, I would try to clarify for people looking to do a translation pull for p5.js, especially the documentation translation. The whole documentation translation project is a pretty lengthy process. The existing translations that we have now are either funded by fellowships or Google Summer of Code. The translation project is time-consuming, and we would want to pay people whenever we can.

A: When people try to suggest a new project or pitch a new idea, do you need to contribute a completely new idea, something outside-of-box, outside of existing projects? What would be the best way to pitch a new idea?

K: Suppose the angle you’re coming from addresses specific issues, such as fixing a problem you faced while using the library or making the library more fitting for the language or context you are using. In that case, it is comparatively easier to get your pitch across.

Whereas if the angle you’re coming from is, ‘oh, I think it would be cool to have this new feature,’ then that would sometimes be a bit hard for us to accept because there is an accessibility requirement for new features in p5.js. But even if we didn’t have that requirement, it is hard to make the case for adding a new feature to a big project.

Every newly added feature is a feature that we need to maintain. If the suggested new feature changes functional code in the library, we encourage people to do an add-on library instead. But let’s say the suggested new feature is something more non-tangible [not touching the functional code], then all those are perfectly fine for anyone to work on. With that said, you would need to go through the fellowship process, etc., if you want the project to be part of an official project. There are exceptions all the time, though.

“Our” Translation Tools

A: Let’s come back to questions on the internationalization project. Were there issues specifically being discussed at the beginning of the project? How did these issues evolve over time?

K: The internationalization project is ever-changing, ever-evolving. Like right now, I still have new thoughts and ideas about it. It started when I pitched in for the fellowship. Even then, the idea was not just to translate existing materials because that's just half of the project. The other half of the project is about the sustainability of the translation for an open-source project.

By “the sustainability side,” I mean that once you have your project translated, is this something you can keep up with? Can you keep the translation up-to-date, so that someone looking at a translated version gets the same experience as someone looking at the original [English] version? That’s one side of it. The other side is something you can call developer experience, right? Or maybe translator experience: how can you make translators’ lives easier?

And at the very beginning, it was a kind of very rudimentary tracking system based on git diffs. So I do git diffs on the translation files and then list diffs in a file. And say, ‘okay these other files will change, maybe you need to look at it,’ things like that.
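As a rough illustration of this kind of diff-based tracking, the sketch below compares two snapshots of the English source files and lists the files whose translations may need a second look. It uses Python's `difflib` as a stand-in for `git diff`, and the file names and contents are hypothetical; it is not the tracking script that was actually used for the p5.js website.

```python
import difflib

def changed_files(old_snapshot, new_snapshot):
    """Return {filename: diff_lines} for every file whose text differs."""
    report = {}
    for name in sorted(set(old_snapshot) | set(new_snapshot)):
        old_lines = old_snapshot.get(name, "").splitlines()
        new_lines = new_snapshot.get(name, "").splitlines()
        diff = list(difflib.unified_diff(
            old_lines, new_lines, fromfile=name, tofile=name, lineterm=""))
        if diff:  # unified_diff yields nothing when the texts are identical
            report[name] = diff
    return report

def review_list(report):
    """Turn the diff report into a plain list of files to review."""
    return [f"{name}: source changed, translation may need a look"
            for name in report]

# Hypothetical snapshots of the English source at two points in time.
old = {
    "en/home.yml": "title: Home\nintro: Hello",
    "en/learn.yml": "title: Learn",
}
new = {
    "en/home.yml": "title: Home\nintro: Hello, world",
    "en/learn.yml": "title: Learn",
}

report = changed_files(old, new)
for line in review_list(report):
    print(line)
```

A list like this only says that a file changed, not which translated string is affected, which is one reason the approach stays rudimentary.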

The idea came from that line of logic, but it hit a roadblock pretty early on because the ways the website and the documentation are translated were completely different. They use entirely different file formats and rendering systems, so they were incompatible with each other. They are very hard to manage at our project’s scale, so I had to rethink the approach. That got put on hold for quite a bit because there were a lot of prerequisites to get things aligned with each other. I needed to be able to change the way the references were rendered and, at the same time, the way the website was rendered, if that was even possible.

That is a very big endeavor, so I never touched it. It just dragged on until last year. When I was working on my MRes (Master of Research), I got a lot of time to look into this issue, and that’s where the idea of using a graphical user interface came from.

One or two years ago, I came across Mozilla Pontoon and thought maybe this pre-made tool could be helpful for us. So I went ahead, tried it, researched it, implemented it, and saw how it worked.

For the most part, Pontoon looks fine, but I’m still unsure about implementing it into our system because it is still quite a hassle to deploy a Pontoon instance. The instance itself is pretty heavy because you need to have a whole local instance, and we need to have a separate database. And those have to be paid instances because the free ones are not big enough. So it is still not an ideal solution. I’m currently thinking that maybe one of the commercial offerings would be a better solution since some of them provide free service to open source projects.

So that’s where I’m right now, and the internationalization project became an ever-changing, ever-evolving thing.

A: Can you tell us more about Pontoon? Do you know the context it was developed for?

K: Mozilla designed and developed Pontoon for their internationalization and localization needs, where the most significant product they have is Firefox. Every single translation in Firefox or their other products would be done through Pontoon. Before Pontoon, they had an older system called Pootle by Translate House, whose development stopped in the early 2000s or maybe the late nineties. It was an early project built around the idea of a browser-based interface for managing translations, and Pontoon borrowed many ideas from Pootle.

One of the things that made Pontoon hard to use for, say, all purposes such as translation and documentation, etc., is that Pontoon was originally designed for UI translation (i.e., Firefox). It is not designed for translating longer texts like documentation. And it’s not designed for having a large number of strings, as a library would typically have a large number of entries. Some of Pontoon’s limitations come from that context, which is understandable since it is not designed for other purposes. It works for some parts, but others are just some hoops we must jump through.

A: Do you think the problem is more of a UI design issue or missing a feature?

K: A bit of both. You definitely have a UI problem, like its input interface is pretty much expecting relatively short strings. In terms of English words, Pontoon expects a maximum of 20 words or something like that. If you have a string longer than that, then the interface looks chaotic because it would truncate strings in the interface or display everything in the interface, which can be really, really messy.

That’s a bit of a usability thing. Still, there are also more technical challenges: the default format used to represent the relationship between translations is not flexible enough. If you are doing a translation task, you almost always do it through JSON: you have a JSON file with a nested structure of keys and values. Pontoon’s translation file format doesn’t support any of that. There is no nesting structure. There is key-value pairing, but the rule for what can be a key is very limiting. It has to be alphanumeric, and you can’t use any special symbols. So a lot of things need to work around that rule.
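To make the workaround concrete, here is a minimal sketch of flattening a nested JSON-style structure into flat keys that contain only letters and digits, and then rebuilding the nesting afterwards. The key scheme (camelCase joining) and the sample strings are assumptions made for illustration; this is not Pontoon's actual file format or the conversion used by p5.js.

```python
import re

def flat_key(path):
    """Join a nested path into one alphanumeric key, camelCase style."""
    parts = [re.sub(r"[^0-9A-Za-z]", "", part) for part in path]
    return parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])

def flatten(nested, path=()):
    """Return {flat_key: (original_path, text)} for every leaf string."""
    flat = {}
    for name, value in nested.items():
        here = path + (name,)
        if isinstance(value, dict):
            children = flatten(value, here)
        else:
            children = {flat_key(here): (here, value)}
        for key, entry in children.items():
            if key in flat:
                # Stripping symbols can make two different paths collide.
                raise ValueError(f"key collision: {key}")
            flat[key] = entry
    return flat

def unflatten(flat):
    """Rebuild the nested structure from the stored original paths."""
    nested = {}
    for path, text in flat.values():
        node = nested
        for name in path[:-1]:
            node = node.setdefault(name, {})
        node[path[-1]] = text
    return nested

# Hypothetical nested source strings.
source = {
    "get-started": {
        "title": "Get Started",
        "intro": {"p1": "Welcome to p5.js."},
    }
}

flat = flatten(source)
print(sorted(flat))
```

Keeping the original path next to each flat key is what allows the round trip; the flat key alone cannot be split back into its segments reliably.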

Also, its workflow design can be limiting. The philosophy behind how Mozilla uses Pontoon is that the translation files themselves, instead of what’s in the database system, are the source of information. That can be a bit difficult because one of our goals is that nobody ever has to think about these files and nobody has to touch them directly. You shouldn’t need to do that to translate.

So a feature that I really want, for example, is the outdated string feature. Let’s say a string changed in the original language. I want to mark its translations as “potentially outdated” so that translators take a look: for some translated strings they would say the string is still fine, and for others they would tweak the translation to match the updated original.

That feature is not really possible with Pontoon, because the idea of Pontoon is that the files in the original language are kept as the original, and if you have something to be updated, you update the key in the file. You update the corresponding tags in the files instead of directly updating the value. That is pretty much aligned with how UIs work. If you have a string in a UI that changed in the original, you would want to pin the version you are building to that particular version. Then, if you are building a new version, you would look through the new version, etc. So you can change what you’re using in the source code itself. But for p5.js, I don’t want anyone to think about those files. Those files would be automatically generated with random keys for the source. I still couldn’t find a way around that.
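One way such an outdated-string check could work is to store a fingerprint of the source text alongside each translation and compare it against the current source later. The sketch below assumes a simple in-memory store; the function names and data layout are hypothetical and do not describe an existing Pontoon or p5.js feature.

```python
import hashlib

def fingerprint(text):
    """Stable fingerprint of a source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def record_translation(store, key, source_text, translated_text):
    """Save a translation together with the source it was made from."""
    store[key] = {
        "text": translated_text,
        "source_hash": fingerprint(source_text),
    }

def potentially_outdated(store, current_source):
    """List keys whose source text changed since they were translated."""
    stale = []
    for key, source_text in current_source.items():
        entry = store.get(key)
        if entry is not None and entry["source_hash"] != fingerprint(source_text):
            stale.append(key)
    return sorted(stale)

# Hypothetical strings: translations recorded against the old source.
store = {}
record_translation(store, "greeting", "Hello", "你好")
record_translation(store, "thanks", "Thank you", "谢谢")

# Later, the English source for one string is edited.
current_source = {"greeting": "Hello there", "thanks": "Thank you"}
print(potentially_outdated(store, current_source))
```

A flagged string is only "potentially" outdated: a translator can confirm it is still fine by recording it again against the new source, which refreshes the stored fingerprint.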

A: Are you researching other options and thinking of making your own tools?

K: I briefly looked into commercial offerings. POEditor and Transifex are big commercial offerings that would be useful if you have an iOS app that you need to translate.

But from the looks of it, they don’t really offer anything that different from what Pontoon offers. I think Pontoon may actually offer more than them in some sense, but the main benefit I get from these commercial products is that I don’t have to worry about the deployment side or the server.

We may need to consider customization, and I think Yukie Nomiya worked on a version of translation tracking for the website (Yukie Nomiya’s Google Summer of Code project: i18n Improvements and Italian translation). It’s an interesting project that looks like a version of what Pontoon does but within p5.js, which is nice. Something like that, built from the ground up, is great because it will fit our purposes.

Contributors and Maintainers Experience

A: What do you think are the challenges of internationalization, broadly speaking?

K: Often people think about internationalization this way: I translated this, so it is translated. But that isn’t how a lot of these things work. Going back to the previous point, with UI translation you finalize a version of your software, you are done, and you ship it. This will not be the same for, let’s say, translating documentation for a project that is actively being developed. The documentation will change, and there will be incorrect or missing information.

The rate of change for the p5.js project is a lot higher than a UI project. And we need to keep things up-to-date because we want to avoid a case where, “if you want proper documentation, you should look at the English one and so-and-so language one is not accurate.” That is just not right. Someone shouldn’t be having a subpar experience just because they are not using English.

The other part is user experience. You have many different aspects. There is the user experience of documentation and the user experience of the translation. You may want to add in the experience of the developers.

For the end-users, it would be more about how they can access the translation and how many clicks it takes to get the information they need. If I am on this page in English, can I just click and go to my preferred language?

And there is also the quality of the translation itself. For example, one of the things I focused on while working on the website Chinese translation is that I want to write in a way that feels like it’s written in Chinese instead of translated from English documentation. In this case, I must completely rewrite the sentence to convey the same meaning. Sometimes I will need to try to forget the original sentence structure or entirely rewrite a paragraph to make it sound better.

For translator experience, we want to show how they can actually contribute to the translation. This is why I want to avoid using git as much as possible. I don’t want them to use git, I don’t want them to be full-fledged developers.

You have to pull the repo and understand how the website is built, more or less. You also need to figure out which string corresponds to which place. Then maybe you need to build locally to make sure everything works and check it on the actual rendered page itself. And then you have to fork the repo and push it through a PR from your own fork, … all those sorts of things. And those are all prerequisites, regardless of whether you are doing a full translation or correcting one sentence. That’s a bit disproportionate to what we expect translators to focus on, and we want to avoid that.

Then there is the quality of the experience. According to my research, people benefit greatly from minimizing the need to switch between tabs while using machine translation. So they don’t have to go to, for example, the Google Translate website, type things in, look at its output, and then switch back to type in other things, maybe in another window, in another program. Minimizing that in the process will improve the quality of the translation experience a lot.

A: You also mentioned experience of the developers.

K: Yes, so for the developers, the goal is to make the whole process as straightforward as possible. This means they don’t have to worry about how or whether their code will be appropriately translated or not. Or whether it will be put into the translation system or not. If we can minimize the cognitive load to worry about these and the potential for a mistake, that would be great. But I anticipate that design task to be very, very challenging.

A: We sort of isolated the translation text and the code for the p5.js website already, and it has its own repo. Compared to certain parts of p5.js, for example the Friendly Error System (FES), the website lives in a different structure. I can see that the FES could also be packaged and isolated later. But I can also see the pros and cons of keeping the translation and the main code “entangled.” The contributors coming from different backgrounds may have different motivations and priorities. For example, the motivations of instructors who want to teach p5 in a language other than English can be different from those of people who start looking into the p5js project to become a part of the open-source software community in the long term. In some cases, maybe a contributor wants to see every component of the project in one place instead of being isolated out into packages and being less tangible. Perhaps this design will depend on the scale of the code.

K: I think, apart from scale, it is more or less the same problem. On the one hand, ideally, I would like to see that there is no distinction between the original text and the translated text.

We would all be writing into the list of files, followed by the list of strings, and regardless of the details of the structure, all strings would be pulled from there. But this structure also adds complexity: if you look at the source code itself, you get a bunch of template names, variable names, and the strings themselves, making things a bit more opaque.

So there are trade-offs there. I guess part of it is to find where this balance is because we still want people to be able to just jump in and hack things without worrying too much about breaking everything while experimenting.

Between Languages

A: Let’s talk about the glossary. You made the list of technical terms in Chinese for translating the website.

I’m having difficulty translating technical terms into Korean because we have different ways of translating loanwords or foreign-originated words (외래어). You can translate a foreign word based on sound (phonetic translation) or based on meaning, using Korean or Hanja (Chinese character) expressions. The Korean language even has different ways to create a Hanja word, looking to either Japanese usage or that of Chinese-speaking countries. Out of all these choices, how does one select one?

Long story short, please share your process of translating English technical terms into Chinese.

K: I had a specific way of translating English technical terms, having a list of translated terms. And the reason was to be consistent with your translation, which will simply reduce the chance of confusing users.

The way the list itself was built is that, as I was going through individual strings, I would come across a technical term (which can be programming-specific but also includes mathematical terms) that I wasn’t sure how to translate. What I did was basically search online, not only with translation services but also in domain-specific translation references. Google Translate and other translation services often don’t perform very well because they are not yet trained specifically for the programming context.

So I would search for those terms and look at forums, help pages, and other resources in Chinese, such as the MDN (Mozilla Developer Network) website, the React website, and the Vue.js website. They would already have a Chinese translation, so which translated word do they use for that term? Do they use term A or B, and do they conflict? The goal is to find some consensus out there among people: this is what most people use to refer to this concept.
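The consensus step can be pictured as a simple tally: note which translation each reference uses for a term and pick the most common one, flagging ties for a human decision. The sketch below is only an illustration; the observations are made-up sample data, not a record of what any of the sites mentioned above actually use.

```python
from collections import Counter

def pick_consensus(observations):
    """Pick the most common translation from (source_name, term) pairs.

    Returns (term, tied), where tied is True if the top two candidates
    were seen equally often and a person should decide.
    """
    counts = Counter(term for _, term in observations)
    ranked = counts.most_common()
    best, best_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == best_count
    return best, tied

# Hypothetical observations for the English term "array".
observations = [
    ("reference site A", "数组"),
    ("reference site B", "数组"),
    ("forum thread", "阵列"),
]

term, tied = pick_consensus(observations)
print(term, tied)
```

The chosen term then goes into the glossary so that every later string uses the same translation.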

A: I want to discuss a tricky situation concerning inclusivity. Let's say we translated p5 following the convention of how Korean programmers talk and use that as a baseline for translating the technical terms. Then when people try to search using the translated error messages or references to find other resources, they sometimes end up with limited results because most of the available resources are still in English. Some even suggested showing the original English term and the translated words side-by-side to remedy this.

K: I see that kind of approach sometimes in official translations as well. It is often done on the “Getting Started” page or tutorial pages, where they can mark the term at the beginning. Then, if you come across another instance of the term, it is not accompanied by the original string. It is kind of how you write an essay. [You define terms] at the beginning, and then assume people know the terms from that point on. That’s one way people approach it.

The way I approach it is I refer back to when I first started learning to program. I literally went to my high school’s librarian and borrowed programming books in Chinese. I faced the same issue: different books used different terminology, and I had to make a mental note: ‘oh, they are referring to the same thing!’

A: Maybe providing a cross-reference list of terminology can be helpful.

K: In Chinese, there are traditional Chinese and simplified Chinese. And because of geopolitical reasons, they use completely different terms in translation most of the time. One translation that works for simplified Chinese will probably not work for traditional Chinese, although they are the same language. This difference is also because they are almost always translated by different people.

A: For political or cultural reasons, one ends up choosing one version of the translation and going with it. The corresponding community of people will need to maintain the part.

Maybe a good question is how to identify which community can be a good maintainer for a specific translation/language.

K: One of the things that would be good to have in, say, a maintainer or any translator is that kind of sensitivity to the language you’re working with. There are users who use our documentation in their own language and do not come from an Anglo-centric background. Understanding that and coming from that perspective would be a better approach.

A: In your article, you actually talked about making the translation feel as native as possible, as opposed to what is called 번역체 (“translationese”) in Korean. Sometimes people enjoy the experience of reading in “translationese,” even though it is not how Korean is spoken.

When you work with Chinese languages, do you have similar issues? What were the problems you had because of the difference between English and Chinese languages?

K: A lot of the time, it is about sentence structure. It is about how you phrase things and how things are expressed.

But I guess a more concrete example is punctuation, which is the most obvious one. In Chinese, the full stop of a sentence is not the period ‘.’; it is a small circle ‘。’. The rest of the punctuation marks, for example commas, look similar but take up different widths. Chinese characters, regardless of what character it is, punctuation included, will usually take up the full square.

The problem often comes up when a developer writes an output page and hard-codes a period as the full stop. This happens quite a lot.

In a template, we usually don’t translate specific names like links. They are not translated, so they didn’t go into the translation files. Instead, these links were hard-coded in. So if that link is at the end of the sentence, there is just a hard-coded full stop after it. In that case, it is not just the punctuation that doesn’t work; the whole thing doesn’t work because the subject-verb-object order is different, not just for Chinese but for many other languages as well.

If you hard-code a specific item at a specific location, it won’t work, because it becomes hard to change the sentence structure around it. So that is part of the challenge as well.

A: Is there a guideline to relieve these issues?

K: Yes, in the contributor docs, I wrote instructions about things to be careful about when you are writing a template. So, don’t hard-code punctuation. Even if the string in your own language is just an empty string, make sure you put a translatable string in front of all the hard-coded parts so people can change the subject-verb-object order. These are the kinds of things that call back to what we talked about before: what the developers need to be aware of.
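The difference between a hard-coded sentence and a translatable one can be sketched in a few lines. In the version below, the whole sentence, including its punctuation and the position of the link, lives in the translated string, and the untranslated link name is dropped into a named placeholder. The message texts, language codes, and function names are illustrative assumptions, not the actual p5.js website templates.

```python
# Each language supplies the full sentence, so word order and the
# closing punctuation are under the translator's control.
MESSAGES = {
    "en": "Learn more on the {link} page.",
    "zh-Hans": "请前往{link}页面了解更多。",
}

def render(lang, link_text):
    """Fill the untranslated link name into the translated sentence."""
    return MESSAGES[lang].format(link=link_text)

def render_hard_coded(translated_prefix, link_text):
    """Anti-pattern: the link position and the period are fixed in code."""
    return translated_prefix + link_text + "."

print(render("en", "Get Started"))
print(render("zh-Hans", "Get Started"))
# The hard-coded version forces the link to the end and an ASCII period.
print(render_hard_coded("了解更多：", "Get Started"))
```

With the placeholder version, a translator can move `{link}` anywhere in the sentence and end it with ‘。’, while the hard-coded version always ends with ‘.’ regardless of language.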

A: Maybe this only applies to Korean, but when we are translating, we have to think about how to create a language/tone to increase inclusivity. We had to be very careful about choosing the “sentence-closing ending” of Korean sentences because different choices dramatically change how inviting or respectful the sentence sounds and re-establish the relationship between user and software.

K: One of the things I find a bit difficult translating into Chinese is that its tone of voice differs from what the English language sounds like. For me, when I’m reading the Chinese version of the text and the English version of the text, quite often, they give a different sense, a different sensibility.

I guess the tone of voice and the fluency of the translation are what make a text seem like it was written in Chinese originally instead of being translated from another language. Make sure the tone of voice is right, and pay attention to details such as the pronoun issue. It’s not exactly like gender pronouns because you can completely ignore gender pronouns in Chinese, and it works. But it is more like two different forms of pronouns for “you” in Chinese: one is more formal, and the other is more informal. So, when to use which one? In the beginning, I tried to use the formal one as much as possible and tried to enforce it, but then the text sounded too formal. Whereas if you want a more casual tone, you will mix in the informal one, but in practice, they are actually the same word that just reads differently. So they look different and sound a bit different, but they would try to say the same thing.

Invitation to Future Contributors

A: As we near the end of our interview, let’s try a new topic. What would be the best personality/voice for the p5js project?

K: For p5js, I don’t feel that it should be overly formal or technical. We are not trying to present ourselves as super-professional. What we aim to be is more like a mess. I’ll admit it’s a mess, but it works! And anyone can work on it. But if you show the project to a professional JavaScript software engineer, they would say, “why are you not using that?” or “why are you using that?” “why this and that,” but you know, it works for our purposes. And I think that attitude is the tone of voice that p5js has made its own as well.

And it’s true that trying to translate that particular attitude across different languages and cultures will be difficult. In a case like that, I think the best thing to do is involve people who will use it. You kind of let them use it and ask for feedback.

A: It’s an evolving project, so once we have new people and meet other people, the project will take another turn, and we will have different ideas and concepts.

K: I’m kind of fresh off the research degree, so I’m still thinking in terms of the design research process. So there will be one like a focus group. And it would be a more concrete question than something more open-ended. Let’s say I phrase one thing like this, and then here are some other options that are more helpful for the task you are trying to achieve. But that’s just one approach.

A: I want to conclude the interview with the last question, what kind of new participation or contribution do you wish to see in the future?

K: I guess there isn’t really that much I personally want to prescribe here. But at the same time, I’m not exactly looking for a new idea. As we talked about before, new ideas are hard to get accepted. So if you want something a bit more concrete, as long as anyone is interested in contributing and exploring what contribution means, that is already a good enough start.

Also, when we say contribution, it’s not just code contribution. You don’t even need to touch GitHub. If you think about internationalization or localization, it can be correcting an existing translation if the project is already in your language. If you want to have p5js translated into a new language, put in an application to a program like the Processing Foundation Fellowship or Google Summer of Code, etc. Or if you want to do it outside of those, we will be happy for anyone to do a third-party translation.

And a third-party translation is great because the translation is out there and we can link to it, but we don’t necessarily have the responsibility to maintain it. If later a maintainer comes along and wants to do the work long-term, they can look at what already exists in translation, if possible, as a starting point. The translation project is huge, so you may not want to work on the whole thing, but then there are also the peripheral things that we mentioned: the tracking system, user interface, and user experience. How those parts work is also an important thing you can contribute to.