The mechanics of translating your package are quite straightforward. The bigger challenge is writing messages that are easy to translate. In part, this is an extension of writing messages that are easy to understand in English as well! If it's hard for a native English speaker to understand your message, it's going to be even harder once it's translated into another language. The following sections give some advice about how to write good messages, inspired by the "Preparing translatable strings" section of the GNU gettext manual.
Write full sentences
Generally, you should strive to make sure that each message comes from a single string (i.e. lives within a single pair of quotes). Take this simple greeting, where I translate "good" and "morning" individually:
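A minimal base-R sketch of that fragment-based greeting, assuming messages are marked for translation with gettext(); the function name greet_fragments() is hypothetical:

```r
# Anti-pattern: "Good" and "morning" are separate translatable strings,
# so a translator sees each word in isolation, with no sentence around it.
greet_fragments <- function(name) {
  paste0(gettext("Good"), " ", gettext("morning"), " ", name, "!")
}

greet_fragments("Ana")
```

The code reads naturally in English, but the .po file ends up with "Good" and "morning" as unrelated entries.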
This will pose two challenges for translators:
When working with .po files, translators see each individual string without context, and the strings may appear in a different order from the original source. This can lead either to a poor translation or to an expensive journey to the source code to get more context.
msgid "morning"
msgstr ""
msgid "Good"
msgstr ""
Prose is not like code: you can't reliably build up sentences from small fragments of text. Even if you can figure out how to do it in English, it's unlikely the same form will work for other languages.
Instead, it's better to generate the complete message in a single string, using glue() or sprintf() to interpolate the parts that vary:
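A base-R sketch of the whole-sentence version, assuming gettextf(), which translates the format string and then fills it in as sprintf() would. With glue() the msgid would instead read "Good morning {name}!". The function name greet() is hypothetical:

```r
# Translate the whole sentence and interpolate only the part that varies.
# gettextf() looks up the format string in the translation catalogue,
# then substitutes %s exactly as sprintf() would.
greet <- function(name) {
  gettextf("Good morning %s!", name)
}

greet("Ana")
```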
Then the translator sees something like this:
msgid "Good morning {name}!"
msgstr ""
This gives the translator enough context to create a good translation, and the freedom to change the word order to make a grammatically correct sentence in their language. We can make the problem more challenging by making our greeting more flexible:
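A hypothetical base-R sketch of such a flexible greeting, still built from fragments. Each time-of-day word is a literal inside gettext() so that extraction tools can find it, and "Good" is translated separately. The name greet_time() is an illustration, not the author's code:

```r
# Anti-pattern: the greeting is assembled from separately translated words.
greet_time <- function(name, time_of_day = c("morning", "afternoon", "evening")) {
  time_of_day <- match.arg(time_of_day)
  # Literal strings inside gettext() so each word ends up in the .po file.
  periods <- c(
    morning   = gettext("morning"),
    afternoon = gettext("afternoon"),
    evening   = gettext("evening")
  )
  paste0(gettext("Good"), " ", periods[[time_of_day]], " ", name, "!")
}

greet_time("Ana", "evening")
```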
This would generate the following sequence of translations for French:
msgid "Good"
msgstr "Bon"
msgid "morning"
msgstr "matin"
msgid "afternoon"
msgstr "après-midi"
msgid "evening"
msgstr "soirée"
Unfortunately, this breakdown won't generate correct French. The three greetings should be "Bonjour" for morning and afternoon, and "Bonsoir" for evening. There are two problems: good morning and good afternoon both use bonjour (even though French has different words for morning and afternoon; bon après-midi is used as a farewell), and the two-word English phrases turn into single French words.
If you were translating to Mongolian, you'd face a different problem. While Mongolian uses the same times of day, it arranges the words in the opposite order to English: "Өглөөний мэнд" is morning greetings, "Өдрийн мэнд" is afternoon greetings, and "Оройн мэнд" is evening greetings.
Again, we need to resolve this problem by moving away from translating fragments and towards translating complete sentences. One way to do that here is to restrict ourselves to a fixed set of times of day and use switch() to specify the greeting:
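A base-R sketch of the switch() approach. It uses gettextf() with %s in place of glue's {name}, so the msgids would read "Good morning %s!" and so on rather than "Good morning {name}!". The name greet_switch() is hypothetical:

```r
# One complete, translatable sentence per time of day, so a translator is
# free to render morning and afternoon with the same word (e.g. "Bonjour").
greet_switch <- function(name, time_of_day = c("morning", "afternoon", "evening")) {
  time_of_day <- match.arg(time_of_day)
  switch(time_of_day,
    morning   = gettextf("Good morning %s!", name),
    afternoon = gettextf("Good afternoon %s!", name),
    evening   = gettextf("Good evening %s!", name)
  )
}

greet_switch("Ana", "afternoon")
```

Each branch contains a literal format string, so extraction tools can still pick up all three sentences.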
This works for French (and Mongolian):
msgid "Good morning {name}!"
msgstr "Bonjour {name}!"
msgid "Good afternoon {name}!"
msgstr "Bonjour {name}!"
msgid "Good evening {name}!"
msgstr "Bonsoir {name}!"
However, it's still not a fully general solution, as it assumes that the time of day is the most important characteristic of a greeting, and that the day is broken down into at most three components. Neither is true in general:
Danish breaks the time of day into six parts: morning ("morgen"), pre-noon ("formiddag"), noon ("middag"), afternoon ("eftermiddag"), evening ("aften"), and night ("nat").
In Swahili, the greeting varies based on the relationship between the people: "Shikamoo" is for young to old, "Hujambo" is for old to young, and "Mambo" is for young to young.
Greetings are particularly challenging to translate because of their great cultural variation; fortunately, most messages in R packages won't require such nuance.
sprintf() vs glue()
In R, there are two common ways to interpolate variables into a string: sprintf() and glue(). There are pros and cons to each:
Using glue() requires an additional, if lightweight, dependency, but it gives the translator more context (assuming you use informative names for local variables) and makes it easy to rearrange interpolated components:
msgid "{first} {second} {third}"
msgstr "{third} {first} {second}"
On the other hand, putting the name of the variable in the translated string means that you can't rename the variable without updating all your translations, and there's a small risk of the name itself getting translated.
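A sketch of the reordering benefit, assuming the glue package is installed (the block does nothing otherwise). The template strings stand in for an original msgid and a translator's msgstr, and render() is a hypothetical helper:

```r
# glue() resolves {first}, {second}, {third} in the calling environment,
# here the arguments of render(), so the template alone decides the order.
render <- function(template, first, second, third) {
  as.character(glue::glue(template))
}

if (requireNamespace("glue", quietly = TRUE)) {
  print(render("{first} {second} {third}", "a", "b", "c"))  # as written in code
  print(render("{third} {first} {second}", "a", "b", "c"))  # reordered msgstr
}
```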
sprintf() is built into base R, so it is always available. The downside is that it can be hard to figure out what the sentinels refer to, and the syntax for rearranging components (which uses %1$s, %2$s, and so on) is somewhat arcane.
msgid "%s %s %s"
msgstr "%3$s %1$s %2$s"
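A small base-R illustration of those numbered sentinels. The translated template is a stand-in for what a translator might write in a msgstr; in a package, the lookup would normally happen through gettextf():

```r
# Values that the code interpolates, in the order the code supplies them.
first <- "a"
second <- "b"
third <- "c"

# Template as written in the source code: arguments are used in order.
sprintf("%s %s %s", first, second, third)   # "a b c"

# Hypothetical translated template: %n$s picks argument n explicitly,
# so the translator can reorder components without any code change.
translated <- "%3$s %1$s %2$s"
sprintf(translated, first, second, third)   # "c a b"
```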
The difference may be more important than you realize. As with Mongolian above, some languages (e.g. Turkish, Korean, and Japanese) assemble phrases into sentences in a different order from English. "I have 7 apples" becomes "7りんごをもっています" in Japanese, i.e. "7 apples [I'm] holding": the object comes before the verb, and the subject is left implicit. Reordering the templates in your messages is going to be quite common if you want your messages to be available in more than a very limited set of languages.
Un-translatable content
You can use interpolation to avoid including un-translatable components, like URLs or email addresses, in a message. This is good practice because it saves work for the translators, makes it easier for them to see changes to the text, and avoids the chance of a translator accidentally introducing a typo. It works something like this:
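A minimal sketch, assuming the message is built with base R's gettextf(); the function name bug_report_message() and the example.com address are hypothetical placeholders:

```r
# The URL is interpolated, so it never appears in the translatable msgid,
# which is just "Please report this problem at %s.".
bug_report_message <- function(url = "https://example.com/issues") {
  gettextf("Please report this problem at %s.", url)
}

bug_report_message()
```

If the URL changes later, only the code changes; every existing translation of the sentence stays valid.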
Similarly, if you're generating strings that include HTML, avoid including the HTML in the translated string, and instead translate just the words:
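A base-R sketch of that idea, with sprintf() supplying the markup and gettext() covering only the visible words; docs_link() and the example URL are hypothetical:

```r
# Only "Read the documentation" is translatable; the <a> tag and its
# attribute are assembled by code, so translators never touch the markup.
docs_link <- function(url) {
  sprintf('<a href="%s">%s</a>', url, gettext("Read the documentation"))
}

docs_link("https://example.com/docs")
```

If a translation could contain characters such as < or &, you would likely also want to HTML-escape the translated text before inserting it.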
Generally, you want translators to spend as much of their time as possible on the words that actually need translating.