
i18n: in practice

Depending on the size and complexity of your codebase, the prospect of doing this refactor may be daunting.

Don't be afraid! There are many tools you can use to make this work more approachable. Here are some reflections on doing that work.

Chores

Tokenisation is a low-complexity, high-volume task, and doing it can be grindy and boring. I found the following helpful when doing work like this:

To get used to the process, I found it useful to map out each step I would take to tokenise a file. I then did some of the work and timed roughly how long it took by recording my screen and watching it back. From there I looked to automate the most time-consuming steps, which in my case meant setting up snippets and hotkeys in my editor.

Once I had some hotkeys and a clear picture of how tokenising a file would go, I was able to work faster in each file. It also helped to flag anything that required more thought than the snippet workflow allowed, and to revisit those sections in a later session.

Copilot

As of the time of writing, AI code generation tools like GitHub Copilot perform well at aiding this kind of refactor.

Generally speaking, we found a fruitful pattern for using them, with some caveats worth keeping in mind.
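To make this concrete, here is a hedged sketch of the kind of before/after edit involved. The dictionary, token names and t() helper are illustrative assumptions, not taken from this post or any particular library:

```typescript
// A minimal stand-in for a translation lookup; the dictionary and token
// names are illustrative.
const dictionary: Record<string, string> = {
  'app.login.greeting': 'Welcome back, {name}!',
}

function t(token: string, values: Record<string, string> = {}): string {
  const template = dictionary[token] ?? token
  return template.replace(/\{(\w+)\}/g, (_, key) => values[key] ?? `{${key}}`)
}

// Before: hardcoded user-facing text.
function greetingBefore(name: string): string {
  return `Welcome back, ${name}!`
}

// After: the same text behind a token. Once a couple of strings in a file
// have been converted by hand, code-generation tools tend to suggest the
// remaining conversions in the same shape.
function greetingAfter(name: string): string {
  return t('app.login.greeting', { name })
}
```

The edit is mechanical and repetitive, which is exactly the kind of pattern these tools pick up on - but every suggestion still needs a human eye.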

Planning

Note: This section is my own opinion, as a software developer with (at the time of writing) five years of experience as an individual contributor. If parts of this approach seem ill-advised to you, then I'd love to hear your feedback!

The first areas of focus should be:

  1. Choosing a translation management system (TMS).
  2. Selecting a translation library.
  3. Tagging the phrases in your codebase.

These tasks can begin in parallel. Choosing a TMS should be a decision made in conjunction with developers, since the features it offers will shape your workflow.

For selecting a translation library: the minimum specification for this area of code is quite small. However, once you are handling plurals, interpolation and other more advanced cases, it is not advantageous to reinvent the wheel.
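As a sketch of why plural handling alone gets complicated quickly - the dictionary shape and the count convention here are illustrative assumptions, and real libraries cover far more cases, such as languages with more than two plural forms:

```typescript
// Illustrative two-form plural dictionary; real libraries support the full
// set of CLDR plural categories (zero, one, two, few, many, other).
type PluralForms = { one: string; other: string }

const phrases: Record<string, PluralForms> = {
  'app.inbox.unread': {
    one: '{count} unread message',
    other: '{count} unread messages',
  },
}

function t(token: string, count: number): string {
  const forms = phrases[token]
  if (!forms) return token
  // Intl.PluralRules picks the correct plural category for the locale.
  const category = new Intl.PluralRules('en').select(count)
  const template = category === 'one' ? forms.one : forms.other
  return template.replace('{count}', String(count))
}
```

Even this toy version leans on Intl.PluralRules; extending it to interpolation, nesting and HTML is where an established library earns its keep.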

Tagging phrases in your codebase will be time consuming - start as soon as you can! If you can fold this work into your normal acceptance criteria (i.e. all user-facing text in a file must have a token associated with it), then you may need a period of enforcement on pull requests, but you will maximise the number of people working on the task.
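As a sketch of what lightweight enforcement could look like - this regex-based check is illustrative only and easily fooled; a real implementation would be an ESLint rule working on the AST:

```typescript
// Flag JSX-style text that is not wrapped in a translation call.
// Source text is passed in directly to keep the sketch self-contained;
// a real check would run over changed files in CI.
function findUntranslatedText(source: string): string[] {
  const offenders: string[] = []
  // Match literal text between '>' and '<' that contains letters,
  // e.g. <p>Hello</p>, but not <p>{t('app.x')}</p>.
  const jsxText = />([^<>{}]*[A-Za-z][^<>{}]*)</g
  let match: RegExpExecArray | null
  while ((match = jsxText.exec(source)) !== null) {
    offenders.push(match[1].trim())
  }
  return offenders
}
```

Failing the build when this returns a non-empty list is one way to keep the acceptance criterion honest during the enforcement period.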

In my opinion, assigning the work of tokenising to a single team or person is not advisable, for two reasons:

  1. The output of this work is somewhat invisible until your translation is completed, because you don't have much to show while tokenising. This can lead to a long-running task whose outcome has no clear impact on your customers.
  2. If you have teams which operate in specific areas of the codebase, then you will inevitably step on people's toes. Who is responsible for fixing any bugs produced by one team during translation, in another team's area of responsibility?

If all of this sounds too manual, you can consider using more advanced methods for creating your list of tokens.
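For instance, a first pass at automated token collection might scan source text for translate calls. The t('...') call shape is assumed from the examples in this post; an AST-based extraction tool would be more robust than this regex sketch:

```typescript
// Collect a deduplicated, sorted token list from source text by scanning
// for t('...') calls. Illustrative only - easily confused by comments,
// dynamic tokens, or double-quoted strings.
function extractTokens(source: string): string[] {
  const tokens = new Set<string>()
  const call = /\bt\(\s*'([^']+)'/g
  let match: RegExpExecArray | null
  while ((match = call.exec(source)) !== null) {
    tokens.add(match[1])
  }
  return Array.from(tokens).sort()
}
```

A script like this can seed your dictionary with every token already in the codebase, ready to upload to your TMS.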

When to start translating your dictionary is a decision based on your TMS pricing structure and your individual goals. If you are able to deploy a subsection of your application satisfactorily, then you may wish to translate phrases as they are added and roll them out when ready.

For deployment, a useful approach might be to turn on translation for one section at a time. You can do this either by limiting where the tokenisation refactor is applied, or by marking some tokens as active / inactive.

For example, if you wish to enable multiple languages for tokens starting with app.login but not app.settings, then this is possible in your translation function.
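A sketch of how that prefix check might look inside the translation function - the section names, dictionaries and fallback behaviour here are illustrative assumptions:

```typescript
// Only tokens under an enabled section prefix return translated text;
// everything else falls back to the source language.
const enabledSections = ['app.login']

const translations: Record<string, string> = {
  'app.login.title': 'Connexion', // illustrative French translation
}

const sourceText: Record<string, string> = {
  'app.login.title': 'Log in',
  'app.settings.title': 'Settings',
}

function t(token: string): string {
  const active = enabledSections.some((prefix) => token.startsWith(prefix + '.'))
  if (active && translations[token]) return translations[token]
  return sourceText[token] ?? token
}
```

Flipping a section live then becomes a one-line change to enabledSections, rather than a redeploy of the tokenisation work itself.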

Syntactic Sugar

Here are some additional niceties to consider, for developer experience or for quashing bugs.

useTranslate

If you work in a React codebase, you may be used to using useEffect or similarly patterned functions, called hooks.

A hooks-based approach can have some use in translation:

```typescript
const { t } = useTranslate('shared.token.location.details')

const greeting = t('greeter')
const subtext = t('subtext')
```

In this example, we can see that we save some typing if we have many tokens in one file.
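One possible implementation of such a hook, as a sketch: it closes over a base token so each call to t() only needs the final segment. The dictionary contents are illustrative, and a real React implementation would also re-render when the active language changes:

```typescript
// Illustrative dictionary; in practice this would come from your TMS.
const dictionary: Record<string, string> = {
  'shared.token.location.details.greeter': 'Welcome!',
  'shared.token.location.details.subtext': 'Here are your location details.',
}

// Returns a t() scoped to the given base token, falling back to the full
// token string when no entry exists.
function useTranslate(base: string) {
  const t = (suffix: string): string =>
    dictionary[`${base}.${suffix}`] ?? `${base}.${suffix}`
  return { t }
}

const { t } = useTranslate('shared.token.location.details')
const greeting = t('greeter')
```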

<Trans>

Your translation library of choice may offer a <Trans> component, which (in our case) was built for handling translation text that includes some HTML. The exact API will vary between libraries, but pay attention to how your translate function handles HTML and special characters, lest you be visited by [object Object] in production - which we will cover in the next chapter.
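As a quick preview of that failure mode, here is a sketch of how naive interpolation produces it. The naiveT helper and the stand-in element are illustrative, not from any real library:

```typescript
// Naively stringify interpolated values into the template. When a value is
// an object - such as a React element - String() yields "[object Object]".
function naiveT(template: string, values: Record<string, unknown>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => String(values[key]))
}

// Stand-in for a React element passed as an interpolation value.
const link = { type: 'a', props: { href: '/terms' } }
const result = naiveT('Please read our {link}.', { link })
// result: 'Please read our [object Object].'
```

A <Trans>-style component avoids this by keeping the element tree intact instead of flattening everything to a string.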