
i18n: in practice

Depending on the size and complexity of your codebase, the prospect of doing this refactor may be daunting.

Don't be afraid! There are many tools you can use to make this work more approachable. Here are some reflections on doing that work.

Chores

Tokenisation is a low-complexity, high-volume task, and doing it can be grindy and boring. I found the following helpful when doing work like this:

To get used to the process, I found it useful to map out each step I would take to tokenise a file. I then did some of the work and timed roughly how long it took by recording my screen and watching it back. From there I looked to automate the most time-consuming steps, which in my case meant setting up snippets and hotkeys in my editor.

Once I had some hotkeys and a clear picture of how tokenising a file would go, I was able to work faster in each file. It also helped to flag anything that required more thought than the snippet workflow allowed, and to revisit those sections in a later session.

Copilot

As of the time of writing, AI code generation tools like GitHub Copilot perform well at aiding this kind of refactor.

Generally speaking, we found a fruitful pattern for using them, with some caveats worth keeping in mind.
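To make this concrete, here is a hedged sketch of the kind of before/after edit involved. The dictionary, token names and t() helper are illustrative assumptions, not taken from this post or any particular library:

```typescript
// A minimal stand-in for a translation lookup; the dictionary and token
// names are illustrative.
const dictionary: Record<string, string> = {
  'app.login.greeting': 'Welcome back, {name}!',
}

function t(token: string, values: Record<string, string> = {}): string {
  const template = dictionary[token] ?? token
  return template.replace(/\{(\w+)\}/g, (_, key) => values[key] ?? `{${key}}`)
}

// Before: hardcoded user-facing text.
function greetingBefore(name: string): string {
  return `Welcome back, ${name}!`
}

// After: the same text behind a token. Once a couple of strings in a file
// have been converted by hand, code-generation tools tend to suggest the
// remaining conversions in the same shape.
function greetingAfter(name: string): string {
  return t('app.login.greeting', { name })
}
```

The edit is mechanical and repetitive, which is exactly the kind of pattern these tools pick up on - but every suggestion still needs a human eye.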

Planning

Note: This section is my own opinion, as a software developer with (at the time of writing) five years of experience as an individual contributor. If parts of this approach seem ill-advised to you, then I'd love to hear your feedback!

The first areas of focus should be:

  1. Choosing a translation management system (TMS).
  2. Selecting a translation library.
  3. Tagging the phrases in your codebase.

These tasks can begin in parallel. Choosing a TMS should be a decision made in conjunction with developers, since the features it offers will shape your workflow.

For selecting a translation library: the minimum specification for this area of code is quite small. However, once you are handling plurals, interpolation and other more advanced cases, it is not advantageous to reinvent the wheel.
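As a sketch of why plural handling alone gets complicated quickly - the dictionary shape and the count convention here are illustrative assumptions, and real libraries cover far more cases, such as languages with more than two plural forms:

```typescript
// Illustrative two-form plural dictionary; real libraries support the full
// set of CLDR plural categories (zero, one, two, few, many, other).
type PluralForms = { one: string; other: string }

const phrases: Record<string, PluralForms> = {
  'app.inbox.unread': {
    one: '{count} unread message',
    other: '{count} unread messages',
  },
}

function t(token: string, count: number): string {
  const forms = phrases[token]
  if (!forms) return token
  // Intl.PluralRules picks the correct plural category for the locale.
  const category = new Intl.PluralRules('en').select(count)
  const template = category === 'one' ? forms.one : forms.other
  return template.replace('{count}', String(count))
}
```

Even this toy version leans on Intl.PluralRules; extending it to interpolation, nesting and HTML is where an established library earns its keep.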

Tagging phrases in your codebase will be time consuming - start as soon as you can! If you can fold this work into your normal acceptance criteria (i.e. all user-facing text in a file must have a token associated with it), then you may need a period of enforcement on pull requests, but you will maximise the number of people working on the task.
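As a sketch of what lightweight enforcement could look like - this regex-based check is illustrative only and easily fooled; a real implementation would be an ESLint rule working on the AST:

```typescript
// Flag JSX-style text that is not wrapped in a translation call.
// Source text is passed in directly to keep the sketch self-contained;
// a real check would run over changed files in CI.
function findUntranslatedText(source: string): string[] {
  const offenders: string[] = []
  // Match literal text between '>' and '<' that contains letters,
  // e.g. <p>Hello</p>, but not <p>{t('app.x')}</p>.
  const jsxText = />([^<>{}]*[A-Za-z][^<>{}]*)</g
  let match: RegExpExecArray | null
  while ((match = jsxText.exec(source)) !== null) {
    offenders.push(match[1].trim())
  }
  return offenders
}
```

Failing the build when this returns a non-empty list is one way to keep the acceptance criterion honest during the enforcement period.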

In my opinion, assigning the work of tokenising to a single team or person is not advisable, for two reasons:

  1. The output of this work is somewhat invisible until your translation is completed, because you don't have much to show while tokenising. This can lead to a long-running task whose outcome has no clear impact on your customers.
  2. If you have teams which operate in specific areas of the codebase, then you will inevitably step on people's toes. Who is responsible for fixing any bugs produced by one team during translation, in another team's area of responsibility?

If all of this sounds too manual, you can consider using more advanced methods for creating your list of tokens.
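For instance, a first pass at automated token collection might scan source text for translate calls. The t('...') call shape is assumed from the examples in this post; an AST-based extraction tool would be more robust than this regex sketch:

```typescript
// Collect a deduplicated, sorted token list from source text by scanning
// for t('...') calls. Illustrative only - easily confused by comments,
// dynamic tokens, or double-quoted strings.
function extractTokens(source: string): string[] {
  const tokens = new Set<string>()
  const call = /\bt\(\s*'([^']+)'/g
  let match: RegExpExecArray | null
  while ((match = call.exec(source)) !== null) {
    tokens.add(match[1])
  }
  return Array.from(tokens).sort()
}
```

A script like this can seed your dictionary with every token already in the codebase, ready to upload to your TMS.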

When to start translating your dictionary is a decision based on your TMS pricing structure and your individual goals. If you are able to deploy a subsection of your application satisfactorily, then you may wish to translate phrases as they are added and roll them out when ready.

For deployment, a useful approach might be to turn on translation for one section at a time. You can do this either by limiting where the tokenisation refactor is applied, or by marking some tokens as active / inactive.

For example, if you wish to enable multiple languages for tokens starting with app.login but not app.settings, then this is possible in your translation function.
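A sketch of how that prefix check might look inside the translation function - the section names, dictionaries and fallback behaviour here are illustrative assumptions:

```typescript
// Only tokens under an enabled section prefix return translated text;
// everything else falls back to the source language.
const enabledSections = ['app.login']

const translations: Record<string, string> = {
  'app.login.title': 'Connexion', // illustrative French translation
}

const sourceText: Record<string, string> = {
  'app.login.title': 'Log in',
  'app.settings.title': 'Settings',
}

function t(token: string): string {
  const active = enabledSections.some((prefix) => token.startsWith(prefix + '.'))
  if (active && translations[token]) return translations[token]
  return sourceText[token] ?? token
}
```

Flipping a section live then becomes a one-line change to enabledSections, rather than a redeploy of the tokenisation work itself.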

Syntactic Sugar

Here are some additional niceties to consider, for developer experience or for quashing bugs.

useTranslate

If you work in a React codebase, you may be used to using useEffect or similarly patterned functions, called hooks.

A hooks-based approach can have some use in translation:

```typescript
const { t } = useTranslate('shared.token.location.details')

const greeting = t('greeter')
const subtext = t('subtext')
```

In this example, we can see that we save some typing if we have many tokens in one file.
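One possible implementation of such a hook, as a sketch: it closes over a base token so each call to t() only needs the final segment. The dictionary contents are illustrative, and a real React implementation would also re-render when the active language changes:

```typescript
// Illustrative dictionary; in practice this would come from your TMS.
const dictionary: Record<string, string> = {
  'shared.token.location.details.greeter': 'Welcome!',
  'shared.token.location.details.subtext': 'Here are your location details.',
}

// Returns a t() scoped to the given base token, falling back to the full
// token string when no entry exists.
function useTranslate(base: string) {
  const t = (suffix: string): string =>
    dictionary[`${base}.${suffix}`] ?? `${base}.${suffix}`
  return { t }
}

const { t } = useTranslate('shared.token.location.details')
const greeting = t('greeter')
```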

<Trans>

Your translation library of choice may offer a <Trans> component, which (in our case) was built for handling translation text that includes some HTML. The exact API will vary between libraries, but pay attention to how your translate function handles HTML and special characters, lest you be visited by [object Object] in production - which we will cover in the next chapter.
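As a quick preview of that failure mode, here is a sketch of how naive interpolation produces it. The naiveT helper and the stand-in element are illustrative, not from any real library:

```typescript
// Naively stringify interpolated values into the template. When a value is
// an object - such as a React element - String() yields "[object Object]".
function naiveT(template: string, values: Record<string, unknown>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => String(values[key]))
}

// Stand-in for a React element passed as an interpolation value.
const link = { type: 'a', props: { href: '/terms' } }
const result = naiveT('Please read our {link}.', { link })
// result: 'Please read our [object Object].'
```

A <Trans>-style component avoids this by keeping the element tree intact instead of flattening everything to a string.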