i18n: Doing it again
Having recently completed a project like this, I have some opinions on what I would do differently if I were to do this work again.
Start with a dictionary
If you are thinking about building an application, think about translation early on. Creating a dictionary is a useful exercise in examining the structure and voice of your application, and this benefit itself is enough to justify the effort. It can help you answer these questions:
- Is the tone of voice in your software consistent?
- Are there phrases which are used repeatedly throughout your software?
Structuring the dictionary as a nested JSON object allows you to generate a "tree" or graph of your application, where each leaf is a phrase.
Examining this representation allows you to see which areas of your application are most verbose. Similarly, you can quite easily spot duplication and identify the common denominator "node" between two duplicate phrases.
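For illustration, here is a sketch of walking such a nested dictionary (the dictionary shape shown is hypothetical) to list every leaf phrase with its path and group duplicates:

```javascript
// A hypothetical nested dictionary, as described above.
const dictionary = {
  app: {
    greeter: "Welcome!",
    login: { button: "Sign in", greeting: "Welcome!" }
  }
};

// Walk the tree, collecting each leaf phrase with its dot-separated path.
function leaves(node, path = []) {
  if (typeof node === "string") return [[path.join("."), node]];
  return Object.entries(node).flatMap(([key, value]) =>
    leaves(value, [...path, key])
  );
}

// Group identical phrases to spot duplication across branches.
const byPhrase = {};
for (const [path, phrase] of leaves(dictionary)) {
  (byPhrase[phrase] = byPhrase[phrase] || []).push(path);
}
console.log(byPhrase["Welcome!"]); // ["app.greeter", "app.login.greeting"]
```

Any phrase that appears under more than one path is a candidate for being hoisted to a shared "common" node.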
The advantages of doing this early are:
- In early iterations of your application, you may not nail exactly the right way to phrase something. By maintaining a dictionary system, you decouple the work of tweaking this text from the development lifecycle - you can update your application copy without asking a developer to do so.
- Representing your application as a tree is a useful way to keep your application organised. As you develop new features, you can think about which branch it would live on (or if you are developing a new branch entirely you can figure out from where it should grow).
Doing this refactor is not easy once your codebase is complex. You will invariably declare "Yes, this area is translated!", only for more untranslated text to surface later as bugs.
Tag first, turn it on later
Tagging first will help you understand how big your dictionary is. You can do this incrementally, with low risk of regression, in one of two ways. First, you can pass a fallback parameter to the translate function:
function App () {
  return <Home>{t('app.greeter', 'Welcome!')}</Home>
}
Whilst you are tokenising your codebase, your translation function can simply return the fallback parameter. When you are ready to start translating, you can programmatically produce the dictionary file from the tokens passed to the translate function.
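A minimal sketch of this approach (the recording mechanism is an assumption, not any specific library's API): during the tagging phase, t returns the fallback text while remembering every key it sees, and the recorded keys are later expanded into a nested dictionary file.

```javascript
// Record every (key, fallback) pair passed to the translate function.
const seen = new Map();

function t(key, fallback) {
  seen.set(key, fallback);
  return fallback; // tagging phase: just show the original text
}

// Somewhere in the application:
t("app.greeter", "Welcome!");
t("app.login.button", "Sign in");

// Later, produce a nested dictionary from the recorded keys.
function buildDictionary(entries) {
  const dict = {};
  for (const [key, phrase] of entries) {
    const parts = key.split(".");
    let node = dict;
    for (const part of parts.slice(0, -1)) {
      node = node[part] = node[part] || {};
    }
    node[parts[parts.length - 1]] = phrase;
  }
  return dict;
}

console.log(JSON.stringify(buildDictionary(seen), null, 2));
```

In practice you would run this across your test suite or a crawl of the application so that every code path, and therefore every token, is exercised.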
Alternatively, you can do something like the following:
function App () {
  return <Home translationKey="app.greeter">Welcome!</Home>
}
When you are ready to translate, you can search for all locations that use translationKey and refactor these appropriately.
If you have a component library or wrap another one, you can implement the translation functionality inside the component itself:
function Home (props) {
  return <div>{translate(props.translationKey)}</div>
}
Text is voice
This is important to know before you start work. Typos will inevitably be introduced in the course of tokenising the codebase, so determining who in your organisation is responsible for what text should be a top priority.
Having an access log as part of your translation management suite may be a requirement for you. It is quite important to tightly control write access to your dictionary file.
Doing this refactor decouples the copy you use in your application from the application itself. This means that you can now make changes to your application's text through your translation management suite, and those changes should update in real time, without redeploying your application.
Understand that with great power comes great responsibility! Who is responsible for how your application speaks to its users? What is the risk to your business if this text is modified maliciously? Without access control or an audit log, you may not be able to answer these questions. Traditionally, a developer would be responsible for fixing a typo - now no technical knowledge is required, and anyone with access to your translation management system can resolve it.
Pay attention when interpolating
When doing interpolation, pay attention to how your translate function works.
It should handle cases when the value you are trying to interpolate is undefined or not available, for example:
<Home>{translate('app.home.greeter', {name: user.name})}</Home>
If this value is undefined, your translate function should handle this gracefully, rather than returning "Welcome undefined!"
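One way to sketch such a defensive translate function (the phrase store and the {name} placeholder syntax here are assumptions for illustration): unknown tokens fall back to the key itself, and missing values interpolate as an empty string rather than "undefined".

```javascript
// A hypothetical flat phrase store keyed by token.
const phrases = { "app.home.greeter": "Welcome {name}!" };

function translate(key, values = {}) {
  const phrase = phrases[key];
  if (phrase === undefined) return key; // unknown token: show the key
  // Replace each {placeholder}; missing values become an empty string.
  return phrase.replace(/\{(\w+)\}/g, (match, name) =>
    values[name] !== undefined ? String(values[name]) : ""
  );
}

console.log(translate("app.home.greeter", { name: "Ada" })); // "Welcome Ada!"
console.log(translate("app.home.greeter", {}));              // "Welcome !"
```

Whether a missing value should render as an empty string, a placeholder, or trigger a logged warning is a design choice; the important thing is that the decision is deliberate rather than leaking "undefined" to users.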
In addition, when assigning text to variables, pay attention to how the compiler will interpret your code. If a value requiring interpolation is marked as a constant, then it may not update when you expect it to. Additionally, when using React or other frameworks, be sure to understand how the rendering lifecycle works: will the interpolated value for this translation be available when the component renders? If you render a phrase requiring interpolation before its value is present, then your interpolation will fail.
Tag more
Once you have completed your first pass of tokenising, you will inevitably have some phrases hidden in modals, status messages and other nooks of your application.
Add these to your dictionary file, and if appropriate spend some time cleaning it up.
You may have phrases where a token name has been updated, either in your TMS or in code - now this token points to nothing, or your phrase is unused but you are still being charged for it.
You may have included some phrases erroneously in the process of development. Now may also be an appropriate time to run a spell checker.
A testing dictionary
Once you have most of the phrases in your application tokenised, it may be useful to create a virtual dictionary. This is a copy of your dictionary, but instead of your application copy you return a known testing value:
{
  "en": {
    "hello": "TESTING_VALUE",
    "login": {
      "greeting": "TESTING_VALUE",
      "button": "TESTING_VALUE"
    }
  }
}
If you store this in your TMS, you will pay for storage and access - it may be cheaper to clone it programmatically via API, either on each new deployment or periodically.
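Generating the testing dictionary programmatically might look like the following sketch (toTestingDictionary is a hypothetical helper; your real dictionary would come from the TMS API):

```javascript
// Recursively replace every leaf phrase with a known testing value,
// preserving the dictionary's tree structure.
function toTestingDictionary(node, value = "TESTING_VALUE") {
  if (typeof node === "string") return value;
  const out = {};
  for (const [key, child] of Object.entries(node)) {
    out[key] = toTestingDictionary(child, value);
  }
  return out;
}

const en = { hello: "Hello", login: { greeting: "Hi", button: "Log in" } };
console.log(JSON.stringify(toTestingDictionary(en)));
```

Because the testing dictionary is derived from the real one, it never drifts out of sync with your token set, and it costs nothing to store.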
Having this dictionary is useful for identifying which text is not tokenised: if you load your application in dictionary testing mode, then any text which is not TESTING_VALUE hasn't been integrated into your tokenisation system.
next: The Future