Automatically sentence-case i18next translations

We use i18next to handle our localization requirements. We have written in great detail about how we use the i18next and react-i18next libraries in our applications.

As our translations grew, we realized that instead of adding every combination of text as a separate entry in the translation file, we could reuse most of them by utilizing the i18next interpolation feature.

Interpolation is one of the most used functionalities in i18n. It allows integrating dynamic values into our translations.

{
  "key": "{{what}} is {{how}}"
}

i18next.t("key", { what: "i18next", how: "great" });
// -> "i18next is great"

Problem

As we used interpolation more and more, we started seeing a lot of text with irregular casing. For instance, in one of our apps, we have an Add button on a few pages.

{
  "addMember": "Add a member",
  "addWebsite": "Add a website"
}

Instead of adding each text as an entry in the translation file as shown above, we took a more generic approach and started using interpolation. Our translation files started to look like this.

{
  "add": "Add a {{entity}}",
  "entities": {
    "member": "Member",
    "website": "Website"
  }
}

This is great, but it has a slight problem. The final text formed looked like this.

Add a Member

The word Member is still capitalized. We needed the text to be properly sentence-cased, like this.

Add a member

We first thought we would just add .toLocaleLowerCase() to the dynamic value.

t("add", { entity: t("entities.member").toLocaleLowerCase() });

It worked fine, but developers would often forget to add .toLocaleLowerCase(). It also started to pollute our code with too many .toLocaleLowerCase() calls.

As always, we decided to extract this problem to our neeto-commons-frontend package.

Solutions we looked at

At first, it seemed like a very simple problem. We thought we could just use the post-processor feature and sentence-case the entire text during post-processing, like this.

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: text => {
    // Sentence-case text.
    return (
      text.charAt(0).toLocaleUpperCase() + text.slice(1).toLocaleLowerCase()
    );
  },
};

i18next
  .use(LanguageDetector)
  .use(initReactI18next)
  .use(sentenceCaseProcessor)
  .init({
    resources: resources,
    fallbackLng: "en",
    interpolation: {
      escapeValue: false,
      skipOnVariables: false,
    },
    postProcess: [sentenceCaseProcessor.name],
  });

Voila! From now on, all text will be properly sentence-cased, and we no longer need to add .toLocaleLowerCase(). Great? Not really.

We soon realized that not every text should be sentence-cased. There are a lot of cases where we need to preserve the original casing. Here are some examples.

Your file is larger than 2MB.
Disconnect Google integration?
No results found with your search query "Oliver".
Your Api Key: AJg3c4TcXXXXXXXXX
No internet, neetoForm is offline.

These examples clearly show why it's not a simple problem. We require a more targeted and nuanced solution. Upon revisiting the issue, we found that our initial solution of adding .toLocaleLowerCase() does work, but it's a bit verbose.
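
To make the failure concrete, here is a minimal sketch that applies the same casing logic as the post-processor above to a few of these strings. It calls the logic directly, outside i18next, and `naiveSentenceCase` is just an illustrative name for the body of `process`.

```javascript
// Same logic as the `process` function of the first post-processor.
const naiveSentenceCase = text =>
  text.charAt(0).toLocaleUpperCase() + text.slice(1).toLocaleLowerCase();

const samples = [
  "Your file is larger than 2MB.",
  "Disconnect Google integration?",
  "No internet, neetoForm is offline.",
];

const results = samples.map(naiveSentenceCase);
results.forEach(result => console.log(result));
// -> "Your file is larger than 2mb."
// -> "Disconnect google integration?"
// -> "No internet, neetoform is offline."
```

Units, brand names, and product names all lose their casing, even though none of them came from a dynamic value we wanted to lowercase.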

So we decided to try custom formatters. Instead of adding .toLocaleLowerCase(), we created a custom formatter called lowercase.

i18next.services.formatter.add("lowercase", (value, lng, options) => {
  return value.toLocaleLowerCase();
});

{
  "add": "Add a {{entity, lowercase}}",
  "entities": {
    "member": "Member",
    "website": "Website"
  }
}

This works perfectly, but it doesn't solve the verbosity problem. Instead of adding .toLocaleLowerCase() in JavaScript files, we're now adding it in translation JSON files - essentially just moving the problem to a different place.

We needed a better solution that required minimal effort.

The idea here is to lowercase all dynamic values by default and create a formatter to handle exceptions. To achieve this, we combined our previous post-processor with a new formatter. The new formatter, which we called anyCase, can be used to flag any dynamic part of the text that needs to be excluded from lowercasing. The post-processor ignores these flagged parts of the text while sentence-casing.

const ANY_CASE_STR = "__ANY_CASE__";
i18next.services.formatter.add("anyCase", (value, lng, options) => {
  return ANY_CASE_STR + value + ANY_CASE_STR;
});

{
  "message": "Your file is larger than {{size, anyCase}}"
}

The post-processor we wrote attempted to identify the parts of the text marked by the anyCase formatter using pattern matching and to retain their original casing. However, this approach failed when the text contained identical words in both the dynamic and static parts. It ended up lowercasing both words, which is not the output we needed.

Final solution

Before we discuss the final solution, note that i18next recently changed how formatters are added. The new style, which we have been using so far, looks like this.

i18next.services.formatter.add("underscore", (value, lng, options) => {
  return value.replace(/\s+/g, "_");
});

Before this, i18next had a different syntax, which it now calls legacy formatting. It looks like this.

i18next.use(initReactI18next).init({
  resources: resources,
  fallbackLng: "en",
  interpolation: {
    format: (value, format, lng, options) => {
      // All our formatters should go here.
    },
  },
});

Now back to our original problem.

We need to make sure that formatting is applied only to the dynamic parts of the text. We found that the legacy version of formatting offers an option called alwaysFormat: true, which runs the format function on every interpolated value, even when the placeholder does not specify a format. One thing to remember is that if we use this flag, the new style of formatting does not work. That means we need to move all our custom formatters to the legacy format function.

i18next.use(initReactI18next).init({
  resources: resources,
  fallbackLng: "en",
  interpolation: {
    escapeValue: false,
    skipOnVariables: false,
    alwaysFormat: true,
    format: (value, format, lng, options) => {
      // All your formatters should go here.
    },
  },
});
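
As a rough illustration of moving formatters into the legacy function, the sketch below ports the underscore and lowercase formatters shown earlier into a single function that dispatches on the format name. The `formatters` map and the `legacyFormat` name are illustrative assumptions, not part of i18next. The sketch also assumes that, with alwaysFormat enabled, a placeholder without a format reaches the function with `format` set to undefined, so it returns the value unchanged in that case.

```javascript
// Hypothetical lookup table: format name -> formatter function.
const formatters = new Map([
  ["underscore", value => value.replace(/\s+/g, "_")],
  ["lowercase", value => value.toLocaleLowerCase()],
]);

// Has the (value, format, lng, options) shape of interpolation.format.
const legacyFormat = (value, format, lng, options) => {
  const formatter = formatters.get(format);
  // Unknown or missing format, or a non-string value: return it untouched.
  if (!formatter || typeof value !== "string") return value;
  return formatter(value);
};

console.log(legacyFormat("hello big world", "underscore", "en", {}));
// -> "hello_big_world"
console.log(legacyFormat("Member", "lowercase", "en", {}));
// -> "member"
console.log(legacyFormat("Member", undefined, "en", {}));
// -> "Member"
```

A function like this would be passed as interpolation.format in the init options.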

Moving the formatters is not a problem for us, because we already maintain all our custom formatters in one place (the neeto-commons-frontend package). With alwaysFormat enabled, the formatter is applied to every dynamic value. This approach also overcomes the "identical words" problem that we encountered with the previous version of the formatter. Let's look at our updated formatter.

const LOWERCASED = "__LOWERCASED__";
const ANY_CASE = "anyCase";

const lowerCaseFormatter = (value, format) => {
  // Leave empty values, non-strings, and values flagged with anyCase untouched.
  if (!value || format === ANY_CASE || typeof value !== "string") {
    return value;
  }
  return LOWERCASED + value.toLocaleLowerCase();
};

To elaborate on the code, the formatter lowercases all dynamic texts and prefixes them with __LOWERCASED__. This prefixing is necessary because the formatter lacks information about where this specific piece of text originally appeared in the complete text. By adding this prefix, if the lowercased text happens to be the first part of the output, we can revert it during the post-processing stage. And that's precisely what we accomplished in the post-processor.

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: value => {
    const shouldSentenceCase = value.startsWith(LOWERCASED); // Check if first word is lowercased.
    value = value.replaceAll(LOWERCASED, ""); // Remove all __LOWERCASED__

    return shouldSentenceCase ? sentenceCase(value) : value;
  },
};
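
To see how the two stages cooperate, here is a minimal sketch that runs them without i18next. The `interpolate` function is a hypothetical stand-in for i18next's interpolation with alwaysFormat: true (it only understands simple {{name}} and {{name, format}} placeholders), and `postProcess` and `translate` are illustrative names. The formatter and casing logic are the same as above.

```javascript
const LOWERCASED = "__LOWERCASED__";
const ANY_CASE = "anyCase";

const sentenceCase = value =>
  value.charAt(0).toLocaleUpperCase() + value.slice(1);

const lowerCaseFormatter = (value, format) => {
  if (!value || format === ANY_CASE || typeof value !== "string") return value;
  return LOWERCASED + value.toLocaleLowerCase();
};

// Hypothetical stand-in for interpolation with alwaysFormat: true.
// Every placeholder goes through the formatter, with or without a format.
const interpolate = (template, values) =>
  template.replace(
    /\{\{\s*(\w+)\s*(?:,\s*(\w+)\s*)?\}\}/g,
    (match, name, format) => String(lowerCaseFormatter(values[name], format))
  );

// Same logic as the post-processor's process function.
const postProcess = value => {
  const shouldSentenceCase = value.startsWith(LOWERCASED);
  const cleaned = value.replaceAll(LOWERCASED, "");
  return shouldSentenceCase ? sentenceCase(cleaned) : cleaned;
};

const translate = (template, values) =>
  postProcess(interpolate(template, values));

console.log(interpolate("{{entity}} added", { entity: "Member" }));
// -> "__LOWERCASED__member added"
console.log(translate("{{entity}} added", { entity: "Member" }));
// -> "Member added"
console.log(translate("Add a {{entity}}", { entity: "Member" }));
// -> "Add a member"
console.log(translate("Your file is larger than {{size, anyCase}}", { size: "2MB" }));
// -> "Your file is larger than 2MB"
```

The first call shows the intermediate text that reaches the post-processor: the marker at the very start is what tells it to capitalize the first letter again.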

Below is everything put together. If you're interested in a working example, check out this gist.

import i18next from "i18next";
import LanguageDetector from "i18next-browser-languagedetector";
import { initReactI18next } from "react-i18next";

// `resources` is assumed to be the app's translation resources, defined elsewhere.

const LOWERCASED = "__LOWERCASED__";
const ANY_CASE = "anyCase";

const sentenceCase = value =>
  value.charAt(0).toLocaleUpperCase() + value.slice(1);

const lowerCaseFormatter = (value, format) => {
  if (!value || format === ANY_CASE || typeof value !== "string") {
    return value;
  }
  return LOWERCASED + value.toLocaleLowerCase();
};

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: value => {
    const shouldSentenceCase = value.startsWith(LOWERCASED);
    value = value.replaceAll(LOWERCASED, "");

    return shouldSentenceCase ? sentenceCase(value) : value;
  },
};

i18next
  .use(LanguageDetector)
  .use(initReactI18next)
  .use(sentenceCaseProcessor)
  .init({
    resources: resources,
    fallbackLng: "en",
    interpolation: {
      escapeValue: false,
      skipOnVariables: false,
      alwaysFormat: true,
      format: (value, format, lng, options) => {
        // other formatters
        return lowerCaseFormatter(value, format);
      },
    },
    postProcess: [sentenceCaseProcessor.name],
    detection: {
      order: ["querystring", "cookie", "navigator", "path"],
      caches: ["cookie"],
      lookupQuerystring: "lang",
      lookupCookie: "lang",
    },
  });
