Skip to content
This repository has been archived by the owner on May 2, 2022. It is now read-only.

Sync missing translation keys in app/locales/*.json files to a Google Sheet #481

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ yarn-debug.log*
yarn-error.log*
lerna-debug.log*

*-credentials.json

# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

Expand Down
172 changes: 172 additions & 0 deletions ops/lost-in-translation.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
const { parse } = require('path');
const { readdirSync } = require('fs');
const { spawnSync } = require('child_process');
const { GoogleSpreadsheet } = require('google-spreadsheet');

/**
* This script finds all the english translation keys across
* all commits (including branches and pull requests) for each
* locale.json in app/locales. Afterwards it looks at each locale
* separately to see if it's missing any one the english keys.
*
* Important: You need to run it from the root directory.
*
* Usage:
* node ops/lost-in-translation.js
*/

/**
* Calls git cli with arguments and returns stdout.
*/
function git(...args) {
const { stdout } = spawnSync('git', args, { encoding: 'utf-8'});
return stdout;
}

/**
* Returns the commit hashes for a file path across all local branches.
*/
function findCommitHashesForFile(filePath) {
const output = git('log', '--pretty=format:"%h"', '--all', '--', filePath);
const hashes = output.split('\n').map(line => line.replace(/"/g, ''));
return hashes;
}

/**
* Returns the JSON representation of the contents of a file at certain commit hash.
*/
function retrieveJSONForFileAtCommitHash(filePath, commitHash) {
const contentsAtCommitHash = git('show', `${commitHash}:${filePath}`);
try {
return JSON.parse(contentsAtCommitHash);
} catch (error) {
// Sometimes the file is not in a proper JSON format. Simply return nothing in that case.
return {};
}
}

/**
* Normalizes translation key like we do in app/server.ts.
*/
function normalizeTranslationKey(translationKey) {
return translationKey.replace(/[\s\n\t]+/g, ' ').trim();
}

/**
* Find all the locales (ie. en-IN, no, se) in the provided directory.
*/
function retrieveAllLocales(directoryPath) {
const filenames = readdirSync(directoryPath);
const locales = filenames.map(filename => parse(filename).name);
return locales;
}

/**
* Add rows to a Google Spreadsheet that are not already added.
*/
async function addUniqueRowsToGoogleSheet(sheet, rowsToAdd) {
const alreadyAddedRows = await sheet.getRows({ limit: 1000 });
const uniqueRowsToAdd = rowsToAdd.filter(rowToAdd =>
!alreadyAddedRows.find(alreadyAddedRow => alreadyAddedRow.key === rowToAdd.key));
await sheet.addRows(uniqueRowsToAdd);
return uniqueRowsToAdd;
}

/**
* Step 1: Find all the (english) translation keys across all branches and PRs.
*
* If a Dutch developer has made a feature in a branch, we expect that him/her added a key
* to one or many locale files (usually the english locale (en.json) or to their own locale
* file (nl.json), or both).
*
* But they probably haven't added translations to all the other locales, because they don't
* speak the other languages. And this is the problem, because now we need find people to help
* translate the added keys to all the other locales.
*
* So what we do here is simply to find *all* the english translation keys across the entire
* project. That means looking at all the english translations keys in all the locale files
* since the start of the project across all branches and PRs. We throw them into a set that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds unnecessary to me, as it would also include WIP's and stale branches. But it might be some cases where this makes sense that I have not thought about?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the part that sucks, but having it would make sure that all translations are added and (hopefully) translated by the time we want to merge it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I understand. So a possible solution would be to run this if this run as a cronjob or something, so that the sheet is updated as soon as there are new texts in the repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so in the case with Stano's PR, those translations are now added to the sheet (even though his PR is WIP). Not sure if we want that or not, but we could try it out?

* we use in the next step.
*/
const allLocales = retrieveAllLocales('app/locales/');
const allEnglishTranslationKeys = new Set([]);
for (const locale of allLocales) {
const filePath = `app/locales/${locale}.json`;
for (const commitHash of findCommitHashesForFile(filePath)) {
const translation = retrieveJSONForFileAtCommitHash(filePath, commitHash);
for (const translationKey of Object.keys(translation)) {
allEnglishTranslationKeys.add(normalizeTranslationKey(translationKey));
}
}
}

/**
* Step 2: For each locale file check if there are any missing english translation keys
* across all branches and PRs and all commits/changes.
*
* If there's a missing english translation key, we know that we're most likely missing
* a translation for that locale. So what we need to do is to translate it, or get help
* to translate it.
*
* We throw it into a dictionary where the key is the locale and the value is a set of
* missing translations from english to that locale.
*/
const translationKeysByLocale = {};
for (const locale of allLocales) {
const filePath = `app/locales/${locale}.json`;
for (const commitHash of findCommitHashesForFile(filePath)) {
const translation = retrieveJSONForFileAtCommitHash(filePath, commitHash);
const translationKeys = Object.keys(translation).map(key => normalizeTranslationKey(key));
if (translationKeysByLocale[locale] === undefined) {
translationKeysByLocale[locale] = new Set([]);
}
translationKeysByLocale[locale] = new Set([...translationKeysByLocale[locale], ...translationKeys]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel stuff like this is mentally much easier to visualize when translationKeysByLocale is typed :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can tweak that, wanted to get it out quickly before bed

}
}

/**
* Step 3: Add rows of missing translations to Google Spreadsheet.
michaelmcmillan marked this conversation as resolved.
Show resolved Hide resolved
*
* https://docs.google.com/spreadsheets/d/1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw
*/
(async () => {
const doc = new GoogleSpreadsheet('1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw');
await doc.useServiceAccountAuth(require('./coronastatus-translation-486cef09736e-credentials.json'));
await doc.loadInfo();

const allLocales = retrieveAllLocales('app/locales/');
for (let sheetIndex = 0; sheetIndex < doc.sheetCount; sheetIndex++) {
const sheet = doc.sheetsByIndex[sheetIndex];
for (const locale of allLocales) {
if (sheet.title !== locale) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, there is one sheet per locale?

Is it then necessary to loop through the sheets here, can't we just loop through the locales and try to add them (from the code it seems like it throws an error if it does not exist)?

try {
          await doc.addSheet({ title: locale, headerValues: ['key', 'translation'] });
        } catch (error) {
          // We don't do anything if the sheet for this locale exists.
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, take a look at the sheet here: https://docs.google.com/spreadsheets/d/1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw/edit#gid=462835362

The reason I'm looping over locales is because that way we will create sheets for new locales automatically. There is no way in the API to retrieve the sheet names in the spreadsheet. This way we don't have to worry about keeping locales and sheets in sync (it will sort it self out).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't read your comment thoroughly enough. Yeah, maybe we can get away without that loop.

Copy link
Member Author

@michaelmcmillan michaelmcmillan Mar 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha getting late... The reason we have to loop over those are because there is no way of retrieving a sheet by its name. Does that make sense? The doc.addSheet doesn't return the sheet if it doesn't exist, it only throws an exception. So we have to, at some point, look at all the sheets to check if they match the current locale we are looping over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the for-loop ended after adding a sheet. A little hard to wrap my head around the logic, but as long as it works it is more than good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Just refactored it, better now?

continue;
}

try {
// Create a sheet if it doesn't already exist.
await doc.addSheet({ title: locale, headerValues: ['key', 'translation'] });
await new Promise(resolve => setTimeout(resolve, 10*1000));
} catch (error) {
// We don't do anything if the sheet for this locale exists.
}

// Add the missing rows by looking at the key column.
const rows = [];
for (const englishTranslationKey of allEnglishTranslationKeys) {
if (!translationKeysByLocale[locale].has(englishTranslationKey)) {
const row = { 'key': englishTranslationKey, translation: '' };
rows.push(row);
}
}

// Print out how many rows we added.
const addedRows = await addUniqueRowsToGoogleSheet(sheet, rows);
console.log(`Added ${addedRows.length} of ${rows.length} missing translations to the ${locale} sheet.`);

// Avoid getting rate limited by Google's API (max writes per 100 seconds).
console.log('Waiting before processing the next sheet.');
await new Promise(resolve => setTimeout(resolve, 20*1000));
}
}

})();
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
"ejs": "^3.0.1",
"express": "^4.17.1",
"express-rate-limit": "^5.1.1",
"google-spreadsheet": "^3.0.10",
"i18n": "^0.8.6",
"node-fetch": "^2.6.0",
"pg": "^7.18.2",
Expand Down
Loading