Skip to content

feat: i18n support for log and exception messages#820

Open
JackieTien97 wants to merge 16 commits into
developfrom
worktree-tsfile-zh-locale
Open

feat: i18n support for log and exception messages#820
JackieTien97 wants to merge 16 commits into
developfrom
worktree-tsfile-zh-locale

Conversation

@JackieTien97
Copy link
Copy Markdown
Contributor

Summary

  • Adds runtime ResourceBundle-based i18n for all log and exception messages in java/common, java/tsfile, and java/tools
  • A single published JAR supports both English (default) and Simplified Chinese; locale is selected at JVM startup via -Dtsfile.locale=zh or Locale.getDefault()
  • 521 message keys, ~900+ call sites migrated across ~110 files
  • Includes complete Simplified Chinese translation
  • Drift-prevention unit tests enforce key-set alignment between messages.properties and messages_zh.properties at build time

Why runtime (vs. compile-time profile)

IoTDB's recent i18n PR (apache/iotdb#17613) uses a Maven profile to inline locale-specific strings into the JAR at compile time. TsFile cannot adopt that same model without either publishing two artifacts per release or forking the version, since downstreams (IoTDB and others) consume tsfile as a versioned dependency. A runtime ResourceBundle approach keeps a single published artifact while letting downstream IoTDB switch locale via a JVM startup option.

Locale resolution order at JVM startup (once at class load):

  1. -Dtsfile.locale=<tag> system property (e.g. zh, zh-CN)
  2. Locale.getDefault()
  3. English fallback via ResourceBundle's root-bundle mechanism

Architecture

  • org.apache.tsfile.i18n.Messages (in java/common) provides get(key) for SLF4J patterns ({} placeholders) and format(key, args) for exception patterns (%1$s positional placeholders).
  • A custom ResourceBundle.Control loads .properties files as UTF-8 (Java 8 default is ISO-8859-1) so Chinese characters can be written directly without \uXXXX escapes.
  • Resources live at java/common/src/main/resources/org/apache/tsfile/i18n/messages.properties (English, root bundle) and messages_zh.properties (Chinese).
  • MessagesTest enforces key-set alignment + all values non-empty in both locales.
  • Technical terms (TsFile, Chunk, Page, Schema, class/method names, type names, codec names like RLE/GORILLA) are kept in English in Chinese translations, matching IoTDB's convention.

IoTDB integration

IoTDB's Chinese build (the with-zh-locale profile in apache/iotdb#17613) should add -Dtsfile.locale=zh to its startup scripts (e.g. start-datanode.sh, start-confignode.sh) to also receive Chinese messages from tsfile. No coordinated release is required — IoTDB can adopt this whenever convenient.

Migration scope

Migrated all literal-string log and exception messages in:

Module Keys added
common 16
tsfile/write 70
tsfile/read/common 51
tsfile/read (rest) 84
tsfile/encoding 58
tsfile/file 17
tsfile/external 65
tsfile/encrypt 13
tsfile/compress 7
tsfile/fileSystem 41
tsfile/utils 37
tools 61
Total 521 (incl. 1 seed)

Each migration commit is its own logical scope so reviewers can review per-area.

Sites correctly skipped (not migrated):

  • Variable/data-only constructor args: throw new X(deviceId.toString()), throw new X(String.valueOf(type)), throw new X(getClass().getName())
  • Exception wrapping with no message: throw new X(e)
  • No-arg throws: throw new X()
  • Utility methods that pass through caller-supplied format strings (e.g. Validate.isTrue(boolean, String, Object...))

Test plan

  • All existing tests pass under default (English) locale: ./mvnw test -P with-java
  • MessagesTest drift-prevention tests pass — verifies en/zh key set alignment + all values non-empty
  • MessagesTest passes under -Duser.language=zh -Duser.country=CN (simulated Chinese locale)
  • Spot tests under -Dtsfile.locale=zh show Chinese output without test failures
  • ./mvnw spotless:check -P with-java clean
  • Key count verified: 521 keys in each .properties file

Replace all literal-string exception messages in the tsfile/write package
with Messages.get/format calls, adding 46 new keys under the
"# === write ===" section in both messages.properties and messages_zh.properties.
Replaces all inline string literals in throw/log statements across the
remaining tsfile/read packages (filter, query, reader, v4, and root
reader files) with Messages.get/format calls and corresponding keys in
messages.properties / messages_zh.properties.
…to i18n keys

Replace all literal-string-bearing log/exception messages in the encrypt,
compress, fileSystem, and utils sub-packages with Messages.get(key) or
Messages.format(key, args). Add aligned keys under four new sections
(# === encrypt ===, # === compress ===, # === fileSystem ===, # === utils ===)
in both messages.properties and messages_zh.properties.
This site was missed in the original Task 10 commit (bd0b26a) because
of an earlier amend issue. The key error.encrypt.sha256_not_found
already exists and is used at two other call sites in this same file.
Translate all 521 message values in messages_zh.properties to natural
Simplified Chinese, preserving:
- %1$s positional format specifiers (reordered when natural Chinese
  word order differs)
- {} SLF4J placeholders (not reordered)
- Technical terms (TsFile, Chunk, Page, Schema, class/method names,
  type names, codec names) kept in English per convention
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 18, 2026

Codecov Report

❌ Patch coverage is 12.75222% with 1081 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.11%. Comparing base (f7041d8) to head (a95c96f).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...a/org/apache/tsfile/read/TsFileSequenceReader.java 15.78% 48 Missing ⚠️
.../java/org/apache/tsfile/utils/TsPrimitiveType.java 0.00% 46 Missing ⚠️
...ava/org/apache/tsfile/utils/FilterDeserialize.java 0.00% 29 Missing ⚠️
...che/tsfile/read/filter/factory/ValueFilterApi.java 0.00% 28 Missing ⚠️
...va/org/apache/tsfile/encoding/decoder/Decoder.java 0.00% 27 Missing ⚠️
...g/apache/tsfile/external/commons/io/FileUtils.java 0.00% 25 Missing ⚠️
...java/org/apache/tsfile/compress/IUnCompressor.java 0.00% 22 Missing ⚠️
...in/java/org/apache/tsfile/write/record/Tablet.java 44.44% 20 Missing ⚠️
...org/apache/tsfile/encoding/decoder/RleDecoder.java 0.00% 19 Missing ⚠️
...g/apache/tsfile/encoding/decoder/FloatDecoder.java 20.00% 16 Missing ⚠️
... and 192 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #820      +/-   ##
===========================================
- Coverage    63.18%   61.11%   -2.08%     
===========================================
  Files          717      732      +15     
  Lines        43783    45364    +1581     
  Branches      6526     6740     +214     
===========================================
+ Hits         27664    27722      +58     
- Misses       15127    16650    +1523     
  Partials       992      992              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor Author

@JackieTien97 JackieTien97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review

Found 1 issue (3 affected locations): two existing tests assert exact English exception message text for messages that this PR migrates to i18n, so they break under a non-English JVM locale. Each affected source-line is annotated inline below; the unmodified test files are linked from each comment.

Fix options: pin System.setProperty("tsfile.locale", "en") in those tests (same pattern MessagesTest.java already uses), or rewrite the assertions to be locale-insensitive.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

PR #820 review identified that some existing tests assert exact English
exception message text, which breaks under a non-English JVM locale.
Root-cause investigation revealed two distinct problems:

1. Messages.Utf8Control did not override getFallbackLocale, so when a
   user explicitly requested "en" via -Dtsfile.locale=en on a JVM with
   default locale zh_CN, ResourceBundle's default Control would ALSO
   load the zh bundle as a parent (because the default getFallbackLocale
   returns Locale.getDefault()). The zh bundle then became the leaf and
   BUNDLE.getString(key) returned Chinese text despite the en request.
   Fix: override getFallbackLocale to return null, keeping resolution
   strictly within the requested locale's candidate chain.

2. Many existing tests in java/tsfile and java/common assert on exact
   English exception text. Pin tsfile.locale=en via Surefire/Failsafe
   systemPropertyVariables in both modules so tests run against the
   English (root) bundle regardless of the developer's JVM default
   locale. A per-test static-init pattern was attempted first but
   discarded — class-load timing made it unreliable.

Verified under both default locale and -Duser.language=zh -Duser.country=CN:
all 854 java/tsfile tests pass; java/common MessagesTest 5/5 passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants