
Speed up AINode CI by consolidating tests and caching PyInstaller output#17687

Merged: JackieTien97 merged 5 commits into master from speedup-ainode-ci on May 17, 2026


@JackieTien97
Contributor

Description

Speed up the Cluster IT - 1C1D1A CI pipeline (~52min → ~15-20min) via two optimizations:

1. Test consolidation (saves ~20min)

Merge 5 AINode IT test classes into a single AINodeSharedClusterIT that shares one 1C1D1A cluster:

  • AINodeDeviceManageIT
  • AINodeModelManageIT
  • AINodeCallInferenceIT
  • AINodeForecastIT
  • AINodeInstanceManagementIT

This reduces cluster startups from 8 to 3 (SharedCluster + ClusterConfig + ConcurrentForecast).

Also converts AINodeClusterConfigIT from @Before/@After to @BeforeClass/@AfterClass, eliminating one redundant cluster restart by merging both dialect tests into a single method.

The 1C1D1A cluster startup takes ~3.5min each time (ConfigNode + DataNode + AINode), so eliminating 5 restarts saves ~17.5min. The actual test execution time is only ~9min out of the original 52min total.
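As a quick check of that estimate, the arithmetic can be reproduced from the figures quoted above. The variable names are illustrative only, and the inputs are the approximate numbers from this description, not fresh measurements.

```python
# Figures quoted in the description above (all approximate).
STARTUP_MIN = 3.5          # one 1C1D1A cluster startup
STARTUPS_BEFORE = 8
STARTUPS_AFTER = 3
TEST_EXEC_MIN = 9          # actual test execution time
TOTAL_BEFORE_MIN = 52      # original pipeline duration

restarts_removed = STARTUPS_BEFORE - STARTUPS_AFTER
startup_saving_min = restarts_removed * STARTUP_MIN
test_share_pct = round(TEST_EXEC_MIN / TOTAL_BEFORE_MIN * 100)

print(f"restarts removed: {restarts_removed}")
print(f"startup time saved: {startup_saving_min} min")
print(f"share of pipeline spent running tests: ~{test_share_pct}%")
```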

2. PyInstaller dist caching (saves ~11min)

Added hash-based caching to build_binary.py:

  • Computes SHA256 of all AINode source files, pyproject.toml, poetry.lock, and ainode.spec
  • Caches the dist/ output at ~/.cache/iotdb-ainode-build/dist-cache/ (outside project dir, survives mvn clean)
  • On cache hit, restores dist/ directly and skips the entire PyInstaller analysis + packaging phase

The PyInstaller phase scans thousands of hidden imports from torch/transformers/numpy and takes ~11min. When AINode source hasn't changed, this is entirely redundant.
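The hit/miss flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in build_binary.py: the helper names `restore_dist` and `save_dist` and the one-directory-per-hash layout are assumptions made for the example.

```python
import shutil
from pathlib import Path


def cache_entry(cache_root: Path, source_hash: str) -> Path:
    """Directory holding the cached dist/ for one source hash (assumed layout)."""
    return cache_root / source_hash


def restore_dist(cache_root: Path, source_hash: str, dist_dir: Path) -> bool:
    """Copy a cached dist/ into place. Returns True on a cache hit."""
    entry = cache_entry(cache_root, source_hash)
    if not entry.is_dir():
        return False  # cache miss: the caller runs PyInstaller as usual
    if dist_dir.exists():
        shutil.rmtree(dist_dir)
    # symlinks=True keeps symlinks as symlinks instead of copying their targets
    shutil.copytree(entry, dist_dir, symlinks=True)
    return True


def save_dist(cache_root: Path, source_hash: str, dist_dir: Path) -> None:
    """Store a freshly built dist/ under its source hash, replacing any old copy."""
    entry = cache_entry(cache_root, source_hash)
    if entry.exists():
        shutil.rmtree(entry)
    entry.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(dist_dir, entry, symlinks=True)
```

In this sketch the build script would call `restore_dist(Path.home() / ".cache/iotdb-ainode-build/dist-cache", source_hash, Path("dist"))` first, and only run PyInstaller followed by `save_dist(...)` when it returns False.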

Testing

All existing test logic is preserved — the tests are reorganized, not changed. AINodeConcurrentForecastIT remains separate (different data setup and LOAD/UNLOAD side effects).

- Merge 5 AINode IT test classes (DeviceManage, ModelManage, CallInference,
  Forecast, InstanceManagement) into AINodeSharedClusterIT that shares a
  single 1C1D1A cluster, reducing cluster startups from 8 to 3 (~20min saved)
- Convert AINodeClusterConfigIT from @Before/@After to @BeforeClass/@AfterClass,
  merging both dialect tests into one method to eliminate a redundant cluster restart
- Add hash-based dist caching to build_binary.py: computes SHA256 of AINode source
  files and skips PyInstaller rebuild when source hasn't changed (~11min saved)
- Cache stored at ~/.cache/iotdb-ainode-build/dist-cache/, survives mvn clean
@codecov

codecov Bot commented May 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.43%. Comparing base (3145e83) to head (0002ce3).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #17687   +/-   ##
=========================================
  Coverage     40.42%   40.43%           
  Complexity     2574     2574           
=========================================
  Files          5179     5179           
  Lines        349261   349262    +1     
  Branches      44683    44683           
=========================================
+ Hits         141206   141232   +26     
+ Misses       208055   208030   -25     


The compute_source_hash() function was including files from the build/
directory (generated by PyInstaller during the same run), so the hash
computed at check time (before build/ exists) differed from the hash at save
time (after build/ exists). As a result, the cache always missed.

Fix: exclude build/, dist/, and __pycache__/ from the hash computation.
These are all build artifacts that don't affect the PyInstaller input.
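A minimal sketch of a hash computation with these exclusions is shown below. The function name matches the one mentioned in this commit, but the body is an assumption written for illustration: the real function in build_binary.py may walk a narrower set of files and may differ in detail.

```python
import hashlib
from pathlib import Path

# Directories that hold build artifacts and must not influence the hash.
EXCLUDED_DIRS = {"build", "dist", "__pycache__"}


def compute_source_hash(root: Path) -> str:
    """SHA256 over the relative path and content of every non-artifact file."""
    hasher = hashlib.sha256()
    for path in sorted(root.rglob("*")):  # sorted for a stable iteration order
        rel = path.relative_to(root)
        if not path.is_file():
            continue
        # Skip anything located under an excluded directory at any depth.
        if EXCLUDED_DIRS.intersection(rel.parts[:-1]):
            continue
        hasher.update(rel.as_posix().encode("utf-8"))
        hasher.update(b"\0")
        hasher.update(path.read_bytes())
    return hasher.hexdigest()
```

With the exclusions in place, the value computed before the build and the value computed after PyInstaller has populated build/ and dist/ should match, as long as no source file changed in between.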
Contributor Author

@JackieTien97 JackieTien97 left a comment


Code Review

Verdict: LGTM

The test logic is faithfully preserved — this is a reorganization, not a behavioral change. The caching implementation is sound.


Observations

Test Consolidation (AINodeSharedClusterIT.java)

  • All test methods from the original 5 classes are present and structurally identical.
  • @BeforeClass correctly calls both prepareDataInTree() and prepareDataInTable() to satisfy all tests.
  • AINodeConcurrentForecastIT correctly left separate (different data setup, LOAD/UNLOAD side effects).
  • AINodeClusterConfigIT correctly left separate (it does REMOVE AINODE which would break the shared cluster).

AINodeClusterConfigIT refactoring

  • Conversion from @Before/@After to @BeforeClass/@AfterClass eliminates one redundant cluster restart.
  • The merged test now verifies both dialects in a single method with proper sequencing. Cleaner than the original.

PyInstaller caching (build_binary.py)

  • compute_source_hash correctly excludes build/, dist/, __pycache__; includes all relevant sources + poetry.lock.
  • shutil.copytree(..., symlinks=True) is the right call for PyInstaller output with symlinked shared libs.
  • Hashes both relative path and file content — prevents collisions from file renames.
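To illustrate the last point, here is a small self-contained comparison of a content-only digest with a path-plus-content digest. The `digest` helper and the file names are hypothetical and exist only for this example.

```python
import hashlib


def digest(files: dict, include_path: bool) -> str:
    """Hash a {relative_path: content_bytes} mapping, optionally mixing in paths."""
    hasher = hashlib.sha256()
    for rel_path, content in sorted(files.items()):
        if include_path:
            hasher.update(rel_path.encode("utf-8"))
            hasher.update(b"\0")
        hasher.update(content)
    return hasher.hexdigest()


# The same file content before and after a rename.
before = {"ainode/a.py": b"x = 1\n"}
after = {"ainode/b.py": b"x = 1\n"}

content_only_collides = digest(before, False) == digest(after, False)
path_aware_collides = digest(before, True) == digest(after, True)
print(content_only_collides, path_aware_collides)  # True False
```

A content-only digest would treat the rename as a cache hit and restore a dist/ built from the old layout, while the path-aware digest forces a rebuild.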

One Suggestion (non-blocking)

Consider adding Python version to the cache hash to avoid stale cache if the interpreter gets upgraded on CI:

hasher.update(sys.version.encode())

Since poetry.lock is already hashed, PyInstaller version changes are covered. But Python interpreter version is not reflected in any of the hashed files.
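One way this could look, as a sketch only: seed the hasher with the interpreter version before feeding it any files. The `seeded_hasher` helper and the `python:` prefix are assumptions for illustration and are not part of build_binary.py.

```python
import hashlib
import sys


def seeded_hasher(interpreter_tag=None):
    """Return a SHA256 hasher that already includes the interpreter version."""
    tag = sys.version if interpreter_tag is None else interpreter_tag
    hasher = hashlib.sha256()
    hasher.update(b"python:" + tag.encode("utf-8") + b"\0")
    return hasher


# Different interpreter tags give different cache keys for identical sources.
key_a = seeded_hasher("3.11.9").hexdigest()
key_b = seeded_hasher("3.12.3").hexdigest()
print(key_a != key_b)  # True
```

Note that `sys.version` also contains build and compiler information, so it is the conservative choice: two builds of the same Python release can still produce different keys. Hashing only the major and minor numbers from `sys.version_info` would give more cache hits at the cost of ignoring patch-level differences.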


@JackieTien97 JackieTien97 merged commit 2f57fd6 into master May 17, 2026
31 checks passed
@JackieTien97 JackieTien97 deleted the speedup-ainode-ci branch May 17, 2026 00:20