From 628c0a84130ac86aca708585b34402f5aa167c1b Mon Sep 17 00:00:00 2001
From: JackieTien97
Date: Sun, 17 May 2026 16:32:28 +0800
Subject: [PATCH 1/2] Run datanode unit tests with forkCount=3 to speed up CI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Unit-Test workflow runs datanode UT with surefire forkCount=1 +
reuseForks=false: one JVM per test class, sequential. On the latest
master run that's ~38 min on Ubuntu and ~56 min on Windows for 597 test
classes (3629 methods); a large fraction of that is JVM cold-start cost
on Windows.

iotdb-core/datanode/pom.xml already declares
<workingDirectory>${project.build.directory}/fork_${surefire.forkNumber}</workingDirectory>
which only takes effect when forkCount > 1, indicating the original
setup was prepared for parallel forks but never enabled them.

Setting -DforkCount=3 in the workflow runs up to three surefire JVMs in
parallel while keeping reuseForks=false, so each test class still gets a
fresh JVM (same intra-fork isolation as today) and only cross-fork
parallelism changes.

Cross-fork conflict risk was checked by inspecting datanode/src/test:
- No test code calls ServerSocket / DatagramSocket / .bind() /
  TServer.serve().
- No code reads surefire.forkNumber to allocate per-fork ports; none
  needed.
- Three unreferenced resource directories (datanode{1,2,3}conf) hold
  hardcoded port values but no Java code uses them; they are dead
  resources.
- examinePorts() in EnvironmentUtils.java only opens client sockets to
  6667 and 5555 to check that nothing is listening after cleanup;
  harmless with parallel forks.
- No test uses java.io.tmpdir or fixed absolute paths; relative paths
  are isolated by the per-fork workingDirectory.

Memory budget on the 16 GB GH-hosted runners: 3 × -Xmx1024m + per-JVM
overhead ~= 4 GB, comfortable headroom. CPU: 3 test forks + 1 surefire
driver matches the 4 vCPUs of both ubuntu-latest and windows-latest.
Expected wall-clock per master baseline (job 76358113571 windows,
76358113568 ubuntu): Windows datanode 56 -> ~22-28 min; Ubuntu datanode
38 -> ~16-20 min. The whole Unit-Test pipeline is gated by Windows
datanode, so total drops from ~56 to ~22-28 min.

Setting the flag in the workflow rather than in
iotdb-core/datanode/pom.xml keeps local `mvn test` behavior unchanged.
---
 .github/workflows/unit-test.yml | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/unit-test.yml b/.github/workflows/unit-test.yml
index c53c8d89cfa9a..8a9a83fb917bd 100644
--- a/.github/workflows/unit-test.yml
+++ b/.github/workflows/unit-test.yml
@@ -62,7 +62,14 @@ jobs:
       - name: Test Datanode Module with Maven
         shell: bash
         if: ${{ matrix.it_task == 'datanode'}}
-        run: mvn clean integration-test -Dtest.port.closed=true -pl iotdb-core/datanode -am -DskipTests -Diotdb.test.only=true
+        # forkCount=3 runs up to 3 surefire JVMs in parallel. reuseForks=false
+        # is left on (set in iotdb-core/datanode/pom.xml) so each test class
+        # still gets a fresh JVM — only cross-fork parallelism changes.
+        # The pom already wires ...fork_${surefire.forkNumber}
+        # for filesystem isolation; datanode UTs do no socket binding (grep'd:
+        # zero ServerSocket / bind() / TServer.serve() calls in tests), so
+        # cross-fork resource conflicts are not a concern.
+        run: mvn clean integration-test -Dtest.port.closed=true -pl iotdb-core/datanode -am -DskipTests -Diotdb.test.only=true -DforkCount=3
       - name: Test Other Modules with Maven
         shell: bash
         if: ${{ matrix.it_task == 'others'}}

From 8dd21314fded4a5589bd4eb87e9bbbe7436b9b40 Mon Sep 17 00:00:00 2001
From: JackieTien97
Date: Sun, 17 May 2026 18:15:37 +0800
Subject: [PATCH 2/2] Bump surefire forkCount 3 -> 4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

forkCount=3 cut Unit-Test wall clock from ~56 to ~33 min on the Windows
datanode job (-41%) and ~38 to ~22 min on Ubuntu (-40%).
Both GH-hosted runners have 4 vCPUs, so trying forkCount=4 to see if the
last available core squeezes out more.

Resource budget at fc=4 on 16 GB runners: 4 × -Xmx1024m + per-JVM
overhead ~= 5 GB; well within memory. CPU is now oversubscribed by 1
(4 forks + 1 surefire driver = 5 active processes on 4 cores), but
reuseForks=false keeps test JVMs short-lived so context-switch cost
should stay low. The remaining risk is disk IO contention on
compaction/flush-heavy tests; if that dominates we'll see fc=4
equal-or-slower than fc=3 and back off.
---
 .github/workflows/unit-test.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/unit-test.yml b/.github/workflows/unit-test.yml
index 8a9a83fb917bd..ccc0c2ca4b39e 100644
--- a/.github/workflows/unit-test.yml
+++ b/.github/workflows/unit-test.yml
@@ -62,14 +62,14 @@ jobs:
       - name: Test Datanode Module with Maven
         shell: bash
         if: ${{ matrix.it_task == 'datanode'}}
-        # forkCount=3 runs up to 3 surefire JVMs in parallel. reuseForks=false
+        # forkCount=4 runs up to 4 surefire JVMs in parallel. reuseForks=false
         # is left on (set in iotdb-core/datanode/pom.xml) so each test class
         # still gets a fresh JVM — only cross-fork parallelism changes.
         # The pom already wires ...fork_${surefire.forkNumber}
         # for filesystem isolation; datanode UTs do no socket binding (grep'd:
         # zero ServerSocket / bind() / TServer.serve() calls in tests), so
         # cross-fork resource conflicts are not a concern.
-        run: mvn clean integration-test -Dtest.port.closed=true -pl iotdb-core/datanode -am -DskipTests -Diotdb.test.only=true -DforkCount=3
+        run: mvn clean integration-test -Dtest.port.closed=true -pl iotdb-core/datanode -am -DskipTests -Diotdb.test.only=true -DforkCount=4
       - name: Test Other Modules with Maven
         shell: bash
         if: ${{ matrix.it_task == 'others'}}