Conversation
Generates before/after images for 6 test images (3 synthetic + 3 real photos) with 5 runs each for statistical rigor. Includes paired t-test results, diff heatmaps, and a summary COMPARISON.md documenting energy scores across all configurations (all-off, proposals 1-5, all 6 proposals). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a new end-to-end “final comparison” benchmark/test that runs the generator with all features OFF vs all features ON (including progressive resolution), repeats runs for basic statistical analysis, and writes a Markdown report plus before/after/target/diff images under test-results/final-comparison/.
Changes:
- Add
FinalComparisonTestto generate images, compute summary stats, and emitCOMPARISON.md. - Commit the resulting comparison artifacts (targets, before/after renders, diff heatmaps, and photo copies).
- Add a
gradlewwrapper script to the repo.
Reviewed changes
Copilot reviewed 2 out of 30 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test-results/final-comparison/COMPARISON.md | Generated report summarizing per-image scores, significance, and linking rendered artifacts |
| test-results/final-comparison/edges_after.png | Generated “all ON” output image for edges case |
| test-results/final-comparison/edges_before.png | Generated “all OFF” output image for edges case |
| test-results/final-comparison/edges_diff.png | Generated diff heatmap for edges case |
| test-results/final-comparison/edges_target.png | Generated target/reference image for edges case |
| test-results/final-comparison/landscape_after.png | Generated “all ON” output image for landscape case |
| test-results/final-comparison/landscape_before.png | Generated “all OFF” output image for landscape case |
| test-results/final-comparison/landscape_diff.png | Generated diff heatmap for landscape case |
| test-results/final-comparison/landscape_target.png | Generated target/reference image for landscape case |
| test-results/final-comparison/nature_after.png | Generated “all ON” output image for nature case |
| test-results/final-comparison/nature_before.png | Generated “all OFF” output image for nature case |
| test-results/final-comparison/nature_diff.png | Generated diff heatmap for nature case |
| test-results/final-comparison/nature_target.png | Generated target/reference image for nature case |
| test-results/final-comparison/photo_detail_after.png | Generated “all ON” output image for photo_detail case |
| test-results/final-comparison/photo_detail_before.png | Generated “all OFF” output image for photo_detail case |
| test-results/final-comparison/photo_detail_diff.png | Generated diff heatmap for photo_detail case |
| test-results/final-comparison/photo_detail_target.png | Generated target/reference image for photo_detail case |
| test-results/final-comparison/portrait_after.png | Generated “all ON” output image for portrait case |
| test-results/final-comparison/portrait_before.png | Generated “all OFF” output image for portrait case |
| test-results/final-comparison/portrait_diff.png | Generated diff heatmap for portrait case |
| test-results/final-comparison/portrait_target.png | Generated target/reference image for portrait case |
| test-results/final-comparison/river_after.png | Generated “all ON” output image for river case |
| test-results/final-comparison/river_before.png | Generated “all OFF” output image for river case |
| test-results/final-comparison/river_diff.png | Generated diff heatmap for river case |
| test-results/final-comparison/river_target.png | Generated target/reference image for river case |
| test-results/final-comparison/photos/landscape.png | Committed photo input copy used by final comparison |
| test-results/final-comparison/photos/portrait.png | Committed photo input copy used by final comparison |
| test-results/final-comparison/photos/river.png | Committed photo input copy used by final comparison |
| src/test/java/com/bobrust/generator/FinalComparisonTest.java | New benchmark-style test that runs comparisons, computes stats, and writes artifacts |
| gradlew | Adds Gradle wrapper start script |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Runs the generator with SA, error-guided placement, and adaptive sizing OFF. | ||
| * - No error map (uniform random placement) | ||
| * - No gradient map (uniform random sizing) | ||
| * - Classic hill climbing (no SA) | ||
| * - Energy evaluation still uses whatever is compiled (batch-parallel doesn't affect quality) | ||
| * - No progressive resolution | ||
| * | ||
| * This mirrors the original algorithm before any proposals were added. |
There was a problem hiding this comment.
The Javadoc says the "all features OFF" baseline uses "classic energy", but Worker.getEnergy() always calls BorstCore.differencePartialThread(...), which will use the batch-parallel combined pass when AppConstants.USE_BATCH_PARALLEL is true. That means the baseline isn’t actually forcing classic energy evaluation. Consider either (a) explicitly routing baseline energy through differencePartialThreadClassic(...) (e.g., via a specialized Worker/State for this test) or (b) updating the doc/comments to reflect that baseline energy depends on compile-time flags.
| int times = Math.max(1, Runtime.getRuntime().availableProcessors() / 2); | ||
|
|
||
| for (int i = 0; i < maxShapes; i++) { | ||
| worker.init(model.current, model.score); |
There was a problem hiding this comment.
This baseline setup directly reads model.score (a protected field). Using the public accessor (model.getScore()) would make this test less coupled to Model internals and less fragile if visibility changes.
| worker.init(model.current, model.score); | |
| worker.init(model.current, model.getScore()); |
| double x = t * (1 - 1.0 / (4 * df)) / Math.sqrt(1 + t * t / (2.0 * df)); | ||
| return 1.0 - normalCDF(x); | ||
| } | ||
|
|
||
| private static double normalCDF(double x) { | ||
| if (x < -8) return 0; | ||
| if (x > 8) return 1; | ||
| double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x)); | ||
| double d = 0.3989422804014327; | ||
| double p = d * Math.exp(-x * x / 2.0) * | ||
| (t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))))); | ||
| return x > 0 ? 1.0 - p : p; | ||
| } |
There was a problem hiding this comment.
The p-value calculation is labeled/reporting as a "paired one-sided t-test", but approximatePValue(...) uses a normal approximation (and ignores the Student’s t CDF), which can be noticeably inaccurate for df=4 (NUM_RUNS=5). Either compute the one-sided p-value from a proper t distribution (e.g., via a small stats utility or a dependency like Commons Math) or update the report text to clearly state it’s a normal approximation of the t-statistic.
| double x = t * (1 - 1.0 / (4 * df)) / Math.sqrt(1 + t * t / (2.0 * df)); | |
| return 1.0 - normalCDF(x); | |
| } | |
| private static double normalCDF(double x) { | |
| if (x < -8) return 0; | |
| if (x > 8) return 1; | |
| double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x)); | |
| double d = 0.3989422804014327; | |
| double p = d * Math.exp(-x * x / 2.0) * | |
| (t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))))); | |
| return x > 0 ? 1.0 - p : p; | |
| } | |
| if (df <= 0) { | |
| throw new IllegalArgumentException("Degrees of freedom must be positive"); | |
| } | |
| double x = df / (df + t * t); | |
| return 0.5 * regularizedBeta(x, df / 2.0, 0.5); | |
| } | |
| private static double regularizedBeta(double x, double a, double b) { | |
| if (x <= 0.0) return 0.0; | |
| if (x >= 1.0) return 1.0; | |
| double bt = Math.exp( | |
| logGamma(a + b) - logGamma(a) - logGamma(b) | |
| + a * Math.log(x) + b * Math.log(1.0 - x) | |
| ); | |
| if (x < (a + 1.0) / (a + b + 2.0)) { | |
| return bt * betaContinuedFraction(x, a, b) / a; | |
| } | |
| return 1.0 - bt * betaContinuedFraction(1.0 - x, b, a) / b; | |
| } | |
| private static double betaContinuedFraction(double x, double a, double b) { | |
| final int maxIterations = 10000; | |
| final double epsilon = 3.0e-14; | |
| final double fpMin = 1.0e-300; | |
| double qab = a + b; | |
| double qap = a + 1.0; | |
| double qam = a - 1.0; | |
| double c = 1.0; | |
| double d = 1.0 - qab * x / qap; | |
| if (Math.abs(d) < fpMin) d = fpMin; | |
| d = 1.0 / d; | |
| double h = d; | |
| for (int m = 1; m <= maxIterations; m++) { | |
| int m2 = 2 * m; | |
| double aa = m * (b - m) * x / ((qam + m2) * (a + m2)); | |
| d = 1.0 + aa * d; | |
| if (Math.abs(d) < fpMin) d = fpMin; | |
| c = 1.0 + aa / c; | |
| if (Math.abs(c) < fpMin) c = fpMin; | |
| d = 1.0 / d; | |
| h *= d * c; | |
| aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)); | |
| d = 1.0 + aa * d; | |
| if (Math.abs(d) < fpMin) d = fpMin; | |
| c = 1.0 + aa / c; | |
| if (Math.abs(c) < fpMin) c = fpMin; | |
| d = 1.0 / d; | |
| double delta = d * c; | |
| h *= delta; | |
| if (Math.abs(delta - 1.0) < epsilon) { | |
| break; | |
| } | |
| } | |
| return h; | |
| } | |
| private static double logGamma(double x) { | |
| double[] coefficients = { | |
| 676.5203681218851, | |
| -1259.1392167224028, | |
| 771.3234287776531, | |
| -176.6150291621406, | |
| 12.507343278686905, | |
| -0.13857109526572012, | |
| 9.984369578019572e-6, | |
| 1.5056327351493116e-7 | |
| }; | |
| if (x < 0.5) { | |
| return Math.log(Math.PI) - Math.log(Math.sin(Math.PI * x)) - logGamma(1.0 - x); | |
| } | |
| x -= 1.0; | |
| double sum = 0.99999999999980993; | |
| for (int i = 0; i < coefficients.length; i++) { | |
| sum += coefficients[i] / (x + i + 1.0); | |
| } | |
| double tmp = x + coefficients.length - 0.5; | |
| return 0.5 * Math.log(2.0 * Math.PI) + (x + 0.5) * Math.log(tmp) - tmp + Math.log(sum); | |
| } |
| // Real photos | ||
| File photoDir = new File("test-results/final-comparison/photos"); | ||
| photoDir.mkdirs(); | ||
| String[][] photoSources = { | ||
| {"river", "/tmp/photo1.jpg"}, | ||
| {"portrait", "/tmp/photo2.jpg"}, | ||
| {"landscape", "/tmp/photo3.jpg"} | ||
| }; | ||
|
|
||
| List<BufferedImage> realPhotos = new ArrayList<>(); | ||
| List<String> realNames = new ArrayList<>(); | ||
| for (String[] src : photoSources) { | ||
| File photoFile = new File(src[1]); | ||
| if (photoFile.exists()) { | ||
| BufferedImage photo = ImageIO.read(photoFile); | ||
| if (photo != null) { | ||
| photo = resizeTo(photo, 128, 128); | ||
| realPhotos.add(photo); | ||
| realNames.add(src[0]); | ||
| ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png")); | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Hard-coding photo inputs to /tmp/photo1.jpg, etc. makes this test non-reproducible across machines/CI and also decoupled from the committed test-results artifacts. Prefer checking in known test photos under src/test/resources (or test-results/final-comparison/photos) and loading them via the classpath, or make the paths configurable via a system property and clearly skip with a message when inputs aren’t available.
| // Real photos | |
| File photoDir = new File("test-results/final-comparison/photos"); | |
| photoDir.mkdirs(); | |
| String[][] photoSources = { | |
| {"river", "/tmp/photo1.jpg"}, | |
| {"portrait", "/tmp/photo2.jpg"}, | |
| {"landscape", "/tmp/photo3.jpg"} | |
| }; | |
| List<BufferedImage> realPhotos = new ArrayList<>(); | |
| List<String> realNames = new ArrayList<>(); | |
| for (String[] src : photoSources) { | |
| File photoFile = new File(src[1]); | |
| if (photoFile.exists()) { | |
| BufferedImage photo = ImageIO.read(photoFile); | |
| if (photo != null) { | |
| photo = resizeTo(photo, 128, 128); | |
| realPhotos.add(photo); | |
| realNames.add(src[0]); | |
| ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png")); | |
| } | |
| } | |
| } | |
| // Real photos (optional, configurable via system properties) | |
| File photoDir = new File("test-results/final-comparison/photos"); | |
| photoDir.mkdirs(); | |
| String[][] photoSources = { | |
| {"river", System.getProperty("finalComparison.photo.river")}, | |
| {"portrait", System.getProperty("finalComparison.photo.portrait")}, | |
| {"landscape", System.getProperty("finalComparison.photo.landscape")} | |
| }; | |
| List<BufferedImage> realPhotos = new ArrayList<>(); | |
| List<String> realNames = new ArrayList<>(); | |
| for (String[] src : photoSources) { | |
| String configuredPath = src[1]; | |
| if (configuredPath == null || configuredPath.trim().isEmpty()) { | |
| continue; | |
| } | |
| File photoFile = new File(configuredPath); | |
| if (!photoFile.exists()) { | |
| System.out.println("Skipping configured photo '" + src[0] + "': file not found at " + configuredPath); | |
| continue; | |
| } | |
| BufferedImage photo = ImageIO.read(photoFile); | |
| if (photo == null) { | |
| System.out.println("Skipping configured photo '" + src[0] + "': unreadable image at " + configuredPath); | |
| continue; | |
| } | |
| photo = resizeTo(photo, 128, 128); | |
| realPhotos.add(photo); | |
| realNames.add(src[0]); | |
| ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png")); | |
| } | |
| if (realPhotos.isEmpty()) { | |
| System.out.println( | |
| "No real photos configured. Set system properties " + | |
| "finalComparison.photo.river, finalComparison.photo.portrait, and/or " + | |
| "finalComparison.photo.landscape to include external photo inputs." | |
| ); | |
| } |
|
|
||
| // JUnit test entry point | ||
| @org.junit.jupiter.api.Test | ||
| void runFinalComparison() throws Exception { |
There was a problem hiding this comment.
runFinalComparison() invokes main() and performs a large amount of CPU work plus filesystem writes. As a regular JUnit test, this is likely to be very slow and flaky in CI (and it has non-test side effects by writing into test-results/). Consider marking it @Disabled by default (or @Tag("benchmark") + exclude from CI), and/or gating execution behind an explicit system property/env var so it only runs when intentionally requested.
| void runFinalComparison() throws Exception { | |
| void runFinalComparison() throws Exception { | |
| org.junit.jupiter.api.Assumptions.assumeTrue( | |
| Boolean.getBoolean("runFinalComparison"), | |
| "Skipping FinalComparisonTest by default. Run with -DrunFinalComparison=true to enable." | |
| ); |
Summary
FinalComparisonTest.java) that benchmarks all features OFF vs all features ON across 6 test images (3 synthetic + 3 real photos)test-results/final-comparison/Results
Proposals 1-5 (SA + error-guided + adaptive size + batch-parallel + TSP)
Aggregate: +0.17% (real photos benefit most)
All 6 Proposals (with Progressive Resolution)
Progressive resolution trades ~5% energy score for significantly faster generation. This is expected — it generates 40% of shapes at reduced resolution for speed.
Key Findings
See
test-results/final-comparison/COMPARISON.mdfor full statistical analysis with confidence intervals.Test plan
./gradlew clean build -x testpassesFinalComparisonTestgenerates all images and statistics🤖 Generated with Claude Code