Skip to content

Final comparison: all 6 proposals combined#8

Open
VoX wants to merge 1 commit intomainfrom
docs/final-comparison
Open

Final comparison: all 6 proposals combined#8
VoX wants to merge 1 commit intomainfrom
docs/final-comparison

Conversation

@VoX
Copy link
Copy Markdown
Owner

@VoX VoX commented Apr 4, 2026

Summary

  • Adds a comprehensive comparison test (FinalComparisonTest.java) that benchmarks all features OFF vs all features ON across 6 test images (3 synthetic + 3 real photos)
  • Runs each configuration 5 times for statistical rigor with paired t-test analysis
  • Generates before/after images, target references, and diff heatmaps in test-results/final-comparison/

Results

Proposals 1-5 (SA + error-guided + adaptive size + batch-parallel + TSP)

Image Before After Improvement p-value Sig?
photo_detail 0.250774 0.250016 +0.30% 0.1051 No
nature 0.058198 0.058940 -1.27% 1.0000 No
edges 0.244876 0.245867 -0.40% 1.0000 No
river 0.090141 0.089522 +0.69% 0.0410 Yes
portrait 0.062045 0.060768 +2.06% 0.0429 Yes
landscape 0.063394 0.063036 +0.56% 0.0695 No

Aggregate: +0.17% (real photos benefit most)

All 6 Proposals (with Progressive Resolution)

Progressive resolution trades ~5% energy score for significantly faster generation. This is expected — it generates 40% of shapes at reduced resolution for speed.

Key Findings

  • Real photographs show the clearest improvement from proposals 1-5 (portrait: +2.06%, river: +0.69%)
  • Synthetic test images show mixed results — the baseline was already near-optimal for simple patterns
  • Progressive resolution adds speed at a small quality cost; appropriate for interactive use
  • Batch-parallel energy (Proposal 4) and TSP optimization (Proposal 5) primarily improve performance/drawing efficiency rather than energy scores

See test-results/final-comparison/COMPARISON.md for full statistical analysis with confidence intervals.

Test plan

  • ./gradlew clean build -x test passes
  • FinalComparisonTest generates all images and statistics
  • All comparison images committed to repo

🤖 Generated with Claude Code

Generates before/after images for 6 test images (3 synthetic + 3 real photos)
with 5 runs each for statistical rigor. Includes paired t-test results,
diff heatmaps, and a summary COMPARISON.md documenting energy scores across
all configurations (all-off, proposals 1-5, all 6 proposals).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 4, 2026 23:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new end-to-end “final comparison” benchmark/test that runs the generator with all features OFF vs all features ON (including progressive resolution), repeats runs for basic statistical analysis, and writes a Markdown report plus before/after/target/diff images under test-results/final-comparison/.

Changes:

  • Add FinalComparisonTest to generate images, compute summary stats, and emit COMPARISON.md.
  • Commit the resulting comparison artifacts (targets, before/after renders, diff heatmaps, and photo copies).
  • Add a gradlew wrapper script to the repo.

Reviewed changes

Copilot reviewed 2 out of 30 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test-results/final-comparison/COMPARISON.md Generated report summarizing per-image scores, significance, and linking rendered artifacts
test-results/final-comparison/edges_after.png Generated “all ON” output image for edges case
test-results/final-comparison/edges_before.png Generated “all OFF” output image for edges case
test-results/final-comparison/edges_diff.png Generated diff heatmap for edges case
test-results/final-comparison/edges_target.png Generated target/reference image for edges case
test-results/final-comparison/landscape_after.png Generated “all ON” output image for landscape case
test-results/final-comparison/landscape_before.png Generated “all OFF” output image for landscape case
test-results/final-comparison/landscape_diff.png Generated diff heatmap for landscape case
test-results/final-comparison/landscape_target.png Generated target/reference image for landscape case
test-results/final-comparison/nature_after.png Generated “all ON” output image for nature case
test-results/final-comparison/nature_before.png Generated “all OFF” output image for nature case
test-results/final-comparison/nature_diff.png Generated diff heatmap for nature case
test-results/final-comparison/nature_target.png Generated target/reference image for nature case
test-results/final-comparison/photo_detail_after.png Generated “all ON” output image for photo_detail case
test-results/final-comparison/photo_detail_before.png Generated “all OFF” output image for photo_detail case
test-results/final-comparison/photo_detail_diff.png Generated diff heatmap for photo_detail case
test-results/final-comparison/photo_detail_target.png Generated target/reference image for photo_detail case
test-results/final-comparison/portrait_after.png Generated “all ON” output image for portrait case
test-results/final-comparison/portrait_before.png Generated “all OFF” output image for portrait case
test-results/final-comparison/portrait_diff.png Generated diff heatmap for portrait case
test-results/final-comparison/portrait_target.png Generated target/reference image for portrait case
test-results/final-comparison/river_after.png Generated “all ON” output image for river case
test-results/final-comparison/river_before.png Generated “all OFF” output image for river case
test-results/final-comparison/river_diff.png Generated diff heatmap for river case
test-results/final-comparison/river_target.png Generated target/reference image for river case
test-results/final-comparison/photos/landscape.png Committed photo input copy used by final comparison
test-results/final-comparison/photos/portrait.png Committed photo input copy used by final comparison
test-results/final-comparison/photos/river.png Committed photo input copy used by final comparison
src/test/java/com/bobrust/generator/FinalComparisonTest.java New benchmark-style test that runs comparisons, computes stats, and writes artifacts
gradlew Adds Gradle wrapper start script

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +30 to +38
/**
* Runs the generator with SA, error-guided placement, and adaptive sizing OFF.
* - No error map (uniform random placement)
* - No gradient map (uniform random sizing)
* - Classic hill climbing (no SA)
* - Energy evaluation still uses whatever is compiled (batch-parallel doesn't affect quality)
* - No progressive resolution
*
* This mirrors the original algorithm before any proposals were added.
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc says the "all features OFF" baseline uses "classic energy", but Worker.getEnergy() always calls BorstCore.differencePartialThread(...), which will use the batch-parallel combined pass when AppConstants.USE_BATCH_PARALLEL is true. That means the baseline isn’t actually forcing classic energy evaluation. Consider either (a) explicitly routing baseline energy through differencePartialThreadClassic(...) (e.g., via a specialized Worker/State for this test) or (b) updating the doc/comments to reflect that baseline energy depends on compile-time flags.

Copilot uses AI. Check for mistakes.
int times = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);

for (int i = 0; i < maxShapes; i++) {
worker.init(model.current, model.score);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This baseline setup directly reads model.score (a protected field). Using the public accessor (model.getScore()) would make this test less coupled to Model internals and less fragile if visibility changes.

Suggested change
worker.init(model.current, model.score);
worker.init(model.current, model.getScore());

Copilot uses AI. Check for mistakes.
Comment on lines +217 to +229
double x = t * (1 - 1.0 / (4 * df)) / Math.sqrt(1 + t * t / (2.0 * df));
return 1.0 - normalCDF(x);
}

private static double normalCDF(double x) {
if (x < -8) return 0;
if (x > 8) return 1;
double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
double d = 0.3989422804014327;
double p = d * Math.exp(-x * x / 2.0) *
(t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429)))));
return x > 0 ? 1.0 - p : p;
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The p-value calculation is labeled/reporting as a "paired one-sided t-test", but approximatePValue(...) uses a normal approximation (and ignores the Student’s t CDF), which can be noticeably inaccurate for df=4 (NUM_RUNS=5). Either compute the one-sided p-value from a proper t distribution (e.g., via a small stats utility or a dependency like Commons Math) or update the report text to clearly state it’s a normal approximation of the t-statistic.

Suggested change
double x = t * (1 - 1.0 / (4 * df)) / Math.sqrt(1 + t * t / (2.0 * df));
return 1.0 - normalCDF(x);
}
private static double normalCDF(double x) {
if (x < -8) return 0;
if (x > 8) return 1;
double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
double d = 0.3989422804014327;
double p = d * Math.exp(-x * x / 2.0) *
(t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429)))));
return x > 0 ? 1.0 - p : p;
}
if (df <= 0) {
throw new IllegalArgumentException("Degrees of freedom must be positive");
}
double x = df / (df + t * t);
return 0.5 * regularizedBeta(x, df / 2.0, 0.5);
}
private static double regularizedBeta(double x, double a, double b) {
if (x <= 0.0) return 0.0;
if (x >= 1.0) return 1.0;
double bt = Math.exp(
logGamma(a + b) - logGamma(a) - logGamma(b)
+ a * Math.log(x) + b * Math.log(1.0 - x)
);
if (x < (a + 1.0) / (a + b + 2.0)) {
return bt * betaContinuedFraction(x, a, b) / a;
}
return 1.0 - bt * betaContinuedFraction(1.0 - x, b, a) / b;
}
private static double betaContinuedFraction(double x, double a, double b) {
final int maxIterations = 10000;
final double epsilon = 3.0e-14;
final double fpMin = 1.0e-300;
double qab = a + b;
double qap = a + 1.0;
double qam = a - 1.0;
double c = 1.0;
double d = 1.0 - qab * x / qap;
if (Math.abs(d) < fpMin) d = fpMin;
d = 1.0 / d;
double h = d;
for (int m = 1; m <= maxIterations; m++) {
int m2 = 2 * m;
double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
d = 1.0 + aa * d;
if (Math.abs(d) < fpMin) d = fpMin;
c = 1.0 + aa / c;
if (Math.abs(c) < fpMin) c = fpMin;
d = 1.0 / d;
h *= d * c;
aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
d = 1.0 + aa * d;
if (Math.abs(d) < fpMin) d = fpMin;
c = 1.0 + aa / c;
if (Math.abs(c) < fpMin) c = fpMin;
d = 1.0 / d;
double delta = d * c;
h *= delta;
if (Math.abs(delta - 1.0) < epsilon) {
break;
}
}
return h;
}
private static double logGamma(double x) {
double[] coefficients = {
676.5203681218851,
-1259.1392167224028,
771.3234287776531,
-176.6150291621406,
12.507343278686905,
-0.13857109526572012,
9.984369578019572e-6,
1.5056327351493116e-7
};
if (x < 0.5) {
return Math.log(Math.PI) - Math.log(Math.sin(Math.PI * x)) - logGamma(1.0 - x);
}
x -= 1.0;
double sum = 0.99999999999980993;
for (int i = 0; i < coefficients.length; i++) {
sum += coefficients[i] / (x + i + 1.0);
}
double tmp = x + coefficients.length - 0.5;
return 0.5 * Math.log(2.0 * Math.PI) + (x + 0.5) * Math.log(tmp) - tmp + Math.log(sum);
}

Copilot uses AI. Check for mistakes.
Comment on lines +301 to +324
// Real photos
File photoDir = new File("test-results/final-comparison/photos");
photoDir.mkdirs();
String[][] photoSources = {
{"river", "/tmp/photo1.jpg"},
{"portrait", "/tmp/photo2.jpg"},
{"landscape", "/tmp/photo3.jpg"}
};

List<BufferedImage> realPhotos = new ArrayList<>();
List<String> realNames = new ArrayList<>();
for (String[] src : photoSources) {
File photoFile = new File(src[1]);
if (photoFile.exists()) {
BufferedImage photo = ImageIO.read(photoFile);
if (photo != null) {
photo = resizeTo(photo, 128, 128);
realPhotos.add(photo);
realNames.add(src[0]);
ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png"));
}
}
}

Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coding photo inputs to /tmp/photo1.jpg, etc. makes this test non-reproducible across machines/CI and also decoupled from the committed test-results artifacts. Prefer checking in known test photos under src/test/resources (or test-results/final-comparison/photos) and loading them via the classpath, or make the paths configurable via a system property and clearly skip with a message when inputs aren’t available.

Suggested change
// Real photos
File photoDir = new File("test-results/final-comparison/photos");
photoDir.mkdirs();
String[][] photoSources = {
{"river", "/tmp/photo1.jpg"},
{"portrait", "/tmp/photo2.jpg"},
{"landscape", "/tmp/photo3.jpg"}
};
List<BufferedImage> realPhotos = new ArrayList<>();
List<String> realNames = new ArrayList<>();
for (String[] src : photoSources) {
File photoFile = new File(src[1]);
if (photoFile.exists()) {
BufferedImage photo = ImageIO.read(photoFile);
if (photo != null) {
photo = resizeTo(photo, 128, 128);
realPhotos.add(photo);
realNames.add(src[0]);
ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png"));
}
}
}
// Real photos (optional, configurable via system properties)
File photoDir = new File("test-results/final-comparison/photos");
photoDir.mkdirs();
String[][] photoSources = {
{"river", System.getProperty("finalComparison.photo.river")},
{"portrait", System.getProperty("finalComparison.photo.portrait")},
{"landscape", System.getProperty("finalComparison.photo.landscape")}
};
List<BufferedImage> realPhotos = new ArrayList<>();
List<String> realNames = new ArrayList<>();
for (String[] src : photoSources) {
String configuredPath = src[1];
if (configuredPath == null || configuredPath.trim().isEmpty()) {
continue;
}
File photoFile = new File(configuredPath);
if (!photoFile.exists()) {
System.out.println("Skipping configured photo '" + src[0] + "': file not found at " + configuredPath);
continue;
}
BufferedImage photo = ImageIO.read(photoFile);
if (photo == null) {
System.out.println("Skipping configured photo '" + src[0] + "': unreadable image at " + configuredPath);
continue;
}
photo = resizeTo(photo, 128, 128);
realPhotos.add(photo);
realNames.add(src[0]);
ImageIO.write(photo, "png", new File(photoDir, src[0] + ".png"));
}
if (realPhotos.isEmpty()) {
System.out.println(
"No real photos configured. Set system properties " +
"finalComparison.photo.river, finalComparison.photo.portrait, and/or " +
"finalComparison.photo.landscape to include external photo inputs."
);
}

Copilot uses AI. Check for mistakes.

// JUnit test entry point
@org.junit.jupiter.api.Test
void runFinalComparison() throws Exception {
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runFinalComparison() invokes main() and performs a large amount of CPU work plus filesystem writes. As a regular JUnit test, this is likely to be very slow and flaky in CI (and it has non-test side effects by writing into test-results/). Consider marking it @Disabled by default (or @Tag("benchmark") + exclude from CI), and/or gating execution behind an explicit system property/env var so it only runs when intentionally requested.

Suggested change
void runFinalComparison() throws Exception {
void runFinalComparison() throws Exception {
org.junit.jupiter.api.Assumptions.assumeTrue(
Boolean.getBoolean("runFinalComparison"),
"Skipping FinalComparisonTest by default. Run with -DrunFinalComparison=true to enable."
);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants