Conversation
Replace the pure hill-climbing refinement in HillClimbGenerator with simulated annealing (SA) that can escape local minima by probabilistically accepting worse moves early in the search. The old hill climbing is kept as a fallback behind the USE_SIMULATED_ANNEALING flag in AppConstants. Key changes: - HillClimbGenerator: add getHillClimbSA(), estimateTemperature(), and computeCoolingRate() methods; dispatch via feature flag in getHillClimb() - Model: increase parallel SA chains (times) to use availableProcessors/2 - AppConstants: add USE_SIMULATED_ANNEALING boolean flag (default true) - Add JUnit 5 benchmark tests comparing SA vs classic hill climbing - Add programmatically generated test images (128x128) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a simulated annealing (SA) based optimizer for shape refinement to better escape local minima, while keeping the classic hill climb path available behind a feature flag.
Changes:
- Added SA optimizer (
getHillClimbSA) with temperature estimation and geometric cooling; classic hill climb retained asgetHillClimbClassicand dispatched viaUSE_SIMULATED_ANNEALING. - Increased per-step optimization attempts (
times) to scale with CPU count (availableProcessors()/2). - Added JUnit benchmarks/regression tests plus programmatic test-image generation; also added PNGs under test resources.
Reviewed changes
Copilot reviewed 5 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/main/java/com/bobrust/generator/HillClimbGenerator.java |
Adds SA implementation + temperature schedule helpers; keeps classic variant and dispatches via feature flag. |
src/main/java/com/bobrust/generator/Model.java |
Scales times with CPU count to run more optimization chains per step. |
src/main/java/com/bobrust/util/data/AppConstants.java |
Adds USE_SIMULATED_ANNEALING feature flag (default true). |
src/test/java/com/bobrust/generator/SimulatedAnnealingBenchmark.java |
Adds tests comparing SA vs classic and validating temperature schedule. |
src/test/java/com/bobrust/generator/TestImageGenerator.java |
Adds deterministic in-memory test image generation + a helper main to write PNGs. |
src/test/resources/test-images/solid.png |
Adds generated test image (currently unused by tests). |
src/test/resources/test-images/gradient.png |
Adds generated test image (currently unused by tests). |
src/test/resources/test-images/edges.png |
Adds generated test image (currently unused by tests). |
src/test/resources/test-images/photo_detail.png |
Adds generated test image (currently unused by tests). |
src/test/resources/test-images/nature.png |
Adds generated test image (currently unused by tests). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| */ | ||
| static float computeCoolingRate(float initialTemp, int maxAge) { | ||
| int totalIterations = maxAge * 10; | ||
| float finalTemp = 0.001f; |
There was a problem hiding this comment.
computeCoolingRate can divide by zero when maxAge is 0 (since totalIterations = maxAge * 10 becomes 0, and the later Math.pow(..., 1.0 / totalIterations) becomes undefined). Add an explicit guard for maxAge <= 0 (and/or totalIterations <= 0) to return a sane rate (e.g., 1.0f) to keep the method safe for all callers.
| float finalTemp = 0.001f; | |
| float finalTemp = 0.001f; | |
| if (maxAge <= 0 || totalIterations <= 0) { | |
| return 1.0f; // no cooling when iteration budget is invalid | |
| } |
| // -1/ln(0.6) ≈ 1.957, but we use avgDelta / 0.5108 which is equivalent | ||
| return (float) (avgDelta / 0.5108); |
There was a problem hiding this comment.
estimateTemperature uses the magic constant 0.5108 (approx -ln(0.6)) which makes the math harder to audit and invites drift from the comment. Consider computing it from Math.log(0.6) (or declaring a named constant) to keep the intent precise and self-documenting.
| // -1/ln(0.6) ≈ 1.957, but we use avgDelta / 0.5108 which is equivalent | |
| return (float) (avgDelta / 0.5108); | |
| return (float) (avgDelta / -Math.log(0.6)); |
| private static final int max_random_states = 1000; | ||
| private static final int age = 100; | ||
| private static final int times = 1; | ||
| private static final int times = Math.max(1, Runtime.getRuntime().availableProcessors() / 2); |
There was a problem hiding this comment.
Deriving times directly from availableProcessors()/2 can scale work linearly with core count (each processStep() does times full hillclimb runs), which can cause large latency spikes on high-core machines. Consider adding an upper cap and/or making this configurable (e.g., via a constant/setting) so performance remains predictable.
| private static final int times = Math.max(1, Runtime.getRuntime().availableProcessors() / 2); | |
| private static final int max_times = 8; | |
| private static final int times = Math.min(max_times, | |
| Math.max(1, Runtime.getRuntime().availableProcessors() / 2)); |
| /** Reflective helper to get the worker from Model (package-private field) */ | ||
| private static Worker getWorker(Model model) { | ||
| try { | ||
| var field = Model.class.getDeclaredField("worker"); | ||
| field.setAccessible(true); | ||
| return (Worker) field.get(model); | ||
| } catch (Exception e) { | ||
| throw new RuntimeException(e); | ||
| } |
There was a problem hiding this comment.
The test uses reflection to access private internals (Model.worker and Model.addShape), which is brittle and can break on refactors or stricter JVM access rules. Prefer exposing a package-private test hook (e.g., a package-private accessor/add method) or restructuring the benchmark to avoid reflective access.
| System.out.println("Hill climb score: " + hillClimbScore); | ||
| System.out.println("SA score: " + saScore); | ||
| float improvement = (hillClimbScore - saScore) / hillClimbScore * 100; | ||
| System.out.println("SA improvement: " + improvement + "%"); |
There was a problem hiding this comment.
Avoid System.out.println in unit tests; it adds noise to CI logs and can slow execution. Prefer assertions-only, or log via the test framework when debugging is explicitly enabled.
| import org.junit.jupiter.api.BeforeAll; | ||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| import java.awt.*; | ||
| import java.awt.image.BufferedImage; | ||
| import java.awt.image.DataBufferInt; | ||
| import java.io.File; | ||
| import java.io.IOException; | ||
| import java.util.ArrayList; | ||
| import java.util.Arrays; | ||
| import java.util.List; | ||
| import javax.imageio.ImageIO; |
There was a problem hiding this comment.
Several imports appear unused in this test class (BeforeAll, DataBufferInt, File, IOException, ImageIO). Removing unused imports will keep the test file focused and avoid confusion about intended resource-based testing.
| import org.junit.jupiter.api.BeforeAll; | |
| import org.junit.jupiter.api.Test; | |
| import java.awt.*; | |
| import java.awt.image.BufferedImage; | |
| import java.awt.image.DataBufferInt; | |
| import java.io.File; | |
| import java.io.IOException; | |
| import java.util.ArrayList; | |
| import java.util.Arrays; | |
| import java.util.List; | |
| import javax.imageio.ImageIO; | |
| import org.junit.jupiter.api.Test; | |
| import java.awt.*; | |
| import java.awt.image.BufferedImage; | |
| import java.util.ArrayList; | |
| import java.util.Arrays; | |
| import java.util.List; |
| // SA should not be dramatically worse (allow 5% tolerance due to stochastic nature) | ||
| assertTrue(saScore <= hillClimbScore * 1.05f, | ||
| "SA score (" + saScore + ") should not be significantly worse than hill climb (" + hillClimbScore + ")"); | ||
| } | ||
|
|
||
| // ---- Test 4: Regression — No Worse Than Baseline on Multiple Images ---- | ||
|
|
||
| @Test | ||
| void testSANeverSignificantlyWorse() { | ||
| BufferedImage[] images = { | ||
| TestImageGenerator.createSolid(), | ||
| TestImageGenerator.createGradient(), | ||
| TestImageGenerator.createEdges(), | ||
| }; | ||
| String[] names = {"solid", "gradient", "edges"}; | ||
| int maxShapes = 30; // Small for test speed | ||
|
|
||
| for (int idx = 0; idx < images.length; idx++) { | ||
| float hcScore = runGenerator(images[idx], maxShapes, false); | ||
| float saScore = runGenerator(images[idx], maxShapes, true); | ||
| System.out.println(names[idx] + " — HC: " + hcScore + ", SA: " + saScore); | ||
|
|
||
| // SA should never be more than 5% worse (stochastic tolerance) | ||
| assertTrue(saScore <= hcScore * 1.05f, | ||
| names[idx] + ": SA (" + saScore + ") should not be significantly worse than HC (" + hcScore + ")"); |
There was a problem hiding this comment.
These assertions compare two stochastic optimizers using unseeded randomness (Worker.getRandom() uses ThreadLocalRandom), so the 5% tolerance can still lead to intermittent CI failures. To avoid flaky tests, consider injecting a deterministic RNG/seed for tests or running multiple trials and asserting on an aggregate (median/mean) rather than a single run.
Summary
USE_SIMULATED_ANNEALINGflag inAppConstants(defaults totrue)timesinModel.java) toavailableProcessors / 2for better explorationChanges
HillClimbGenerator.javagetHillClimbSA(),estimateTemperature(),computeCoolingRate(); rename old method togetHillClimbClassic(); dispatch via feature flagModel.javatimesnow usesRuntime.getRuntime().availableProcessors() / 2instead of hardcoded1AppConstants.javaUSE_SIMULATED_ANNEALINGboolean flagSimulatedAnnealingBenchmark.javaTestImageGenerator.javaHow SA works
maxAge * 10iterationsexp(-delta / T); best-seen state tracked and returnedTest plan
testTemperatureSchedule— verifies temperature is positive/finite, cooling rate is in (0,1), final temp near zerotestSAProducesLowerOrEqualEnergy— SA score within 5% of HC on photo detail imagetestSANeverSignificantlyWorse— SA within 5% of HC on solid, gradient, and edges imagestestCoolingRateEdgeCases— handles tiny and large initial temperatures./gradlew clean buildpasses cleanly🤖 Generated with Claude Code