
fix(copyright): generalize compare-target cleanup #805

Merged
mstykow merged 12 commits into main from verify/flutter-flutter
Apr 28, 2026

Conversation

mstykow (Owner) commented Apr 27, 2026

Summary

  • generalize shared copyright, holder, and author cleanup across detector/refiner/scanner paths so compare-target noise is reduced without introducing Flutter-specific string blacklists
  • add supporting extraction and normalization work for storyboard/XML text= copyright evidence, image metadata dedupe, ISO-date/unknown-year holder recovery, templated URL cleanup, escaped SPDX tag normalization, and a final cleanup pass for regex/template junk, generated-resource email noise, and distribution-metadata author tails
  • validate the branch with focused regressions, copyright golden + library-profile checks, a documented Flutter benchmark checkpoint, a clean pnpm compare check, and fresh exact compare reruns with ScanCode cache hits
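
To illustrate what "generalized" cleanup can mean in contrast to a target-specific string blacklist, here is a minimal Python sketch that rejects holder candidates by shape rather than by exact text. It is not Provenant's implementation: the function names, the punctuation set, and the keyword set are all assumptions made for this example.

```python
import re

# Hypothetical structural signals that a "holder" candidate is code or template
# residue rather than a name. None of these names come from Provenant.
_CODE_PUNCTUATION = re.compile(r"::|->|[{}\[\];=]")
_TEMPLATE_PLACEHOLDER = re.compile(r"\$\{[^}]*\}|\{\{[^}]*\}\}|%[sd]")
_CODE_KEYWORDS = frozenset({"void", "return", "null", "nullptr", "const"})


def looks_like_code_fragment(candidate: str) -> bool:
    """Return True when a candidate has the shape of code, not of a name."""
    text = candidate.strip()
    if not text:
        return True
    if _CODE_PUNCTUATION.search(text) or _TEMPLATE_PLACEHOLDER.search(text):
        return True
    # A candidate made up only of language keywords carries no holder name.
    return all(word.lower() in _CODE_KEYWORDS for word in text.split())


def clean_holders(candidates: list[str]) -> list[str]:
    """Drop code-shaped candidates and exact duplicates, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for candidate in candidates:
        normalized = " ".join(candidate.split())  # collapse runs of whitespace
        if looks_like_code_fragment(normalized) or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept
```

Because the rules look at structure only, the same filter applies to any compare target. The trade-off is that a legitimate holder containing one of the listed punctuation marks would also be dropped, so a real rule set would need regression fixtures to guard against that.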

Issues

  • Covers: flutter/flutter residual compare cleanup, pnpm/pnpm regression safety, and broader cross-target copyright/holder/author false positives
  • Closes:

Scope and exclusions

  • Included: shared detector/refiner/scanner cleanup, metadata extraction refinements, finder/SPDX normalization fixes, benchmark documentation refresh, two copyright golden expectation updates, and compare-run verification evidence
  • Explicit exclusions: no claim that flutter/flutter is fully clean yet; remaining sampled deltas still include planet.frag normalization duplicates, the Dart project authors vs the Dart project, Flutter code-fragment leftovers like Copyright 0 absl::StrCat(errors 0 ) / Copyright void, and scancode-toolkit fixture-token leftovers like AUTH AUTHS 2730, COMPANY 1411, and MAINT 26382

Intentional differences from Python

  • Provenant continues to preserve structured contributor/author evidence from committed AUTHORS files and labeled image metadata when that evidence is present, even where ScanCode often leaves those surfaces empty

Follow-up work

  • Created or intentionally deferred: if we continue, the next high-value buckets are the remaining code-fragment junk in Flutter license-checker fixture text and the remaining uppercase fixture-token author noise in scancode-toolkit src/cluecode/copyrights.py
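
As a rough illustration of the deferred uppercase fixture-token bucket, the sketch below flags author candidates that consist only of all-caps words followed by a number, which is the shape of the leftovers listed above (AUTH AUTHS 2730, COMPANY 1411, MAINT 26382). This is one possible heuristic, not something this PR implements, and the function name is hypothetical.

```python
import re

# Assumption: fixture-token noise is one or more ALL-CAPS words followed by a
# bare integer. This pattern is illustrative and not taken from Provenant.
_FIXTURE_TOKEN = re.compile(r"^(?:[A-Z]{2,}\s+)+\d+$")


def looks_like_fixture_token(author: str) -> bool:
    """Return True for candidates shaped like 'AUTH AUTHS 2730'."""
    return _FIXTURE_TOKEN.match(author.strip()) is not None


def drop_fixture_tokens(authors: list[str]) -> list[str]:
    """Keep only the candidates that do not match the fixture-token shape."""
    return [author for author in authors if not looks_like_fixture_token(author)]
```

A shape this broad would also reject a genuine all-caps name followed by a number, so it would need the same golden and library-profile validation as the rest of the cleanup before it could be trusted.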

Expected-output fixture changes

  • Files changed: testdata/copyright-golden/copyrights/misco4/linux-copyrights/Documentation/bpf/bpf_devel_QA.rst.yml, testdata/copyright-golden/copyrights/licco.txt.yml
  • Why the new expected output is correct: the old expectations kept stale false-positive author forms; after the generalized cleanup the focused regressions, copyright golden, and library-profile validation all pass with the corrected expectations

Compare artifacts

  • Flutter benchmark/checkpoint recorded in docs: .provenant/compare-runs/20260427T223247Z-flutter-8526/
  • pnpm clean validation checkpoint: .provenant/compare-runs/20260428T124247Z-pnpm-1113/
  • exact rerun at pushed HEAD before the final mini-pass: .provenant/compare-runs/20260428T135306Z-scancode-toolkit-92561/ (ScanCode cache hit)
  • exact rerun at pushed HEAD before the final mini-pass: .provenant/compare-runs/20260428T140858Z-camel-12496/ (ScanCode cache hit)
  • latest dirty-tree rerun including the final mini-pass: .provenant/compare-runs/20260428T151953Z-flutter-6927/ (ScanCode cache hit)
  • latest dirty-tree rerun including the final mini-pass: .provenant/compare-runs/20260428T152754Z-scancode-toolkit-16046/ (ScanCode cache hit)
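
The artifact directory names above appear to follow a timestamp-target-number pattern. The following Python sketch splits such a path into its parts so runs can be sorted or grouped by target. The layout is inferred from the six paths listed here, the trailing `Z` is assumed to mean UTC, and the meaning of the numeric suffix is not stated in this PR, so it is kept as an opaque string. `CompareRun` and `parse_compare_run` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class CompareRun(NamedTuple):
    started_at: datetime  # assumed to be UTC because of the trailing "Z"
    target: str           # e.g. "flutter" or "scancode-toolkit"
    suffix: str           # numeric tail; its meaning is not documented here


# Inferred layout: <YYYYMMDD>T<HHMMSS>Z-<target>-<digits>
_RUN_DIR = re.compile(r"^(?P<stamp>\d{8}T\d{6}Z)-(?P<target>.+)-(?P<suffix>\d+)$")


def parse_compare_run(path: str) -> CompareRun:
    """Parse a compare-run directory path into timestamp, target, and suffix."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = _RUN_DIR.match(name)
    if match is None:
        raise ValueError(f"unrecognized compare-run directory: {path!r}")
    started_at = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%SZ").replace(
        tzinfo=timezone.utc
    )
    return CompareRun(started_at, match["target"], match["suffix"])
```

The greedy target group lets hyphenated targets such as scancode-toolkit parse correctly, since only the final run of digits is treated as the suffix.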

mstykow and others added 10 commits April 27, 2026 23:28
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow changed the title from "fix(copyright): harden author extraction for compare targets" to "fix(copyright): generalize compare-target cleanup" Apr 28, 2026
mstykow and others added 2 commits April 28, 2026 17:55
@mstykow mstykow enabled auto-merge (rebase) April 28, 2026 16:00
@mstykow mstykow merged commit 3089080 into main Apr 28, 2026
15 checks passed
@mstykow mstykow deleted the verify/flutter-flutter branch April 28, 2026 16:04