
fix(copyright): tighten wrapper and notice extraction#847

Merged
mstykow merged 5 commits into main from verify/px4-eigen
May 5, 2026

Owner

@mstykow mstykow commented May 4, 2026

Summary

  • Verify PX4/eigen @ 7cf1c0179eb0f5499dfc1bffbd229783a7865fe1 with compare-outputs --profile common, record the reviewed benchmark result in docs/BENCHMARKS.md, and refresh docs/scan-duration-vs-files.svg.
  • Tighten shared copyright extraction so Provenant keeps meaningful raw notice text while stripping wrapper shells such as PRODUCT_COPYRIGHT, applicationLegalese, LegalCopyright, and storyboard text= wrappers from emitted copyrights.
  • Reduce shared false positives and boundary bleed across detector/refiner/scanner layers for generated-doc junk, table/test labels, minpack example prose, locale timestamp suffixes, and large bundled notice/license files, with focused regression coverage plus rerun compare validations on Flutter and Eigen.
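The wrapper-shell stripping described above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `strip_wrapper_shell`, the `WRAPPER_KEYS` list, and the separator and quote handling are assumptions made for illustration, and storyboard text= wrappers are not handled here.

```rust
/// Hypothetical wrapper keys, taken from the names listed in the summary.
/// The real list and matching rules in Provenant may differ.
const WRAPPER_KEYS: [&str; 3] = ["PRODUCT_COPYRIGHT", "applicationLegalese", "LegalCopyright"];

/// Strip a leading `KEY =` or `KEY:` shell, a trailing `;`, and surrounding
/// quotes, returning the inner notice text unchanged otherwise.
pub fn strip_wrapper_shell(raw: &str) -> &str {
    let trimmed = raw.trim();
    for key in WRAPPER_KEYS {
        if let Some(rest) = trimmed.strip_prefix(key) {
            let rest = rest.trim_start();
            // Only treat the key as a wrapper when a separator follows it,
            // so identifiers that merely start with a key are left alone.
            if let Some(value) = rest.strip_prefix('=').or_else(|| rest.strip_prefix(':')) {
                return value
                    .trim()
                    .trim_end_matches(';')
                    .trim()
                    .trim_matches(|c: char| c == '"' || c == '\'')
                    .trim();
            }
        }
    }
    // No wrapper shell found: keep the raw notice text as-is.
    trimmed
}
```

Note that the inner text, including a trailing "All rights reserved.", is returned untouched. Only the shell around it is removed.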

Issues

  • Covers:
  • Closes:

Scope and exclusions

  • Included:
    • Compare artifacts: .provenant/compare-runs/20260504T211702Z-eigen-62479/, .provenant/compare-runs/20260505T102504Z-flutter-12526/, and .provenant/compare-runs/20260505T104455Z-eigen-34962/
    • Focused regression tests in src/copyright/refiner/tests.rs, src/copyright/detector/tests.rs, and src/scanner/process/copyright_test.rs
    • Compare-layer URL slash-only delta suppression from the earlier branch work in xtask/src/bin/compare_outputs.rs
    • Benchmark/chart refresh and the parity-safe correct-copyright-minpack golden update
  • Explicit exclusions:
    • No parser/package-surface changes; PX4/eigen remains manifest-free under the shared profile
    • No compare-output policy change that hides raw copyright wrapper differences behind extra semantic normalization
    • No risky generic rewrite of malformed source-email tails; malformed Eigen <email lines remain source-faithful for now
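As a rough illustration of the compare-layer URL slash-only delta suppression listed under "Included", the sketch below treats two URL values as equivalent when they differ only in trailing slashes. This reading of "slash-only" is an assumption, and the function name is hypothetical. The actual logic in xtask/src/bin/compare_outputs.rs may differ.

```rust
/// Returns true when `a` and `b` are not identical but become identical
/// once trailing `/` characters are ignored (assumed meaning of a
/// "slash-only" delta).
pub fn differs_only_by_trailing_slash(a: &str, b: &str) -> bool {
    a != b && a.trim_end_matches('/') == b.trim_end_matches('/')
}
```

A comparison step could skip reporting a difference for any URL pair where this returns true, while still reporting every other URL mismatch.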

Intentional differences from Python

  • Preserve meaningful raw emitted copyright text such as All rights reserved. when it is part of the source notice, while keeping the normalized shadow/refined comparison semantics separate.
  • Preserve source-faithful names and Unicode when Provenant is more specific than ScanCode, rather than collapsing output back to ScanCode-style lossy normalization.

Follow-up work

  • Created or intentionally deferred:
    • Remaining large-bundle normalization and holder-boundary differences in Flutter sky_engine/LICENSE
    • Remaining Eigen naming/email-shape deltas where the source itself is malformed or Provenant is arguably more complete than ScanCode (for example Desire NUENTSA WAKAM)
    • Raw-output wording differences such as Copyright (c) Microsoft Corporation vs bare (c) Microsoft Corporation that are lower-priority than the false-positive and wrapper-shell fixes completed here

Expected-output fixture changes

  • Files changed: testdata/copyright-golden/copyrights/misco3/correct-copyright-minpack.txt.yml
  • Why the new expected output is correct: the detector now recovers the structured Copyright Notice (1999) University of Chicago form and matching holder directly from the copied fixture text, and the focused regression tests cover that behavior.

mstykow added 2 commits May 5, 2026 00:05
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow force-pushed the verify/px4-eigen branch from ea72463 to 7a500db May 4, 2026 22:07
mstykow added 2 commits May 5, 2026 00:40
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow changed the title from "fix(copyright): clean metadata header and notice detections" to "fix(copyright): tighten wrapper and notice extraction" May 5, 2026
Signed-off-by: Maxim Stykow <maxim.stykow@gmail.com>
@mstykow mstykow merged commit c7dfcfb into main May 5, 2026
15 checks passed
@mstykow mstykow deleted the verify/px4-eigen branch May 5, 2026 11:22
