
fix(scan-result-shaping): remove root directory entry when --strip-root is set#853

Merged
abraemer merged 1 commit into main from fix/strip-root-remove-root-entry
May 6, 2026
@abraemer abraemer commented May 6, 2026

Summary

  • --strip-root now excludes the root directory entry from the files output array, matching ScanCode's skip_root=True behavior
  • Fixes a duplicate-path collision bug when a child directory shares the root's name (e.g., myproject/myproject/inner.txt produced two entries with path myproject)
  • Single-resource scans (one file or one empty directory) preserve the root entry, consistent with ScanCode's has_single_resource exception

Details

Previously, normalize_paths with --strip-root kept the root directory entry in the output but renamed its path to just the basename (e.g., /tmp/myproject became myproject). ScanCode instead uses skip_root=True in Codebase.walk() to exclude the root entry entirely. This divergence caused duplicate path entries in the collision case and was a general parity bug.

The fix adds a dedicated step at the start of normalize_paths that removes the root directory entry when strip_root=true and the scan has more than one resource. The path normalization loop continues unchanged after removal.
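For readers who want the shape of the change without opening the diff, here is a minimal sketch of the removal step. It is not the code from this PR: the trimmed-down FileInfo, the normalize_paths_sketch name, the root argument, and matching the root entry by exact path are all assumptions made for illustration, and the prefix stripping is deliberately simplified.

```rust
/// Hypothetical, trimmed-down stand-in for the project's `FileInfo`.
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub path: String,
    pub is_dir: bool,
}

/// Illustrative only: drops the root directory entry first, then makes the
/// remaining paths relative to the root.
pub fn normalize_paths_sketch(files: &mut Vec<FileInfo>, root: &str, strip_root: bool) {
    if !strip_root {
        // Default behavior: the root entry and all paths stay untouched.
        return;
    }
    let root = root.trim_end_matches('/');

    // Dedicated removal step. A single-resource scan (one file or one
    // empty directory) keeps its only entry.
    if files.len() > 1 {
        // Assumption: the root entry is the directory whose path equals the
        // scan root. Only that one entry is removed, so a child directory
        // that shares the root's name survives.
        if let Some(idx) = files.iter().position(|f| f.is_dir && f.path == root) {
            files.remove(idx);
        }
    }

    // Simplified normalization loop: strip the "<root>/" prefix.
    let prefix = format!("{}/", root);
    for f in files.iter_mut() {
        let relative = f.path.strip_prefix(&prefix).map(|rel| rel.to_owned());
        if let Some(rel) = relative {
            f.path = rel;
        }
    }
}
```

With entries /tmp/myproject, /tmp/myproject/myproject and /tmp/myproject/myproject/inner.txt, this sketch leaves myproject and myproject/inner.txt, so the path myproject appears only once.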

The normalize_paths_for_test signature was updated from &mut [FileInfo] to &mut Vec<FileInfo>, and a workaround in golden test utilities that re-added the root entry after normalization was removed (no longer needed).
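The signature change follows from Rust's borrowing rules rather than from anything specific to this codebase: a mutable slice can rewrite elements in place but cannot change its length, so a step that removes an entry needs the owning Vec. A small sketch with hypothetical helper names, using plain String paths instead of FileInfo:

```rust
/// In-place edits only: a `&mut [T]` cannot grow or shrink.
pub fn trim_leading_slash(paths: &mut [String]) {
    for p in paths.iter_mut() {
        let trimmed = p.trim_start_matches('/').to_owned();
        *p = trimmed;
    }
}

/// Removing an element changes the length, which requires `&mut Vec<T>`.
pub fn drop_first_match(paths: &mut Vec<String>, target: &str) {
    if let Some(idx) = paths.iter().position(|p| p.as_str() == target) {
        paths.remove(idx);
    }
}
```

A caller holding a Vec can still pass it to the slice-based helper through deref coercion, so widening the parameter to &mut Vec only restricts callers that had nothing but a slice.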

Test coverage

  • normalize_paths_strip_root_removes_root_directory_entry — root dir removed, children kept
  • normalize_paths_strip_root_keeps_root_entry_for_single_file_scan — single-resource file exception
  • normalize_paths_strip_root_keeps_root_entry_for_single_directory_scan — single-resource empty-dir exception
  • normalize_paths_strip_root_removes_root_but_keeps_same_named_child — collision case no longer produces duplicates
  • normalize_paths_without_strip_root_keeps_root_directory_entry — default behavior unchanged

…ot is set

When --strip-root is used, Provenant now excludes the root directory
entry from the files array, matching ScanCode's skip_root=True behavior.
Previously, the root entry was kept with its path shortened to just the
basename, which caused duplicate path entries when a child directory
shared the root's name (e.g. project/project/inner.txt).

Single-resource scans (one file or one empty directory) preserve the
root entry, consistent with ScanCode's has_single_resource exception.

Signed-off-by: Adrian Braemer <bradrian@gmail.com>
@abraemer abraemer merged commit 4cef65b into main May 6, 2026
15 checks passed
@abraemer abraemer deleted the fix/strip-root-remove-root-entry branch May 6, 2026 11:03