
fix: Refactor site-list generator for deterministic sorting, safer wr…#2905

Open
atewari-rh wants to merge 1 commit into sherlock-project:master from atewari-rh:refactor_sherlock_data_json


atewari-rh commented Apr 17, 2026

Overview

This pull request improves the devel/site-list.py workflow so that site metadata generation is deterministic, safe to run repeatedly, and easy to validate in CI.
The script now enforces project-specific ordering rules (digit-prefixed names first, then alphabetical), validates input data quality, and introduces a CLI interface for both checking and writing outputs.

Why these changes were needed

The previous implementation worked, but had a few gaps that made maintenance and automation harder:

  • Sorting behavior was generic, not project-specific
    Default lexical sorting did not clearly enforce the desired ordering strategy (digit-first, then normal alphabetical).
  • Execution depended on current working directory
    Relative paths could fail when the script was run from a different folder.
  • Output directory creation was not idempotent
    Re-running could fail if output/ already existed.
  • No explicit check mode for CI
    There was no dedicated way to validate sort/order correctness without rewriting files.
  • Potential risk of partial writes
    Direct write operations can leave files half-written if interrupted.
  • Static site count in docs output
    The generated sites.mdx description could become stale over time.
  • No defensive validation for malformed entries
    Missing or invalid urlMain fields could fail later with less actionable errors.

What changed

1) Deterministic custom sorting

  • Added an explicit sort key:
    • keys starting with digits come first
    • remaining keys sorted alphabetically (case-insensitive)
  • Preserves "$schema" at the top of data.json
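The ordering rule above can be sketched as a sort key. This is a minimal illustration, not the code from the PR: the names `site_sort_key` and `sort_sites` are hypothetical, and using the raw name as a final tie-breaker is an assumption added so that names differing only in case still sort reproducibly.

```python
def site_sort_key(name):
    """Digit-prefixed names sort first; the rest sort case-insensitively."""
    group = 0 if name[:1].isdigit() else 1
    # The raw name is a tie-breaker (assumption) so "abc" vs "ABC" is stable.
    return (group, name.casefold(), name)


def sort_sites(data):
    """Return a new dict with "$schema" first, then sites in project order."""
    ordered = {}
    if "$schema" in data:
        ordered["$schema"] = data["$schema"]
    site_names = (key for key in data if key != "$schema")
    for name in sorted(site_names, key=site_sort_key):
        ordered[name] = data[name]
    return ordered
```

The sketch relies on dicts preserving insertion order (guaranteed since Python 3.7), so dumping the result with `json.dump` keeps "$schema" at the top.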

2) Reliable path handling

  • Uses pathlib and script-relative defaults so the script works regardless of where it is invoked from.
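One way script-relative defaults might look with pathlib is sketched below. The function names and the directory layout (`sherlock_project/resources/data.json`, an `output` folder at the repository root) are assumptions for illustration; the PR text does not state them. The second helper shows idempotent directory creation, which addresses the re-run failure described earlier.

```python
from pathlib import Path


def script_relative_defaults(script_file):
    """Derive default paths from the script's own location, not the CWD."""
    script_dir = Path(script_file).resolve().parent
    repo_root = script_dir.parent  # assumes the script sits one level below the root
    data_file = repo_root / "sherlock_project" / "resources" / "data.json"  # assumed path
    output_dir = repo_root / "output"  # assumed path
    return data_file, output_dir


def ensure_output_dir(output_dir):
    """Create the output directory; succeeds if it already exists."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

Inside the script this would typically be called as `script_relative_defaults(__file__)`.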

3) Safer structure and execution

  • Refactored into a main() function behind an if __name__ == "__main__": guard.
  • Improves testability and prevents accidental side effects on import.

4) Validation of input records

  • Validates each site entry is an object.
  • Validates each site has a non-empty urlMain.
  • Fails early with clear, actionable error messages.
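The validation rules above could be implemented roughly as follows. `validate_sites` is a hypothetical name, and raising `ValueError` with the site name in the message is an assumption about what "clear, actionable" errors look like here.

```python
def validate_sites(data):
    """Raise ValueError on the first malformed site entry."""
    for name, entry in data.items():
        if name == "$schema":
            continue  # metadata key, not a site
        if not isinstance(entry, dict):
            raise ValueError(
                f"Site {name!r}: expected an object, got {type(entry).__name__}"
            )
        url_main = entry.get("urlMain")
        if not isinstance(url_main, str) or not url_main.strip():
            raise ValueError(f"Site {name!r}: missing or empty 'urlMain'")
```

Running this before sorting or rendering means a bad entry is reported by name instead of surfacing later as a `KeyError` or `TypeError`.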

5) Atomic file writes

  • Writes to a temporary file and then replaces the target, so an interrupted run cannot leave a partially written output file.
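A common way to implement this pattern is sketched below, assuming the temporary file is created in the same directory as the target so the final rename stays on one filesystem. `atomic_write_text` is a hypothetical helper name, not necessarily what the PR uses.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text):
    """Write text to a sibling temp file, then atomically replace the target."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the replace.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

One caveat with this sketch: `tempfile.mkstemp` creates the file readable and writable only by the current user, so the replaced file ends up with those permissions unless they are reset afterwards.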

6) CLI support for automation

  • Added:
    • --check → verify ordering without writing
    • --write → write sorted data.json and generated sites.mdx
    • --data-file → custom data source path
    • --output-dir → custom output directory
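The four flags listed above map naturally onto argparse. The sketch below is an assumption about how they fit together: the PR text does not say whether --check and --write are mutually exclusive or whether one of them is required, and `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser(default_data_file, default_output_dir):
    """Build a parser exposing the flags described in the PR."""
    parser = argparse.ArgumentParser(
        description="Check or regenerate the sorted site list."
    )
    # Assumption: exactly one of --check / --write must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--check", action="store_true",
                      help="verify ordering without writing")
    mode.add_argument("--write", action="store_true",
                      help="write sorted data.json and generated sites.mdx")
    parser.add_argument("--data-file", type=Path,
                        default=Path(default_data_file),
                        help="custom data source path")
    parser.add_argument("--output-dir", type=Path,
                        default=Path(default_output_dir),
                        help="custom output directory")
    return parser
```

Passing the defaults in as arguments keeps the parser independent of where the script lives, which makes it straightforward to exercise in tests with `parse_args([...])`.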

7) Dynamic generated metadata

  • sites.mdx description now uses the actual number of supported sites automatically.
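Deriving the count from the data rather than hard-coding it might look like the following. The front-matter fields and the list-line format are hypothetical placeholders; only the idea of computing the number of sites at generation time comes from the PR description.

```python
def render_sites_mdx(data):
    """Render a sites.mdx body whose description reflects the real site count."""
    sites = {name: entry for name, entry in data.items() if name != "$schema"}
    lines = [
        "---",
        "title: 'List of supported sites'",  # hypothetical front matter
        f"description: 'Sherlock currently supports {len(sites)} sites'",
        "---",
        "",
    ]
    for index, (name, entry) in enumerate(sites.items(), start=1):
        lines.append(f"{index}. [{name}]({entry['urlMain']})")
    return "\n".join(lines) + "\n"
```

Because the count is computed from the same dict that produces the list, the description cannot drift out of sync when sites are added or removed.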

How to use the current script

From the repository root (the default data and output paths are script-relative, so only the path to the script itself depends on the current directory):

Check-only mode (ideal for CI)

python3 sherlock/devel/site-list.py --check
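For context, a check-only pass could work roughly like the sketch below: it compares the current key order against the expected order and returns a non-zero exit code on mismatch, without touching any file. The function names, the error wording, and the exit codes are assumptions, not taken from the PR.

```python
import json
import sys
from pathlib import Path


def site_sort_key(name):
    # Digit-prefixed names first, then case-insensitive alphabetical order.
    return (0 if name[:1].isdigit() else 1, name.casefold(), name)


def first_misplaced(names):
    """Return the first name that differs from the expected order, or None."""
    expected = sorted(names, key=site_sort_key)
    for actual, wanted in zip(names, expected):
        if actual != wanted:
            return actual
    return None


def check(data_file):
    """Return 0 if the data file is correctly ordered, 1 otherwise."""
    data = json.loads(Path(data_file).read_text(encoding="utf-8"))
    names = [name for name in data if name != "$schema"]
    culprit = first_misplaced(names)
    if culprit is not None:
        print(f"{data_file}: {culprit!r} is out of order; run with --write to fix",
              file=sys.stderr)
        return 1
    return 0
```

A CI job can then fail the build purely on the exit code, leaving the working tree unchanged.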

This PR aims to resolve issue #2904.

…ites, and CI-friendly checks

Signed-off-by: atewari <atewari@redhat.com>
