Skip to content

Introduce composite tags#3347

Open
mraspaud wants to merge 6 commits into
pytroll:mainfrom
mraspaud:composite_tags
Open

Introduce composite tags#3347
mraspaud wants to merge 6 commits into
pytroll:mainfrom
mraspaud:composite_tags

Conversation

@mraspaud

@mraspaud mraspaud commented Mar 3, 2026

Copy link
Copy Markdown
Member

This PR introduce a system of tagging for composites, allowing switching preferred variants at load time or by configuration.

For example, having the following config:

  true_color_crefl:
    tags: ["crefl"]
    compositor: !!python/name:satpy.composites.resolution.SelfSharpenedRGB
    prerequisites: ...
    standard_name: true_color

  true_color_raw:
    tags: ["raw"]
    compositor: !!python/name:satpy.composites.resolution.SelfSharpenedRGB
    prerequisites: ...
    standard_name: true_color

  true_color:
    compositor: !!python/name:satpy.composites.resolution.SelfSharpenedRGB
    prerequisites: ...
    standard_name: true_color

  true_color_nocorr:
    tags: ["nocorr"]
    compositor: !!python/name:satpy.composites.resolution.SelfSharpenedRGB
    prerequisites: ...
    standard_name: true_color

We can then load scn.load(["true_color:crefl"]) directly to load "true_color_crefl". Alternatively, we can set the config with SATPY_PREFERRED_COMPOSITE_TAGS=crefl so that scn.load(["true_color"]) loads "true_color_crefl" by default.

This is needed in order implement WMO RGB recommendations in a backwards- and forwards-compatible manner.

Also, possibility to warn for composites at load time has been added.

Disclaimer: AI was used in the design and implementation of this PR.

  • Tests added
  • Fully documented

@mraspaud mraspaud self-assigned this Mar 3, 2026
@mraspaud mraspaud added enhancement code enhancements, features, improvements component:compositors labels Mar 3, 2026
@mraspaud mraspaud requested review from djhoese and pnuu as code owners March 3, 2026 11:11
@codecov

codecov Bot commented Mar 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.31%. Comparing base (21b03f6) to head (e157d9d).
⚠️ Report is 122 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3347      +/-   ##
==========================================
- Coverage   96.34%   96.31%   -0.03%     
==========================================
  Files         463      466       +3     
  Lines       58959    59274     +315     
==========================================
+ Hits        56802    57092     +290     
- Misses       2157     2182      +25     
Flag Coverage Δ
behaviourtests 3.60% <7.77%> (+<0.01%) ⬆️
unittests 96.40% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gerritholl

Copy link
Copy Markdown
Member

This looks great!

This PR adds a special meaning to "standard_name". Does this break backward compatibility? If I understand correctly, loading composites will no longer match by the key in the YAML file, but by the contents of the standard name?

We (at least in DWD) currently (ab)use "standard_name" to get a match between a composite and its enhancement. For example, we have many composites with standard name image_ready. I've always felt a bit uneasy by using the standard name attribute for making this match, and this PR might bite with our current use.

As an example of what we currently use:

  day_cloud_type:
    compositor: !!python/name:satpy.composites.fill.DayNightCompositor
    day_night: day_only
    prerequisites:
    - name: cloud_type
    standard_name: image_ready

Is this still loadable as sc.load(["day_cloud_type"])? Or would this be matched only by sc.load(["image_ready"])?

@pnuu pnuu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline. I guess the variant loading without defining the variant in any way will need a test where different combinations are defined as options.

Comment thread doc/source/composites.rst

1. Try each tag in the list, in order, looking for a compositor with matching
``standard_name`` and that tag.
2. If no tagged variant is found, fall back to the normal name-based lookup.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which variant is used if the tag is not defined in the load and the global env variable is not set and all/some of the following composites are defined?

  • foo_wmo
  • foo_crefl
  • foo

Each of them have the same standard_name: foo defined, no name is set. Only the first line in the YAML definition is different and for the first two there are tags set.

So what would I get with scn.load(["foo"]) call in the following cases:

  1. all variants are defined (I'd expect foo variant without any tags)
  2. only the two tagged versions are defined

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tags are not provided, neither at load time or in the configuration, the current behaviour is preserved, ie we will use the name (not standard name) to choose the composite to load.

  1. you get the composite name "foo"
  2. crash if you don't provide a tag (at load time or as config).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like this for the first case? I'm not sure how this works, but hopefully it's close enough:

        pytest.param(
            {"comp1": None, "comp1_wmo": ["wmo"]},
            None,
            "comp1",
            "comp1",
            id="use_plain_version_when_no_tag",
        ),

For case 2. the failure might need a completely separate test?

Comment thread doc/source/composites.rst
@pnuu

pnuu commented Mar 3, 2026

Copy link
Copy Markdown
Member

Good point Gerrit. I think that is actually the only use-case for standard_name for me, linking the composite and enhancement together. I think this is also related to my variant selection comment.

@mraspaud

mraspaud commented Mar 3, 2026

Copy link
Copy Markdown
Member Author

This looks great!

Thanks :)

This PR adds a special meaning to "standard_name". Does this break backward compatibility? If I understand correctly, loading composites will no longer match by the key in the YAML file, but by the contents of the standard name?

It is backwards compatible. Names of compositors that do not contain ":" are processed as usual.
The use of the standard name here is because it was the closest thing to a common name for different variants we had, so I used that instead of introducing an new item in the composite config.

We (at least in DWD) currently (ab)use "standard_name" to get a match between a composite and its enhancement. For example, we have many composites with standard name image_ready. I've always felt a bit uneasy by using the standard name attribute for making this match, and this PR might bite with our current use.

As an example of what we currently use:

  day_cloud_type:
    compositor: !!python/name:satpy.composites.fill.DayNightCompositor
    day_night: day_only
    prerequisites:
    - name: cloud_type
    standard_name: image_ready

Is this still loadable as sc.load(["day_cloud_type"])? Or would this be matched only by sc.load(["image_ready"])?

We use the same thing at SMHI, and the current behaviour should be unaffected. I agree it's not really clean to use standard_name for this purpose, would be nice if we had a good solution for that.

@djhoese

djhoese commented Mar 4, 2026

Copy link
Copy Markdown
Member

I think I've said this before, but I was never a fan of the "image_ready" standard name since, as we're discovering here, it has a more generic but also functional meaning the way it is used. There is already the other issue (PR?) about having a specific enhancement defined for a composite inside the composite YAML and that may be the long term way of doing this explicit composite definition. I think I had hoped that a more generic "if it's an RGB/A with no other enhancement defined, default to no-op" enhancement.

The other thing composite tags is bringing up which @gerritholl had mentioned in the meeting yesterday is what does a DataID/DataQuery represent and how do tags differ? For the most part DataID defines what the product is and tags are defining a "preference" I guess. The main exception is "modifiers" where these lines are blurred.

We may need to define a "family" for composites or "template" or something like that for tags instead of reusing "standard_name" for this. At least that's my gut feeling right now.

@mraspaud

Copy link
Copy Markdown
Member Author

I agree that the image_ready enhancement handling can be improved, however I don't think it is within the scope of this PR.

Regarding the similarity with the data id/data query: Indeed, this is quite similar, however, IMO this PR presents a much more user-friendly interface to a problem that many users with custom configs face: they define a variant for a composite and have a simple way to access it without overwriting the default composite. I am aware that I short circuited the data id entirely here, because I wanted to keep the tags separate from it.

I'm open to feedback of course, if you thing this is the wrong decision.

The last commit added the access to composites in the scene using tags btw, and I'd love to get a review on that.

@mraspaud mraspaud added this to the v0.61 milestone Mar 31, 2026

@djhoese djhoese left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but I don't really understand. At a high level I've never really liked that we depend on a .attrs["_satpy_id"] magic attribute, but DataIDs are central to what Satpy does so it seemed like an important exception to allow this "magic". This new "_satpy_compositor_name" seems to be doing a lot for essentially an alias. If I'm not mistaken the two options were:

  1. Make the Scene and Scene.__getitem__ calls understand tag references and store composites with their YAML name.
  2. Store things as their requested/tagged name and update enhancements and composites to understand them.

So I think this PR is option number 2, but does it actually handle composites that reference a composite? For example, if a composite depends on true_color_wmo but that composite is now sitting in the Scene as true_color:wmo, does that work?

I think I'd still prefer a solution where "standard_name" wasn't what tagged names are based on, but rather some other new key. At least I think. Overall I'm just extremely scared of the dependency tree and enhancement configuration stuff needing to handle this as those things were already hard to understand and hold in my brain, not to mention a user's brain.

@mraspaud

Copy link
Copy Markdown
Member Author

I'm sorry but I don't really understand. At a high level I've never really liked that we depend on a .attrs["_satpy_id"] magic attribute, but DataIDs are central to what Satpy does so it seemed like an important exception to allow this "magic". This new "_satpy_compositor_name" seems to be doing a lot for essentially an alias.

That's correct, it is an alias to be able to go back to the yaml name, so that we can keep the tagged version superficial and be able to go back to the yaml name for the dependency tree for example.
An alternative would be to build a look-up table where we have all the variations for all composites listed with their tagged equivalent. I thought "_satpy_compositor_name" would be more contained.
Another alternative would we to gather all _satpy_… attrs into a eg _satpy_internals attribute.

If I'm not mistaken the two options were:

  1. Make the Scene and Scene.__getitem__ calls understand tag references and store composites with their YAML name.
  2. Store things as their requested/tagged name and update enhancements and composites to understand them.

So I think this PR is option number 2, but does it actually handle composites that reference a composite? For example, if a composite depends on true_color_wmo but that composite is now sitting in the Scene as true_color:wmo, does that work?

Yes, this is option 2, but it's really just a name translation, so I would expect that use-case to work. But I'll get back to you on this.

I think I'd still prefer a solution where "standard_name" wasn't what tagged names are based on, but rather some other new key. At least I think. Overall I'm just extremely scared of the dependency tree and enhancement configuration stuff needing to handle this as those things were already hard to understand and hold in my brain, not to mention a user's brain.

Well, I see standard_name as the composite category so to say, so for me it makes sense to use standard_name as a base for the tagging (and the enhancements). What would you call the alternative attribute?

@mraspaud mraspaud requested a review from adybbroe as a code owner May 15, 2026 13:18
Comment thread doc/source/composites.rst
Composite variants
------------------

.. versionadded:: 0.60

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Update to 0.61 across PR...if not 0.62 if we feel we need a 0.61 that doesn't include this.

Comment thread doc/source/composites.rst Outdated
Comment on lines +587 to +593
This also means that :attr:`~satpy.scene.Scene.missing_datasets` is empty
after a successful load, and that writing the scene with any writer will use
``"true_color:wmo"`` as the dataset name.

The same rule applies when a tagged variant is selected automatically via
``preferred_composite_tags``: the dataset is stored under the plain requested
name (e.g. ``"true_color"``), not under the compositor's YAML key name.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be overthinking this and I've only been looking at the documentation before diving into the code but: There seems to be a two separate pieces here that are maybe used for the same thing. There is the composite name (the name via the YAML section) and then the standard_name + tag. This chunk of documentation here says that the dataset is stored "under the plain requested name", but how does that work?

If we have a composite with name "my_tc" and standard_name "true_color" and zero or more tags. If I do scn.load(["true_color"]) then is that treated as a "no tag"/generic composite request and "my_tc" could be found? Or is that considered a search for the specific composite named "true_color" that may or may not exist?

Another question, you say that "true_color:wmo" would be used during writing, so a geotiff with file pattern "{name}.tif" would be true_color:wmo.tif?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be like for datasets/channels. A user-configurable priority (with some satpy-proposed default) would determine which composite is loaded or returned if there are multiple. Although I don't think this is currently user-configurable for channels/datasets, we might want to make it so in the future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerritholl sure, priorities are one thing, but I'm talking about the re-use of the string passed to Scene.load. Right now in a basic request you can have scn.load([str or Number]) where a number means the wavelength and a search is done based on that. The string means the "name" in the DataID. Depending on how tags are implemented, with my my_tc example I would expect to need to do scn.load(["my_tc"]) to load it by name or scn.load(["true_color:wmo"]) to load it by tag (assuming it has a wmo tag). But if it is the most "generic" true color recipe then doing scn.load(["true_color"]) seems confusing and error prone. That means that the search first by name if no : or by standard_name+tag if : present, then fallback to standard_name + no tag.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the doc was a bit outdated. I fixed it, and it should answer your questions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! Ok with those latest changes this is a totally different beast. Before it was a composite having a human-friendly-ish name and a "family" for its recipe (the "standard_name"). Now those two things are the same. I have to think about that.

@strandgren

Copy link
Copy Markdown
Collaborator

I think I'd still prefer a solution where "standard_name" wasn't what tagged names are based on, but rather some other new key. At least I think. Overall I'm just extremely scared of the dependency tree and enhancement configuration stuff needing to handle this as those things were already hard to understand and hold in my brain, not to mention a user's brain.

Well, I see standard_name as the composite category so to say, so for me it makes sense to use standard_name as a base for the tagging (and the enhancements). What would you call the alternative attribute?

I would actually agree that the standard_name would be a good name for grouping different variants of the same RGB for the tagging, but I also agree that using standard_name for both this and the enhancement is not optimal. As mentioned in earlier comments I created an issue about this a while ago: #3230. But for now that's just an idea and I wouldn't know how to properly implement it in practice - I also agree that it's probably outside the scope of the PR.

@djhoese

djhoese commented May 20, 2026

Copy link
Copy Markdown
Member

Using standard_name for this is the 3rd (?) use of standard_name. We have CF standard name which has meaning to the user and is written to NetCDF files but is also not necessarily/strictly CF compliant. We also have the enhancement stuff. We're definitely abusing a single value to mean multiple things, especially when that thing has meaning outside of pytroll/satpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:compositors enhancement code enhancements, features, improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants