Introduce composite tags#3347
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3347 +/- ##
==========================================
- Coverage 96.34% 96.31% -0.03%
==========================================
Files 463 466 +3
Lines 58959 59274 +315
==========================================
+ Hits 56802 57092 +290
- Misses 2157 2182 +25
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
This looks great! This PR adds a special meaning to "standard_name". Does this break backward compatibility? If I understand correctly, loading composites will no longer match by the key in the YAML file, but by the contents of the standard name? We (at least in DWD) currently (ab)use "standard_name" to get a match between a composite and its enhancement. For example, we have many composites with standard name As an example of what we currently use: Is this still loadable as |
pnuu
left a comment
There was a problem hiding this comment.
Some comments inline. I guess the variant loading without defining the variant in any way will need a test where different combinations are defined as options.
|
|
||
| 1. Try each tag in the list, in order, looking for a compositor with matching | ||
| ``standard_name`` and that tag. | ||
| 2. If no tagged variant is found, fall back to the normal name-based lookup. |
There was a problem hiding this comment.
Which variant is used if the tag is not defined in the load and the global env variable is not set and all/some of the following composites are defined?
foo_wmofoo_creflfoo
Each of them have the same standard_name: foo defined, no name is set. Only the first line in the YAML definition is different and for the first two there are tags set.
So what would I get with scn.load(["foo"]) call in the following cases:
- all variants are defined (I'd expect
foovariant without any tags) - only the two tagged versions are defined
There was a problem hiding this comment.
If tags are not provided, neither at load time or in the configuration, the current behaviour is preserved, ie we will use the name (not standard name) to choose the composite to load.
- you get the composite name "foo"
- crash if you don't provide a tag (at load time or as config).
There was a problem hiding this comment.
So something like this for the first case? I'm not sure how this works, but hopefully it's close enough:
pytest.param(
{"comp1": None, "comp1_wmo": ["wmo"]},
None,
"comp1",
"comp1",
id="use_plain_version_when_no_tag",
),For case 2. the failure might need a completely separate test?
|
Good point Gerrit. I think that is actually the only use-case for |
Thanks :)
It is backwards compatible. Names of compositors that do not contain ":" are processed as usual.
We use the same thing at SMHI, and the current behaviour should be unaffected. I agree it's not really clean to use standard_name for this purpose, would be nice if we had a good solution for that. |
|
I think I've said this before, but I was never a fan of the The other thing composite tags is bringing up which @gerritholl had mentioned in the meeting yesterday is what does a We may need to define a "family" for composites or "template" or something like that for tags instead of reusing "standard_name" for this. At least that's my gut feeling right now. |
|
I agree that the Regarding the similarity with the data id/data query: Indeed, this is quite similar, however, IMO this PR presents a much more user-friendly interface to a problem that many users with custom configs face: they define a variant for a composite and have a simple way to access it without overwriting the default composite. I am aware that I short circuited the data id entirely here, because I wanted to keep the tags separate from it. I'm open to feedback of course, if you thing this is the wrong decision. The last commit added the access to composites in the scene using tags btw, and I'd love to get a review on that. |
djhoese
left a comment
There was a problem hiding this comment.
I'm sorry but I don't really understand. At a high level I've never really liked that we depend on a .attrs["_satpy_id"] magic attribute, but DataIDs are central to what Satpy does so it seemed like an important exception to allow this "magic". This new "_satpy_compositor_name" seems to be doing a lot for essentially an alias. If I'm not mistaken the two options were:
- Make the Scene and
Scene.__getitem__calls understand tag references and store composites with their YAML name. - Store things as their requested/tagged name and update enhancements and composites to understand them.
So I think this PR is option number 2, but does it actually handle composites that reference a composite? For example, if a composite depends on true_color_wmo but that composite is now sitting in the Scene as true_color:wmo, does that work?
I think I'd still prefer a solution where "standard_name" wasn't what tagged names are based on, but rather some other new key. At least I think. Overall I'm just extremely scared of the dependency tree and enhancement configuration stuff needing to handle this as those things were already hard to understand and hold in my brain, not to mention a user's brain.
That's correct, it is an alias to be able to go back to the yaml name, so that we can keep the tagged version superficial and be able to go back to the yaml name for the dependency tree for example.
Yes, this is option 2, but it's really just a name translation, so I would expect that use-case to work. But I'll get back to you on this.
Well, I see standard_name as the composite category so to say, so for me it makes sense to use standard_name as a base for the tagging (and the enhancements). What would you call the alternative attribute? |
| Composite variants | ||
| ------------------ | ||
|
|
||
| .. versionadded:: 0.60 |
There was a problem hiding this comment.
TODO: Update to 0.61 across PR...if not 0.62 if we feel we need a 0.61 that doesn't include this.
| This also means that :attr:`~satpy.scene.Scene.missing_datasets` is empty | ||
| after a successful load, and that writing the scene with any writer will use | ||
| ``"true_color:wmo"`` as the dataset name. | ||
|
|
||
| The same rule applies when a tagged variant is selected automatically via | ||
| ``preferred_composite_tags``: the dataset is stored under the plain requested | ||
| name (e.g. ``"true_color"``), not under the compositor's YAML key name. |
There was a problem hiding this comment.
I could be overthinking this and I've only been looking at the documentation before diving into the code but: There seems to be a two separate pieces here that are maybe used for the same thing. There is the composite name (the name via the YAML section) and then the standard_name + tag. This chunk of documentation here says that the dataset is stored "under the plain requested name", but how does that work?
If we have a composite with name "my_tc" and standard_name "true_color" and zero or more tags. If I do scn.load(["true_color"]) then is that treated as a "no tag"/generic composite request and "my_tc" could be found? Or is that considered a search for the specific composite named "true_color" that may or may not exist?
Another question, you say that "true_color:wmo" would be used during writing, so a geotiff with file pattern "{name}.tif" would be true_color:wmo.tif?
There was a problem hiding this comment.
I think it would be like for datasets/channels. A user-configurable priority (with some satpy-proposed default) would determine which composite is loaded or returned if there are multiple. Although I don't think this is currently user-configurable for channels/datasets, we might want to make it so in the future.
There was a problem hiding this comment.
@gerritholl sure, priorities are one thing, but I'm talking about the re-use of the string passed to Scene.load. Right now in a basic request you can have scn.load([str or Number]) where a number means the wavelength and a search is done based on that. The string means the "name" in the DataID. Depending on how tags are implemented, with my my_tc example I would expect to need to do scn.load(["my_tc"]) to load it by name or scn.load(["true_color:wmo"]) to load it by tag (assuming it has a wmo tag). But if it is the most "generic" true color recipe then doing scn.load(["true_color"]) seems confusing and error prone. That means that the search first by name if no : or by standard_name+tag if : present, then fallback to standard_name + no tag.
There was a problem hiding this comment.
ok, the doc was a bit outdated. I fixed it, and it should answer your questions.
There was a problem hiding this comment.
Whoa! Ok with those latest changes this is a totally different beast. Before it was a composite having a human-friendly-ish name and a "family" for its recipe (the "standard_name"). Now those two things are the same. I have to think about that.
I would actually agree that the |
|
Using |
This PR introduce a system of tagging for composites, allowing switching preferred variants at load time or by configuration.
For example, having the following config:
We can then load
scn.load(["true_color:crefl"])directly to load "true_color_crefl". Alternatively, we can set the config withSATPY_PREFERRED_COMPOSITE_TAGS=creflso thatscn.load(["true_color"])loads "true_color_crefl" by default.This is needed in order implement WMO RGB recommendations in a backwards- and forwards-compatible manner.
Also, possibility to warn for composites at load time has been added.
Disclaimer: AI was used in the design and implementation of this PR.