Fix directory/file removal during package uninstall#13964
Fix directory/file removal during package uninstall#13964
Conversation
|
This is related to #13939 and pypa/virtualenv#3132 Though apparently i failed to run all of the unit tests.. so need to fix that |
|
side note: I added logic to jail the directory search to stay within the path list... otherwise if a package installs something at the top level of I will have to futz with the @notatallshaw @pfmoore just an FYI since you reviewed the other MR which unfortunately caused this scenario. I obviously failed to account for needing to expand coverage on other unit tests to make sure I was completely fixing what i thought i was fixing completely. This is not beautiful, but it was working in my test case... A simple repro: We would expect I will say as a side note that |
01c6d89 to
fae2b88
Compare
|
I will have to think a bit harder on the jail problem. I don't think i can actually filter on that because the point of the checks is to see if there are other adjacent files to see if they can be globbed. After thinking about the logic here, it seems possible that pip may try to remove I think namespace packages are probably not as supported as they could be. |
|
Hey @vfazio thanks for the PR, regressions in a tool as complex and widely used as pip are pretty normal and this doesn't seem like it's causing significant problems to anyone, I will try and make some time to review the problem and your PR when I can, but I can't promise any particular time window. |
|
I've been staring at this the past few days and I think I'm going slightly insane so probably need to take a break. Things I've learned:
Issues I've noticed independent of this MR
I'll clean up the commits and make a changelog entry when i've had some time to relax |
This set is not providing value so remove it. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
When walking directories via os.walk, `root` is an absolute path so the value of `dirname` in the returned tuple will thus be an absolute path and will replace the `root` value in the join. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
The `unchecked` search paths derived from the input paths were not terminated with os.path.sep but were compared against the wildcards that are, so unless the path was a subdirectory of a wildcard, it would not match. For example, root = /A/B/C/D would be skipped due to wildcard /A/B/C/ but if root = /A/B/C it would fail to be considered a match. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
Wildcards were derived from path values passed into the function which are not necessarily guaranteed to be in normcase. These values were being compared against normcase paths which could have led to failures. However, they are also used as displayable output from the function's return value so we need to maintain a mapping to the original value. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
Update the test to reflect that the path list may include directories (like __pycache__). Also, add a test case to ensure files that exist on disk and are not in the input path list but are under a directory specified in the list correctly get rolled up to a wildcard. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
The `compress_for_rename` function previously didn't fully support directories in the path list. This had impacts on the visual appearance of the planned files to be moved/removed and on the actual removal as it threw off the collapsing of directory wildcards which could cause orphaned directories in the installation location. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
…ories in path list
Mocking os.walk is like letting the neighborhood bard viciously mock your friend... At first everyone is in tears of laughter, but then it devolves into tears of pain and regret. In this case, mocking os.walk limits how `compress_for_rename` must be implemented, requires tweaking any time there's a change, and requires emulating the behavior of the underlying function. Instead, create the requested paths in a temporary directory and rely on pure os.walk behavior which we do in other test cases. Signed-off-by: Vincent Fazio <vfazio@gmail.com>
Signed-off-by: Vincent Fazio <vfazio@gmail.com>
Signed-off-by: Vincent Fazio <vfazio@gmail.com>
|
Some evidence to verify functionality: Previous: With the fixes in this MR |
|
Is this a regression in pip 26.1 ? |
Yes, Im pretty sure I accidentally introduced it with #13725 which threw a directory into the list of files to remove and I didn't do enough testing it appears. |
|
Ok, I did a few tests myself and indeed pip uninstall left empty directories behind in a few of my tests. That sounds quite significant. |
I'm hoping this hot fix is good enough in the interim. I do have another branch I'm working on but it's far more significant and reworks the logic entirely. The good news is that it's generally as fast or faster and handles more scenarios but with the large number of changes it doesn't feel like it should be a candidate for a bug fix |
|
I'm not very familiar with that part of the code base and won't have time to review this in depth until next weekend at best, unfortunately. Since this fix is not trivial and as you say you have a significant rework in progress, I would be inclined to revert #13725. @notatallshaw as you have reviewed #13725, if you have time I'd appreciate your opinion. |
|
I will take a look today, but that's also my inclination, is to revert and have a fix in the next release that tests these scenarios. |
|
Not the kind of contribution I was looking to make. 💣 💥. I have more confidence in this with the added tests but understand if it needs to be reverted to give it more time to bake |
|
@vfazio no worries :) |
I appreciate you saying that actually. I've been kicking my own ass for days because of this. Seems I'm my own harshest critic |
|
I'm going to throw out a wild idea. The original PR had simply walked That is very easy to add back in to get the benefits of the intended change and to avoid the general "directories in the path list" problem. Should only take me a few minutes to whip up and would be less difficult to actually review if that's the route we'd prefer to go. The ultimate problem is that directories just aren't supported in the path list. https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-record-file
This is a should and not a shall so while it's allowed, it's an edge case we've never had to consider. Of course, in that same block:
Directories are supported as values to be processed by |
diff --git a/src/pip/_internal/req/req_uninstall.py b/src/pip/_internal/req/req_uninstall.py
index adb215c35..ee340ce32 100644
--- a/src/pip/_internal/req/req_uninstall.py
+++ b/src/pip/_internal/req/req_uninstall.py
@@ -343,7 +343,10 @@ class UninstallPathSet:
# interpreter run at a different optimization level (PYTHONOPTIMIZE).
if os.path.splitext(path)[1] == ".py":
pycache = os.path.join(os.path.dirname(path), "__pycache__")
- self.add(pycache)
+ if os.path.isdir(pycache) and not os.path.islink(pycache):
+ for dirpath, _, dirfiles in os.walk(pycache):
+ for f in dirfiles:
+ self.add(os.path.join(dirpath, f))
def add_pth(self, pth_file: str, entry: str) -> None:
pth_file = self._normalize_path_cached(pth_file) |
Agreed, these things happen. Reverting feels to me to be the safest option. A quick PR to just walk |
|
Do i need to open a PR for the revert or will a maintainer just do that? I don't see a 26.x branch to target so i assume things just get cherry-picked or reverted on top of the tag ref? |
|
I opened the revert PR at #13973. Uninstallation is notoriously tricky, so it's better to not rush a fix under pressure. |
I'd just like to add on to this that as the pip maintainer that merged the PR it is ultimately my responsibility for any fallout. And I really appreciate that you proactively found issues reported on other repos and correlated them back to pip. |
Thanks! Sorry for the hassle everyone. |
|
Since we're taking a different tack, I will close this out with a different solution presented |
|
Sorry if I'm repeating things here but here's my summary of the issue:
This triggers for packages whose
|
I think it applies in broader circumstances? For instance I could reproduce with |
|
Sorry, I didn't provide a complete RCA post-mortem. The problem is partially captured in 37d4f84: The problem occurs if the final paths list includes a directory (it doesn't matter if it's via RECORD or via programmatic addition) and there are files in that directory that are not in the final path list (so not in RECORD or added programmatically). You can actually reproduce this in earlier versions of pip without my changes using virtualenv because virtualenv adds (.venv) vfazio@vfazio4:/tmp/tmp.OWwCbNcYli$ grep pycache /tmp/tmp.OWwCbNcYli/.new_venv/lib/python3.12/site-packages/pip-26.0.dist-info/RECORD | head -n3
pip-26.0.dist-info/licenses/src/pip/_vendor/idna/__pycache__,,
pip/_internal/__pycache__,,
pip-26.0.dist-info/licenses/src/pip/_vendor/truststore/__pycache__,,# create a new venv
virtualenv .new_venv --pip=26.0 --setuptools=82.0.1
# run pip and look at the file list
.new_venv/bin/python3 -m pip -vv uninstall pip
# run it at a different optimization level and look at the file list
PYTHONOPTIMIZE=2 .new_venv/bin/python3 -m pip -vv uninstall pip
# the well is now poisoned and will never be correctThis is, in a way, #11835 on steroids. Because there are files on disk that are not described in the file list to remove, the parent directories of those holding the files are prevented from becoming wildcards (because we may be removing files we do not own as part of the distribution). As previously mentioned, Since the files are not described in The fun part is that despite the files not being in the list and preventing the directories from rolling up, the files get removed anyway, which is a sick twist on #13939. In this case |
Address the situation where a file exists on disk but is not captured in an installation record and where its owning directory is selected for removal.
If a path is selected, all files in that path should be globbed and treated as if they were also selected.
This fixes an issue where orphaned directories could linger due to the directories not collapsing appropriately.
Fixes: (title)