What version of ripgrep are you using?
ripgrep 15.1.0
How did you install ripgrep?
cargo install ripgrep
What operating system are you using ripgrep on?
Ubuntu 20.04
Describe your bug.
ripgrep returns no matches in compressed files when the decompressor exits with a nonzero exit code and multiple files are searched in parallel. It also fails to count matches in a single file when an error code is returned from the decompressor.
I have several .zst files, all of which were closed uncleanly (the end frame is corrupt). When I decompress them, they produce millions of lines, but the decompressor reports an error. This is expected.
$ zstdcat foo.zst | wc -l
foo.zst : Read error (39) : premature end
3666594
$ zstdcat bar.zst | wc -l
bar.zst : Read error (39) : premature end
3666594
$ zstdcat foo.zst > /dev/null || echo $?
foo.zst : Read error (39) : premature end
1
The following commands incorrectly produce no results:
- Counting results:
rg -zc needle foo.zst
- Searching multiple files in parallel:
rg -z needle foo.zst bar.zst
By contrast, these succeed:
- Using zstdcat to decompress and piping this to rg:
zstdcat foo.zst | rg -c needle
- Searching multiple files with --sort:
rg -z needle foo.zst bar.zst --sort=path
- Searching a single file:
rg -z needle foo.zst
$ rg -zc "memory locking enabled" foo.zst
rg: foo.zst:
-------------------------------------------------------------------------------
foo.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
$ rg -z "memory locking enabled" foo.zst bar.zst
rg: foo.zst:
-------------------------------------------------------------------------------
foo.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
rg: bar.zst:
-------------------------------------------------------------------------------
bar.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
$ zstdcat foo.zst | rg -c "memory locking enabled"
foo.zst : Read error (39) : premature end
1
$ rg -z "memory locking enabled" foo.zst bar.zst --sort=path
foo.zst
2:Apr 25 17:13:26.023 000000000135 I osenv.posix memory locking enabled
rg: foo.zst:
-------------------------------------------------------------------------------
foo.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
bar.zst
2:Apr 25 17:13:26.023 000000000135 I osenv.posix memory locking enabled
rg: bar.zst:
-------------------------------------------------------------------------------
bar.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
$ rg -z "memory locking enabled" foo.zst
2:Apr 25 17:13:26.023 000000000135 I osenv.posix memory locking enabled
rg: foo.zst:
-------------------------------------------------------------------------------
foo.zst : Read error (39) : premature end
-------------------------------------------------------------------------------
What are the steps to reproduce the behavior?
# Create a compressed file with end-of-file damage
$ cat /usr/share/dict/words | zstd -o foo.zst
/*stdin*\ : 30.64% ( 962 KiB => 295 KiB, foo.zst)
$ ls -l foo.zst
-rw-rw-r-- 1 phord phord 301788 Apr 25 12:39 foo.zst
$ truncate --size=301700 foo.zst
$ zstdcat foo.zst | wc
foo.zst : Read error (39) : premature end
97093 97094 917504
$ zstdcat foo.zst > /dev/null || echo $?
foo.zst : Read error (39) : premature end
1
# Searching in one file works
$ rg -z aardvarks foo.zst --no-messages
20498:aardvarks
# Searching in two files fails
$ rg -z aardvarks foo.zst foo.zst --no-messages
# Unless they're sorted (single-threaded)
$ rg -z aardvarks foo.zst foo.zst --no-messages --sort=path
foo.zst
20498:aardvarks
foo.zst
20498:aardvarks
# Counting always fails
$ rg -cz aardvarks foo.zst --no-messages
What is the actual behavior?
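ripgrep gives up on a file when the decompressor returns an error. When several files are searched in parallel, or when counting with -c, none of the matches found before the error are reported.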
What is the expected behavior?
I expect rg to search the output of the decompressor successfully even when the decompressor eventually encounters an error.
This is arguably also a bug in zstd, since gzip does not exit with an error when it encounters the same kind of truncation.
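The expected behavior amounts to "search everything the decompressor wrote first, then report its exit status". A minimal sketch of those semantics in shell follows. It is an illustration under stated assumptions, not ripgrep's implementation: `search_then_report` is a hypothetical helper, `grep -c` stands in for the search step, and the decompressor's output is buffered in a temporary file so that its exit status can be read after the output has been saved.

```shell
#!/bin/sh
# search_then_report is a hypothetical helper, not part of ripgrep.
# It sketches the expected semantics: search everything the decompressor
# managed to write, print the result, and only then surface a nonzero
# exit status as a warning instead of discarding the results.
# grep -c stands in for the search step.
search_then_report() {
    decompressor=$1 pattern=$2 file=$3
    tmp=$(mktemp) || return 2
    # Keep the partial output even if the decompressor fails part-way through.
    "$decompressor" "$file" >"$tmp" 2>/dev/null
    status=$?
    # grep -c prints 0 (and exits 1) when nothing matches; only the count matters.
    count=$(grep -c -- "$pattern" "$tmp")
    rm -f "$tmp"
    printf '%s:%s\n' "$file" "$count"
    if [ "$status" -ne 0 ]; then
        printf 'warning: %s: decompressor exited with status %d\n' "$file" "$status" >&2
    fi
    # Still propagate the failure so callers can tell the input was damaged.
    return "$status"
}
```

For example, `search_then_report zstdcat aardvarks foo.zst` on the truncated file from the reproduction steps prints the count for the partial output, followed by a warning on stderr. The temporary file keeps the sketch POSIX-only; it costs disk space proportional to the decompressed size, which a streaming implementation would avoid.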