Distribution (run cat /etc/os-release):
cat /etc/os-release
NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os
uname -a
Linux oryx 5.11.0-7612-generic #13~1617215757~20.04~97a8d1a-Ubuntu SMP Thu Apr 1 21:15:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Related Application and/or Package Version (run apt policy $PACKAGE NAME):
apt policy pop-default-settings
pop-default-settings:
Installed: 4.0.6~1611854075~20.04~6a2277e
Candidate: 4.0.6~1611854075~20.04~6a2277e
Version table:
*** 4.0.6~1611854075~20.04~6a2277e 1001
1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 Packages
1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main i386 Packages
100 /var/lib/dpkg/status
Issue/Bug Description:
Commit 6a2277e reports:
fix: Set reasonable size for dirty bytes parameters
The kernel default is to buffer up to 10% of system RAM before flushing writes to the disk, which is insane. By setting a reasonable number of bytes for the dirty_bytes parameter, we can avoid sending the system into OOM during a large file transfer.
https://lwn.net/Articles/572911/
diff --git a/etc/sysctl.d/10-pop-default-settings.conf b/etc/sysctl.d/10-pop-default-settings.conf
index 987317f..0430a48 100644
--- a/etc/sysctl.d/10-pop-default-settings.conf
+++ b/etc/sysctl.d/10-pop-default-settings.conf
@@ -1 +1,3 @@
 vm.swappiness = 10
+vm.dirty_bytes = 16777216
+vm.dirty_background_bytes = 4194304
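To put the numbers in the diff in perspective: 16777216 bytes is 16 MiB and 4194304 bytes is 4 MiB, while the ratio-based defaults scale with memory. The sketch below is a hypothetical helper, not part of pop-default-settings, that prints both for the current machine. It approximates the ratio limits from MemTotal. The kernel actually applies the ratios to available (free plus reclaimable) memory, so treat those figures as upper bounds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the fixed caps from commit 6a2277e with the
# limits implied by the kernel's default ratios on this machine.

# Convert a memory size in KiB and a percentage into MiB (integer math).
ratio_limit_mib() {
  local mem_kib=$1 ratio=$2
  echo $(( mem_kib * ratio / 100 / 1024 ))
}

# Approximation: the kernel applies the ratios to available (free +
# reclaimable) memory, not MemTotal, so these figures are upper bounds.
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kib=${mem_kib:-0}

echo "vm.dirty_bytes            = $(( 16777216 / 1024 / 1024 )) MiB"
echo "vm.dirty_background_bytes = $(( 4194304 / 1024 / 1024 )) MiB"
echo "vm.dirty_ratio = 20            -> up to $(ratio_limit_mib "$mem_kib" 20) MiB"
echo "vm.dirty_background_ratio = 10 -> up to $(ratio_limit_mib "$mem_kib" 10) MiB"
```

For example, with a MemTotal of 16777216 kB (16 GiB), the ratio limits come out at about 3276 MiB and 1638 MiB, compared with the fixed 16 MiB and 4 MiB caps.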
Unfortunately, this fix has the unintended side effect of severely degrading the performance of copy-on-write (COW) filesystems like BTRFS in regular use as rootfs/home on fast SSDs!
No penalty is observed when writing large files to a BTRFS partition, but there is a very negative effect on operations that do many small writes, such as the metadata updates performed by a btrfs receive operation, or even just writing a lot of small files (e.g. extracting a big archive with a complex directory structure).
Such operations can take up to 20 times the wall-clock time of the same operation with this change commented out (which reverts to the kernel defaults vm.dirty_ratio = 20 and vm.dirty_background_ratio = 10).
When BTRFS is used for both rootfs and home, this is even worse: operations as simple as apt update (or PackageKit running it in the background for you), apt upgrade, or just regular Firefox/Chrome use (browsers write frequently to their on-disk cache) can cause freezes lasting from a few seconds to a few minutes, during which the CPU is stuck in iowait and all processes wait for the kernel-triggered I/O thrashing to end.
Operations where the user intentionally performs a lot of writes are even worse: compiling big projects, cloning a medium-sized or large git repository locally, or using ccache become just unbearable!
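Besides watching iowait in htop, one way to observe the writeback behaviour while these freezes happen is to sample the Dirty and Writeback counters the kernel exposes in /proc/meminfo. The helper below is a hypothetical sketch; it only reads those two fields and prints them in kB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Dirty and Writeback counters (in kB) from a
# meminfo-format file, defaulting to /proc/meminfo.
# The trailing colon in each pattern keeps "WritebackTmp:" from matching.
dirty_writeback_kib() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^Dirty:/ {d = $2} /^Writeback:/ {w = $2} END {print d, w}' "$meminfo"
}

# Example: sample once per second in a separate terminal while reproducing.
# while sleep 1; do dirty_writeback_kib; done
```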
My suggestion is to revert this change, or to find a different compromise that fixes the occasional OOM problems when writing big files to slow block devices without making it impossible to do many small writes to fast devices.
The comments on the LWN article linked in the original commit are quite enlightening: they anticipated similar problems on COW filesystems with this approach, and suggest that it might be difficult to strike a good balance without addressing the issue with actual kernel changes that would make these sysctl knobs superfluous.
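Until the package default changes, one local workaround in the spirit of the suggested revert is a drop-in file that sorts after 10-pop-default-settings.conf. Files in sysctl.d are applied in lexical filename order, and writing vm.dirty_ratio or vm.dirty_background_ratio makes the kernel reset the corresponding *_bytes value to 0, since only one of each pair can be in effect. The filename 99-dirty-ratio-override.conf and the function name are hypothetical; the helper only prints the fragment so it can be reviewed before installing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sysctl.d fragment restoring the kernel's
# ratio-based writeback defaults. Writing the *_ratio keys makes the kernel
# report the matching *_bytes keys as 0.
render_dirty_ratio_override() {
  printf '%s\n' \
    '# Override 10-pop-default-settings.conf: back to kernel defaults' \
    'vm.dirty_ratio = 20' \
    'vm.dirty_background_ratio = 10'
}

# Example installation (needs root; the filename is an assumption):
# render_dirty_ratio_override | sudo tee /etc/sysctl.d/99-dirty-ratio-override.conf >/dev/null
# sudo sysctl --system
```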
Steps to reproduce (if you know):
- create a BTRFS partition on a fast SSD
- mount it (I am using the options defaults,noatime,compress=zstd, but they are not particularly relevant; you can test with or without them)
- in separate terminals, run iotop and htop to examine CPU and I/O utilization; alternatively, you can use sysstat to collect the data and visualize it afterwards
- time (tar -xpf some_large_and_complex_archive.tar --acls --xattrs -C /path/to/mountpoint ; sync )
- unmount the BTRFS partition
- sudo sysctl -w vm.dirty_ratio=20 vm.dirty_background_ratio=10
- repeat the first four steps
- compare the time spent on the tar extraction in the two cases
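The timing in the steps above can be scripted so that both runs are measured the same way. This is a sketch under assumptions: GNU date (for the %N nanoseconds format), GNU tar, and a hypothetical function name time_extract. Changing the sysctl values and remounting are left to the manual steps because they need root. The final sync is kept inside the measurement, as in the original command, so that flushing the remaining dirty data is counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract an archive into a directory, flush it to disk,
# and print the elapsed wall-clock time in milliseconds.
time_extract() {
  local archive=$1 dest=$2
  local start end
  start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
  tar -xpf "$archive" --acls --xattrs -C "$dest"
  sync                   # include writeback of remaining dirty pages
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Example, once with the Pop!_OS defaults and once with the ratio settings:
# time_extract some_large_and_complex_archive.tar /path/to/mountpoint
```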
Expected behavior:
Pop!_OS on a BTRFS root filesystem should be usable; its performance should not be crippled to avoid rare corner cases when writing large files to slow devices.
Other Notes:
The sample .tar I used to debug the performance issues I was seeing, which finally led me to isolate commit 6a2277e as the root cause, was a backup of my old rootfs partition. It doesn't need to be huge: anything that contains a lot of files with a lot of associated metadata will work.
In fact, the smaller the ratio between the total archived data size and the number of files (and amount of metadata), the more visible the difference should be.
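To check how well a candidate archive fits that description before running the test, a rough measure is the average member size: total archived bytes divided by the number of entries. This sketch assumes GNU tar's verbose listing format, where the third column is the size in bytes; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average size in bytes per archive member
# (files, directories and links all count as members).
avg_member_size() {
  tar -tvf "$1" | awk '{ total += $3; n++ } END { if (n) print int(total / n); else print 0 }'
}

# Example: a lower number means more per-file metadata work per byte written.
# avg_member_size some_large_and_complex_archive.tar
```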