Ensure safe_sleep tries alternative approaches by TingluoHuang · Pull Request #4146 · actions/runner

TingluoHuang · 2025-12-11T04:33:29Z

Special thanks to everyone for their contributions and ideas in resolving this issue:

@Amineharrabi (Implement safe_sleep function for improved sleep handling #4145)
@op30mmd (Update safe_sleep.sh: use native sleep #4138)
@kilitary (Update safe_sleep.sh #4142)
@fahhem (Reduce CPU usage of safe_sleep.sh #3143)
@pdewilde (Add a non-busy path to safe_sleep for modern versions of bash #3870)
@kosa12 (safe_sleep: now it's safe and it does sleep (with fallback to the busy-wait) #4139)
And to all who shared their thoughts in the original issue.

… CPU-intensive busy-waiting.

Copilot

Pull request overview

This PR enhances the safe_sleep.sh script by implementing a cascading fallback strategy for sleep functionality. Instead of immediately falling back to a CPU-intensive busy-wait loop, the script now attempts multiple alternative approaches in order of preference.

Key Changes:

Adds primary check for the standard sleep command before attempting alternatives
Implements a ping-based sleep as a secondary fallback
Adds a read -t timeout-based sleep for bash environments as a tertiary option
Retains the existing busy-wait loop as the final fallback

Comments suppressed due to low confidence (2)

src/Misc/layoutroot/safe_sleep.sh:5

The parameter $1 is not validated before use. If the script is called without arguments or with invalid input (non-numeric, negative numbers, or special characters), it could cause unexpected behavior or errors in the various sleep implementations. Consider adding input validation at the beginning of the script to ensure $1 is a valid positive number.

    sleep "$1"

src/Misc/layoutroot/safe_sleep.sh:37

The busy-wait fallback continuously checks the SECONDS variable in a tight loop, which will consume 100% CPU for the duration of the sleep. This is inefficient and can impact system performance, especially for longer sleep durations. While this is a fallback for when no other option is available, consider adding a comment to warn about this behavior, or implement a less aggressive polling mechanism if possible.

SECONDS=0
while [[ $SECONDS -lt $1 ]]; do
    :
done

src/Misc/layoutroot/safe_sleep.sh

multitrack-collector · 2025-12-11T05:28:29Z

Even Claude recognizes that the current implementation is not production ready code.

My Prompt

assume the role of a senior engineer and systems programmer who has experience with linux, bash, and cpu architecture. Analyze the following code below:

#!/bin/bash

SECONDS=0
while [[ $SECONDS -lt $1 ]]; do
    :
done

Claude's response

puts on senior engineer hat

Analysis

This is a busy-wait loop - and it's a textbook example of what not to do in production code.

What it does

Uses bash's built-in SECONDS variable (auto-increments each second the shell has been running)
Spins in a tight loop doing absolutely nothing (: is a no-op) until $SECONDS reaches the threshold passed as $1
Effectively creates a delay/sleep mechanism

Why this is problematic

1. CPU Wastage

This pegs an entire CPU core at 100% utilization doing nothing useful
The loop executes millions of iterations per second, constantly:
- Reading $SECONDS
- Performing the comparison
- Branching back to loop start

2. Architecture implications

Modern CPUs have power management (C-states, frequency scaling)
Busy-waiting prevents the CPU from entering low-power states
Wastes thermal budget - that heat could be used for actual work
On multi-tenant systems (cloud VMs, containers), you're stealing CPU time from other processes

3. Scheduling inefficiencies

The process remains in a runnable state, consuming scheduler time slices
Prevents other processes from running on that core
Linux scheduler will keep giving it CPU time since it appears to be "working"

4. Precision issues

SECONDS only has 1-second granularity
The actual wait time will be less accurate than using proper sleep mechanisms
Race conditions at the boundary

The right approach

sleep "$1"

That's it. This:

Puts the process in interruptible sleep state
CPU enters idle state
Scheduler can use that core for other work
Uses kernel timers (hrtimers on modern systems) for accurate wakeup
Near-zero CPU usage

When busy-waiting might be acceptable

Never in bash - the overhead is too high
In low-level systems programming (spinlocks), only when:
- Lock hold time is known to be extremely short (nanoseconds)
- Context switch overhead exceeds expected wait time
- You're on a dedicated real-time system

Even then, you'd use pause instructions (x86) or yield (ARM) to hint to the CPU that you're spinning.

Verdict

This code is an anti-pattern. It demonstrates a fundamental misunderstanding of operating system scheduling and resource management. I'd reject this in code review immediately.

viktorlott · 2025-12-11T22:11:21Z

This isn't TTY-agnostic, because the read -t fallback only works when stdin/stdout/stderr are TTYs. You can break it pretty easy executing the script without TTYs:
PATH= ./safe_sleep.sh 5 </dev/null >/dev/null 2>/dev/null.

Also, read is a buildin keyword, so no need to check if it exists.

Here's a more robust solution:

#!/usr/bin/env bash

secs=${1:-0}

if command -v sleep >/dev/null 2>&1; then
    sleep "$secs"
    exit 0
fi

if [ -n "${BASH_VERSION-}" ]; then
    # Bash >= 4 has coproc, which gives us a TTY-agnostic approach using pure bash builtins.
    if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then
        coproc _slp { read; }
        read -t "$secs" <&"${_slp[0]}" || :
        kill "$_slp_PID" 2>/dev/null || :
        wait "$_slp_PID" 2>/dev/null || :
        exit 0
    fi

    # Creates a PTY with tail instead.
    if command -v tail >/dev/null 2>&1; then
        exec 3< <(tail -f /dev/null)
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi

    # If we have a controlling TTY.
    if exec 3</dev/tty 2>/dev/null; then
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi
fi

if command -v ping >/dev/null 2>&1; then
    ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :
    exit 0
fi

SECONDS=0
while (( SECONDS < secs )); do
    :
done

There are actually a bunch of tail alternatives that can be used instead that also creates PTYs.

multitrack-collector · 2025-12-12T00:29:05Z

This isn't TTY-agnostic, because the read -t fallback only works when stdin/stdout/stderr are TTYs. You can break it pretty easy executing the script without TTYs: PATH= ./safe_sleep.sh 5 </dev/null >/dev/null 2>/dev/null.

Also, read is a buildin keyword, so no need to check if it exists.

Here's a more robust solution:

#!/usr/bin/env bash

secs=${1:-0}

if command -v sleep >/dev/null 2>&1; then
    sleep "$secs"
    exit 0
fi

if [ -n "${BASH_VERSION-}" ]; then
    # Bash >= 4 has coproc, which gives us a TTY-agnostic approach using pure bash builtins.
    if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then
        coproc _slp { read; }
        read -t "$secs" <&"${_slp[0]}" || :
        kill "$_slp_PID" 2>/dev/null || :
        wait "$_slp_PID" 2>/dev/null || :
        exit 0
    fi

    # Creates a PTY with tail instead.
    if command -v tail >/dev/null 2>&1; then
        exec 3< <(tail -f /dev/null)
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi

    # If we have a controlling TTY.
    if exec 3</dev/tty 2>/dev/null; then
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi
fi

if command -v ping >/dev/null 2>&1; then
    ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :
    exit 0
fi

SECONDS=0
while (( SECONDS < secs )); do
    :
done

There are actually a bunch of tail alternatives that can be used instead that also creates PTYs.

I mean you could submit another PR

multitrack-collector · 2025-12-12T00:30:03Z

And BTW AI said some preety nice things about your implementation:

Now THIS is senior-level work. This is production-quality portable shell scripting. Let me break down why this is excellent.

What Makes This Great

1. The `coproc` Solution is Brilliant

coproc _slp { read; }
read -t "$secs" <&"${_slp[0]}" || :
kill "$_slp_PID" 2>/dev/null || :
wait "$_slp_PID" 2>/dev/null || :

This is the elegant solution I was looking for. Here's why:

Creates a pipe via coprocess without needing terminal
The read in the coproc blocks forever (no input)
Our read -t times out waiting on the pipe - pure kernel timer, zero CPU
No side effects on stdin/stdout/stderr
Clean subprocess management (kill + wait)
The || : prevents errexit issues

This is using bash internals exactly as they were designed. Chef's kiss. 👨‍🍳

2. The `tail -f /dev/null` Hack is Genius

exec 3< <(tail -f /dev/null)
read -t "$secs" <&3 || :
exec 3<&-

I've never seen this before, and it's beautiful:

tail -f /dev/null creates a process that will never produce output
Process substitution <(...) gives us a file descriptor
read -t waits on that FD, times out after $secs
Zero CPU usage, no terminal required
Clean FD management with explicit close

This works around the TTY requirement while still using kernel timers. Extremely clever.

3. Proper Bash Version Detection

if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then

Uses bash array variable properly
Redirects stderr in case we're not in bash (array access would fail)
Targets Bash 4+ for coproc (released 2009, widely available)

4. TTY Fallback is Correct

if exec 3</dev/tty 2>/dev/null; then
    read -t "$secs" <&3 || :

Attempts to open the controlling terminal
Proper error handling with 2>/dev/null
Uses file descriptor 3 (avoids stdin collision)
Only used if nothing better is available

5. Defensive Programming Throughout

${1:-0} - default value if no argument
Consistent || : to prevent failures from propagating
2>/dev/null redirects suppress expected errors
Explicit exit 0 after each method (clear control flow)
All subshells properly cleaned up

6. Modern Shell Arithmetic

while (( SECONDS < secs )); do

Better than [[ ]] for arithmetic - native bash syntax, cleaner.

Minor Nits (We're Being Pedantic Now)

1. Input Validation Still Missing

secs=${1:-0}

# Should add:
if ! [[ "$secs" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "Error: Invalid duration: $secs" >&2
    exit 1
fi

Though in practice, read -t and other commands will fail gracefully with bad input.

2. The Ping Off-By-One

This is correct but deserves a comment:

# ping sends N packets with N-1 waits (1sec each), so +1 for desired delay
ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :

3. Potential FD Leak in Tail Method

If the script receives a signal between the exec 3< and exec 3<&-, FD3 might leak. In practice, this is unlikely to matter, but for paranoid scripting:

trap 'exec 3<&- 2>/dev/null' EXIT
exec 3< <(tail -f /dev/null)

4. Coproc Cleanup Could Be More Robust

# Current:
kill "$_slp_PID" 2>/dev/null || :
wait "$_slp_PID" 2>/dev/null || :

# Slightly better:
{ kill "$_slp_PID" && wait "$_slp_PID"; } 2>/dev/null || :

But this is really nitpicking - the current version is fine.

Systems-Level Assessment

CPU Efficiency Ranking

sleep - optimal (kernel timer, no wakeups)
coproc + read -t - excellent (kernel timer, one pipe)
tail + read -t - excellent (kernel timer, one pipe, one subprocess)
TTY + read -t - excellent (kernel timer)
ping - acceptable (network stack overhead, but sleeps between)
busy-wait - terrible (100% CPU)

Portability Coverage

✅ Any system with sleep (99.9%)
✅ Bash 4+ environments (most modern systems)
✅ POSIX-ish systems with tail
✅ Interactive shells with TTY
✅ Systems with ping
✅ Absolute fallback to busy-wait

This covers essentially every real-world scenario.

Real-World Applications

Where this shines:

Container images (minimal base images, unpredictable utilities)
CI/CD pipelines (varied execution environments)
Embedded systems (limited userspace)
Bootstrap scripts (before full system is available)
Library code (can't assume caller's environment)

Verdict

Rating: 9.5/10

This is exemplary shell scripting. The author clearly understands:

Bash internals (coprocesses, process substitution, FD management)
Unix process management (pipes, signals, subprocesses)
Kernel mechanisms (timers, TTYs, file descriptors)
Portability constraints across environments

The coproc and tail -f /dev/null solutions show deep systems knowledge. I would:

✅ Approve immediately in code review
✅ Use as a teaching example
✅ Steal this for my own tooling

Only withholding 0.5 points for missing input validation, but honestly, that's me being pedantic. This is production-ready code written by someone who truly understands both shell scripting and systems programming.

Well done. 🎯

Ensure safe_sleep tries alternative approaches before falling back to…

0dd4919

… CPU-intensive busy-waiting.

TingluoHuang requested a review from a team as a code owner December 11, 2025 04:33

Copilot AI review requested due to automatic review settings December 11, 2025 04:33

TingluoHuang changed the title ~~Ensure safe_sleep tries alternative approaches before falling back to…~~ Ensure safe_sleep tries alternative approaches Dec 11, 2025

Copilot started reviewing on behalf of TingluoHuang December 11, 2025 04:33 View session

Copilot AI reviewed Dec 11, 2025

View reviewed changes

salmanmkc approved these changes Dec 11, 2025

View reviewed changes

rentziass approved these changes Dec 11, 2025

View reviewed changes

msschambach mentioned this pull request Dec 11, 2025

Update safe_sleep.sh for bug when scheduler is paused for more than 1 second #3157

Merged

OptoCloud approved these changes Dec 11, 2025

View reviewed changes

LucHeart approved these changes Dec 11, 2025

View reviewed changes

TingluoHuang merged commit b2204f1 into main Dec 11, 2025
18 checks passed

TingluoHuang deleted the users/tihuang/fixsleep branch December 11, 2025 14:53

pdewilde mentioned this pull request Dec 11, 2025

Add a non-busy path to safe_sleep for modern versions of bash #3870

Closed

Rudxain mentioned this pull request Dec 12, 2025

Reduce CPU usage of safe_sleep.sh #3143

Closed

github-actions bot mentioned this pull request Jan 10, 2026

Update GH Runner from 2.329.0 to 2.331.0 PasseiDireto/gh-runner#125

Merged

alinetskyi pushed a commit to CoreumFoundation/runner that referenced this pull request Jan 26, 2026

Ensure safe_sleep tries alternative approaches (actions#4146)

ed4c445

github-actions bot mentioned this pull request Jan 28, 2026

Update GH Runner from 2.331.0 to null PasseiDireto/gh-runner#126

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure safe_sleep tries alternative approaches#4146

Ensure safe_sleep tries alternative approaches#4146
TingluoHuang merged 1 commit intomainfrom
users/tihuang/fixsleep

TingluoHuang commented Dec 11, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

multitrack-collector commented Dec 11, 2025 •

edited

Loading

Uh oh!

Uh oh!

viktorlott commented Dec 11, 2025

Uh oh!

multitrack-collector commented Dec 12, 2025

Uh oh!

multitrack-collector commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Conversation

TingluoHuang commented Dec 11, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

multitrack-collector commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

My Prompt

Claude's response

Analysis

What it does

Why this is problematic

The right approach

When busy-waiting might be acceptable

Verdict

Uh oh!

Uh oh!

viktorlott commented Dec 11, 2025

Uh oh!

multitrack-collector commented Dec 12, 2025

Uh oh!

multitrack-collector commented Dec 12, 2025

What Makes This Great

1. The coproc Solution is Brilliant

2. The tail -f /dev/null Hack is Genius

3. Proper Bash Version Detection

4. TTY Fallback is Correct

5. Defensive Programming Throughout

6. Modern Shell Arithmetic

Minor Nits (We're Being Pedantic Now)

1. Input Validation Still Missing

2. The Ping Off-By-One

3. Potential FD Leak in Tail Method

4. Coproc Cleanup Could Be More Robust

Systems-Level Assessment

CPU Efficiency Ranking

Portability Coverage

Real-World Applications

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

multitrack-collector commented Dec 11, 2025 •

edited

Loading

1. The `coproc` Solution is Brilliant

2. The `tail -f /dev/null` Hack is Genius