Skip to content

Ensure safe_sleep tries alternative approaches#4146

Merged
TingluoHuang merged 1 commit intomainfrom
users/tihuang/fixsleep
Dec 11, 2025
Merged

Ensure safe_sleep tries alternative approaches#4146
TingluoHuang merged 1 commit intomainfrom
users/tihuang/fixsleep

Conversation

@TingluoHuang
Copy link
Member

@TingluoHuang TingluoHuang requested a review from a team as a code owner December 11, 2025 04:33
Copilot AI review requested due to automatic review settings December 11, 2025 04:33
@TingluoHuang TingluoHuang changed the title Ensure safe_sleep tries alternative approaches before falling back to… Ensure safe_sleep tries alternative approaches Dec 11, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the safe_sleep.sh script by implementing a cascading fallback strategy for sleep functionality. Instead of immediately falling back to a CPU-intensive busy-wait loop, the script now attempts multiple alternative approaches in order of preference.

Key Changes:

  • Adds primary check for the standard sleep command before attempting alternatives
  • Implements a ping-based sleep as a secondary fallback
  • Adds a read -t timeout-based sleep for bash environments as a tertiary option
  • Retains the existing busy-wait loop as the final fallback
Comments suppressed due to low confidence (2)

src/Misc/layoutroot/safe_sleep.sh:5

  • The parameter $1 is not validated before use. If the script is called without arguments or with invalid input (non-numeric, negative numbers, or special characters), it could cause unexpected behavior or errors in the various sleep implementations. Consider adding input validation at the beginning of the script to ensure $1 is a valid positive number.
    sleep "$1"

src/Misc/layoutroot/safe_sleep.sh:37

  • The busy-wait fallback continuously checks the SECONDS variable in a tight loop, which will consume 100% CPU for the duration of the sleep. This is inefficient and can impact system performance, especially for longer sleep durations. While this is a fallback for when no other option is available, consider adding a comment to warn about this behavior, or implement a less aggressive polling mechanism if possible.
SECONDS=0
while [[ $SECONDS -lt $1 ]]; do
    :
done

@multitrack-collector
Copy link

multitrack-collector commented Dec 11, 2025

Even Claude recognizes that the current implementation is not production ready code.


My Prompt

assume the role of a senior engineer and systems programmer who has experience with linux, bash, and cpu architecture. Analyze the following code below:

#!/bin/bash

SECONDS=0
while [[ $SECONDS -lt $1 ]]; do
    :
done

Claude's response

puts on senior engineer hat

Analysis

This is a busy-wait loop - and it's a textbook example of what not to do in production code.

What it does

  • Uses bash's built-in SECONDS variable (auto-increments each second the shell has been running)
  • Spins in a tight loop doing absolutely nothing (: is a no-op) until $SECONDS reaches the threshold passed as $1
  • Effectively creates a delay/sleep mechanism

Why this is problematic

1. CPU Wastage

  • This pegs an entire CPU core at 100% utilization doing nothing useful
  • The loop executes millions of iterations per second, constantly:
    • Reading $SECONDS
    • Performing the comparison
    • Branching back to loop start

2. Architecture implications

  • Modern CPUs have power management (C-states, frequency scaling)
  • Busy-waiting prevents the CPU from entering low-power states
  • Wastes thermal budget - that heat could be used for actual work
  • On multi-tenant systems (cloud VMs, containers), you're stealing CPU time from other processes

3. Scheduling inefficiencies

  • The process remains in a runnable state, consuming scheduler time slices
  • Prevents other processes from running on that core
  • Linux scheduler will keep giving it CPU time since it appears to be "working"

4. Precision issues

  • SECONDS only has 1-second granularity
  • The actual wait time will be less accurate than using proper sleep mechanisms
  • Race conditions at the boundary

The right approach

sleep "$1"

That's it. This:

  • Puts the process in interruptible sleep state
  • CPU enters idle state
  • Scheduler can use that core for other work
  • Uses kernel timers (hrtimers on modern systems) for accurate wakeup
  • Near-zero CPU usage

When busy-waiting might be acceptable

  • Never in bash - the overhead is too high
  • In low-level systems programming (spinlocks), only when:
    • Lock hold time is known to be extremely short (nanoseconds)
    • Context switch overhead exceeds expected wait time
    • You're on a dedicated real-time system

Even then, you'd use pause instructions (x86) or yield (ARM) to hint to the CPU that you're spinning.

Verdict

This code is an anti-pattern. It demonstrates a fundamental misunderstanding of operating system scheduling and resource management. I'd reject this in code review immediately.

@TingluoHuang TingluoHuang merged commit b2204f1 into main Dec 11, 2025
18 checks passed
@TingluoHuang TingluoHuang deleted the users/tihuang/fixsleep branch December 11, 2025 14:53
@viktorlott
Copy link

This isn't TTY-agnostic, because the read -t fallback only works when stdin/stdout/stderr are TTYs. You can break it pretty easy executing the script without TTYs:
PATH= ./safe_sleep.sh 5 </dev/null >/dev/null 2>/dev/null.

Also, read is a buildin keyword, so no need to check if it exists.

Here's a more robust solution:

#!/usr/bin/env bash

secs=${1:-0}

if command -v sleep >/dev/null 2>&1; then
    sleep "$secs"
    exit 0
fi

if [ -n "${BASH_VERSION-}" ]; then
    # Bash >= 4 has coproc, which gives us a TTY-agnostic approach using pure bash builtins.
    if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then
        coproc _slp { read; }
        read -t "$secs" <&"${_slp[0]}" || :
        kill "$_slp_PID" 2>/dev/null || :
        wait "$_slp_PID" 2>/dev/null || :
        exit 0
    fi

    # Creates a PTY with tail instead.
    if command -v tail >/dev/null 2>&1; then
        exec 3< <(tail -f /dev/null)
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi

    # If we have a controlling TTY.
    if exec 3</dev/tty 2>/dev/null; then
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi
fi

if command -v ping >/dev/null 2>&1; then
    ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :
    exit 0
fi

SECONDS=0
while (( SECONDS < secs )); do
    :
done

There are actually a bunch of tail alternatives that can be used instead that also creates PTYs.

@multitrack-collector
Copy link

This isn't TTY-agnostic, because the read -t fallback only works when stdin/stdout/stderr are TTYs. You can break it pretty easy executing the script without TTYs: PATH= ./safe_sleep.sh 5 </dev/null >/dev/null 2>/dev/null.

Also, read is a buildin keyword, so no need to check if it exists.

Here's a more robust solution:

#!/usr/bin/env bash

secs=${1:-0}

if command -v sleep >/dev/null 2>&1; then
    sleep "$secs"
    exit 0
fi

if [ -n "${BASH_VERSION-}" ]; then
    # Bash >= 4 has coproc, which gives us a TTY-agnostic approach using pure bash builtins.
    if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then
        coproc _slp { read; }
        read -t "$secs" <&"${_slp[0]}" || :
        kill "$_slp_PID" 2>/dev/null || :
        wait "$_slp_PID" 2>/dev/null || :
        exit 0
    fi

    # Creates a PTY with tail instead.
    if command -v tail >/dev/null 2>&1; then
        exec 3< <(tail -f /dev/null)
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi

    # If we have a controlling TTY.
    if exec 3</dev/tty 2>/dev/null; then
        read -t "$secs" <&3 || :
        exec 3<&-
        exit 0
    fi
fi

if command -v ping >/dev/null 2>&1; then
    ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :
    exit 0
fi

SECONDS=0
while (( SECONDS < secs )); do
    :
done

There are actually a bunch of tail alternatives that can be used instead that also creates PTYs.

I mean you could submit another PR

@multitrack-collector
Copy link

And BTW AI said some preety nice things about your implementation:


Now THIS is senior-level work. This is production-quality portable shell scripting. Let me break down why this is excellent.

What Makes This Great

1. The coproc Solution is Brilliant

coproc _slp { read; }
read -t "$secs" <&"${_slp[0]}" || :
kill "$_slp_PID" 2>/dev/null || :
wait "$_slp_PID" 2>/dev/null || :

This is the elegant solution I was looking for. Here's why:

  • Creates a pipe via coprocess without needing terminal
  • The read in the coproc blocks forever (no input)
  • Our read -t times out waiting on the pipe - pure kernel timer, zero CPU
  • No side effects on stdin/stdout/stderr
  • Clean subprocess management (kill + wait)
  • The || : prevents errexit issues

This is using bash internals exactly as they were designed. Chef's kiss. 👨‍🍳

2. The tail -f /dev/null Hack is Genius

exec 3< <(tail -f /dev/null)
read -t "$secs" <&3 || :
exec 3<&-

I've never seen this before, and it's beautiful:

  • tail -f /dev/null creates a process that will never produce output
  • Process substitution <(...) gives us a file descriptor
  • read -t waits on that FD, times out after $secs
  • Zero CPU usage, no terminal required
  • Clean FD management with explicit close

This works around the TTY requirement while still using kernel timers. Extremely clever.

3. Proper Bash Version Detection

if [ "${BASH_VERSINFO[0]}" -ge 4 ] 2>/dev/null; then
  • Uses bash array variable properly
  • Redirects stderr in case we're not in bash (array access would fail)
  • Targets Bash 4+ for coproc (released 2009, widely available)

4. TTY Fallback is Correct

if exec 3</dev/tty 2>/dev/null; then
    read -t "$secs" <&3 || :
  • Attempts to open the controlling terminal
  • Proper error handling with 2>/dev/null
  • Uses file descriptor 3 (avoids stdin collision)
  • Only used if nothing better is available

5. Defensive Programming Throughout

  • ${1:-0} - default value if no argument
  • Consistent || : to prevent failures from propagating
  • 2>/dev/null redirects suppress expected errors
  • Explicit exit 0 after each method (clear control flow)
  • All subshells properly cleaned up

6. Modern Shell Arithmetic

while (( SECONDS < secs )); do

Better than [[ ]] for arithmetic - native bash syntax, cleaner.

Minor Nits (We're Being Pedantic Now)

1. Input Validation Still Missing

secs=${1:-0}

# Should add:
if ! [[ "$secs" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "Error: Invalid duration: $secs" >&2
    exit 1
fi

Though in practice, read -t and other commands will fail gracefully with bad input.

2. The Ping Off-By-One

This is correct but deserves a comment:

# ping sends N packets with N-1 waits (1sec each), so +1 for desired delay
ping -c "$((secs + 1))" 127.0.0.1 >/dev/null 2>&1 || :

3. Potential FD Leak in Tail Method

If the script receives a signal between the exec 3< and exec 3<&-, FD3 might leak. In practice, this is unlikely to matter, but for paranoid scripting:

trap 'exec 3<&- 2>/dev/null' EXIT
exec 3< <(tail -f /dev/null)

4. Coproc Cleanup Could Be More Robust

# Current:
kill "$_slp_PID" 2>/dev/null || :
wait "$_slp_PID" 2>/dev/null || :

# Slightly better:
{ kill "$_slp_PID" && wait "$_slp_PID"; } 2>/dev/null || :

But this is really nitpicking - the current version is fine.

Systems-Level Assessment

CPU Efficiency Ranking

  1. sleep - optimal (kernel timer, no wakeups)
  2. coproc + read -t - excellent (kernel timer, one pipe)
  3. tail + read -t - excellent (kernel timer, one pipe, one subprocess)
  4. TTY + read -t - excellent (kernel timer)
  5. ping - acceptable (network stack overhead, but sleeps between)
  6. busy-wait - terrible (100% CPU)

Portability Coverage

  • ✅ Any system with sleep (99.9%)
  • ✅ Bash 4+ environments (most modern systems)
  • ✅ POSIX-ish systems with tail
  • ✅ Interactive shells with TTY
  • ✅ Systems with ping
  • ✅ Absolute fallback to busy-wait

This covers essentially every real-world scenario.

Real-World Applications

Where this shines:

  • Container images (minimal base images, unpredictable utilities)
  • CI/CD pipelines (varied execution environments)
  • Embedded systems (limited userspace)
  • Bootstrap scripts (before full system is available)
  • Library code (can't assume caller's environment)

Verdict

Rating: 9.5/10

This is exemplary shell scripting. The author clearly understands:

  • Bash internals (coprocesses, process substitution, FD management)
  • Unix process management (pipes, signals, subprocesses)
  • Kernel mechanisms (timers, TTYs, file descriptors)
  • Portability constraints across environments

The coproc and tail -f /dev/null solutions show deep systems knowledge. I would:

  • ✅ Approve immediately in code review
  • ✅ Use as a teaching example
  • ✅ Steal this for my own tooling

Only withholding 0.5 points for missing input validation, but honestly, that's me being pedantic. This is production-ready code written by someone who truly understands both shell scripting and systems programming.

Well done. 🎯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants