[react-native-callingx] iOS: AudioSessionManager hardcodes .defaultToSpeaker, blocking earpiece-default for voice-only calls #2219

@daavsnts

Which package/packages do you use?

  • @stream-io/video-react-sdk
  • @stream-io/video-react-native-sdk
  • @stream-io/video-client

Also affects: @stream-io/react-native-callingx@0.1.1, @stream-io/react-native-webrtc@137.1.3.

Summary

@stream-io/react-native-callingx@0.1.1 ignores setDefaultAudioDeviceEndpointType('earpiece') on iOS because CallManager.start() short-circuits when callingx is active, and AudioSessionManager.swift hardcodes .defaultToSpeaker in categoryOptions regardless of intent. Removing .defaultToSpeaker via patch achieves earpiece-default but also exposes a separate, deeper bug: the default AudioSessionManager config destabilizes AudioEngineDevice (the LiveKit-style AVAudioEngine audio device introduced in WebRTC 137), so CallKit's lock-screen speaker button desyncs from the actual route and requires a double-tap (or worse) to disable speakerphone. We were able to fix that second issue too by mirroring react-native-callkeep's configureAudioSession config in callingx, but it's a non-trivial chain of patches that we believe should be addressed upstream.

Environment

  • @stream-io/video-react-native-sdk: 1.32.3
  • @stream-io/react-native-callingx: 0.1.1
  • @stream-io/react-native-webrtc: 137.1.3
  • React Native: 0.79.6 (Expo 53)
  • iOS: 26.x on physical device (iPhone)
  • Use case: 1:1 audio-only calls (no video) — receiver/earpiece is the expected default per business rule, like a phone call

Configuration

import { AudioSettingsRequest } from '@stream-io/video-react-native-sdk'

export const AudioDefaultSettings: AudioSettingsRequest = {
  mic_default_on: true,
  speaker_default_on: false,
  default_device: 'earpiece',
}

This config is passed to call.getOrCreate({ data: { settings_override: { audio: AudioDefaultSettings } } }) and call.join() is called normally.
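For illustration, a minimal sketch of the request shape described above. The `AudioSettings` type is a local stand-in for the SDK's `AudioSettingsRequest` (so the snippet runs on its own), and `buildGetOrCreateRequest` is a hypothetical helper, not part of the SDK.

```typescript
// Local stand-in for AudioSettingsRequest from '@stream-io/video-react-native-sdk'.
type AudioSettings = {
  mic_default_on: boolean;
  speaker_default_on: boolean;
  default_device: 'speaker' | 'earpiece';
};

const audioDefaultSettings: AudioSettings = {
  mic_default_on: true,
  speaker_default_on: false,
  default_device: 'earpiece',
};

// Hypothetical helper: wraps the audio settings in the payload shape
// that the report passes to call.getOrCreate().
function buildGetOrCreateRequest(audio: AudioSettings) {
  return { data: { settings_override: { audio } } };
}

const request = buildGetOrCreateRequest(audioDefaultSettings);

// In an app, assuming `call` is an SDK Call instance:
//   await call.getOrCreate(request);
//   await call.join();
```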

Expected

Call starts with audio routed to the built-in receiver (earpiece), matching default_device: 'earpiece'.

Actual

Call starts with audio routed to the built-in speaker, ignoring the config entirely.

Root cause

Two interacting layers:

1. CallManager.start() is bypassed on iOS + callingx

packages/react-native-sdk/src/modules/call-manager/CallManager.ts (lines 113–131):

start = (config?: StreamInCallManagerConfig): void => {
  if (shouldBypassForCallKit()) {
    videoLoggerSystem
      .getLogger('CallManager')
      .debug('start: skipping start as callkit is handling the audio session');
    return;  // <-- early return, config is ignored
  }
  NativeManager.setAudioRole(config?.audioRole ?? 'communicator');
  if (config?.audioRole === 'communicator') {
    const type = config.deviceEndpointType ?? 'speaker';
    NativeManager.setDefaultAudioDeviceEndpointType(type);
  }
  ...
};

shouldBypassForCallKit() returns true when Platform.OS === 'ios' and callingx is set up — i.e., the standard production setup for any consumer following the 1.32 migration guide. Result: setDefaultAudioDeviceEndpointType is never called on iOS in this configuration.
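The effect of the early return can be modelled in isolation. This is a simplified sketch, not the SDK code: `simulateStart` is a hypothetical function that records which native calls the quoted branch of `start()` would make, with the bypass condition passed in as a plain boolean. The `'listener'` role value is an assumption added for the type.

```typescript
type EndpointType = 'speaker' | 'earpiece';
type StartConfig = {
  audioRole?: 'communicator' | 'listener'; // 'listener' is assumed here
  deviceEndpointType?: EndpointType;
};

// Returns the native calls that would be issued, in order.
function simulateStart(
  config: StartConfig | undefined,
  bypassForCallKit: boolean,
): string[] {
  const nativeCalls: string[] = [];
  if (bypassForCallKit) {
    // Mirrors the early return: the config never reaches the native side.
    return nativeCalls;
  }
  nativeCalls.push(`setAudioRole(${config?.audioRole ?? 'communicator'})`);
  if (config?.audioRole === 'communicator') {
    const type = config.deviceEndpointType ?? 'speaker';
    nativeCalls.push(`setDefaultAudioDeviceEndpointType(${type})`);
  }
  return nativeCalls;
}

const earpieceConfig: StartConfig = {
  audioRole: 'communicator',
  deviceEndpointType: 'earpiece',
};

// Non-CallKit path: both native calls are made.
const withoutCallingx = simulateStart(earpieceConfig, false);
// iOS + callingx path: no native call is made at all.
const withCallingx = simulateStart(earpieceConfig, true);
```

With the bypass active, the earpiece request has no code path to the audio session, whatever the caller passes.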

2. AudioSessionManager.swift in callingx hardcodes .defaultToSpeaker

packages/react-native-callingx/ios/AudioSessionManager.swift (around lines 12–21):

let categoryOptions: AVAudioSession.CategoryOptions
#if compiler(>=6.2) // For Xcode 26.0+
    categoryOptions = [.allowBluetoothHFP, .defaultToSpeaker]
#else
    categoryOptions = [.allowBluetooth, .defaultToSpeaker]
#endif
let mode: AVAudioSession.Mode = .voiceChat

.defaultToSpeaker is unconditional. There's no setter, no config path, no way to opt out at runtime.

For comparison, the non-CallKit code path already handles this correctly in packages/react-native-sdk/ios/StreamInCallManager.swift:128:

intendedOptions = defaultAudioDevice == .speaker
    ? [bluetoothOption, .defaultToSpeaker]
    : [bluetoothOption]

StreamInCallManager reads defaultAudioDevice (set via setDefaultAudioDeviceEndpointType) and gates .defaultToSpeaker on it. AudioSessionManager (callingx) does not — likely because the callingx path was developed under the assumption of video-first calls, where speaker default is correct.
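The gating that StreamInCallManager applies, and that callingx could mirror, comes down to a small pure function. The sketch below models it in TypeScript with option names as strings. `categoryOptionsFor` and its `hasHFPOption` parameter are hypothetical names for illustration; the real change would live in Swift.

```typescript
type DefaultAudioDevice = 'speaker' | 'earpiece';
type CategoryOption = 'allowBluetoothHFP' | 'allowBluetooth' | 'defaultToSpeaker';

// hasHFPOption stands in for the `#if compiler(>=6.2)` branch in the Swift source.
function categoryOptionsFor(
  device: DefaultAudioDevice,
  hasHFPOption: boolean,
): CategoryOption[] {
  const bluetoothOption: CategoryOption = hasHFPOption
    ? 'allowBluetoothHFP'
    : 'allowBluetooth';
  // Only add defaultToSpeaker when the caller asked for a speaker default.
  return device === 'speaker'
    ? [bluetoothOption, 'defaultToSpeaker']
    : [bluetoothOption];
}

const earpieceOptions = categoryOptionsFor('earpiece', true);
const speakerOptions = categoryOptionsFor('speaker', true);
```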

Step 1: remove .defaultToSpeaker from callingx

Patch AudioSessionManager.swift to remove .defaultToSpeaker:

#if compiler(>=6.2)
    categoryOptions = [.allowBluetoothHFP]
#else
    categoryOptions = [.allowBluetooth]
#endif

Plus a parallel patch on react-native-webrtc's WebRTCModule+RTCMediaStream.m:691-693 (ensureAudioSessionWithRecording) — same removal — because that method also reapplies categoryOptions defensively if the session is reset.

This achieves earpiece-default ✓ but exposes a separate side effect.

Step 2: side effect — CallKit lock-screen speaker button gets stuck

After removing .defaultToSpeaker, the speaker button on iOS's native CallKit UI (lock screen, in-call screen) misbehaves:

  1. User taps speaker on CallKit → button becomes active, audio routes to speaker ✓
  2. ~2 seconds later, the button flips to inactive while audio stays in speaker — UI and route desync
  3. Tapping again to disable speaker is interpreted by CallKit as "turn on again" (since its UI thinks state is off), so the tap is effectively a no-op — speaker stays on
  4. In-app UI toggle keeps working correctly (one tap each direction), because we route through callManager.speaker.setForceSpeakerphoneOn(false) → overrideOutputAudioPort(.none) directly

Through device logs and side-by-side comparison with react-native-callkeep (the lib used in our previous-major production app, which works correctly with the same Stream WebRTC 137 binary), we traced this to the AudioEngineDevice (LiveKit's AVAudioEngine-based audio device, introduced in WebRTC 137):

  1. CallKit fires overrideOutputAudioPort(.speaker) on the user's tap
  2. AudioEngineDevice reacts to internal route reconciliation and rebuilds the AVAudioEngine
  3. The rebuild fires an implicit setCategory, which clears the AVAudioSession override flag
  4. CallKit reads that flag for its UI state → button flips to "off" while audio remains in speaker
  5. Subsequent taps are no-ops because CallKit's internal state says the speaker is already off
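The sequence above can be expressed as a toy state model. It encodes the hypothesis from this report, not verified CallKit or AVAudioSession internals. Two things are assumed: CallKit's button mirrors an override flag, and the engine rebuild clears that flag without changing the route. All names are hypothetical.

```typescript
type Route = 'receiver' | 'speaker';

interface SessionModel {
  route: Route;          // where audio actually plays
  overrideFlag: boolean; // the override state CallKit is assumed to read
}

// Assumption: the CallKit speaker button simply reflects the override flag.
function callKitButtonOn(s: SessionModel): boolean {
  return s.overrideFlag;
}

// A tap toggles relative to what the button currently shows.
function tapSpeakerButton(s: SessionModel): SessionModel {
  const wantSpeaker = !callKitButtonOn(s);
  return { route: wantSpeaker ? 'speaker' : 'receiver', overrideFlag: wantSpeaker };
}

// Assumption: the implicit setCategory clears the flag but leaves the route alone.
function engineRebuild(s: SessionModel): SessionModel {
  return { route: s.route, overrideFlag: false };
}

const initial: SessionModel = { route: 'receiver', overrideFlag: false };
const afterTap = tapSpeakerButton(initial);            // speaker, button on
const afterRebuild = engineRebuild(afterTap);          // speaker, button off (desync)
const afterSecondTap = tapSpeakerButton(afterRebuild); // still speaker
```

In this model the second tap cannot turn the speaker off, because it toggles relative to a button state that no longer matches the route.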

The root cause is that callingx's default AudioSessionManager config (mode .voiceChat, options HFP-only, no explicit sampleRate/ioBufferDuration, no reapplication on didActivateAudioSession) destabilizes AudioEngineDevice, triggering the spontaneous engine rebuild on route changes.

Step 3: fix — mirror react-native-callkeep's configureAudioSession

react-native-callkeep operates on the same Stream WebRTC 137 binary in production without this issue. The difference is its configureAudioSession, which uses a different (and stable) set of defaults. Mirroring those values in callingx fixes the side effect:

// AudioSessionManager.swift::createAudioSessionIfNeeded

- let mode: AVAudioSession.Mode = .voiceChat
+ let mode: AVAudioSession.Mode = .default

#if compiler(>=6.2)
-    categoryOptions = [.allowBluetoothHFP]
+    categoryOptions = [.allowBluetoothHFP, .allowBluetoothA2DP]
#else
-    categoryOptions = [.allowBluetooth]
+    categoryOptions = [.allowBluetooth, .allowBluetoothA2DP]
#endif

// Add explicit values to keep the audio engine from renegotiating:
+ rtcConfig.sampleRate = 44100
+ rtcConfig.ioBufferDuration = 0.005

// CallingxImpl.swift::provider:didActivateAudioSession

  RTCAudioSession.sharedInstance().audioSessionDidActivate(audioSession)

+ // callkeep does this on every CXProviderDelegate event; previously
+ // callingx only called it from performStartCallAction/performAnswerCallAction
+ AudioSessionManager.createAudioSessionIfNeeded()

With these in place on top of the Step 1 patch, the CallKit lock-screen toggle works correctly (one tap each direction) on iOS 26.3.1.

Why this matters

Voice-only 1:1 calling (WhatsApp/Phone-style UX) is a legitimate first-class use case, and the SDK already exposes the right API surface for it via setDefaultAudioDeviceEndpointType with 'earpiece'. The gap is that on iOS + callingx that config doesn't reach the audio session, which forces consumers to maintain a chain of native patches just to reach the documented behaviour:

  1. Patch callingx + react-native-webrtc to drop the hardcoded .defaultToSpeaker
  2. Patch callingx again to mirror react-native-callkeep's audio session config so the CallKit UI doesn't desync from the actual route

Both layers feel like things that should be solvable inside callingx itself — either by gating .defaultToSpeaker on setDefaultAudioDeviceEndpointType (analogous to what StreamInCallManager already does on the non-CallKit path), or by adopting the known-stable AudioSessionManager defaults that match react-native-callkeep's production behaviour with the same WebRTC binary.

Thanks

Huge thanks to the Stream Video team — the 1.32 line is a real step up, and the callingx migration is clearly a lot of careful work, especially around CallKit and iOS 26. This report is meant as a friendly heads-up from a consumer who hit the audio-only edge of an otherwise great release, not a complaint. Really appreciate everything you all ship, and happy to test patches, share repro projects, or jump on anything that helps narrow this down. 🙏
