Push-to-talk voice for thinking out loud into your main OpenClaw session — most often from the car.
The main use case is the drive: a feature, an essay, an app idea, or a stretch of training material you have been working on at the desk, picked back up on the highway. You switch that same OpenClaw thread to voice, use toggle talk mode to speak through the line of thought all the way through without being interrupted, and let your main agent push the work forward with full session context. Then, hours later, you reopen the thread on the laptop and read what was said.
Clawkie Talkie is not a separate assistant and not a quick-chat app. It is a voice lane into an existing OpenClaw conversation: the same session, the same thread, the same context — built for long-form prompts and long-form replies, not short back-and-forth.
You ask OpenClaw to switch to voice. It gives you a link. You open the link on your phone, press the big button to toggle talk mode, talk through the whole thought without interruption, and hear the agent answer back at length.
The original OpenClaw/Discord thread stays canonical. Your spoken direction is transcribed into that session, the agent responds in that session, and Clawkie Talkie plays the reply back to you. Everything you say and hear in the car is still there in the thread when you sit back down.
feature thread ── "switch to voice" ──▶ phone link ──▶ talk next step
▲ │
└──────────── same OpenClaw session keeps moving ◀───────┘
The normal install path is agent-run. Tell your agent:
Install Clawkie-Talkie for me by following https://github.com/davidguttman/clawkie-talkie/blob/master/AGENT-INSTALL.md.
Before installing, download or clone the repo and inspect the source; stop and ask me if anything looks suspicious, harmful, or you are not confident.
The install should set up the persistent Clawkie-Talkie daemon and the OpenClaw clawkie-voice-handoff skill so I can say "switch to voice" and get a working handoff link.
Verify the daemon/service status, logs, and skill configuration before reporting success. When finished, give me a plain English summary of what changed, what you verified, and any blockers.
The install does three important things:
- Runs the daemon as a persistent user service.
- Generates one stable
DAEMON_PEER_IDand keeps it in the daemon.env. - Installs the OpenClaw
clawkie-voice-handoffskill with the same host ID.
Manual daemon setup is documented here:
Custom frontend/signaling/TURN setup is documented here:
Agent install/upgrade/repair flow is documented here:
A lot of the most useful thinking happens away from the desk — and the most useful version of it is not a one-liner. It is a full thought, spoken straight through, while you are driving and nothing else is competing for the keyboard.
Push-to-talk is the point. You use toggle talk mode to hold the floor for as long as you need to: a paragraph, a few minutes, an entire framing for a feature or an essay. The agent does not interrupt, the phone does not interrupt, and you get to land the idea before anyone responds.
Typical sessions look like:
- drafting writing or training material out loud on a long drive, then reading the agent's response when you stop;
- exploring a new app or product idea with your main agent, with the OpenClaw session's existing context already loaded;
- thinking through a real feature or design decision in the session that already has the code, the thread, and the prior turns;
- working through review feedback or a product direction at length, instead of bouncing one-line prompts.
Clawkie Talkie is for that mode of work. It is not a quick voice chat. It is a voice lane into a real OpenClaw session for long-form steering, with the canonical written transcript waiting for you when you get back.
-
In an OpenClaw conversation, say:
switch to voice -
OpenClaw replies with a Clawkie Talkie link for that exact session.
-
Open the link in a browser, usually on your phone.
-
Tap the main button to record.
-
Talk through the whole thought — a long prompt, a paragraph of direction, a full framing for a feature or essay. Push-to-talk means the floor stays yours until you tap again.
-
Watch the live transcript while talking.
-
Tap stop. The transcript is sent immediately.
-
OpenClaw answers in the original session, at length, with the session's full context.
-
Clawkie Talkie speaks the reply back through the browser.
-
Later, at the desk, open the original OpenClaw/Discord thread and read everything that was said.
The interface is intentionally walkie-talkie shaped: one obvious start/stop control, visible transcript while speaking, a thinking state while the agent works, and spoken playback when the reply is ready. It is built for long-form voice steering of an existing session, not for becoming a separate quick-chat app.
A real handoff URL looks like this:
https://clawkietalkie.app/voice#host=<daemon>&session=<openclaw-session-id>&sessionKey=<openclaw-session-key>&channel=<channel>&target=<message-target>
Those values are not a login token or a credential bundle. They are routing information:
host— the stable local daemon ID on the user's machine.session— the OpenClaw session to continue. Prefer the actual OpenClaw sessionId UUID; fall back to the exact current session key only if the UUID is unavailable.sessionKey— optional exact OpenClaw session key, included whensessionis the UUID and used only for transcript mirroring / provider derivation / debug.target— optional explicit current message target, included when visible and used only for transcript mirroring.
channel is not a URL param. The values live in the URL hash so they are parsed by the browser locally instead of being sent to the web server as normal request parameters.
Clawkie Talkie has a hosted browser UI, but the private work happens locally.
- The browser captures microphone audio and plays reply audio.
- The browser stores only UI settings such as provider/model/voice IDs.
- The browser does not receive provider API keys.
- The browser does not receive OpenClaw credentials.
- The local daemon talks to the local
openclawCLI. - OpenClaw's existing config/auth remains the owner of LLM, STT, and TTS credentials.
The daemon is the trusted side. The browser is just the remote control and audio surface.
Phone / browser User's machine
┌────────────────────────────┐ ┌────────────────────────────┐
│ Clawkie Talkie web UI │ │ Clawkie Talkie daemon │
│ │ │ │
│ - mic capture │ WebRTC │ - session rendezvous │
│ - tap-to-talk control │ ◀──────▶ │ - OpenClaw CLI calls │
│ - live transcript display │ │ - STT / TTS orchestration │
│ - reply playback │ │ - channel/thread delivery │
└────────────────────────────┘ └────────────────────────────┘
│ │
│ signaling only │ local process calls
▼ ▼
https://api.rambly.app openclaw
The signaling service helps the browser and daemon find each other. By default this is the hosted rambly-compatible broker; custom signaling/ICE/TURN is documented in Custom Clawkie Talkie stack. The voice session itself runs over WebRTC. The daemon shells out to OpenClaw for transcription, the agent turn, reply delivery, and speech synthesis.
There is no inbound HTTP port for the daemon. It connects outbound to signaling and providers.
There is one durable daemon per machine. Its DAEMON_PEER_ID is the stable rendezvous identity used in host=.... Treat the daemon host ID and dashboard URL as bearer routing material: anyone who has it can connect to the host dashboard, enumerate recent sessions exposed by that daemon, and select one for voice handoff. Do not post it in public or shared chats; if it is exposed, rotate DAEMON_PEER_ID and update the installed OpenClaw skill/config to match.
Each voice handoff is then scoped by OpenClaw session:
voice room = daemon host + OpenClaw session
That means the same daemon can support multiple OpenClaw sessions without mixing them together. A Discord thread, a webchat session, and another channel can all produce different voice rooms through the same local daemon.
The agent does not call a daemon API to mint a link. It builds the URL directly from already-known OpenClaw context: daemon host ID and session ID. The session ID is preferably the actual OpenClaw UUID; colon-style session keys are fallback/legacy inputs and cannot be assumed to exist for every surface.
Clawkie Talkie depends on a working OpenClaw install for the same OS user that runs the daemon.
At minimum, that user needs:
openclawavailable onPATH;- a working OpenClaw session/runtime;
- a configured audio transcription provider;
- a configured text-to-speech provider.
The daemon uses these commands at runtime:
openclaw infer audio transcribe --file <wav> --json
openclaw agent --session-id <session> ...
openclaw infer tts convert --text <reply> --output <file> --jsonProvider selection is per request. Clawkie Talkie should not mutate OpenClaw's global provider preferences just because a user changes the voice settings in the browser.
From the repo root:
npm install
npm run devThat starts the daemon and the Vite client.
Common checks:
npm test
npm run typecheck
npm run buildRun only the daemon:
npm run daemonA healthy daemon prints something like:
Peer ID: <daemon-peer-id>
Join URL: https://clawkietalkie.app/dashboard#host=<daemon-peer-id>
Waiting for phone…
That host-only URL opens the host-scoped session dashboard. A real OpenClaw voice handoff still uses /voice#host=...&session=....
Check that the daemon is running and that the host value in the link matches the daemon's DAEMON_PEER_ID.
The handoff link is missing routing fields or was built for the wrong context. All handoff links need host and session; prefer the actual OpenClaw sessionId UUID, include sessionKey, channel, and target when those exact runtime values are also visible, use an exact session key in session only as fallback, and include channel/target when visible for transcript mirroring.
Check daemon logs. Common causes are:
openclawis not on the service user'sPATH;- OpenClaw is not authenticated/configured for that user;
- STT provider config is missing;
- TTS provider config is missing;
- the session ID in the link does not identify a real OpenClaw session.
DAEMON_PEER_ID changed. It should be generated once and kept stable. The daemon .env and the installed OpenClaw skill must use the same value.
This is usually an environment problem. The launchd/systemd service needs the same access to node, npm, and openclaw that your interactive shell has.
Corporate networks, VPNs, proxies, and blocked UDP/TURN traffic can prevent the browser and daemon from connecting.
client/— phone/browser UI.daemon/— local WebRTC/OpenClaw bridge.openclaw/clawkie-voice-handoff/— skill source for generating handoff links.docs/install-daemon.md— daemon install and service setup.docs/voice-handoff.md— deterministic rendezvous protocol.test/— client, daemon, protocol, and infer tests.
- macOS and Linux daemon install paths are documented.
- Windows daemon install is not documented yet.
- Clawkie Talkie is tied to OpenClaw sessions; it is not a standalone voice assistant.
- The original OpenClaw/channel thread remains the transcript source of truth.