feat: animated Clawd avatar for camera bubble with face tracking (#1672)

namearth5005 wants to merge 12 commits into CapSoftware:main
Conversation
Comment on `crates/rendering/src/lib.rs`, lines 2889-2896:

```rust
    &constants.queue,
    uniforms.camera,
    Some((avatar_xy, &avatar_frame, segment_frames.recording_time)),
    encoder,
);
self.camera_only.prepare_with_encoder(
    &constants.device,
    &constants.queue,
```
**GPU resources not freed when avatar mode is disabled**

`set_avatar_enabled(false)` only sets `self.avatar_enabled = false`, but leaves `self.avatar`, `self.face_tracker`, and `self.face_pose_smoother` as `Some(...)`. The `AvatarRenderer` holds a `wgpu::RenderPipeline`, two `wgpu::Texture`s, two `wgpu::Buffer`s, and a `wgpu::BindGroup` — significant GPU memory that remains allocated until the entire `RendererLayers` is dropped.

Users who toggle avatar mode off (Settings → Experimental) would expect these resources to be released:
```suggestion
pub fn set_avatar_enabled(&mut self, device: &wgpu::Device, enabled: bool) {
    self.avatar_enabled = enabled;
    if enabled && self.avatar.is_none() {
        self.avatar = Some(crate::avatar::AvatarRenderer::new(device));
        self.face_tracker = Some(cap_face_tracking::FaceTracker::new());
        self.face_pose_smoother = Some(avatar_smoothing::FacePoseSmoother::new());
    } else if !enabled {
        self.avatar = None;
        self.face_tracker = None;
        self.face_pose_smoother = None;
    }
}
```
Comment on `crates/face-tracking/src/macos.rs`, lines 235-240:
**Unsound transmute to retrieve results from `landmarks_request`**
`self.landmarks_request` is typed as `arc::R<vn::Request>` but was constructed as a `VNDetectFaceLandmarksRequest` (or a `DetectFaceRectanglesRequest` in the fallback case). The code reinterprets its raw pointer as `&vn::DetectFaceRectanglesRequest` to call `.results()`:

```rust
let face_req: &vn::DetectFaceRectanglesRequest =
    &*(raw as *const vn::DetectFaceRectanglesRequest);
face_req.results()
```

In the fallback path (`create_landmarks_request` returns a `DetectFaceRectanglesRequest` transmuted to `vn::Request`), this double-cast works. But in the normal path, where `self.landmarks_request` is genuinely a `VNDetectFaceLandmarksRequest`, calling methods through a `DetectFaceRectanglesRequest` pointer is technically UB in Rust, even if the Objective-C runtime happens to handle it correctly due to the class hierarchy.

Consider casting to `VNDetectFaceLandmarksRequest` directly (which is the correct type), or using the Objective-C runtime's `objc_msgSend` / cidre's typed APIs for `results()` on the actual class.
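One way to avoid the reinterpretation entirely is to store the concrete request type behind an enum instead of the erased `vn::Request`, and dispatch on it. A minimal self-contained sketch, with placeholder structs standing in for cidre's `arc::R<vn::DetectFaceLandmarksRequest>` / `arc::R<vn::DetectFaceRectanglesRequest>` wrappers:

```rust
// Placeholder stand-ins for cidre's Vision request classes; in the real
// code these would be the arc::R<...>-wrapped request types.
struct LandmarksRequest;
struct RectanglesRequest;

// Storing the concrete variant means `results()` is always called through
// a pointer of the request's true type: no casts, no UB.
enum FaceRequest {
    Landmarks(LandmarksRequest),
    RectanglesFallback(RectanglesRequest),
}

impl FaceRequest {
    /// Dispatch on the actual variant instead of reinterpreting pointers.
    fn results(&self) -> &'static str {
        match self {
            FaceRequest::Landmarks(_) => "landmark observations",
            FaceRequest::RectanglesFallback(_) => "face rectangle observations",
        }
    }
}

fn main() {
    let req = FaceRequest::Landmarks(LandmarksRequest);
    println!("{}", req.results()); // prints "landmark observations"
}
```

The enum also makes the fallback path explicit at every use site, rather than hiding it behind a transmute at construction time.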
Commits

- …face tracking events
- …era avatar: Loads avatar.riv via createRive if available, drives state machine inputs from face tracking events. Falls back to Canvas2D Clawd when no .riv file is present.
Summary
Adds an animated Clawd (Claude Code mascot) avatar that replaces the webcam feed in Cap's camera bubble, driven by real-time face tracking via Apple Vision framework.
What it does
When "Avatar mode (Clawd)" is enabled in Settings → Experimental, the webcam feed in the camera bubble is replaced by the animated Clawd avatar, driven by live face tracking.
How it works
The avatar is drawn entirely procedurally in a WGSL shader using SDF (signed distance field) rectangles — no external assets, textures, or model files. The character is resolution-independent and anti-aliased.
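For intuition, the kind of distance function such a shader evaluates per pixel can be sketched in plain Rust. This mirrors the standard rounded-box SDF; it is an illustration, not the actual `avatar-clawd.wgsl` code:

```rust
/// Signed distance from point `p` to a rounded rectangle centered at the
/// origin with half-extents `b` and corner radius `r`.
/// Negative inside, zero on the edge, positive outside.
fn sd_round_rect(p: [f32; 2], b: [f32; 2], r: f32) -> f32 {
    let qx = p[0].abs() - b[0] + r;
    let qy = p[1].abs() - b[1] + r;
    // Distance outside the core box plus the interior (negative) part.
    let outside = (qx.max(0.0).powi(2) + qy.max(0.0).powi(2)).sqrt();
    let inside = qx.max(qy).min(0.0);
    outside + inside - r
}

fn main() {
    // Center of a 2x2 box with 0.2 corner radius: well inside.
    println!("{}", sd_round_rect([0.0, 0.0], [1.0, 1.0], 0.2)); // -1.0
    // On the right edge: distance 0.
    println!("{}", sd_round_rect([1.0, 0.0], [1.0, 1.0], 0.2)); // 0.0
    // One unit outside the right edge.
    println!("{}", sd_round_rect([2.0, 0.0], [1.0, 1.0], 0.2)); // 1.0
}
```

In the shader, the signed distance is then mapped through a smoothstep over roughly one pixel's width, which is what makes the shapes anti-aliased and resolution-independent.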
Animation features
Spring physics (two profiles)
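A minimal sketch of the kind of spring integration such smoothing uses (semi-implicit Euler; the actual profiles and constants in `avatar_smoothing.rs` may differ):

```rust
/// One semi-implicit Euler step of a damped spring pulling `x` toward
/// `target`. Returns the new (position, velocity).
fn spring_step(x: f32, v: f32, target: f32, stiffness: f32, damping: f32, dt: f32) -> (f32, f32) {
    let accel = stiffness * (target - x) - damping * v;
    let v = v + accel * dt;
    (x + v * dt, v)
}

fn main() {
    // Smooth a raw yaw reading of 1.0 starting from rest at 0.0.
    // damping = 2 * sqrt(stiffness) is the critically damped case:
    // fastest approach to the target with no overshoot.
    let (stiffness, damping) = (120.0_f32, 2.0 * 120.0_f32.sqrt());
    let (mut x, mut v) = (0.0_f32, 0.0_f32);
    for _ in 0..200 {
        let (nx, nv) = spring_step(x, v, 1.0, stiffness, damping, 1.0 / 60.0);
        x = nx;
        v = nv;
    }
    println!("settled at {x:.4}"); // converges to ~1.0
}
```

Two profiles would simply be two (stiffness, damping) pairs, e.g. a stiffer one for head rotation and a softer one for mouth/eye values.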
New files
- `crates/face-tracking/`
- `crates/rendering/src/avatar.rs`
- `crates/rendering/src/avatar_smoothing.rs`
- `crates/rendering/src/shaders/avatar-clawd.wgsl`
- `crates/project/src/configuration.rs`

Zero impact on existing features
The avatar swap happens at the `DecodedFrame` level — `CameraLayer` doesn't know the difference.

Platform support
Test plan
Greptile Summary
This PR adds a complete animated "Clawd" avatar system to Cap's camera bubble, driven by real-time Apple Vision face tracking. The feature is well-architected — a new `cap-face-tracking` crate cleanly isolates the platform-specific Vision code behind a cross-platform stub, spring physics smooth all face pose data, and a WGSL SDF shader renders the character procedurally with no external assets. The swap is done at the `DecodedFrame` level, so the existing camera bubble rendering path requires no changes.

However, there are several correctness issues that need attention before this ships:
- The `AvatarUniforms` struct is 56 bytes but WGSL's struct alignment rules (`SizeOf = RoundUp(AlignOf, last_offset + last_size)`) require 64 bytes. The `_padding` field must be `[f32; 4]` (not `[f32; 2]`), and `_padding: vec2<f32>` in the shader must change to `vec4<f32>`. With wgpu validation active this will produce a binding size error.
- `device.poll(wgpu::PollType::Wait)` stalls the CPU on every rendered frame, eliminating all CPU/GPU parallelism in the pipeline.
- The `AvatarBackground` setting is dead code: `Dark | Light | Gradient` are defined and exposed in the UI, but `bg_color` in `AvatarUniforms` is always hardcoded to the dark value, so the setting has no effect.
- Disabling avatar mode never drops the `AvatarRenderer`, `FaceTracker`, or `FacePoseSmoother`, leaving GPU buffers and textures allocated indefinitely.
- The avatar logic is duplicated between `prepare` and `prepare_with_encoder`; a shared helper method is needed.
- `smoother.update(&raw_pose, 33.0)` and `avatar.render(..., 1.0/30.0)` assume 30 fps; actual elapsed time should be derived from the recording timeline.

Confidence Score: 2/5
`crates/rendering/src/avatar.rs` and `crates/rendering/src/lib.rs` need the most attention — uniform buffer padding, blocking readback, the dead `AvatarBackground` field, resource cleanup, and code duplication all live there.

Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Webcam as Real Webcam
    participant CamFrame as DecodedFrame (camera)
    participant FaceTracker as FaceTracker (Apple Vision)
    participant Smoother as FacePoseSmoother (spring physics)
    participant AvatarRend as AvatarRenderer (wgpu)
    participant GPU as GPU (WGSL shader)
    participant CamLayer as CameraLayer::prepare()
    participant Output as Video Output
    Webcam->>CamFrame: RGBA pixels (still captured)
    CamFrame->>FaceTracker: track(rgba_data, w, h)
    Note over FaceTracker: RGBA→BGRA copy<br/>CVPixelBuffer<br/>VNDetectFaceRectanglesRequest<br/>VNDetectFaceLandmarksRequest
    FaceTracker-->>Smoother: raw FacePose (pitch/yaw/roll/mouth/eyes)
    Smoother-->>AvatarRend: smoothed FacePose + dt=33ms (hardcoded)
    AvatarRend->>GPU: write AvatarUniforms (56 bytes ⚠️ should be 64)
    GPU-->>AvatarRend: render pass → 512×512 RGBA texture
    Note over AvatarRend: device.poll(Wait) ⚠️ blocking
    AvatarRend-->>CamLayer: output_rgba() → new DecodedFrame(512×512)
    CamLayer->>Output: composited camera bubble (existing shader)
    Note over Output: Same pipeline for editor preview AND export
```

Comments Outside Diff (5)
`crates/rendering/src/avatar.rs`, lines 450-451

The Rust `AvatarUniforms` struct is 56 bytes, but the corresponding WGSL struct requires 64 bytes due to WGSL's struct alignment rules: `SizeOf(S) = RoundUp(AlignOf(S), last_offset + last_size)` = `RoundUp(16, 48 + 8)` = 64.

When wgpu validates the binding at draw time, it checks that the bound buffer is at least as large as the shader's declared struct size (64 bytes). With only 56 bytes in the buffer, this triggers a validation error. `_padding` should be extended from `[f32; 2]` to `[f32; 4]` to make the Rust struct 64 bytes, and the WGSL `_padding: vec2<f32>` must also be updated to `vec4<f32>` to keep both sides in sync.
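The size arithmetic can be checked on the Rust side. The field layout below is a hypothetical stand-in (48 bytes of payload in one array); only the `_padding` width matters for the point being made:

```rust
// Hypothetical mirror of the uniform struct: 48 bytes of payload fields
// followed by padding. With `[f32; 2]` padding the struct would be 56
// bytes, while WGSL declares RoundUp(16, 48 + 8) = 64.
#[repr(C)]
struct AvatarUniforms {
    payload: [f32; 12], // stand-in for the real 48 bytes of fields
    _padding: [f32; 4], // was [f32; 2] (56 bytes total); [f32; 4] gives 64
}

/// WGSL's RoundUp(align, size).
fn round_up(align: usize, size: usize) -> usize {
    size.div_ceil(align) * align
}

fn main() {
    // WGSL-declared size of the uniform block.
    assert_eq!(round_up(16, 48 + 8), 64);
    // Rust side now matches, so the binding-size validation passes.
    assert_eq!(std::mem::size_of::<AvatarUniforms>(), 64);
    println!("both sides agree on 64 bytes");
}
```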
`crates/rendering/src/avatar.rs`, lines 748-759

`device.poll(wgpu::PollType::Wait)` blocks the calling thread until ALL previously submitted GPU work completes, and it is called on every single frame render, turning an inherently asynchronous GPU pipeline into a fully synchronous one.

For a 512×512 RGBA texture this may be acceptable in testing, but in the recording/rendering pipeline (which renders every video frame) this stall occurs on every frame: it submits GPU work, waits for completion, then maps the buffer — eliminating any CPU/GPU parallelism.

Consider using double-buffering (submit to buffer N, read from buffer N-1, which finished last frame) or keeping the avatar data GPU-side and compositing it directly without a CPU readback. If CPU readback is truly required, at minimum pipeline the map callback asynchronously rather than blocking with `PollType::Wait`.
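The double-buffering suggestion, modeled without wgpu (the two slots stand in for the two staging buffers; in the real pipeline the write would enqueue the GPU copy and the read would map the buffer that finished last frame):

```rust
/// Minimal model of a two-slot readback ring: each frame writes into one
/// slot and reads the result produced the previous frame, so the CPU never
/// waits on work submitted this frame — at the cost of one frame of latency.
struct ReadbackRing<T> {
    slots: [Option<T>; 2],
    frame: usize,
}

impl<T> ReadbackRing<T> {
    fn new() -> Self {
        Self { slots: [None, None], frame: 0 }
    }

    /// Submit this frame's result and return last frame's, if any.
    fn submit(&mut self, value: T) -> Option<T> {
        let cur = self.frame % 2;
        let prev = (self.frame + 1) % 2;
        self.slots[cur] = Some(value);
        self.frame += 1;
        self.slots[prev].take()
    }
}

fn main() {
    let mut ring = ReadbackRing::new();
    assert_eq!(ring.submit("frame 0"), None); // nothing finished yet
    assert_eq!(ring.submit("frame 1"), Some("frame 0")); // one-frame latency
    assert_eq!(ring.submit("frame 2"), Some("frame 1"));
    println!("one-frame-latency readback works");
}
```

With real wgpu buffers, the read side would use `map_async` on the previous slot and poll non-blockingly instead of `PollType::Wait`.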
`crates/rendering/src/lib.rs`, lines 2932-2997

**Duplicated between `prepare` and `prepare_with_encoder`**

The entire face-tracking + avatar-render + camera-prepare block (~45 lines) is copy-pasted identically in both `prepare` and `prepare_with_encoder`. Any future changes (e.g. applying `AvatarBackground`, fixing the hardcoded dt) must be made in two places.

Extract the shared logic into a helper that produces the `avatar_frame` and `avatar_xy`, then call it from both paths. This also fixes the hardcoded `33.0` ms / `1.0/30.0` dt values by passing in real elapsed time.
`crates/rendering/src/avatar.rs`, line 688

**`AvatarBackground` setting is wired up but never applied**

`general_settings.rs` adds both `avatar_mode: bool` and `avatar_background: AvatarBackground`, and `configuration.rs` defines `AvatarBackground::Dark | Light | Gradient`. However, the `bg_color` passed to `AvatarUniforms` is always hardcoded to `[0.15, 0.15, 0.18, 1.0]` (dark) regardless of the user's selection.

The `AvatarBackground` value is never read anywhere in the rendering path, making the setting dead code. Either apply it here (mapping `Dark`/`Light`/`Gradient` to the appropriate RGBA values) or remove the setting until it's implemented.
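Applying the setting could be as small as a match when filling the uniforms. Only the Dark value appears in the PR; the Light and Gradient colors below are placeholders:

```rust
#[derive(Clone, Copy)]
enum AvatarBackground {
    Dark,
    Light,
    Gradient,
}

/// Map the user's setting to the `bg_color` uniform instead of hardcoding
/// the dark value.
fn bg_color(bg: AvatarBackground) -> [f32; 4] {
    match bg {
        AvatarBackground::Dark => [0.15, 0.15, 0.18, 1.0], // value from the PR
        AvatarBackground::Light => [0.92, 0.92, 0.95, 1.0], // placeholder
        // A real gradient needs shader support; a flat stand-in color here.
        AvatarBackground::Gradient => [0.40, 0.30, 0.60, 1.0], // placeholder
    }
}

fn main() {
    println!("dark bg: {:?}", bg_color(AvatarBackground::Dark));
}
```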
`crates/rendering/src/lib.rs`, line 2941

This assumes 30 fps regardless of the actual rendering frame rate. If the pipeline runs at 60 fps (16.6 ms per frame) the spring will under-advance; at lower frame rates it will over-advance, causing stiff or springy animation artifacts.

The actual elapsed time should be derived from `segment_frames.recording_time` or passed down through the call chain, and both occurrences (in `prepare` and `prepare_with_encoder`) need the same fix.
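Deriving dt from the recording timeline instead of hardcoding 33 ms could look like the following; the struct and method names are illustrative, not the actual `RendererLayers` API:

```rust
/// Track the previous frame's recording_time and hand the spring the real
/// elapsed time, clamped to avoid huge jumps after seeks or pauses.
struct DtTracker {
    last_time: Option<f64>,
}

impl DtTracker {
    fn new() -> Self {
        Self { last_time: None }
    }

    /// `recording_time` is in seconds (e.g. segment_frames.recording_time).
    fn dt(&mut self, recording_time: f64) -> f64 {
        let dt = match self.last_time {
            // Clamp: never negative (backwards seek), never more than 100 ms.
            Some(prev) => (recording_time - prev).clamp(0.0, 0.1),
            None => 1.0 / 30.0, // first frame: fall back to a nominal step
        };
        self.last_time = Some(recording_time);
        dt
    }
}

fn main() {
    let mut t = DtTracker::new();
    println!("first dt: {:.4}", t.dt(0.0)); // nominal fallback
    println!("next dt:  {:.4}", t.dt(1.0 / 60.0)); // real 60 fps step
}
```

A single tracker owned by the shared avatar helper would fix both call sites at once.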
Last reviewed commit: "chore: update Cargo...."