
refactor(openai): Split token counting by API for easier deprecation#5930

Draft
ericapisani wants to merge 5 commits into master from
ep/py-2271-refactor-openai-token-counting-8y2

Conversation

@ericapisani ericapisani commented Apr 1, 2026

  • Ensure that token usage is correctly passed into the relevant token calculation method (when applicable)
  • Replace the shared _calculate_token_usage() and _get_usage() with two API-specific functions: _calculate_completions_token_usage() (Chat Completions / Embeddings) and _calculate_responses_token_usage() (Responses API)
  • Have each function access only the exact field names for its API, eliminating multi-name lookup patterns like ["input_tokens", "prompt_tokens"]
  • Add support for streaming_message_token_usage from stream_options={"include_usage": True} in streaming Completions
  • Add API section comments in _set_common_output_data to clarify which branch handles which API
  • Convert all call sites to use keyword arguments for readability

Motivation

When Chat Completions is deprecated, removing it should be a simple delete operation without auditing shared code. Before this change, _calculate_token_usage handled both APIs with interleaved logic, making it unclear what was safe to remove.

We also needed to ensure that, when handling streamed responses from the chat completions API, the usage in the final chunk of the message correctly made its way to the _calculate_*_token_usage method.
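A rough sketch of capturing that final-chunk usage while draining a stream (illustrative only: `collect_stream_usage` is a hypothetical helper, and the real integration does this inside its stream wrapper rather than as a standalone function):

```python
def collect_stream_usage(stream):
    """Drain a Chat Completions stream; return (full text, final usage)."""
    parts = []
    usage = None
    for chunk in stream:
        # With stream_options={"include_usage": True}, the SDK emits one
        # final chunk whose `usage` is populated and whose `choices` is empty.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        for choice in getattr(chunk, "choices", []):
            delta = getattr(choice, "delta", None)
            if delta is not None and getattr(delta, "content", None):
                parts.append(delta.content)
    return "".join(parts), usage
```

The key point is that usage arrives only in the last chunk, so the wrapper must hold onto it until the stream is exhausted and then hand it to the token-calculation function.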

Other design decisions to note

Some duplication of code between the Completions API and Responses API token counting

This is deliberate: it allows for easier removal of the Completions API logic once it is fully deprecated and removed from the OpenAI API.

ericapisani and others added 2 commits April 1, 2026 13:11
…Responses API functions

Replace the shared `_calculate_token_usage()` and `_get_usage()` with
two API-specific functions: `_calculate_completions_token_usage()` and
`_calculate_responses_token_usage()`. This makes it clear which token
fields belong to which API and enables clean removal of Chat Completions
support when it is deprecated.

- Completions function extracts `prompt_tokens`, `completion_tokens`,
  `total_tokens` and supports `streaming_message_token_usage` for
  stream_options include_usage
- Responses function extracts `input_tokens`, `output_tokens`,
  `total_tokens` plus `cached_tokens` and `reasoning_tokens` details
- Add API section comments in `_set_common_output_data`
- Update all call sites to use the appropriate API-specific function
- Convert Completions call sites to use keyword arguments
- Update and rename unit tests; add Responses API token usage tests
- Add sync and async streaming tests for usage-in-stream

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n_usage call sites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericapisani ericapisani requested a review from a team as a code owner April 1, 2026 11:30
linear-code bot commented Apr 1, 2026

@ericapisani ericapisani marked this pull request as draft April 1, 2026 11:30

github-actions bot commented Apr 1, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (integrations) Instrument pyreqwest tracing by servusdei2018 in #5682

Internal Changes 🔧

  • (openai) Split token counting by API for easier deprecation by ericapisani in #5930
  • (opentelemetry) Ignore mypy error by alexander-alderman-webb in #5927
  • Update validate-pr workflow by stephanie-anderson in #5931

🤖 This preview updates automatically when you update the PR.

@ericapisani
Member Author

bugbot run


github-actions bot commented Apr 1, 2026

Codecov Results 📊

13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 6.57s

All tests are passing successfully.

❌ Patch coverage is 0.00%. Project has 14826 uncovered lines.

Files with missing lines (1)
File      | Patch % | Lines
openai.py | 4.25%   | ⚠️ 654 Missing

Generated by Codecov Action


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Response usage silently overrides streaming token usage
    • Changed the second if statement to elif to ensure streaming_message_token_usage takes precedence over response.usage, preventing silent data loss.

Create PR

Or push these changes by commenting:

@cursor push 6e65058f68
Preview (6e65058f68)
diff --git a/sentry_sdk/integrations/openai.py b/sentry_sdk/integrations/openai.py
--- a/sentry_sdk/integrations/openai.py
+++ b/sentry_sdk/integrations/openai.py
@@ -164,8 +164,7 @@
 
     if streaming_message_token_usage:
         usage = streaming_message_token_usage
-
-    if hasattr(response, "usage"):
+    elif hasattr(response, "usage"):
         usage = response.usage
 
     if usage is not None:


… usage

The refactor that split _calculate_token_usage into separate Completions
and Responses functions dropped extraction of prompt_tokens_details.cached_tokens
and completion_tokens_details.reasoning_tokens from the Completions path.
This restores those fields so spans for cached prompts and reasoning models
(e.g. o1/o3) report complete token usage metrics.

Also fixes streaming usage priority: streaming_message_token_usage now
correctly takes precedence over response.usage via elif.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test used MagicMock(message="one") where message was a plain string,
but the real OpenAI API returns Choice objects with message.content. The
counting code checks hasattr(choice.message, "content"), which failed on
strings, so manual token counting was never exercised. Use real Choice
and ChatCompletionMessage objects and fix the expected output_tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
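The failure mode is easy to reproduce without the SDK, using stand-in objects rather than real Choice/ChatCompletionMessage types:

```python
from types import SimpleNamespace

# Old test shape: message is a plain string, as MagicMock(message="one") produced.
fake_choice = SimpleNamespace(message="one")
# Real API shape: message is an object carrying role and content.
real_choice = SimpleNamespace(
    message=SimpleNamespace(role="assistant", content="one")
)

# The integration's guard: manual token counting runs only when this is True.
assert not hasattr(fake_choice.message, "content")  # str has no .content
assert hasattr(real_choice.message, "content")
```

So with the string-valued mock, the `hasattr(choice.message, "content")` branch was always skipped and the manual-counting path was never exercised.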
@ericapisani
Member Author

bugbot run


@cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

The stream_options parameter was not available in early versions of the
OpenAI Python SDK, causing TypeError on v1.0.1 CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
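One way to keep a test compatible with old SDK versions is to gate the keyword on the installed version. Both helpers below are hypothetical, and the 1.26.0 threshold is an assumption to verify against the openai-python changelog:

```python
def supports_stream_options(sdk_version: str, minimum=(1, 26, 0)) -> bool:
    """True if this openai-python version accepts the stream_options kwarg.

    Tolerates suffixes like "1.26.0b1" by keeping only the digits in each
    segment; the `minimum` default is an assumption, not a documented fact.
    """
    parts = []
    for piece in sdk_version.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits or 0))
    return tuple(parts) >= minimum


def stream_kwargs(sdk_version: str) -> dict:
    """Build create() kwargs, adding stream_options only when supported."""
    kwargs = {"stream": True}
    if supports_stream_options(sdk_version):
        kwargs["stream_options"] = {"include_usage": True}
    return kwargs
```

On a v1.0.1 CI run this yields plain `{"stream": True}`, avoiding the TypeError, while newer SDKs still request usage in the stream.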