Skip to content

feat(Datastore): Introduce Client Side Metrics#12718

Merged
lqiu96 merged 72 commits intomainfrom
datastore-csm-impl-2
Apr 29, 2026
Merged

feat(Datastore): Introduce Client Side Metrics#12718
lqiu96 merged 72 commits intomainfrom
datastore-csm-impl-2

Conversation

@lqiu96
Copy link
Copy Markdown
Member

@lqiu96 lqiu96 commented Apr 8, 2026

No description provided.

lqiu96 added 22 commits March 31, 2026 23:41
Update DatastoreAdminStubSettings and DatastoreStubSettings to include
the library version via Version.VERSION. Add the two new Version.java
files that hold the library version constant. Update native image
reflect-config.json for both the admin and core stub packages.
…gRPC transport coverage

Replace the old MetricsRecorder / OpenTelemetryMetricsRecorder /
NoOpMetricsRecorder types with the new DatastoreMetricsRecorder family,
which extends GAX's MetricsRecorder interface for a unified recording
contract.

Key changes:
- Delete MetricsRecorder.java, OpenTelemetryMetricsRecorder.java,
  NoOpMetricsRecorder.java and their tests
- Add DatastoreMetricsRecorder interface (with simple getInstance() that
  returns an OTel recorder when metrics are enabled, NoOp otherwise)
- Add NoOpDatastoreMetricsRecorder, OpenTelemetryDatastoreMetricsRecorder,
  and DatastoreMetricsRecorderTest
- Remove the !GRPC transport guard from TelemetryUtils.recordOperationMetrics()
  and attemptMetricsCallable() so all transports record metrics uniformly
- Remove the isHttpTransport field from RetryAndTraceDatastoreRpcDecorator
  and DatastoreImpl; remove buildMetricsTracerFactory() from GrpcDatastoreRpc
- Update TelemetryConstants with the new METRIC_PREFIX, DATASTORE_METER_NAME,
  and typed AttributeKey constants needed by the new recorder classes
- Update DatastoreOptions to pass the full DatastoreOptions to getInstance()
  so the recorder factory can inspect credentials and project at creation time
Add a default-on, private OpenTelemetry SDK pipeline that automatically
exports Datastore client-side metrics to Google Cloud Monitoring without
requiring any user configuration.

Key additions:
- DatastoreBuiltInMetricsView: registers OTel views for metric renaming
  (GAX prefix → custom.googleapis.com/internal/client), custom histogram
  bucket boundaries, and an attribute allowlist that prevents unexpected
  labels from reaching Cloud Monitoring
- DatastoreCloudMonitoringExporter: MetricExporter impl that batches
  TimeSeries and calls MetricServiceClient.createTimeSeries(); graceful
  degradation (logs warnings, never crashes); best-effort flush()
- DatastoreCloudMonitoringExporterUtils: converts OTel MetricData to
  Cloud Monitoring TimeSeries protos
- BuiltInDatastoreMetricsProvider: creates a private OpenTelemetrySdk
  per Datastore client; computes stable client_uid and client_name
  attributes; registers a shutdown hook as a last-resort safety net
- CompositeDatastoreMetricsRecorder: fans out all record*() calls to
  both the built-in and any user-provided backends simultaneously
- DatastoreMetricsRecorder.getInstance(): updated to composite factory
  that wires up the built-in path (unless emulator or env-var disabled)
  alongside any custom OTel backend; adds default close() for lifecycle
- OpenTelemetryDatastoreMetricsRecorder: add ownsOpenTelemetry flag and
  close() so the built-in SDK is flushed when DatastoreImpl.close() runs
- DatastoreOpenTelemetryOptions: add exportBuiltinMetricsToGoogleCloudMonitoring
  flag (default true) with @BetaApi getter and setter; clarify isEnabled()
  Javadoc to note it does not cover the built-in path
- DatastoreImpl.close(): call getMetricsRecorder().close() after RPC
  channel teardown so the built-in SDK flushes buffered metrics
- pom.xml: promote opentelemetry-sdk, opentelemetry-sdk-common, and
  opentelemetry-sdk-metrics from test→compile scope; add
  google-cloud-monitoring and proto-google-cloud-monitoring-v3 compile deps
- New tests: DatastoreCloudMonitoringExporterTest, BuiltInDatastoreMetricsProviderTest,
  DatastoreMetricsRecorderTest (updated), DatastoreBuiltInAndCustomMetricsIT
Add DatastoreMetricsSample demonstrating default built-in metrics with
no configuration required. The sample runs a transaction flow (put,
read-modify-write, delete) that exercises transaction_latency and
transaction_attempt_count metrics, then prints instructions for
verifying the data in Cloud Monitoring Metrics Explorer.

Includes DatastoreMetricsSampleIT which runs against a real GCP project
(GOOGLE_CLOUD_PROJECT env var required) and asserts the expected console
output from the sample.
Add DatastoreMetricsSample demonstrating default built-in metrics with
no configuration required. The sample runs a transaction flow (put,
read-modify-write, delete) that exercises transaction_latency and
transaction_attempt_count metrics, then prints instructions for
verifying the data in Cloud Monitoring Metrics Explorer.

Includes DatastoreMetricsSampleIT which runs against a real GCP project
(GOOGLE_CLOUD_PROJECT env var required) and asserts the expected console
output from the sample.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces built-in metrics support for the Datastore SDK, allowing client-side metrics to be exported to Google Cloud Monitoring. It adds a new BuiltInDatastoreMetricsProvider, a DatastoreCloudMonitoringExporter, and updates DatastoreOpenTelemetryOptions to manage this feature. My review identified several critical issues: creating a new MetricServiceClient per client instance is inefficient, registering individual shutdown hooks for each client causes a memory leak, the Javadoc for the default metrics export state is contradictory, and the CompositeDatastoreMetricsRecorder.close() method lacks proper exception handling and LIFO ordering for resource cleanup.

@lqiu96 lqiu96 requested a review from jinseopkim0 April 10, 2026 21:18
@lqiu96 lqiu96 marked this pull request as ready for review April 10, 2026 22:01
@lqiu96 lqiu96 requested a review from a team as a code owner April 10, 2026 22:01
Comment thread java-datastore/samples/snippets/pom.xml Outdated
return new NoOpDatastoreMetricsRecorder();
}
// CompositeMetricsRecorder is not needed for one recorder
if (recorders.size() == 1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to use CompositeMetricsRecorder even if there is only 1 recorder.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no real difference. I just opted to return the specific type

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no difference, I think it's better to always use CompositeDatastoreMetricsRecorder for consistency.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer using the specific type for clarity, but a benefit of being able to always return CompositeDatastoreMetricsRecorder is that I can get rid of the NoOpDatastoreMetricsRecorder class.

I refactored this so that it always returns a CompositeDatastoreMetricsRecorder

*/
@Override
public void close() {
if (isBuiltIn && openTelemetry instanceof OpenTelemetrySdk) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we manage the lifecycle of the builtin OpenTelemetrySdk instance outside of the recorder? The creator of the OpenTelemetrySdk instance should be responsible for it.

Copy link
Copy Markdown
Member Author

@lqiu96 lqiu96 Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be closed only when the user calls Datastore.close(). Since we don't expose the built in otel object, users won't be able to access it otherwise.

I believe the Datastore client should be able to use this Otel instance for any other reason until the user initiates the shutdown.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant outside the recorder but still managed by us. For example, we can initialize OpenTelemetry object in DatastoreImpl and close it in the same class. This also creates clear separation of concerns that recorder is only responsible for recording and always takes OpenTelemetry as an argument. Instead of responsible for both recording and managing OpenTelemetry.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you mean. I don't think it's impossible to do, but I believe I would just need to open up a few internal classes that I would prefer to keep package-private (BuiltInDatastoreMetricsProvider + DatastoreCloudMonitoringExporter + DatastoreBuiltInMetricsView).

I think this would be easier if we have a stable, shared Gax utility for the creating all the necessary exporter + views since that would need to be public.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, why we need to open up a few internal classes? I think we may need to open one public method that takes OpenTelemetry in DatastoreMetricsRecorder, but not sure we have to open more.

In addition, I see that DatastoreMetricsRecorder is being cached in DatastoreOptions which is not a good pattern IMHO. The option/setting classes are supposed to hold static configuration values, creating a metric recorder based on configurations should be the responsibility of the caller class (in this case DatastoreImpl).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to open one public method that takes OpenTelemetry in DatastoreMetricsRecorder

Ah, I misinterpreted your earlier message. I thought by initialize OpenTelemetry object in DatastoreImpl, you meant to move the the initialization logic over to DatastoreImpl. I can split the creation of OtelSdk to be managed by DatastoreImpl.

In addition, I see that DatastoreMetricsRecorder is being cached in DatastoreOptions which is not a good pattern IMHO. The option/setting classes are supposed to hold static configuration values, creating a metric recorder based on configurations should be the responsibility of the caller class (in this case DatastoreImpl).

I'm not sure what you mean by cached in DatastoreOptions. Settings is a good point, I'll move that out to DatastoreImpl since that is a better spot to hold the recorder.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. The built-in OtelSdk should be managed by DatastoreImpl and the recorder should be initialized there as well

@lqiu96 lqiu96 requested a review from blakeli0 April 27, 2026 17:33
*/
@Override
public void close() {
if (isBuiltIn && openTelemetry instanceof OpenTelemetrySdk) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant outside the recorder but still managed by us. For example, we can initialize OpenTelemetry object in DatastoreImpl and close it in the same class. This also creates clear separation of concerns that recorder is only responsible for recording and always takes OpenTelemetry as an argument. Instead of responsible for both recording and managing OpenTelemetry.

return new NoOpDatastoreMetricsRecorder();
}
// CompositeMetricsRecorder is not needed for one recorder
if (recorders.size() == 1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no difference, I think it's better to always use CompositeDatastoreMetricsRecorder for consistency.

private final OpenTelemetry openTelemetry;
// True when this recorder created the OpenTelemetry instance (built-in path) and therefore
// owns its lifecycle. False when the instance was provided by the user.
private final boolean isBuiltIn;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we manage the lifecycle of OpenTelemetry outside of the recorder, I don't think we need this flag anymore?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. Right now OpenTelemetryDatastoreMetricsRecorder is shared by both the builtin and user settings.

Current tradeoff is that I can keep the builtin classes internal (package-private) and use a flag to manage the lifecycle. The separation could be cleaner, but I think it should be OK for this case. WDYT?

@lqiu96 lqiu96 requested a review from blakeli0 April 27, 2026 22:27
logger.log(
Level.WARNING,
"Built-in metrics exporting is disabled when using NoCredentials (emulator).");
return null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider returning a OpenTelemetry.noop() instead when returning null.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Updated


/** Shuts down the exporter and the underlying {@link MetricServiceClient}. */
@Override
public CompletableResultCode shutdown() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this shutdown method being called?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, called when the OpenTelemetry is closed. When testing this locally, it seems to be invoked via a shutdown hook from the Otel instance. Otel -> MeterProvider -> MetricReader -> Exporter

Copy link
Copy Markdown
Contributor

@blakeli0 blakeli0 Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so it will be called from DatastoreImple#close which should close the monitoring client as well.

In general, I think it is a nice idea to share monitoring clients between datastore clients, but this is the first time we introduce shared states between clients, which may have other unexpected behaviors. It is possible to share ChannelProviders and CredentialProviders as well, but we don't manage the lifecycle of channels and credentials in those cases.

Since this is behind a client setting, I think it's fine for now but let's keep a close eye on it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

share monitoring clients between datastore clients, but this is the first time we introduce shared states between clients

Sounds good. I was aiming to try and reduce the memory a bit as it would end up creating 1 Otel instance + 1 Monitoring client per Datastore client. Not ideal as Spanner and BigTable were able to reduce memory by having a single Otel instance shared across clients. I made the tradeoff of isolated Otel instances to reduce filter complexity for metrics.

If this does introduce problems, I can try to isolate and create separate clients.

Copy link
Copy Markdown
Contributor

@blakeli0 blakeli0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the title to feat and add descriptions to the PR body before merging.

@lqiu96 lqiu96 changed the title chore: Implement Default Client Side Metrics Otel feat(Datastore): Introduce Client Side Metrics Apr 29, 2026
@lqiu96 lqiu96 enabled auto-merge (squash) April 29, 2026 22:36
@lqiu96 lqiu96 merged commit 552a34d into main Apr 29, 2026
128 of 130 checks passed
@lqiu96 lqiu96 deleted the datastore-csm-impl-2 branch April 29, 2026 22:56
jinseopkim0 added a commit that referenced this pull request May 5, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>1.86.0</summary>

##
[1.86.0](v1.85.0...v1.86.0)
(2026-05-05)


### Features

* [aiplatform] Add asyncQueryReasoningEngine to aiplatform v1 API
([a1ab487](a1ab487))
* [aiplatform] Add asyncQueryReasoningEngine to aiplatform v1beta1
([a1ab487](a1ab487))
* [aiplatform] add OnlineEvaluator API and update Evaluation API
([a1ab487](a1ab487))
* [aiplatform] Model Registry CopyModel BYOSA
([531942b](531942b))
* [aiplatform] new field CopyModelRequest.custome_service_account
([531942b](531942b))
* [aiplatform] Support VeoLoraTuningSpec in the tuning jobs
([a1ab487](a1ab487))
* [analytics-admin] add UserProvidedDataSettings resource and
([a1ab487](a1ab487))
* [bigqueryreservation] add principal field to BigQuery Reservation
([531942b](531942b))
* [ces] Add ability to specify mocked tool responses in ExecuteTool
([a1ab487](a1ab487))
* [chat] Addition of ChatService.FindGroupChats
([a1ab487](a1ab487))
* [compute] Update Compute Engine v1 API to revision 20260331
([d3b76d9](d3b76d9))
* [health] new module for health
([#12993](#12993))
([d568a8c](d568a8c))
* [iam] new iam v3beta client for AccessPolicies, this is step 4&5
([a1ab487](a1ab487))
* [mapmanagement] new module for mapmanagement
([#12874](#12874))
([0720279](0720279))
* [modelarmor] add streaming methods StreamSanitizeUserPrompt and
([a1ab487](a1ab487))
* [netapp] add ScaleType for Storage Pools and LargeCapacityConfig
([a1ab487](a1ab487))
* [redis-cluster] [Memorystore for Redis Cluster] Updating new node
([d3b76d9](d3b76d9))
* [redis-cluster][Memorystore for Redis Cluster] Updating new node
([d3b76d9](d3b76d9))
* [shopping-merchant-products] a new optional field `video_links` is
([a1ab487](a1ab487))
* [shopping-merchant-reports] add `store_type` to
([d3b76d9](d3b76d9))
* [valkey] [Memorystore for Valkey] Updating new node types added
([d3b76d9](d3b76d9))
* [valkey] [Memorystore for Valkey] Updating new node types added
([d3b76d9](d3b76d9))
* [vectorsearch] Added CMEK support
([531942b](531942b))
* **bqjdbc:** Bypass dry-run job for read-only tokens.
([#12961](#12961))
([cd57169](cd57169))
* **bqjdbc:** Integrate the PerConnectionFileHandler with
BigQueryJdbcRootLogger
([#12933](#12933))
([2e56184](2e56184))
* **bqjdbc:** Per connection logs - Add BigQueryJdbcMdc
([#12833](#12833))
([f562667](f562667))
* **bqjdbc:** Per connection logs - Add PerConnectionFileHandler
([#12899](#12899))
([5846197](5846197))
* **datastore:** Bump to v3 major version
([#12989](#12989))
([df20994](df20994))
* **datastore:** Enable Otel metrics for custom Otel
([#12969](#12969))
([1721e7c](1721e7c))
* **Datastore:** Introduce Client Side Metrics
([#12718](#12718))
([552a34d](552a34d))
* **datastore:** Remove deprecated classes and methods and bump
ObsoleteApi to Deprecated
([#12971](#12971))
([0bca75c](0bca75c))
* **Datastore:** Swap the default transport from HttpJson to gRPC
([#12977](#12977))
([24fb234](24fb234))
* **datastore:** Update default channel pool configs to handle initial
bursts and scalability
([#12883](#12883))
([26fe0f9](26fe0f9))
* **gax-grpc:** add configurable resize delta and warning for repeated
resizing
([#12838](#12838))
([2caf026](2caf026))
* **spanner:** add connection properties for min/max RPCs for DCP
([#12951](#12951))
([dc1216e](dc1216e))
* **spanner:** add connection property for enabling/disabling grpc-gcp
([#12898](#12898))
([1e633d7](1e633d7))
* **spanner:** add shared endpoint cooldowns for location-aware
rerouting
([#12845](#12845))
([f5f273b](f5f273b))
* **spanner:** Cleanup GcpFallbackChannel creation and enable by default
([#12707](#12707))
([5251219](5251219))
* **storage:** Implement deleteSourceObjects for Compose Operation
([#12873](#12873))
([2cecd0c](2cecd0c))


### Bug Fixes

* **bq:** do not call 'getQueryResults' for stateful queries via 'query'
api
([#12999](#12999))
([44e9789](44e9789))
* **bqjdbc:** add default OAuth client id/secret
([#12946](#12946))
([9b5c4fa](9b5c4fa))
* **bqjdbc:** add Google Driver scope to all credential types
([#12847](#12847))
([5c890f8](5c890f8))
* **bqjdbc:** enhance logging with caller inference and explicit
exception tracking
([#12903](#12903))
([ce4969b](ce4969b))
* **bqjdbc:** Log exception messages - part 2
([#12907](#12907))
([5215b11](5215b11))
* **bqjdbc:** Log exception messages - part 3
([#12920](#12920))
([45b572f](45b572f))
* **bqjdbc:** metadata methods & getSqlTypeName for struct
([#12947](#12947))
([37555fb](37555fb))
* **bqjdbc:** optimize formatter in BigQueryJdbcRootLogger
([#12877](#12877))
([4233faf](4233faf))
* **bqjdbc:** Revert DatabaseMetaData field to be non-static in
BigQueryConnection
([#12778](#12778))
([ac69c8d](ac69c8d))
* **bqjdbc:** support Service Account Impersonation in ADC
([#12967](#12967))
([8ce183c](8ce183c))
* bump vectorsearch to 0.13.0 for partial release
([#12805](#12805))
([5208fc9](5208fc9))
* **ci:** update java-showcase path in excluded_modules
([#12984](#12984))
([3456ee9](3456ee9))
* correct build directory paths in graalvm cloudbuild.yaml
([#12794](#12794))
([8e6ba36](8e6ba36))
* **datastore:** Create a plaintext gRPC transport channel when using
the Emulator
([#12721](#12721))
([4bed8fd](4bed8fd))
* **datastore:** Update initial ChannelPool configs according to
Datastore best practice guide
([#12919](#12919))
([851fb89](851fb89))
* **deps:** update the Java code generator (gapic-generator-java) to
([531942b](531942b))
* do not generate Version.java files by default
([#12955](#12955))
([43a888a](43a888a))
* **gax:** record fractional latency metrics
([#12979](#12979))
([d690333](d690333))
* **gax:** Remove strict validation that resize delta must be less than
max channel count
([#12863](#12863))
([26b06f9](26b06f9))
* **generator:** use json_name when available
([#12940](#12940))
([622955b](622955b))
* manual preservation of Version.java files
([#12862](#12862))
([3d65c78](3d65c78))
* remove google/rpc exclusion to generate HttpResponse
([#12992](#12992))
([24b1719](24b1719))
* **spanner:** fix flakiness and race conditions in multiplexed session
tests
([#12949](#12949))
([3e02f18](3e02f18))
* **spanner:** fix flakiness in testCreateSessionDeadlineExceeded
([#12944](#12944))
([93f21ed](93f21ed))
* Update renovate config check template to use npx
([#12865](#12865))
([b974740](b974740))
* use org.junit.jupiter.api.Assertions.assertThrow
([#12882](#12882))
([bf243a6](bf243a6))


### Performance Improvements

* **spanner:** use StringBuilder for generating RequestId
([#12809](#12809))
([5c821a3](5c821a3))


### Dependencies

* upgrade opentelemetry
([#12953](#12953))
([522e05b](522e05b))


### Documentation

* [cloudsecuritycompliance] Updated docs for the APIs
([a1ab487](a1ab487))
* [kms] Update the comment for duration value
([a1ab487](a1ab487))
* [saasservicemgmt] rebrand from "SaaS Runtime" to "App Lifecycle
([a1ab487](a1ab487))
* Add a guide on configuring ChannelPools for gRPC
([#12905](#12905))
([ea081fe](ea081fe))
* update bazel targets in DEVELOPMENT.md to point to sdk-platform-java
([#12844](#12844))
([4568284](4568284))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Co-authored-by: Jin Seop Kim <jinseop@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants