feat(Datastore): Introduce Client Side Metrics#12718
Conversation
Update DatastoreAdminStubSettings and DatastoreStubSettings to include the library version via Version.VERSION. Add the two new Version.java files that hold the library version constant. Update native image reflect-config.json for both the admin and core stub packages.
…gRPC transport coverage Replace the old MetricsRecorder / OpenTelemetryMetricsRecorder / NoOpMetricsRecorder types with the new DatastoreMetricsRecorder family, which extends GAX's MetricsRecorder interface for a unified recording contract. Key changes: - Delete MetricsRecorder.java, OpenTelemetryMetricsRecorder.java, NoOpMetricsRecorder.java and their tests - Add DatastoreMetricsRecorder interface (with simple getInstance() that returns an OTel recorder when metrics are enabled, NoOp otherwise) - Add NoOpDatastoreMetricsRecorder, OpenTelemetryDatastoreMetricsRecorder, and DatastoreMetricsRecorderTest - Remove the !GRPC transport guard from TelemetryUtils.recordOperationMetrics() and attemptMetricsCallable() so all transports record metrics uniformly - Remove the isHttpTransport field from RetryAndTraceDatastoreRpcDecorator and DatastoreImpl; remove buildMetricsTracerFactory() from GrpcDatastoreRpc - Update TelemetryConstants with the new METRIC_PREFIX, DATASTORE_METER_NAME, and typed AttributeKey constants needed by the new recorder classes - Update DatastoreOptions to pass the full DatastoreOptions to getInstance() so the recorder factory can inspect credentials and project at creation time
…atastore-csm-impl-1
Add a default-on, private OpenTelemetry SDK pipeline that automatically exports Datastore client-side metrics to Google Cloud Monitoring without requiring any user configuration. Key additions: - DatastoreBuiltInMetricsView: registers OTel views for metric renaming (GAX prefix → custom.googleapis.com/internal/client), custom histogram bucket boundaries, and an attribute allowlist that prevents unexpected labels from reaching Cloud Monitoring - DatastoreCloudMonitoringExporter: MetricExporter impl that batches TimeSeries and calls MetricServiceClient.createTimeSeries(); graceful degradation (logs warnings, never crashes); best-effort flush() - DatastoreCloudMonitoringExporterUtils: converts OTel MetricData to Cloud Monitoring TimeSeries protos - BuiltInDatastoreMetricsProvider: creates a private OpenTelemetrySdk per Datastore client; computes stable client_uid and client_name attributes; registers a shutdown hook as a last-resort safety net - CompositeDatastoreMetricsRecorder: fans out all record*() calls to both the built-in and any user-provided backends simultaneously - DatastoreMetricsRecorder.getInstance(): updated to composite factory that wires up the built-in path (unless emulator or env-var disabled) alongside any custom OTel backend; adds default close() for lifecycle - OpenTelemetryDatastoreMetricsRecorder: add ownsOpenTelemetry flag and close() so the built-in SDK is flushed when DatastoreImpl.close() runs - DatastoreOpenTelemetryOptions: add exportBuiltinMetricsToGoogleCloudMonitoring flag (default true) with @BetaApi getter and setter; clarify isEnabled() Javadoc to note it does not cover the built-in path - DatastoreImpl.close(): call getMetricsRecorder().close() after RPC channel teardown so the built-in SDK flushes buffered metrics - pom.xml: promote opentelemetry-sdk, opentelemetry-sdk-common, and opentelemetry-sdk-metrics from test→compile scope; add google-cloud-monitoring and proto-google-cloud-monitoring-v3 compile deps - New tests: DatastoreCloudMonitoringExporterTest, BuiltInDatastoreMetricsProviderTest, DatastoreMetricsRecorderTest (updated), DatastoreBuiltInAndCustomMetricsIT
…ne after impl-1 refactor
Add DatastoreMetricsSample demonstrating default built-in metrics with no configuration required. The sample runs a transaction flow (put, read-modify-write, delete) that exercises transaction_latency and transaction_attempt_count metrics, then prints instructions for verifying the data in Cloud Monitoring Metrics Explorer. Includes DatastoreMetricsSampleIT which runs against a real GCP project (GOOGLE_CLOUD_PROJECT env var required) and asserts the expected console output from the sample.
Add DatastoreMetricsSample demonstrating default built-in metrics with no configuration required. The sample runs a transaction flow (put, read-modify-write, delete) that exercises transaction_latency and transaction_attempt_count metrics, then prints instructions for verifying the data in Cloud Monitoring Metrics Explorer. Includes DatastoreMetricsSampleIT which runs against a real GCP project (GOOGLE_CLOUD_PROJECT env var required) and asserts the expected console output from the sample.
There was a problem hiding this comment.
Code Review
This pull request introduces built-in metrics support for the Datastore SDK, allowing client-side metrics to be exported to Google Cloud Monitoring. It adds a new BuiltInDatastoreMetricsProvider, a DatastoreCloudMonitoringExporter, and updates DatastoreOpenTelemetryOptions to manage this feature. My review identified several critical issues: creating a new MetricServiceClient per client instance is inefficient, registering individual shutdown hooks for each client causes a memory leak, the Javadoc for the default metrics export state is contradictory, and the CompositeDatastoreMetricsRecorder.close() method lacks proper exception handling and LIFO ordering for resource cleanup.
| return new NoOpDatastoreMetricsRecorder(); | ||
| } | ||
| // CompositeMetricsRecorder is not needed for one recorder | ||
| if (recorders.size() == 1) { |
There was a problem hiding this comment.
I think it's OK to use CompositeMetricsRecorder even if there is only 1 recorder.
There was a problem hiding this comment.
There should be no real difference. I just opted to return the specific type
There was a problem hiding this comment.
If there are no difference, I think it's better to always use CompositeDatastoreMetricsRecorder for consistency.
There was a problem hiding this comment.
I do prefer using the specific type for clarity, but a benefit of being able to always return CompositeDatastoreMetricsRecorder is that I can get rid of the NoOpDatastoreMetricsRecorder class.
I refactored this so that it always returns a CompositeDatastoreMetricsRecorder
| */ | ||
| @Override | ||
| public void close() { | ||
| if (isBuiltIn && openTelemetry instanceof OpenTelemetrySdk) { |
There was a problem hiding this comment.
Can we manage the lifecycle of the builtin OpenTelemetrySdk instance outside of the recorder? The creator of the OpenTelemetrySdk instance should be responsible for it.
There was a problem hiding this comment.
This should be closed only when the user calls Datastore.close(). Since we don't expose the built in otel object, users won't be able to access it otherwise.
I believe the Datastore client should be able to use this Otel instance for any other reason until the user initiates the shutdown.
There was a problem hiding this comment.
I meant outside the recorder but still managed by us. For example, we can initialize OpenTelemetry object in DatastoreImpl and close it in the same class. This also creates clear separation of concerns that recorder is only responsible for recording and always takes OpenTelemetry as an argument. Instead of responsible for both recording and managing OpenTelemetry.
There was a problem hiding this comment.
Oh I see what you mean. I don't think it's impossible to do, but I believe I would just need to open up a few internal classes that I would prefer to keep package-private (BuiltInDatastoreMetricsProvider + DatastoreCloudMonitoringExporter + DatastoreBuiltInMetricsView).
I think this would be easier if we have a stable, shared Gax utility for the creating all the necessary exporter + views since that would need to be public.
There was a problem hiding this comment.
I might be missing something, why we need to open up a few internal classes? I think we may need to open one public method that takes OpenTelemetry in DatastoreMetricsRecorder, but not sure we have to open more.
In addition, I see that DatastoreMetricsRecorder is being cached in DatastoreOptions which is not a good pattern IMHO. The option/setting classes are supposed to hold static configuration values, creating a metric recorder based on configurations should be the responsibility of the caller class (in this case DatastoreImpl).
There was a problem hiding this comment.
I think we may need to open one public method that takes OpenTelemetry in DatastoreMetricsRecorder
Ah, I misinterpreted your earlier message. I thought by initialize OpenTelemetry object in DatastoreImpl, you meant to move the the initialization logic over to DatastoreImpl. I can split the creation of OtelSdk to be managed by DatastoreImpl.
In addition, I see that DatastoreMetricsRecorder is being cached in DatastoreOptions which is not a good pattern IMHO. The option/setting classes are supposed to hold static configuration values, creating a metric recorder based on configurations should be the responsibility of the caller class (in this case DatastoreImpl).
I'm not sure what you mean by cached in DatastoreOptions. Settings is a good point, I'll move that out to DatastoreImpl since that is a better spot to hold the recorder.
There was a problem hiding this comment.
Updated. The built-in OtelSdk should be managed by DatastoreImpl and the recorder should be initialized there as well
| */ | ||
| @Override | ||
| public void close() { | ||
| if (isBuiltIn && openTelemetry instanceof OpenTelemetrySdk) { |
There was a problem hiding this comment.
I meant outside the recorder but still managed by us. For example, we can initialize OpenTelemetry object in DatastoreImpl and close it in the same class. This also creates clear separation of concerns that recorder is only responsible for recording and always takes OpenTelemetry as an argument. Instead of responsible for both recording and managing OpenTelemetry.
| return new NoOpDatastoreMetricsRecorder(); | ||
| } | ||
| // CompositeMetricsRecorder is not needed for one recorder | ||
| if (recorders.size() == 1) { |
There was a problem hiding this comment.
If there are no difference, I think it's better to always use CompositeDatastoreMetricsRecorder for consistency.
| private final OpenTelemetry openTelemetry; | ||
| // True when this recorder created the OpenTelemetry instance (built-in path) and therefore | ||
| // owns its lifecycle. False when the instance was provided by the user. | ||
| private final boolean isBuiltIn; |
There was a problem hiding this comment.
If we manage the lifecycle of OpenTelemetry outside of the recorder, I don't think we need this flag anymore?
There was a problem hiding this comment.
Yes, that's true. Right now OpenTelemetryDatastoreMetricsRecorder is shared by both the builtin and user settings.
Current tradeoff is that I can keep the builtin classes internal (package-private) and use a flag to manage the lifecycle. The separation could be cleaner, but I think it should be OK for this case. WDYT?
… signature, and format changes
| logger.log( | ||
| Level.WARNING, | ||
| "Built-in metrics exporting is disabled when using NoCredentials (emulator)."); | ||
| return null; |
There was a problem hiding this comment.
Consider returning a OpenTelemetry.noop() instead when returning null.
|
|
||
| /** Shuts down the exporter and the underlying {@link MetricServiceClient}. */ | ||
| @Override | ||
| public CompletableResultCode shutdown() { |
There was a problem hiding this comment.
Where is this shutdown method being called?
There was a problem hiding this comment.
IIUC, called when the OpenTelemetry is closed. When testing this locally, it seems to be invoked via a shutdown hook from the Otel instance. Otel -> MeterProvider -> MetricReader -> Exporter
There was a problem hiding this comment.
I see, so it will be called from DatastoreImple#close which should close the monitoring client as well.
In general, I think it is a nice idea to share monitoring clients between datastore clients, but this is the first time we introduce shared states between clients, which may have other unexpected behaviors. It is possible to share ChannelProviders and CredentialProviders as well, but we don't manage the lifecycle of channels and credentials in those cases.
Since this is behind a client setting, I think it's fine for now but let's keep a close eye on it.
There was a problem hiding this comment.
share monitoring clients between datastore clients, but this is the first time we introduce shared states between clients
Sounds good. I was aiming to try and reduce the memory a bit as it would end up creating 1 Otel instance + 1 Monitoring client per Datastore client. Not ideal as Spanner and BigTable were able to reduce memory by having a single Otel instance shared across clients. I made the tradeoff of isolated Otel instances to reduce filter complexity for metrics.
If this does introduce problems, I can try to isolate and create separate clients.
blakeli0
left a comment
There was a problem hiding this comment.
Please update the title to feat and add descriptions to the PR body before merging.
🤖 I have created a release *beep* *boop* --- <details><summary>1.86.0</summary> ## [1.86.0](v1.85.0...v1.86.0) (2026-05-05) ### Features * [aiplatform] Add asyncQueryReasoningEngine to aiplatform v1 API ([a1ab487](a1ab487)) * [aiplatform] Add asyncQueryReasoningEngine to aiplatform v1beta1 ([a1ab487](a1ab487)) * [aiplatform] add OnlineEvaluator API and update Evaluation API ([a1ab487](a1ab487)) * [aiplatform] Model Registry CopyModel BYOSA ([531942b](531942b)) * [aiplatform] new field CopyModelRequest.custome_service_account ([531942b](531942b)) * [aiplatform] Support VeoLoraTuningSpec in the tuning jobs ([a1ab487](a1ab487)) * [analytics-admin] add UserProvidedDataSettings resource and ([a1ab487](a1ab487)) * [bigqueryreservation] add principal field to BigQuery Reservation ([531942b](531942b)) * [ces] Add ability to specify mocked tool responses in ExecuteTool ([a1ab487](a1ab487)) * [chat] Addition of ChatService.FindGroupChats ([a1ab487](a1ab487)) * [compute] Update Compute Engine v1 API to revision 20260331 ([d3b76d9](d3b76d9)) * [health] new module for health ([#12993](#12993)) ([d568a8c](d568a8c)) * [iam] new iam v3beta client for AccessPolicies, this is step 4&5 ([a1ab487](a1ab487)) * [mapmanagement] new module for mapmanagement ([#12874](#12874)) ([0720279](0720279)) * [modelarmor] add streaming methods StreamSanitizeUserPrompt and ([a1ab487](a1ab487)) * [netapp] add ScaleType for Storage Pools and LargeCapacityConfig ([a1ab487](a1ab487)) * [redis-cluster] [Memorystore for Redis Cluster] Updating new node ([d3b76d9](d3b76d9)) * [redis-cluster][Memorystore for Redis Cluster] Updating new node ([d3b76d9](d3b76d9)) * [shopping-merchant-products] a new optional field `video_links` is ([a1ab487](a1ab487)) * [shopping-merchant-reports] add `store_type` to ([d3b76d9](d3b76d9)) * [valkey] [Memorystore for Valkey] Updating new node types added ([d3b76d9](d3b76d9)) * [valkey] [Memorystore for Valkey] Updating new node types added ([d3b76d9](d3b76d9)) * [vectorsearch] Added CMEK support ([531942b](531942b)) * **bqjdbc:** Bypass dry-run job for read-only tokens. ([#12961](#12961)) ([cd57169](cd57169)) * **bqjdbc:** Integrate the PerConnectionFileHandler with BigQueryJdbcRootLogger ([#12933](#12933)) ([2e56184](2e56184)) * **bqjdbc:** Per connection logs - Add BigQueryJdbcMdc ([#12833](#12833)) ([f562667](f562667)) * **bqjdbc:** Per connection logs - Add PerConnectionFileHandler ([#12899](#12899)) ([5846197](5846197)) * **datastore:** Bump to v3 major version ([#12989](#12989)) ([df20994](df20994)) * **datastore:** Enable Otel metrics for custom Otel ([#12969](#12969)) ([1721e7c](1721e7c)) * **Datastore:** Introduce Client Side Metrics ([#12718](#12718)) ([552a34d](552a34d)) * **datastore:** Remove deprecated classes and methods and bump ObsoleteApi to Deprecated ([#12971](#12971)) ([0bca75c](0bca75c)) * **Datastore:** Swap the default transport from HttpJson to gRPC ([#12977](#12977)) ([24fb234](24fb234)) * **datastore:** Update default channel pool configs to handle initial bursts and scalability ([#12883](#12883)) ([26fe0f9](26fe0f9)) * **gax-grpc:** add configurable resize delta and warning for repeated resizing ([#12838](#12838)) ([2caf026](2caf026)) * **spanner:** add connection properties for min/max RPCs for DCP ([#12951](#12951)) ([dc1216e](dc1216e)) * **spanner:** add connection property for enabling/disabling grpc-gcp ([#12898](#12898)) ([1e633d7](1e633d7)) * **spanner:** add shared endpoint cooldowns for location-aware rerouting ([#12845](#12845)) ([f5f273b](f5f273b)) * **spanner:** Cleanup GcpFallbackChannel creation and enable by default ([#12707](#12707)) ([5251219](5251219)) * **storage:** Implement deleteSourceObjects for Compose Operation ([#12873](#12873)) ([2cecd0c](2cecd0c)) ### Bug Fixes * **bq:** do not call 'getQueryResults' for stateful queries via 'query' api ([#12999](#12999)) ([44e9789](44e9789)) * **bqjdbc:** add default OAuth client id/secret ([#12946](#12946)) ([9b5c4fa](9b5c4fa)) * **bqjdbc:** add Google Driver scope to all credential types ([#12847](#12847)) ([5c890f8](5c890f8)) * **bqjdbc:** enhance logging with caller inference and explicit exception tracking ([#12903](#12903)) ([ce4969b](ce4969b)) * **bqjdbc:** Log exception messages - part 2 ([#12907](#12907)) ([5215b11](5215b11)) * **bqjdbc:** Log exception messages - part 3 ([#12920](#12920)) ([45b572f](45b572f)) * **bqjdbc:** metadata methods & getSqlTypeName for struct ([#12947](#12947)) ([37555fb](37555fb)) * **bqjdbc:** optimize formatter in BigQueryJdbcRootLogger ([#12877](#12877)) ([4233faf](4233faf)) * **bqjdbc:** Revert DatabaseMetaData field to be non-static in BigQueryConnection ([#12778](#12778)) ([ac69c8d](ac69c8d)) * **bqjdbc:** support Service Account Impersonation in ADC ([#12967](#12967)) ([8ce183c](8ce183c)) * bump vectorsearch to 0.13.0 for partial release ([#12805](#12805)) ([5208fc9](5208fc9)) * **ci:** update java-showcase path in excluded_modules ([#12984](#12984)) ([3456ee9](3456ee9)) * correct build directory paths in graalvm cloudbuild.yaml ([#12794](#12794)) ([8e6ba36](8e6ba36)) * **datastore:** Create a plaintext gRPC transport channel when using the Emulator ([#12721](#12721)) ([4bed8fd](4bed8fd)) * **datastore:** Update initial ChannelPool configs according to Datastore best practice guide ([#12919](#12919)) ([851fb89](851fb89)) * **deps:** update the Java code generator (gapic-generator-java) to ([531942b](531942b)) * do not generate Version.java files by default ([#12955](#12955)) ([43a888a](43a888a)) * **gax:** record fractional latency metrics ([#12979](#12979)) ([d690333](d690333)) * **gax:** Remove strict validation that resize delta must be less than max channel count ([#12863](#12863)) ([26b06f9](26b06f9)) * **generator:** use json_name when available ([#12940](#12940)) ([622955b](622955b)) * manual preservation of Version.java files ([#12862](#12862)) ([3d65c78](3d65c78)) * remove google/rpc exclusion to generate HttpResponse ([#12992](#12992)) ([24b1719](24b1719)) * **spanner:** fix flakiness and race conditions in multiplexed session tests ([#12949](#12949)) ([3e02f18](3e02f18)) * **spanner:** fix flakiness in testCreateSessionDeadlineExceeded ([#12944](#12944)) ([93f21ed](93f21ed)) * Update renovate config check template to use npx ([#12865](#12865)) ([b974740](b974740)) * use org.junit.jupiter.api.Assertions.assertThrow ([#12882](#12882)) ([bf243a6](bf243a6)) ### Performance Improvements * **spanner:** use StringBuilder for generating RequestId ([#12809](#12809)) ([5c821a3](5c821a3)) ### Dependencies * upgrade opentelemetry ([#12953](#12953)) ([522e05b](522e05b)) ### Documentation * [cloudsecuritycompliance] Updated docs for the APIs ([a1ab487](a1ab487)) * [kms] Update the comment for duration value ([a1ab487](a1ab487)) * [saasservicemgmt] rebrand from "SaaS Runtime" to "App Lifecycle ([a1ab487](a1ab487)) * Add a guide on configuring ChannelPools for gRPC ([#12905](#12905)) ([ea081fe](ea081fe)) * update bazel targets in DEVELOPMENT.md to point to sdk-platform-java ([#12844](#12844)) ([4568284](4568284)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> Co-authored-by: Jin Seop Kim <jinseop@google.com>
No description provided.