Skip to content

Adds .NET distributed tracing instrumentation & metrics.#1200

Open
fdcastel wants to merge 21 commits intoFirebirdSQL:masterfrom
fdcastel:add-telemetry-trace
Open

Adds .NET distributed tracing instrumentation & metrics.#1200
fdcastel wants to merge 21 commits intoFirebirdSQL:masterfrom
fdcastel:add-telemetry-trace

Conversation

@fdcastel
Copy link
Member

Initial work for #1196.

@fdcastel fdcastel changed the title Add telemetry trace Adds .NET distributed tracing instrumentation. Oct 25, 2024
@fdcastel fdcastel changed the title Adds .NET distributed tracing instrumentation. Adds .NET distributed tracing instrumentation & metrics. Oct 26, 2024
@fdcastel
Copy link
Member Author

Included the final portion of #1196 (metrics) here, as it requires the same infrastructure as traces.

@willibrandon
Copy link
Contributor

Here is before and after comparison of Perf benchmark results on my machine.

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.26120.2130)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=9.0.100-rc.2.24474.11
  [Host]  : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  NuGet   : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  Project : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2

Jit=RyuJit  Platform=X64  Toolchain=.NET 6.0  
WarmupCount=3  

Perf.CommandBenchmark

FirebirdSQL:master (Old) <-> fdcastel:add-telemetry-trace (New)

Diff Method Mean Error Allocated
Old Execute 40,075.4 μs 1,450.87 μs 313.71 KB
New 34,322.0 μs (-14%) 658.70 μs 313.11 KB (0%)
Old Execute 34,465.9 μs 680.65 μs 311.3 KB
New 34,248.1 μs (-1%) 384.64 μs 315.91 KB (+1%)
Old Fetch 640.1 μs 11.02 μs 56.88 KB
New 648.4 μs (+1%) 12.19 μs 56.88 KB (0%)
Old Fetch 653.2 μs 10.90 μs 51.43 KB
New 653.1 μs (0%) 9.41 μs 56.06 KB (+9%)
Old Execute 33,690.3 μs 648.64 μs 319.86 KB
New 33,559.9 μs (0%) 664.54 μs 319.85 KB (0%)
Old Execute 32,943.5 μs 235.67 μs 317.28 KB
New 34,278.5 μs (+4%) 683.34 μs 322.01 KB (+1%)
Old Fetch 642.2 μs 3.47 μs 60.75 KB
New 647.3 μs (+1%) 12.89 μs 60.75 KB (0%)
Old Fetch 664.2 μs 7.49 μs 55.31 KB
New 658.7 μs (-1%) 8.82 μs 59.92 KB (+8%)

@fdcastel
Copy link
Member Author

fdcastel commented Oct 26, 2024

Thanks @willibrandon. This surely can be improved.

Could you kindly share the steps/configuration you used for the above benchmark results?

Mine here produces a comparison between Release (built from local sources) and ReleaseNuget (NuGet package for FirebirdClient 8.5.4).

@fdcastel fdcastel mentioned this pull request Oct 26, 2024
@fdcastel
Copy link
Member Author

@willibrandon I pushed a new PR in #1203 (it is based on this PR).

Could you please clone it and run ./run-benchmark.ps1 in your system?

The script will compare the current project (built from sources with .NET8) with the latest NuGet release (10.3.1).

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22621.4317/22H2/2022Update/SunValley2)
13th Gen Intel Core i7-13700T, 1 CPU, 24 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]  : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Core80  : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  NuGet80 : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2

Jit=RyuJit  Platform=X64  Toolchain=.NET 8.0
WarmupCount=3

| Method  | Job     | BuildConfiguration | DataType             | Count | Mean        | Error     | StdDev    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|-------- |-------- |------------------- |--------------------- |------ |------------:|----------:|----------:|------:|--------:|-------:|----------:|------------:|
| Execute | Core80  | Release            | BIGINT               | 100   | 16,219.7 us | 140.65 us | 131.57 us |  1.00 |    0.01 |      - | 306.16 KB |        1.01 |
| Execute | NuGet80 | ReleaseNuGet       | BIGINT               | 100   | 16,283.9 us |  29.35 us |  26.01 us |  1.00 |    0.00 |      - | 301.65 KB |        1.00 |
|         |         |                    |                      |       |             |           |           |       |         |        |           |             |
| Fetch   | Core80  | Release            | BIGINT               | 100   |    372.1 us |   2.91 us |   2.43 us |  1.00 |    0.01 | 2.9297 |  55.66 KB |        1.09 |
| Fetch   | NuGet80 | ReleaseNuGet       | BIGINT               | 100   |    371.3 us |   2.47 us |   2.19 us |  1.00 |    0.01 | 2.9297 |  51.15 KB |        1.00 |
|         |         |                    |                      |       |             |           |           |       |         |        |           |             |
| Execute | Core80  | Release            | VARCH(...) UTF8 [30] | 100   | 18,021.3 us | 352.58 us | 482.62 us |  1.00 |    0.03 |      - |  310.1 KB |        1.01 |
| Execute | NuGet80 | ReleaseNuGet       | VARCH(...) UTF8 [30] | 100   | 18,071.4 us | 352.48 us | 377.14 us |  1.00 |    0.03 |      - | 305.64 KB |        1.00 |
|         |         |                    |                      |       |             |           |           |       |         |        |           |             |
| Fetch   | Core80  | Release            | VARCH(...) UTF8 [30] | 100   |    416.7 us |   6.50 us |   5.43 us |  1.01 |    0.03 | 2.9297 |  59.54 KB |        1.08 |
| Fetch   | NuGet80 | ReleaseNuGet       | VARCH(...) UTF8 [30] | 100   |    413.6 us |   8.09 us |  10.52 us |  1.00 |    0.03 | 2.9297 |  55.02 KB |        1.00 |

// * Hints *
Outliers
  CommandBenchmark.Execute: NuGet80 -> 1 outlier  was  removed (17.09 ms)
  CommandBenchmark.Fetch: Core80    -> 2 outliers were removed (383.60 us, 403.48 us)
  CommandBenchmark.Fetch: NuGet80   -> 1 outlier  was  removed (389.54 us)
  CommandBenchmark.Execute: Core80  -> 1 outlier  was  removed, 3 outliers were detected (16.46 ms, 16.87 ms, 19.28 ms)
  CommandBenchmark.Fetch: Core80    -> 2 outliers were removed (442.98 us, 447.90 us)

// * Legends *
  DataType    : Value of the 'DataType' parameter
  Count       : Value of the 'Count' parameter
  Mean        : Arithmetic mean of all measurements
  Error       : Half of 99.9% confidence interval
  StdDev      : Standard deviation of all measurements
  Ratio       : Mean of the ratio distribution ([Current]/[Baseline])
  RatioSD     : Standard deviation of the ratio distribution ([Current]/[Baseline])
  Gen0        : GC Generation 0 collects per 1000 operations
  Allocated   : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
  1 us        : 1 Microsecond (0.000001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
Run time: 00:01:41 (101.22 sec), executed benchmarks: 8

Global total time: 00:01:51 (111.73 sec), executed benchmarks: 8

@willibrandon
Copy link
Contributor

@fdcastel - Interesting, and thanks! Taking a look now.
I had the same thought the Perf project toolchain was out of date.

@willibrandon
Copy link
Contributor

Okay, I see how the performance project works now with the conditional project/package reference, and with the latest Nuget release being the baseline. That is way better approach than the way I was attempting to compare the the results historically.

@fdcastel - Thank you very much for showing me how the performance project works. I just want try and confirm we haven't regressed in someway with this change.

@fdcastel fdcastel force-pushed the add-telemetry-trace branch from 47a5b9b to fe2b942 Compare June 15, 2025 02:23
@fdcastel
Copy link
Member Author

Rebased with latest master.

@fdcastel
Copy link
Member Author

fdcastel commented Dec 10, 2025

@cincuranet Do you wish to move this work forward?

If yes, I could rebase it with the latest master. Along with #1203, which is based on this one

@cincuranet
Copy link
Member

@fdcastel Yes.

@fdcastel fdcastel force-pushed the add-telemetry-trace branch from fe2b942 to 7fe1327 Compare March 23, 2026 00:53
@fdcastel
Copy link
Member Author

Rebased onto latest master.

StartActivity() returns null when no listeners are interested in the
activity. The code accessed activity.IsAllDataRequested without a null
check, causing a NullReferenceException. Use the Npgsql-style pattern
'activity is not { IsAllDataRequested: true }' for early return.
Stopwatch ticks are not TimeSpan ticks when Stopwatch.IsHighResolution
is true (most systems). TimeSpan.FromTicks(stopwatchTicks) produces
incorrect durations. Use Stopwatch.GetElapsedTime() which correctly
handles the frequency conversion.
The semantic conventions renamed db.system to db.system.name. The
well-known value for Firebird is 'firebirdsql', not 'firebird'.
Updated in both FbActivitySource (spans) and FbConnection (metrics).
The semantic conventions mark db.namespace as Conditionally Required
when available. The connection's Database property is used as the value.
The semantic conventions mark error.type as Conditionally Required on
failure. For FbException, use the SQLSTATE code; for other exceptions,
use the full exception type name.
Per OTel semantic conventions, db.query.text and db.query.parameter.*
are Opt-In level attributes that may expose sensitive data. They should
not be collected by default. Added IsQueryTextTracingEnabled flag and
reuse IsParameterLoggingEnabled for parameter tracing.
Replace hardcoded '1.0.0' version with the assembly's informational
version, keeping the telemetry version in sync with the NuGet package.
…ecording

TraceCommandException now records metrics immediately and resets
_startedAtTicks so the later TraceCommandStop (called during Release)
does not record a second, inflated duration that includes cleanup time.
- Remove db.transaction_id custom tag (not in OTel spec, causes server
  round-trip per traced command)
- Remove commented-out db.snapshot_id code
- Remove deprecated exception.escaped attribute
- Remove NormalizeDbNull helper, inline DBNull check
- Set server.port on spans when using a non-default port
- Set db.stored_procedure.name for StoredProcedure command types
- Fix server.address in MetricsConnectionAttributes to not concatenate
  the port (server.port is now a separate attribute)
- Clean up leftover placeholder comments
Set db.query.summary as the low-cardinality activity name on all spans.
For Text commands, extract the first SQL verb (SELECT, INSERT, etc.)
and use it as db.operation.name and in the activity name.
Debug.Assert checks that TraceCommandStart was called before the stop
or exception path, helping catch lifecycle bugs during development.
Prevents potential NullReferenceException if metrics are recorded
before the connection string is set.
Change GetMetrics() to return IEnumerable of tuples via Select instead
of allocating a Dictionary via ToDictionary on every collection cycle.
Also simplify GetConnectionMax to use Select instead of SelectMany.
Suggest default histogram bucket boundaries for operation duration and
connection create time histograms, improving out-of-the-box resolution.
Guarded by NET9_0_OR_GREATER since InstrumentAdvice is a .NET 9 API.
Expose the telemetry source names as public constants so consumers can
easily subscribe: builder.AddSource(FbTelemetry.ActivitySourceName) and
builder.AddMeter(FbTelemetry.MeterName). Internal code updated to use
the constants.
Document the ActivitySource and Meter names, available span attributes,
opt-in sensitive attributes, and available metrics with their types and
semantic convention references.
@fdcastel
Copy link
Member Author

New fixes / improvements:

Critical fixes:

  • Fixed NullReferenceException when StartActivity() returns null (no listeners)
  • Fixed Stopwatch elapsed time conversion — TimeSpan.FromTicks() was producing wrong durations on high-resolution timers; now uses Stopwatch.GetElapsedTime()

Semantic convention compliance:

  • Renamed db.systemdb.system.name, value firebirdfirebirdsql
  • Set db.namespace from the connection's database name
  • Set error.type attribute on failure spans (SQLSTATE or exception type)
  • Gated db.query.text and db.query.parameter.* behind opt-in flags (new FbLogManager.EnableQueryTextTracing())

Architecture improvements:

  • Read ActivitySource/Meter version from assembly metadata instead of hardcoded "1.0.0"
  • Fixed activity lifecycle — metrics now recorded on error path, preventing stale duplicate recording during cleanup
  • Removed db.transaction_id (non-standard, causes round-trip), db.snapshot_id dead code, deprecated exception.escaped, and NormalizeDbNull

New features:

  • Added server.port, db.stored_procedure.name; fixed server.address in metrics to not concatenate port
  • Added db.query.summary and db.operation.name for Text commands via SQL verb extraction
  • Added InstrumentAdvice histogram bucket boundaries on .NET 9+
  • Created public FbTelemetry class exposing ActivitySourceName and MeterName constants
  • Debug assertions, null-safe MetricsConnectionAttributes, and allocation-free metric callbacks

Documentation:

  • Added otel.md covering traces, metrics, opt-in attributes, and code samples

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants