Appsec user auth telemetry #5466

Draft
y9v wants to merge 7 commits into master from appsec-user-auth-telemetry

@y9v y9v commented Mar 17, 2026

What does this PR do?

  • Adds distinct, namespaced events under identity.<framework>.<action> (e.g. identity.devise.login_success, identity.sdk.set_user).
  • Removes the appsec.events.user_lifecycle gateway event. Its business-logic WAF check is now handled by a new watch_user_login_event watcher that subscribes directly to the identity.* login events.
  • Adds AppSec::Monitor::Gateway::TelemetryWatcher that reports instrum.user_auth.missing_user_login and instrum.user_auth.missing_user_id telemetry metrics when user auth events have incomplete data.
  • Removes Datadog::AppSec::Instrumentation::Gateway::Argument::User and replaces it with a plain hash.

Motivation:
Simplify the AppSec gateway event architecture. We also want to know when we can't extract user login or id during SDK calls and Devise instrumentation, which the new TelemetryWatcher provides.

Change log entry
No. This is an internal change.

Additional Notes:
New event names:

  • identity.sdk.set_user — from Kit::Identity.set_user and V2 SDK
  • identity.sdk.login_success — from V2 SDK login success (business logic WAF)
  • identity.sdk.login_failure — from V2 SDK login failure
  • identity.devise.login_success / login_failure / signup / authenticated_request — from Devise auto-instrumentation
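
A minimal sketch of how the namespaced events listed above might be pushed and watched. `ToyGateway` is a hypothetical stand-in written for illustration, not the real `Datadog::AppSec::Instrumentation.gateway`; only the event names and the plain-hash payload come from this PR.

```ruby
# Hypothetical stand-in for the AppSec gateway: watchers subscribe by exact
# event name and receive the pushed payload (a plain hash, per this PR).
class ToyGateway
  def initialize
    @watchers = Hash.new { |hash, name| hash[name] = [] }
  end

  def watch(name, &block)
    @watchers[name] << block
  end

  # Calls every watcher registered for `name` and returns how many ran.
  def push(name, payload)
    @watchers[name].each { |watcher| watcher.call(payload) }
    @watchers[name].size
  end
end

gateway = ToyGateway.new
seen = []

gateway.watch('identity.devise.login_success') { |user| seen << [:devise, user[:id]] }
gateway.watch('identity.sdk.set_user') { |user| seen << [:sdk, user[:id]] }

gateway.push('identity.devise.login_success', {id: '42', login: 'alice'})
gateway.push('identity.sdk.set_user', {id: '42', login: nil})
```

Because the framework and action are encoded in the event name, the payload only needs to carry the user data.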

APPSEC-60123

How to test the change?
CI and manual testing.

@y9v y9v self-assigned this Mar 17, 2026
@y9v y9v requested review from a team as code owners March 17, 2026 16:27
@github-actions github-actions bot added the integrations (Involves tracing integrations) and appsec (Application Security monitoring product) labels Mar 17, 2026
github-actions bot commented Mar 17, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method, and clears 1 partially typed method. It increases the percentage of typed methods from 61.55% to 61.58% (+0.03%).

Partially typed methods (+1 -1)
Introduced:
sig/datadog/appsec/instrumentation/gateway/argument.rbs:18
└── def initialize: (untyped data, context: Context) -> void
Cleared:
sig/datadog/appsec/instrumentation/gateway/argument.rbs:35
└── def initialize: (untyped data, context: Context) -> void

Untyped other declarations

This PR introduces 2 untyped other declarations, and clears 2 untyped other declarations. It decreases the percentage of typed other declarations from 77.64% to 77.56% (-0.08%).

Untyped other declarations (+2 -2)
Introduced:
sig/datadog/appsec/instrumentation/gateway/argument.rbs:10
└── @data: untyped
sig/datadog/appsec/instrumentation/gateway/argument.rbs:14
└── attr_reader data: untyped
Cleared:
sig/datadog/appsec/instrumentation/gateway/argument.rbs:27
└── @data: untyped
sig/datadog/appsec/instrumentation/gateway/argument.rbs:31
└── attr_reader data: untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.

datadog-prod-us1-5 bot commented Mar 17, 2026

⚠️ Tests

⚠️ Warnings

🧪 1 Test failed

tests.debugger.test_debugger_probe_snapshot.Test_Debugger_Line_Probe_Snaphots_With_SCM.test_log_line_snapshot[rails80] from system_tests_suite (Datadog)
AssertionError: assert 'Snapshot was not received' is None
 +  where 'Snapshot was not received' = <built-in method join of str object at 0x7f116da2b720>(['Snapshot was not received'])
 +    where <built-in method join of str object at 0x7f116da2b720> = '\n'.join
 +    and   ['Snapshot was not received'] = <tests.debugger.test_debugger_probe_snapshot.Test_Debugger_Line_Probe_Snaphots_With_SCM object at 0x7f11579674a0>.setup_failures

self = <tests.debugger.test_debugger_probe_snapshot.Test_Debugger_Line_Probe_Snaphots_With_SCM object at 0x7f11579674a0>

    def test_log_line_snapshot(self):
>       self._assert()

...

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 7248786

@y9v y9v force-pushed the appsec-user-auth-telemetry branch from 1880047 to a1f38df Compare March 19, 2026 10:35
pr-commenter bot commented Mar 19, 2026

Benchmarks

Benchmark execution time: 2026-03-20 15:59:12

Comparing candidate commit 7248786 in PR branch appsec-user-auth-telemetry with baseline commit fa028c8 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
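
The significance rule described above can be sketched as follows. This is a sketch under stated assumptions, not the benchmarking platform's implementation: `classify_change` is a hypothetical helper, the threshold is assumed to be symmetric around 0%, and the metric is assumed to be an execution time (higher means worse).

```ruby
# Hypothetical helper: classifies a change from the bounds of the confidence
# interval (in percent) and a symmetric significance threshold (in percent).
# Assumes an execution-time metric, where a positive difference is worse.
def classify_change(ci_lower, ci_upper, threshold)
  return :worse if ci_lower > threshold    # whole CI above +threshold
  return :better if ci_upper < -threshold  # whole CI below -threshold

  :same # CI overlaps the [-threshold, +threshold] band
end

classify_change(1.3, 3.1, 1.0)  # => :worse (second diagram above)
classify_change(-0.6, 1.2, 1.0) # => :same  (first diagram above)
```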

@@ -63,7 +63,7 @@ def record_successful_signin(context, resource)
   # and because of that we will trigger an additional event even
   # if it was already done via the SDK
   AppSec::Instrumentation.gateway.push(
-    'identity.set_user', AppSec::Instrumentation::Gateway::User.new(id, login)
+    'identity.set_user', {id: id, login: login, framework: 'devise', event_type: 'login_success'}
Member

I think event_type is really the name of the event we are pushing, like the identity.login_failure we push next.

I find it cool to scope them with the identity.<event> pattern, but passing the event name in the event_type field when we already have the event name as the first argument is not a good choice.

Member Author

Good point. I changed the events we are sending; the full list is in the PR description.

Comment on lines +39 to +52
def report_missing_user_telemetry(user_info, event_type)
  tags = {framework: user_info[:framework], event_type: event_type}

  missing_login = user_info[:login].nil?
  missing_id = user_info[:id].nil?

  if missing_login && event_type != 'authenticated_request'
    AppSec.telemetry.inc(
      Ext::TELEMETRY_METRICS_NAMESPACE,
      'instrum.user_auth.missing_user_login',
      1,
      tags: tags,
    )
  end
Member

I think this is a lot to do for a report method. It feels like the whole logic of what to report, and when, got offloaded to this method instead of the watcher. I think this is the wrong split of responsibilities.

Member Author

Now, with separate events, it is simpler: it only checks for the presence of id and login.
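
A rough sketch of what a presence-only check could look like once the event name carries the framework and action. Everything here except the two metric names is an assumption: `ToyTelemetry` is a hypothetical in-memory stand-in for `AppSec.telemetry` with a simplified `inc` signature (no namespace argument), and `report_missing_user_fields` is an illustrative name, not the method in this PR.

```ruby
# Hypothetical in-memory counter standing in for AppSec.telemetry.
class ToyTelemetry
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Simplified signature: the real call also takes a metrics namespace.
  def inc(metric, value, tags: {})
    @counts[[metric, tags]] += value
  end
end

# Metric names are taken from the PR description.
MISSING_FIELD_METRICS = {
  login: 'instrum.user_auth.missing_user_login',
  id: 'instrum.user_auth.missing_user_id',
}.freeze

# Increments one metric per missing field and returns the reported names.
def report_missing_user_fields(telemetry, user, framework:)
  MISSING_FIELD_METRICS.filter_map do |field, metric|
    next unless user[field].nil?

    telemetry.inc(metric, 1, tags: {framework: framework})
    metric
  end
end

telemetry = ToyTelemetry.new
reported = report_missing_user_fields(telemetry, {id: '42', login: nil}, framework: 'devise')
# reported => ["instrum.user_auth.missing_user_login"]
```

In this sketch, which events trigger the check is decided by what the watcher subscribes to, so the report method itself needs no per-event branching.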

allow(Datadog::AppSec::Instrumentation).to receive(:gateway).and_return(gateway)
Datadog::AppSec::Monitor::Gateway::TelemetryWatcher.watch

allow(Datadog::AppSec.telemetry).to receive(:inc).and_call_original
Member

Do we need to call original?

Member Author

nope, removed

@y9v y9v force-pushed the appsec-user-auth-telemetry branch from dc19805 to 297f77a Compare March 19, 2026 18:47
Comment on lines +15 to +30
IDENTITY_EVENTS = %w[
  identity.sdk.set_user
  identity.sdk.login_success
  identity.sdk.login_failure
  identity.devise.login_success
  identity.devise.login_failure
  identity.devise.signup
  identity.devise.authenticated_request
].freeze

LOGIN_EVENTS = {
  'identity.sdk.login_success' => 'server.business_logic.users.login.success',
  'identity.sdk.login_failure' => 'server.business_logic.users.login.failure',
  'identity.devise.login_success' => 'server.business_logic.users.login.success',
  'identity.devise.login_failure' => 'server.business_logic.users.login.failure',
}.freeze
Member Author

This duplication is an intermediate step until we add pattern-based subscription (e.g. identity.*.login_success or identity.*.*).
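
One possible shape for such pattern matching, sketched under assumptions: `event_pattern_to_regexp` is a hypothetical helper, and treating `*` as "exactly one dot-free segment" is a design choice made here, not something the PR specifies.

```ruby
# Hypothetical helper: converts a dotted pattern such as
# 'identity.*.login_success' into a regexp in which each '*' matches
# exactly one segment (one or more characters, none of them a dot).
def event_pattern_to_regexp(pattern)
  segments = pattern.split('.').map do |segment|
    segment == '*' ? '[^.]+' : Regexp.escape(segment)
  end
  /\A#{segments.join('\.')}\z/
end

IDENTITY_EVENTS = %w[
  identity.sdk.set_user
  identity.sdk.login_success
  identity.sdk.login_failure
  identity.devise.login_success
  identity.devise.login_failure
  identity.devise.signup
  identity.devise.authenticated_request
].freeze

matched = IDENTITY_EVENTS.grep(event_pattern_to_regexp('identity.*.login_success'))
# matched => ["identity.sdk.login_success", "identity.devise.login_success"]
```

With matching like this, a watcher could subscribe once per pattern instead of listing every framework-specific event name.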

@y9v y9v requested a review from Strech March 20, 2026 15:30
@y9v y9v marked this pull request as draft March 23, 2026 12:26
Labels

appsec Application Security monitoring product integrations Involves tracing integrations

2 participants