Skip to content

[test] Add axe-core tests for mui-material#48341

Merged
siriwatknp merged 54 commits into
mui:masterfrom
siriwatknp:test/a11y-demos
May 11, 2026
Merged

[test] Add axe-core tests for mui-material#48341
siriwatknp merged 54 commits into
mui:masterfrom
siriwatknp:test/a11y-demos

Conversation

@siriwatknp

@siriwatknp siriwatknp commented Apr 21, 2026

Copy link
Copy Markdown
Member

closes #47887 · supersedes #47895

Summary

Automated axe-core accessibility testing for @mui/material, driven off the real docs demos. Folded into the existing visual-regression Playwright loop (test/regressions/index.test.js) — for each enrolled demo, axe runs against the rendered [data-testid="testcase"] element in the same browser session that takes the screenshot, and the result is written to docs/data/material/components/{slug}/{slug}.a11y.json (one file per slug, keyed by demo name).

Initial scope is buttons only — BasicButtons and ColorButtons — to validate the pipeline end-to-end before broader enrolment in follow-ups. Layout wrappers (Box, Grid, Stack, Paper, Container, Portal, …) will be excluded on principle when added: they contribute no a11y signal of their own; any violation found on them really belongs to their children, already covered by the child's own enrolment.

Config shape

Two independent rule arrays — one per tool — evaluated last-match-wins (no inheritance: an override rule must restate every field it cares about) against the docs path docs/data/material/components/{slug}/{Demo} (minimatch globs). Independent arrays mean editing one tool's config can't stomp the other.

// test/regressions/demoMeta.ts
export const A11Y_RULES: A11yRule[] = [
  { test: 'docs/data/material/components/buttons/{BasicButtons,ColorButtons}', enabled: true },
];

pnpm test:regressions refreshes the *.a11y.json files. CI enforces they're up to date via a git-diff check.

For Reviewers

Why demo-based, not scenario-based

#47895 introduced scenario files (test/a11y/scenarios/Button.a11y.tsx) that hand-authored every axe case per component. Per Olivier's feedback, docs demos already cover every real usage scenario — duplicating them as test fixtures is waste. This PR replaces the scenarios folder with a thin rule list pointed at the demo routes VRT already serves.

Why folded into the VRT loop

@Janpot's earlier concern about duplicate Playwright sessions: rather than a parallel browser, axe now runs inside the existing screenshot loop. Each enrolled demo gets one axe.run against the already-rendered [data-testid="testcase"] element before the screenshot is taken. Failure isolation is preserved by recording the result on ctx.task.meta.a11y first; the assertion fires only on visual rules (color-contrast, link-in-text-block) that depend on real CSS.

Screenshot and a11y opt-outs are independent — a demo screenshot-disabled for being redundant or interaction-heavy can still be audited by axe.

Config follow-up from review

@Janpot's review comment flagged the original dual-Map structure (SLUG_A11Y + DEMO_META) as non-intuitive and at risk of cross-tool stomping. Replaced with two rule arrays + minimatch globs as suggested; per-demo opt-outs are appended after slug-wide rules; the override rule restates every field it wants (no implicit inheritance).

Output shape

One file per enrolled slug at docs/data/material/components/{slug}/{slug}.a11y.json, co-located with the slug's other docs files. Each file is an object keyed by demo name:

{
  "BasicButtons": { "passedRules": [...], "failedRules": [], "testedRules": { "wcag2a": [...], "wcag2aa": [...], "wcag22aa": [...] } },
  "ColorButtons": { ... }
}

Co-locating with {slug}.md and the demo source files makes the data discoverable from the slug folder and avoids the slug-prefix collision problem of a flat per-demo layout.

Code

CI

The existing test:regressions job already builds, serves, and runs Playwright. Adding axe is a no-op on infra. The git-diff guard ensures every *.a11y.json is up-to-date.

Agent docs

  • AGENTS.md — "Accessibility Testing" section covers the rule-array shape, brace-glob narrowing, per-demo opt-outs, and the skipAssertions escape hatch.

Not in scope (follow-ups)

  • Docs presentation (separate PR). A demo-toolbar pill reads each slug's *.a11y.json via the docs render pipeline and shows pass/fail counts plus a per-WCAG-tag rule breakdown on click.
  • Broader enrolment. Buttons is the validation slug; other components onboard via slug-wide rules in A11Y_RULES.
  • Rule graduation (global severity levels). Today every non-visual violation is recorded but not asserted; every visual violation is asserted unless skipAssertions lists it. Fine for a POC; may want finer control once more components enroll.

Runs axe-core against real docs demos in Chromium, reusing the VRT Vite
server. Enrollment is an explicit list in test/a11y/a11y.test.mjs
(POC: Button + Card, 4 demos).

Output a11y-results.json carries both per-component aggregates (for the
docs widget) and per-demo detail (for future per-demo UI).

Supersedes mui#47895 — swaps hand-written scenario modules for the docs
demos themselves, per Olivier's suggestion to mirror Janpot's
html-validate-over-docs-pages pattern in mui#48088.
@code-infra-dashboard

code-infra-dashboard Bot commented Apr 21, 2026

Copy link
Copy Markdown

Deploy preview

https://deploy-preview-48341--material-ui.netlify.app/

Bundle size

Bundle Parsed size Gzip size
@mui/material 0B(0.00%) 0B(0.00%)
@mui/lab 0B(0.00%) 0B(0.00%)
@mui/private-theming 0B(0.00%) 0B(0.00%)
@mui/system 0B(0.00%) 0B(0.00%)
@mui/utils 0B(0.00%) 0B(0.00%)

Details of bundle changes


Check out the code infra dashboard for more information about this PR.

Replaces the hardcoded ENROLLED map in a11y.test.mjs with a full component
roster at packages/mui-material/test/a11y/config.ts. Every component has
an entry; status is 'enabled' (Button, Card) or 'pending' (rest). Adoption
per PR = flip 'pending' to 'enabled' and fill in `demos`.

The config doubles as the rollout ledger — a future docs presentation can
iterate the full array to show both enrolled and pending components.

Drops ComponentAccessibilityStatus.js and the two .md embeds. Presentation
is a separate concern; the JSON shape stays (per-component aggregates +
per-demo detail) so the future widget has the data it needs.
Replaces the single a11y-results.json (383 KB / 14k lines at full rollout)
with packages/mui-material/test/a11y/results/{Component}.json. A future
docs widget can import('.../results/Button.json') and webpack/vite will
code-split per component, so each page only ships its own ~2 KB payload
instead of the whole roster.

Also updates the docs:a11y script to rimraf the directory before each
run, and the AGENTS.md reference.
Flips 45 components from 'pending' to 'enabled' in config.ts. The 26 that
currently trip color-contrast keep a skipRules: ['color-contrast'] entry
so CI stays green while the violations still land in failedRules for the
team to triage.

a11y.test.mjs now falls back to the VRT nav when a config entry omits its
'demos' list, so enrolling is one line (flip the status). Explicit demos
remain the override knob.

The 11 components VRT excludes (need interaction, no demos, flaky) stay
'pending' with inline reasons — nothing observable today.
@Janpot

Janpot commented Apr 21, 2026

Copy link
Copy Markdown
Member

Rendering all the demos, one by one, in a real browser and run tests on them. Sounds an awful lot like the current screenshot tests we have. Would it make sense (be possible?) to fold accessibility testing into the regression tests. It would likely save a ton on circleci credits compared to running another full suite where we render all demos.

`skipAssertions` better describes the behavior — the rule still runs and
still lands in the results JSON, only the test-failing assertion is
suppressed. `skipRules` read as "rule is skipped entirely", which isn't
what happens.

Also drops Box from the roster — nothing to a11y-test.
Layout wrappers (Grid, Masonry, Portal, NoSsr, Stack, Paper) and pure
behavior components (Backdrop, ClickAwayListener, Container, CssBaseline)
have no a11y semantics of their own — axe either finds nothing or
attributes violations to them that actually belong to their inner
content (already covered by the child component's own entry).

Keeps Typography (heading hierarchy), Skeleton (aria-busy), Divider
(caught a real list violation) since those contribute real signal.

44 components in roster → 36 enabled, 8 pending (interaction-only).
The status gate was infrastructure for a presentation UI that isn't in
scope for this PR. The config now lists only components that are actually
enrolled; pending ones live as `// TODO: {Component} — <blocker>` comments
so the reason stays visible without needing a parseable data shape.

If a future docs widget ever wants a "coming soon" list, convert the
comments back to structured entries then.
@siriwatknp siriwatknp marked this pull request as draft April 21, 2026 06:20
@siriwatknp

siriwatknp commented Apr 21, 2026

Copy link
Copy Markdown
Member Author

It would likely save a ton on circleci credits compared to running another full suite where we render all demos.

Infra is already shared — pnpm docs:a11y reuses test:regressions:build and runs the Vitest axe loop concurrently with test:regressions:server. The only new spend is one Chromium cold start + 350 axe runs ≈ 44s (~1/5 of one VRT pass).

Kept it as a separate Vitest file for a standalone dev loop, not because of the infra cost. Happy to fold axe into test/regressions/index.test.js if you'd rather reclaim the 44 s — would need to swap the Vitest reporter for inline aggregation + after(all) write, add a route filter (enrolled /docs-components-* only), and split each route into two its for failure isolation. It's doable.

Do you think saving around ~40s is a hard constraint? Happy to follow your judgement.

@Janpot

Janpot commented Apr 21, 2026

Copy link
Copy Markdown
Member

Do you think saving around ~40s is a hard constraint? Happy to follow your judgement.

It'd cost about 8 euro per month, so cost-wise probably not. It'd increase the current job time (6m 12s/last 30 days) to ~7m, making it the second longest running job, after the type checking. That technically doesn't make it the bottle neck. The bottle neck currently is type checking, we plan to migrate to typescript native soon, so in not so long the test_browser job will likely become the bottleneck. There aren't many hard constraint, only the time the team is willing to spend waiting for jobs to finish during releases and deploys.

@siriwatknp

siriwatknp commented Apr 21, 2026

Copy link
Copy Markdown
Member Author

I pushed a commit to move to regression test, the timing does not differ much but the code is simplified. Net CI wall-clock saving is ~break-even; real wins are ~16 s of duplicate Chromium setup avoided per run and infra simplification (one Vitest suite, one config, one script deleted). No new violations; 2 of 38 result files differ (all pre-existing false-positive color-contrast flags disappearing because the VRT loop parks the mouse and normalizes body.overflow — post-fold is strictly more accurate).

Per Jan's review on mui#48341: run axe inline in test/regressions/index.test.js,
reuse the same Chromium session. Drop the standalone test/a11y/ suite and the
docs:a11y:dev script. CI a11y gate moves from test_browser to test_regressions.

Slider/Table results reflect a cleaner DOM (mouse parked, scrollbar
normalized); all deltas are false-positive color-contrast flags disappearing.
@siriwatknp siriwatknp marked this pull request as ready for review April 21, 2026 07:44
@Janpot

Janpot commented Apr 21, 2026

Copy link
Copy Markdown
Member

Ok, if the win in runtime is marginal then we can go either way, your call.

Did you check how currently screenshots are being excluded? Because if you implement this as enrollment in the existing screenshots fixtures, you will also inherit its exclusion mechanism. If we go the combined route then we should probably import all demos and build separate exclusion mechanisms for a11y and screenshots. e.g.

declare const demoMeta: Map<string, {
  screenshot: {
    enabled: boolean,
    // extra options, e.g. waitForSelector...
  },
  a11y: {
    enabled: boolean,
    rules: {
      // ...
    }
  }
}>()

aside: Thanks for editing the LLM generated comment, it really demotivates me when I have to answer these. Feels like I'm putting in the effort the author should have done when interacting with their LLM.

@zannager zannager added the test label Apr 21, 2026
@zannager zannager requested a review from Janpot April 21, 2026 13:59

@Janpot Janpot left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two things,

  1. My edited comment in #48341 (comment) about screenshot exclusion already being encoded in how we load the test fixtures, is it a problem?
  2. We probably want to generate the data nested under the ./docs folder somewhere, just like we do for other docs generated data?

@siriwatknp

Copy link
Copy Markdown
Member Author

aside: Thanks for editing the LLM generated comment, it really demotivates me when I have to answer these. Feels like I'm putting in the effort the author should have done when interacting with their LLM.

Sorry for that, I usually don't want LLM to write response directly on GH but it was trying to be smart and push the comment directly 🥲. I'm building guard rail around it, bear with me.

Per-tool exclusions now live in test/regressions/demoMeta.ts instead of
glob negations in index.jsx. Screenshot-specific reasons (Redundant,
Flaky image loading) no longer silently drop a11y coverage; autocomplete
Redundant variants, skeleton/Facebook, snackbars/Customized, steppers
image-heavy demos, and AccessibleTabs1/2 now get axe coverage.

Glob keeps only structural exclusions + whole slugs no tool consumes.
DEMO_META accepts both demo-level and slug-level keys; demo wins when
both exist. shouldScreenshot + resolveA11y are the test-runner gates.

Unit tests in demoMeta.test.ts cover current shape + the 3-line
workflow for enabling a11y on an interaction-heavy slug (e.g. dialogs).

a11yConfig.ts removed; AGENTS.md updated.
@siriwatknp

siriwatknp commented Apr 24, 2026

Copy link
Copy Markdown
Member Author

Pushed f337bc7 implementing the separation.

The regression exclusion can be separately configurable in demoMeta.ts file.
Meaning, the screenshot and a11y are independent from each other.

Colocates the generated JSON with other docs data, per Janpot's second
review point. Reporter's OUT_DIR and the test:regressions prettier step
updated to the new path. No other consumers touched the old path.
Migration expanded axe coverage to demos previously screenshot-excluded
(autocomplete Redundant, skeleton/Facebook, etc.). Surfaces color-contrast
near-threshold findings on fab, skeleton, bottom-navigation — same class
of issue already suppressed elsewhere. Added skipAssertions for the three
slugs and regenerated results JSONs at docs/data/material/a11y/.
@siriwatknp

Copy link
Copy Markdown
Member Author

Do you aim to maximize the coverage for the Button component in this PR or just set up the infra?

just the infra (with couple of Button demos generated).

siriwatknp added 2 commits May 5, 2026 09:34
Address Codex review on mui#48341: waitForSelector field on ScreenshotRule
was declared but never read. Migrate the hardcoded ReactVirtualizedTable
wait into a rule and apply screenshotRule.waitForSelector before axe +
screenshot in the VRT loop.

Reporter now prunes stale {slug}.a11y.json after writing — gated on the
VRT module having actually run and no -t filter, so unrelated unit-test
runs and filtered slug subsets leave untouched files alone.
@github-actions github-actions Bot removed the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 5, 2026
@siriwatknp

Copy link
Copy Markdown
Member Author
  • waitForSelector is declared on ScreenshotRule, but the runner only reads screenshotRule?.enabled before calling takeScreenshot. A future rule with waitForSelector would silently no-op. Either wire it before screenshot capture or remove the field until it’s needed.
  • Generated *.a11y.json files aren’t pruned before regeneration. Since the reporter only writes slugs seen in the current run, removing a slug from A11Y_RULES could leave stale tracked JSON behind while CI still passes. Consider deleting existing generated a11y files before the run or pruning stale files in the reporter.

nice catch, fixed both of them.

@siriwatknp siriwatknp requested a review from mj12albert May 6, 2026 08:39
@github-actions github-actions Bot added the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 7, 2026
Comment thread test/regressions/a11y/axe.ts Outdated
}
}

const violations = [...new Set([...results.violations, ...results.incomplete].map((v) => v.id))];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think incomplete should be treated as (merged into) violations here

It generally means axe-core can't evaluate this and "needs human review": https://docs.deque.com/devtools-for-web/4/en/needs-review-incomplete

IMO these are the most important ones to track as they are the automation gaps

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, fixed. the generated results contains a new field needsReview for maintainer.

// VRT module must have actually executed, and no `-t` filter narrowed it.
// Anything else (unrelated unit tests, filtered slug subset) leaves
// untouched slugs' files alone since we have no signal about them.
const ranVrtSuite = testModules.some((m) => m.moduleId === VRT_MODULE_PATH);

@mj12albert mj12albert May 7, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex finding:

This prunes when the VRT module appears and no -t filter is set, but it ignores Vitest’s reason argument ("passed" | "interrupted" | "failed"). This can delete unrelated *.a11y.json files after a partial or failed local run. I’d fix this before broad rollout, but I’d call it non-blocking for the current PR. Best fix: accept the third onTestRunEnd arg and only prune when reason === 'passed'.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good finding, fixed.

@github-actions github-actions Bot removed the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 8, 2026
Comment thread test/regressions/a11y/axe.ts Outdated
Comment on lines +91 to +94
const violations = [...new Set(results.violations.map((v) => v.id))];
const needsReview = [...new Set(results.incomplete.map((v) => v.id))].filter(
(id) => !violations.includes(id),
);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: having violationsSet and then being used in the needsReview's filter is faster. up to you if you think it's worth optimizing.

Comment thread test/regressions/a11y/axe.ts Outdated
Comment on lines +113 to +118
const slugs = [...bySlug.keys()].sort();
const partial = slugs.filter((s) => bySlug.get(s)!.some((m) => m.violations.length > 0));
const needsReview = slugs.filter(
(s) => !partial.includes(s) && bySlug.get(s)!.some((m) => m.needsReview.length > 0),
);
const pass = slugs.filter((s) => !partial.includes(s) && !needsReview.includes(s));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same speed suggestion here, I would probably use sets to speed up all these including checks. we don't need them as arrays anyway here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it but the gain is sub-millisecond, so I'd leave it like this for now.

Comment thread test/regressions/index.test.js Outdated
Comment thread docs/data/material/components/buttons/buttons.a11y.json Outdated

@mj12albert mj12albert left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ~
I pushed a commit to update the docs.

One follow up is to consider whether to surface regressions on a previously passed (and committed) non-visual axe rule in a CI step. Right now such a change could be committed and be visible in a PR diff but fully pass CI.

@mj12albert mj12albert changed the title [test] Add demo-based axe-core accessibility tests for mui-material [test] Add axe-core tests for mui-material May 8, 2026
@siriwatknp siriwatknp enabled auto-merge (squash) May 9, 2026 06:57
@siriwatknp siriwatknp merged commit 9a87af4 into mui:master May 11, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[test] Set up automated test for accessibility

5 participants