[test] Add axe-core tests for mui-material by siriwatknp · Pull Request #48341 · mui/material-ui

siriwatknp · 2026-04-21T04:20:03Z

closes #47887 · supersedes #47895

Docs preview: https://deploy-preview-48341--material-ui.netlify.app/

Summary

Automated axe-core accessibility testing for @mui/material, driven off the real docs demos. Folded into the existing visual-regression Playwright loop (test/regressions/index.test.js) — for each enrolled demo, axe runs against the rendered [data-testid="testcase"] element in the same browser session that takes the screenshot, and the result is written to docs/data/material/components/{slug}/{slug}.a11y.json (one file per slug, keyed by demo name).

Initial scope is buttons only — BasicButtons and ColorButtons — to validate the pipeline end-to-end before broader enrolment in follow-ups. Layout wrappers (Box, Grid, Stack, Paper, Container, Portal, …) will be excluded on principle when added: they contribute no a11y signal of their own; any violation found on them really belongs to their children, already covered by the child's own enrolment.

Config shape

Two independent rule arrays — one per tool — evaluated last-match-wins (no inheritance: an override rule must restate every field it cares about) against the docs path docs/data/material/components/{slug}/{Demo} (minimatch globs). Independent arrays mean editing one tool's config can't stomp the other.

// test/regressions/demoMeta.ts
export const A11Y_RULES: A11yRule[] = [
  { test: 'docs/data/material/components/buttons/{BasicButtons,ColorButtons}', enabled: true },
];

pnpm test:regressions refreshes the *.a11y.json files. CI enforces they're up to date via a git-diff check.

For Reviewers

Why demo-based, not scenario-based

#47895 introduced scenario files (test/a11y/scenarios/Button.a11y.tsx) that hand-authored every axe case per component. Per Olivier's feedback, docs demos already cover every real usage scenario — duplicating them as test fixtures is waste. This PR replaces the scenarios folder with a thin rule list pointed at the demo routes VRT already serves.

Why folded into the VRT loop

@Janpot's earlier concern about duplicate Playwright sessions: rather than a parallel browser, axe now runs inside the existing screenshot loop. Each enrolled demo gets one axe.run against the already-rendered [data-testid="testcase"] element before the screenshot is taken. Failure isolation is preserved by recording the result on ctx.task.meta.a11y first; the assertion fires only on visual rules (color-contrast, link-in-text-block) that depend on real CSS.

Screenshot and a11y opt-outs are independent — a demo screenshot-disabled for being redundant or interaction-heavy can still be audited by axe.

Config follow-up from review

@Janpot's review comment flagged the original dual-Map structure (SLUG_A11Y + DEMO_META) as non-intuitive and at risk of cross-tool stomping. Replaced with two rule arrays + minimatch globs as suggested; per-demo opt-outs are appended after slug-wide rules; the override rule restates every field it wants (no implicit inheritance).

Output shape

One file per enrolled slug at docs/data/material/components/{slug}/{slug}.a11y.json, co-located with the slug's other docs files. Each file is an object keyed by demo name:

{
  "BasicButtons": { "passedRules": [...], "failedRules": [], "testedRules": { "wcag2a": [...], "wcag2aa": [...], "wcag22aa": [...] } },
  "ColorButtons": { ... }
}

Co-locating with {slug}.md and the demo source files makes the data discoverable from the slug folder and avoids the slug-prefix collision problem of a flat per-demo layout.

Code

test/regressions/demoMeta.ts — two rule arrays + resolvers (shouldScreenshot, resolveA11y).
test/regressions/a11y/axe.ts — recordA11y records onto ctx.task.meta.a11y, asserts visual rules only. skipAssertions suppresses the assertion for listed rule IDs without dropping them from recorded data — failing rules still land in failedRules for triage.
test/regressions/a11y/a11yReporter.ts — Vitest reporter that groups results by slug and writes one file per slug to docs/data/material/components/{slug}/{slug}.a11y.json.
test/regressions/index.test.js — calls recordA11y inside the existing per-route loop, before the screenshot.
test/regressions/demoMeta.test.ts — 8 unit tests for the resolvers (precedence, brace-glob enrolment, screenshot vs a11y independence).

CI

The existing test:regressions job already builds, serves, and runs Playwright. Adding axe is a no-op on infra. The git-diff guard ensures every *.a11y.json is up-to-date.

Agent docs

AGENTS.md — "Accessibility Testing" section covers the rule-array shape, brace-glob narrowing, per-demo opt-outs, and the skipAssertions escape hatch.

Not in scope (follow-ups)

Docs presentation (separate PR). A demo-toolbar pill reads each slug's *.a11y.json via the docs render pipeline and shows pass/fail counts plus a per-WCAG-tag rule breakdown on click.
Broader enrolment. Buttons is the validation slug; other components onboard via slug-wide rules in A11Y_RULES.
Rule graduation (global severity levels). Today every non-visual violation is recorded but not asserted; every visual violation is asserted unless skipAssertions lists it. Fine for a POC; may want finer control once more components enroll.

I have followed (at least) the PR section of the contributing guide.

Runs axe-core against real docs demos in Chromium, reusing the VRT Vite server. Enrollment is an explicit list in test/a11y/a11y.test.mjs (POC: Button + Card, 4 demos). Output a11y-results.json carries both per-component aggregates (for the docs widget) and per-demo detail (for future per-demo UI). Supersedes mui#47895 — swaps hand-written scenario modules for the docs demos themselves, per Olivier's suggestion to mirror Janpot's html-validate-over-docs-pages pattern in mui#48088.

code-infra-dashboard · 2026-04-21T04:23:12Z

Deploy preview

https://deploy-preview-48341--material-ui.netlify.app/

Bundle size

Bundle	Parsed size	Gzip size
@mui/material	0B^(0.00%)	0B^(0.00%)
@mui/lab	0B^(0.00%)	0B^(0.00%)
@mui/private-theming	0B^(0.00%)	0B^(0.00%)
@mui/system	0B^(0.00%)	0B^(0.00%)
@mui/utils	0B^(0.00%)	0B^(0.00%)

Details of bundle changes

Check out the code infra dashboard for more information about this PR.

Replaces the hardcoded ENROLLED map in a11y.test.mjs with a full component roster at packages/mui-material/test/a11y/config.ts. Every component has an entry; status is 'enabled' (Button, Card) or 'pending' (rest). Adoption per PR = flip 'pending' to 'enabled' and fill in `demos`. The config doubles as the rollout ledger — a future docs presentation can iterate the full array to show both enrolled and pending components. Drops ComponentAccessibilityStatus.js and the two .md embeds. Presentation is a separate concern; the JSON shape stays (per-component aggregates + per-demo detail) so the future widget has the data it needs.

Replaces the single a11y-results.json (383 KB / 14k lines at full rollout) with packages/mui-material/test/a11y/results/{Component}.json. A future docs widget can import('.../results/Button.json') and webpack/vite will code-split per component, so each page only ships its own ~2 KB payload instead of the whole roster. Also updates the docs:a11y script to rimraf the directory before each run, and the AGENTS.md reference.

Flips 45 components from 'pending' to 'enabled' in config.ts. The 26 that currently trip color-contrast keep a skipRules: ['color-contrast'] entry so CI stays green while the violations still land in failedRules for the team to triage. a11y.test.mjs now falls back to the VRT nav when a config entry omits its 'demos' list, so enrolling is one line (flip the status). Explicit demos remain the override knob. The 11 components VRT excludes (need interaction, no demos, flaky) stay 'pending' with inline reasons — nothing observable today.

Janpot · 2026-04-21T05:35:36Z

Rendering all the demos, one by one, in a real browser and run tests on them. Sounds an awful lot like the current screenshot tests we have. Would it make sense (be possible?) to fold accessibility testing into the regression tests. It would likely save a ton on circleci credits compared to running another full suite where we render all demos.

`skipAssertions` better describes the behavior — the rule still runs and still lands in the results JSON, only the test-failing assertion is suppressed. `skipRules` read as "rule is skipped entirely", which isn't what happens. Also drops Box from the roster — nothing to a11y-test.

Layout wrappers (Grid, Masonry, Portal, NoSsr, Stack, Paper) and pure behavior components (Backdrop, ClickAwayListener, Container, CssBaseline) have no a11y semantics of their own — axe either finds nothing or attributes violations to them that actually belong to their inner content (already covered by the child component's own entry). Keeps Typography (heading hierarchy), Skeleton (aria-busy), Divider (caught a real list violation) since those contribute real signal. 44 components in roster → 36 enabled, 8 pending (interaction-only).

The status gate was infrastructure for a presentation UI that isn't in scope for this PR. The config now lists only components that are actually enrolled; pending ones live as `// TODO: {Component} — <blocker>` comments so the reason stays visible without needing a parseable data shape. If a future docs widget ever wants a "coming soon" list, convert the comments back to structured entries then.

siriwatknp · 2026-04-21T06:25:27Z

It would likely save a ton on circleci credits compared to running another full suite where we render all demos.

Infra is already shared — pnpm docs:a11y reuses test:regressions:build and runs the Vitest axe loop concurrently with test:regressions:server. The only new spend is one Chromium cold start + 350 axe runs ≈ 44s (~1/5 of one VRT pass).

Kept it as a separate Vitest file for a standalone dev loop, not because of the infra cost. Happy to fold axe into test/regressions/index.test.js if you'd rather reclaim the 44 s — would need to swap the Vitest reporter for inline aggregation + after(all) write, add a route filter (enrolled /docs-components-* only), and split each route into two its for failure isolation. It's doable.

Do you think saving around ~40s is a hard constraint? Happy to follow your judgement.

Janpot · 2026-04-21T06:45:50Z

Do you think saving around ~40s is a hard constraint? Happy to follow your judgement.

It'd cost about 8 euro per month, so cost-wise probably not. It'd increase the current job time (6m 12s/last 30 days) to ~7m, making it the second longest running job, after the type checking. That technically doesn't make it the bottle neck. The bottle neck currently is type checking, we plan to migrate to typescript native soon, so in not so long the test_browser job will likely become the bottleneck. There aren't many hard constraint, only the time the team is willing to spend waiting for jobs to finish during releases and deploys.

siriwatknp · 2026-04-21T06:54:38Z

I pushed a commit to move to regression test, the timing does not differ much but the code is simplified. Net CI wall-clock saving is ~break-even; real wins are ~16 s of duplicate Chromium setup avoided per run and infra simplification (one Vitest suite, one config, one script deleted). No new violations; 2 of 38 result files differ (all pre-existing false-positive color-contrast flags disappearing because the VRT loop parks the mouse and normalizes body.overflow — post-fold is strictly more accurate).

Per Jan's review on mui#48341: run axe inline in test/regressions/index.test.js, reuse the same Chromium session. Drop the standalone test/a11y/ suite and the docs:a11y:dev script. CI a11y gate moves from test_browser to test_regressions. Slider/Table results reflect a cleaner DOM (mouse parked, scrollbar normalized); all deltas are false-positive color-contrast flags disappearing.

Janpot · 2026-04-21T07:55:05Z

Ok, if the win in runtime is marginal then we can go either way, your call.

Did you check how currently screenshots are being excluded? Because if you implement this as enrollment in the existing screenshots fixtures, you will also inherit its exclusion mechanism. If we go the combined route then we should probably import all demos and build separate exclusion mechanisms for a11y and screenshots. e.g.

declare const demoMeta: Map<string, {
  screenshot: {
    enabled: boolean,
    // extra options, e.g. waitForSelector...
  },
  a11y: {
    enabled: boolean,
    rules: {
      // ...
    }
  }
}>()

aside: Thanks for editing the LLM generated comment, it really demotivates me when I have to answer these. Feels like I'm putting in the effort the author should have done when interacting with their LLM.

Janpot

Just two things,

My edited comment in #48341 (comment) about screenshot exclusion already being encoded in how we load the test fixtures, is it a problem?
We probably want to generate the data nested under the ./docs folder somewhere, just like we do for other docs generated data?

siriwatknp · 2026-04-22T08:46:50Z

aside: Thanks for editing the LLM generated comment, it really demotivates me when I have to answer these. Feels like I'm putting in the effort the author should have done when interacting with their LLM.

Sorry for that, I usually don't want LLM to write response directly on GH but it was trying to be smart and push the comment directly 🥲. I'm building guard rail around it, bear with me.

Per-tool exclusions now live in test/regressions/demoMeta.ts instead of glob negations in index.jsx. Screenshot-specific reasons (Redundant, Flaky image loading) no longer silently drop a11y coverage; autocomplete Redundant variants, skeleton/Facebook, snackbars/Customized, steppers image-heavy demos, and AccessibleTabs1/2 now get axe coverage. Glob keeps only structural exclusions + whole slugs no tool consumes. DEMO_META accepts both demo-level and slug-level keys; demo wins when both exist. shouldScreenshot + resolveA11y are the test-runner gates. Unit tests in demoMeta.test.ts cover current shape + the 3-line workflow for enabling a11y on an interaction-heavy slug (e.g. dialogs). a11yConfig.ts removed; AGENTS.md updated.

siriwatknp · 2026-04-24T06:00:19Z

Pushed f337bc7 implementing the separation.

The regression exclusion can be separately configurable in demoMeta.ts file.
Meaning, the screenshot and a11y are independent from each other.

Colocates the generated JSON with other docs data, per Janpot's second review point. Reporter's OUT_DIR and the test:regressions prettier step updated to the new path. No other consumers touched the old path.

Migration expanded axe coverage to demos previously screenshot-excluded (autocomplete Redundant, skeleton/Facebook, etc.). Surfaces color-contrast near-threshold findings on fab, skeleton, bottom-navigation — same class of issue already suppressed elsewhere. Added skipAssertions for the three slugs and regenerated results JSONs at docs/data/material/a11y/.

siriwatknp · 2026-05-05T02:32:26Z

Do you aim to maximize the coverage for the Button component in this PR or just set up the infra?

just the infra (with couple of Button demos generated).

Address Codex review on mui#48341: waitForSelector field on ScreenshotRule was declared but never read. Migrate the hardcoded ReactVirtualizedTable wait into a rule and apply screenshotRule.waitForSelector before axe + screenshot in the VRT loop. Reporter now prunes stale {slug}.a11y.json after writing — gated on the VRT module having actually run and no -t filter, so unrelated unit-test runs and filtered slug subsets leave untouched files alone.

siriwatknp · 2026-05-05T02:58:05Z

waitForSelector is declared on ScreenshotRule, but the runner only reads screenshotRule?.enabled before calling takeScreenshot. A future rule with waitForSelector would silently no-op. Either wire it before screenshot capture or remove the field until it’s needed.

Generated *.a11y.json files aren’t pruned before regeneration. Since the reporter only writes slugs seen in the current run, removing a slug from A11Y_RULES could leave stale tracked JSON behind while CI still passes. Consider deleting existing generated a11y files before the run or pruning stale files in the reporter.

nice catch, fixed both of them.

mj12albert · 2026-05-07T15:07:02Z

+    }
+  }
+
+  const violations = [...new Set([...results.violations, ...results.incomplete].map((v) => v.id))];


I don't think incomplete should be treated as (merged into) violations here

It generally means axe-core can't evaluate this and "needs human review": https://docs.deque.com/devtools-for-web/4/en/needs-review-incomplete

IMO these are the most important ones to track as they are the automation gaps

True, fixed. the generated results contains a new field needsReview for maintainer.

mj12albert · 2026-05-07T15:08:32Z

+    // VRT module must have actually executed, and no `-t` filter narrowed it.
+    // Anything else (unrelated unit tests, filtered slug subset) leaves
+    // untouched slugs' files alone since we have no signal about them.
+    const ranVrtSuite = testModules.some((m) => m.moduleId === VRT_MODULE_PATH);


Codex finding:

This prunes when the VRT module appears and no -t filter is set, but it ignores Vitest’s reason argument ("passed" | "interrupted" | "failed"). This can delete unrelated *.a11y.json files after a partial or failed local run. I’d fix this before broad rollout, but I’d call it non-blocking for the current PR. Best fix: accept the third onTestRunEnd arg and only prune when reason === 'passed'.

Good finding, fixed.

silviuaavram · 2026-05-08T07:59:38Z

+  const violations = [...new Set(results.violations.map((v) => v.id))];
+  const needsReview = [...new Set(results.incomplete.map((v) => v.id))].filter(
+    (id) => !violations.includes(id),
+  );


nit: having violationsSet and then being used in the needsReview's filter is faster. up to you if you think it's worth optimizing.

silviuaavram · 2026-05-08T08:04:59Z

+    const slugs = [...bySlug.keys()].sort();
+    const partial = slugs.filter((s) => bySlug.get(s)!.some((m) => m.violations.length > 0));
+    const needsReview = slugs.filter(
+      (s) => !partial.includes(s) && bySlug.get(s)!.some((m) => m.needsReview.length > 0),
+    );
+    const pass = slugs.filter((s) => !partial.includes(s) && !needsReview.includes(s));


same speed suggestion here, I would probably use sets to speed up all these including checks. we don't need them as arrays anyway here.

got it but the gain is sub-millisecond, so I'd leave it like this for now.

mj12albert

Looks good ~
I pushed a commit to update the docs.

One follow up is to consider whether to surface regressions on a previously passed (and committed) non-visual axe rule in a CI step. Right now such a change could be committed and be visible in a PR diff but fully pass CI.

…nto test/a11y-demos

siriwatknp added 3 commits April 21, 2026 12:03

siriwatknp added 3 commits April 21, 2026 12:35

siriwatknp marked this pull request as draft April 21, 2026 06:20

siriwatknp mentioned this pull request Apr 21, 2026

[test] Add axe-core accessibility tests for mui-material #47895

Closed

1 task

siriwatknp added 2 commits April 21, 2026 13:57

fix ci

8c19506

siriwatknp marked this pull request as ready for review April 21, 2026 07:44

siriwatknp added 4 commits April 21, 2026 15:01

[test] Rename config.ts to a11yConfig.ts

33ea314

[test] Move a11y infra to test/regressions/a11y/

7d70287

[test] Drop docs:a11y wrapper; fold prettier into test:regressions

6751af1

fix ci

ccc7df4

zannager added the test label Apr 21, 2026

zannager requested a review from Janpot April 21, 2026 13:59

Janpot reviewed Apr 21, 2026

View reviewed changes

siriwatknp added 2 commits April 24, 2026 13:51

[test] Move a11y results to docs/data/material/a11y

909aea1

Colocates the generated JSON with other docs data, per Janpot's second review point. Reporter's OUT_DIR and the test:regressions prettier step updated to the new path. No other consumers touched the old path.

siriwatknp added 2 commits May 5, 2026 09:34

Merge branch 'master' of github.com:mui/material-ui into test/a11y-demos

64f462e

github-actions Bot removed the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 5, 2026

siriwatknp requested a review from mj12albert May 6, 2026 08:39

github-actions Bot added the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 7, 2026

mj12albert reviewed May 7, 2026

View reviewed changes

siriwatknp added 3 commits May 8, 2026 14:05

Merge branch 'master' of github.com:mui/material-ui into test/a11y-demos

acd5442

install and dedupe

c84c821

[test] Split a11y needsReview from violations; gate prune on pass

633dedb

github-actions Bot removed the PR: out-of-date The pull request has merge conflicts and can't be merged. label May 8, 2026

prettier

4bb82fe

silviuaavram approved these changes May 8, 2026

View reviewed changes

Janpot reviewed May 8, 2026

View reviewed changes

Comment thread docs/data/material/components/buttons/buttons.a11y.json Outdated

siriwatknp added 3 commits May 8, 2026 16:08

[test] Restructure a11y JSON to per-rule status+tags map

c1bd748

[test] Surface a11y in test title when both modes run

ece39fa

[test] Surface a11y violations and incompletes in single failure

21c5aae

mj12albert approved these changes May 8, 2026

View reviewed changes

mj12albert force-pushed the test/a11y-demos branch from c2d3738 to 8e6e714 Compare May 8, 2026 16:32

mj12albert changed the title ~~[test] Add demo-based axe-core accessibility tests for mui-material~~ [test] Add axe-core tests for mui-material May 8, 2026

update docs

5d56eef

mj12albert force-pushed the test/a11y-demos branch from 8e6e714 to 5d56eef Compare May 8, 2026 16:35

siriwatknp added 2 commits May 9, 2026 13:56

Merge branch 'master' of github.com:mui/material-ui into test/a11y-demos

9c68c55

prettier

2656bda

siriwatknp enabled auto-merge (squash) May 9, 2026 06:57

siriwatknp added 3 commits May 9, 2026 13:58

Merge branch 'test/a11y-demos' of github.com:siriwatknp/material-ui i…

a3a06fe

…nto test/a11y-demos

fix lint

4df2001

pnpm dedupe

a9ebd53

siriwatknp merged commit 9a87af4 into mui:master May 11, 2026
22 checks passed

Uh oh!

Conversation

siriwatknp commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Config shape

For Reviewers

Why demo-based, not scenario-based

Why folded into the VRT loop

Config follow-up from review

Output shape

Code

CI

Agent docs

Not in scope (follow-ups)

Uh oh!

code-infra-dashboard Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy preview

Bundle size

Uh oh!

Janpot commented Apr 21, 2026

Uh oh!

siriwatknp commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Janpot commented Apr 21, 2026

Uh oh!

siriwatknp commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Janpot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Janpot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

siriwatknp commented Apr 22, 2026

Uh oh!

siriwatknp commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

siriwatknp commented May 5, 2026

Uh oh!

siriwatknp commented May 5, 2026

Uh oh!

mj12albert May 7, 2026

Choose a reason for hiding this comment

Uh oh!

siriwatknp May 8, 2026

Choose a reason for hiding this comment

Uh oh!

mj12albert May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

siriwatknp May 8, 2026

Choose a reason for hiding this comment

Uh oh!

silviuaavram May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

silviuaavram May 8, 2026

Choose a reason for hiding this comment

Uh oh!

siriwatknp May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mj12albert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

siriwatknp commented Apr 21, 2026 •

edited

Loading

code-infra-dashboard Bot commented Apr 21, 2026 •

edited

Loading

siriwatknp commented Apr 21, 2026 •

edited

Loading

siriwatknp commented Apr 21, 2026 •

edited

Loading

Janpot commented Apr 21, 2026 •

edited

Loading

Janpot left a comment •

edited

Loading

siriwatknp commented Apr 24, 2026 •

edited

Loading

mj12albert May 7, 2026 •

edited

Loading