
Introducing Profi into Rustc for better Sample PGO quality #156898

@zamazan4ik

Hi!

Profi is an algorithm that improves Sample PGO profile quality with clever heuristics (for more details, see "Profi implementation" in the references below). This functionality is already implemented in LLVM and integrated into Clang's PGO pipeline. Rust lacks such functionality for now.

Clang has the -fsample-profile-use-profi flag, and Profi has already been enabled by default for the Sample PGO use case in Clang for about a year.

At the moment, Profi can still be used with Rust by passing an additional LLVM flag, -Cllvm-args=-sample-profile-use-profi, but I haven't tested it with Rustc yet.
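For illustration, a minimal sketch of how the flag could be combined with rustc's existing Sample PGO support. It assumes a nightly toolchain (the -Zprofile-sample-use flag is unstable) and a hypothetical profile path ./app.prof. Like the flag itself, this is untested with Profi:

```shell
# Hypothetical path to a sample profile collected beforehand
# (for example with perf plus a profile conversion tool).
PROFILE=./app.prof

# -Zprofile-sample-use is rustc's unstable (nightly-only) Sample PGO flag;
# -Cllvm-args forwards the Profi switch straight to LLVM.
RUSTFLAGS="-Zprofile-sample-use=${PROFILE} -Cllvm-args=-sample-profile-use-profi"
export RUSTFLAGS
echo "RUSTFLAGS=${RUSTFLAGS}"

# Uncomment to build with these flags (requires a nightly toolchain):
# cargo +nightly build --release
```

If rustc gained a dedicated option or enabled Profi by default, the -Cllvm-args part would no longer be needed.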

Open questions:

  • Do we want to introduce an equivalent of -fsample-profile-use-profi/-fno-sample-profile-use-profi in the Rustc compiler?
  • Do we want to enable Profi for Sample PGO by default? If yes, do we need to introduce flags to disable this behavior explicitly (see the previous point)? Enabling it by default would give the average user a better Sample PGO experience: they would not need to know about an additional hidden LLVM argument that improves Sample PGO quality. At the moment, I have a draft commit that enables it by default in my Rust fork: commit. Without enabling it by default, we could partially mitigate the discoverability problem by adding such flags to cargo-pgo, but that is another discussion topic (especially since cargo-pgo doesn't support the Sample PGO flow).
  • Do we need more evidence of Profi's efficiency in practice? It's rather difficult to find any evidence, even for Clang. In theory, though, Profi is a good idea.

Some references:

  • Profi implementation commits in LLVM: one, two, three
  • Adding -fsample-profile-use-profi flag into Clang: commit
  • Enabling Profi by default in Clang: PR
  • Adding -fno-sample-profile-use-profi flag: PR
  • Some benchmarks about Profi efficiency: ChromeOS results, more results

Kindly pinging @ojeda, since this functionality could be interesting for their use case too (even if it seems that Google uses another approach internally). Maybe this approach with flow-sensitive discriminators should be pushed instead, especially since exactly this approach is used in Rust-for-Linux?

Thank you.


Labels: A-PGO (Area: Profile-guided optimizations (PGO)), C-discussion (Category: Discussion or questions that doesn't represent real issues.), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
