Auto merge of #14252 - epage:du-less, r=ehuss
Some checks are pending
CI / docs (push) Waiting to run
CI / msrv (push) Waiting to run
CI / build_std (push) Waiting to run
CI / test_gitoxide (push) Waiting to run
CI / resolver (push) Waiting to run
CI / Tests ${{ matrix.name }} (macOS x86_64 stable, macos-13, x86_64-apple-ios, stable) (push) Waiting to run
CI / Tests ${{ matrix.name }} (macOS x86_64 nightly, macos-13, x86_64-apple-ios, nightly) (push) Waiting to run
CI / Tests ${{ matrix.name }} (macOS aarch64 stable, macos-14, x86_64-apple-darwin, stable) (push) Waiting to run
CI / Tests ${{ matrix.name }} (Windows x86_64 gnu nightly, windows-latest, i686-pc-windows-gnu, nightly-gnu) (push) Waiting to run
CI / Tests ${{ matrix.name }} (Windows x86_64 MSVC stable, windows-latest, i686-pc-windows-msvc, stable-msvc) (push) Waiting to run
CI / Tests ${{ matrix.name }} (Linux x86_64 stable, ubuntu-latest, i686-unknown-linux-gnu, stable) (push) Waiting to run
CI / Tests ${{ matrix.name }} (Linux x86_64 nightly, ubuntu-latest, i686-unknown-linux-gnu, nightly) (push) Waiting to run
CI / Tests ${{ matrix.name }} (Linux x86_64 beta, ubuntu-latest, i686-unknown-linux-gnu, beta) (push) Waiting to run
CI / check-version-bump (push) Waiting to run
CI / lockfile (push) Waiting to run
CI / lint-docs (push) Waiting to run
CI / stale-label (push) Waiting to run
CI / clippy (push) Waiting to run
CI / rustfmt (push) Waiting to run
CI / bors build finished (push) Blocked by required conditions
Contrib Deploy / deploy (push) Waiting to run

perf(source): Don't `du` on every git source load

### What does this PR try to resolve?

When profiling Zed (#14238), a major factor in their no-op run times is git patches and git dependencies.  The slowest operation for each git source is running `du`.  This is extraneous for a couple of reasons
- GC isn't stable, slowing people down for a feature they aren't using
- Size tracking was expected to be lazy, only reading sizes when the GC is configured for size, while this was eager
- Git checkouts are immutable but we check on every load
- This optimized for "while filesystem caches are warm" from a checkout operation when checkout operations are rare compared to all of the other commands run on a working directory.

This removes the `du`, relying on the lazy loading that happens in `update_null_sizes`.

For Zed, this removed about 40ms total from the runtime.  While by itself, this is below the threshold of being noticed,
- It adds up if any editor integrations are calling `cargo metadata` a lot
- Over time, small gains like this will add up

### How should we test and review this PR?

### Additional information

cc `@ehuss`
This commit is contained in:
bors 2024-07-15 17:33:50 +00:00
commit a2d45dc370
2 changed files with 6 additions and 10 deletions

View file

@ -1800,7 +1800,7 @@ pub fn is_silent_error(e: &anyhow::Error) -> bool {
/// Returns the disk usage for a git checkout directory.
#[tracing::instrument]
pub fn du_git_checkout(path: &Path) -> CargoResult<u64> {
fn du_git_checkout(path: &Path) -> CargoResult<u64> {
// !.git is used because clones typically use hardlinks for the git
// contents. TODO: Verify behavior on Windows.
// TODO: Or even better, switch to worktrees, and remove this.

View file

@ -145,13 +145,13 @@ impl<'gctx> GitSource<'gctx> {
self.path_source.as_mut().unwrap().read_packages()
}
fn mark_used(&self, size: Option<u64>) -> CargoResult<()> {
fn mark_used(&self) -> CargoResult<()> {
self.gctx
.deferred_global_last_use()?
.mark_git_checkout_used(global_cache_tracker::GitCheckout {
encoded_git_name: self.ident,
short_name: self.short_id.expect("update before download"),
size,
size: None,
});
Ok(())
}
@ -268,7 +268,7 @@ impl<'gctx> Source for GitSource<'gctx> {
fn block_until_ready(&mut self) -> CargoResult<()> {
if self.path_source.is_some() {
self.mark_used(None)?;
self.mark_used()?;
return Ok(());
}
@ -363,11 +363,7 @@ impl<'gctx> Source for GitSource<'gctx> {
self.locked_rev = Revision::Locked(actual_rev);
self.path_source.as_mut().unwrap().load()?;
// Hopefully this shouldn't incur too much of a performance hit since
// most of this should already be in cache since it was just
// extracted.
let size = global_cache_tracker::du_git_checkout(&checkout_path)?;
self.mark_used(Some(size))?;
self.mark_used()?;
Ok(())
}
@ -377,7 +373,7 @@ impl<'gctx> Source for GitSource<'gctx> {
id,
self.remote
);
self.mark_used(None)?;
self.mark_used()?;
self.path_source
.as_mut()
.expect("BUG: `update()` must be called before `get()`")