Optimize a slew of Cargo internals
Cargo has historically had very little optimization applied to it. Despite that it's pretty speedy today but there's always a desire to be faster! I've noticed Cargo being particularly sluggish on projects like Servo and rust-lang/rust, so I started profiling and found quite a few low-hanging fruit!
This PR is a slew of optimizations across Cargo for various things found here and there. The banner optimizations are:
* Resolution with a lock file should be basically a noop in terms of execution time now. An optimization was done to avoid cloning `Context` unless necessary, and that basically means it doesn't get cloned now! As the number 1 source of slowdown in Cargo this is the biggest improvement.
* Lots of pieces in `resolve` are now `Rc<T>` for being more easily cloneable.
* `Summary` now internally contains an `Rc` like `Dependency`, making it much more quickly cloneable.
* `Registry` as a trait no longer returns a `Vec` but rather takes a closure to yield summaries up, removing lots of intermediate arrays.
* We no longer spawn a thread for all units of "fresh work", only when we're about to spawn a process.
Almost everything here was guided through profiling `./x.py build` on rust-lang/rust or `cargo build -p log` on Servo. Both of these stress "noop resolution" and the former also stresses noop builds.
Runs of `./x.py build` dropped from 4 to 2 seconds (with lots of low-hanging fruit still remaining in Cargo itself) and `cargo build -p log` dropped from 1.5s to 0.3s. Massif graphs showing Cargo's memory usage also show that the peak memory usage of Cargo in a noop build of Servo dropped from 300MB to 30MB during resolution.
I'm hoping that none of these optimizations makes the code less readable and/or understandable. There are no algorithmic improvements in this PR other than those transitively picked up by making clones cheaper and/or allocating less.
Removing some allocations arounds the stored hashes by having nested hash maps
instead of tuple keys. Also remove an intermediate array when parsing
dependencies through a custom implementation of `Deserialize`. While this
doesn't make this code path blazingly fast it definitely knocks it down in the
profiles below other higher-value targets.
Previously all intermediate stages would create and return `Vec<Summary>`, but
this is a pretty costly operation once you start layering. Ideally we'd use an
iterator-based approach here but working with that in trait objects is
difficult, so this commit takes a closure-based approach to avoid all the
intermediate allocations that are thrown away.
This commit is a relatively serious optimization pass of the resolution phase in
Cargo, targeted at removing as many allocations as possible from this phase.
Executed as an iterative loop this phase of Cargo can often be costly for large
graphs but it's run on every single build!
The main optimization here is to avoid cloning the context and/or pushing a
backtracking frame if there are no candidates left in the current list of
candidates. That optimizes a fast-path for crates with lock files (almost all of
them) and gets us to the point where cloning the context basically disappears
from all profiling.
Add a GNU make jobserver implementation to Cargo
This commit adds a GNU make jobserver implementation to Cargo, both as a client
of existing jobservers and also a creator of new jobservers. The jobserver is
actually just an IPC semaphore which manifests itself as a pipe with N bytes
of tokens on Unix and a literal IPC semaphore on Windows. The rough protocol
is then if you want to run a job you read acquire the semaphore (read a byte on
Unix or wait on the semaphore on Windows) and then you release it when you're
done.
All the hairy details of the jobserver implementation are housed in the
`jobserver` crate on crates.io instead of Cargo. This should hopefully make it
much easier for the compiler to also share a jobserver implementation
eventually.
The main tricky bit here is that on Unix and Windows acquiring a jobserver token
will block the calling thread. We need to either way for a running job to exit
or to acquire a new token when we want to spawn a new job. To handle this the
current implementation spawns a helper thread that does the blocking and sends a
message back to Cargo when it receives a token. It's a little trickier with
shutting down this thread gracefully as well but more details can be found in
the `jobserver` crate.
Unfortunately crates are unlikely to see an immediate benefit of this once
implemented. Most crates are run with a manual `make -jN` and this overrides the
jobserver in the environment, creating a new jobserver in the sub-make. If the
`-jN` argument is removed, however, then `make` will share Cargo's jobserver and
properly limit parallelism.
Closes#1744
This commit adds a GNU make jobserver implementation to Cargo, both as a client
of existing jobservers and also a creator of new jobservers. The jobserver is
actually just an IPC semaphore which manifests itself as a pipe with N bytes
of tokens on Unix and a literal IPC semaphore on Windows. The rough protocol
is then if you want to run a job you read acquire the semaphore (read a byte on
Unix or wait on the semaphore on Windows) and then you release it when you're
done.
All the hairy details of the jobserver implementation are housed in the
`jobserver` crate on crates.io instead of Cargo. This should hopefully make it
much easier for the compiler to also share a jobserver implementation
eventually.
The main tricky bit here is that on Unix and Windows acquiring a jobserver token
will block the calling thread. We need to either way for a running job to exit
or to acquire a new token when we want to spawn a new job. To handle this the
current implementation spawns a helper thread that does the blocking and sends a
message back to Cargo when it receives a token. It's a little trickier with
shutting down this thread gracefully as well but more details can be found in
the `jobserver` crate.
Unfortunately crates are unlikely to see an immediate benefit of this once
implemented. Most crates are run with a manual `make -jN` and this overrides the
jobserver in the environment, creating a new jobserver in the sub-make. If the
`-jN` argument is removed, however, then `make` will share Cargo's jobserver and
properly limit parallelism.
Closes#1744
Updating doc to reflect new badges added
The badges added are:
Is it maintained: Resolution time
Is it maintained: Percentage of open issues
Codecov: Code coverage
Coveralls: Code coverage
Add error-chain errors.
Fixes#4209
Convert CargoResult, CargoError into an implementation provided by error-chain. The previous is_human machinery is mostly removed; now errors are displayed unless of the Internal kind, verbose mode will print all errors.
Remove lots of dated configuration from this repo
Lots of data build stuff is still here from awhile ago when this repo was
producing Cargo binaries, but the rust-lang/rust repo is now responsible for all
these binaries and build configurations. We no longer need to produce artifacts
or have tons of cross-compiles as rust-lang/rust does all that work, instead
let's just test the likely-to-regress platforms and have rust-lang/rust take
care of the rest.
This commit:
* Deletes the old `configure` script and `Makefile`
* Rewrites `src/doc` management as a shell script
* Trims down Travis/AppVeyor configuration
Lots of data build stuff is still here from awhile ago when this repo was
producing Cargo binaries, but the rust-lang/rust repo is now responsible for all
these binaries and build configurations. We no longer need to produce artifacts
or have tons of cross-compiles as rust-lang/rust does all that work, instead
let's just test the likely-to-regress platforms and have rust-lang/rust take
care of the rest.
This commit:
* Deletes the old `configure` script and `Makefile`
* Rewrites `src/doc` management as a shell script
* Trims down Travis/AppVeyor configuration
Fix some formatting items.
Changes Internal error kind to preserve the original error.
Changes network retry logic to inspect full error chain for spurious
errors.