FastCDC chunking #26
This switches blob storage again, to FastCDC for content-defined chunking, so small changes in large files don't waste space. Old archives still work fine because reading hasn't changed, but to actually get the new chunking benefits, the records should be copied to a new archive. That's why there's now a `copy` command, which duplicates all files and records from one archive to another. `copy` still needs to be fixed up though; it's pretty basic. Future improvements could make `copy` even smarter, maybe only copying certain domains or paths over.

Title changed from "WIP: FastCDC chunking" to "FastCDC chunking".

The `copy` command is now implemented as a logical copy over all records stored in the archive.
Copying the prod archive over into a new FastCDC-based archive yielded another massive improvement:
The archive shrank from roughly 117G to 44G, or roughly 37% of the original size.
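For anyone wondering why content-defined chunking helps with small changes in large files, here is a rough Python sketch. It is not the code from this PR and not full FastCDC (there is no normalized chunking and no tuned masks); it is a simplified gear-hash chunker, and every name and parameter in it (`chunk`, `shared_fraction`, `min_size`, `avg_bits`, `max_size`) is made up for illustration. The idea it shows: chunk boundaries depend only on the last few dozen bytes of content, so after an insertion the boundaries line up again and most chunks hash to values that are already stored.

```python
import hashlib
import random

# Gear table: 256 pseudo-random 64-bit values (fixed seed so runs are repeatable).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, min_size: int = 256, avg_bits: int = 10,
          max_size: int = 8192) -> list:
    """Split data at content-defined boundaries using a gear rolling hash.

    Simplified illustration only: a cut happens when the top `avg_bits`
    bits of the hash are zero (once the chunk has at least `min_size`
    bytes), or when the chunk reaches `max_size` bytes.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # After 64 shifts a byte's contribution falls out of the 64-bit hash,
        # so the hash only "sees" the most recent 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length < min_size:
            continue
        if (h >> (64 - avg_bits)) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of chunks of `new` whose SHA-256 already occurs among chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunk(old)}
    new_chunks = chunk(new)
    if not new_chunks:
        return 1.0
    hits = sum(1 for c in new_chunks if hashlib.sha256(c).digest() in known)
    return hits / len(new_chunks)


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(200_000))
    # Insert a single byte in the middle of the "file".
    edited = original[:100_000] + b"X" + original[100_000:]
    print(f"chunks reused after a 1-byte insert: {shared_fraction(original, edited):.1%}")
```

With fixed-size chunks, the same one-byte insert would shift every later chunk boundary, so everything after the edit would have to be stored again.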