# WebArc

`webarc` is a local website archive based on [monolith](https://github.com/Y2Z/monolith).

## Archive Format

A web archive is a directory containing domains in this structure:

```
web_archive/
├─ domain.com/
│  ├─ sub/
│  │  ├─ path/
│  │  │  ├─ index_YYYY_MM_DD.html
├─ sub.domain.com/
```

Every document in the archive can then be found at `archive/domain/paths.../index_YYYY_MM_DD.html`.

## Usage

webarc provides a CLI tool to work with the archive structure.

```sh
# List domains in the archive
webarc [--dir ARCHIVE] archive list [-j, --json]

# List all paths on a domain
webarc [--dir ARCHIVE] archive list [-j, --json] [DOMAIN]

# List all versions of a document
webarc [--dir ARCHIVE] archive versions [-j, --json] [DOMAIN] [PATH]

# Get a document
# `--md` returns a Markdown version
webarc [--dir ARCHIVE] archive get [--md] [DOMAIN] [PATH] [VERSION]

# Archive a website
webarc [--dir ARCHIVE] archive download [URL]
```

## Configuration

You can configure the application using a config file. See the [config.toml](config.toml) file for more information.

## Web Server

You can start a web server serving an archive with `webarc serve`.

Archived pages can be viewed at `/s//`. For example, `/s/en.wikipedia.org/wiki/Website` serves the `en.wikipedia.org` page at `/wiki/Website`.

To select an archived version from a certain time, add the `?time=YYYY-MM-DD` parameter to the URL.
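The archive layout and the `?time=` parameter together suggest how a request could be resolved to a file on disk. The following Bash sketch is hypothetical: `resolve_version` is not part of webarc, and the rule it applies (the newest version dated on or before the requested day, or the newest overall when no date is given) is an assumption, not documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of webarc:
#   resolve_version ARCHIVE DOMAIN PATH [YYYY-MM-DD]
# ASSUMPTION: with a date, pick the newest index_YYYY_MM_DD.html dated on or
# before it; without a date, pick the newest version. Prints the file path,
# or returns non-zero if no version matches.
resolve_version() {
    local archive=$1 domain=$2 path=$3 time=${4:-}
    local dir="$archive/$domain/$path"
    local limit="" best="" best_num=0 file stamp num

    # Turn YYYY-MM-DD into the integer YYYYMMDD so dates compare numerically.
    [[ -n $time ]] && limit=$((10#${time//-/}))

    for file in "$dir"/index_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9].html; do
        [[ -e $file ]] || continue        # glob matched nothing
        stamp=${file##*/index_}           # e.g. 2024_01_31.html
        stamp=${stamp%.html}              # e.g. 2024_01_31
        num=$((10#${stamp//_/}))          # e.g. 20240131

        # Skip versions archived after the requested day.
        [[ -n $limit ]] && (( num > limit )) && continue

        # Keep the newest remaining version.
        if (( num > best_num )); then
            best=$file
            best_num=$num
        fi
    done

    [[ -n $best ]] && printf '%s\n' "$best"
}
```

For example, `resolve_version web_archive en.wikipedia.org wiki/Website 2024-01-01` would print the path of the latest `index_YYYY_MM_DD.html` under `web_archive/en.wikipedia.org/wiki/Website/` dated no later than 2024-01-01. Comparing dates as `YYYYMMDD` integers keeps the result independent of glob ordering and locale collation.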