# WebArc

`webarc` is a local website archive based on [monolith](https://github.com/Y2Z/monolith).

## Archive Format
A web archive is a directory with one subdirectory per domain (subdomains such as `sub.domain.com` get their own), structured like this:

```
web_archive/
├─ domain.com/
│ ├─ sub/
│ │ ├─ path/
│ │ │ ├─ index_YYYY_MM_DD.html
├─ sub.domain.com/
```
Every document in the archive can then be found at `<archive>/<domain>/<path...>/index_YYYY_MM_DD.html`, with one file per archived version.
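As an illustration of this layout, the shell function below computes where a page archived on a given date would be stored. It is a hypothetical helper (`archive_path` is not part of webarc) and only a sketch under assumptions: URL path segments map one-to-one onto directories, the root page `/` is stored directly in the domain directory, and query strings and fragments are ignored. How webarc itself handles those cases may differ.

```sh
#!/bin/sh
# Hypothetical helper, not part of webarc: print the path at which a page
# archived on a given date would be stored, following the layout above.
# Assumptions: path segments map 1:1 onto directories, the root page "/"
# lives directly in the domain directory, and ?query/#fragment are dropped.
archive_path() {
    archive_dir=$1   # e.g. web_archive
    url=$2           # e.g. https://en.wikipedia.org/wiki/Website
    day=$3           # YYYY-MM-DD

    rest=${url#*://}        # strip the scheme
    rest=${rest%%[?#]*}     # strip any query string or fragment

    domain=${rest%%/*}      # everything before the first "/"
    case $rest in
        */*) doc_path=${rest#*/} ;;   # everything after the first "/"
        *)   doc_path= ;;             # bare domain: the root page
    esac
    doc_path=${doc_path%/}  # drop one trailing slash

    # File names use YYYY_MM_DD instead of YYYY-MM-DD.
    stamp=$(printf '%s' "$day" | sed 's/-/_/g')

    if [ -n "$doc_path" ]; then
        printf '%s/%s/%s/index_%s.html\n' "$archive_dir" "$domain" "$doc_path" "$stamp"
    else
        printf '%s/%s/index_%s.html\n' "$archive_dir" "$domain" "$stamp"
    fi
}
```

For example, `archive_path web_archive https://en.wikipedia.org/wiki/Website 2024-01-31` prints `web_archive/en.wikipedia.org/wiki/Website/index_2024_01_31.html`.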
## Usage

`webarc` provides a CLI for working with archives in this structure:
```sh
# List domains in archive
webarc [--dir ARCHIVE] archive list [-j, --json]

# List all paths on a domain
webarc [--dir ARCHIVE] archive list [-j, --json] [DOMAIN]

# List all versions of a document
webarc [--dir ARCHIVE] archive versions [-j, --json] [DOMAIN] [PATH]

# Get a document
# `--md` will return a markdown version
webarc [--dir ARCHIVE] archive get [--md] [DOMAIN] [PATH] [VERSION]

# Archive a website
webarc [--dir ARCHIVE] archive download [URL]
```
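Since each version is stored as a dated file, the versions of a document can also be inspected with ordinary shell tools. The function below is a hypothetical sketch (`list_versions` is not a webarc command) that assumes the layout described under Archive Format; it is not necessarily how `webarc archive versions` works, and its plain-text output is not the CLI's output format.

```sh
#!/bin/sh
# Hypothetical helper, not a webarc command: list the archived versions of
# one document by reading the index_YYYY_MM_DD.html file names in its
# directory. Prints one YYYY-MM-DD date per line; zero-padded dates make the
# shell's sorted glob order chronological (oldest first).
list_versions() {
    dir=$1/$2/$3    # ARCHIVE/DOMAIN/PATH

    for f in "$dir"/index_*.html; do
        [ -e "$f" ] || continue     # no match: the glob stays unexpanded
        name=${f##*/}               # index_YYYY_MM_DD.html
        stamp=${name#index_}
        stamp=${stamp%.html}        # YYYY_MM_DD
        printf '%s\n' "$stamp" | sed 's/_/-/g'
    done
}
```

For example, `list_versions web_archive en.wikipedia.org wiki/Website` prints one date per stored version of that page.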
## Configuration

You can configure the application with a config file. See [config.toml](config.toml) for details.
## Web Server

You can start a web server that serves an archive with `webarc serve`.

Archived pages can be viewed at `/s/<domain>/<path..>`. For example, `/s/en.wikipedia.org/wiki/Website` serves the archived copy of `/wiki/Website` from `en.wikipedia.org`.

To view the archive from a specific date, add a `?time=YYYY-MM-DD` query parameter to the URL.
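The lookup behind these URLs can be sketched in shell as well. `resolve_request` below is hypothetical and not taken from webarc's source: it maps a `/s/<domain>/<path..>` request onto the archive layout, assumes `?time=` must match a version's date exactly, and falls back to the newest version when no `?time=` is given. Whether `webarc serve` matches dates exactly or picks the nearest version is an open assumption here.

```sh
#!/bin/sh
# Hypothetical sketch, not webarc's implementation: map a request such as
#   /s/en.wikipedia.org/wiki/Website?time=2024-01-31
# onto a file in the archive layout. Assumptions: ?time= must match a
# version's date exactly; without ?time=, the newest version is used.
# Prints nothing when no matching file exists.
resolve_request() {
    archive_dir=$1
    request=$2

    target=${request#/s/}           # DOMAIN/PATH[?time=...]
    case $target in
        *'?time='*)
            when=${target##*\?time=}    # text after "?time="
            when=${when%%&*}            # ignore any later parameters
            ;;
        *)
            when=
            ;;
    esac
    target=${target%%\?*}           # drop the query string
    target=${target%/}              # drop one trailing slash
    dir=$archive_dir/$target

    if [ -n "$when" ]; then
        # YYYY-MM-DD in the URL, YYYY_MM_DD in the file name.
        file=$dir/index_$(printf '%s' "$when" | sed 's/-/_/g').html
        if [ -f "$file" ]; then printf '%s\n' "$file"; fi
    else
        newest=
        for f in "$dir"/index_*.html; do
            # Sorted glob order is chronological, so the last match wins.
            if [ -e "$f" ]; then newest=$f; fi
        done
        if [ -n "$newest" ]; then printf '%s\n' "$newest"; fi
    fi
}
```

The newest-version fallback relies only on the zero-padded `YYYY_MM_DD` file names, which make the shell's sorted glob order match chronological order.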