JMARyA 8df8edeeca
2025-01-02 19:00:47 +01:00


# WebArc
`webarc` is a local website archive based on [monolith](https://github.com/Y2Z/monolith).
## Archive Format
A web archive is a directory with one subdirectory per domain (subdomains get their own directory), structured like this:
```
web_archive/
├─ domain.com/
│ ├─ sub/
│ │ ├─ path/
│ │ │ ├─ index_YYYY_MM_DD.html
├─ sub.domain.com/
```
Each archived document is stored at `<archive>/<domain>/<path...>/index_YYYY_MM_DD.html`. There is one file per archived version, and the versions are distinguished by the date in the file name.
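To make the layout concrete, here is a minimal POSIX shell sketch that computes where a snapshot of a URL would be stored. `archive_path` is a hypothetical helper, not part of `webarc`. It assumes that the scheme, query string, fragment, and any trailing slash are dropped, and that the site root is stored directly in the domain directory. `webarc` itself may handle these cases differently.

```sh
# Hypothetical helper (not part of webarc): print the file that a snapshot of
# URL taken on DAY (YYYY-MM-DD) would occupy inside ARCHIVE.
# Assumptions: the scheme, query string, fragment and any trailing slash are
# dropped, and the site root is stored directly in the domain directory.
archive_path() {
  archive=$1 url=$2 day=$3
  rest=${url#*://}          # strip the scheme, e.g. "https://"
  rest=${rest%%[?#]*}       # strip "?query" and "#fragment"
  domain=${rest%%/*}        # host part, e.g. "en.wikipedia.org"
  case $rest in
    */*) path=${rest#*/} ;; # everything after the host
    *) path= ;;             # bare host: the site root
  esac
  path=${path%/}            # "a/b/" and "a/b" share a directory
  # File names use YYYY_MM_DD, so swap the dashes for underscores.
  stamp=$(printf '%s\n' "$day" | sed 's/-/_/g')
  if [ -n "$path" ]; then
    printf '%s/%s/%s/index_%s.html\n' "$archive" "$domain" "$path" "$stamp"
  else
    printf '%s/%s/index_%s.html\n' "$archive" "$domain" "$stamp"
  fi
}
```

For example, `archive_path web_archive https://en.wikipedia.org/wiki/Website 2025-01-02` prints `web_archive/en.wikipedia.org/wiki/Website/index_2025_01_02.html`.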
## Usage
`webarc` provides a CLI for working with this archive structure:
```sh
# List domains in archive
webarc [--dir ARCHIVE] archive list [-j, --json]
# List all paths on a domain
webarc [--dir ARCHIVE] archive list [-j, --json] [DOMAIN]
# List all versions of a document
webarc [--dir ARCHIVE] archive versions [-j, --json] [DOMAIN] [PATH]
# Get a document
# `--md` will return a markdown version
webarc [--dir ARCHIVE] archive get [--md] [DOMAIN] [PATH] [VERSION]
# Archive a website
webarc [--dir ARCHIVE] archive download [URL]
```
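Because versions are plain files, they can also be inspected without the CLI. The sketch below is a hypothetical `list_versions` function (not part of `webarc`; `webarc archive versions` is the supported interface). It globs the `index_*.html` files in a document's directory and prints their dates as `YYYY-MM-DD`, assuming the layout described under Archive Format.

```sh
# Hypothetical helper (not part of webarc): print the snapshot dates stored
# for PATH on DOMAIN inside ARCHIVE, one YYYY-MM-DD per line, oldest first.
list_versions() {
  dir=$1/$2
  if [ -n "$3" ]; then
    dir=$dir/${3#/}         # accept "wiki/Website" or "/wiki/Website"
  fi
  dir=${dir%/}
  # Glob results are sorted, and YYYY_MM_DD sorts chronologically.
  for f in "$dir"/index_*.html; do
    [ -e "$f" ] || continue # no match: the pattern is left unexpanded
    name=${f##*/}           # index_YYYY_MM_DD.html
    stamp=${name#index_}
    stamp=${stamp%.html}    # YYYY_MM_DD
    printf '%s\n' "$stamp" | sed 's/_/-/g'
  done
}
```

Since `YYYY_MM_DD` sorts lexicographically in date order, the glob already yields the versions oldest first, so no extra sorting step is needed.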
## Configuration
The application can be configured through a config file; see [config.toml](config.toml) for details.
## Web Server
Run `webarc serve` to start a web server that serves an archive.
Archived pages are available at `/s/<domain>/<path...>`.
For example, `/s/en.wikipedia.org/wiki/Website` serves the archived copy of `en.wikipedia.org/wiki/Website`.
To view a page as archived at a certain time, add the `?time=YYYY-MM-DD` query parameter to the URL.
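Server URLs follow directly from a page's domain and path. The hypothetical `served_url` function below (not part of `webarc`) builds them. The base address `http://localhost:8080` used in the example is only a placeholder, because the actual host and port depend on how the server is run and configured.

```sh
# Hypothetical helper (not part of webarc): print the URL at which
# `webarc serve` exposes DOMAIN/PATH. BASE is wherever your server listens
# (an assumption, not a webarc default); DAY (YYYY-MM-DD) is optional and
# becomes the ?time= query parameter.
served_url() {
  base=${1%/} domain=$2 path=${3#/} day=$4
  url="$base/s/$domain/$path"
  if [ -n "$day" ]; then
    url="$url?time=$day"
  fi
  printf '%s\n' "$url"
}
```

For example, `served_url http://localhost:8080 en.wikipedia.org /wiki/Website 2025-01-02` prints `http://localhost:8080/s/en.wikipedia.org/wiki/Website?time=2025-01-02`, which can be opened in a browser or fetched with `curl`.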