# WebArc
`webarc` is a local website archive based on [monolith](https://github.com/Y2Z/monolith).
## Archive Format
A web archive is a directory containing one subdirectory per domain, structured as follows:
```
web_archive/
├─ domain.com/
│ ├─ sub/
│ │ ├─ path/
│ │ │ ├─ index_YYYY_MM_DD.html
├─ sub.domain.com/
```
Every document in the archive can then be found at `web_archive/<domain>/<path...>/index_YYYY_MM_DD.html`.
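
To make the layout concrete, here is a minimal Python sketch that maps a domain, path, and date to a file in the archive and lists the versions stored for a document. The function names `document_path` and `list_versions` are hypothetical and not part of `webarc`; the sketch assumes that URL path segments map one-to-one onto directories, as the tree above suggests.

```python
from datetime import date
from pathlib import Path
from typing import List


def document_path(archive_root: str, domain: str, url_path: str, version: date) -> Path:
    """Build the on-disk location of one archived version (hypothetical helper)."""
    # Drop empty segments so "/wiki/Website/" and "wiki/Website" behave the same.
    segments = [s for s in url_path.split("/") if s]
    filename = f"index_{version:%Y_%m_%d}.html"
    return Path(archive_root, domain, *segments, filename)


def list_versions(archive_root: str, domain: str, url_path: str) -> List[date]:
    """Return the dates of all archived versions of a document, oldest first."""
    segments = [s for s in url_path.split("/") if s]
    directory = Path(archive_root, domain, *segments)
    versions = []
    for file in directory.glob("index_*.html"):
        # "index_2024_12_29" -> ["2024", "12", "29"]
        parts = file.stem.split("_")[1:]
        try:
            year, month, day = (int(p) for p in parts)
            versions.append(date(year, month, day))
        except ValueError:
            continue  # skip files that do not match the naming scheme
    return sorted(versions)
```

For example, `document_path("web_archive", "en.wikipedia.org", "/wiki/Website", date(2024, 12, 29))` points at `web_archive/en.wikipedia.org/wiki/Website/index_2024_12_29.html`.
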
## Usage
`webarc` provides a CLI for working with the archive structure:
```sh
# List domains in archive
webarc [--dir ARCHIVE] archive list [-j, --json]
# List all paths on a domain
webarc [--dir ARCHIVE] archive list [-j, --json] [DOMAIN]
# List all versions of a document
webarc [--dir ARCHIVE] archive versions [-j, --json] [DOMAIN] [PATH]
# Get a document
# `--md` will return a markdown version
webarc [--dir ARCHIVE] archive get [--md] [DOMAIN] [PATH] [VERSION]
# Archive a website
webarc [--dir ARCHIVE] archive download [URL]
```
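
If you want to drive the CLI from a script, the commands above can be assembled programmatically. The following is a rough Python sketch: `webarc_command` and `run_webarc` are hypothetical helper names, the sketch assumes `webarc` is on your `PATH`, and it makes no assumption about the shape of the `--json` output, which is returned as raw text.

```python
import subprocess
from typing import List, Optional


def webarc_command(*args: str, archive_dir: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a `webarc archive ...` call (hypothetical helper)."""
    cmd = ["webarc"]
    if archive_dir is not None:
        # `--dir` precedes the subcommand, as in the usage lines above.
        cmd += ["--dir", archive_dir]
    cmd += ["archive", *args]
    return cmd


def run_webarc(*args: str, archive_dir: Optional[str] = None) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        webarc_command(*args, archive_dir=archive_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For instance, `run_webarc("list", "--json", archive_dir="web_archive")` would run `webarc --dir web_archive archive list --json`.
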
## Configuration
You can configure the application using a config file. Look at the [config.toml](config.toml) file for more information.
## Web Server
You can start a web server that serves an archive with `webarc serve`.
Archived pages can be viewed at `/s/<domain>/<path..>`.
For example, `/s/en.wikipedia.org/wiki/Website` serves the archived copy of `/wiki/Website` on `en.wikipedia.org`.
To select the archive from a certain date, add the `?time=YYYY-MM-DD` query parameter to the URL.
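
The URL scheme above can be sketched in Python as follows. The helper name `archive_url` is hypothetical, and the base URL `http://localhost:8080` used in the example is only an assumption; use whatever address your `webarc serve` instance actually listens on.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def archive_url(base_url: str, domain: str, url_path: str, time: Optional[str] = None) -> str:
    """Build a viewer URL of the form /s/<domain>/<path..>[?time=YYYY-MM-DD]."""
    # Percent-encode the path but keep "/" as the segment separator.
    path = quote(url_path.lstrip("/"))
    url = f"{base_url.rstrip('/')}/s/{domain}/{path}"
    if time is not None:
        url += "?" + urlencode({"time": time})
    return url


# Base URL is an assumed example, not a documented default.
print(archive_url("http://localhost:8080", "en.wikipedia.org", "/wiki/Website", "2024-12-29"))
```
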