parent
0f6e5f5b10
commit
8df8edeeca
15 changed files with 591 additions and 124 deletions
43
README.md
@@ -1,14 +1,47 @@
# WebArc

`webarc` is a local website archive based on [monolith](https://github.com/Y2Z/monolith).

## Configuration

You can configure the application using environment variables:

- `$ROUTE_INTERNAL` : Rewrite links to point back to the archive itself
- `$DOWNLOAD_ON_DEMAND` : Download missing routes with monolith on demand
- `$BLACKLIST_DOMAINS` : Blacklisted domains (comma-separated regexes, example: `google.com,.*.youtube.com`)

## Archive Format

A web archive is defined as a directory containing domains in this structure:

```
web_archive/
├─ domain.com/
│  ├─ sub/
│  │  ├─ path/
│  │  │  ├─ index_YYYY_MM_DD.html
├─ sub.domain.com/
```

Every document of this web archive can then be found at `archive/domain/paths.../index_YYYY_MM_DD.html`.
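
The layout above can be sketched as a small path-building helper. This is an illustration only, not code from `webarc`: the function name `document_path` is hypothetical, and it assumes that `YYYY_MM_DD` is the zero-padded capture date.

```python
from datetime import date
from pathlib import PurePosixPath


def document_path(archive: str, domain: str, path: str, version: date) -> PurePosixPath:
    """Build the location of one archived version of a page (hypothetical helper)."""
    # Drop empty segments so "/sub/path/" and "sub/path" map to the same directory.
    segments = [s for s in path.split("/") if s]
    # Assumption: the YYYY_MM_DD part of the file name is the zero-padded capture date.
    filename = f"index_{version:%Y_%m_%d}.html"
    return PurePosixPath(archive, domain, *segments, filename)


print(document_path("web_archive", "domain.com", "/sub/path/", date(2024, 1, 31)))
# prints: web_archive/domain.com/sub/path/index_2024_01_31.html
```

Because each version is its own dated file, several captures of the same page can sit side by side in one directory.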

## Usage

webarc provides a CLI tool to work with the archive structure.

```sh
# List domains in archive
webarc [--dir ARCHIVE] archive list [-j, --json]

# List all paths on a domain
webarc [--dir ARCHIVE] archive list [-j, --json] [DOMAIN]

# List all versions of a document
webarc [--dir ARCHIVE] archive versions [-j, --json] [DOMAIN] [PATH]

# Get a document
# `--md` will return a markdown version
webarc [--dir ARCHIVE] archive get [--md] [DOMAIN] [PATH] [VERSION]

# Archive a website
webarc [--dir ARCHIVE] archive download [URL]
```

## Configuration

You can configure the application using a config file. Look at the [config.toml](config.toml) file for more information.

## Web Server

You can start a web server that serves an archive with `webarc serve`.

Archived pages can be viewed at `/s/<domain>/<path..>`.
For example, `/s/en.wikipedia.org/wiki/Website` serves the archived copy of `/wiki/Website` from `en.wikipedia.org`.
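
To make the route shape concrete, here is a minimal sketch of how a URL of the form `/s/<domain>/<path..>` could be split into its domain and page path. It is not taken from the `webarc` server; the function name `split_archive_route` and the choice to map a bare domain to `/` are assumptions made for illustration.

```python
from typing import Optional, Tuple


def split_archive_route(url_path: str) -> Optional[Tuple[str, str]]:
    """Split "/s/<domain>/<path..>" into (domain, path); hypothetical helper."""
    prefix = "/s/"
    if not url_path.startswith(prefix):
        return None  # not an archive route
    # The first segment after the prefix is the domain; the rest is the page path.
    domain, _, rest = url_path[len(prefix):].partition("/")
    if not domain:
        return None
    # Assumption: a request with no page path refers to the domain root.
    return domain, "/" + rest


print(split_archive_route("/s/en.wikipedia.org/wiki/Website"))
# prints: ('en.wikipedia.org', '/wiki/Website')
```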