📜 Website Archive

WebArc

webarc is a local website archive based on monolith.

Configuration

You can configure the application using environment variables:

  • $ROUTE_INTERNAL : Rewrite links to point back to the archive itself
  • $DOWNLOAD_ON_DEMAND : Download missing routes with monolith on demand
  • $BLACKLIST_DOMAINS : Blacklisted domains (comma-separated regexes, example: google.com,.*.youtube.com)
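
As a rough illustration of how these variables could be read at startup, here is a minimal sketch using only the Rust standard library. It is not taken from the webarc source: the `Settings` struct, the helper names, and the rule that `true` or `1` enables a flag are all assumptions made for the example. The blacklist entries are kept as raw pattern strings; compiling them into regexes is left out.

```rust
use std::env;

/// Hypothetical settings struct; the field names simply mirror the variables above.
#[derive(Debug, PartialEq)]
struct Settings {
    route_internal: bool,
    download_on_demand: bool,
    blacklist_domains: Vec<String>,
}

/// Assumption: "true" or "1" enables a flag; anything else (or unset) disables it.
fn parse_flag(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("true") | Some("1"))
}

/// Splits the comma-separated list into individual patterns, dropping empty entries.
fn parse_blacklist(value: Option<&str>) -> Vec<String> {
    value
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|pattern| !pattern.is_empty())
        .map(String::from)
        .collect()
}

/// Reads the three variables from the process environment.
fn load_settings() -> Settings {
    let var = |name: &str| env::var(name).ok();
    Settings {
        route_internal: parse_flag(var("ROUTE_INTERNAL").as_deref()),
        download_on_demand: parse_flag(var("DOWNLOAD_ON_DEMAND").as_deref()),
        blacklist_domains: parse_blacklist(var("BLACKLIST_DOMAINS").as_deref()),
    }
}
```

With `BLACKLIST_DOMAINS=google.com,.*.youtube.com` set, `load_settings().blacklist_domains` would hold the two patterns `google.com` and `.*.youtube.com`.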

Usage

Archived pages can be viewed at /s/<domain>/<path..>.
For example, /s/en.wikipedia.org/wiki/Website serves the archived copy of /wiki/Website from en.wikipedia.org.

To select an archive from a certain date, add the ?time=YYYY-MM-DD parameter to the URL.
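
The URL scheme above can be sketched as a small helper that turns an original URL into its archive path. This is an illustrative sketch, not code from webarc: the function name `archive_path` is hypothetical, and it assumes the original URL carries no query string of its own.

```rust
/// Hypothetical helper: maps an original URL to its archive path,
/// optionally pinned to a date in YYYY-MM-DD form.
/// Returns None when the URL has no host part.
fn archive_path(url: &str, date: Option<&str>) -> Option<String> {
    // Drop the scheme if one is present.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
        .unwrap_or(url);

    // Split the domain from the path at the first '/'.
    let (domain, path) = match rest.split_once('/') {
        Some((domain, path)) => (domain, path),
        None => (rest, ""),
    };
    if domain.is_empty() {
        return None;
    }

    // Build /s/<domain>/<path..> and append ?time=... when a date is given.
    let mut out = format!("/s/{}/{}", domain, path);
    if let Some(date) = date {
        out.push_str("?time=");
        out.push_str(date);
    }
    Some(out)
}
```

Calling `archive_path("https://en.wikipedia.org/wiki/Website", Some("2024-12-31"))` yields `/s/en.wikipedia.org/wiki/Website?time=2024-12-31`; passing `None` as the date gives the plain archive path.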