webarc v2 #1

Merged
jmarya merged 33 commits from v2 into main 2025-11-17 18:09:59 +00:00

WebArc v2

Complete Architectural Overhaul

WebArc no longer stores pre-rendered Monolith HTML files.
Instead, v2 introduces a blob-store–backed HTTP capture system, storing the actual request/response data.

This enables:

  • Full-fidelity archival of HTTP content
  • Support for non-HTML assets and arbitrary file types
  • Future compatibility with tools beyond browsers
  • Deduplication and efficient binary storage

The archive format has been redesigned around normalized HTTP blobs and metadata, allowing precise, reproducible replay.
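To illustrate the deduplication idea above, here is a minimal sketch of a content-addressed blob store. It is not WebArc's actual storage layer: `BlobStore`, `BlobId`, `put`, `get`, and `blob_count` are hypothetical names, the store lives in memory, and `DefaultHasher` stands in for the cryptographic digest a real store would likely use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type: a digest of the blob's bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlobId(u64);

/// Illustrative in-memory blob store (an assumption, not WebArc's real API).
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<BlobId, Vec<u8>>,
}

impl BlobStore {
    /// Stores `data` and returns its id. Identical bytes always map to the
    /// same id, so a body that is captured repeatedly is kept only once.
    pub fn put(&mut self, data: &[u8]) -> BlobId {
        // DefaultHasher keeps the sketch dependency-free; a real store would
        // use a cryptographic digest so that collisions are negligible.
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let id = BlobId(hasher.finish());
        self.blobs.entry(id).or_insert_with(|| data.to_vec());
        id
    }

    /// Looks up a previously stored blob by id.
    pub fn get(&self, id: BlobId) -> Option<&[u8]> {
        self.blobs.get(&id).map(Vec::as_slice)
    }

    /// Number of distinct blobs currently stored.
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}
```

Request and response metadata would then reference a `BlobId` instead of embedding the body, which is what lets many captures share one stored copy.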

Transparent HTTP Proxy

WebArc v2 adds a built-in transparent HTTP proxy that allows:

  • Browsing the archive directly in a normal web browser
  • Seamless time-based replay of archived HTTP traffic
  • Automatic archival of any visited resource (“browse to archive”)
  • Mirroring of non-HTML repositories such as:
    • Nix caches
    • Tarballs
    • Source code archives
    • Package repositories
    • API responses
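Time-based replay, as described above, amounts to picking the newest capture of a URL that is not later than the requested point in time. The sketch below shows one way to express that lookup. `CaptureIndex`, `record`, and `replay` are hypothetical names, timestamps are plain `u64` seconds, and bodies are stored inline for brevity. None of this is taken from the WebArc code.

```rust
use std::collections::BTreeMap;

/// Hypothetical index of captures: URL -> (capture time -> response body).
#[derive(Default)]
pub struct CaptureIndex {
    by_url: BTreeMap<String, BTreeMap<u64, Vec<u8>>>,
}

impl CaptureIndex {
    /// Records one capture of `url` taken at `captured_at`.
    pub fn record(&mut self, url: &str, captured_at: u64, body: &[u8]) {
        self.by_url
            .entry(url.to_string())
            .or_default()
            .insert(captured_at, body.to_vec());
    }

    /// Returns the newest capture taken at or before `at`, if there is one.
    pub fn replay(&self, url: &str, at: u64) -> Option<(u64, &[u8])> {
        self.by_url
            .get(url)?
            // Restrict to captures with timestamp <= `at`, then take the last.
            .range(..=at)
            .next_back()
            .map(|(ts, body)| (*ts, body.as_slice()))
    }
}
```

With captures at times 100 and 200, a replay at time 150 yields the capture from 100, and a replay before the first capture yields nothing. In a "browse to archive" setup, that empty result is the point where a proxy could fall back to fetching and archiving the live resource.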

Web UI Overhaul

The Web UI has been rewritten to support the new underlying archive model:

  • Browsing requests & responses
  • Viewing headers, metadata, and body variants
  • Timeline navigation for captured resources
  • More intuitive domain/path exploration
  • A UI built around structured HTTP data rather than monolithic documents

SQLite Migration

WebArc v2 uses SQLite as its only database backend.

Benefits:

  • Zero external dependencies
  • Easier deployment
  • Portable, file-based archives
  • Excellent performance for the new blob-store model
- Removed AI-related code, including embeddings and vector search.
- Eliminated static Postgres connection pool and its helper macro.
- Deleted fragment handling modules and related routes.
- Transferred document indexing to SQLite.

Pending TODOs

Abstract

  • [x] Rework UI for new architecture
  • [x] Working UI
  • [x] Check deployment files (Dockerfile, flake.nix, etc.) for required changes
  • [x] Update or add documentation and README
  • [x] Rework CLI for new architecture
  • [x] Implement changes to config for new architecture

Features

  • [x] webarc import command for importing arbitrary data into the request archive
  • [x] webarc migrate command for migrating previous Monolith directory structures
- Implemented enable_fetch boolean in Config.
- Implemented alter_config to mutate the global config at early runtime.
- Updated WebsiteArchive::archive_url to respect enable_fetch and return a new FetchError::Disabled when disabled.
- Updated main.rs to enable fetching during CLI downloads via alter_config.
- Refactored proxy code.
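The interplay of enable_fetch, alter_config, and FetchError::Disabled described in the commit notes above could look roughly like the following sketch. Only the names `Config`, `enable_fetch`, `alter_config`, `WebsiteArchive::archive_url`, and `FetchError::Disabled` come from the notes. The signatures, the `global_config` helper, the `OnceLock`/`RwLock` storage, and the placeholder return value are assumptions made for illustration, and the real code may differ.

```rust
use std::sync::{OnceLock, RwLock};

/// Assumed shape of the config; only `enable_fetch` is named in the notes.
#[derive(Debug, Clone, Default)]
pub struct Config {
    /// When false, the archive never reaches out to the network.
    pub enable_fetch: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    /// Returned when fetching is turned off in the config.
    Disabled,
}

/// Hypothetical accessor for the process-wide config.
fn global_config() -> &'static RwLock<Config> {
    static CONFIG: OnceLock<RwLock<Config>> = OnceLock::new();
    CONFIG.get_or_init(|| RwLock::new(Config::default()))
}

/// Mutates the global config; intended to be called early at startup.
/// The closure-based signature is an assumption.
pub fn alter_config(mutate: impl FnOnce(&mut Config)) {
    let mut cfg = global_config().write().expect("config lock poisoned");
    mutate(&mut *cfg);
}

pub struct WebsiteArchive;

impl WebsiteArchive {
    /// Refuses to fetch unless `enable_fetch` is set. The `String` result is
    /// a stand-in for whatever the real download produces.
    pub fn archive_url(&self, url: &str) -> Result<String, FetchError> {
        let enabled = global_config()
            .read()
            .expect("config lock poisoned")
            .enable_fetch;
        if !enabled {
            return Err(FetchError::Disabled);
        }
        // Placeholder: no network access happens in this sketch.
        Ok(format!("fetched {url}"))
    }
}
```

Under these assumptions, a CLI download path would call `alter_config(|c| c.enable_fetch = true)` before invoking `archive_url`, while other callers keep getting `FetchError::Disabled` as long as the flag stays off.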
jmarya changed title from WIP: webarc v2 to webarc v2 2025-11-17 18:09:46 +00:00
jmarya merged commit fac2844568 into main 2025-11-17 18:09:59 +00:00
Reference
jmarya/webarc!1