knowledge/technology/applications/cli/htmlq.md
2024-04-26 08:11:28 +02:00

| obj | repo | rev |
| --- | --- | --- |
| application | https://github.com/mgdm/htmlq | 2024-04-25 |

htmlq

Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.

Usage

Usage: htmlq [FLAGS] [OPTIONS] [--] [selector]...
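
As a minimal sketch of the synopsis above: the snippet writes a small sample document and selects one element from it, first via stdin and then via `--filename`. The file name `sample.html` and its contents are made up for illustration, and the `command -v` guard only keeps the snippet harmless on machines where htmlq is not installed.

```shell
# Create a small sample document (hypothetical content, for illustration only).
cat > sample.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1 id="top">Hello</h1>
    <p class="intro">See the <a href="/docs">docs</a>.</p>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Input defaults to stdin; the argument is a CSS selector.
  htmlq 'h1#top' < sample.html

  # Same selection, reading the file with --filename instead of stdin.
  htmlq --filename sample.html 'p.intro'
fi
```

Quoting the selector keeps the shell from interpreting characters such as `#`, `>` or `*`.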

Options

| Option | Description |
| --- | --- |
| `-B, --detect-base` | Try to detect the base URL from the `<base>` tag in the document. If not found, default to the value of `--base`, if supplied |
| `-w, --ignore-whitespace` | When printing text nodes, ignore those that consist entirely of whitespace |
| `-p, --pretty` | Pretty-print the serialised output |
| `-t, --text` | Output only the contents of text nodes inside selected elements |
| `-a, --attribute <attribute>` | Only return this attribute (if present) from selected elements |
| `-b, --base <base>` | Use this URL as the base for links |
| `-f, --filename <FILE>` | The input file. Defaults to stdin |
| `-o, --output <FILE>` | The output file. Defaults to stdout |
| `-r, --remove-nodes <SELECTOR>...` | Remove nodes matching this expression before output. May be specified multiple times |
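
The options listed above can be combined roughly as follows. This is a sketch rather than a reference: `links.html`, its contents, and the `https://example.com/` base URL are placeholders invented for the example, and the exact output formatting is not shown because it depends on the htmlq version.

```shell
# Hypothetical input document used only for this example.
cat > links.html <<'EOF'
<!DOCTYPE html>
<html>
  <body>
    <nav><a href="/home">Home</a></nav>
    <main>
      <p>Read the <a href="guide.html">guide</a>.</p>
      <script>console.log("noise")</script>
    </main>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # -a/--attribute: return only the href attribute of each matched <a>.
  htmlq --filename links.html --attribute href a

  # -b/--base: supply a base URL for links (placeholder URL).
  htmlq --filename links.html --base https://example.com/ --attribute href a

  # -t/--text with -w/--ignore-whitespace: text nodes only,
  # skipping those that consist entirely of whitespace.
  htmlq --filename links.html --text --ignore-whitespace main

  # -r/--remove-nodes accepts several values, so the selector is given first
  # here to avoid it being read as another node to remove.
  htmlq --filename links.html --pretty main --remove-nodes script

  # -o/--output: write the result to a file instead of stdout.
  htmlq --filename links.html --output main.html main
fi
```

The `[--]` in the usage line offers another way to separate a multi-value option such as `--remove-nodes` from the selector that follows it.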