---
obj: application
repo: https://github.com/phiresky/ripgrep-all
---
# ripgrep-all
rga is a line-oriented search tool that allows you to look for a [regex](../../tools/Regex.md) in a multitude of file types. rga wraps the awesome [ripgrep](ripgrep.md) and enables it to search in [pdf](../../files/PDF.md), docx, [sqlite](../../dev/programming/SQLite.md), jpg, movie subtitles ([mkv](../../files/media/Matroska.md), mp4), etc.
## USAGE:
> rga \[RGA OPTIONS\] \[RG OPTIONS\] PATTERN \[PATH \...\]
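
A minimal sketch of the usage pattern above. The file names and search terms are hypothetical, and `-i` and `-l` are ordinary ripgrep options passed through as RG OPTIONS. The rga calls are guarded so the snippet does nothing harmful on a machine without rga.

```shell
# Throwaway corpus in a temp directory (hypothetical file names).
workdir="$(mktemp -d)"
printf 'Invoice 2024-001 for ACME\n' > "$workdir/plain.txt"
printf 'invoice 2024-002 for Initech\n' | gzip > "$workdir/archived.txt.gz"

# Only call rga when it is installed, so the snippet is safe to paste anywhere.
if command -v rga >/dev/null 2>&1; then
  # rga PATTERN PATH: searches plain files and files handled by an adapter.
  rga 'Invoice' "$workdir"
  # RG OPTIONS are handed to ripgrep: -i ignores case, -l lists matching files only.
  rga -i -l 'invoice' "$workdir"
fi
```

The `.gz` file is included because its extension is listed for the decompress adapter further down.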
## FLAGS:
| Option | Description |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --rga-accurate | Use more accurate but slower matching by [mime](../../files/MIME.md) type<br>By default, rga will match files using file extensions. Some programs,<br>such as sqlite3, don't care about the file extension at all, so users<br>sometimes use any or no extension at all. With this flag, rga will try<br>to detect the [mime](../../files/MIME.md) type of input files using the magic bytes (similar<br>to the [`file`](system/file.md) utility), and use that to choose the adapter. Detection is<br>only done on the first 8KiB of the file, since we can't always seek on the input (in archives). |
| --rga-no-cache | Disable caching of results<br>By default, rga caches the extracted text, if it is small enough, to a<br>database in ${XDG_CACHE_DIR-~/.cache}/ripgrep-all on Linux,<br>~/Library/Caches/ripgrep-all on macOS, or C:\\Users\\username\\AppData\\Local\\ripgrep-all on Windows. This way,<br>repeated searches on the same set of files will be much faster. If you<br>pass this flag, all caching will be disabled. |
| --rga-list-adapters | List all known adapters |
| --rga-print-config-schema | Print the [JSON Schema](../../tools/JSON%20Schema.md) of the configuration file |
| --rg-help | Show help for [ripgrep](ripgrep.md) itself |
| --rg-version | Show version of [ripgrep](ripgrep.md) itself |
| -V, --version | Prints version information |
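The following sketch illustrates why `--rga-accurate` exists: a gzip file saved without an extension cannot be recognised by name, but its leading magic bytes still identify it. The file name `backup` and the search term are hypothetical, and what rga does with the decompressed content depends on the adapters available, so no output is claimed for the rga calls.

```shell
workdir="$(mktemp -d)"
# A gzip file saved without any extension (hypothetical name "backup").
printf 'needle in a compressed haystack\n' | gzip > "$workdir/backup"

# gzip data starts with the bytes 1f 8b; content-based detection relies on
# signatures like this instead of the file name.
magic="$(head -c 2 "$workdir/backup" | od -An -tx1 | tr -d ' \n')"
echo "magic bytes: $magic"

if command -v rga >/dev/null 2>&1; then
  # Choose the adapter by MIME type (magic bytes) rather than by extension.
  rga --rga-accurate 'needle' "$workdir"
  # Same search, but without reading from or writing to the cache.
  rga --rga-accurate --rga-no-cache 'needle' "$workdir"
fi
```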
## OPTIONS:
| Option | Description |
| ----------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --rga-adapters=\<adapters\>... | Change which adapters to use and in which priority order (descending)<br>"foo,bar" means use only adapters foo and bar. "-bar,baz" means use all default adapters except for bar and baz. "+bar,baz" means use all default adapters and also bar and baz. |
| --rga-cache-compression-level=\<compression-level\> | ZSTD compression level to apply to adapter outputs before storing in<br>cache db<br>Ranges from 1 - 22 \[default: 12\] |
| --rga-config-file=\<config-file-path\> | Path to the configuration file |
| --rga-max-archive-recursion=\<max-archive-recursion\> | Maximum nesting depth of archives to recurse into \[default: 5\] |
| --rga-cache-max-blob-len=\<max-blob-len\> | Max compressed size to cache<br>Longest byte length (after compression) to store in cache. Longer<br>adapter outputs will not be cached and will be recomputed every time.<br>Allowed suffixes on command line: k M G \[default: 2000000\] |
| --rga-cache-path=\<path\> | Path to store cache db \[default: /home/user/.cache/ripgrep-all\] |
| -h | Shows a concise overview |
| --help | Shows more detail and advanced options |
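As a rough illustration of the options above, the sketch below builds a nested tar archive and runs a few searches against it. All paths and the search term are hypothetical, the values chosen for the cache options are arbitrary examples, and the rga calls only run when rga is installed.

```shell
src="$(mktemp -d)"
corpus="$(mktemp -d)"
cache_dir="$(mktemp -d)"

# outer.tar contains inner.tar, which contains note.txt (two levels of nesting).
printf 'nested needle\n' > "$src/note.txt"
tar -cf "$src/inner.tar" -C "$src" note.txt
tar -cf "$corpus/outer.tar" -C "$src" inner.tar

if command -v rga >/dev/null 2>&1; then
  # "foo,bar" form: use only the listed adapter.
  rga --rga-adapters=tar 'needle' "$corpus"
  # "-bar,baz" form: all default adapters except ffmpeg and sqlite.
  rga --rga-adapters=-ffmpeg,sqlite 'needle' "$corpus"
  # Limit how deep rga descends into nested archives.
  rga --rga-max-archive-recursion=1 'needle' "$corpus"
  # Separate cache location, larger cache entries, stronger compression.
  rga --rga-cache-path="$cache_dir" \
      --rga-cache-max-blob-len=10M \
      --rga-cache-compression-level=19 \
      'needle' "$corpus"
fi
```

Keeping the cache in its own directory, outside the searched path, avoids rga scanning its own cache database.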
## Available Adapters
rga works with _adapters_ that adapt various [file formats](../../files/File%20Formats.md). It comes with a few adapters integrated:
```
rga --rga-list-adapters
```
Adapters:
- **pandoc**
Uses pandoc to convert binary/unreadable text documents to plain [markdown](../../files/Markdown.md)-like text
Runs: pandoc --from= --to=plain --wrap=none --markdown-headings=atx
Extensions: .epub, .odt, .docx, .fb2, .ipynb, .html, .htm
- **poppler**
Uses pdftotext (from poppler-utils) to extract plain text from [PDF](../../files/PDF.md) files
Runs: pdftotext - -
Extensions: .pdf
Mime Types: application/pdf
- **postprocpagebreaks**
Adds the page number to each line for an input file that specifies page breaks as ascii page break character.
Mainly to be used internally by the poppler adapter.
Extensions: .asciipagebreaks
- **ffmpeg**
Uses [ffmpeg](../media/ffmpeg.md) to extract video metadata/chapters, subtitles, lyrics, and other metadata
Extensions: .mkv, .mp4, .avi, .mp3, .ogg, .flac, .webm
- **zip**
Reads a [zip](../../files/ZIP.md) file as a stream and recurses down into its contents
Extensions: .zip, .jar
Mime Types: application/zip
- **decompress**
Reads compressed file as a stream and runs a different extractor on the contents.
Extensions: .als, .bz2, .gz, .tbz, .tbz2, .tgz, .xz, .zst
Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd
- **tar**
Reads a [tar](compression/tar.md) file as a stream and recurses down into its contents
Extensions: .tar
- **sqlite**
Uses [sqlite](../../dev/programming/SQLite.md) bindings to convert [sqlite](../../dev/programming/SQLite.md) databases into a simple plain text format
Extensions: .db, .db3, .sqlite, .sqlite3
Mime Types: application/x-sqlite3
The following adapters are disabled by default and can be enabled using `--rga-adapters=+foo,bar`:
- **mail**
Reads mailbox/mail files and runs extractors on the contents and attachments.
Extensions: .mbox, .mbx, .eml
Mime Types: application/mbox, message/rfc822
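
A short sketch of enabling the mail adapter with the `+` form. The message below is a hypothetical, hand-written `.eml` file; the adapter is mainly useful for mail with attachments, which this toy example does not include.

```shell
workdir="$(mktemp -d)"
# Minimal hand-written message (hypothetical addresses and content).
cat > "$workdir/hello.eml" <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday

See you at noon.
EOF

if command -v rga >/dev/null 2>&1; then
  # "+mail" keeps all default adapters and additionally enables mail.
  rga --rga-adapters=+mail 'noon' "$workdir"
fi
```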
## Configuration
In addition to the command-line flags, you can configure rga via a config file, located at `~/.config/ripgrep-all/config.jsonc` or your OS equivalent. The file uses the JSON-with-comments (JSONC) format.
### Custom adapters
Since version 1.0, the config file can define custom adapters that invoke external preprocessing scripts.
See [Community Adapters](https://github.com/phiresky/ripgrep-all/discussions/categories/show-your-adapter) for a list of adapters that other people are using.
For example, the integrated PDF-to-text adapter would look like the following in the config file:
```json
{
  "custom_adapters": [
    {
      "name": "poppler",
      "version": 1,
      "description": "Uses pdftotext (from poppler-utils) to extract plain text from PDF files",
      "extensions": ["pdf"],
      "mimetypes": ["application/pdf"],
      "binary": "pdftotext",
      "args": ["-", "-"],
      "disabled_by_default": false,
      "match_only_by_mime": false,
      "output_path_hint": "${input_virtual_path}.txt.asciipagebreaks"
    }
  ]
}
```
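To try a custom adapter without touching the real config, one option is to write a throwaway config and pass it with `--rga-config-file`. The sketch below is an assumption-laden illustration: the adapter `upperdemo`, the `.demo` extension, and the MIME type `text/x-demo` are all invented for this example, and it assumes a custom adapter receives the file on stdin and writes text to stdout, as the `pdftotext - -` example suggests.

```shell
cfg_dir="$(mktemp -d)"

# Quoted heredoc ('EOF') so the shell leaves ${input_virtual_path} untouched.
cat > "$cfg_dir/config.jsonc" <<'EOF'
{
  // Hypothetical adapter: upper-cases *.demo files with tr before searching.
  "custom_adapters": [
    {
      "name": "upperdemo",
      "version": 1,
      "description": "Hypothetical example: upper-cases text with tr",
      "extensions": ["demo"],
      "mimetypes": ["text/x-demo"],
      "binary": "tr",
      "args": ["a-z", "A-Z"],
      "disabled_by_default": false,
      "match_only_by_mime": false,
      "output_path_hint": "${input_virtual_path}.txt"
    }
  ]
}
EOF

# Sample input for the hypothetical adapter.
printf 'hello adapter\n' > "$cfg_dir/sample.demo"

if command -v rga >/dev/null 2>&1; then
  # Search the adapter output; the pattern is upper case because tr upper-cases the text.
  rga --rga-config-file="$cfg_dir/config.jsonc" 'HELLO' "$cfg_dir"
fi
```

The schema printed by `rga --rga-print-config-schema` is the place to check which of these fields are required.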