---
obj: application
repo: https://github.com/phiresky/ripgrep-all
rev: 2024-02-27
---

# ripgrep-all

rga is a line-oriented search tool that lets you search for a [regex](../../tools/Regex.md) in a multitude of file types. rga wraps the awesome [ripgrep](ripgrep.md) and enables it to search in [pdf](../../files/PDF.md), docx, [sqlite](../../dev/programming/SQLite.md), jpg, movie subtitles ([mkv](../../files/media/Matroska.md), mp4), etc.

## USAGE:

> rga \[RGA OPTIONS\] \[RG OPTIONS\] PATTERN \[PATH ...\]

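The argument order in the synopsis can be illustrated with a small Python sketch that assembles an rga command line. The helper name `build_rga_command` and the sample pattern and path are hypothetical; `--rga-no-cache` comes from the flag table in this note and `--ignore-case` is a regular ripgrep option. The sketch only builds the argument list and does not run rga.

```python
import shlex


def build_rga_command(pattern, paths=(), rga_options=(), rg_options=()):
    """Assemble an rga argument list in the order given by the usage synopsis."""
    # rga-specific options first, then options passed through to ripgrep,
    # then the pattern, then zero or more paths.
    return ["rga", *rga_options, *rg_options, pattern, *paths]


cmd = build_rga_command(
    r"invoice \d+",                    # hypothetical search pattern
    paths=["docs/"],                   # hypothetical directory
    rga_options=["--rga-no-cache"],
    rg_options=["--ignore-case"],
)
print(shlex.join(cmd))  # rga --rga-no-cache --ignore-case 'invoice \d+' docs/
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where rga is installed.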
## FLAGS:

| Option                    | Description |
| ------------------------- | ----------- |
| --rga-accurate            | Use more accurate but slower matching by [mime](../../files/MIME.md) type<br>By default, rga will match files using file extensions. Some programs, such as sqlite3, don't care about the file extension at all, so users sometimes use any or no extension at all. With this flag, rga will try to detect the [mime](../../files/MIME.md) type of input files using the magic bytes (similar to the [`file`](system/file.md) utility), and use that to choose the adapter. Detection is only done on the first 8 KiB of the file, since we can't always seek on the input (in archives). |
| --rga-no-cache            | Disable caching of results<br>By default, rga caches the extracted text, if it is small enough, to a database in ${XDG_CACHE_DIR-~/.cache}/ripgrep-all on Linux, ~/Library/Caches/ripgrep-all on macOS, or C:\\Users\\username\\AppData\\Local\\ripgrep-all on Windows. This way, repeated searches on the same set of files will be much faster. If you pass this flag, all caching will be disabled. |
| --rga-list-adapters       | List all known adapters |
| --rga-print-config-schema | Print the [JSON Schema](../../tools/JSON%20Schema.md) of the configuration file |
| --rg-help                 | Show help for [ripgrep](ripgrep.md) itself |
| --rg-version              | Show version of [ripgrep](ripgrep.md) itself |
| -V, --version             | Prints version information |

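To make the idea behind `--rga-accurate` concrete, here is a minimal Python sketch of magic-byte detection limited to the first 8 KiB of a stream. It is not rga's implementation: the signature table covers only four well-known formats, and the names `MAGIC_SIGNATURES`, `SNIFF_LIMIT` and `sniff_mime` are made up for this illustration.

```python
import io

# A few well-known file signatures (not rga's actual table).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"SQLite format 3\x00", "application/x-sqlite3"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]

SNIFF_LIMIT = 8 * 1024  # only the first 8 KiB are inspected


def sniff_mime(stream):
    """Guess a MIME type from the leading bytes of a binary stream, or return None."""
    head = stream.read(SNIFF_LIMIT)  # one forward read, so no seeking is needed
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


# A SQLite database is recognised even though no file name or extension is involved.
fake_db = io.BytesIO(b"SQLite format 3\x00" + b"\x00" * 84)
print(sniff_mime(fake_db))
```

Because only a prefix of the input is read once, the same approach works on non-seekable streams such as entries inside an archive.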
## OPTIONS:

| Option                                                | Description |
| ----------------------------------------------------- | ----------- |
| --rga-adapters=\<adapters\>...                        | Change which adapters to use and in which priority order (descending)<br>"foo,bar" means use only adapters foo and bar. "-bar,baz" means use all default adapters except for bar and baz. "+bar,baz" means use all default adapters and also bar and baz. |
| --rga-cache-compression-level=\<compression-level\>   | [ZSTD compression](../../files/Zstd%20Compression.md) level to apply to adapter outputs before storing in cache db<br>Ranges from 1 to 22 \[default: 12\] |
| --rga-config-file=\<config-file-path\>                | Path of the configuration file to use |
| --rga-max-archive-recursion=\<max-archive-recursion\> | Maximum depth of nested archives to recurse into \[default: 5\] |
| --rga-cache-max-blob-len=\<max-blob-len\>             | Max compressed size to cache<br>Longest byte length (after compression) to store in cache. Longer adapter outputs are not cached and are recomputed every time.<br>Allowed suffixes on command line: k M G \[default: 2000000\] |
| --rga-cache-path=\<path\>                             | Path to store cache db \[default: /home/user/.cache/ripgrep-all\] |
| -h                                                    | Shows a concise overview |
| --help                                                | Shows more detail and advanced options |

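The three forms accepted by `--rga-adapters` can be sketched in Python as follows. This mirrors only the semantics described in the table and is not rga's code; `select_adapters` is a hypothetical helper, and placing the extra `+` adapters after the defaults is an assumption, since the table does not say where they rank.

```python
# Default adapter names as listed under "Available Adapters" in this note.
DEFAULT_ADAPTERS = [
    "pandoc", "poppler", "postprocpagebreaks", "ffmpeg",
    "zip", "decompress", "tar", "sqlite",
]


def select_adapters(spec, defaults=DEFAULT_ADAPTERS):
    """Resolve an --rga-adapters value into a list of adapter names."""
    if spec.startswith("-"):
        # "-bar,baz": all default adapters except the named ones
        excluded = set(spec[1:].split(","))
        return [name for name in defaults if name not in excluded]
    if spec.startswith("+"):
        # "+bar,baz": all default adapters plus the named ones
        # (assumption: extras are appended after the defaults)
        extras = [name for name in spec[1:].split(",") if name not in defaults]
        return list(defaults) + extras
    # "foo,bar": only the named adapters, in the given priority order
    return spec.split(",")


print(select_adapters("poppler,sqlite"))
print(select_adapters("-zip,tar"))
print(select_adapters("+mail"))
```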
## Available Adapters

rga works with _adapters_ that adapt various [file formats](../../files/MIME.md). It comes with a few adapters integrated:

```
rga --rga-list-adapters
```

Adapters:

- **pandoc**
  Uses pandoc to convert binary/unreadable text documents to plain [markdown](../../files/Markdown.md)-like text
  Runs: pandoc --from= --to=plain --wrap=none --markdown-headings=atx
  Extensions: .epub, .odt, .docx, .fb2, .ipynb, .html, .htm

- **poppler**
  Uses pdftotext (from poppler-utils) to extract plain text from [PDF](../../files/PDF.md) files
  Runs: pdftotext - -
  Extensions: .pdf
  Mime Types: application/pdf

- **postprocpagebreaks**
  Adds the page number to each line of an input file that marks page breaks with the ASCII page break (form feed) character.
  Mainly to be used internally by the poppler adapter.
  Extensions: .asciipagebreaks

- **ffmpeg**
  Uses [ffmpeg](../media/ffmpeg.md) to extract video metadata/chapters, subtitles, lyrics, and other metadata
  Extensions: .mkv, .mp4, .avi, .mp3, .ogg, .flac, .webm

- **zip**
  Reads a [zip](../../files/ZIP.md) file as a stream and recurses down into its contents
  Extensions: .zip, .jar
  Mime Types: application/zip

- **decompress**
  Reads a compressed file as a stream and runs a different extractor on the contents.
  Extensions: .als, .bz2, .gz, .tbz, .tbz2, .tgz, .xz, .zst
  Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

- **tar**
  Reads a [tar](compression/tar.md) file as a stream and recurses down into its contents
  Extensions: .tar

- **sqlite**
  Uses [sqlite](../../dev/programming/SQLite.md) bindings to convert [sqlite](../../dev/programming/SQLite.md) databases into a simple plain text format
  Extensions: .db, .db3, .sqlite, .sqlite3
  Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using `--rga-adapters=+foo,bar`:

- **mail**
  Reads mailbox/mail files and runs extractors on the contents and attachments.
  Extensions: .mbox, .mbx, .eml
  Mime Types: application/mbox, message/rfc822

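As an illustration of what the postprocpagebreaks adapter is described as doing, the following Python sketch prefixes every line with its page number, treating the form feed character (`\f`) as the page separator. The function name `add_page_numbers` and the `Page N: ` prefix format are assumptions made for this example, not taken from rga's source.

```python
def add_page_numbers(text, prefix="Page {n}: "):
    """Prefix each line with the number of the page it appears on.

    Pages are assumed to be separated by the ASCII form feed character.
    """
    numbered = []
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line in page.splitlines():
            numbered.append(prefix.format(n=page_no) + line)
    return "\n".join(numbered)


# Two pages of extracted text, each terminated by a form feed.
sample = "Title\nIntroduction\n\fChapter 1\n\f"
print(add_page_numbers(sample))
```

With page numbers attached to every line, a match reported by ripgrep also tells you which page of the PDF it came from.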
## Configuration

In addition to the command-line flags, you can configure rga via a config file. The config file is located at `~/.config/ripgrep-all/config.jsonc` or your OS equivalent. It uses the JSON-with-comments (JSONC) format.

### Custom adapters

Since version 1.0, you can specify custom adapters in the config file that invoke external preprocessing scripts.

See [Community Adapters](https://github.com/phiresky/ripgrep-all/discussions/categories/show-your-adapter) for a list of adapters that other people are using.

For example, the integrated PDF-to-text adapter would look like the following in the config file:

```json
{
  "custom_adapters": [
    {
      "name": "poppler",
      "version": 1,
      "description": "Uses pdftotext (from poppler-utils) to extract plain text from PDF files",

      "extensions": ["pdf"],
      "mimetypes": ["application/pdf"],

      "binary": "pdftotext",
      "args": ["-", "-"],
      "disabled_by_default": false,
      "match_only_by_mime": false,
      "output_path_hint": "${input_virtual_path}.txt.asciipagebreaks"
    }
  ]
}
```

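The `output_path_hint` value contains an `${input_virtual_path}` placeholder. The Python sketch below shows one way such a placeholder could be expanded, and why the resulting name ends in `.asciipagebreaks`, the extension handled by the postprocpagebreaks adapter. How rga performs the substitution internally is an assumption here; `expand_output_path_hint` and the sample path are hypothetical.

```python
from pathlib import PurePosixPath
from string import Template


def expand_output_path_hint(hint, input_virtual_path):
    """Fill the ${input_virtual_path} placeholder of an output_path_hint string."""
    return Template(hint).substitute(input_virtual_path=input_virtual_path)


hinted = expand_output_path_hint(
    "${input_virtual_path}.txt.asciipagebreaks",
    "reports/q1.pdf",  # hypothetical input file
)
print(hinted)                        # reports/q1.pdf.txt.asciipagebreaks
print(PurePosixPath(hinted).suffix)  # .asciipagebreaks
```

Under this reading, the hint gives the adapter's output a virtual file name whose extension selects the next adapter, which would explain how pdftotext output ends up being processed by postprocpagebreaks.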