79 lines
24 KiB
Markdown
79 lines
24 KiB
Markdown
---
|
||
obj: application
|
||
arch-wiki: https://wiki.archlinux.org/title/Wget
|
||
wiki: https://en.wikipedia.org/wiki/Wget
|
||
repo: https://git.savannah.gnu.org/cgit/wget.git
|
||
---
|
||
# wget
|
||
GNU Wget is a free utility for non-interactive download of files from the Web. It supports [HTTP](../../../internet/HTTP.md), HTTPS, and [FTP](../../../internet/FTP.md) protocols, as well as retrieval through [HTTP](../../../internet/HTTP.md) proxies.
|
||
|
||
Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most of the Web browsers require constant user's presence, which can be a great hindrance when transferring a lot of data.
|
||
|
||
Wget can follow links in [HTML](../../../internet/HTML.md), XHTML, and [CSS](../../../internet/CSS.md) pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
|
||
|
||
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
|
||
|
||
## Options
|
||
| Option | Description |
|
||
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `-b, --background` | Go to background immediately after startup. If no output file is specified via the -o, output is redirected to wget-log. |
|
||
| `-e, --execute command` | Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e. |
|
||
|
||
### Logging Options
|
||
| Option | Description |
|
||
| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `-o, --output-file=logfile` | Log all messages to logfile. The messages are normally reported to standard error. |
|
||
| `-a, --append-output=logfile` | Append to logfile. This is the same as -o, only it appends to logfile instead of overwriting the old log file. |
|
||
| `-q, --quiet` | Turn off Wget's output. |
|
||
| `-i, --input-file=file` | Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.). If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command lines will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs, one per line. However, if you specify --force-html, the document will be regarded as [html](../../../internet/HTML.md). In that case you may have problems with relative links, which you can solve either by adding "\<base href="url">" to the documents or by specifying --base=url on the command line. If the file is an external one, the document will be automatically treated as [html](../../../internet/HTML.md) if the Content-Type matches text/html. Furthermore, the file's location will be implicitly used as base href if none was specified. |
|
||
| `-B, --base=URL` | Resolves relative links using URL as the point of reference, when reading links from an [HTML](../../../internet/HTML.md) file specified via the -i/--input-file option (together with --force-html, or when the input file was fetched remotely from a server describing it as [HTML](../../../internet/HTML.md)). This is equivalent to the presence of a "BASE" tag in the [HTML](../../../internet/HTML.md) input file, with URL as the value for the "href" attribute. |
|
||
|
||
### Download Options
|
||
| Option | Description |
|
||
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `-t, --tries=number` | Set number of tries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried. |
|
||
| `-O, --output-document=file` | The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If - is used as file, documents will be printed to standard output, disabling link conversion. (Use ./- to print to a file literally named -.) |
|
||
| `--backups=backups` | Before (over)writing a file, back up an existing file by adding a .1 suffix (\_1 on VMS) to the file name. Such backup files are rotated to .2, .3, and so on, up to `backups` (and lost beyond that) |
|
||
| `-c, --continue` | Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. |
|
||
| `--show-progress` | Force wget to display the progress bar in any verbosity. |
|
||
| `-T, --timeout=seconds` | Set the network timeout to `seconds` seconds. |
|
||
| `--limit-rate=amount` | Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix. For example, --limit-rate=20k will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth. |
|
||
| `-w, --wait=seconds` | Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix. |
|
||
| `--waitretry=seconds` | If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify. |
|
||
| `--random-wait` | Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option, in order to mask Wget's presence from such analysis. |
|
||
| `--user=user, --password=password` | Specify the username and password for both [FTP](../../../internet/FTP.md) and [HTTP](../../../internet/HTTP.md) file retrieval. |
|
||
| `--ask-password` | Prompt for a password for each connection established. |
|
||
|
||
### Directory Options
|
||
| Option | Description |
|
||
| ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `-nH, --no-host-directories` | Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior. |
|
||
| `--cut-dirs=number` | Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved. |
|
||
| `-P, --directory-prefix=prefix` | Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory). |
|
||
|
||
### HTTP Options
|
||
| Option | Description |
|
||
| ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `--no-cookies` | Disable the use of cookies. |
|
||
| `--load-cookies file` | Load cookies from file before the first [HTTP](../../../internet/HTTP.md) retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file. |
|
||
| `--save-cookies file` | Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called "session cookies"), but also see --keep-session-cookies. |
|
||
| `--keep-session-cookies` | When specified, causes --save-cookies to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget runs are considered a single browser session as far as the site is concerned. |
|
||
| `--header=header-line` | Send header-line along with the rest of the headers in each [HTTP](../../../internet/HTTP.md) request. The supplied header is sent as-is, which means it must contain name and value separated by colon, and must not contain newlines. |
|
||
| `--proxy-user=user, --proxy-password=password` | Specify the username user and password password for authentication on a proxy server. Wget will encode them using the "basic" authentication scheme. |
|
||
| `--referer=url` | Include 'Referer: url' header in [HTTP](../../../internet/HTTP.md) request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them. |
|
||
| `-U, --user-agent=agent-string` | Identify as `agent-string` to the [HTTP](../../../internet/HTTP.md) server. |
|
||
|
||
### HTTPS Options
|
||
| Option | Description |
|
||
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||
| `--no-check-certificate` | Don't check the server certificate against the available certificate authorities. Also don't require the URL host name to match the common name presented by the certificate. |
|
||
| `--ca-certificate=file` | Use file as the file with the bundle of certificate authorities ("CA") to verify the peers. The certificates must be in PEM format. |
|
||
| `--ca-directory=directory` | Specifies directory containing CA certificates in PEM format. |
|
||
|
||
### Recursive Retrieval Options
|
||
| Option | Description |
|
||
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||
| `-r, --recursive` | Turn on recursive retrieving. The default maximum depth is 5. |
|
||
| `-l, --level=depth` | Set the maximum number of subdirectories that Wget will recurse into to depth. |
|
||
| `-k, --convert-links` | After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-[HTML](../../../internet/HTML.md) content, etc. |
|
||
| `-p, --page-requisites` | This option causes Wget to download all the files that are necessary to properly display a given [HTML](../../../internet/HTML.md) page. This includes such things as inlined images, sounds, and referenced stylesheets. |
|