To download a batch of files, create a plain text document listing the URL of each file, one per line, and call it, for instance, list.txt. Save list.txt somewhere in a directory, open a shell and type:
wget --continue --tries=inf --input-file=list.txt
where:
--continue (or -c for short) will resume the download in case a previous attempt failed,
--tries=inf (or -t inf for short) will retry the download indefinitely, which helps with spurious disconnects,
--input-file=list.txt (or -i list.txt for short) specifies that the URLs should be read from the file list.txt.
wget will then download all the files to the directory where the command was issued.
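As a purely hypothetical illustration (the host and file names are made up), list.txt could look like this:
https://example.com/downloads/alpha.iso
https://example.com/downloads/beta.iso
https://example.com/downloads/gamma.iso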
When wget retrieves files recursively, it does not treat the given URL as a boundary and may ascend to the parent directories and retrieve files from there as well. Pass -np (or --no-parent) as parameter to wget to make it ignore the parent directory and stay at or below the given path.
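As a hedged illustration, with example.com standing in for a real host, the following recursive retrieval stays confined to /docs/ and below:
wget --recursive --no-parent https://example.com/docs/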
The following command:
wget --page-requisites --convert-links --span-hosts --no-directories https://SITE.TLD/
where:
SITE.TLD is a website address. The command will download the page to the current folder, convert all the links and also download any resources referenced by the page, such as stylesheets, scripts and images, so that the page will be browsable offline without an Internet connection.
Note that URLs within JavaScript code or within contexts that do not represent an HTML link will remain unchanged.
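To grab more than a single page, the retrieval can additionally be made recursive; here is a sketch, with example.com standing in for the real site:
wget --mirror --page-requisites --convert-links https://example.com/
where --mirror is shorthand for recursive, infinite-depth retrieval with timestamping.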
One of the issues with wget is that it is programmed to recreate the remote directory path locally whenever a URL is retrieved. For example, consider the following command:
wget \
    --ftp-user=ftp \
    --ftp-password=ftp \
    -r \
    "ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*"
that, judging by the call, is supposed to download all the files from ftp2.grandis.nu that are found within the remote path /Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/ to the local directory.
However, wget decides that the full hostname and directory structure has to be recreated locally, such that the current directory will contain an ftp2.grandis.nu folder, within that folder a Retroplay WHDLoad Packs folder, and within that one another folder, Commodore_Amiga_-_WHDLoad_-_Games, that will then contain the files as instructed.
This is a design flaw that violates the principle of least surprise and introduces a behavior particular to wget, because a URL including the protocol, such as ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*, is an unambiguous path that makes it clear where the files to be downloaded are located.
Unfortunately, due to this design flaw, juggling parameters seems to be the only way to get this accomplished, and the path components must even be counted by hand. Here is the solution:
wget \
    --no-host-directories \
    --cut-dirs=2 \
    --ftp-user=ftp \
    --ftp-password=ftp \
    -r \
    "ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*"
where:
--no-host-directories (or -nH for short) instructs wget not to create a folder named ftp2.grandis.nu, and
--cut-dirs=2 tells wget not to create the two folders Retroplay WHDLoad Packs and Commodore_Amiga_-_WHDLoad_-_Games.
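In general, the argument to --cut-dirs is the number of remote path components to strip, counted from the left. As a made-up example, retrieving ftp://example.com/pub/mirror/files/ recursively with:
wget -r -nH --cut-dirs=2 "ftp://example.com/pub/mirror/files/"
strips pub and mirror and creates only a local files folder.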
There is also the option --no-directories, but it annihilates the folder structure entirely: if the path ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/ has any descendant directories, those will be eliminated as well, with wget ending up putting every single file into the current directory.
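As a hedged sketch of that flattening behavior, again with example.com standing in for a real host:
wget -r -nd "ftp://example.com/pub/files/*"
will drop every retrieved file directly into the current directory, no matter how deeply it is nested on the server.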
wget is an incredibly counter-intuitive command.