Download Files from a List

To download a batch of files, create a plain text document listing the URL of each file, one per line, and name it, for instance, list.txt. Save list.txt to a directory, open a shell in that directory and type:

wget --continue --tries=inf --input-file=list.txt

where:

  • --continue (or -c for short) resumes a partially downloaded file in case a previous attempt failed,
  • --tries=inf (or -t inf for short) retries the download indefinitely - this helps with spurious disconnects,
  • --input-file=list.txt (or -i list.txt for short) tells wget to read the URLs from the file list.txt.
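
Equivalently, using the short options mentioned above:

wget -c -t inf -i list.txt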

wget will then download all the files to the directory where you issued the command.
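
For reference, list.txt simply contains one URL per line, for example (the addresses below are just placeholders):

https://example.com/archive/file1.iso
https://example.com/archive/file2.iso
https://example.com/archive/file3.iso

Should the files have to end up somewhere other than the current directory, the -P (or --directory-prefix) option can be appended to point wget at a different target folder, for instance:

wget -c -t inf -i list.txt -P ~/downloads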

Avoiding "Filename too long" Issues

When wget retrieves files from a long URL, the URL components become part of the local file path, which can exceed the file system's limit and produce "Filename too long" errors. Pass -np (--no-parent) to wget to make it ignore the parent directories of the given path.
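
For example, to retrieve files recursively from a deep path while making wget ignore everything above that path, something along these lines can be used (the URL is purely illustrative):

wget -r -np https://SITE.TLD/some/deeply/nested/path/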

Mirror a Website Locally and Download all Resources

The following command:

wget --page-requisites --convert-links --span-hosts --no-directories https://SITE.TLD/

where:

  • SITE.TLD is the address of the website to mirror

will download the website's page and all of the resources it references (images, stylesheets, scripts and so on) to the current folder and convert the links, such that the website will be browseable offline without an Internet connection.

Note that URLs within JavaScript code or within contexts that do not represent an HTML link will remain unchanged.
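
Note also that the command above only fetches the page(s) passed on the command line together with their resources; to additionally follow links within the site and mirror deeper pages, a recursive variant can serve as a starting point (here --span-hosts is deliberately dropped so that the recursion does not wander off to third-party hosts):

wget --recursive --level=inf --no-parent --page-requisites --convert-links https://SITE.TLD/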

Download All Files Recursively from an URL Path

One of the issues with wget is that it recreates the remote directory path whenever a URL is specified. For example, consider the following command:

wget \
  --ftp-user=ftp \
  --ftp-password=ftp \
  -r \
  "ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*"

which, judging by the call, is supposed to download all the files found on ftp2.grandis.nu under the remote path /Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/ into the current local directory.

However, wget insists on recreating the full hostname and directory structure locally: the current directory ends up containing a ftp2.grandis.nu folder, within that a Retroplay WHDLoad Packs folder, and within that another folder Commodore_Amiga_-_WHDLoad_-_Games that finally contains the files as instructed.
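
In other words, instead of the files landing directly in the current directory, the local tree ends up looking roughly like this (file names are illustrative):

ftp2.grandis.nu/
  Retroplay WHDLoad Packs/
    Commodore_Amiga_-_WHDLoad_-_Games/
      SomeGame.lha
      AnotherGame.lha
      ...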

This is a design flaw that violates the principle of least surprise and is a behavior particular to wget, because a URL including the protocol, such as ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*, is an unambiguous path that makes it perfectly clear where the files to be downloaded are located.

Unfortunately, due to this design flaw, juggling parameters seems to be the only way to accomplish this, and the path components even have to be counted by hand. Here is the solution:

wget \
  --no-host-directories \
  --cut-dirs=2 \
  --ftp-user=ftp \
  --ftp-password=ftp \
  -r \
  "ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/*"

where:

  • --no-host-directories (or -nH for short) instructs wget not to create a folder named ftp2.grandis.nu and,
  • --cut-dirs=2 tells wget not to create the folders Retroplay WHDLoad Packs and Commodore_Amiga_-_WHDLoad_-_Games (as illustrated below).
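
As an illustration of how the counting works: --no-host-directories removes the hostname component and --cut-dirs=N then strips the first N remaining path components, so for this URL the local layout would roughly be:

(default)               ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/
-nH                     Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/
-nH --cut-dirs=1        Commodore_Amiga_-_WHDLoad_-_Games/
-nH --cut-dirs=2        . (the files end up in the current directory)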

There is also the option --no-directories, but it annihilates the folder structure entirely: if the path ftp://ftp2.grandis.nu/Retroplay WHDLoad Packs/Commodore_Amiga_-_WHDLoad_-_Games/ has any subdirectories, those are flattened as well, with wget ending up putting every single file into the current directory.

wget is an incredibly counter-intuitive command.

