2 wget snippets

Backup or mirror a website using wget

To create a local mirror or backup of a website with wget, run:

wget -r -l 5 -k -w 1 --random-wait <URL>

Where:

  • -r (or --recursive) will cause wget to recursively download files
  • -l N (or --level=N) will limit recursion to at most N levels below the root document (defaults to 5, use inf for infinite recursion)
  • -k (or --convert-links) will cause wget to convert links in the downloaded documents so that the files can be viewed locally
  • -w (or --wait=N) will cause wget to wait N seconds between requests
  • --random-wait will cause wget to randomly vary the wait time to 0.5x to 1.5x the value specified by --wait

Some additional notes:

  • --mirror (or -m) can be used as a shortcut for -r -N -l inf --no-remove-listing which enables infinite recursion and preserves both the server timestamps and FTP directory listings.
  • -np (--no-parent) can be used to limit wget to files below a specific "directory" (path).
Published 10 Feb 2014

 

Pre-generate pages or load a web cache using wget

Many web frameworks and template engines will defer the generation the HTML version of a document the first time it is accessed. This can make the first hit on a given page significantly slower than subsequent hits.

You can use wget to pre-cache web pages using a command such as:

wget -r -l 3 -nd --delete-after <URL>

Where:

  • -r (or --recursive) will cause wget to recursively download files
  • -l N (or --level=N) will limit recursion to at most N levels below the root document (defaults to 5, use inf for infinite recursion)
  • -nd (or --no-directories) will prevent wget from creating local directories to match the server-side paths
  • --delete-after will cause wget to delete each file as soon as it is downloaded (so the command leaves no traces behind.)
Published 10 Feb 2014

 

This page was generated at 4:16 PM on 26 Feb 2018.
Copyright © 1999 - 2018 Rodney Waldhoff.