wget

Overview

A non-interactive network downloader.

Options

Startup:

-V,  --version           display the version of Wget and exit.
-h,  --help              print this help.
-b,  --background        go to background after startup. Without -o, the log is written to wget-log.
-e,  --execute=COMMAND   execute a `.wgetrc'-style command, e.g. -e robots=off.
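
A minimal sketch of a background download with an explicit log file. The helper name fetch_in_background and the file names in the usage comment are placeholders, not part of wget.

```shell
# Hypothetical helper: start a download, detach from the terminal,
# and write the log to a file of our choosing.
fetch_in_background() {
    url=$1
    log_file=$2
    # -b detaches after startup; -o names the log file that would
    # otherwise default to wget-log in the current directory.
    wget -b -o "$log_file" "$url"
}
# Usage (placeholder URL): fetch_in_background http://example.com/file.iso fetch.log
```

Progress can then be followed with tail -f on the log file.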

Logging and input file:

-o,  --output-file=FILE    log messages to FILE. An existing FILE is overwritten.
-a,  --append-output=FILE  append messages to FILE.
-d,  --debug               print lots of debugging information.
-q,  --quiet               quiet (no output).
-v,  --verbose             be verbose (this is the default).
-nv, --no-verbose          turn off verboseness, without being quiet. Errors and basic information are still printed.
-i,  --input-file=FILE     download URLs found in FILE.
-F,  --force-html          treat input file as HTML.
-B,  --base=URL            prepends URL to relative links in -F -i file.
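
The logging and input-file options combine naturally for batch jobs. A small sketch, assuming a text file with one URL per line; fetch_list and the file names are hypothetical.

```shell
# Hypothetical helper: download every URL listed in a file and append
# terse log lines to a log file instead of printing them.
fetch_list() {
    url_file=$1
    log_file=$2
    # -nv keeps the log short; -a appends so earlier runs are preserved;
    # -i reads the URLs to fetch from the given file.
    wget -nv -a "$log_file" -i "$url_file"
}
# Usage (placeholder names): fetch_list urls.txt wget.log
```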

Download:

-t,  --tries=NUMBER            set number of retries to NUMBER (0 unlimits). The default is 20.
     --retry-connrefused       retry even if connection is refused. Treats "connection refused" as a transient error.
-O,  --output-document=FILE    write documents to FILE. Use - for standard output.
-nc, --no-clobber              skip downloads that would download to
                               existing files.
-c,  --continue                resume getting a partially-downloaded file.
     --progress=TYPE           select progress gauge type (dot or bar).
-N,  --timestamping            don't re-retrieve files unless newer than
                               local.
-S,  --server-response         print server response.
     --spider                  don't download anything. Only checks that the URLs exist.
-T,  --timeout=SECONDS         set all timeout values to SECONDS.
     --dns-timeout=SECS        set the DNS lookup timeout to SECS.
     --connect-timeout=SECS    set the connect timeout to SECS.
     --read-timeout=SECS       set the read timeout to SECS.
-w,  --wait=SECONDS            wait SECONDS between retrievals.
     --waitretry=SECONDS       wait 1..SECONDS between retries of a retrieval. The delay grows by one second per failed attempt, up to SECONDS.
     --random-wait             wait from 0...2*WAIT secs between retrievals. Randomizes the delay set by --wait.
     --no-proxy                explicitly turn off proxy.
-Q,  --quota=NUMBER            set retrieval quota to NUMBER. In bytes, or with a k or m suffix; a single file is never cut short.
     --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) on local host.
     --limit-rate=RATE         limit download rate to RATE. In bytes per second, or with a k or m suffix.
     --no-dns-cache            disable caching DNS lookups.
     --restrict-file-names=OS  restrict chars in file names to ones OS allows. OS is, for example, unix or windows.
     --ignore-case             ignore case when matching files/directories.
-4,  --inet4-only              connect only to IPv4 addresses.
-6,  --inet6-only              connect only to IPv6 addresses.
     --prefer-family=FAMILY    connect first to addresses of specified family,
                               one of IPv6, IPv4, or none.
     --user=USER               set both ftp and http user to USER.
     --password=PASS           set both ftp and http password to PASS.
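
Several of the download options are typically used together for large files on unreliable links. The sketch below is one possible combination; fetch_resilient is a hypothetical name and the numeric values are arbitrary choices, not recommendations from wget.

```shell
# Hypothetical helper: a download that resumes, retries and throttles.
fetch_resilient() {
    url=$1
    # -c                   resume a partially downloaded file
    # -t 5                 give up after five attempts
    # --retry-connrefused  also retry when the connection is refused
    # -T 30                apply a 30 second limit to DNS, connect and read
    # --waitretry=10       back off up to 10 seconds between retries
    # --limit-rate=500k    cap the transfer at roughly 500 KB/s
    wget -c -t 5 --retry-connrefused -T 30 --waitretry=10 --limit-rate=500k "$url"
}
# Usage (placeholder URL): fetch_resilient http://example.com/big.iso
```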

Directories:

-nd, --no-directories           don't create directories. All files land in the current directory.
-x,  --force-directories        force creation of directories.
-nH, --no-host-directories      don't create host directories. Omits the leading hostname directory such as mirrors.163.com/.
     --protocol-directories     use protocol name in directories.
-P,  --directory-prefix=PREFIX  save files to PREFIX/...
     --cut-dirs=NUMBER          ignore NUMBER remote directory components. Strips the first NUMBER path components from the local save path.
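
The interaction of -nH, --cut-dirs and -P is easiest to see on a concrete path. In the sketch below, mirror_flat is a hypothetical helper; with the URL from the usage comment and a depth of 3, the components centos/6/os are dropped, so files should end up under centos6/x86_64/.

```shell
# Hypothetical helper: recursive download with a shortened local path.
mirror_flat() {
    url=$1
    dest=$2
    depth=$3
    # -r -np            recurse, but never above the starting directory
    # -nH               no hostname directory
    # --cut-dirs=DEPTH  drop the first DEPTH remote path components
    # -P DEST           put everything under DEST/
    wget -r -np -nH --cut-dirs="$depth" -P "$dest" "$url"
}
# Usage: mirror_flat http://mirrors.163.com/centos/6/os/x86_64/ centos6 3
```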

HTTP options:

     --http-user=USER        set http user to USER.
     --http-password=PASS    set http password to PASS.
     --no-cache              disallow server-cached data. Asks proxies and the server for a fresh copy.
-E,  --html-extension        save HTML documents with `.html' extension.
     --ignore-length         ignore `Content-Length' header field.
     --header=STRING         insert STRING among the headers.
     --max-redirect          maximum redirections allowed per page.
     --proxy-user=USER       set USER as proxy username.
     --proxy-password=PASS   set PASS as proxy password.
     --referer=URL           include `Referer: URL' header in HTTP request.
     --save-headers          save the HTTP headers to file. They precede the body in the saved file.
-U,  --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
     --no-http-keep-alive    disable HTTP keep-alive (persistent connections).
     --no-cookies            don't use cookies.
     --load-cookies=FILE     load cookies from FILE before session.
     --save-cookies=FILE     save cookies to FILE after session.
     --keep-session-cookies  load and save session (non-permanent) cookies. Without it, --save-cookies drops session cookies.
     --post-data=STRING      use the POST method; send STRING as the data.
     --post-file=FILE        use the POST method; send contents of FILE.
     --content-disposition   honor the Content-Disposition header when
                             choosing local file names (EXPERIMENTAL). Uses the file name suggested by the server.
     --auth-no-challenge     Send Basic HTTP authentication information
                             without first waiting for the server's
                             challenge.
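
The cookie and POST options are usually combined for sites that require a form login. A rough sketch, assuming a site whose login form accepts URL-encoded fields; login_and_fetch, the URLs and the field names in the usage comment are all hypothetical.

```shell
# Hypothetical helper: log in with a POST request, keep the cookies,
# then fetch a page that needs the logged-in session.
login_and_fetch() {
    login_url=$1
    form_data=$2
    page_url=$3
    jar=${4:-cookies.txt}   # cookie file; the default name is arbitrary
    # Step 1: send the form, save cookies (including session cookies),
    # and discard the response body.
    wget --save-cookies "$jar" --keep-session-cookies \
         --post-data "$form_data" -O /dev/null "$login_url" &&
    # Step 2: reuse the saved cookies for the real request.
    wget --load-cookies "$jar" "$page_url"
}
# Usage (placeholders):
#   login_and_fetch http://example.com/auth.php 'user=foo&password=bar' \
#       http://example.com/page.html
```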

HTTPS (SSL/TLS) options:

--secure-protocol=PR     choose secure protocol, one of auto, SSLv2,
                         SSLv3, and TLSv1.
--no-check-certificate   don't validate the server's certificate.
--certificate=FILE       client certificate file.
--certificate-type=TYPE  client certificate type, PEM or DER.
--private-key=FILE       private key file.
--private-key-type=TYPE  private key type, PEM or DER.
--ca-certificate=FILE    file with the bundle of CA's. A PEM file of certificate authorities used to verify the server.
--ca-directory=DIR       directory where hash list of CA's is stored. One PEM certificate per file, named by hash.
--random-file=FILE       file with random data for seeding the SSL PRNG. For systems that lack a random device such as /dev/random.
--egd-file=FILE          file naming the EGD socket with random data. EGD is the Entropy Gathering Daemon.
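
When a server uses a certificate from a private CA, pointing wget at that CA is an alternative to --no-check-certificate that keeps verification on. A minimal sketch; fetch_with_ca and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: download over HTTPS, verifying the server
# against a specific CA bundle in PEM format.
fetch_with_ca() {
    url=$1
    ca_bundle=$2
    wget --ca-certificate="$ca_bundle" "$url"
}
# Usage (placeholders):
#   fetch_with_ca https://intranet.example/file.tar.gz /etc/ssl/internal-ca.pem
```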

FTP options:

--ftp-user=USER         set ftp user to USER.
--ftp-password=PASS     set ftp password to PASS.
--no-remove-listing     don't remove `.listing' files.
--no-glob               turn off FTP file name globbing. Wildcards such as * in the URL are taken literally.
--no-passive-ftp        disable the "passive" transfer mode.
--retr-symlinks         when recursing, get linked-to files (not dir). Downloads the file a symlink points to instead of recreating the link.
--preserve-permissions  preserve remote file permissions.
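
FTP globbing is on unless --no-glob is given, so a wildcard in the URL selects several files. The sketch below assumes a server that allows the given login; ftp_fetch_glob, the host and the credentials are placeholders. The URL must be quoted so that the local shell does not expand the * itself.

```shell
# Hypothetical helper: fetch files from an FTP server, letting wget
# expand any wildcard in the URL.
ftp_fetch_glob() {
    user=$1
    pass=$2
    url=$3
    wget --ftp-user="$user" --ftp-password="$pass" "$url"
}
# Usage (placeholders):
#   ftp_fetch_glob anonymous guest 'ftp://ftp.example.com/pub/*.iso'
```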

Recursive download:

-r,  --recursive          specify recursive download.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite). The default is 5.
     --delete-after       delete files locally after downloading them.
-k,  --convert-links      make links in downloaded HTML point to local files. Done after the download finishes, so the copy can be browsed offline.
-K,  --backup-converted   before converting file X, back up as X.orig.
-m,  --mirror             shortcut for -N -r -l inf --no-remove-listing. Intended for mirroring a site.
-p,  --page-requisites    get all images, etc. needed to display HTML page.
     --strict-comments    turn on strict (SGML) handling of HTML comments.
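
A common use of -p and -k without full recursion is saving one page for offline reading. A sketch under that assumption; save_page is a hypothetical name and the URL is a placeholder.

```shell
# Hypothetical helper: save a single page with everything needed to
# display it locally.
save_page() {
    url=$1
    # -p  fetch images, stylesheets and other page requisites
    # -k  rewrite links so they point at the local copies
    # -K  keep the unconverted original as X.orig
    # -E  make sure HTML documents get an .html extension
    wget -p -k -K -E "$url"
}
# Usage (placeholder URL): save_page http://example.com/article.html
```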

Recursive accept/reject:

-A,  --accept=LIST               comma-separated list of accepted extensions.
-R,  --reject=LIST               comma-separated list of rejected extensions.
-D,  --domains=LIST              comma-separated list of accepted domains.
     --exclude-domains=LIST      comma-separated list of rejected domains.
     --follow-ftp                follow FTP links from HTML documents.
     --follow-tags=LIST          comma-separated list of followed HTML tags.
     --ignore-tags=LIST          comma-separated list of ignored HTML tags.
-H,  --span-hosts                go to foreign hosts when recursive.
-L,  --relative                  follow relative links only.
-I,  --include-directories=LIST  list of allowed directories.
-X,  --exclude-directories=LIST  list of excluded directories.
-np, --no-parent                 don't ascend to the parent directory.
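
The accept list is handy for pulling only certain file types out of a directory tree. A minimal sketch; fetch_by_extension is a hypothetical helper, and the depth and extensions in the usage comment are arbitrary.

```shell
# Hypothetical helper: recursive download limited to given extensions.
fetch_by_extension() {
    url=$1
    extensions=$2   # comma-separated, e.g. rpm,iso
    depth=$3
    # -r -l DEPTH  recurse at most DEPTH levels
    # -np          stay below the starting directory
    # -A LIST      keep only files whose names end in one of LIST
    wget -r -l "$depth" -np -A "$extensions" "$url"
}
# Usage: fetch_by_extension http://mirrors.163.com/centos/6/os/x86_64/ rpm 2
```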

Examples

Mirror a site:

wget -c -r -np -k -L -b --reject=gif http://mirrors.163.com/centos/6/os/x86_64/ -e robots=off
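
The same command can be read flag by flag. The sketch below rebuilds it in POSIX shell with one comment per option; mirror_tree is a hypothetical wrapper name, and the flags are exactly those of the command above.

```shell
# Hypothetical wrapper: the mirror command, assembled one flag at a time.
mirror_tree() {
    url=$1
    set --                      # start from an empty argument list
    set -- "$@" -c              # resume partially downloaded files
    set -- "$@" -r -np          # recurse, never above the start directory
    set -- "$@" -k              # rewrite links for local browsing
    set -- "$@" -L              # follow relative links only
    set -- "$@" -b              # run in the background, logging to wget-log
    set -- "$@" --reject=gif    # skip files ending in gif
    set -- "$@" -e robots=off   # do not apply robots.txt rules
    wget "$@" "$url"
}
# Usage: mirror_tree http://mirrors.163.com/centos/6/os/x86_64/
```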