How can I set wget to download websites from a list, with a maximum byte
quota for each website?
I am trying to download a sample of about 600 KB of content from each of
the 500 websites I have in a list.
I am using wget with the -i option to read the URLs from a file, and with
the options -r -l2 to recursively download part of each website.
Unfortunately, I cannot find a way to set a maximum quota of kilobytes for
each website listed in the input file. I tried -Q, but that quota applies
to the entire list, not to each individual URL.
Do you have any suggestions on how I can download documents and materials
for every site in the list, stopping the recursive retrieval and moving on
to the next website once I have downloaded 600 KB from the current one?
For reference, this is the command I created:
wget -i listaq.txt -r -l 2 -R
pdf,jpeg,gif,css,txt,rtf,doc,ppt,ps,dwf,kml,kmz,xls,js,mpeg,ico,png,svg,jpg
--cut-dirs=1 --cut-dirs=2 --cut-dirs=3 --cut-dirs=4 --cut-dirs=5
--cut-dirs=6 --cut-dirs=7 -Q 600K
The long -R pdf...jpg list and the series of --cut-dirs respectively
exclude the file types that do not interest me and suppress the creation
of a deep directory tree. Everything works except -Q 600K.
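Since -Q applies to a whole wget invocation, one workaround is to run a
separate wget per URL, so each invocation gets its own 600 KB quota. A
minimal sketch under that idea (assuming listaq.txt holds one URL per
line; here RUN=echo makes it a dry run that only prints the commands, and
the example URLs stand in for the real list):

```shell
#!/bin/sh
# Run one wget per URL so that -Q 600k limits each site separately,
# instead of applying a single quota to the whole list.
fetch_all() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue      # skip blank lines
    # $RUN is "echo" for a dry run; set RUN="" to actually download.
    $RUN wget -r -l 2 -Q 600k \
        -R pdf,jpeg,gif,css,txt,rtf,doc,ppt,ps,dwf,kml,kmz,xls,js,mpeg,ico,png,svg,jpg \
        "$url"
  done < "$1"
}

# Example list standing in for the asker's real listaq.txt (assumption).
printf '%s\n' 'http://example.com' 'http://example.org' > listaq.txt
RUN=echo
fetch_all listaq.txt
```

Note that -Q is a soft limit: wget finishes the file it is currently
downloading before stopping, so each site may slightly exceed 600 KB.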