
Generating download lists

Ever wanted to download a huge number of files with enumerated filenames like foo1.png, foo2.png, foo3.png and so on?

Well, Alaa wrote two nice little tools to help you do so. That was back in 2002.


Compiling

First, compile them:

g++ -o series series.cc
g++ -o nseries nseries.cc


Series

Introduction

series simply takes some string like foo then generates a list like

 foo01
 foo02
 foo03
 .....
 .....
 foo13

then adds a string to each item in the list like .png so the end product would be

 foo01.png
 foo02.png
 foo03.png
 .........
 foo13.png

To generate the previous list you type:

./series 1 13 2 '.png' 'foo'
  • 1 is the start of the list
  • 13 is the end of the list
  • 2 is the pad, the minimum number of digits (with a pad of 3, the number 1 is generated as 001 and the number 23 as 023)
  • '.png' is the postfix (The string after the numbers)
  • 'foo' is the prefix (The string before the numbers)

To sum things up, the syntax for series is:

./series start end pad postfix prefix
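
As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not Alaa's C++ source (which is not reproduced on this page); the function name and its argument order are assumptions chosen to mirror the command line.

```python
def series(start, end, pad, postfix, prefixes):
    """Return prefix + zero-padded number + postfix for every prefix and number.

    Hypothetical re-creation of what the page describes, not the original code.
    """
    lines = []
    for prefix in prefixes:
        # "end" is inclusive, as in the foo01..foo13 example.
        for n in range(start, end + 1):
            lines.append(f"{prefix}{n:0{pad}d}{postfix}")
    return lines


if __name__ == "__main__":
    # Mirrors: ./series 1 13 2 '.png' 'foo'
    print("\n".join(series(1, 13, 2, ".png", ["foo"])))
```

The pad only sets a minimum width, so a number with more digits than the pad is printed in full.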

No prefix, or no postfix

You can leave the prefix or the postfix blank by putting '' in its place.
Note that this is two single quotes ('), not one double quote (").
For example, to generate a list from 000 to 999, just type:

./series 0 999 3 '' ''

Multiple prefixes

series can generate one list for many prefixes in a single command. The syntax becomes:

./series start end pad postfix prefix1 prefix2 prefix3 etc

For example, to generate the following list

foo1
foo2
foo3
foo4
fubar1
fubar2
fubar3
fubar4

You type:

./series 1 4 1 '' 'foo' 'fubar'

Nested lists

series takes at least 4 arguments: the start, end, pad and postfix.
The prefix(es) come after these and are either:

  1. Explicitly specified.
  2. Left out. In this case series waits for you to type a prefix, generates the list for it, then waits for another one, and so on until you press Ctrl-D.
  3. Piped in. The output of one series run can be fed to another as its prefixes, which is what lets this tool generate complex nested lists.

Suppose you want to generate the following list

601
602
603
604
701
702
703
704
801
802
803
804

This is done by typing

./series 6 8 1 '' '' | ./series 1 4 2 ''
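
The nesting can be pictured as one generator feeding another. The Python sketch below is only an illustration under that assumption; the series function here is a hypothetical stand-in for the real tool, with the piped-in prefixes passed as an iterable.

```python
def series(start, end, pad, postfix, prefixes):
    """Yield prefix + zero-padded number + postfix, one prefix at a time.

    Hypothetical stand-in for the series tool; prefixes plays the role of stdin.
    """
    for prefix in prefixes:
        for n in range(start, end + 1):
            yield f"{prefix}{n:0{pad}d}{postfix}"


# Mirrors: ./series 6 8 1 '' '' | ./series 1 4 2 ''
outer = series(6, 8, 1, "", [""])          # yields 6, 7, 8
nested = list(series(1, 4, 2, "", outer))  # each becomes a prefix for 01..04

if __name__ == "__main__":
    print("\n".join(nested))
```

Each line produced by the first stage is consumed as a prefix by the second stage, so the outer numbers change slowest.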


Nseries

nseries adds an incrementing number to each input. The inputs are either piped in or entered one by one.

nseries' syntax is:

./nseries start pad postfix

so if you want to generate the following list

a01.png
b02.png
c03.png
d04.png
.
.

you type

./nseries 1 2 '.png'

and then press Enter. nseries waits for the first input: type 'a' and press Enter, then 'b', 'c', 'd' and so on.


or you can have a file named "alphabets" containing the following

a
b
c
d
.
.

and you type

cat alphabets | ./nseries 1 2 '.png'

Note how different nseries is from series.
If series is given multiple input lines, it generates a whole list from "start" to "end" for each input line.
If nseries is given multiple input lines, it emits one item per input line, counting up from "start".
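
To make the contrast concrete, here is a small Python sketch of both behaviours side by side. Both functions are hypothetical re-creations based only on the description above, not the original C++ code.

```python
def series(start, end, pad, postfix, prefixes):
    """One full start..end list per input line (hypothetical sketch)."""
    for prefix in prefixes:
        for n in range(start, end + 1):
            yield f"{prefix}{n:0{pad}d}{postfix}"


def nseries(start, pad, postfix, inputs):
    """One output line per input line, numbered upwards from start (hypothetical sketch)."""
    for n, item in enumerate(inputs, start):
        yield f"{item}{n:0{pad}d}{postfix}"


if __name__ == "__main__":
    letters = ["a", "b", "c", "d"]
    # Mirrors: cat alphabets | ./nseries 1 2 '.png'
    print(list(nseries(1, 2, ".png", letters)))
    # The same input through series multiplies the lines instead.
    print(list(series(1, 2, 2, ".png", letters)))
```

With four input lines, nseries produces four lines, while series with a range of 1 to 2 produces eight.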

Garfield for all of us

Now as a practical example, we use series and nseries to download all garfield comic strips since 1978. The first strip can be found here:

Notice that the format is foo/year1/year2_month_day.gif

where year2 is the last two digits in year1

cat years | ./series 1 12 2 '' | ./series 1 31 2 '.gif' >> all_garfield

Now all_garfield has the URLs of all Garfield comic strips. Remove the entries for dates before 19/6/1978.
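
The pipeline above emits every month and day combination from 01/01 to 12/31 for each year, so the list also contains dates that do not exist (such as 30 February) alongside the ones before 19/6/1978. As a hedged alternative to trimming by hand, the Python sketch below yields only real calendar dates from a chosen first day. The function names are made up for illustration, and only the month, day and .gif part that the two series stages append is built here, since the contents of the years file are not shown on this page.

```python
from datetime import date


def strip_dates(first, last_year):
    """Yield real calendar dates from first through 31 December of last_year."""
    for year in range(first.year, last_year + 1):
        for month in range(1, 13):
            for day in range(1, 32):
                try:
                    d = date(year, month, day)
                except ValueError:
                    continue  # not a real date, e.g. 30 February
                if d >= first:
                    yield d


def pipeline_suffix(d):
    """The part appended by the two series stages: 2-digit month, 2-digit day, .gif."""
    return f"{d.month:02d}{d.day:02d}.gif"


if __name__ == "__main__":
    for d in strip_dates(date(1978, 6, 19), 1978):
        print(pipeline_suffix(d))
```

Prepending each year's prefix from the years file to these suffixes would give a list with no entries to delete afterwards.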

To download with wget

wget -i all_garfield

To pause, press Ctrl-C.
To resume, type:

wget -nc -i all_garfield


Recitation of the whole Quran.

Now, to download Yasser Salama's recitation of the Quran from Islamway.com

Now yasser_list has the urls of all the Suras.

To download with wget

wget -i yasser_list

To pause, press Ctrl-C.
To resume, type:

wget -nc -i yasser_list

Note that when you kill wget with Ctrl-C and then resume the list with -nc, the partially downloaded file (the one in progress when you pressed Ctrl-C) is assumed to be complete, so wget skips it and starts downloading the next file on the list.

Comments

Alaa's picture

nobody judge the code

this was a very quick hack to solve a problem while demonstrating the power of software as tools concept (pipes and redirects as part of the design), the code is naive.

I've used these two nifty tools thousands of times, but of course no tool was really needed, gnu seq combined with other tools can do the same, languages like awk are perfect for this, etc.

to quote from my original email, here is a cool thing you can do with series and nseries. so let's say I want to generate all three-digit octal numbers and their decimal values:

./series 0 7 1 '' '' | ./series 0 7 1 '' | ./series 0 7 1 '=' | ./nseries 0 1 ''

cool eh
Alaa


"context is over-rated. who are you anyway?"

Pastebin deleted

Seems that pastebin deleted the source code of both tools that were posted there, it's probably autodeleted after several hours or so.

As noted, I couldn't put the code here. I've tried the pre and code tags and it didn't work.

Alaa's picture

reinstalled code filter

please leave the book page even if the result is b0rked so we can debug. did I say we need an upgrade? Alaa
"context is over-rated. who are you anyway?"

Still borked

here are they:

What's "now"?

Alaa's picture

fixed using temporary hack

the 4.6 wiki module doesn't have this bug, the real solution is to upgrade eglug to 4.6

we have to do that soon anyways because once drupal 4.7 gets released 4.5 will not be supported anymore.

Alaa


"context is over-rated. who are you anyway?"

whirlpool's picture

OMG!!!

man curl

