F.A.Q. (Frequently Asked Questions)

Why do you crawl the .dk domain?
In order to obtain statistics on what webservers are used in Denmark.
Why does that interest you?
Well. It just does.
Why don't you just look at Netcraft's survey?
Their data isn't specific to Denmark (but I do study their data with interest).
Why don't you look at E-soft's survey then?
Their sample of the Danish domains is quite small, less than 10%. My data does seem to agree with theirs, though.
How often do you crawl .dk?
Once a month, at the start of the month.
Why not more often?
The crawl does take up a fair amount of bandwidth and memory, and the machine it runs on has other duties as well.
How long does it take, then?
Approximately 11 hours, with 20 crawlers running in parallel.
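As a rough illustration of running 20 crawlers in parallel, here is a minimal Python sketch. It is not the actual .dk-bot code (which is written in Perl); the function name crawl_all, the fetch callback and the use of a thread pool are assumptions made only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(hosts, fetch, workers=20):
    """Apply fetch(host) to every host using a pool of worker threads.

    `fetch` is a hypothetical callback that contacts one host and
    returns whatever should be recorded for it.
    """
    hosts = list(hosts)  # materialise once, since we iterate twice below
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs hosts with results
        return dict(zip(hosts, pool.map(fetch, hosts)))
```

With 20 workers, slow or unresponsive servers only hold up one worker each, which is probably the main reason for crawling in parallel in the first place.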
How many times does the crawler visit a server?
Well, that depends... Each crawler only visits a given IP address once, unless the server is running Microsoft's IIS.
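The visiting rule above could be sketched roughly like this in Python. The class and method names are made up for the example, and recognising IIS by looking for "Microsoft-IIS" in the Server header is an assumption about how the check might be done, not a description of .dk-bot itself.

```python
class VisitPolicy:
    """Decide whether an IP address should be contacted (again)."""

    def __init__(self):
        self.seen = set()  # IP addresses contacted at least once
        self.iis = set()   # IP addresses whose server identified as IIS

    def should_visit(self, ip):
        # First contact is always allowed; repeats only for IIS servers.
        return ip not in self.seen or ip in self.iis

    def record(self, ip, server_header):
        # Remember the visit and note whether the server looks like IIS.
        self.seen.add(ip)
        if "microsoft-iis" in server_header.lower():
            self.iis.add(ip)
```

The idea is that a crawler calls should_visit() before contacting the address behind a domain, and record() after each response.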
Why do you visit servers running IIS more often?
Because IIS doesn't tell the crawler what modules it uses in its Server header. So, to check for PHP, you have to check every domain.
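To show what "telling the crawler what modules it uses" looks like, here is a small Python sketch that splits a Server header into product/version pairs. The two header strings in the usage note are typical examples rather than data from the crawl, and the function names are hypothetical.

```python
import re


def server_products(server_header):
    """Split a Server header into {product name (lowercase): version}."""
    # Drop parenthesised comments such as "(Unix)" before splitting.
    without_comments = re.sub(r"\([^)]*\)", " ", server_header)
    products = {}
    for token in without_comments.split():
        name, _, version = token.partition("/")
        products[name.lower()] = version
    return products


def advertises_php(server_header):
    """True if the Server header itself lists PHP as a product."""
    return "php" in server_products(server_header)
```

A header like "Apache/1.3.26 (Unix) PHP/4.2.2" lists PHP directly, so one visit per IP address is enough. A header like "Microsoft-IIS/5.0" lists only the server, so PHP has to be looked for on each domain separately.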
What pages does the crawler fetch from a server?
Only the front page, and the crawler only asks the webserver how big the page is, and when it was last modified.
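Asking only for the size and modification time of a page is what an HTTP HEAD request does, so a sketch of such a request in Python might look like the following. That the crawler uses HEAD is an assumption based on the description above; the function names, the port and the timeout are chosen for the example.

```python
import http.client


def summarize(headers):
    """Pick out the header fields of interest from a mapping of headers."""
    size = headers.get("Content-Length")
    return {
        "size": int(size) if size and size.isdigit() else None,
        "last_modified": headers.get("Last-Modified"),
        "server": headers.get("Server"),
    }


def head_front_page(host, timeout=10):
    """Send HEAD / to the host and return the summarized headers.

    A HEAD response carries the headers only, never the page body.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return summarize(response.headers)
    finally:
        conn.close()
```

Since no page body is transferred, each visit costs only a few hundred bytes of traffic in each direction.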
How much bandwidth does the crawl consume?
Something like 600-700 MB, I think.
What software do you use?
A tool I've written myself (.dk-bot), in Perl. The source code is available here.

Last updated: 2002-08-11 02:57