Tuesday, 2 September 2008
A new version of SocSciBot 4 is online (on the SocSciBot web site) with a new "multiple crawl" mode. In this mode, you give it a set of home pages and it crawls all of the sites simultaneously. This should make it easier to run projects involving many crawls of small web sites. The maximum total number of URLs across all crawls is 900,000, with approximately 15,000 per individual site, so it is probably not a good idea to crawl more than 40 sites unless they are all small. Also, it may take a long time to crawl many sites!
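To check whether a planned set of sites fits within these limits, the arithmetic can be sketched in a few lines of Python. This is only a rough planning aid, not part of SocSciBot: the names `MAX_TOTAL_URLS`, `MAX_URLS_PER_SITE` and `crawl_budget` are made up for illustration, the two limits are the approximate figures quoted above, and the sketch assumes that each site's crawl simply stops at the per-site limit.

```python
# Approximate limits quoted in the post; not read from SocSciBot itself.
MAX_TOTAL_URLS = 900_000
MAX_URLS_PER_SITE = 15_000


def crawl_budget(estimated_urls_per_site):
    """Return (total_urls, fits) for a list of estimated site sizes.

    Hypothetical helper: assumes each site contributes at most
    MAX_URLS_PER_SITE URLs to the overall total.
    """
    capped = [min(n, MAX_URLS_PER_SITE) for n in estimated_urls_per_site]
    total = sum(capped)
    return total, total <= MAX_TOTAL_URLS


# Worst case: number of sites that fit if every site reaches the per-site cap.
worst_case_sites = MAX_TOTAL_URLS // MAX_URLS_PER_SITE
print(worst_case_sites)  # 60

# Example: 40 large sites, each assumed to reach the cap.
print(crawl_budget([20_000] * 40))  # (600000, True)
```

On these figures, 60 sites that all reach the per-site limit would use up the whole total, so the 40-site guideline leaves some room to spare. Replace the example list with your own size estimates to see how close a project comes to the total.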