Saturday, 8 March 2008

Too slow and freezing when downloading some pages

SocSciBot 4 is buggy at the moment, sorry. It is too slow, but this is corrected in the newest version, which should be online in the week of 10 March 2008. It also freezes when downloading some pages; this problem hasn't been solved yet, but a post will go here when it has. I think this only affects about 1 in 200 web sites, but I am not sure yet.

5 comments:

Mike Thelwall said...

The new SocSciBot version that is online now fixes the freezing problem. It was caused by attempting to download non-HTML pages as if they were HTML pages.
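The fix amounts to checking what kind of file was downloaded before treating it as HTML. Here is a minimal Python sketch of that idea. It assumes the decision is made from the HTTP Content-Type header (SocSciBot's actual check isn't described in this post), and `is_html` and `extract_links` are hypothetical names:

```python
"""Sketch (not SocSciBot's code): only pass a downloaded page to the HTML
link extractor if its Content-Type header says it is HTML."""

from html.parser import HTMLParser
from typing import List, Optional


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_html(content_type: Optional[str]) -> bool:
    """True if a Content-Type header value denotes an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def extract_links(body: bytes, content_type: Optional[str]) -> List[str]:
    """Return links from an HTML page; return [] for non-HTML (PDFs, images...)."""
    if not is_html(content_type):
        return []
    parser = LinkExtractor()
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return parser.links
```

Skipping non-HTML content at this point means a PDF or image is never handed to the HTML parser.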

Martin said...

Dear Mr Thelwall,

After testing SSB4 on many different HTML pages, I now wanted to try out your MySpace functions with a friend. Unfortunately, we can't really get it started. Where is the right place to start? Crawling one MySpace profile, or many one after another? What is the super-advanced function for? We don't have the needed files in the default folder.
And the most important question for us: will SSB4 analyse each MySpace profile on its own, or will it also give the link structure between the profiles, where a link stands for a relationship with a friend (friend list entries)?
Thanks for your help!
Sincerely yours, Martin and Jörg

Mike Thelwall said...

Dear Martin,

The MySpace feature is really just for my own use. It does not do link analysis; if you want this, then MySpace/Perl can do it. Please only use SocSciBot 4 for MySpace if it is part of a MySpace research project, as it may use up a lot of MySpace server time.
If you want to try SocSciBot out, then:
1) Generate your own list of URLs of MySpace profile pages.
2) Create a new project and set it up to crawl "www.xyzxyz.com" (but don't click on the crawl button).
3) Use the list from step 1 as the start.txt file of additional URLs to crawl, check the option to use an additional list of URLs to crawl, and click the start crawl button.
4) When the crawl has finished, set up a second crawl in the same project for the site http://www.yyyyy.com/ and again don't click on the crawl site button.
5) Access the MySpace analysis tools from the GoTo menu in SocSciBot4. There are lots of options, and they all work by processing all the profiles saved in step 3.
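Step 1 leaves the list format open. Below is a small Python sketch of preparing it. It assumes start.txt is a plain-text file with one URL per line, which is not confirmed above. `write_start_file` is a hypothetical helper, not part of SocSciBot:

```python
"""Hypothetical helper for step 1: tidy a list of profile URLs and save it
as start.txt. ASSUMPTION: the file format is one URL per line."""

from pathlib import Path
from typing import Iterable, List


def write_start_file(urls: Iterable[str], path: str = "start.txt") -> List[str]:
    """Drop blank lines and duplicates (keeping order), write one URL per line."""
    seen = set()
    cleaned: List[str] = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

Removing duplicates before the crawl avoids asking the MySpace servers for the same profile twice.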
Best wishes,
Mike

Martin said...

Dear Mr. Thelwall,

Is it possible, or will it be possible in the future, to set up a list of URLs to crawl, not in the way you describe in the FAQs, but so that the crawler automatically adds the URLs from the list as new crawls in a project? Then you could tell it to crawl, for example, 5 URLs from my list in a project, and SocSciBot would start with the first one, just as I did manually, and when it is done it would continue with the next URL. As a result, I would get a project with 5 individually crawled URLs in it. This would be great.
Sincerely yours

Martin Klaus

Mike Thelwall said...

Dear Martin,

I have just posted a new version of SocSciBot 4 that should do what you want. It has a new "multiple crawl" mode: in this mode you can give it a set of home pages, and it will run crawls of all the sites simultaneously. To minimise server load, it does not crawl the sites one after the other but swaps between sites during the crawl. I hope this helps.
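Swapping between sites during the crawl can be sketched as a round-robin scheduler. This is a minimal Python illustration under stated assumptions, not SocSciBot's actual code. It keeps one queue per site, fetches one page from a site, then moves on to the next site. `crawl_interleaved`, `fetch_links` and `max_pages_per_site` are hypothetical names, and the page download is left to the caller:

```python
"""Sketch of an interleaved multi-site crawl (hypothetical, not SocSciBot's
code): one URL queue per site, one page per site per turn."""

from collections import deque
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit


def crawl_interleaved(
    home_pages: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    max_pages_per_site: int = 100,
) -> Dict[str, List[str]]:
    """Crawl several sites at once, switching site after every page.

    fetch_links(url) is supplied by the caller: it downloads url and
    returns the links found on it. Returns visited pages grouped by host.
    """
    queues: Dict[str, deque] = {}
    seen: Dict[str, set] = {}
    visited: Dict[str, List[str]] = {}
    for url in home_pages:
        host = urlsplit(url).netloc.lower()
        queues.setdefault(host, deque()).append(url)
        seen.setdefault(host, set()).add(url)
        visited.setdefault(host, [])

    active = deque(queues)  # round-robin order of sites still being crawled
    while active:
        host = active.popleft()
        queue = queues[host]
        if not queue or len(visited[host]) >= max_pages_per_site:
            continue  # this site is finished; drop it from the rotation
        url = queue.popleft()
        visited[host].append(url)
        for link in fetch_links(url):
            # Stay within the same site, as a single-site crawl would.
            if urlsplit(link).netloc.lower() == host and link not in seen[host]:
                seen[host].add(link)
                queue.append(link)
        active.append(host)  # back of the line: the next site gets a turn
    return visited
```

With two home pages, the requests alternate between the two servers instead of finishing one site before starting the next. A real crawler would typically also enforce a minimum delay per server, because interleaving spreads requests out but does not by itself fix a request rate.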

Best wishes,
Mike