Blog for users of SocSciBot4 to post reports of problems with using SocSciBot4 and software bugs. Please log issues by adding a comment to the most recent post on this blog.
Hi, I would like to know how SocSciBot works. Is it possible to provide the data model of SocSciBot? I have used it to conduct an analysis of French university websites. However, my tutor will not allow me to present my thesis unless I explain how SocSciBot works (any parser used, formulas, UML models, etc.). I would appreciate your feedback. Many thanks
Hi Naouel, This is a tricky one. I don't have any documentation other than what is on the web site. The only paper that describes much of SocSciBot is the one below, but it describes an old version. I can send it to you if you send me a direct email: m.thelwall at wlv.ac.uk. Thelwall, M. (2001). A web crawler design for data mining, Journal of Information Science 27(5), 319-325. If this isn't enough and you really need more, then you could have the dot net source code, just for the purpose of analysing it. Best wishes, Mike
Dear Mr. Thelwall, how do I cite your SocSciBot 4 program correctly? How would you like users to cite you if they use your crawler? Thanks in advance, L
Please could you cite the following paper? Although it is about a different version of SocSciBot, I think that it is the most suitable publication about it. Best wishes, Mike
Whoops - here is the paper to cite. Thelwall, M. (2001). A web crawler design for data mining, Journal of Information Science 27(5), 319-325.
Thank you for your fast answer! L
Dear Mr. Thelwall, First I would like to wish you a happy new year and hope you are doing fine. Then I was wondering if it would be possible to crawl just the linking structure between a list of URLs, without also crawling the pages with all their content. The crawling would then run much faster for long lists, since it wouldn't need to save the web content.
Happy new year to you too and I hope you are well. Sorry, but it isn't possible to crawl just the links and not the pages. This is because the links are inside the web pages, so the only way to find them is to download the web pages and then search through the pages to find the links in them. This is how SocSciBot works. Best wishes, Mike
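To illustrate the point above, here is a minimal sketch in Python (not SocSciBot's own code, which is .NET). It assumes the page has already been downloaded; the `LinkExtractor` class, the `extract_links` helper and the example URLs are all hypothetical names chosen for illustration. It shows that the links only become visible once the page's HTML is in hand and has been parsed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one page (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(page_html, base_url):
    """Return every link found in the HTML of a page that was already downloaded."""
    parser = LinkExtractor(base_url)
    parser.feed(page_html)
    return parser.links


# In a real crawl, page_html would be the body of the downloaded page.
sample_html = '<p><a href="/about.html">About</a> <a href="http://example.org/">Ext</a></p>'
links = extract_links(sample_html, "http://example.com/index.html")
print(links)
```

A crawler could choose not to keep the page content after extracting the links, but it still has to download each page first.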
Dear Mr. Thelwall, when trying to analyse 233 crawled websites I get the following (German) error message: "Strange error. Der index war außerhalb des Arraybereichs". An English translation might be "The index was outside the array" (?). The header of the window reads "i_lookupDomainNameOrSiteNameFromFileName..". After okaying it, I only get the first three reports as results but not the 'from - to' reports. Are there too many websites, or does SocSciBot not find the domain names file? Is there any fix? Thanks in advance for your help. Best regards, Jill
Sorry Jill, this is an error in the program, I think. Please download a new version of SocSciBot from here: http://cid-63ea9dd591e1bebb.skydrive.live.com/self.aspx/Public/SocSciBot4.zip (available for a few weeks here), then unzip it and try this one out. You will need to recalculate the reports to see the missing ones. To do this, view the reports in SocSciBot4 Tools and then use the Recalculate option from the File menu. If you still have problems, please let me know the new error message and send me the file \info\CrawlFileNames.txt. Best wishes, Mike
Dear Mr. Thelwall, Thanks for your quick help! The one error message is gone but now a new one appears, for instance: “Error! Cant find site blogs.euobserver.com/mahony in URL atlanticreview/archives/1389-progress-in-the-balkans.html”. It also comes up with other websites. I emailed the file ‘CrawlFileNames’ to you. Best regards, Jill
Hi Jill, Sorry for this additional problem. I can't find out what this problem is, but the CrawlFileNames file should help. It hasn't arrived by email, so maybe my server has blocked it. Please could you try copying the contents of the file into an email, or email it to my work address, which is m.thelwall --@-- wlv.ac.uk. Best wishes, Mike
Dear Sir, I tried to use the same data that you have given in tutorial 1 to understand the work process. But while analyzing the data, at "Step 5: Viewing link analysis reports about the project of three sites with SocSciBot Tools", I clicked Yes (standardized home page file name) and a pop-up showed an error: "An unhandled exception has occurred in your application if you click continue ....the process cannot access the file....". I kindly request you to suggest how to overcome the problem. Thanking you in anticipation
Dear Vasanth, Thanks for your message and sorry for this problem. It might be that the crawling part of SocSciBot is still going somewhere in your computer. Please can you try switching the computer off and on again and then trying the analysis again as soon as the computer is on? If this doesn't work, please email me a list of the web sites you are crawling. Best wishes, Mike
Dear Sir, Thanks for the reply. I have e-mailed the list of websites. Thank you very much
Dear Mr. Thelwall, First I would like to wish you a happy new year and hope you are doing fine. I tried to use the same data that you have given in tutorial 1 to understand the work process. But the software created only these main reports:
• Page and link counts
• All external links
• Known external links with count
• Known external links
• Unknown external links with count
• Unknown external links
What could be the problem? Thanks in advance for your help. Best regards, Jörg
Hi Jörg, Thanks for your comment, happy new year and sorry for this problem. Have you checked the other tabs in SocSciBot Tools? Some of the other reports are available via the other tabs. Some of the reports, the network diagrams, also have to be created by clicking an extra button. If you have done this and there are missing reports anyway, please can you list some missing reports? Best wishes, Mike
Hi Maik, Yes, I checked the other tabs, but I am referring to the tab „Main Reports“, and according to the tutorial 1 screenshot the following reports are missing:
• ADM count summary
• directory document counts from-to
• domain document counts from-to
• file document counts from-to
• site document counts from-to
Best regards, Jörg
sorry, MIKE...this was German (Maik)...jörg
Hi Jörg, Thanks for this extra information. This is my fault - I changed the names of the files and forgot to change the tutorial. The reports called ...from-to have now been renamed to ...interlinking, reports called Known... have been changed to Selected... and reports called Unknown... have been changed to Unselected.... I hope that this makes sense and sorry for the confusion. Best wishes, Mike
Hi Mike, SocSciBot 4 only created the 6 reports mentioned in my first post and no others. There are still 5 reports missing compared to the screenshot (mentioned before). Even if you renamed some of the reports, there should still be 11 reports, right? What I am looking for is the connections (in- and out-links) between the URLs I crawled before with SocSciBot. I think this information was given in a domain-domain report in an older version of your beautiful software. Unfortunately I can't find this and some other information in the 6 reports which are given in your newest version. Could you provide me with an older version of SocSciBot, or tell me where and how I can find the information I need? Thank you very much, Jörg
Dear Jörg, Thanks for your message. The number of reports should be the same in all versions of SocSciBot. Did it crash when it was processing the data, or did anything else strange happen? Are the missing reports for your own data or for the tutorial? I am away now but will answer your reply on Sunday and try to think of a solution. Best wishes, Mike
Dear Jörg, I have just run the tutorial again myself and got all 11 reports in the Main Reports tab. I can't think why the other reports are not there unless the program crashed. Please can you crawl the sites again or ask SocSciBot to recalculate all the reports (the third option in the File menu when viewing the link analysis reports). Best wishes, Mike
Hi Mike, I have also run the tutorial again and there was no crash. I recalculated all the reports, but there are only 5 in the tab "Main Reports". What could be the next step? Best wishes, Jörg
Dear Jörg, I am really sorry, this is due to an error in my program. Please can you download a new version of the program from here: http://cid-63ea9dd591e1bebb.office.live.com/self.aspx/Public/SocSciBot4Free25012012.exe and then run the tutorial again. Best wishes, Mike
Sorry, here is the URL again - the ending was cut off http://cid-63ea9dd591e1bebb.office.live.com/self.aspx/Public/SocSciBot4Free25012012.exe
Dear Mike, I have run the tutorial again with the new version and now there are 11 reports in the "Main Report" tab. Thank you very much for your help. Best wishes, Jörg
Hi Mike, I didn't get any results in the ADM Count Summary. Will you please help me? In my ADM Count Summary Excel file, all columns show a count of 0 (zero).
Hello Svetal, The problem is that your web sites don't link to each other and this is why the results are all zero. Only one of the four universities has a lot of links and it doesn't link to the other three. Best wishes, Mike
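As a rough illustration of why the counts come out as zero, here is a small Python sketch. It is not how SocSciBot computes its reports; the `site_of` and `count_intersite_links` helpers and the four `*.example` sites are made up for this example. It only counts a link when both its source and its target belong to the crawled set of sites and the two sites differ.

```python
from collections import Counter
from urllib.parse import urlparse


def site_of(url):
    """Treat the host name as the site (a simplifying assumption)."""
    return urlparse(url).netloc.lower()


def count_intersite_links(links, sites):
    """Count links whose source and target are different sites within the crawled set."""
    counts = Counter()
    for source_url, target_url in links:
        source, target = site_of(source_url), site_of(target_url)
        if source in sites and target in sites and source != target:
            counts[(source, target)] += 1
    return counts


# Four hypothetical crawled sites; one has several links, none point to another crawled site.
sites = {"a.example", "b.example", "c.example", "d.example"}
links = [
    ("http://a.example/", "http://a.example/courses.html"),  # internal link
    ("http://a.example/", "http://elsewhere.example/"),      # link to an uncrawled site
    ("http://b.example/", "http://b.example/staff.html"),    # internal link
]
counts = count_intersite_links(links, sites)
total = sum(counts.values())
print(total)
```

With data like this, every site-to-site count is zero even though the sites contain links, because none of those links go from one crawled site to another.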