Friday 22 May 2009

SocSciBot help manuals

SocSciBot 4 now has extended tutorials and a FAQ online. Its graph-drawing tool, SocSciBot Network, also has an online manual.

29 comments:

Unknown said...

Hi,
I would like to know how SocSciBot works. Is it possible to provide the data model of SocSciBot?
I have used it to analyse French university websites. However, my tutor will not allow me to present my thesis unless I explain how SocSciBot works (any parser used, formulas, UML models, etc.). I would appreciate your feedback. Many thanks.

Mike Thelwall said...

Hi Naouel,
This is a tricky one. I don't have any documentation other than what is on the web site. The only paper that describes much of SocSciBot is this one, but it describes an old version. I can send it to you if you send me a direct email at m.thelwall at wlv.ac.uk.
Thelwall, M. (2001). A web crawler design for data mining, Journal of Information Science 27(5), 319-325.
If this isn't enough and you really need more, then you could have the .NET source code, just for the purpose of analysing it.
Best wishes,
Mike

Anonymous said...

Dear Mr. Thelwall,

How do I cite your SocSciBot 4 program correctly? How would you like users to cite you if they use your crawler?
Thanks in advance,
L

Mike Thelwall said...

Please could you cite the following paper? Although it is about a different version of SocSciBot I think that it is the most suitable publication about it.
Best wishes,
Mike

Mike Thelwall said...

Whoops - here is the paper to cite.
Thelwall, M. (2001). A web crawler design for data mining, Journal of Information Science 27(5), 319-325.

Anonymous said...

Thank you for your fast answer!
L

Anonymous said...

Dear Mr. Thelwall,

First I would like to wish you a happy new year and hope you are doing fine.
Then I was wondering whether it would be possible to crawl just the linking structure between a list of URLs, without also crawling the pages and all their content. The crawl would then run much faster for long lists because it would not need to save the web content.

Mike Thelwall said...

Happy new year to you too and I hope you are well.

Sorry but it isn't possible to crawl just the links and not the pages. This is because the links are inside the web pages so the only way to find them is to download the web pages and then search through the pages to find the links in them. This is how SocSciBot works.

Best wishes,
Mike
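
The point in the reply above, that links can only be found by downloading a page and searching through it, can be illustrated with a minimal Python sketch. This is not SocSciBot's code (SocSciBot is a .NET program); the class name `LinkExtractor`, the function `extract_links`, and the example URLs are hypothetical, and the page text is supplied as a string standing in for a downloaded page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative hrefs into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets in the given page text (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# The page text must already be in hand before any link can be found.
page = '<p><a href="/about.html">About</a> <a href="http://example.org/">Other</a></p>'
links = extract_links(page, "http://example.com/index.html")
```

The links only exist as text inside the page body, so a crawler has to fetch each page before it can see them, although it does not necessarily have to keep the page afterwards.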

Anonymous said...

Dear Mr. Thelwall,

when trying to analyse 233 crawled websites I get the following (German) error message:
"Strange error. Der index war außerhalb des Arraybereichs"
An English translation might be "The index was outside the array" (?).
The header of the window reads "i_lookupDomainNameOrSiteNameFromFileName.."
After okaying it, I only get the first three reports as results but not the 'from - to' reports.

Are there too many websites or does SocSciBot not find the domainnames file? Is there any fix?

Thanks in advance for your help.
Best regards.
Jill

Mike Thelwall said...

Sorry Jill, this is an error in the program I think. Please download a new version of SocSciBot from here
http://cid-63ea9dd591e1bebb.skydrive.live.com/self.aspx/Public/SocSciBot4.zip

(available here for a few weeks), then unzip it and try it out. You will need to recalculate the reports to see the missing ones. To do this, view the reports in SocSciBot4 Tools and then use the Recalculate option from the File menu. If you still have problems, please let me know the new error message and send me the file \info\CrawlFileNames.txt.
Best wishes,
Mike

Anonymous said...

Dear Mr. Thelwall,

Thanks for your quick help! The one error message is gone but now a new one appears, for instance:
“Error! Cant find site blogs.euobserver.com/mahony in URL atlanticreview/archives/1389-progress-in-the-balkans.html”.

It also comes up with other websites. I emailed the file ‘CrawlFileNames’ to you.

Best regards,
Jill

Mike Thelwall said...

Hi Jill,
Sorry for this additional problem. I can't find out what this problem is but the CrawlFileNames file should help. It hasn't arrived by email so maybe my server has blocked it. Please could you try copying the contents of the file to an email or email it to my work address, which is m.thelwall --@-- wlv.ac.uk.
Best wishes,
Mike

Vasanth said...

Dear Sir,

I tried to use the same data that you give in Tutorial 1 to understand the work process. But while analyzing the data in
Step 5: Viewing link analysis reports about the project of three sites with SocSciBot Tools,
I clicked Yes (standardized home page file name) and a pop-up showed an error:

"An unhandled exception has occurred in your application if you click continue ....
the process cannot access the file...."
Could you kindly suggest how I can overcome this problem?

Thanking you in anticipation

Mike Thelwall said...

Dear Vasanth,
Thanks for your message and sorry for this problem. It might be that the crawling part of SocSciBot is still going somewhere in your computer. Please can you try switching the computer off and on again and then trying the analysis again as soon as the computer is on? If this doesn't work, please email me a list of the web sites you are crawling.
Best wishes,
Mike

Vasanth said...

Dear Sir,

Thanks for the reply. I have e-mailed the list of websites.
Thank you very much

Jörg said...

Dear Mr. Thelwall,
First I would like to wish you a happy new year and hope you are doing fine.
I tried to use the same data that you have given in the tutorial-1 for understanding the work process.
But the software created only these main reports:
• Page and link counts
• All external links
• Known external links with count
• Known external links
• Unknown external links with count
• Unknown external links
What could be the problem?
Thanks in advance for your help.
Best Regards.
Jörg

Mike Thelwall said...

Hi Jörg,
Thanks for your comment, happy new year and sorry for this problem.
Have you checked the other tabs in SocSciBot Tools? Some of the other reports are available there. Some of the reports, the network diagrams, also have to be created by clicking an extra button. If you have done this and there are still missing reports, please can you list some of them?
Best wishes,
Mike

Jörg said...

Hi Maik,
Yes, I checked the other tabs, but I am referring to the "Main Reports" tab. According to the Tutorial 1 screenshot, the following reports are missing:
ADM count summary
directory document counts from-to
domain document counts from-to
file document counts from-to
site document counts from-to

Best Regards.
Jörg

Jörg said...

sorry, MIKE...this was German (Maik)...jörg

Mike Thelwall said...

Hi Jörg,
Thanks for this extra information. This is my fault: I changed the names of the files and forgot to change the tutorial.
The reports called "... from-to" have been renamed to "... interlinking", reports called "Known ..." have been changed to "Selected ...", and reports called "Unknown ..." have been changed to "Unselected ...".
I hope that this makes sense and sorry for the confusion.
Best wishes,
Mike

Jörg said...

Hi Mike,
SocSciBot 4 only created the 6 reports mentioned in my first post and no others. Compared to the screenshot, 5 reports (mentioned before) are still missing. Even if you renamed some of the reports, there should still be 11 reports, right?
What I am looking for is the connections (in- and out-links) between the URLs I created before with SocSciBot. I think this information was given in a domain-domain report in an older version of your beautiful software. Unfortunately, I can't find this and some other information in the 6 reports given in your newest version. Could you provide me with an older version of SocSciBot or tell me where and how I can find the information I need?
Thank you very much,
Jörg

Mike Thelwall said...

Dear Jörg,
Thanks for your message. The number of reports should be the same in all versions of SocSciBot. Did it crash when it was processing the data or did anything else strange happen? Are the missing reports for your own data or for the tutorial? I am away now but will answer your reply on Sunday and try to think of a solution.
Best wishes,
Mike

Mike Thelwall said...

Dear Jörg,
I have just run the tutorial again myself and got all 11 reports in the Main Reports tab. I can't think why the other reports are not there unless the program crashed. Please can you crawl the sites again or ask SocSciBot to recalculate all the reports (the third option in the File menu when viewing the link analysis reports).
Best wishes,
Mike

Jörg said...

Hi Mike,
I have also run the tutorial again and there was no crash. I recalculated all the reports, but there are still only 5 in the "Main Reports" tab. What could be the next step?
Best wishes,
Jörg

Mike Thelwall said...

Dear Jörg,
I am really sorry, this is due to an error in my program. Please can you download a new version of the program from here
http://cid-63ea9dd591e1bebb.office.live.com/self.aspx/Public/SocSciBot4Free25012012.exe
and then run the tutorial again.
Best wishes,
Mike

Mike Thelwall said...

Sorry, here is the URL again - the ending was cut off
http://
cid-63ea9dd591e1bebb.office.live.com/
self.aspx/
Public/
SocSciBot4Free25012012.exe

Jörg said...

Dear Mike,
I have run the tutorial again with the new version, and now there are 11 reports in the "Main Reports" tab.
Thank you very much for your help.
Best wishes,
Jörg

idealscience said...

Hi Mike
I didn't get an ADM Count Summary result. Will you please help me? In my ADM Count Summary Excel file, every column shows a count of 0 (zero).

Mike Thelwall said...

Hello Svetal,
The problem is that your web sites don't link to each other and this is why the results are all zero. Only one of the four universities has a lot of links and it doesn't link to the other three.
Best wishes,
Mike
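
To illustrate why interlinking counts come out as zero when the crawled sites do not link to each other, here is a minimal Python sketch. It is not SocSciBot's actual calculation: the function names `site_of` and `interlink_counts` are hypothetical, a "site" is assumed to be simply the URL's hostname, and the example data is invented.

```python
from collections import Counter
from urllib.parse import urlparse


def site_of(url):
    """Assumption for this sketch: a 'site' is just the URL's hostname."""
    return urlparse(url).hostname


def interlink_counts(links, sites):
    """Count links that go from one crawled site to a different crawled site.

    links: iterable of (source_url, target_url) pairs found during a crawl.
    sites: set of hostnames that were crawled.
    """
    counts = Counter()
    for source, target in links:
        s, t = site_of(source), site_of(target)
        # Internal links and links to uncrawled sites are not interlinks.
        if s in sites and t in sites and s != t:
            counts[(s, t)] += 1
    return counts


sites = {"a.example", "b.example", "c.example"}
links = [
    ("http://a.example/index.html", "http://a.example/staff.html"),  # internal link
    ("http://a.example/index.html", "http://elsewhere.example/"),    # target not crawled
]
counts = interlink_counts(links, sites)  # no pair of crawled sites is linked
```

With this data every from-to count is zero even though one site has links, because none of them point to another site in the crawled set.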