[Mycroft] [Statistics] December overall

Ricky ricky at linuxbourg.ch
Thu Jan 1 20:46:22 EST 2004


Hello folks

First of all, Happy New Year and long live the Mycroft project! By the 
way, did you know the project is already 3 years old? It started on 
December 15, 2000, and was called "Sherlock" at that time.

OK, here are the stats. I am still trying to figure out what is really 
interesting, but here we go. I hope to hear some comments from you on how 
to improve them. Don't forget, statistics are not for the love of 
numbers, but for improving our service.

These stats include the last week of November and data up to Dec. 28.

General summary
   From                  24.11.2003 14:32:43
   To                    28.12.2003 23:59:53
   Duration                        825:27:10
   Searches/Hour         91.88

Total Searches         75840    100%
   0 result              7674     10%
   1-15 results         15284     20%
   16-30 results         1559      2%
   31+ results           4935      7%
   QuickLink            46276     61%

Search for a name      70806     93%    100%
   10 most frequent     49964     66%     71%
   Preprocessed          1900      3%      3%
   Cache                11146     15%     16%
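
For anyone who wants to re-check the summary, here is a minimal sketch of
the arithmetic, using only the figures quoted above. It assumes that the
first percentage column is relative to the total number of searches and
the second one to the searches for a name; the helper name pct is
hypothetical and not part of any Mycroft tool.

```python
from datetime import datetime

# Figures copied from the summary above.
start = datetime(2003, 11, 24, 14, 32, 43)
end = datetime(2003, 12, 28, 23, 59, 53)
total_searches = 75840
name_searches = 70806
cache_hits = 11146

# Duration as hours:minutes:seconds, with hours allowed to exceed 24.
total_seconds = int((end - start).total_seconds())
hours, rem = divmod(total_seconds, 3600)
minutes, seconds = divmod(rem, 60)
duration = f"{hours}:{minutes:02d}:{seconds:02d}"

searches_per_hour = round(total_searches / (total_seconds / 3600), 2)


def pct(part, whole):
    """Share of `whole` taken by `part`, as a whole-number percentage."""
    return round(100 * part / whole)


print(duration, searches_per_hour)
# Assumed meaning of the two columns: share of all searches,
# then share of the searches for a name.
print(pct(cache_hits, total_searches), pct(cache_hits, name_searches))
```

With these inputs the duration, the searches per hour and the two cache
percentages come out the same as in the summary.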

Top 75% searches       53199             75%
   1.  google           10351             15%
   2.  dictionary        9478             13%
   3.  yahoo             5591              8%
   4.  imdb              4375              6%
   5.  astalavista       3835              5%
   6.  ebay              3764              5%
   7.  altavista         3659              5%
   8.  amazon            3385              5%
   9.  alltheweb         3196              5%
   10. leo               1554              2%
   11. wikipedia          916              1%
   12. dogpile            915              1%
   13. labourstart        462              1%
   14. java               268              0%
   15. google.de          253              0%
   16. php                216              0%
   17. freshmeat          215              0%
   18. teoma              192              0%
   19. msn                150              0%
   20. flash              149              0%
   21. sourceforge        139              0%
   22. webster            136              0%
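
A minimal sketch of how such a cut-off could be computed, assuming "top
75%" means the shortest leading part of the ranking whose counts add up
to at least 75% of the searches for a name. The function name top_share
is hypothetical; the counts are the ones listed above.

```python
# Counts copied from the ranking above, already sorted in descending order.
ranked = [
    ("google", 10351), ("dictionary", 9478), ("yahoo", 5591),
    ("imdb", 4375), ("astalavista", 3835), ("ebay", 3764),
    ("altavista", 3659), ("amazon", 3385), ("alltheweb", 3196),
    ("leo", 1554), ("wikipedia", 916), ("dogpile", 915),
    ("labourstart", 462), ("java", 268), ("google.de", 253),
    ("php", 216), ("freshmeat", 215), ("teoma", 192),
    ("msn", 150), ("flash", 149), ("sourceforge", 139),
    ("webster", 136),
]
name_searches = 70806  # "Search for a name" in the summary


def top_share(entries, whole, share):
    """Return the shortest leading slice of `entries` covering `share` of `whole`."""
    selected, running = [], 0
    for name, count in entries:
        if running >= share * whole:
            break  # threshold already reached, stop before adding more
        selected.append((name, count))
        running += count
    return selected, running


selected, covered = top_share(ranked, name_searches, 0.75)
print(len(selected), covered)
```

Under that assumption the first 21 entries stay just below the threshold
and the 22nd one crosses it, which is consistent with the 22 entries and
the total of 53199 shown above.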

This top 75% can also be expressed in the following way. All the different 
ways of searching for one engine (for example "google", "google.de", 
"google.com") have been grouped together, even obvious misspellings like 
"goggle". All 22 entries above have been analyzed.

   1.  google           12338             17%
   2.  dictionary        9648             14%
   3.  yahoo             5928              8%
   4.  imdb              4420              6%
   5.  astalavista       3895              6%
   6.  ebay              3837              5%
   7.  altavista         3805              5%
   8.  amazon            3514              5%
   9.  alltheweb         3280              5%
   10. leo               1706              2%
   11. wikipedia          980              1%
   12. dogpile            972              1%
   13. labourstart        467              1%
   14. java               342              0%
   15. webster            298              0%
   16. php                257              0%
   17. freshmeat          231              0%
   18. teoma              227              0%
   19. msn                210              0%
   20. flash              184              0%
   21. sourceforge        170              0%

We can see from these figures that webster had enough alternate 
spellings (merriam-webster, m-w) to move up from rank 22 to rank 15.
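
The grouping described above could be sketched roughly as follows. This
is only an illustration, not the script actually used for these stats:
the alias table is hypothetical and contains just the spellings mentioned
in this mail, and the sample counts are made up to keep the example small.

```python
from collections import Counter

# Hypothetical alias table: only the spellings mentioned in this mail.
ALIASES = {
    "google.de": "google",
    "google.com": "google",
    "goggle": "google",
    "merriam-webster": "webster",
    "m-w": "webster",
}


def canonical(query):
    """Map a raw query string to the engine name it is counted under."""
    key = query.strip().lower()
    return ALIASES.get(key, key)


def grouped_counts(raw_counts):
    """Sum per-spelling counts into per-engine counts."""
    totals = Counter()
    for query, count in raw_counts.items():
        totals[canonical(query)] += count
    return totals


# Made-up counts, for illustration only (not taken from the logs).
sample = {"google": 10, "Google.de": 3, "goggle": 1, "webster": 4, "m-w": 2}
totals = grouped_counts(sample)
print(totals.most_common())
```

Keeping the aliases in a plain lookup table makes it easy to add a new
spelling whenever one shows up in the logs, without touching the counting
code.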

Top 10 bad query strings
   1.  shockwave           47   0.06%
       metager             47   0.06%
   3.  google.co.jp        30   0.04%
   4.  quicktime           29   0.04%
       gamefaqs            29   0.04%
   6.  ask.com             27   0.04%
       mycroft             27   0.04%
   8.  google              23   0.03%
       webcrawler          23   0.03%
   10. all the web         21   0.03%
       realplayer          21   0.03%
       apple               21   0.03%
       m-w.com             21   0.03%
       metager.de          21   0.03%
       cracks              21   0.03%

There is still a high ratio of strings that shouldn't be here in the 
first place: "shockwave", "quicktime", "realplayer", "apple" (referring 
to "Apple quicktime"?). Then there are the metager requests (ranked first 
if we add up "metager" and "metager.de": 68) and the gamefaqs requests, 
which seem to be the most relevant ones, but unfortunately, their site 
uses POST...


