17 November 2006 Changes (HHH)
-------------------------------

1. Action: Added URL end rule
   Added:  BadURL_WordEnds[i++]="sexe";
   Reason: Almost 600 reasons. The chance of a false positive is so
           remote I would have to see it to believe it.

           2232 sexe_Parts.txt
           2720 sexe_Starts_and_Ends.txt
            684 sexe_Passed_All_Rules.txt
           ------------------------------
           5636 total

           grep sexe\\. sexe_Passed_All_Rules.txt | wc -l
           591

2. Action: Added bad domain rule
   Added:  BadDomains[i++] = ".da.ru";
   Reason: It isn't just a porn domain. It does DNS wild-carding, so
           the count is misleading. At one time it would bite your
           machine.

           430 da.ru_Parts.txt
            81 da.ru_Starts_and_Ends.txt
           268 da.ru_Passed_All_Rules.txt
           ------------------------------
           779 total

3. Action: Added bad domain rule
   Added:  BadDomains[i++] = ".homestead.com";
   Reason: Not only is it a porn clearing house, there are some hosts
           that kill the browser, etc.

            84 homestead.com_Parts.txt
            12 homestead.com_Starts_and_Ends.txt
            60 homestead.com_Passed_All_Rules.txt
           --------------------------------------
           156 total

4. Action: *** CONTEMPLATED ***
   Added:  BadURL_WordStarts[i++]="hard[(b|c|e|p|s)]";
   Reason: I have been looking at these. Here is the count among our
           porn hosts:

           82 s * 52 c * 52 e * 36 b * 28 p ? 21 a

           I have run these hosts past DNS and approximately 65% of
           them were still alive. Of the ones that were alive (161),
           maybe 1/3 were parked. The rest can be handled by hosts
           file entries. I would like to say that the rule can be
           downgraded, but it seems the word appears more in the rest
           of the URL than in the host name.


17 November 2006 Unresolved False Positives (HHH)
-------------------------------------------------

1. Word:   "pink"
   Rules:  BadURL_WordStarts[i++]="pink";
           BadURL_WordEnds[i++]="pink";
   Reason: adisneyparks.disney.go.com/media/disneyparks/en_US/media/
               btn_pink_continue.gif
               btn_pink_login.gif
               btn_pink_sendpassword.gif
               btn_pink_submit.gif

           Remove both rules: 426 pink_Passed_All_Rules.txt
           Remove Start rule: 263 pink_Passed_All_Rules.txt
           Remove End rule:   139 pink_Passed_All_Rules.txt

           I could understand this *IF* they had the following:
               button_pink_continue.gif
               button_pink_login.gif
               button_pink_sendpassword.gif
               button_pink_submit.gif
           They don't, so why can't they have these instead?
               btn_pnk_continue.gif
               btn_pnk_login.gif
               btn_pnk_sendpassword.gif
               btn_pnk_submit.gif

           So I ask you, what should I do? Those passing are just
           host names. There is an awful lot of PINK behind the hosts
           at porn URLs.

2. Word:   "exposed"
   Rules:  BadURL_Parts[i++] = "exposed";
   Reason: www.hackinglinuxexposed.com/articles/20031231.html

           96 expos_Parts.txt
           70 expos_Starts_and_Ends.txt
           92 expos_Passed_All_Rules.txt
           ------------------------------
           258 total

           47 of the 92 passing are "exposed"; the others are
           "exposure", "expose", and some variations of "sexpost".
           Downgrading the rule does NOTHING. In fact, I am thinking
           of shortening it to "expose", which is usually used with
           the French pronunciation, eks-po-zay.

3. Word:   "girl" at portal.opera.com
   Rules:  BadURL_Parts[i++] = "dreamgirl";
           BadURL_Parts[i++] = "girlfriend";
           BadURL_Parts[i++] = "schoolgirl";
           BadURL_Parts[i++] = "teengirls";
           BadURL_WordStarts[i++]="girl";
           BadURL_WordEnds[i++]="girl";
   Reason: "girl" at portal.opera.com

           I have monitored my phttp.log for more than six weeks now.
           Here is what has shown up:

           http://www.estdomains.com/anacreon/images/homegirl.jpg
               (triggered by www.estdomains.com in hosts file)
           http://images.ig.com.br/homev8/novas/ic_girl18_box_novo.gif
               (triggered by images.ig.com.br in hosts file)
           http://www.kcsm.org/Reconnections/images/computer_girl.png
               (Going to kcsm.org takes you to:
                http://www.w3.org/Protocols/ )
               (I never would have known it without a grep through
                the logs!)
           www.clubhardball.com/templates/icons/search_girl.gif
               (porn site - Start or End rule)
           www.agentlemanschoice.com/images/join_girl.jpg
               (porn site - Start or End rule)

           All but the third one were either spy or porn domains.
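
For readers weighing the false positives above, here is a rough sketch of how the three rule kinds might behave. It is not the actual matcher, which these notes do not show. The helper names (matchesWordStart, matchesWordEnd, matchesPart) are hypothetical, and it assumes that a "word" is a run of letters and digits, so "_", "-", "." and "/" all count as separators. The Disney URL is the host path and file name from the "pink" entry joined into one string.

```javascript
// Hypothetical helpers; the real rule engine is not shown in these notes.
// Assumption: anything that is not a letter or digit separates words.
const SEP = "[^a-zA-Z0-9]";

// Start rule: pattern begins a word (start of URL or just after a separator).
function matchesWordStart(url, pattern) {
  return new RegExp("(^|" + SEP + ")" + pattern, "i").test(url);
}

// End rule: pattern finishes a word (end of URL or just before a separator).
function matchesWordEnd(url, pattern) {
  return new RegExp(pattern + "(" + SEP + "|$)", "i").test(url);
}

// Parts rule: plain substring match anywhere in the URL.
function matchesPart(url, pattern) {
  return url.toLowerCase().indexOf(pattern.toLowerCase()) !== -1;
}

const disney =
  "adisneyparks.disney.go.com/media/disneyparks/en_US/media/btn_pink_continue.gif";
const linux = "www.hackinglinuxexposed.com/articles/20031231.html";

console.log(matchesWordStart(disney, "pink"));   // true: "_pink" starts a word
console.log(matchesWordEnd(disney, "pink"));     // true: "pink_" ends a word
console.log(matchesPart(linux, "exposed"));      // true: substring of the host name
console.log(matchesWordStart(linux, "exposed")); // false: preceded by "x"
```

Under these assumptions, "btn_pink_" trips both the Start and the End rule because the underscores act as separators, which is why a spelling like "btn_pnk_" would pass.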