04 November 2006 Changes (HHH)
-------------------------------

1.  Action: Downgraded "tease" rule
    From:   BadURL_Parts[i++] = "tease";
    To:     BadHostParts[i++] = "tease";
    Reason: EVERYBODY uses "teaser" in the URL to indicate something
            used to entice somebody. Further, it indicates you see
            only a small part of what is offered. Here is the count,
            out of the 438,000+ host names, if I remove the tease
            rule altogether:

               99 tease_Parts.txt
               49 tease_Starts_and_Ends.txt
              124 tease_Passed_All_Rules.txt
            ------------------------------
              272 total

2.  Action: Changed comments at header of proxy.txt file
    From:   GPL license, Eric's name
    To:     Licensed to us, removed Eric's name.
    Reason: I looked at the Creative Commons licenses and finally
            decided the only way to go is to copyright this.

3.  Action: Changed a rule
    From:   BadDomains[i++] = ".247realmedia.com";
    To:     BadHostParts[i++] = "oasc";
    Reason: This spy domain doesn't go to a server in the
            247realmedia.com domain directly. I have observed it
            doing it this way:

              oascentral.hamptonroads.com
                which is an alias to:
              oasc05032.247realmedia.com
                which is itself an alias to:
              oasc05a.247realmedia.com

            There are 27 *realmedia.com (19 *.247realmedia.com) hosts
            in Mike Burgess' blocking hosts file (I scrapped my file
            and am now using his file as the basis for a true
            superset file). On the other hand, there are 49
            oascentral.* hosts. By stopping them we stop who knows
            how many that map over into the *.247realmedia.com AND
            its sister *.realmedia.com domains. How many more are
            there? I don't know.

            Originally, I was going to make this a BadDomains rule
            with "oascentral.". After looking at it, though, many of
            those "oascentral.*" hosts get mapped over to
            "oasc*.247realmedia.com" or "oasc*.realmedia.com", so if
            they call them directly we still catch them. Well, we
            catch SOME of them. If you want to add 247realmedia.com
            back in you can, but do it for now as a personal rule and
            feed all names that are not in Mike Burgess

4.
    Action: Added "joven" rule back in, but in modified form
    Added:  BadURL_Parts[i++] = "joven";
    Reason: I removed this because [a] I didn't have very many and
            [b] I noticed that Eric had it as "jovenes" and
            "jovencitas". Well, there are those, plus "jovencita"
            (no plural), "jovenchicas", "jovencito", etc. The root
            that all of them share is "joven" (WHICH IS *NOT* A
            WORD). Here is the count without the rule (and why I
            added it back in, but in the form that works for ALL of
            them):

               78 jove_Parts.txt
               21 jove_Starts_and_Ends.txt
              153 jove_Passed_All_Rules.txt
            -----------------------------
              252 total

            NOTE: ALL have "joven"

5.  Action:  Removed overture.com rule
    Removed: BadDomains[i++] = ".overture.com";
    Reason:  Mike Burgess said: "Overture is the "Ad Client" division
             of Yahoo, and no it is not needed." There are a lot of
             them, but use a blocking hosts file instead.

6.  Action: Downgraded "fotos" start and end rules
    From:   BadURL_WordStarts[i++]="fotos";
            BadURL_WordEnds[i++]="fotos";
    To:     BadHostWordStarts[i++]="fotos";
            BadHostWordEnds[i++]="fotos";
    Reason: I can already see the potential for even more false
            positives in the URL. Here is the count:

              544 fotos_Parts.txt
              108 fotos_Starts_and_Ends.txt
              286 fotos_Passed_All_Rules.txt
            ------------------------------
              938 total

            These were the URLs that caused it:

              http://multiclinmed.com.br/swf/fotos.swf
              http://pantanalvip.com.br/m_fotos.gif

            If I downgrade only the start rule, the end rule kills
            me, so it was both or nothing. The English word "photo"
            and its derivatives are spelled "foto", with similar
            derivatives, in Spanish, Portuguese, Italian and Catalan,
            and it is "Foto" in German. Only the French share our
            spelling. Even without that, "foto" has entered the
            English lexicon in many ways, and usually NOT in a
            pornographic way.

7.  Action: Added good domain ".freebsd.org" rule
    Added:  GoodDomains[i++] = ".freebsd.org";
    Reason: There were too many hosts with "freeb" to exclude the
            trailing 'b'.
            Here are the counts:

            18959 free_Parts.txt
             3448 free_Starts_and_Ends.txt
             7432 free_Passed_All_Rules.txt
            -------------------------------
            29839 total

              336 freeb_Parts.txt
               24 freeb_Starts_and_Ends.txt
              138 freeb_Passed_All_Rules.txt
            ------------------------------
              498 total

            To be sure, that is a small portion of the overall count,
            but there is NO general pattern. See the freeb folder. I
            am NOT including the free folder. There are just too
            many. At least now you will have a healthy understanding
            of just how BAD the word "free" is. It isn't just porn
            either. Likely as not, a host with "free" in its name
            will harm you. I think we will also have to add some of
            the "free." download hosts that just download software.

8.  Action: Added Bad Parts URL rule "bestial"
    Added:  BadURL_Parts[i++] = "bestial";
    Reason: 181 hosts? Almost no chance for false positives? How many
            more reasons do you need? It looks like a no-brainer to
            me!

              110 bestial_Parts.txt
               29 bestial_Starts_and_Ends.txt
              181 bestial_Passed_All_Rules.txt
            ------------------------------
              320 total

9.  Action: Shortened "beastiality" to "beastial"
    From:   BadURL_Parts[i++] = "beastiality";
    To:     BadURL_Parts[i++] = "beastial";
    Reason: It only added 10 more, but it makes the pattern match
            easier and there is almost NO chance for a false
            positive.

10. Action: Deactivated GoodDomains rule
    From:   GoodDomains[i++] = "wikimedia.org";
    To:     // GoodDomains[i++] = "wikimedia.org";
    Reason: What is going to go out the door will have this commented
            out. That is so they have to edit the file right away to
            activate the rule.

11. Action: Restructured 2o7.net rule
    From:   BadDomains[i++] = ".2o7.net";
    To:     BadDomains[i++] = "112.2o7.net";
            BadDomains[i++] = "122.2o7.net";
    Reason: What we have is doing NOTHING. If you get a
            zag.112.2o7.net, for example, the extra dot prevents the
            triggering of the rule. We either put in a Host Parts
            rule or this. I opt for this. Next to doubleclick, they
            are the number one spy domain out there. WE MUST STOP
            THEM!

12.
    Action: Changed header comments for copyright reasons
    From:   what it was
    To:     what it IS
    Reason: It just needed tidying and may need some more cleaning up
            in the future.

14. Action: Downgraded "cherry" rule
    From:   BadURL_Parts[i++] = "cherry";
    To:     BadHostParts[i++] = "cherry";
    Reason: us.a2.yimg.com/us.yimg.com/a/co/consumerinfo/\
            100606_yahoo_25x25_0705_cherry01.gif

            ANALYSIS:
            ---------
               94 cherry_Parts.txt
               27 cherry_Starts_and_Ends.txt
              139 cherry_Passed_All_Rules.txt
            -------------------------------
              260 total

            I can already see the potential for even more than this
            showing up so I am nipping them in the bud right now.
            140 hosts is too many to ignore so we should NOT just
            remove it.
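
For readers who have not seen the matching code, here is a minimal sketch of what a "downgrade" from BadURL_Parts to BadHostParts is meant to achieve, using the "cherry" case above. The array names come from this log, but the matching logic, the isBlocked and containsAny helpers, and the cherry.example.com host are assumptions made for illustration; the actual code in proxy.txt may work differently.

```javascript
// Hypothetical rule sets: the array names come from the change log,
// the matching logic below is an assumption for illustration only.
const before = { BadURL_Parts: ["cherry"], BadHostParts: [] };
const after = { BadURL_Parts: [], BadHostParts: ["cherry"] };

// True if text contains any of the given substrings (case-insensitive).
function containsAny(text, parts) {
  const lower = text.toLowerCase();
  return parts.some((part) => lower.includes(part.toLowerCase()));
}

// Assumed semantics: URL rules scan the whole URL, host rules only the host.
function isBlocked(url, rules) {
  const host = new URL(url).hostname;
  return containsAny(url, rules.BadURL_Parts) ||
         containsAny(host, rules.BadHostParts);
}

// The Yahoo image from item 14, with the line continuation joined.
const yahooImage = "http://us.a2.yimg.com/us.yimg.com/a/co/consumerinfo/" +
                   "100606_yahoo_25x25_0705_cherry01.gif";
// Made-up host with the word in the host name itself.
const cherryHost = "http://cherry.example.com/index.html";

console.log(isBlocked(yahooImage, before)); // true: the false positive
console.log(isBlocked(yahooImage, after));  // false: path is no longer scanned
console.log(isBlocked(cherryHost, after));  // true: host names are still caught
```

Under these assumptions the downgraded rule keeps catching hosts with the word in their name while ignoring file names and paths, which is the trade-off made for "tease", "fotos" and "cherry" in this log.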