09 March 2007 Changes (HHH)
---------------------------

1.  Action:   "tit[^abhilmtu]" is going full time
    From:     BadHostWordStarts[i++]="tit[^abhilmtu]"; // INFO GATHER RULE
    To:       BadHostWordStarts[i++]="tit[^abhilmu]";
    Reason:   Despite titter (titter-totter), etc., the "titt" needs to
              be there. Have had no false positives, so the rule stands.

2.  Action:   modified ".qsrch." rule
    From:     BadHostParts[i++] = ".qsrch.";
    To:       BadHostParts[i++] = "qsrch";
    Reason:   "qsrch.com"

3.  Action:   ".ebay.com" Exclusion
    Added:    GoodDomains[i++] = ".ebay.com";
    Reason:   thumbs.ebay.com
              May be an overly broad exclusion, but we can live with it.

4.  Action:   "officemax.com" Exclusion
    Added:    GoodDomains[i++] = "officemax.com";
    Reason:   Primarily a lot of "hot" matches. Actually, downgrading
              the "hot" URL rule to a HOST rule would kill almost all
              of them.

5.  Action:   "oral" END rule REMOVED
    Removed:  BadURL_WordEnds[i++]="oral";
    Reason:   Here are some of the words I have come up with:
              amoral, auroral, cantoral, chloral, choral, coral,
              femoral, floral, immoral, mayoral, moral, unmoral.

              If I drop the END rule altogether I get the following
              host count:

                 690 oral_Parts.txt
                 839 oral_Starts_and_Ends.txt
                   0 oral_Passed_All_Rules.txt
              -----------------------------
                1529 total

              See the results for "moral" below in the RESOLVED
              section. If I keep the URL rule but change it to
              "[^m]oral" I still allow "immoral" and "unmoral" through.
              From a HOST perspective I don't need the rule. From a URL
              perspective I am perhaps more likely to get the Moral
              Majority. Believe it or not, I have got about half of the
              above, and most often in VERY noticeable ways.

6.  Action:   "kontakt" PARTS rule REMOVED
    Removed:  BadURL_Parts[i++] = "kontakt";
    Reason:   de.nntp2http.com/soc/kontakte/freizuegig/2004/02\
              d132bc858d397290b156b192f8311ff6.html

                 232 kontakt_Parts.txt
                 562 kontakt_Starts_and_Ends.txt
                   0 kontakt_Passed_All_Rules.txt
              -------------------------------
                 794 total

              All this means in German is the same as "contact" in
              English. It would cause LOTS of problems for Germans.
              The rule is almost useless in English.

7.  Action:   Downgraded "[^c]lips" END rule from URL to HOST
    From:     BadURL_WordEnds[i++]="[^c]lips";
    To:       BadHostWordEnds[i++]="[^c]lips";
    Reason:   See the RESOLVED False Positives. All I got was false
              positives. Yes, they sometimes meant what they said, but
              the rule is downgraded and MAY even be removed.

8.  Action:   ".adtech.de" BAD DOMAIN INFO GATHER RULE
    Added:    BadDomains[i++] = ".adtech.de"; // INFO GATHER RULE
    Reason:   Mike has 25 of them. Let's see if we can find some more.

9.  Action:   "extrem" HOST rule
    Added:    BadHostParts[i++] = "extrem";
    Reason:   // TEST RULE, See enclosed files ...
              We have the following count:

                 628 extrem_Parts.txt
                 482 extrem_Starts_and_Ends.txt
                  74 extrem_Passed_All_Rules.txt
              --------------------------------
                1184 total

              That doesn't seem bad, but some of them can cause harm to
              a machine. We have 17 "extrem.", 7 with embedded
              "extreme" (not at start or end), 12 "extrema" in various
              positions, and 9 "extremo". By harm to a machine, I mean
              one running MS Windows.

10. Action:   "teen" HOST rule
    Added:    BadHostParts[i++] = "teen";
    Reason:   Too many hosts. There are probably even more now.

               16499 teen_Parts.txt
                4508 teen_Starts_and_Ends.txt
                 339 teen_Passed_All_Rules.txt
              -------------------------------
               21346 total

              The only reluctance I have here is what problems it has
              as a URL rule. I am running that myself to see what
              problems it causes.


09 March 2007 UNresolved False Positives (HHH)
----------------------------------------------

FINALLY - WE HAVE NONE!!!


09 March 2007 RESOLVED False Positives (HHH)
--------------------------------------------

1.  Pattern:  "hot" OLD, NOW "hot[^em]"
    Rules:    BadHostWordStarts[i++]="hot[^em]";
              BadURL_WordEnds[i++]="[^s]hot";
    Reason:   hotmail.com
              I added the exclusion for e & m for hotel AND hotmail,
              but that means I now need to scope out whether the
              exclusion rules are still needed for hotmail and what
              hosts need to be added.
              If anybody complains, we will drop the URL rule, but if
              you use clear GIF images you almost NEVER see anything
              wrong.
    Solution: "hot[^em]" - drop to HOSTS if necessary.

2.  Pattern:  "oral"
    Rules:    BadURL_WordStarts[i++]="oral";
              BadURL_WordEnds[i++]="oral";
    Reason:   Nov 28 20:20:53
              www.wliw.org/productions/images/doral_logo.jpg
              Sun Dec 17 20:24:37
              byub.org/programaz/images/byuphilharmonicchoral.jpg
              Jan 15 08:09:51
              iowa.brickriver.com/files/oZone_Objects_XNCYVE/\
              070112_Moral_Witness_PWMXY96T.jpg
              hostsfile.mine.nu/img/coral.gif

              The START rule is okay; it is the END rule that kills us.
              Suggested: "[^cdhm]oral"?

              Here is the moral count for the hosts (immoral):

                   8 moral_Parts.txt
                  11 moral_Starts_and_Ends.txt
                   8 moral_Passed_All_Rules.txt
              -----------------------------
                  27 total

              From a hosts perspective we can drop the END rule. I am
              doing it. It catches little and causes some problems.
    Solution: DROPPED END RULE ENTIRELY.

3.  Pattern:  "lips"
    Rules:    BadHostWordStarts[i++]="lips";
              BadURL_WordEnds[i++]="[^c]lips";
    Reason:   creativosparc.ads.uigc.net/RealMedia/ads/Creatives/\
              OasDefault/BR_20061201_BUSCAPE-BOND/br_20061201_\
              buscape-bond-BP-hometheaterphilips_pop.gif

              My initial hunch is to just downgrade the rules. The
              pattern is too short. Here is what happens if I remove
              the rules for the hosts ...

              Both rules removed:
              ===================
                 454 lips_Parts.txt
                 110 lips_Starts_and_Ends.txt
                 263 lips_Passed_All_Rules.txt
              -----------------------------
                 827 total

              Start rule removed:
              ===================
                 454 lips_Parts.txt
                 206 lips_Starts_and_Ends.txt
                 167 lips_Passed_All_Rules.txt
              -----------------------------
                 827 total

              End rule removed:
              =================
                 454 lips_Parts.txt
                 148 lips_Starts_and_Ends.txt
                 225 lips_Passed_All_Rules.txt
              -----------------------------
                 827 total

    Solution: Dropped the scope of the END rule from URL to HOST.
              May drop it altogether if there are more false positives.
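

Note on the "oral" END rule candidates
--------------------------------------

A minimal sketch of the trade-off weighed for the "oral" END rule, assuming
an END rule behaves like a regular expression anchored at the end of a word.
That anchoring, and the helper names matchesWordEnd and blockedBy, are
assumptions made for illustration only; they are not taken from the actual
rule engine. The word list is the one from the Changes section, plus "doral"
from the wliw.org false positive.

```javascript
// Hypothetical helper: treat an END rule as a regex that must match at the
// end of the word. How the real engine anchors its patterns is an assumption.
function matchesWordEnd(pattern, word) {
  return new RegExp("(?:" + pattern + ")$").test(word);
}

// Innocent words from the changelog, plus "doral" from the logged URL.
var words = ["amoral", "auroral", "cantoral", "chloral", "choral", "coral",
             "femoral", "floral", "immoral", "mayoral", "moral", "unmoral",
             "doral"];

// Return the words a given END pattern would block.
function blockedBy(pattern) {
  return words.filter(function (w) { return matchesWordEnd(pattern, w); });
}

// Compare the original rule with the two candidate replacements.
["oral", "[^m]oral", "[^cdhm]oral"].forEach(function (p) {
  console.log(p + " blocks: " + blockedBy(p).join(", "));
});
```

Under these assumptions the plain "oral" pattern blocks every word in the
list, "[^m]oral" lets "immoral" and "unmoral" through as noted above, and
"[^cdhm]oral" still blocks words such as "floral" and "mayoral", which is
consistent with the decision to drop the END rule rather than patch it.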