comment spam filtering - it's all about the IPs


Sam describes his new comment spam filtering system. Quote:

Then it struck me: from an ip address I have never seen before. Light bulb.

Hit and run. Strangers. People who never have been here before. These people are unlikely to be seen again.

And if they don't come back, it is not possible to have a two way conversation, is it?

So I set to work. I wrote a script to scan my Apache logs for everybody who has ever visited my weblog within the last week. Bots, aggregators, and an occasional carbon based life form, I make no distinction.

Then I add in everybody who has left a comment in the last ninety days. And not just ip addresses, but also urls.

All these people are welcome to comment freely.

Very, very cool idea! Use the logs to establish an implicit community effect, a kind of automatic self-updating whitelist. It leaves me thinking "and where else can this be applied?". Mmm...
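
To make the idea concrete, here is a minimal sketch of how such a whitelist might be built. It is not Sam's actual script: it assumes Apache's common/combined log format, and the comment records (dicts with `ip`, `url` and `posted` keys) and the function names `recent_visitor_ips`, `build_whitelist` and `is_welcome` are hypothetical, made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Start of an Apache common/combined log line:
# client IP, identity, user, then the [timestamp].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def recent_visitor_ips(log_lines, now, days=7):
    """Collect every client IP seen in the logs within the last `days` days."""
    cutoff = now - timedelta(days=days)
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        try:
            seen = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip unparseable timestamps
        if seen >= cutoff:
            ips.add(match.group(1))
    return ips


def build_whitelist(log_lines, comments, now, visit_days=7, comment_days=90):
    """Merge recent visitors with recent commenters (IPs and URLs).

    `comments` is assumed to be an iterable of dicts with the hypothetical
    keys "ip", "url" and "posted" (a timezone-aware datetime).
    """
    ips = recent_visitor_ips(log_lines, now, visit_days)
    urls = set()
    cutoff = now - timedelta(days=comment_days)
    for comment in comments:
        if comment["posted"] >= cutoff:
            ips.add(comment["ip"])
            if comment.get("url"):
                urls.add(comment["url"])
    return ips, urls


def is_welcome(ip, url, whitelist):
    """True if the IP or the URL of a new comment is already known."""
    ips, urls = whitelist
    return ip in ips or (bool(url) and url in urls)
```

Note that `now` has to be a timezone-aware datetime, since Apache timestamps carry a UTC offset. What happens to a comment that is not "welcome" (moderation queue, extra checks, outright rejection) is a separate decision that the sketch leaves open.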

Categories: technology
Posted by diego on January 21 2004 at 11:43 PM
Comments (please see the comments & trackback policy).

Wouldn't that mean that someone who spammed you before is now a spammer with an official permission?
What about automatically deleting URLs in the comment body, and/or checking that the URL someone enters is really a blog?

Posted by: Frank Koehntopp at January 22, 2004 7:34 AM

Not overly good for all those folks on dial-up, or even those on cable and DSL whose DHCP lease has expired since the last comment.

It's difficult to successfully use static ip as the only differentiating factor.

Posted by: Jim Hughes at January 22, 2004 9:23 AM

Using IP addresses for banning is just extremely stupid. I have what most would consider the most high-end home setup (DSL) in Germany, and I get new IP addresses all the time.
I believe the only reasonable countermeasure, if one bans visitors, is to base it on the URLs people link to - after all, raising their PageRank is what they're after.

Posted by: Stefan Tilkov at January 22, 2004 9:53 AM

Frank, Jim, and Stefan: I agree with what you are saying, which (I might be wrong, please correct me if so) boils down to all the problems related to banning an IP that might later be reused by another person, etc.

I don't think that Sam advocated using *only* IP banning. This can't be the only solution, but I still think the idea of using the logs as an implicit whitelist is very cool, and has good potential when combined with other information.

Posted by: Diego at January 22, 2004 11:32 AM

Copyright © Diego Doval 2002-2007.