Moving Toward a Spam-Free Diet

Quick update on SpamBayes: I FREAKIN’ LOVE IT! So far I’ve tried a grand total of two client-side spam blockers – Norton Antispam, which came installed with a six-month trial on my new Dell at work, and SpamBayes, which is open source and free for all.

I won’t mince words here – Norton sucked. In fact, I’ve noticed more and more that their offerings have been going very downhill as of late. Norton Antivirus takes up entirely too many resources, especially when compared with TrendMicro’s offerings. Systemworks doesn’t seem to fix as many problems as it used to. And the Antispam program, which I had to manually enable even though I’d installed it, didn’t do a thing to reduce the amount of spam I got.

SpamBayes is the day to Norton Antispam’s night. I’m now using both the Outlook Plugin version and the POP/SMTP proxy through Outlook Express. Both are laughably easy to set up. The plugin is far easier to train – just put all your existing spam in one folder and all your good mail in another, tell the program where each is and let it go. The proxy required me to set up a rule in Express to forward everything from each folder to the local fake email addresses I set up (to train the proxy, you send bad spam to “spambayes_spam@localhost” and good mail to “spambayes_ham@localhost” and it grabs them before the SMTP server sees them).

After the plugin version is trained, it dumps anything identified as spam in your “Junk Mail” folder and anything it’s unsure about in the “Junk Suspects” folder. In the last seven days, I have received a whopping 474 pieces of spam. Only about 10 of them slipped through into my good email and maybe 50-75 have ended up in the Suspects folder. Of my good mail, only about 20-30 emails, all understandably questionable but still good by my standards, ended up in the Suspects folder, and absolutely none ended up in the Junk folder.
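For the curious, that three-way split can be pictured as two cutoffs on a spam score between 0 and 1. This is only a rough sketch of the idea, not the plugin's actual code: the function name, the "Inbox" folder name and the cutoff values (0.20 and 0.90) are my own assumptions for illustration.

```python
def route_message(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Pick a folder from a spam score in [0, 1].

    The cutoffs are illustrative assumptions, not values read
    from the plugin's configuration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "Junk Mail"      # confidently spam
    if score > ham_cutoff:
        return "Junk Suspects"  # somewhere in the middle: unsure
    return "Inbox"              # confidently good mail
```

Widening the gap between the two cutoffs would send more mail to the Suspects folder; narrowing it forces the filter to commit one way or the other.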

The proxy works a little differently. Instead of assigning emails to folders (which isn’t possible in POP from the server side) it signifies its impression of each incoming mail by adding “spam”, “unsure” or, optionally, “ham” someplace in the message (I set it up to place the word at the beginning of each subject). It’s then up to my mail client to sort through them. The results on the proxy have been roughly the same as with the plugin so far. I’m truly impressed.
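The client-side sorting is basically a mail rule, but here is what it might look like in Python, as a sketch only. It assumes the tag is the first word of the subject (the way I set it up); the folder names for the proxy case and the helper names `folder_for` and `make_message` are made up for the example.

```python
from email.message import EmailMessage

# Assumed mapping from the proxy's subject tag to a local folder.
TAG_TO_FOLDER = {
    "spam": "Junk Mail",
    "unsure": "Junk Suspects",
    "ham": "Inbox",
}

def folder_for(message, default="Inbox"):
    """Choose a folder from the first word of the Subject header."""
    subject = (message["Subject"] or "").strip()
    if not subject:
        return default
    # Take the first word and drop a trailing comma or colon, if any.
    first_word = subject.split(None, 1)[0].rstrip(",:").lower()
    return TAG_TO_FOLDER.get(first_word, default)

def make_message(subject):
    """Hypothetical helper that builds a message with only a subject."""
    msg = EmailMessage()
    msg["Subject"] = subject
    return msg
```

One caveat with matching on the first word: a legitimate, untagged email whose subject happens to start with "Spam" would be misfiled, so a real rule would probably want the optional "ham" tag turned on or a more distinctive marker.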

What’s most interesting to me is that, now that my email is organized this way, I actually read some of the spam I get. I have absolutely, positively no intention of ever purchasing anything from it, but I find my attitude toward spam changing. I used to angrily delete or simply ignore all the spam sitting in my inbox. I was infuriated that these people were wasting my time and cluttering my email with useless crap that had absolutely no relevance to my life. Now that they’re relegated to different folders automatically, however, that hatred is gone, and it’s becoming sort of entertaining to read their pitches.

The proxy, while easy to install, would be a bit tricky for the uninitiated to configure, I’d imagine. The plugin, on the other hand, is pretty user friendly. It installs a button in Outlook labeled “Delete as Spam” and another when you’re in the spam folder labeled “Recover from Spam”. I think it’s pretty obvious how you’re meant to continue training it. And, trust me, both versions absolutely improve over time with training.

The downside of this system is that it sits on the client. Essentially, you’ve already downloaded all of the bandwidth-sucking spam onto your mail server and then onto your client computer by the time SpamBayes gets to it. So, while it does clear the clutter that you see, it doesn’t speed up your email connection. In fact, it actually slows it just a bit as the program checks every message and assigns it a score. Of course, you can install the proxy on your mail server, but then you lose the coolness of the plugin’s functionality.

At any rate, if you use POP3 to get your email or use Microsoft Outlook, SpamBayes will save your life from the clutches of spam. It’s much better than the SpamAssassin/Earthlink tactic of acting as a gateway protector, sending automated messages to spammy-looking emailers requesting some sort of user intervention. If I already sent you the email and your server rejected it, I’m not going to be too thrilled about having to send a second message or visit a web page to prove I am who I am. Whitelists and blacklists are too easy to defeat and are a maintenance nightmare.

The Bayesian method, on the other hand, leaves it to you to determine what is good and bad and learns from your decisions. It’s tailored specifically to you, which makes it that much more effective. And while the setup and initial training can be a bit time consuming, the maintenance is a breeze. Just be sure to always tell the filter when a bad email slips through and it’ll just keep improving itself. How can you complain about software that actually makes itself more effective at its job?

Venting About Weighing in on MT 3.0

As you may have heard, the folks who make Movable Type have recently announced a new licensing structure which turns what was once a nice little free blogging system into something that better resembles greedware.

Here’s the thing: I’m OK with Ben and Mena (Six Apart’s founders) making money off of their creation. I use an older version of MT on this site and I’m still amazed at what I’m able to accomplish in it. It’s a truly wonderful little program and they deserve their kudos. This site alone would still be able to use the free version of MT (which limits me to one author and three sites) but if I wanted Danielle to be able to add her two cents here, the price would immediately ratchet up to $99. For a personal website on which I make absolutely no money, that’s just not worth it.

So, let’s assume I find a way to make money off this site. For all intents and purposes, it immediately becomes a commercial venture. That ratchets the price of MT 3.0 way up to $199. If I’m making just enough money to cover my hosting fees, that’s just not worth it.

Now, to be fair to the Six Apart gang, they have indicated that they will enforce these licenses at their discretion. But, frankly, I’m not comfortable playing the “definition of what is ‘personal’” game with them. And, in the end, I’m really just a tightwad. I want my free software to be truly and actually free.

I could go on a major riff about “free meaning free” and the benefits of open source versus commercial software, but someone else has already done it, and better than I could. Bottom line: I either need to stick with an old version of MT that has far fewer restrictions but no more support, or I need to find an alternative.

Here at the RobZazueta Labs, we’ve been busy cranking out a couple of projects that have been on our plates for months. They should be done VERY soon, which will make room for the next big thing: Yet Another RobZazueta.com Redesign (Codename: YARZR). My goals with YARZR are as follows:

  1. Put the blog back on the front. I had envisioned writing a tech column once a week and promoting it up front. That didn’t happen. Will it happen someday? God, I hope so. But for now I’m going to try to focus on just being better at updating the blog.
  2. Do something with the color scheme. My wife has complained that the color scheme on the blog is ugly and hard to read. My best friend likes the blog colors, but thinks the professional front-end is too bright and hurts the ol’ eyes. Me? I’m sort of sick of blue.
  3. Add some more fun, experimental features that may draw more visitors and generate a little buzz. I’ve got some tricks up my sleeve yet.

Now to that list I’m going to have to add “Transition to new, GPLed blog service” whether it’s one of the ones out there now or one I build myself. And, before you inundate me with recommendations, Pivot is too confusing, WordPress is too limiting, PHPNuke isn’t GPL (unless I missed something) and none of them seem to allow for building a static filesystem through their interface.

One of the things that really sets MT apart for me is that I can use it to build pages and sub-pages – practically an entire site – all from within its interface. A weblog can represent a real weblog, a portal, a section of a site or whatever I want. MT is extremely flexible if you know what you’re doing. As popular as the MVC pattern is, I don’t want the overhead and messy maintenance of a single page controlling my entire site pulling data from the database. Let MySQL store the data that generates the pages, let Apache serve static pages and let PHP fill in the blanks. To me it just makes sense.

So, if you know of a system that does this, you let me know. And if I don’t find one by the time YARZR is ready for my plate, I promise to GPL whatever CMS I end up creating. I actually already have the guts of one written; I’d just need to flesh it out a bit more. Actually, a LOT more.

Unstuck and Eats, Shoots and Leaves

I positively *MUST* get better at updating this. I must have read at least half a dozen books in the last several months, most of them worth mentioning. At the moment, I’m working on two:

Unstuck: Unstuck promises to get your creative juices flowing and knock you out of your rut through creative strategizing. It’s essentially a choose-your-own-adventure for creatively crammed adults.

Eats, Shoots and Leaves: I’ve seen Eats, Shoots and Leaves at the top of many best-seller lists and on the new-release tables of every bookstore I’ve been into lately. It’s a manifesto on proper punctuation. For a hack writer like myself, it’s candy.

Vegetarian Email

My spam situation is getting out of hand, especially at work. I now monitor at least a dozen different email addresses through my Outlook box, which has increased my spam intake exponentially (’cause the older the account, the more spam it tends to get).

So I finally decided to give ol’ SpamBayes a try. I’m researching some options for spam control and prefer Bayesian probability methods over blacklists and whitelists, primarily because I want this to be as low maintenance as possible. I haven’t been saving my spam as religiously as I should have been, but I still started out with about 180 pieces of it and more than 5000 instances of “good” email.

I installed the Outlook plugin version of SpamBayes, trained it on my good and bad emails, then set it to work. So far, it’s already caught a piece of spam without even bugging me about it. It didn’t delete it – it just tossed it in the ol’ Junkmail folder – but it was, beyond a shadow of a doubt, spam (specifically: porn mail).

So, one hour of use isn’t enough to make a judgment call yet, but it has been a fascinating look at my email habits so far. For instance, SpamBayes has this way nifty feature that allows me to see how it determined the “spamminess” of an email. It shows each token and the score it calculated based on how many pieces of spam or ham (the good mail) contained the same token. It’s sort of surprising and fun to see which words indicate ham for me.
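To give a rough feel for how a per-token score like that can come out of counts of spam and ham, here is a toy version in Python. Big caveat: this is a simplified, Paul Graham-style ratio I'm using for illustration, not SpamBayes's real scoring, and all the function names are made up.

```python
from collections import Counter

def tokenize(text):
    """Split a message into a set of lowercase tokens (each counted once per message)."""
    return set(text.lower().split())

def train(messages):
    """Count how many messages contain each token."""
    counts = Counter()
    for text in messages:
        counts.update(tokenize(text))
    return counts

def token_spam_score(token, spam_counts, ham_counts, n_spam, n_ham):
    """Score a token from 0.0 (only seen in ham) to 1.0 (only seen in spam).

    Uses the fraction of spam and of ham containing the token, so a
    lopsided training set (say 180 spam vs. 5000 ham) does not skew it.
    """
    spam_ratio = spam_counts[token] / n_spam if n_spam else 0.0
    ham_ratio = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        return 0.5  # never seen before: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)
```

A real filter then combines the scores of the most telling tokens into one overall score for the message; that combining step is left out here.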

What will be a true challenge is the fact that I work for a marketing firm, which means I frequently both send and receive targeted, opt-in marketing communications (not spam). The unfortunate part is that a lot of our email may appear spammy to it, so I’ll have to rescue those from the spam fryer if they get tagged. This could be an interesting experiment.

I’m hoping that SpamBayes is user-friendly enough for me to install on all of our clients or, better yet, on the server. The thing about Bayesian filtering is that it tries to improve as the spammers change their tactics. Of course, spammers have been trying all kinds of stuff recently to fool the Bayesian systems, like filling the subject line and body with completely irrelevant or, in some cases, nonsensical words. This, of course, makes them even more identifiable as spam, but only to a system (like a human) that is capable of natural language processing. Assuming we figure that one out, I’d be willing to bet the spammers would then turn to using foreign words, quite possibly a mixture of them from different languages (e.g. “Subject: Voulez pinata reichstag missa sunt arrivaderci”). So then we need to make the natural language processor multilingual, grammatically flexible and gibberish resistant. This is costly and annoying for all involved. But there’s a silver lining.

Already, many spam emails are more gibberish than actual marketing. Here’s an example of a subject line from an email I received the other day: “Fwd: V+a+lium – xana+x+ ` v1@grA $ V|cod|:n Som@ % .P.ntermin lnjfscnylwhx”. It resembles the original words just enough that I know it has something to do with Xanax, Valium and Viagra, but the rest of it is almost totally gibberish. How useful is that for a customer? Spammers still exist because people actually buy crap from spammers. But if the spam itself can’t even tell us what it’s trying to sell, how can a sale be made? So, yeah, this may get annoying for a while, and on the surface they may have cracked the Bayesian code, but anti-spammers have driven the spammers so deep into the forest that their message is getting lost in the noise. Hopefully, as more people adopt anti-spam measures, more spammers will find it to be a waste of their time to send out these mass untargeted and unsolicited emails. And then they’ll probably go back to air-dropping fliers or something.

Hey, Rosie, You Missed a Spot.

Heya robot developers! Been a while since we last rapped. Mind if I ask you a big favor? You think we can finally move past cute but pointless robots and machines bent on destroying each other and maybe focus on building something useful that actually improves my life for a change?

That would be fantastic. Thanks.

Quote of the Day

“I said to myself, ‘There should be some cancer patients who could actually hold a note.’”

– Audience member viewing the WB’s new show Superstar USA.