One of the joys of maintaining a website is dealing with spam. Over the years, I’ve had to deal with several different varieties of spam here, including comment spam and trackback spam, and even my old forum got inundated with it. As such, countermeasures were deployed with varying degrees of success. Movable Type has improved its spam blocking capabilities considerably, and I use a plugin to close comments on posts older than 60 days, so the blog has remained relatively spam free for a while now. I replaced my forum with a new system that requires registration (ironically, even the new forum was spammed with a bizarrely intricate scheme to sell, no joke, biodynamic cheese).
This leaves referrer spam. I don’t know that there’s much to be done about that short of banning IP addresses and the like, but I never used my site’s raw referral logs that extensively, so even though I’m sure I get a decent amount of referrer spam, I don’t really see it. Instead, I use sitemeter, a popular web stats application that uses an image and JavaScript to collect the appropriate info (you can see the little multicolored image towards the bottom of every page on Kaedrin). I’m not sure if sitemeter does something on their end to prevent referral spam, or if spambots simply ignore the technology they use, but I get next to no referrer spam there.
Until this morning.
I awoke to find my site had several hundred hits overnight (much more than usual). When I looked at the referrals, I noticed that I was getting a huge amount of traffic from a bunch of sites that were all variations of the same domain. A sampling includes:
http://qfm96.listenernetwork.com/SearchWeb.asp
http://wmvx.listenernetwork.com/SearchWeb.asp
http://973thebrew.listenernetwork.com/SearchWeb.asp
http://98online.listenernetwork.com/SearchWeb.asp
As you can see, all the referrals are coming from some sort of search application. Going to the various “listenernetwork.com” home pages, it became obvious that they were all radio station sites that were apparently using some central application to produce cheap, easy sites for themselves (they all use the same template with content and styles tailored towards individual stations). The sites and referrals were distributed throughout the country. At a glance, they seemed to be legit stations. How odd.
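For anyone who wants to spot this kind of pattern without eyeballing the logs, here is a minimal sketch of grouping referrers by their parent domain. The `parent_domain` helper is my own illustration, not something sitemeter provides, and it naively assumes the last two labels of the hostname identify the site (which would be wrong for domains like `example.co.uk`).

```python
from collections import Counter
from urllib.parse import urlsplit


def parent_domain(url: str, labels: int = 2) -> str:
    """Return the last `labels` dot-separated parts of the URL's hostname.

    Assumption: two labels are enough to identify the site. This is a
    simplification and breaks for suffixes such as .co.uk.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-labels:])


# The sample referrers from the list above.
referrers = [
    "http://qfm96.listenernetwork.com/SearchWeb.asp",
    "http://wmvx.listenernetwork.com/SearchWeb.asp",
    "http://973thebrew.listenernetwork.com/SearchWeb.asp",
    "http://98online.listenernetwork.com/SearchWeb.asp",
]

# Count how many referrals share each parent domain.
counts = Counter(parent_domain(r) for r in referrers)

for domain, hits in counts.most_common():
    print(domain, hits)
```

Run against a full referral log, a single domain dominating the counts is a decent hint that something unusual is going on.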
All of the referrals were going to my Neal Stephenson category archive page, which was strange. At first, I thought, hey, maybe Neal Stephenson announced a new book on the radio this morning! Of course, that doesn’t make much sense, but I’m a sucker for Stephenson and so I wanted to believe. In any case, it immediately became obvious that something else was going on (damn!).
The most frustrating thing about these referrals is that they’re obviously coming from these radio station sites’ built-in search engine, which apparently uses an HTTP POST request instead of a GET request. Most search engines use GET requests because then the search parameters are contained in the URL, which allows users to bookmark searches. POST requests send the search parameters in the body of the request rather than in the URL, so users can’t bookmark their searches and the referrer URL that reaches my site doesn’t include the search terms. So not only was I getting all this traffic from a mysterious search engine, but I didn’t even know what people were searching for…
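To make the GET versus POST difference concrete, here is a rough sketch of pulling search terms out of a referrer URL. The Google URL and the `q` parameter name are assumptions for illustration (each search engine picks its own parameter name), and `search_terms` is a hypothetical helper, not part of any stats package.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_terms(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search terms found in the referrer's query string, if any.

    Assumption: the search engine stores the terms in a parameter named
    `param`. A search submitted via POST leaves no query string at all.
    """
    query = parse_qs(urlsplit(referrer).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical GET-style referrer: the terms travel in the URL.
get_referrer = "http://www.google.com/search?q=unit+of+length+nuclear+physics"

# POST-style referrer, as seen in my logs: just the page, no terms.
post_referrer = "http://qfm96.listenernetwork.com/SearchWeb.asp"

print(search_terms(get_referrer))   # unit of length nuclear physics
print(search_terms(post_referrer))  # None
```

The second call coming back empty is exactly the problem: the referred site only ever sees the URL of the search page, never the body of the POST.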
Back to the logs I go. After rooting around a bit, I found that some other search engines, like Ask and Google, were referring to the same Neal Stephenson page… but they had the search terms in their URLs:
what unit of length used in nuclear physics is named after a famed manhattan project scientist?
All right, so I’m making progress. My Stephenson category page contains most of those terms, so that kinda makes sense. I went to one of the referring sites and was quickly able to reproduce the search on their site and see my page come up in the results. But this question is rather odd, and there were many people searching with that exact question. What the heck is going on here?
Confused and a little intrigued, I started clicking around one of the referring radio station’s sites hunting for clues. Then I found it. Apparently, all these stations run some sort of big national contest, and the mysterious question above was today’s “Really Hard Trivia” question. The site even conveniently notes: “Don’t know the answer? Search the web below.” Bingo.
So it appears that these are all indeed legitimate referrals, though I can’t imagine anyone becoming a reader, as they didn’t find the answer on my page. However, on the off chance that someone is still looking, the answer appears to be the Bohr radius, named after Niels Bohr.
It turns out that I probably could have saved myself a good deal of effort by simply googling “listenernetwork referrer spam,” as this issue has apparently struck others before. Still, it was somewhat intriguing and I’m glad it didn’t turn out to be referrer spam…