The second in what I fear will become a continuing series: on the increasingly out-of-control software for weeding out comment spam on this blog. The first installment was my posting “Travails of blogging 6/21/21”, and I’ll start with that.
From 6/21:
Spam comments. I’ve written before about WP’s scheme for protection from spam comments [farmed out to the Akismet software]. It’s a serious issue; according to WP’s records, since this blog started in [mid-December] 2008, the spam filter has deleted over 4 million [now around 5 million] spam comments. The scheme involves two steps that are essentially out of my hands: an initial culling that happens entirely automatically (through an algorithm that is entirely mysterious), and then a file of candidates that are offered to me for examination before deletion. But since that file has on the order of 500 to 5000 comments in it each day, I can’t possibly deal with it, so it too vanishes without any judgment from me.
The scheme is supposed to weed out comments from addresses / urls that I haven’t approved (yet) and are suspicious on internal grounds. That leaves relatively innocent-looking sources that are probably just new commenters (or old commenters posting from a new address); these get sent to me by e-mail, one by one, for my inspection. The expectation is there won’t be a lot of these, and most of them will be innocent. This expectation has held for over 12 years, but is now crumbling fairly dramatically.
In recent weeks, my moderation task has climbed from one every few days to today’s 15 (a bit later: 18), and they aren’t easy to inspect; their identity as advertising for dubious products is concealed inside complex addresses or urls, so it takes some time to detect.
An Akismet report from this morning:
(#1) The 81,589 figure is deeply misleading, because the count has been reset several times in software upgrades; the true figure is around 5 million, in comparison to 13,047 genuine comments (on 9,225 postings since the blog began on 12/17/08)
I’m now in the zone of 30 to 40 daily spomments (I recently discovered the useful portmanteau spomment = spam + comment); when I check my e-mail at the beginning of the day, there are usually 5 candidates waiting for moderation. A few of them are genuine contributions from new commenters, but most have suspicious features: they are comments on postings from long ago; they come from places known to be frequent sources of spam, like Russia, Belarus, Korea, Indonesia, and (alas) Germany, the last of which supplies many intermediate sites in transporting spam; they are very brief, supplying not actual comments but just the name of a suspicious site or a link to it (or, occasionally, the most vapid of positive comments, like “Great posting” or “Important contribution”); and the title of the candidate comment, or its URL, contains obvious advertising content (“ONLINE POKER” appears several times a day, as well as references to playing the slots on-line, and to items for sale).
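The checklist in the paragraph above can be written down as a rough Python sketch. This is only an illustration of the moderation heuristics described here, not a description of how Akismet works; the `Comment` fields, the country codes, the keyword list, and the one-year threshold for a posting "from long ago" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Everything below is an illustrative assumption, not Akismet's logic.
SUSPECT_COUNTRIES = {"RU", "BY", "KR", "ID", "DE"}   # assumed two-letter codes
AD_KEYWORDS = ("poker", "slots")                      # assumed keyword list
VAPID_PHRASES = {"great posting", "important contribution"}


@dataclass
class Comment:
    text: str
    author_url: str
    country: str       # apparent country of origin
    post_date: date    # date of the posting being commented on
    received: date     # date the comment arrived


def suspicion_flags(c: Comment) -> list[str]:
    """Return the names of the suspicious features a comment shows."""
    flags = []
    # Feature 1: the comment is on a posting from long ago (assumed: > 1 year).
    if (c.received - c.post_date).days > 365:
        flags.append("old posting")
    # Feature 2: it comes from a place known as a frequent source of spam.
    if c.country in SUSPECT_COUNTRIES:
        flags.append("suspect origin")
    # Feature 3: it is very brief, or only a vapid compliment.
    body = c.text.strip().lower().rstrip(".!")
    if len(body.split()) <= 3 or body in VAPID_PHRASES:
        flags.append("brief or vapid")
    # Feature 4: the text or the URL contains obvious advertising content.
    haystack = (c.text + " " + c.author_url).lower()
    if any(keyword in haystack for keyword in AD_KEYWORDS):
        flags.append("advertising content")
    return flags
```

No single flag settles the matter (a genuine new commenter may well write from Germany, or comment on an old posting); the point of the sketch is only that several flags together make a candidate worth a closer look.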
Now, such advertisements should not of course be banned from the net (after all, people do search for places to play poker on-line and so on), and they are sometimes even relevant to postings of mine (postings that are, at least in part, about advertisements), but the Akismet algorithm should learn from the information I provide through my moderations. It does learn from my approvals, faultlessly, as far as I can tell, but apparently it can be tricked by the most trivial alterations in the material in spomments; I’ve marked stuff with ONLINE POKER or POKER ONLINE in it as spam hundreds of times by now, but still they come, again and again.
Consequently, the Akismet algorithm strikes me as exceptionally stupid. I have learned to suss out spomments with some accuracy, by generalizing from my own experience; why doesn’t the Akismet algorithm?
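To make concrete what "generalizing" over trivial alterations could look like, here is a minimal Python sketch. It is purely hypothetical: the `signature` function and the `LearnedSpamPhrases` class are invented for this illustration, and nothing here is known about Akismet's actual internals. The idea is to reduce each phrase marked as spam to a set of lowercased words, so that word order, capitalization, and punctuation no longer matter.

```python
import re


def signature(phrase: str) -> frozenset[str]:
    """Reduce a phrase to an order- and case-insensitive set of words."""
    return frozenset(re.findall(r"[a-z0-9]+", phrase.lower()))


class LearnedSpamPhrases:
    """Remembers phrases a moderator has marked as spam (hypothetical helper)."""

    def __init__(self) -> None:
        self._signatures: set[frozenset[str]] = set()

    def mark_spam(self, phrase: str) -> None:
        sig = signature(phrase)
        if sig:  # ignore phrases with no usable words
            self._signatures.add(sig)

    def looks_like_spam(self, text: str) -> bool:
        # Flag the text if it contains every word of some marked phrase,
        # in any order and with any capitalization or punctuation.
        words = signature(text)
        return any(sig <= words for sig in self._signatures)


learned = LearnedSpamPhrases()
learned.mark_spam("ONLINE POKER")
# One moderation now covers the reordered and repunctuated variants too.
assert learned.looks_like_spam("Best POKER ONLINE here")
assert learned.looks_like_spam("poker-online!!")
```

A matcher this crude would also flag a legitimate comment that happened to use both words, so it could serve at most as one signal among several.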
Up in my e-mail an hour or so ago, this request for moderation:
Initial flags for suspicion: the comment is on a posting from 2012, 9 years ago; it looks like it comes from NYU computer science grad student John “Jeffrey” Westhoff, but its real source seems to be in Indonesia; the total comment is the brief (and seemingly inscrutable) “Agen MPOPelangi” (but the pelangi I recognized from earlier spomments; it’s Indonesian (and Malay) for ‘rainbow’); and then a continuation of Westhoff’s blog URL turns into garbled and incomplete advertising copy: “eat suggestions to keep you finances so as”. And of course none of this has anything to do with aphasia.
So spam it is. But it took some work for me to feel comfortable in making that judgment. I note that I had marked as spam a number of previous comments with pelangi in them (that’s when I figured out that these comments probably came from Indonesia), so that I had learned from my previous moderations.
(I have just gotten another request for moderation, at first appearing to be from a music site, commenting on a music posting of mine. But a continuation of the music URL diverts you to an MPO site — see the MPO in #2 above — which I’d seen many times before but now I discover it’s apparently the name of a family of Indonesian sites offering on-line slots. Oh lord.)
I am now spending a quite unreasonable amount of time discovering the devious ways of spomments, so I can delete them while rescuing legitimate comments. Somebody really needs to fix Akismet.
More grief. Meanwhile, my saga of trying to install Stanford’s new net security software Cardinal Key is entering its second month. So far I figure I’ve invested 30 hours of time on it. My Stanford IT guy will take another bash at it on Friday.
I note that when I started the Cardinal Key adventure, I was told that whatever the problem was, it was triggered by my using a non-Stanford computer to access features of the Stanford system. That seemed preposterous to me, and now it seems that I was merely the first Stanford faculty member using their own computer to try to install Cardinal Key. (I note that the requirement to shift to Cardinal Key came during summer vacation, during a pandemic. Who could have foreseen that there would be compliance problems?)
Now there are a bunch of us. I, at least, am not actually in trouble, since I started all this in time, and the IT folks, faced with a really baffling situation, got me a lifetime waiver from having to install Cardinal Key. (But I am uncomfortable being a possible weak point in security, capable of inadvertently allowing great damage to be done. So I collaborate with the project of actually fixing things.)
Once there were a bunch of us, using a variety of different machines and operating systems, it became clear that my problem was not with my computer setup at all, but with Cardinal Key, specifically with the standard software, BigFix, for preparing computers to install Cardinal Key. There is a glitch somewhere in BigFix. So on Friday my IT guy will uninstall my defective version of BigFix and try some alternative software for the task. Think secure thoughts for me.
Even more grief. Meanwhile meanwhile, I continue to suffer from the WordPress techies’ passion for “upgrades”, large and small, to my blog’s software, so that every few days I have to root around in the system to discover where particular features have gone, what their names are now, and what you have to do now to get them to work. This constant churning of the software consumes a huge amount of my time and seriously undermines my ability to do my work, for reasons familiar from the psychology of action and the psychology of learning. More on this topic to come.
All of this makes me desperate. I have so little time now, in a day and probably in my life, and so much of it goes to wrestling with computer systems that should be designed to ease my work.