Analyzing the Picket Line Server Logs

Who reads The Picket Line, how, and why? Every once in a while someone asks me, and I never have an answer. I’m finally curious enough to spend some time scanning through the logs. (If you’re not curious, stop reading now, as it’s going to get boring quickly.)

This is a more complex question than it used to be, now that many people do their blog reading through feed aggregators rather than by going to the blog itself. Because of this, much of what I see in my logs isn’t people coming to view the site, but aggregators and search engines coming in to pull content from here to show to their users.

I analyzed my logs for a single day, one week ago today (last Tuesday). Here’s what I found:

The most fastidious visitors were the Yahoo! and Google search engine spiders, which cataloged 831 and 799 pages of The Picket Line respectively (I only count 707 pages on the site myself, so this is thorough). Microsoft’s spider cataloged 186 pages. Other search engines to stop by included Kosmix, Voila, Exalead, StackRambler, Majestic-12, and GAIS, but those only looked at a handful of pages.

In addition, two image-specific spiders dropped by to look for new images to grab: Yahoo!’s image crawler and the Kinja image bot.

Many other, more specialized spiders and bots also visited; these crawl the web for purposes other than building general-purpose public search engines. Among those to visit The Picket Line last Tuesday were the “Adult” web index crawler Eonpal, the “Kyluka crawler,” Yahoo! Slurp, WebCorp, YodaoBot, Alexa’s ia_archiver, Blog Carnival Index, something called “PyQuery,” the IRL-crawler, the Brandimensions robot, and the sogou spider.
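A per-crawler page count like the ones above can be sketched in a few lines of Python. This is only an illustration, not the method used for this post: it assumes the log is in Apache's Combined Log Format, and the marker table and function names are made up for the example (a real table would need many more entries).

```python
import re
from collections import defaultdict

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical table mapping a user-agent substring to a crawler label.
CRAWLER_MARKERS = {
    "Googlebot": "Google",
    "Yahoo! Slurp": "Yahoo!",
    "msnbot": "Microsoft",
    "ia_archiver": "Alexa",
}


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not parse."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def classify_agent(agent):
    """Return a crawler label for a user-agent string, or None for anything unrecognized."""
    lowered = agent.lower()
    for marker, label in CRAWLER_MARKERS.items():
        if marker.lower() in lowered:
            return label
    return None


def pages_per_crawler(lines):
    """Count the distinct paths each recognized crawler requested."""
    pages = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the assumed format
        label = classify_agent(entry["agent"])
        if label:
            pages[label].add(entry["path"])
    return {label: len(paths) for label, paths in pages.items()}
```

Counting distinct paths rather than raw requests keeps a crawler that fetches the same page twice from inflating its total.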

I was also visited by at least one hostile probe (I would probably have found more if I’d checked my error log) that successfully downloaded an uncreatively-named JavaScript file from my site to scan it for vulnerabilities.

Many feed syndicators, aggregators, searchers, and robotic plagiarizers came by to look at the variety of syndicated content feeds I offer. These included InfoSquire, NewzCrawler, Tailrank, AllResearch, Google’s Feedfetcher, Bloglines, NewsGator, LiveJournal, NewsOnFeeds, Kinja, Chello, Blogrunner, Blogdigger, TopicBlogs, Sphere, mioNews, Swamii, Modwest, Plazoo, RSSMicro, Netvibes, Gregarius, LaughingSquid, BoardReader, Technorati, syndic8, BlogPulse, StrategicBoard, and the more academic Blog Conversation Project. Feels like 1999 around here.

Some of these aggregators include a note in their user-agent headers to indicate how many people they serve my feeds to. For instance, Google tells me that 26 people read this blog in Google Reader, Bloglines reports 20 readers, NewsGator can’t decide whether it’s four or five readers there, LiveJournal reports 28, Kinja has one lonely reader, and Netvibes three. Many of these people may read The Picket Line regularly but only seldom actually visit the site it’s hosted on.
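Pulling those reader counts out of the request headers is a small pattern-matching job. The sketch below is a hedged illustration: the pattern assumes each aggregator writes its count as a number followed by "subscribers" or "readers", and the function names are invented for the example. A service that reports two different figures during the day (as NewsGator did) contributes its lowest figure to the low total and its highest to the high total.

```python
import re

# Assumed wording: a number followed by "subscriber(s)" or "reader(s)".
SUBSCRIBERS = re.compile(r"(\d+)\s+(?:subscribers?|readers?)", re.IGNORECASE)


def subscriber_count(header_value):
    """Return the reader count advertised in a header value, or None if there is none."""
    match = SUBSCRIBERS.search(header_value)
    return int(match.group(1)) if match else None


def reader_range(headers_by_service):
    """Sum the lowest and highest counts each service reported over the day."""
    low = high = 0
    for header_values in headers_by_service.values():
        counts = [c for c in map(subscriber_count, header_values) if c is not None]
        if counts:
            low += min(counts)
            high += max(counts)
    return low, high
```

Fed the figures quoted above (26, 20, four or five, 28, one, and three), `reader_range` gives a range of 82 to 83 readers across those six services.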

Occasionally I could actually see someone reading my feed through one of these aggregators, because they’d pull a supporting image from my site rather than from the aggregator’s. The Taxblogger aggregation feed is like this, as are (sometimes anyway) Google’s various feed readers.

On top of this, some people use their browsers as feed readers, or they use specialized software like SharpReader, Omea Pro, RSS Bandit, JetBrains, Google Desktop, or Liferea to make their computer a personal aggregator. About 30 people seemed to read my site this way, with some individuals pulling a feed from my site as many as 20 times during the day.

Back in the robot category, there were a couple of phony referrers. These typically come from bots that visit your website pretending to have followed a link from some site they are promoting, in the hope that the webmaster viewing the visitor logs will be curious enough to take a look. I got two of these: one was for a nonexistent site, the other for a diploma mill that was already unreachable today.

Some people still surf the web the old-fashioned way, by visiting web pages and reading them. Of these, many came here from following links at Wikipedia. Six people visited Thoreau’s Herald of Freedom page by following the link on Wikipedia’s page about the essay. One person came to the Thoreau on John Brown page by following a link on Wikipedia’s A Plea for Captain John Brown page. Eighteen people came by to read Civil Disobedience from Wikipedia, including one who did it four times over two hours (another went there after having come to Herald… from Wikipedia first). Three people came by to look at the Excerpts from Thoreau’s journals from Wikipedia but only one made it past the index. One person came to see Thoreau’s The Service and one to see his Slavery in Massachusetts (twice) from Wikipedia. The only non-Thoreau Wikipedia page to send any visitors here last Tuesday was the page on Julia Butterfly Hill, which brought one person here to view my article about her tax resistance (15 October 2003).

The biggest category of referrer was the search engine, most typically Google, with a bit of Yahoo! (a 37:4 ratio) and a single MSN Live Search. The searches that drove people to The Picket Line last Tuesday were:

“spending habits of Russia” (21 February 2005)
“how to avoid paying federal income taxes” (my how-to guide is Yahoo!’s #1 result)
“hiroshima bombing morals” (6 August 2005)
“rose wilder danbury” (26 July 2006)
“picket line” (main page)
“encouraging a soldier to desert” (28 February 2005)
“8880 retirement” or “retirement tax credit head of household” (my page on the subject)
“thoreau's opinion of slavery in massachusetts” (Slavery in Massachusetts)
“transatlantic cable thoreau quotations” (Excerpts from Thoreau’s journals (1858))
“robert mcgee” or “robert mcgee tax” (6 July 2006)
“british women anti-slavery politics sugar boycott” or “Sugar and the Carribean” (11 January 2005)
“march of the abolitionst” (my 11 January 2005 entry also had that typo)
“people who helped solved picket line problems” (19 March 2006)
“Assaults on picket line cases in USA” (my “best of” page is, oddly, the #1 Google result)
“pictures of violence in the picket line” (23 July 2004)
“The Picket LIne” (my main page is the #1 Google result)
“Tolstoy letter to Nicolas 2” (Letter to the Liberals)
“How To STOP Paying Federal Income Tax LEGALLY” or “don't pay income tax” or “i don't pay income tax” or “howto taxes” (for each, my how-to guide is Google’s #1 result)
“"Slavery in massachusetts"” or “slavery in Massachusetts” or “"slavery in massachusetts" full text” (Slavery in Massachusetts)
“can i take phone expense off for home business on my federal tax” (my how-to guide)
“moonlighting tax documentation” (10 January 2006)
“debt collection gimmicks” (31 October 2006)
“"Those who, while they disapprove of the character” (Resistance to Civil Government)
“, "The Paradise within the Reach of all Men, without Labor, by Powers of Nature and Machinery. An Address to all Intelligent Men" by J.A. Etzler” (Paradise (To Be) Regained)
“false pays no federal income tax” (my FAQ)
“dave gross” or “David Gross” (main page)
“thoreau reform age” (Reform and the Reformers)
“picket line” (main page)
“picket line ethics” (6 July 2006)
“"look homeward, america"” (8 January 2007)
“1040 walkthrough” (19 March 2006)
“Petrulionis” (20 January 2007)
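
Search phrases like the ones above are carried in the query string of the referrer URL. As a rough sketch of how they could be extracted, the Python below looks up the parameter that holds the search terms for each engine. The host fragments and parameter names in the table are assumptions for the example (`q` for Google and MSN Live Search, `p` for Yahoo!), the function names are made up, and image-search referrers are not handled.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed mapping: fragment of the referrer's host name -> query parameter
# that holds the search terms.
SEARCH_PARAMS = {
    "google.": "q",
    "search.yahoo.": "p",
    "search.live.": "q",
    "search.msn.": "q",
}


def search_query(referrer):
    """Return the search phrase carried by a search-engine referrer URL, or None."""
    parts = urlparse(referrer)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            values = parse_qs(parts.query).get(param)
            # Keep the searcher's own capitalization and typos.
            return values[0].strip() if values else None
    return None


def tally_queries(referrers):
    """Count how many visits each distinct search phrase produced."""
    return Counter(q for q in map(search_query, referrers) if q)
```

The phrases are left exactly as typed, so “picket line” and “The Picket LIne” are counted as separate searches, as they are in the list above.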

In addition to these were image searches, again through Google and Yahoo! exclusively:

“federal discretionary budget” or “discretionary budget graph” (the “Death and Taxes” graphic I featured on 18 September 2006)
“womens suffrage” or “women suffrage” (the image of the Women’s Tax Resistance League poster from 4 November 2005)
“income tax where it goes” or “U.S. income pie graph” (the War Resisters League pie chart I reproduced on 27 February 2006)
“increase defense spending” (a chart on 2 September 2004)
“janet jackson” or “janet jackson and justin timberlake” (a deceptively-captioned image on 3 February 2004)
“billboards cuba” or “billboards” (an example on 18 December 2004)
“go fuck yourself” (17 January 2006)
“You Talk of Sacrifice...He Knew the Meaning of Sacrifice” (an old war propaganda poster image from the 24 September 2003 entry)
“julia butterfly” or “Julia Butterfly Hill” (an image from 15 October 2003)
“EITC graph” (the #1 result is the graph from my 19 February 2006 entry)
“rofl” (an image from my site that had been hotlinked by someone who left a comment on an article at the Atlanta Journal-Constitution’s website)
“current wars or protests or anything happening in libya” (an image from my site that had been hotlinked by someone who left a comment on John Cole’s Balloon Juice)
(There were three other cases of hotlinked images that were retrieved as orphans from The Picket Line.)

Only in one of those many search referrals did someone, after visiting the page they found through the search engine, stick around and visit any other pages on the site.

Aside from these, there were miscellaneous referrals. A link I left in a comment to a blog post elsewhere back in October 2005 gained me a couple of visitors. Claire Wolfe’s blog sent a reader here, as did her discussion board, and a link from an article she wrote for Backwoods Home Magazine. The Sparing Change blog sent two readers here, and nonviolence.org sent one. AllExperts, a para-site that pulls pages from Wikipedia and surrounds them with ads, sent one reader this way via a copied-from-Wikipedia link. The Northern California War Tax Resistance links page sent the most curious reader of this bunch here, who then visited four other pages on-site before departing.

One person came here from a link in their Yahoo! Mail, and one from a link in their Gmail.

Fifty-four visitors came without giving me any idea of where they were coming from — a link? a bookmark? typing it in by hand? Fifteen of these came first to my main page and then visited one or a few recently-posted entries (this is pretty much the use case the blog was designed for). Most other visitors came in to a specific page, got it, and left without looking at anything else. A few stuck around and browsed for a while.
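Sorting visitors into these three patterns can be sketched as follows. This is a rough illustration only: it treats each client address as one visitor (which proxies and shared connections will blur), it assumes the requests are already in time order and that the main page is served at `/`, and the function names are invented for the example.

```python
def visits_by_host(entries):
    """Group (host, path) pairs, already in time order, into per-host page lists."""
    visits = {}
    for host, path in entries:
        visits.setdefault(host, []).append(path)
    return visits


def summarize(visits, main_page="/"):
    """Classify each visitor by how they moved through the site."""
    summary = {"main_page_then_more": 0, "single_page": 0, "browsed": 0}
    for paths in visits.values():
        distinct = list(dict.fromkeys(paths))  # drop repeat views, keep order
        if len(distinct) == 1:
            summary["single_page"] += 1  # got one page and left
        elif distinct[0] == main_page:
            summary["main_page_then_more"] += 1  # main page, then entries
        else:
            summary["browsed"] += 1  # landed elsewhere and stuck around
    return summary
```

A visitor who reloads the same page several times still counts as a single-page visit, because repeat views are dropped before classifying.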

What does this all add up to? The logs give me evidence of roughly 125–130 regular readers, most of whom keep track of the blog through one of its syndicated feeds, but some of whom visit my site periodically to catch up on the latest entries. Many feed aggregation sites give me no way of knowing how many readers they serve, so the 125–130 figure is only a minimum.

In addition to this, sixty-two people came here in response to a search engine result, thirty people came here seeking more information after viewing a Wikipedia page, ten people followed various other links salted elsewhere around the web to get here, and at least two people followed a link someone sent them in email. This is in addition to the thirty-nine people who showed up without giving any indication of where they were coming from.

All told, roughly 270 known readers last Tuesday — if you don’t count the bots, spiders, probes, hotlinked images, and such — and an unknown number of unknowns. Fairly modest by blog standards, I imagine, but that’s what I get for having such specialized subject matter. I’m sure we make up in quality what we lack in quantity.
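
For anyone who wants to check the arithmetic, the figures quoted in the last few paragraphs can be added up directly. The variable and function names below are just labels chosen for this sketch; the numbers are the ones given above.

```python
# Figures quoted in the summary above.
REGULAR_READERS = (125, 130)  # low and high estimates
OTHER_VISITORS = {
    "search engine results": 62,
    "Wikipedia links": 30,
    "other links around the web": 10,
    "links sent by email": 2,
    "no referrer given": 39,
}


def total_range(regular, others):
    """Add the one-off visitors to the low and high regular-reader estimates."""
    extra = sum(others.values())
    return regular[0] + extra, regular[1] + extra


low, high = total_range(REGULAR_READERS, OTHER_VISITORS)
print(f"{low}-{high} known readers")  # prints "268-273 known readers"
```

The range of 268 to 273 is where the rounded figure of roughly 270 comes from.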