Analyzing the Picket Line Server Logs

Who reads The Picket Line, how, and why? Every once in a while someone asks me, and I never have an answer. I’m finally curious enough to spend some time scanning through the logs. (If you’re not curious, stop reading now, as it’s going to get boring quickly.)

This is a more complex question than it used to be, now that many people do their blog reading through feed aggregators rather than by going to the blog itself. Because of this, much of what I see in my logs isn’t people coming to view the site, but aggregators and search engines coming in to pull content from here to show to their users.

I analyzed my logs from last Tuesday. Here’s what I found:

The most fastidious visitors were the Yahoo! and Google search engine spiders, which cataloged 831 and 799 pages of The Picket Line, respectively (I count only 707 pages on the site myself, so they were thorough). Microsoft’s spider cataloged 186 pages. Other search engines that stopped by included Kosmix, Voila, Exalead, StackRambler, Majestic-12, and GAIS, but those only looked at a handful of pages.
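For anyone who wants to produce a similar per-spider tally, here is a minimal sketch. It assumes the server writes the Apache "combined" log format, and the `CRAWLERS` markers are illustrative substrings to look for in the User-Agent field, not a complete list. It counts distinct requested URLs, which may run higher than the number of actual pages if a spider fetches the same page under several URLs.

```python
import re
from collections import defaultdict

# Apache/NCSA "combined" log format (an assumption; adjust for other servers).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative User-Agent substrings; extend this for other spiders.
CRAWLERS = {
    "Yahoo! Slurp": "Slurp",
    "Googlebot": "Googlebot",
    "msnbot": "msnbot",
}


def pages_per_crawler(lines):
    """Return {crawler name: number of distinct URLs it requested}."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        for name, marker in CRAWLERS.items():
            if marker in match.group("agent"):
                seen[name].add(match.group("path"))
    return {name: len(paths) for name, paths in seen.items()}
```

Called as `pages_per_crawler(open("access.log"))`, where `access.log` is a hypothetical file name, it returns one count per spider that appeared in the log.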

In addition, two image-specific spiders dropped by to look for new images to grab: Yahoo!’s image crawler and the Kinja image bot.

Many other, more specialized spiders and bots also visited. These crawl the web for purposes other than building general-purpose public search engines. Among those to visit The Picket Line were the “Adult” web index crawler Eonpal, the “Kyluka crawler,” Yahoo! Slurp, WebCorp, YodaoBot, Alexa’s ia_archiver, Blog Carnival Index, something called “PyQuery,” the IRL-crawler, the Brandimensions robot, and the sogou spider.

I was also visited by at least one hostile probe (I would probably have found more if I’d checked my error log) that successfully downloaded an uncreatively-named JavaScript file from my site to scan it for vulnerabilities.

Many feed syndicators, aggregators, searchers, and robotic plagiarizers came by to look at the variety of syndicated content feeds I offer. These included InfoSquire, NewzCrawler, Tailrank, AllResearch, Google’s Feedfetcher, Bloglines, NewsGator, LiveJournal, NewsOnFeeds, Kinja, Chello, Blogrunner, Blogdigger, TopicBlogs, Sphere, mioNews, Swamii, Modwest, Plazoo, RSSMicro, Netvibes, Gregarius, LaughingSquid, BoardReader, Technorati, syndic8, BlogPulse, StrategicBoard and the more academic Blog Conversation Project. Feels like around here.

Some of these aggregators include a note in their user-agent headers to indicate how many people they serve my feeds to. For instance, Google tells me that 26 people read this blog in Google Reader, Bloglines reports 20 readers, NewsGator can’t decide whether it’s four or five readers there, LiveJournal reports 28, Kinja has one lonely reader, and Netvibes three. Many of these people may read The Picket Line regularly but seldom actually visit the site itself.
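Pulling those reader counts out of the logs can be automated. The sketch below assumes an aggregator embeds text such as "20 subscribers" or "5 readers" in a header string. The exact wording varies by service, so the pattern is a guess to adapt, and the function names are made up for illustration. It keeps every distinct count an aggregator reported, which also surfaces cases like NewsGator’s four-or-five indecision.

```python
import re

# Assumed wording: "<number> subscribers" or "<number> readers".
SUBSCRIBERS = re.compile(r"(\d+)\s+(?:subscribers?|readers?)", re.IGNORECASE)


def subscriber_count(header_value):
    """Return the reader count embedded in a header string, or None."""
    match = SUBSCRIBERS.search(header_value)
    return int(match.group(1)) if match else None


def reported_counts(header_values):
    """Map each aggregator's leading name token to the set of counts it reported."""
    counts = {}
    for value in header_values:
        n = subscriber_count(value)
        if n is not None:
            # Use the text before the first "/", space, ";" or "(" as the name.
            name = re.split(r"[/ ;(]", value, maxsplit=1)[0]
            counts.setdefault(name, set()).add(n)
    return counts
```

Summing the largest count per aggregator gives a floor for feed readership, since services that report nothing contribute nothing.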

Occasionally, I’d actually see someone from one of these aggregators read my feed, as they’d pull a supporting image from my site rather than from the aggregator’s. The Taxblogger aggregation feed is like this, as are (sometimes anyway) Google’s various feed readers.

On top of this, some people use their browsers as feed readers, or they use specialized software like SharpReader, Omea Pro, RSS Bandit, JetBrains, Google Desktop, or Liferea to make their computer a personal aggregator. About 30 people seemed to read my site this way, with some individuals pulling a feed from my site as many as 20 times during the day.

Back to the robot category, there were a couple of phony referrers — these are typically bots that visit your website, pretending that they have gotten there by following a link from some site they are promoting, and hoping that I, as the webmaster viewing the visitor logs, will be curious enough to take a look. I got two of these; one was for a nonexistent site, another was for a diploma mill that was already unreachable today.

Some people still surf the web the old-fashioned way, by visiting web pages and reading them. Of these, many came here by following links at Wikipedia. Six people visited Thoreau’s Herald of Freedom page by following the link on Wikipedia’s page about the essay. One person came to the Thoreau on John Brown page by following a link on Wikipedia’s A Plea for Captain John Brown page. Eighteen people came by to read Civil Disobedience from Wikipedia, including one who did it four times over two hours (another went there after having come to Herald… from Wikipedia first). Three people came by to look at the Excerpts from Thoreau’s journals from Wikipedia but only one made it past the index. One person came to see Thoreau’s The Service and one to see his Slavery in Massachusetts (twice) from Wikipedia. The only non-Thoreau Wikipedia page to send any visitors here last Tuesday was the page on Julia Butterfly Hill, which brought one person here to view my article about her tax resistance.

The biggest category of referrer was the search engine, most typically Google, with a bit of Yahoo! (a 37:4 ratio) and a single MSN Live Search. The searches that drove people to The Picket Line were:

In addition to these were image searches, again through Google and Yahoo! exclusively:

  • “federal discretionary budget” or “discretionary budget graph” (the “Death and Taxes” graphic I featured in an earlier entry)
  • “womens suffrage” or “women suffrage” (the image of the Women’s Tax Resistance League poster from an earlier entry)
  • “income tax where it goes” or “U.S. income pie graph” (the War Resisters League pie chart I reproduced in an earlier entry)
  • “increase defense spending” (a chart from an earlier entry)
  • “janet jackson” or “janet jackson and justin timberlake” (a deceptively-captioned image from an earlier entry)
  • “billboards cuba” or “billboards” (an example from an earlier entry)
  • “go fuck yourself”
  • “You Talk of Sacrifice...He Knew the Meaning of Sacrifice” (an old war propaganda poster image from an earlier entry)
  • “julia butterfly” or “Julia Butterfly Hill” (an image from an earlier entry)
  • “EITC graph” (the #1 result is the graph from my entry)
  • “rofl” (an image from my site that had been hotlinked by someone who left a comment on an article at the Atlanta Journal-Constitution’s website)
  • “current wars or protests or anything happening in libya” (an image from my site that had been hotlinked by someone who left a comment on John Cole’s Balloon Juice)
  • (There were three other cases of hotlinked images that were retrieved as orphans from The Picket Line)
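Search phrases like these come from the query string of the referring URL. Here is a minimal sketch of the extraction, assuming Google and MSN Live Search carry the phrase in a `q` parameter and Yahoo! in a `p` parameter. The `QUERY_PARAMS` table is an illustrative assumption and would need extending for other engines or for image-search result pages that wrap the query differently.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping from a host-name fragment to its query parameter.
QUERY_PARAMS = {
    "google.": "q",
    "yahoo.": "p",
    "live.com": "q",
}


def search_terms(referrer):
    """Return the decoded search phrase in a referrer URL, or None."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for marker, param in QUERY_PARAMS.items():
        if marker in host:
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None  # not a recognized search engine
```

Feeding every referrer in the log through `search_terms` and counting the non-empty results with `collections.Counter` would give a list of phrases along with how often each one appeared.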

Only in one of those many search referrals did someone, after visiting the page they found through the search engine, stick around and visit any other pages on the site.

Aside from these, there were miscellaneous referrals. A link I left a while back in a comment on a blog post elsewhere gained me a couple of visitors. Claire Wolfe’s blog sent a reader here, as did her discussion board, and a link from an article she wrote for Backwoods Home Magazine. The Sparing Change blog sent two readers here, and another site sent one. AllExperts, a para-site that pulls pages from Wikipedia and surrounds them with ads, sent one reader this way via a copied-from-Wikipedia link. The Northern California War Tax Resistance links page sent the most curious reader of this bunch here, who then visited four other pages on-site before departing.

One person came here from a link in their Yahoo! Mail, and one from a link in their GMail.

Fifty-four visitors came without giving me any idea of where they were coming from — a link? a bookmark? typing it in by hand? Fifteen of these came first to my main page and then visited one or a few recently-posted entries (this is pretty much the use case the blog was designed for). Most other visitors came in to a specific page, got it, and left without looking at anything else. A few stuck around and browsed for a while.

What does this all add up to? The logs give me evidence of roughly 125–130 regular readers, most of whom keep track of the blog through one of its syndicated feeds, but some of whom visit my site periodically to catch up on the latest entries. In many cases of feed aggregation sites I have no way of knowing how many readers there are, so the 125–130 figure is only a minimum.

In addition to this, sixty-two people came here in response to a search engine result, thirty people came here seeking more information after viewing a Wikipedia page, ten people followed various other links salted elsewhere around the web to get here, and at least two people followed a link someone sent them in email. This is in addition to the thirty-nine people who showed up without giving any indication of where they were coming from (the fifty-four mentioned above, less the fifteen I’m counting among the regular readers).

All told, roughly 270 known readers (if you don’t count the bots, spiders, probes, hotlinked images, and such) and an unknown number of unknowns. Fairly modest by blog standards, I imagine, but that’s what I get for having such specialized subject matter. I’m sure we make up in quality what we lack in quantity.
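For anyone who wants to check the arithmetic behind the "roughly 270" figure, here is a small sketch that adds up the categories quoted above. The variable and function names are made up for illustration; the numbers are the ones given in the text.

```python
# Range of regular readers estimated from feeds and repeat visits.
REGULARS = (125, 130)

# One-time visitors by how they arrived, as quoted in the text.
ONE_TIME = {
    "search engines": 62,
    "wikipedia": 30,
    "other links": 10,
    "email": 2,
    "no referrer": 39,
}


def total_range(regulars, one_time):
    """Return (low, high) totals: regular readers plus all one-time visitors."""
    extra = sum(one_time.values())
    return regulars[0] + extra, regulars[1] + extra


low, high = total_range(REGULARS, ONE_TIME)
print(f"{low} to {high} known readers")  # prints "268 to 273 known readers"
```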