
Voters and Lurkers

Debating online, whether on message boards, blogs, or anywhere else, can be rewarding, but it can also be quite frustrating. When most people think of a debate, they think of two factions arguing, with one of them “winning” the argument. It’s a process of expression in which people with different points of view state their opinions and are criticized by one another.

I’ve often found that specific threads tend to boil down to a point where the argument is going back and forth between just two debaters (with very few interruptions from others). Inevitably, the debate reaches the point where both sides’ assumptions (or axioms) have been exposed, and neither side is willing to agree with the other. To the debaters, this can be intensely frustrating. Indeed, anyone who has spent a significant amount of time debating others online can usually see that they’re probably never going to convince their opponents. So who wins the argument?

The debaters can’t decide who wins – they obviously think their argument is better than their opponents’ (or, at the very least, are unwilling to admit otherwise), and so everyone thinks that they “won.” But the debaters themselves don’t “win” an argument; it’s the people witnessing the debate who are the real winners. They decide which arguments are persuasive and which are not.

This is what the First Amendment of the US Constitution is based on, and it is a fundamental part of our democracy. In a vigorous marketplace of ideas, the majority of voters will discern the truth and vote accordingly.

Unfortunately, there never seems to be any sort of closure when debating online, because the audience is composed primarily of lurkers, most of whom don’t say anything (plus, there are no votes), and so it seems like nothing is accomplished. However, I assure you that is not the case. Perhaps not all of them, but many lurkers read the posts with a critical eye and come out of the debate convinced one way or the other. They are the “voters” in an online debate. They are the ones who determine who won. In a scenario where only 10-15 people are reading a given thread, this might not seem like much (and it’s not), but if enough of these threads occur, then you really can see results…

I’m reminded of Benjamin Franklin’s essay “An Apology for Printers,” in which Franklin defended those who printed allegedly offensive opinion pieces. His point was that very little would be printed if publishers only produced things that offended nobody.

Printers are educated in the Belief, that when Men differ in Opinion, both sides ought equally to have the Advantage of being heard by the Public; and that when Truth and Error have fair Play, the former is always an overmatch for the latter.

What is a Weblog, Part II

What is a weblog? My original thought was to treat blogs as a genre within the medium of the internet. Like all genres, blogging has a common set of conventions that define it, but the boundaries are soft and some sites blur the lines quite thoroughly. Furthermore, each individual probably has their own definition of what constitutes a blog (again, similar to genres). The very elusiveness of a definition indicates that perception is an important part of determining whether or not something is a blog. It has become clear that there is no one answer, but if we spread the decision out to a broad number of people, each with their own independent definition of blog, we can conclude that a borderline site like Slashdot is a blog because most people call it a blog.

So now that we have a (non)definition for what a blog is, just how important are blogs? Caesar at Ars Technica writes that, according to a new poll, Americans are somewhat ambivalent about blogs. In particular, they don’t trust blogs.

I don’t particularly mind this, however. For the most part, blogs don’t make much of an effort to be impartial, and as I’ve written before, it is the blogger’s willingness to embrace their subjectivity that is their primary strength. Making mistakes on a blog is acceptable, so long as you learn from your mistakes. Since blogs are typically more informal, it’s easier for bloggers to acknowledge their mistakes.

Lexington Green from ChicagoBoyz recently wrote about blogging to a writer friend of his:

To paraphrase Truman Capote’s famous jibe against Jack Kerouac, blogging is not writing, it is typing. A writer who is blogging is not writing, he is blogging. A concert pianist who is sitting down at the concert grand piano in Carnegie Hall in front of a packed house is the equivalent to an author publishing a finished book. The same person sitting down at the piano in his neighborhood bar on a Saturday night and knocking out a few old standards, doing a little improvisation, and even doing some singing — that is blogging. Same instrument — words, piano — different medium. We forgive the mistakes and wrong-guesses because we value the immediacy and spontaneity. Plus, publish a book, it is fixed in stone. Write a blog post you later decide is completely wrong, it is actually good, since it gives you a good hook for a later post explaining your thoughts that led to the changed conclusion. The essence of a blog is to air things informally, to throw things out, to say “this interests me because …” From time to time a more considered and article-like post is good. But most people read blogs by skimming. If a post is too long, in my observation, it does not get much response and may not be read at all.

Of course, his definition of what a blog is could be argued (as there are some popular and thoughtful bloggers who routinely write longer, more formal essays), but it actually struck me as being an excellent general description of blogging. Note his favorable attitude towards mistakes (“it gives you a good hook for a later post” is an excellent quote, though I think you might have to be a blogger to fully understand it). In the blogosphere, it’s ok to be wrong:

Everyone makes mistakes. It’s a fact of life. It isn’t a cause for shame, it’s just reality. Just as engineers are in the business of producing successful designs which can be fabricated out of less-than-ideal components, the engineering process is designed to produce successful designs out of a team made up of engineers every one of which screws up routinely. The point of the process is not to prevent errors (because that’s impossible) but rather to try to detect them and correct them as early as possible.

There’s nothing wrong with making a mistake. It’s not that you want to be sloppy; everyone should try to do a good job, but we don’t flog people for making mistakes.

The problem with the mainstream media is that they purport to be objective, as if they’re just reporting the facts. Striving for objectivity can be a very good thing, but total objectivity is impossible, and if you deny the inherent subjectivity in journalism, then something is lost.

One thing Caesar mentions is that “the sensationalism surrounding blogs has got to go. Blogs don’t solve world hunger, cure disease, save damsels in distress, or any of the other heroic things attributed to them.” I agree with this too, though I do think there is something sensational about blogs, or more generally, the internet.

Steven Den Beste once wrote about what he thought were the four most important inventions of all time:

In my opinion, the four most important inventions in human history are spoken language, writing, movable type printing and digital electronic information processing (computers and networks). Each represented a massive improvement in our ability to distribute information and to preserve it for later use, and this is the foundation of all other human knowledge activities. There are many other inventions which can be cited as being important (agriculture, boats, metal, money, ceramic pottery, postmodernist literary theory) but those have less pervasive overall effects.

Regardless of whether you agree that these are the most important inventions, it is undeniable that the internet provides a stairstep in communication capability, which, in turn, significantly improves the process of large-scale collaboration that is so important to human existence.

When knowledge could only spread by speech, it might take a thousand years for a good idea to cross the planet and begin to make a difference. With writing it could take a couple of centuries. With printing it could happen in fifty years.

With computer networks, it can happen in a week if not less. After I’ve posted this article to a server in San Diego, it will be read by someone on the far side of a major ocean within minutes. That’s a radical change in capability; a sufficient difference in degree to represent a difference in kind. It means that people all over the world can participate in debate about critical subjects with each other in real time.

And it appears that blogs, with their low barrier to entry and automated software processes, will play a large part in the worldwide debate. There is, of course, a ton of room for improvement, but things are progressing rapidly now and perhaps even accelerating. It is true that some blogging proponents are preaching triumphalism, but that’s part of the charm. They’re allowed to be wrong and if you look closely at what happens when someone makes such a comment, you see that for every exaggerated claim, there are 10 counters in other blogs that call bullshit. Those blogs might be on the long tail and probably won’t garner as much attention, but that’s part of the point. Blogs aren’t trustworthy, which is precisely why they’re so important.

Update 4.24.05: I forgot to link the four most important inventions article (and I changed some minor wording: I had originally referred to the four “greatest” inventions, which was not the wording Den Beste had used).

What is a Weblog?

Caesar at Ars Technica has written a few entries recently concerning blogs that interested me. The first simply asks: what, exactly, is a blog? Once you get past the overly general definitions (“a blog is a frequently updated webpage”), it becomes a surprisingly difficult question.

Caesar quotes Wikipedia:

A weblog, web log or simply a blog, is a web application which contains periodic time-stamped posts on a common webpage. These posts are often but not necessarily in reverse chronological order. Such a website would typically be accessible to any Internet user. “Weblog” is a portmanteau of “web” and “log”. The term “blog” came into common use as a way of avoiding confusion with the term server log.

Of course, as Caesar notes, the majority of internet sites could probably be described in such a way. What differentiates blogs from discussion boards, news organizations, and the like?

Reading through the resulting discussion provides some insight, but practically every definition is either too general or too specific.

Many people like to refer to weblogs as a medium in themselves. I can see the point, but I think it’s more general than that. The internet is the medium, whereas a weblog is basically a set of commonly used conventions for communicating through that medium. Among the conventions are things like a main page with chronological posts, permalinks, archives, comments, calendars, syndication (RSS), blogging software (CMS), trackbacks, &c. One problem is that no single convention is, in itself, definitive of a weblog. It is possible to publish a weblog without syndication, comments, or a calendar. Depending on the conventions being eschewed, such blogs may be unusual, but they are still just as much blogs as any other site.
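Syndication is the convention that is easiest to make concrete. Here is a minimal sketch of how a blogging CMS might generate an RSS 2.0 feed; the site name, URLs, and post data are invented for illustration, and a real feed would use RFC 822 dates for pubDate:

```python
import xml.etree.ElementTree as ET

def make_feed(title, link, posts):
    """Build a minimal RSS 2.0 document: one <item> per post, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    for post in sorted(posts, key=lambda p: p["date"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        # The permalink convention: every post gets a stable URL of its own
        ET.SubElement(item, "link").text = post["permalink"]
        ET.SubElement(item, "pubDate").text = post["date"]  # real feeds use RFC 822 dates
    return ET.tostring(rss, encoding="unicode")

# Hypothetical posts, purely for illustration
feed = make_feed("Example Weblog", "https://blog.example.com/", [
    {"title": "What is a Weblog?",
     "permalink": "https://blog.example.com/archives/000123.html",
     "date": "2005-04-17"},
])
```

The point of the sketch is that the CMS, not the author, propagates the convention: every post automatically gets a permalink and a feed entry.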

For lack of a better term, I tend to think of weblogs as a genre. This is, of course, not a perfect fit, but I think it communicates what I’m getting at. A genre is typically defined as a category of artistic expression marked by a distinctive style, form, or content. However, anyone who is familiar with genre film or literature knows that there are plenty of movies or books that are difficult to categorize. As such, specific genres such as horror, sci-fi, or comedy are actually quite inclusive. Some genres, Drama in particular, are incredibly broad and are often accompanied by the conventions of other genres (we call such pieces “cross-genre,” though I think you could argue that almost everything incorporates “Drama”). The point here is that the line separating one genre from another is often blurry.

On the medium of the internet, there are many genres, one of which is a weblog. Other genres include commercial sites (i.e. sites that try to sell you things, Amazon.com, Ebay, &c.), reference sites (i.e. dictionaries & encyclopedias), Bulletin Board Systems and Forums, news sites, personal sites, weblogs, wikis, and probably many, many others.

Any given site is probably made up of a combination of genres and it is often difficult to pinpoint any one genre as being representative. Take, for example, Kaedrin.com. It is a personal site with some random features, a bunch of book & movie reviews, a forum, and, of course, a weblog (which is what you’re reading now). Everything is clearly delineated here at Kaedrin, but other sites blur the lines between genres on every page. Take Ars Technica itself: is it a news site or a blog or something else entirely? I would say that the front page is really a combination of many different things, one of which is a blog. It’s a “cross-genre” webpage, but that doesn’t necessarily make it any less effective, just as Alien isn’t a less effective science fiction film because it incorporates elements of horror and drama (or vice-versa). There is something to be said for simplicity, though: it’s quite possible to load a page up with too much stuff, just as it’s possible for a book or movie to be too ambitious and take on too much at once.

Interestingly, much of what a weblog is can be defined as an already existing literary genre: the journal. People have kept journals and diaries all throughout history. The major difference between a weblog and a journal is that a weblog is published for all to see on the public internet (and also that weblogs can be linked together through the use of the hyperlink and the infrastructure of the internet). Historically, diaries were usually private, but there are notable exceptions which have been published in book form. Theoretically, one could take such diaries and publish them online – would they be blogs? Take, for instance, The Diary of Samuel Pepys which is currently being published daily as if it’s a weblog circa 1662 (i.e. Today’s entry is dated “Thursday 17 April 1662”). The only difference is that the author of that diary is dead and thus doesn’t interact or respond to the rest of the weblog community (though there is still interaction allowed in the form of annotations).

A few other random observations about blogs:

  • Software: Many people brought up the fact that most blogs are produced with the assistance of weblogging software, such as Blogger or Movable Type. From my perspective, such tools are necessary for the spread of weblogs but shouldn’t be part of the definition. They assist in the spread of weblogs because they automate the overly technical details of publishing a website and make it easy for normal folks to participate. They’re also useful for automatically propagating weblog conventions like permalinks, comments, trackbacks, and archives. However, it’s possible to do all of this without blogging-specific software, and it’s also possible to use blogging software for other purposes (for instance, Kaedrin’s very own Tandem Stories are powered by Movable Type). It’s interesting that other genres have their own software as well, particularly bulletin boards and forums. Ironically, one could use such BBS software to publish a blog (or power tandem stories), if one were so inclined. The Pepys blog mentioned above actually makes use of wiki software (though that software powers the entries, it’s mostly used to allow annotations). To me, content management systems are important, but they don’t define the genre so much as propagate it.
  • Personality: One fairly common theme in the definitions is that weblogs are personal – they’re maintained by a person (or a small group of people), not an official organization. A personality comes through. There is also the perception that a blog is less filtered than official communications. Part of the charm of weblogs is that you can be wrong (more on this later, possibly in another post). I’m actually not sure how important this is to the definition of a blog. Someone who posts nothing but links doesn’t display much of a personality, except through more subtle means (the choice of links can tell you a lot about an individual, albeit in an indirect way that could lead to much confusion).
  • Communities: Any given public weblog is part of a community, whether it wants to be or not. The boundaries of any specific weblog are usually well delineated, but since weblogs are part of the internet, which is an on-demand medium (as opposed to television or radio, which are broadcast), blogs are often seen as relative to one another. Entries and links from different blogs are aggregated, compared, correlated and published in other weblogs. Any blog which builds enough of a readership provides a way to connect people who share various interests through the infrastructure of the internet.

Some time ago, Derek Powazek asked “What the Hell is a Weblog? You tell me.” and published all the answers. It turns out that I answered this myself (last one on that page) many years ago:

I don’t care what the hell a weblog is. It is what I say it is. Its something I update whenever I find an interesting tidbit on the web. And its fun. So there.

Heh. Interesting to note that my secondary definition there (“something I update whenever I find an interesting tidbit on the web”) has changed significantly since I contributed that definition. This is why, I suppose, I had originally supplied the primary definition (“I don’t care what the hell a weblog is. It is what I say it is.”) and to be honest, I don’t think that’s changed (though I guess you could call that definition “too general”). Blogging is whatever I want it to be. Of course, I could up and call anything a blog, but I suppose it is also required that others perceive your blog as a blog. That way, the genre still retains some shape, but is still permeable enough to allow some flexibility.

I had originally intended to make several other points in this post, but since it has grown to a rather large size, I’ll save them for other posts. Hopefully, I’ll gather the motivation to do so before next week’s scheduled entry, but there’s no guarantee…

Accelerating Change

Slashdot links to a fascinating and thought-provoking one-hour (!) audio stream of a speech “by futurist and developmental systems theorist, John Smart.” The talk is essentially about the future of technology, more specifically information and communication technology. Obviously, there is a lot of speculation here, but it is interesting so long as you keep it in the “speculation” realm. Much of what follows is simply a high-level summary of the talk with a little commentary sprinkled in.

He starts by laying out some key motivations and guidelines for thinking about this sort of thing, paraphrasing David Brin (and this, in turn, is my paraphrase of Smart):

We need a pragmatic optimism, a can-do attitude, a balance between innovation and preservation, honest dialogue on persistent problems, … tolerance of the imperfect solutions we have today, and the ability to avoid both doomsaying and a paralyzing adherence to the status quo. … Great input leads to great output.

So how do new systems supplant the old? They do useful things with less matter, less energy, and less space, and they keep doing so until they hit some limit along one of those axes. It turns out that evolutionary processes are great at this sort of thing.

Smart goes on to list three laws of information and communication technology:

  1. Technology learns faster than you do (on the order of 10 million times faster). At some point, Smart speculates that there will be some sort of persistent Avatar (neural-net prosthesis) that will essentially mimic and predict your actions, and that the “thinking” it will do (pattern recognitions, etc…) will be millions of times faster than what our brain does. He goes on to wonder what we will look like to such an Avatar, and speculates that we’ll be sort of like pets, or better yet, plants. We’re rooted in matter, energy, and space/time and are limited by those axes, but our Avatars will have a large advantage, just as we have a large advantage over plants in that respect. But we’re built on top of plants, just as our Avatars will be built on top of us. This opens up a whole new can of worms regarding exactly what these Avatars are, what is actually possible, and how they will be perceived. Is it possible for the next step in evolution to occur in man-made (or machine-made) objects? (This section is around 16:30 in the audio)
  2. Human beings are catalysts rather than controllers. We decide which things to accelerate and which to slow down, and this is tremendously important. There are certain changes that are evolutionarily inevitable, but the path we take to reach those ends is not set and can be manipulated. (This section is around 17:50 in the audio)
  3. Interface is extremely important and the goal should be a natural high-level interface. His example is calculators. First generation calculators simply automate human processes and take away your math skills. Second generation calculators like Mathematica allow you to get a much better look at the way math works, but the interface “sucks.” Third generation calculators will have a sort of “deep, fluid, natural interface” that allows a kid to have the understanding of a grad student today. (This section is around 20:00 in the audio)

Interesting stuff. His view is that most social and technological advances of the last 75 years or so are accelerating refinements (changes in the microcosm) rather than disruptive changes (changes in the macrocosm). Most new technological advances are really abstracted efficiencies – it’s the great unglamorous march of technology. They’re small and they’re obfuscated by abstraction, so many of the advances are barely noticed.

This is about halfway through the speech, and he goes on to list many examples and explore some more interesting concepts. Here are some bits I found interesting.

  • He talks about transportation and energy, arguing that even though, on a high level, we haven’t advanced much (we’re still using oil and natural gas – fossil fuels), there has actually been a massive amount of change, but that the change is mostly hidden in abstracted accelerating efficiencies. He mentions that we will probably have zero-emission fossil fuel vehicles 30-40 years from now (which I find hard to believe) and that rather than focusing on hydrogen or solar, we should be trying to squeeze more and more efficiency out of existing systems (i.e. abstracted efficiencies). He also mentions population growth as a variable in the energy debate, something that is rarely done, but if he is correct that population will peak around 2050 (and that population density is increasing in cities), then that changes all projections about energy usage as well. (This section is around 31:50-35 in the audio.) He talks about hybrid technologies and autonomous highways as being integral to accelerating efficiencies of energy use. (This section is around 37-38 in the audio.) I found this part of the talk fascinating because energy debates are often very myopic and don’t consider things outside the box like population growth and density, autonomous solutions, phase shifts of the problem, &c. I’m reminded of this Michael Crichton speech where he says:

    Let’s think back to people in 1900 in, say, New York. If they worried about people in 2000, what would they worry about? Probably: Where would people get enough horses? And what would they do about all the horseshit? Horse pollution was bad in 1900, think how much worse it would be a century later, with so many more people riding horses?

    None of which is to say that we shouldn’t be pursuing alternative energy technology or that it can’t supplant fossil fuels, just that things seem to be trending towards making fossil fuels more efficient. I see hybrid technology becoming the major enabler in this arena, possibly followed by the autonomous highway (that controls cars and can perhaps give an extra electric boost via magnetism). All of which is to say that the future is a strange thing, and these systems are enormously complex and are sometimes driven by seemingly unrelated events.

  • He mentions an experiment in genetic algorithms used for process automation. Such evolutionary algorithms are often used in circuit design and routing processes to find the most efficient configuration. He mentions one case where there was a flaw at the quantum level of a system, and when the genetic algorithm was used to design the circuit, it turned out that the imperfection was actually exploited to create a better circuit. These sorts of evolutionary systems are robust because failure actually drives the system. It’s amazing. (This section is around 47-48 in the audio)
  • He then goes on to speculate as to what new technologies he thinks will represent disruptive change. The first major advance he mentions is the development of a workable LUI – a language-based user interface that utilizes a natural language that is easily understandable by both the average user and the computer (i.e. a language that doesn’t require years of study to figure out, a la current programming languages). He thinks this will grow out of current search technologies (perhaps in a scenario similar to EPIC). One thing he mentions is that the internet right now doesn’t give an accurate representation of the wide range of interests and knowledge that people have, but that this is steadily getting better over time. As more and more individuals, with more and more knowledge, begin interacting on the internet, they begin to become a sort of universal information resource. (This section is around 50-53 in the audio)
  • The other major thing he speculates about is the development of personality capture and parallel computing, which sort of integrates with the LUI. This is essentially the Avatar I mentioned earlier which mimics and predicts your actions.
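The genetic algorithm idea that comes up in these points can be sketched in a few lines. This is a toy, not Smart’s circuit example: the fitness function simply counts 1-bits (a stand-in for “circuit quality”), and the population size and rates are arbitrary. But it shows the mutate/mate/select loop in which weak variants are discarded every generation, i.e. failure driving the system:

```python
import random

random.seed(42)  # deterministic run for illustration

def fitness(bits):
    """Toy objective: count of 1s (stand-in for 'circuit quality')."""
    return sum(bits)

def mutate(bits, rate=0.02):
    """Flip each bit with small probability."""
    return [b ^ 1 if random.random() < rate else b for b in bits]

def crossover(a, b):
    """'Mate' two parents by splicing them at a random cut point."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=40, length=32, generations=60):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # selection: keep the fitter half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children        # the weaker half is simply discarded
    return max(pop, key=fitness)

best = evolve()
```

After a few dozen generations the population converges toward the all-ones string, without any step of the loop “knowing” what a good solution looks like.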

As always, we need to keep our feet on the ground here. Futurists are fun to listen to, but it’s easy to get carried away. The development of a LUI and a personality capture system would be an enormous help, but we still need good information aggregation and correlation systems if we’re really going to progress. Right now the problem is finding the information we need, and analyzing the information. A LUI and personality capture system will help with the finding of information, but not so much with the analysis (the separating of the signal from the noise). As I mentioned before, the speech is long (one hour), but it’s worth a listen if you have the time…

A tale of two software projects

A few weeks ago, David Foster wrote an excellent post about two software projects. One was a failure, and one was a success.

The first project was the FBI’s new Virtual Case File system: a tool that would allow agents to better organize, analyze, and communicate data on criminal and terrorism cases. After three years and over $100 million, it was announced that the system may be totally unusable. How could this happen?

When it became clear that the project was in trouble, Aerospace Corporation was contracted to perform an independent evaluation. It recommended that the software be abandoned, saying that “lack of effective engineering discipline has led to inadequate specification, design and development of VCF.” SAIC has said it believes the problem was caused largely by the FBI: specifically, too many specification changes during the development process…an SAIC executive asserted that there were an average of 1.3 changes per day during the development. SAIC also believes that the current system is useable and can serve as a base for future development.

I’d be interested to see what the actual distribution of changes was (as opposed to the “average changes per day,” which seems awfully vague and somewhat obtuse to me), but I don’t find it that hard to believe that this sort of thing happened (especially because the software development firm was a separate entity). I’ve had some experience with gathering requirements, and it certainly can be a challenge, especially when you don’t know the processes currently in place. This does not excuse anything, however, and the question remains: how could this happen?

The second project, the success, may be able to shed some light on that. DARPA was tapped by the US Army to help protect troops from enemy snipers. The requested application would spot incoming bullets and identify their point of origin, and it would have to be easy to use, mobile, and durable.

The system would identify bullets from their sound…the shock wave created as they travelled through the air. By using multiple microphones and precisely timing the arrival of the “crack” of the bullet, its position could, in theory, be calculated. In practice, though, there were many problems, particularly the high levels of background noise–other weapons, tank engines, people shouting. All these had to be filtered out. By Thanksgiving weekend, the BBN team was at Quantico Marine Base, collecting data from actual firing…in terrible weather, “snowy, freezing, and rainy” recalls DARPA Program Manager Karen Wood. Steve Milligan, BBN’s Chief Technologist, came up with the solution to the filtering problem: use genetic algorithms. These are a kind of “simulated evolution” in which equations can mutate, be tested for effectiveness, and sometimes even “mate,” over thousands of simulated generations (more on genetic algorithms here.)

By early March, 2004, the system was operational and had a name–“Boomerang.” 40 of them were installed on vehicles in Iraq. Based on feedback from the troops, improvements were requested. The system has now been reduced in size, shielded from radio interference, and had its display improved. It now tells soldiers the direction, range, and elevation of a sniper.
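The multiple-microphone timing idea in the passage above can be made concrete with a toy calculation. This is not how Boomerang actually works – the microphone layout, the 2D grid search, and every number here are hypothetical – but it shows how pairwise arrival-time differences are enough to pin down a position:

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # meters/second in air, roughly room temperature

def arrival_times(source, mics, speed=SPEED_OF_SOUND):
    """Time for the 'crack' to reach each microphone."""
    return [math.dist(source, m) / speed for m in mics]

def locate(mics, times, coords=range(-50, 51)):
    """Brute-force grid search: pick the grid point whose predicted pairwise
    arrival-time differences best match the measured ones."""
    pairs = list(itertools.combinations(range(len(mics)), 2))
    measured = [times[i] - times[j] for i, j in pairs]
    best, best_err = None, float("inf")
    for x in coords:
        for y in coords:
            t = arrival_times((float(x), float(y)), mics)
            err = sum((m - (t[i] - t[j])) ** 2
                      for m, (i, j) in zip(measured, pairs))
            if err < best_err:
                best, best_err = (float(x), float(y)), err
    return best

# Four microphones at the corners of a vehicle-sized rectangle (meters)
mics = [(-2.0, -1.0), (2.0, -1.0), (2.0, 1.0), (-2.0, 1.0)]
shooter = (30.0, 40.0)  # hypothetical source, placed on the search grid
estimate = locate(mics, arrival_times(shooter, mics))
```

The hard part in the real system wasn’t this geometry at all but the filtering: extracting clean arrival times from battlefield noise, which is where the genetic algorithms came in.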

Now what was the biggest difference between the remarkable success of the Boomerang system and the spectacular failure of the Virtual Case File system? Obviously, the two projects present very different challenges, so a direct comparison doesn’t necessarily tell the whole story. However, it seems to me that discipline (in the case of the Army) or the lack of discipline (in the case of the FBI) might have been a major contributor to the outcomes of these two projects.

It’s obviously no secret that discipline plays a major role in the Army, but there is more to it than just that. Independence and initiative also play an important role in a military culture. In Neal Stephenson’s Cryptonomicon, the way the character Bobby Shaftoe (a Marine Raider, which is “…like a Marine, only more so.”) interacts with his superiors provides some insight (page 113 in my version):

Having now experienced all the phases of military existence except for the terminal ones (violent death, court-martial, retirement), he has come to understand the culture for what it is: a system of etiquette within which it becomes possible for groups of men to live together for years, travel to the ends of the earth, and do all kinds of incredibly weird shit without killing each other or completely losing their minds in the process. The extreme formality with which he addresses these officers carries an important subtext: your problem, sir, is deciding what you want me to do, and my problem, sir, is doing it. My gung-ho posture says that once you give the order I’m not going to bother you with any of the details – and your half of the bargain is you had better stay on your side of the line, sir, and not bother me with any of the chickenshit politics that you have to deal with for a living.

Good military officers are used to giving an order, then staying out of their subordinates’ way as they carry it out. I didn’t see any explicit measurement, but I would assume there weren’t many specification changes during the development of the Boomerang system. Of course, the developers themselves made all sorts of changes to specifics, and they incorporated feedback from the Army in the field into their development process, but that is standard stuff.

I suspect that the FBI is not completely to blame, but as the report says, there was a “lack of effective engineering discipline.” The FBI and SAIC share that failure. Judging from the number of changes requested by the FBI and the number of government managers involved, I suspect that micromanagement played a significant role. As Foster notes, we should be leveraging our technological abilities in the war on terror, and he suggests a loosely organized oversight committee (headed by “a Director of Industrial Mobilization”) to make sure things like this don’t happen very often. Sounds like a reasonable idea to me…

An Exercise in Aggregation

A few weeks ago I collected a ton of posts regarding the Iraqi elections. I did this for a few reasons. The elections were important and I wanted to know how they were going, but I could have just read up on them if that was the only reason. The real reason I made that post was to participate in and observe information aggregation and correlation in real time.

It was an interesting experience, and I learned a few things which should help in future exercises. Some of these are in my control to fix, some will depend on the further advance of technology.

  • Format – It seems to me that simply posting a buttload of links in one long list is not the best way to aggregate and correlate data. It does provide a useful service as a central place with links to diverse articles, but it would be much better if the posts were separated into smaller groups. This would better facilitate scanning and would allow readers to focus on the things that interest them. It would also be helpful to indicate threads of debate between different bloggers. For example, a ton of people responded to Juan Cole’s comments, though I only listed one or two (and not in a particularly efficient way).
  • Categorization – One thing that is frustrating about such an exercise is that many blogs are posting up a storm on the subject throughout the day, which means that someone like myself who is attempting to aggregate posts would have to continually check the blog throughout the day as well. Indeed, simply collecting all the links and posting them can be a challenge. What I ended up doing was linking to a few specific posts and then just including a general link to the blog with the instruction to “Keep scrolling.” Dean Esmay demonstrated how bloggers can help aggregation by providing a category page where all of his Iraqi election posts were collected (and each individual post had an index of posts as well). This made things a lot easier for me, as I didn’t have to collect a large number of links. All I had to do was post one link. Unfortunately this is somewhat rare, and given the tools available, it’s also understandable. Most people are concerned with getting their voice out there, and don’t want to spend the time devising a categorization scheme. Movable Type 3.x has subcategories, which could help with this, but it takes time to figure this stuff out. Hopefully this is something that will improve as more enhancements are made to blogging software.
  • Trackbacks – Put simply, they suck for an exercise like this. For those who don’t know, trackbacks are a way of notifying other websites that you’re linking to them (and a way of indicating that other websites have linked to you). Movable Type has a nifty feature that will automatically detect a trackback-enabled blog when you link to it, and set the site to be pinged. This is awesome when you’re linking to a single post or even a handful of posts. However, when I was compiling the links for my Iraqi election post, I naturally had tons of trackbacks to send. I started getting trackback failures that weren’t really failures. And because I was continually updating that post with new data, I ended up sending duplicate pings to the same few blogs (some got as many as five or six extraneous pings). I suppose I could have turned off the auto-detection feature and manually pinged the sites I wanted for that post, but that is hardly convenient.
  • Other notes – There has to be a better way to collect permalinks and generate a list than simply copying and pasting. I’m sure there are some bookmarklets or browser features that could prove helpful, though this would require a little research and a little tweaking to be useful.
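The duplicate-ping problem could be handled with a thin layer that remembers which endpoints have already been notified for a given post. The sketch below is hypothetical (Movable Type offers no such hook that I know of); the form fields follow the TrackBack specification’s plain form-encoded HTTP POST, and the `TrackbackPinger` class and its names are my own invention:

```python
from urllib import request, parse

class TrackbackPinger:
    """Sends TrackBack pings, skipping endpoints already pinged for this post."""

    def __init__(self, blog_name, sender=None):
        self.blog_name = blog_name
        self.seen = set()                 # endpoints pinged so far
        self.send = sender or self._http_send

    def _http_send(self, endpoint, fields):
        # TrackBack is just a form-encoded HTTP POST to the ping URL.
        body = parse.urlencode(fields).encode("utf-8")
        with request.urlopen(request.Request(endpoint, body)) as resp:
            return resp.read().decode("utf-8", "replace")

    def ping(self, endpoint, post_url, title, excerpt=""):
        # Returns False (and sends nothing) if this endpoint was already pinged,
        # so repeated updates to the same post can't generate extraneous pings.
        if endpoint in self.seen:
            return False
        self.seen.add(endpoint)
        self.send(endpoint, {"url": post_url, "title": title,
                             "excerpt": excerpt, "blog_name": self.blog_name})
        return True
```

Updating the post five times would then re-run `ping` for every link each time, but each endpoint would only ever receive one notification.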

Writing that post proved to be a most interesting exercise in aggregation, and I look forward to incorporating some of the lessons learned above in future posts…

Evölutiön

In a stroke of oddly compelling genius (or possibly madness), Jon Udell has put together a remarkable flash screencast (note: there is sound and it looks best in full screen mode) detailing the evolution of the Heavy metal umlaut page on Wikipedia.

It’s a wonderfully silly topic, but my point is somewhat serious too. The 8.5-minute screencast turns the change history of this Wiki page into a movie, scrolls forward and backward along the timeline of the document, and follows the development of several motifs. Creating this animated narration of a document’s evolution was technically challenging, but I think it suggests interesting possibilities.

Wikis are one of those things that just don’t sound right when you hear about what they are and how they work. It’s one thing to institute a collaborative encyclopedia, but Wikis embrace a philosophy of openness that seems entirely too permissive. Wikis are open to the general public and allow anyone to modify their contents without any sort of prior review. What’s to stop a troll from vandalizing a page? Nothing, except that someone will come along and correct it shortly thereafter (Udell covers an episode of vandalism in the screencast). It’s a textbook self-organizing system (note that wikis focus not on the content, but rather on establishing an efficient mechanism for collaboration; the content is an emergent property of the system). It should be interesting to see how it progresses… [via Jonathon Delacour, who also has an interesting discussion about umlauts and diaereses and another older post about wikis]

Long Tails, TV, and DVR

Apparently Chris Anderson (author of the Wired article I posted last week) has a blog in which he comments regularly on the long tail concept.

In one post, he speculates about how the long tail relates to television programs, DVRs, and the internet. In short, he proposes a browser plugin for when you see a reference to a TV show you’d like to record: you would simply highlight the show title, right-click, and choose a new “Record to DVR” option, at which point you could set up your DVR to record the show.

I don’t have a DVR, so perhaps I’m not the best person to comment, but it strikes me that if you’re reading a recommendation for a show, you might want to go back and watch all the previous episodes as well. For instance, a lot of people have been recommending Lost to me recently. If I had a DVR, I might set it to record the show, but I’d have missed a significant portion of it (I don’t know how important that would be). What I’d really love is to go back and watch the series from the beginning.

Comcast has a feature called “On Demand” which would be perfect for this, but they don’t seem to have much in the way of capacity (though if you have HBO, I understand they sometimes make whole seasons of various popular shows available) and they don’t have Lost. Evan Kirchoff recently posted something that put an interesting twist on this subject: other people are his PVR. When he finds a show he wants to watch, he simply downloads it via torrents:

What I really wanted all this time, it turns out, is just the assurance that somebody out there in the luminiferous aether is faithfully recording every show, in case I later decide that I want it. Setting a VCR in advance is way too much work, but having to download a 350-megabyte file is an action that’s just affirmative enough to distill one’s preferences.

It’s certainly an interesting perspective – a typical emergent property of the self-organizing internet (along with all the warts that entails) – and it’s a hell of a lot better than waiting for reruns. I don’t have the 400 gigs of hard drive space on my system that Evan does, but I might check out an episode or two. Of course, there’s something to be said about the quality of the watching-tv-on-a-computer experience and, as Evan mentions, I’m not quite sure about the legality of such a practice (his reasoning seems logical, but that doesn’t necessarily mean anything). Perhaps a micropayment solution (e.g. download an episode for a dollar, or one season for $10) would work. Of course, this would destroy the DVD market (which I imagine some people would be none too happy about), but it would also lengthen the tail, as quality niche shows (i.e. the long tail) might be able to carve out a profitable piece of the pie.

The best solution would, of course, combine all the various features above into one application/experience, but I’m not holding my breath just yet.

Chasing the Tail

The Long Tail by Chris Anderson : An excellent article from Wired that demonstrates a few of the concepts and ideas I’ve been writing about recently. One such concept is well described by Clay Shirky’s excellent article Power Laws, Weblogs, and Inequality. A system governed by a power law distribution is essentially one where the power (whether measured in wealth, links, etc.) is concentrated in a small population (when graphed, the rest of the population’s power values resemble a long tail). This concentration occurs spontaneously, and it is often strengthened because members of the system have an incentive to leverage their power to accrue more power.

In systems where many people are free to choose between many options, a small subset of the whole will get a disproportionate amount of traffic (or attention, or income), even if no members of the system actively work towards such an outcome. This has nothing to do with moral weakness, selling out, or any other psychological explanation. The very act of choosing, spread widely enough and freely enough, creates a power law distribution.

As such, this distribution manifests in all sorts of human endeavors, including economics (for the accumulation of wealth), language (for word frequency), weblogs (for traffic or number of inbound links), genetics (for gene expression), and, as discussed in the Wired article, entertainment media sales. Typically, the sales of music, movies, and books follow a power law distribution, with a small number of hit artists garnering the vast majority of the sales. The typical rule of thumb is that 20% of available artists get 80% of the sales.
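The 80/20 rule of thumb can be illustrated with a toy simulation. Assuming a simple Zipf model (the classic example of a power law, in which the artist at popularity rank r sells in proportion to 1/r; the numbers below are made up for illustration):

```python
# Toy Zipf model: sales at rank r fall off as 1/r.
def zipf_sales(n_artists, top_rank_sales=100_000):
    return [top_rank_sales / rank for rank in range(1, n_artists + 1)]

def head_share(sales, head_fraction=0.2):
    """Fraction of total sales captured by the top `head_fraction` of artists."""
    ranked = sorted(sales, reverse=True)
    head = ranked[: max(1, int(len(ranked) * head_fraction))]
    return sum(head) / sum(ranked)

sales = zipf_sales(10_000)
print(f"top 20% of artists take {head_share(sales):.0%} of sales")
```

With 10,000 artists this model gives the top 20% roughly 84% of total sales, right in the neighborhood of the classic 80/20 split, even though no individual artist did anything to engineer that outcome.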

Because of the expense of producing the physical product, and giving it a physical point of sale (shelf-space, movie theaters, etc…), this is bad news for the 80% of artists who get 20% of the sales. Their books, movies, and music eventually go out of print and are generally forgotten, while the successful artists’ works are continually reprinted and sold, building on their own success.

However, with the advent of the internet, this is beginning to change. Sales are still governed by the power law distribution, but the internet is removing the physical limitations of entertainment media.

An average movie theater will not show a film unless it can attract at least 1,500 people over a two-week run; that’s essentially the rent for a screen. An average record store needs to sell at least two copies of a CD per year to make it worth carrying; that’s the rent for a half inch of shelf space. And so on for DVD rental shops, videogame stores, booksellers, and newsstands.

In each case, retailers will carry only content that can generate sufficient demand to earn its keep. But each can pull only from a limited local population – perhaps a 10-mile radius for a typical movie theater, less than that for music and bookstores, and even less (just a mile or two) for video rental shops. It’s not enough for a great documentary to have a potential national audience of half a million; what matters is how many it has in the northern part of Rockville, Maryland, and among the mall shoppers of Walnut Creek, California.

The decentralized nature of the internet makes it a much better way to distribute entertainment media, as that documentary that has a potential national (heck, worldwide) audience of half a million people could likely succeed if distributed online. The infrastructure for films isn’t there yet, but it has been happening more in the digital music world, and even in a hybrid space like Amazon.com, which sells physical products, but in a non-local manner. With digital media, the cost of producing and distributing entertainment media goes way down, and thus even average artists can be considered successful, even if their sales don’t approach that of the biggest sellers.

The internet isn’t a broadcast medium; it is on-demand, driven by each individual’s personal needs. Diversity is the key, and as Shirky’s article says: “Diversity plus freedom of choice creates inequality, and the greater the diversity, the more extreme the inequality.” With respect to weblogs (or more generally, websites), big sites are, well, bigger, but links and traffic aren’t the only metrics for success. Smaller websites are smaller in those terms, but are often more specialized, and thus they do better both in terms of connecting with their visitors (or customers) and in providing a more compelling value to those visitors. Larger sites, by virtue of their popularity, simply aren’t able to interact with visitors as effectively. This is assuming, of course, that the smaller sites do a good job. My site is very small (in terms of traffic and links), but not very specialized, so it has somewhat limited appeal. However, the parts of my site that get the most traffic are the ones that are specialized (such as the Christmas Movies page, or the Asimov Guide). I think part of the reason the blog has never really caught on is that I cover a very wide range of topics, thus diluting the potential specialized value of any single topic.

The same can be said for online music sales. They still conform to a power law distribution, but what we’re going to see is increasing sales of more diverse genres and bands. We’re in the process of switching from a system in which only the top 20% are considered profitable, to one where 99% are valuable. This seems somewhat counterintuitive for a few reasons:

The first is we forget that the 20 percent rule in the entertainment industry is about hits, not sales of any sort. We’re stuck in a hit-driven mindset – we think that if something isn’t a hit, it won’t make money and so won’t return the cost of its production. We assume, in other words, that only hits deserve to exist. But Vann-Adibé, like executives at iTunes, Amazon, and Netflix, has discovered that the “misses” usually make money, too. And because there are so many more of them, that money can add up quickly to a huge new market.

With no shelf space to pay for and, in the case of purely digital services like iTunes, no manufacturing costs and hardly any distribution fees, a miss sold is just another sale, with the same margins as a hit. A hit and a miss are on equal economic footing, both just entries in a database called up on demand, both equally worthy of being carried. Suddenly, popularity no longer has a monopoly on profitability.

The second reason for the wrong answer is that the industry has a poor sense of what people want. Indeed, we have a poor sense of what we want.

The need to figure out what people want out of a diverse pool of options is where self-organizing systems come into the picture. A good example is Amazon’s recommendations engine, and their ability to aggregate various customer inputs into useful correlations. Their “customers who bought this item also bought” lists (and the litany of variations on that theme), more often than not, provide a way to traverse the long tail. They encourage customer participation, allowing customers to write reviews, select lists, and so on, providing feedback loops that improve the quality of recommendations. Note that none of these features was designed to directly sell more items. The focus was on allowing an efficient system of collaborative feedback. Good recommendations are an emergent result of that system. Similar features are available in the online music services, and the Wired article notes:
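The heart of such a recommendation list can be nothing more than co-occurrence counting across purchase histories. A minimal sketch (the baskets and item names below are made up, and Amazon’s actual engine is surely far more sophisticated):

```python
from collections import Counter

# Hypothetical purchase histories; each set is one customer's orders.
baskets = [
    {"britney_spears", "pink"},
    {"pink", "no_doubt"},
    {"no_doubt", "the_selecter"},
    {"britney_spears", "pink", "no_doubt"},
]

def also_bought(baskets, item, top_n=3):
    """Rank other items by how often they co-occur with `item` across baskets."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})   # everything bought alongside `item`
    return [other for other, _ in counts.most_common(top_n)]

print(also_bought(baskets, "pink"))
```

Each customer review, list, or purchase just adds more baskets, and the recommendations sharpen as a side effect – which is the emergent-quality point: nobody hand-curates the “also bought” lists.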

For instance, the front screen of Rhapsody features Britney Spears, unsurprisingly. Next to the listings of her work is a box of “similar artists.” Among them is Pink. If you click on that and are pleased with what you hear, you may do the same for Pink’s similar artists, which include No Doubt. And on No Doubt’s page, the list includes a few “followers” and “influencers,” the last of which includes the Selecter, a 1980s ska band from Coventry, England. In three clicks, Rhapsody may have enticed a Britney Spears fan to try an album that can hardly be found in a record store.

Obviously, these systems aren’t perfect. As I’ve mentioned before, a considerable amount of work needs to be done with respect to the aggregation and correlation aspects of these systems. Amazon and the online music services have a good start, and weblogs are trailing along behind them a bit, but the nature of self-organizing systems dictates that you don’t get a perfect solution to start, but rather a steadily improving system. What’s becoming clear, though, is that the little guys are (collectively speaking) just as important as the juggernauts, and that’s why I’m not particularly upset that my blog won’t be wildly popular anytime soon.

Everyone Contributes in Some Way

Epic : A fascinating and possibly prophetic flash film of things to come in terms of information aggregation, recommendations, and filtering. It focuses on Google and Microsoft’s (along with a host of others, including Blogger, Amazon, and Friendster) competing contributions to the field. It’s eight minutes long, and well worth the watch. It touches on many of the concepts I’ve been writing about here, including self-organization and stigmergy, but in my opinion it stops just short of where such a system would go.

It’s certainly interesting, but I don’t think it gets it quite right (Googlezon?). Or perhaps it does, but the pessimistic ending doesn’t feel right to me. Towards the end, it claims that a comprehensive social dossier would be compiled by Googlezon (note the name on the ID – Winston Smith) and that everyone would receive customized newscasts which are completely automated. Unfortunately, the filmmakers foresee the majority of these customized newscasts as being rather substandard – filled with inaccuracies, narrow, shallow, and sensational. To me, this sounds an awful lot like what we have now, but on a larger (and less manageable) scale. Talented editors, who can navigate, filter, and correlate Googlezon’s contents, are able to produce something astounding, but the problem (as envisioned by this movie) is that far too few people have access to these editors.

But I think that misses the point. Individual editors would produce interesting results, but if the system were designed correctly – in a way that allowed everyone to be an editor and implemented feedback loops (i.e. selection mechanisms) – there’s no reason a meta-editor couldn’t produce something spectacular. Of course, there would need to be a period of adjustment, where the system gets lots of things wrong, but that’s how selection works. In self-organizing systems, failure is important; ironically, it’s what ensures progress. If too many people are getting bad information in 2014 (when the movie is set), all that means is that the selection process hasn’t matured quite yet. I would say that things would improve considerably by 2020.

The film is well worth watching. I doubt this specific scenario will play out, but it’s likely that something along these lines will occur. [Via the Commissar]