Computers & Internet

GPL & Asimov’s First Law

Ars Technica reports on an open source project called GPU. The purpose of this project is to provide an infrastructure for distributed computing (i.e. sharing CPU cycles). The developers of this project are apparently pacifists, and they’ve modified the GPL (the GNU General Public License, which is the primary license for open source software) to make that clear. One of the developers explains it thusly: “The fact is that open source is used by the military industry. Open source operating systems can steer warplanes and rockets. [This] patch should make clear to users of the software that this is definitely not allowed by the licenser.”

Regardless of what you might think about the developers’ intentions, the thing I find strangest about this is the way they’ve chosen to communicate their desires. They’ve modified the standard GPL to include a “patch” which is supposedly for no military use (full text here). Here is what this addition says [emphasis mine]:

PATCH FOR NO MILITARY USE

This patch restricts the field of endeavour of the Program in such a way that this license collides with paragraph 6 of the Open Source Definition. Therefore, this modified version of the GPL is no more OSI compliant.

The Program and its derivative work will neither be modified or executed to harm any human being nor through inaction permit any human being to be harmed.

This is Asimov’s first law of Robotics.

This is astoundingly silly, for several reasons. First, as many open source devotees have pointed out (and as the developers themselves even note in the above text), you’re not allowed to modify the GPL. As Ars Technica notes:

Only sentences after their patch comes the phrase, “Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.” This is part of the GPL, and by modifying the license, the developers seem to run afoul of it. The Free Software Foundation has already contacted them about the matter.

Next, Asimov’s laws of robotics were written for autonomous beings called robots. This might seem obvious to some, but apparently not to the developers, who have applied it to software. As Ars notes: “Code is not an autonomous agent that can go around bombing people or hauling them from burning buildings.” Also, Asimov always alluded to the fact that the plain English definitions (which is what the developers used in their “patch”) just gave you the basic idea of what the law did – the code that implemented this functionality in his robots was much more complex.

Third, we have a military for a reason, and their purpose extends far beyond bombing the crap out of people. For example, many major disasters are met with international aid delivered and administered by… military transports and personnel (there are many other examples, but this is a common one that illustrates the point well). Since this software is not allowed, through inaction, to permit any human being to be harmed, wouldn’t the military be justified (if not actually required) to use it? Indeed, this “inaction” clause seems like it could cause lots of unintended consequences.

Finally, Asimov created the laws of robotics in a work of fiction as a literary device that allowed him to have fun with his stories. Anyone who has actually read the robot novels knows that they’re basically just an extended exercise in subverting the three laws (eventually even superseding them with a “zeroth” law). He set himself some reasonable sounding laws, then went to town finding ways to get around them. For crying out loud, he had robots attempting murder on humans all throughout the series. The laws were created precisely to demonstrate how foolish it was to have such laws. Granted, many fictional stories with robots have featured Asimov’s laws (or some variation), but that’s more of an artistic homage (or parody, in a lot of cases). It’s not something you put into a legal document.

Ars notes that not all the developers agree on the “patch,” which is good, I guess. If I were more cynical, I’d say this was just a ploy to get more attention for their project, but I doubt that was the intention. If they were really serious about this, they’d probably have been a little more thorough with their legalese. Maybe in the next revision they’ll actually mention that the military isn’t allowed to use the software.

Update: It seems that someone on Slashdot has similar thoughts:

Have any of them actually read I, Robot? I swear to god, am I in some tiny minority who doesn’t believe that this book was all about promulgating the infallible virtue of these three laws, but was instead a series of parables about the failings that result from codifying morality into inflexible dogma?

And another commenter does too:

From a plain English reading of the text “the program and its derivative work will neither be modified or executed to harm any human being nor through inaction permit any human being to be harmed”, I am forced to conclude that the program will not through inaction allow any human being to be harmed. This isn’t just silly; it’s nonsensical. The Kwik-E-Mart’s being robbed, and the program, through inaction (since it’s running on a computer in another state, and has nothing to do with a convenience store), fails to save Apu from being shot in the leg. Has it violated the terms of it’s own license? What does this clause even mean?

Heh.

IMDB Bookmarklet

In last week’s post, I ended up linking to a whole bunch of movies on the IMDB. The process was somewhat tedious, and I lamented the lack of Movable Type plugins that would help. There are a few plugins that could potentially help, but not in the exact context I’m looking for (MT-Textile does have some IMDB shortcuts, but they’re for IMDB searches).

So after looking around, I decided that the best way to go would be to write a bookmarklet that would generate the code to insert a link to IMDB. I’m no expert on this stuff and I’m sure there’s something wrong with the below code, but it appears to work passably well (maybe I should just call it IMDB Bookmarklet – Beta). Basically, all you need to do is go to the movie you want to link to on IMDB, click the bookmarklet in your browser, then copy and paste the text into your post. (IE actually has a function that will copy a string directly to your clipboard, but no other browser will do so, for obvious security reasons. Therefore, I simply used a prompt() function to display the generated text, which you then have to copy manually.)

This turned out to be something of a pain, mainly because I primarily use the Opera web browser, which is apparently more strict about javascript than any other browser. My first attempt at the bookmarklet appeared to work fine when I just pasted it into the location bar, but when I actually set up the bookmark, it choked. This apparently had something to do with single and double quotes (I thought you were supposed to be able to use both in javascript, but for whatever reason, Opera kept throwing syntax errors.)

Anyway, here’s the code:

javascript:mname=document.title;murl=document.location;mdatepos=mname.lastIndexOf(' (');if(mdatepos!=-1){mname2=mname.slice(0,mdatepos);}else{mname2=mname;} temp=prompt('Copy text for a link to IMDB movie:','<a href=\''+murl+'\' title=\'IMDB: '+mname2+'\'>'+mname2+'</a>');focus();

Or just use this link: <a href="javascript:mname=document.title;murl=document.location;mdatepos=mname.lastIndexOf(' (');if(mdatepos!=-1){mname2=mname.slice(0,mdatepos);}else{mname2=mname;} temp=prompt('Copy text for a link to IMDB movie:','<a href=\''+murl+'\' title=\'IMDB: '+mname2+'\'>'+mname2+'</a>');focus();">Generate IMDB Link</a>

Again, all you need to do is go to the movie you want to link to on IMDB, click the bookmarklet in your browser, then copy and paste the text into your post. This is the output of the bookmarklet when you use it on IMDB’s Miami Vice page:

<a href='http://imdb.com/title/tt0430357/' title='IMDB: Miami Vice'>Miami Vice</a>

A few nerdy coding things to note here:

  • The link that is generated uses single quotes (') instead of the usual double quotes ("). Both work in HTML, but I usually use double quotes and would prefer consistency. However, as previously mentioned, using double quotes does not appear to work in Opera (even when escaped with \"). If you use Firefox and want to get double quotes in the generated link, try this:

    javascript:mname=document.title;murl=document.location;mdatepos=mname.lastIndexOf(' (');if(mdatepos!=-1){mname2=mname.slice(0,mdatepos);}else{mname2=mname;} temp=prompt('Copy text for a link to IMDB movie:','<a href="'+murl+'" title="IMDB: '+mname2+'">'+mname2+'</a>');focus();

  • The code is generated by reading in the page’s URL and title tag. As such, I had to do some manipulation to remove the year from the page’s title (otherwise the link would show up saying Miami Vice (2006)). The way I did this may cause problems if a title has an open parenthesis, but I tried to account for it. I might change it so that the year shows up in the title attribute of the link, but I don’t think it’s that big of a deal.
  • Foreign movies will still show up with the foreign title. So Sympathy for Mr. Vengeance will show up as Boksuneun naui geot. Personally, I think this still helps, but I don’t see an easy way of generating the link with the English title (and sometimes it’s nice to use the foreign title).
  • Now that I think about it, this would be helpful for linking to Amazon too. It seems like they make it more difficult to link using your Associates ID these days, so an automated way to do so will probably be helpful.
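The title-stripping step described in the bullets above can be pulled out into a plain function and run outside the browser. This is my own sketch – the name imdbLink is not part of the bookmarklet – but the logic mirrors it:

```javascript
// Build the IMDB link markup from a page title and URL, stripping the
// trailing " (YEAR)" that IMDB appends to its <title> tags.
function imdbLink(title, url) {
  // lastIndexOf(' (') locates the year suffix, e.g. "Miami Vice (2006)"
  const datePos = title.lastIndexOf(' (');
  const name = datePos !== -1 ? title.slice(0, datePos) : title;
  return "<a href='" + url + "' title='IMDB: " + name + "'>" + name + "</a>";
}

// Same output as the bookmarklet on IMDB's Miami Vice page:
console.log(imdbLink('Miami Vice (2006)', 'http://imdb.com/title/tt0430357/'));
// <a href='http://imdb.com/title/tt0430357/' title='IMDB: Miami Vice'>Miami Vice</a>
```

Because lastIndexOf searches from the end of the string, only the final parenthesized chunk gets stripped, which is how the bookmarklet tries to cope with titles that contain parentheses of their own.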

And that’s it. If you’re a javascript or bookmarklet expert and see something wrong with the above, please do let me know.

I realize this post has next to no appeal to the grand majority of my readers, but I ended up spending more time on this than I wanted. I’ll see if I can make another post later this week…

Art for the computer age…

I was originally planning on doing a movie review while our gentle web-master is away, but a topic has come up too many times in the past few weeks for me not to write about it.

First it came up in the tag map of Kaedrin, when I noticed that some people were writing pages just to create appealing tag maps.

Then it came up in Illinois and Louisiana. They’ve passed laws regulating the sale and distribution of “violent games” to minors. This, of course, has led to lawsuits and claims that the law violates free speech.

After that, it was the guys at Penny Arcade. They posted links to We Feel Fine and Listening Post. Those projects search the internet for blogs (maybe this one?) and pull text from them about feelings, and present those feelings to an audience in different ways. Very interesting.

Finally, it came up when I opened up the July issue of Game Informer, and read Hideo Kojima’s quote:

I believe that games are not art, and will never be art. Let me explain – games will only match their era, meaning what the people of that age want reflects the outcome of the game at that time. So, if you bring a game from 20 years ago out today, no one will say “wow.” There will be some essence where it’s fun, but there won’t be any wows or touching moments. Like a car, for example. If you bring a car from 20 years ago to the modern day, it will be appealing in a classic sense, but how much gasoline it uses, or the lack of air conditioning will simply not be appreciated in that era. So games will always be a kind of mass entertainment form rather than art. Of course, there will be artistic ways of representing games in that era, but it will still be entertainment. However, I believe that games can be a culture that represent their time. If it’s a light era, or a dark era, I always try to implement that era in my works. In the end, when we look back on the projects, we can say “Oh, it was that era.” So overall, when you look back, it becomes a culture.

Every time I reread that quote, I cringe. Here’s a man who is one of the most significant forces in video games today, the creator of Metal Gear, and he’s saying “No, they’re not art, and never will be.” I find his distinction between mass entertainment and art troubling, and his comparison to a car flawed.

It’s true that games will always be a reflection of their times – just like anything else is. The limitations of the time and the attitudes of the culture at the time are going to have an effect on everything coming out of that time. A car made in the 60s is going to show the style of the 60s, and is going to have the tech of the 60s. That makes sense. Of course, a painting made in the 1700s is going to show the limits and is going to reflect the feelings of that time, too. The paints, brushes, and canvas used then aren’t necessarily going to be the same as the ones used now, especially with the popular use of computers in painting. The fact that something is a reflection of the times isn’t going to stop people from appreciating the artistic worth of that thing. The fact that the Egyptians hadn’t mastered perspective doesn’t stop anyone from wanting to see their statues.

What does that really tell us, though? Nothing. A car from the 80s may not be appreciated as much as a new model car as a means of transport, but Kojima seems to be completely forgetting that there are many cars that are appreciated as special. Nobody buys a 60s era muscle car because they think it’s a good car for driving around in – they buy it because they think it’s special, because some people view older cars as collectable. Some people do see them as more than a mere means of transportation. People are very much “wowed” by old cars. Is there any reason why this can’t be true of games?

I am 8 Bit seems to suggest that there are people who are still wowed by those games. Kojima may be partially correct, though. Maybe most of those early games won’t hold up in the long run. That shouldn’t be a surprise. They’re the first generation of games. The 8-Bit era was the beginning of the new wave of games, though. For the first time, creators could start to tell real stories, beyond simple high-score pursuit. Game makers were just getting their wings, and starting to see what games were really capable of. Maybe early games aren’t art. Does that mean that games aren’t art?

The problem mostly seems to be that we’re asking the wrong questions. We shouldn’t be asking “are video games art” any more than we’d ask “are movies art.” It’s a loaded question and you’ll never come to any real answer, because the answer is going to depend completely on what movie you’re looking at, and who you’re asking. The same holds true with games. The question shouldn’t be whether all games are art, but whether a particular game has some artistic merit. How we decide what counts as art is constantly up for debate, but there are games that raise such significant moral or philosophical questions, or have such an amazing sense of style, or tell such an amazing story, that it seems hard to argue that they have no artistic merit.

All of this really is leading somewhere. Computers have changed everything. I know that seems obvious, but I think it’s taking some people – people like Kojima – a little longer to realize it. Computers have opened up a level of interactivity and access to information that we’ve never really had before. I can update Kaedrin from Michigan, and can send a message to a friend in Germany, all while buying videos from Japan and playing chess with a man in Alaska (not that I’m actually doing those things… but I could). These changes are going to be reflected in the art our culture produces. There’s going to be backlash and criticism, and we’re going to find that some people just don’t “get it” or don’t want to. We’ve gone through the same thing countless times before. Nobody thought movies would be seen as art when they came on the scene, and they were sure that the talkies wouldn’t. When Andy Warhol came out, there were plenty of nay-sayers. Soup cans? As art? Computers have generally been accepted as a tool for making art, but I think we’re still seeing the limits pushed. We’ve barely scratched the surface. The interaction between art, artist, and viewer is blurring, and I, for one, can’t wait to see what happens.

The Mindless Internet and Choice

Nicholas Carr has observed a few things about the internet and its effect on the way we think:

You can’t have too much information. Or can you? Writing in the Guardian, Andrew Orlowski examines the “glut of hazy information, the consequences of which we have barely begun to explore, that the internet has made endlessly available.” He wonders whether the “aggregation of [online] information,” which some see as “synonymous with wisdom,” isn’t actually eroding our ability to think critically … Like me, you’ve probably sensed the same thing, in yourself and in others – the way the constant collection of information becomes an easy substitute for trying to achieve any kind of true understanding.

Internet as “infocrack,” as it were. In a follow up entry, Carr further comments:

The more we suck in information from the blogosphere or the web in general, the more we tune our minds to brief bursts of input. It becomes harder to muster the concentration required to read books or lengthy articles – or to follow the flow of dense or complex arguments in general. Haven’t you, dear blog reader, noticed that, too?

As a matter of fact, I have. A few years ago, I blogged about Information Overload:

Some time ago, I used to blog a lot more often than I do now. And more than that, I used to read a great deal of blogs, especially new blogs (or at least blogs that were new to me). Eventually this had the effect of inducing a sort of ADD in me. I consumed way too many things way too quickly and I became very judgemental and dismissive. There were so many blogs that I scanned (I couldn’t actually read them, that would take too long for marginal gain) that this ADD began to spread across my life. I could no longer sit down and just read a book, even a novel.

Eventually, I recognized this, took a bit of a break from blogging, and attempted to correct, with some success.

Carr seems to place the blame firmly on the internet (and technology in general). I don’t agree, and you can see why in the above paragraph – as soon as I realized what happened, I took steps to mitigate and reverse the effect. It’s a matter of choice, as Loryn at growstate writes:

Technology may change our intellectual environment, but doesn’t govern our behavior. We choose how we adapt. We choose our objectives and data sources and whether we challenge our assumptions. We choose on what to focus. We can choose.

Indeed. She does an impressive job demolishing Carr’s argument as well… And yes, I’m aware that this post is made up almost entirely of pull-quotes, seemingly confirming Carr’s argument. However, is there anything wrong with that?

Insert clever title for what is essentially a post full of links.

Again short on time, so just a few links turned up by the chain-smoking monkey research staff who actually run the blog:

  • The Beauty of Simplicity: An article that examines one of the more difficult challenges of new technology: usability. In last week’s post, I mentioned the concept of the Nonesuch Beast, applications which are perfect solutions to certain complex problems. Unfortunately, these perfect solutions don’t exist, and one of the biggest reasons they don’t is that one requirement for complex problems is a simple, easy-to-use solution. It’s that “easy-to-use” part that gets difficult.
  • Pandora: An interesting little web application that recommends music for you. All you’ve got to do is give it a band or song and it starts playing recommendations for you (it’s like your own radio station). You can tell it that you like or dislike songs, and it learns from your input. I’m not sure how much of what is being recommended is “learned” by the system (or how extensive their music library is), but as Buckethead notes, its recommendations are based on more than just genre. So far, it hasn’t turned up much in the way of great recommendations for me, but still, it’s interesting and I’m willing to play around with it on the assumption that it will get better.
  • Robodump 1.0: “I also decided to dress it in businessware to make coworkers less likely to try to talk to it… if it looks like a customer or visiting bigwig, they’ll be less likely to offer help or ask for a courtesy flush.” To understand this, you really just need to go there and look at the pictures.
  • Wikipedia’s next five years: Jon Udell speculates as to upcoming enhancements to Wikipedia. I think the most interesting of these is the thought of having “stable” versions of articles:

    Stable versions. Although Wikipedia’s change history does differentiate between minor and major edits, there’s nothing corresponding to stable versions in open source software projects. In the early life of most articles that would be overkill. But for more mature articles, and especially active ones, version landmarks might be a useful organizational tool. Of course it’s an open question as to how exactly a version could be declared stable.

    Having stable versions might go a long way towards indicating how trustworthy an individual article is (which is currently something of a challenge).

  • The Edge Annual Question – 2006: Every year, Edge asks a question to several notable thinkers and scientists and posts their answers. The answers are usually quite interesting, but I think this year’s question (“What’s your dangerous idea?”) wasn’t quite as good as the past few years’ questions. Still, there’s a lot of interesting stuff in there.

That’s all for now. Again, I’ve been exceptionally busy lately and will probably continue to be so for at least another week or so…

Good Enough

Time is short this week, so just a quick pointer towards an old Collision Detection post in which Clive Thompson talks about iPods and briefly digresses into some differences between Apple and Microsoft computers:

Back in the early days of Macintoshes, Apple engineers would reportedly get into arguments with Steve Jobs about creating ports to allow people to add RAM to their Macs. The engineers thought it would be a good idea; Jobs said no, because he didn’t want anyone opening up a Mac. He’d rather they just throw out their Mac when they needed new RAM, and buy a new one.

Of course, we know who won this battle. The “Wintel” PC won: The computer that let anyone throw in a new component, new RAM, or a new peripheral when they wanted their computer to do something new. Okay, Mac fans, I know, I know: PCs also “won” unfairly because Bill Gates abused his monopoly with Windows. Fair enough.

But the fact is, as Hill notes, PCs never aimed at being perfect, pristine boxes like Macintoshes. They settled for being “good enough” — under the assumption that it was up to the users to tweak or adjust the PC if they needed it to do something else.

The concept of being “good enough” presents a few interesting dynamics that I’ve been considering a lot lately. One problem is, of course, how do you know what’s “good enough” and what’s just a piece of crap? Another interesting thing about the above anecdote is that “good enough” boils down to something that’s customizable.

One thing I’ve been thinking about a lot lately is that some problems aren’t meant to have perfect solutions. I see a lot of talk about problems that are incredibly complex as if they really aren’t that complex. Everyone is trying to “solve” these problems, but as I’ve noted many times, we don’t so much solve problems as we trade one set of problems for another (with the hope that the new set of problems is more favorable than the old). As Michael Crichton noted in a recent speech on Complexity:

…one important assumption most people make is the assumption of linearity, in a world that is largely non-linear. … Our human predisposition treat all systems as linear when they are not. A linear system is a rocket flying to Mars. Or a cannonball fired from a canon. Its behavior is quite easily described mathematically. A complex system is water gurgling over rocks, or air flowing over a bird’s wing. Here the mathematics are complicated, and in fact no understanding of these systems was possible until the widespread availability of computers.
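Crichton’s linear/non-linear distinction can be made concrete with the logistic map, a one-line recurrence that ecologists use to model populations. The example is mine, not Crichton’s:

```javascript
// Logistic map: x_next = r * x * (1 - x). A deceptively simple
// non-linear recurrence: small changes in r move the system from
// a stable fixed point to full-blown chaos.
function logistic(r, x0, steps) {
  let x = x0;
  for (let i = 0; i < steps; i++) {
    x = r * x * (1 - x);
  }
  return x;
}

// At r = 2.5 the system settles to the fixed point 1 - 1/r = 0.6
// no matter where it starts – behavior linear intuition can handle.
console.log(logistic(2.5, 0.2, 1000).toFixed(4)); // "0.6000"
console.log(logistic(2.5, 0.9, 1000).toFixed(4)); // "0.6000"

// At r = 4.0 it is chaotic: no formula predicts where it lands,
// which is the regime Crichton says our intuition mishandles.
console.log(logistic(4.0, 0.2, 1000));
```

The same three lines of arithmetic produce both regimes; only the parameter changes, which is why treating a non-linear system as if it were linear goes wrong so quietly.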

Everyone seems to expect a simple, linear solution to many of the complex problems we face, but I’m not sure such a thing is really possible. I think perhaps what we’re looking for is a Nonesuch Beast; it doesn’t exist. What are these problems? I think one such problem is the environment, as mentioned in Crichton’s speech, but there are really tons of other problems. The Nonesuch Beast article above mentions a few scenarios, all of which I’m familiar with because of my job: Documentation and Metrics. One problem I often talk about on this blog is the need for better information analysis, and if all my longwinded talk on the subject hasn’t convinced you yet, I don’t think there’s any simple solution to that problem.

As such, we have to settle for systems that are “good enough” like Wikipedia and Google. As Shamus Young notes in response to my posts last week, “deciding what is ‘good enough’ is a bit abstract: It depends on what you want to do with the emergent data, and what your standards are for usefulness.” Indeed, and it really depends on the individual using the system. Wikipedia, though, is really just a specific example of the “good enough” wiki system, which can be used for any number of applications. As I mentioned last week, Wikipedia has run into some issues because people expect an encyclopedia to be accurate, but other wiki systems don’t necessarily suffer from the same issues.

I think wiki systems belong to a certain class of applications that are so generic, simple, and easy to use that people want to use them for all sorts of specialized purposes. Another application that fits this mold is Excel. Excel is an incredibly powerful application, but it’s generic and simple enough that people use it to create all sorts of ad hoc applications that take advantage of some of its latent power. I look around my office, and I see people using Excel in many varied ways, some of which are not obvious uses of a spreadsheet program. I think we’re going to see something similar with wikis in the future (though wikis may be used for different problems, like documentation and collaboration). All this despite wikis’ obvious and substantial drawbacks. Wikis aren’t “the solution,” but they might be “good enough” for now.

Well, that turned out to be longer than I thought. There’s a lot more to discuss here, but it will have to wait… another busy week approaches.

Cheating Probabilistic Systems

Shamus Young makes some interesting comments regarding last week’s post on probabilistic systems. He makes an important distinction between weblogs, which have no central point of control (“The weblog system is spontaneous and naturally occurring.”), and the other systems I mentioned, which do. Systems like the ones used by Google or Amazon are centrally controlled and usually reside on a particular set of servers. Shamus then makes the observation that such centralization lends itself to “cheating.” He uses Amazon as an example:

You’re a company like Amazon.com. You buy a million red widgets and a million blue widgets. You make a better margin on the blue ones, but it turns out that the red widgets are just a little better in quality. So the feedback for red is a little better. Which leads to red being recommended more often than blue, which leads to better sales, more feedback, and even more recommendations. Now you’re down to your last 100,000 red but you still have 500,000 blue.

Now comes the moment of truth: Do you cheat? You’d rather sell blue. You see that you could “nudge” the numbers in the feedback system. You own the software, pay the programmers who maintain it, and control the servers on which the system is run. You could easily adjust things so that blue recommendations appear more often, even though they are less popular. When Amazon comes up with “You might also enjoy… A blue widget” a customer has no idea of the numbers behind it. You could have the system try to even things out between the more popular red and the more profitable blue.
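To make the dynamic concrete, here is a toy simulation of the loop Shamus describes. Every number in it is invented for illustration, and the nudge parameter is my stand-in for the kind of cheating he imagines – this is not a claim about how Amazon’s actual system works:

```javascript
// Toy model of the recommendation feedback loop: the more-recommended
// widget gets more sales, and its good feedback compounds into still
// more recommendations.
function simulate(rounds, nudge) {
  const red = { score: 0.55, sales: 0 };  // slightly better quality
  const blue = { score: 0.50, sales: 0 }; // better margin for the store
  for (let i = 0; i < rounds; i++) {
    // The store "cheats" by requiring red to beat blue by `nudge`.
    const winner = red.score >= blue.score + nudge ? red : blue;
    winner.sales += 1;
    winner.score += 0.01; // positive feedback compounds
  }
  return { red: red.sales, blue: blue.sales };
}

console.log(simulate(100, 0));   // { red: 100, blue: 0 } – honest system
console.log(simulate(100, 0.1)); // { red: 0, blue: 100 } – a small nudge flips it
```

The point of the toy is the winner-take-all shape: once one widget edges ahead, honestly or not, the feedback loop locks it in, which is exactly why a hidden nudge would be so hard for customers to detect.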

His post focuses mostly on malicious uses of the system by its owners. This is certainly a worry, but one thing I think I need to note is that no one really thinks that these systems should be all that trustworthy. The reason the system works is that we all hold a certain degree of skepticism about it. Wikipedia, for instance, works best when you use it as a starting point. If you use it as the final authority, you’re going to get burned at some point. The whole point of a probabilistic system is that the results are less consistent than traditional systems, and so people trust them less. The reason people still use such systems is that they can scale to handle the massive amounts of information being thrown at them (which is where traditional systems begin to break down).

Today Wikipedia offers 860,000 articles in English – compared with Britannica’s 80,000 and Encarta’s 4,500. Tomorrow the gap will be far larger.

You’re much more likely to find what you’re looking for at Wikipedia, even though the quality of any individual entry at Wikipedia ranges from poor and inaccurate to excellent and helpful. As I mentioned in my post, this lack of trustworthiness isn’t necessarily bad, so long as it’s disclosed up front. For instance, the problems that Wikipedia is facing are related to the fact that some people consider everything they read there to be very trustworthy. Wikipedia’s policy of writing entries from a neutral point of view tends to exacerbate this (which is why the policy is a controversial one). Weblogs do not suffer from this problem because they are written in overtly subjective terms, and thus it is blatantly obvious that you’re getting a biased view that should be taken with a grain of salt. Of course, that also makes it more difficult to glean useful information from weblogs, which is why Wikipedia’s policy of writing entries from a neutral point of view isn’t necessarily wrong (once again, it’s all about tradeoffs).

Personally, Amazon’s recommendations rarely convince me to buy something. Generally, I make the decision independently. For instance, in my last post I mentioned that Amazon recommended the DVD set of the Firefly TV series based on my previous purchases. At that point, I’d already determined that I wanted to buy that set, and thus Amazon’s recommendation wasn’t so much convincing as it was convenient. Which is the point. By tailoring their featured offerings towards a customer’s preferences, Amazon stands to make more sales. They use the term “recommendations,” but that’s probably a bit of a misnomer. Chances are, they’re things we already know about and want to buy, hence it makes more sense to promote those items. When I look at my recommendations page, many items are things I already know I want to watch or read (and sometimes even buy).

So is Amazon cheating with its recommendations? I don’t know, but it doesn’t really matter that much because I don’t use their recommendations as an absolute guide. Also, if Amazon is cheating, all that really means is that Amazon is leaving room for a competitor to step up and provide better recommendations (and from my personal experience working on such a site, retail websites are definitely moving towards personalized product offerings).

One other thing to consider, though, is that it isn’t just Amazon or Google that could be cheating. Gaming Google’s search algorithms has actually become a bit of an industry. Wikipedia is under a constant assault of spammers who abuse the openness of the system for their own gain. Amazon may have set their system up to favor items that give them a higher margin (as Shamus notes), but it’s also quite possible that companies go on Amazon and write glowing reviews for their own products, etc… in an effort to get their products recommended.

The whole point is that these systems aren’t trustworthy. That doesn’t mean they’re not useful, it just means that we shouldn’t totally trust them. You aren’t supposed to trust them. Ironically, acknowledging that fact makes them more useful.

In response to Chris Anderson’s The Probabilistic Age post, Nicholas Carr takes a skeptical view of these systems and wonders what the broader implications are:

By providing a free, easily and universally accessible information source at an average quality level of 5, will Wikipedia slowly erode the economic incentives to produce an alternative source with a quality level of 9 or 8 or 7? Will blogging do the same for the dissemination of news? Does Google-surfing, in the end, make us smarter or dumber, broader or narrower? Can we really put our trust in an alien logic’s ability to create a world to our liking? Do we want to be optimized?

These are great questions, but I think it’s worth noting that these new systems aren’t really meant to replace the old ones. In Neal Stephenson’s The System of the World, the character Daniel Waterhouse ponders how new systems supplant older systems:

“It has been my view for some years that a new System of the World is being created around us. I used to suppose that it would drive out and annihilate any older Systems. But things I have seen recently … have convinced me that new Systems never replace old ones, but only surround and encapsulate them, even as, under a microscope, we may see that living within our bodies are animalcules, smaller and simpler than us, and yet thriving even as we thrive. … And so I say that Alchemy shall not vanish, as I always hoped. Rather, it shall be encapsulated within the new System of the World, and become a familiar and even comforting presence there, though its name may change and its practitioners speak no more about the Philosopher’s Stone.” (page 639)

And so these new probabilistic systems will never replace the old ones, but only surround and encapsulate them…

Amazon’s Recommendations are Probabilistic

Amazon.com is a fascinating website. It was one of the first eCommerce websites, and it launched with a distinctive strategy: the initial site included such a comprehensive implementation of functionality that some sites today are still struggling to catch up. Why? Because much of the functionality that Amazon implemented early (and has continued to improve) didn’t directly attempt to solve the problems most retailers face: What products do I offer? How often do we change our offerings? And so on. Instead, Amazon attempted to set up a self-organizing system based on past usage and user preferences.

For the first several years of Amazon’s existence, they operated at a net loss due to the high initial cost of setup. Competitors who didn’t have such expenses seemed to be doing better. Indeed, Amazon’s infamous recommendations were often criticized, and anyone who has used Amazon regularly has certainly had the experience of wondering how in the world they managed to recommend something so horrible. But over time, Amazon’s recommendation engine has gained steam and produced better and better recommendations. This is due, in part, to improvements in the system itself (in terms of the information collected, the analysis of that information, and the technology used to do both of those things). Other factors include the growth of both Amazon’s customer base and their product offerings, which give the engine far more data to work with.

As I’ve written about before, the important thing about Amazon’s system is that it doesn’t directly solve retailing problems; it sets up a system that allows for efficient collaboration. By studying purchase habits, product ratings, common wishlist items, etc… Amazon is essentially allowing its customers to pick recommendations for one another. As their customer base and product offerings grow, so does the quality of their recommendations. It’s a self-organizing system, and recommendations are the emergent result. Many times, Amazon makes connections that I never would have made. For instance, a recent recommendation for me was the DVD set of the Firefly TV series. When I checked to see why (this openness is an excellent feature), it told me that it was recommended because I had also purchased Neal Stephenson’s Baroque Cycle books. This is a connection I probably never would have made on my own, but once I saw it, it made sense.
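The collaborative flavor of this is easy to sketch. To be clear, this is not Amazon’s actual algorithm (which is proprietary and far more sophisticated); it’s just a minimal “customers who bought X also bought Y” counter over invented purchase data:

```python
from collections import Counter

# Hypothetical purchase histories -- invented for illustration,
# not real Amazon data or Amazon's real recommendation engine.
purchases = {
    "alice": {"Quicksilver", "Firefly DVD", "Cryptonomicon"},
    "bob":   {"Quicksilver", "Firefly DVD"},
    "carol": {"Quicksilver", "Snow Crash"},
    "dave":  {"Firefly DVD", "Snow Crash"},
}

def recommend(item, purchases, n=3):
    """Rank the items most often bought alongside `item`."""
    counts = Counter()
    for basket in purchases.values():
        if item in basket:
            for other in basket:
                if other != item:
                    counts[other] += 1
    return [title for title, _ in counts.most_common(n)]

# "Firefly DVD" ranks first: two Quicksilver buyers also bought it.
print(recommend("Quicksilver", purchases))
```

Note that no one had to decide in advance that Quicksilver readers would like Firefly; the connection simply emerges from the purchase data, which is the whole point of a self-organizing system.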

Of course, the system isn’t perfect. Truth be told, it probably won’t ever be perfect, but overall, I’d bet that it’s still better than any manual process.

Chris Anderson (a writer for Wired who has been exploring the Long Tail concept) has an excellent post on his blog concerning these systems, which he refers to as “probabilistic systems:”

When professionals–editors, academics, journalists–are running the show, we at least know that it’s someone’s job to look out for such things as accuracy. But now we’re depending more and more on systems where nobody’s in charge; the intelligence is simply emergent. These probabilistic systems aren’t perfect, but they are statistically optimized to excel over time and large numbers. They’re designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.

Anderson’s post is essentially a response to critics of probabilistic systems like Wikipedia, Google, and blogs, all of which have come under fire because of their less-than-perfect emergent results. He does an excellent job summarizing the advantages and disadvantages of these systems and it is highly recommended reading. I reference it for several reasons. It seems that Amazon’s website qualifies as a probabilistic system, and so the same advantages and disadvantages Anderson observes apply. It also seems that Anderson’s post touches on a few themes that often appear on this blog.

First is that human beings rarely solve problems outright. Instead, we typically seek to exchange one set of disadvantages for another in the hopes that the new set is more desirable than the old. Solving problems is all about tradeoffs. As Anderson mentions, a probabilistic system “sacrifices perfection at the microscale for optimization at the macroscale.” Is this tradeoff worth it?

Another common theme on this blog is the need for better information analysis capabilities. Last week, I examined a study on “visual working memory,” and it became apparent that one thing that is extremely important when facing a large amount of information is the ability to figure out what to ignore. In information theory, this is referred to as the signal-to-noise ratio (technically, this is a more informal usage of the terms). One of the biggest challenges facing us is an increase in the quantity of information we are presented with. In the modern world, we’re literally saturated in information, so the ability to separate useful information from false or irrelevant information has become much more important.

Naturally, these two themes interact. As I concluded in last week’s post: “Like any other technological advance, systems that help us better analyze information will involve tradeoffs.” While Amazon, Wikipedia, Google or blogs may not be perfect, they do provide a much deeper look into a wider variety of subjects than their predecessors.

Is Wikipedia “authoritative”? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it’s not infallible either; indeed, it’s a lot more flawed than we usually give it credit for.

Britannica’s biggest errors are of omission, not commission. It’s shallow in some categories and out of date in many others. And then there are the millions of entries that it simply doesn’t–and can’t, given its editorial process–have. But Wikipedia can scale to include those and many more. Today Wikipedia offers 860,000 articles in English – compared with Britannica’s 80,000 and Encarta’s 4,500. Tomorrow the gap will be far larger.

The good thing about probabilistic systems is that they benefit from the wisdom of the crowd and as a result can scale nicely both in breadth and depth.

[Emphasis Mine] The bad thing about probabilistic systems is that they sacrifice perfection on the microscale. Any individual entry at Wikipedia may be less reliable than its Britannica counterpart (though not necessarily), and so we need to take any single entry with a grain of salt.

The same is true for blogs, no single one of which is authoritative. As I put it in this post, “blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail–it is, by definition, variable and diverse.” But collectively they are proving more than an equal to mainstream media. You just need to read more than one of them before making up your own mind.

I once wrote a series of posts concerning this subject, starting with how the insights of reflexive documentary filmmaking are being used on blogs. Put simply, Reflexive Documentaries achieve a higher degree of objectivity by embracing and acknowledging their own biases and agenda. Ironically, by acknowledging their own subjectivity, these films are more objective and reliable. Probabilistic systems would also benefit from such acknowledgements. Blogs seem to excel at this; many of the problems facing Wikipedia and other such systems stem from the fact that people aren’t aware of their subjective nature and thus assume a greater degree of objectivity than is really warranted.

It’s obvious that probabilistic systems are not perfect, but that is precisely why they work. Are the tradeoffs worth it? Personally, I think they are, provided that such systems properly disclose their limitations. I also think it’s worth noting that such systems will not fully replace non-probabilistic systems. One commonly referenced observation about Wikipedia, for instance, is that it “should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts.”

Podcast Reviews

As I’ve hinted at in recent entries, I’ve been delving a bit into podcasts. For the uninitiated, a “podcast” is just a fancy word for a pre-recorded radio show that you can subscribe to on the internet (people often download podcasts to listen to on their iPods, hence the name – though the term really is a misnomer, as you don’t need an iPod to listen to a podcast, and it’s not broadcast either).

In any case, my short commute actually doesn’t lend itself to listening, so I haven’t listened to that many podcasts and all of the ones I’ve listened to are at least tangentially movie-related. So here are a few short reviews of podcasts that I’ve listened to (again, mostly movie related):

  • The CHUD Show: A few months ago a friend of mine recommended CHUD’s podcast to me. I’ve always been a fan of the site (which features lots of movie news, etc…), so it was the first podcast I checked out, and I was quite happy with it, though I have to admit, it’s got limited appeal. Once you realize that the name of their site (Cinematic Happenings Under Development – CHUD) is partly an homage to a cheesy 80s horror flick (in which CHUD stands for “Cannibalistic Humanoid Underground Dwellers”), you get the idea. I’m a strange guy, so it doesn’t bother me much, but the CHUD folks seem to have an affinity for really bad jokes and obscure movies (which most would also consider bad, but people like myself don’t mind much). It’s not the highest quality audio, and they appear to be released only sporadically (there are only 5 podcasts in 3-4 months), but they are extremely long (1 hour+) and for fans of cheesy horror and obscure actors, it’s a real treat. If you hear the plot for the movie Castle Freak (a topic of discussion in one of their shows) and think it sounds like your type of movie, you’ll probably love CHUD. I like it, but it’s not for everybody…
  • Cinecast: A much more polished and slick podcast, Cinecast is also great and it has a broader appeal as well. This podcast is almost the polar opposite of CHUD. It’s orderly, regularly published, and it usually features more mainstream fare. They release two 40 minute podcasts a week, and in each episode, they start with a movie review (each week they review a current release and an older film which is usually part of some genre that they’re studying – they’re currently watching horror films, much to my pleasure), they talk about comments they’ve received about previous podcasts, and they give a top 5 list (i.e. top 5 war movies, top 5 actors, etc…). It’s quite entertaining, and the high frequency of new episodes helps greatly (much like a high frequency of blogging helps in that realm). Naturally, whether you’ll like it or not greatly depends on if you’ve seen the movies they’re talking about, but as podcasts go, this is probably the most professional I’ve heard yet.
  • Cinemaslave: I really wanted to like this one, but I just can’t get into it. Reading through the topics on each podcast got me really excited to listen, but it ended up being quite disappointing. I think the biggest problem here is that it’s just one guy talking the entire time (CHUD and Cinecast have at least 2 commentators) and the lack of interplay really takes its toll.
  • Bleatcast: I already wrote about this one, but it’s worth mentioning again because Lileks is a fascinating fellow. If you enjoy the Bleat, chances are that you’ll also enjoy the bleatcast.

So that’s it for now. Do you have any podcasts that you enjoy (or that you think I’d enjoy)? Drop a comment below…

Bookmark Aggregation

This is hardly new, but since I’ve often observed the need for better information aggregation tools I figured I’d give del.icio.us a plug. del.icio.us is essentially an online bookmark (or favorites, in IE-speak) repository. It allows you to post sites to your own personal collection of links. This is great for those who frequently access the internet from multiple locations and different browsers (i.e. from work and home) as it is always accessible on the web. But the really powerful thing about del.icio.us is that everyone’s bookmarks are public and easily viewable, and there are all sorts of ways to aggregate and correlate bookmarks. They like to call the system a social bookmarks manager.

The system uses a tagging scheme (or flat hierarchy, if you prefer) to organize links. In the context of a system like del.icio.us, tagging essentially means that for each bookmark you add, you choose a number of labels or categories (tags) which are used to organize your bookmarks so you can find them later. Again, since del.icio.us is a public system, you can see what other people are posting to the same tags. This becomes a good way to keep up on a particular topic (for example, CSS, the economy, movies, tacos or cheese). Jon Udell speculates that posted links would follow a power law distribution, where a few individuals really stand out as the most reliable contributors of valuable links for a given topic. Unfortunately, del.icio.us isn’t particularly great at sorting that out yet (though you may be able to notice such patterns emerging if you really keep up on a topic and who is posting what, which can be somewhat daunting for popular tags like CSS, but perhaps not so for something more obscure like unicode). Udell also notes how useful tagging is when trying to organize something that you think will be useful in the future.
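Udell’s power-law observation is easy to illustrate. The data below is invented (these aren’t real del.icio.us users or counts); the point is only that ranking contributors per tag tends to surface a heavy head of a few prolific posters:

```python
from collections import Counter

# Hypothetical (user, tag) posting records -- invented data to
# illustrate Udell's point, not real del.icio.us output.
postings = [
    ("udell", "css"), ("udell", "css"), ("udell", "css"), ("udell", "css"),
    ("zeldman", "css"), ("zeldman", "css"), ("zeldman", "css"),
    ("alice", "css"), ("bob", "css"), ("carol", "css"),
]

def top_contributors(postings, tag):
    """Rank users by how many links they've posted under `tag`."""
    counts = Counter(user for user, t in postings if t == tag)
    return counts.most_common()

# Two users account for 7 of the 10 links -- a power-law-shaped head,
# with a long tail of one-off contributors behind them.
print(top_contributors(postings, "css"))
```

If del.icio.us ever does surface this ranking directly, finding the reliable contributors for a topic would stop being the daunting manual chore described above.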

Tagging is a concept whose time has come, and despite its drawbacks, I have a feeling that 10 years from now, we’re all going to look back and wonder how the heck we accomplished anything before something like tagging rolled around. del.icio.us certainly isn’t the only site using tagging (Flickr has tagged photos, Technorati uses tags for blog posts, and there are several other sites). Of course, the concept does have its problems; namely, how do you know which tags to use? For instance, one of the more popular general subjects on del.icio.us is blogs and blogging, but what tags should be used? Blog, Blogging, Blogs, Weblog, Weblogs, blogosphere and so on… Luckily del.icio.us is getting better and better at this – their “experimental post” works wonders because it is actually able to recommend tags you should use based on what tags other people have used.
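The “which tag should I use?” problem is exactly what a suggestion feature can smooth over. Here’s a sketch of the underlying idea, using an invented URL and invented tag data (this is not del.icio.us’s actual implementation, just the obvious approach of surfacing the tags other users applied to the same page):

```python
from collections import Counter

# Hypothetical bookmark data: url -> tags other users have applied.
# Both the URL and the tag counts are invented for illustration.
tags_for_url = {
    "http://example.com/weblog-tools": [
        "blogs", "blogging", "blog", "blogs", "weblogs", "blogs",
    ],
}

def suggest_tags(url, n=3):
    """Suggest the tags most commonly applied to `url` by other users."""
    return [t for t, _ in Counter(tags_for_url.get(url, [])).most_common(n)]

print(suggest_tags("http://example.com/weblog-tools"))
# → ['blogs', 'blogging', 'blog']
```

The nice side effect is convergence: each suggestion nudges the next poster toward the majority tag (“blogs” here), so the vocabulary stabilizes over time without anyone dictating it.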

The system is actually quite simple and easy to use, but there’s not much in the way of documentation. Check out this blog post or Jon Udell’s screencast for some quick tutorials on how to get started. I’ve been playing around with it more and more, and it’s proving very useful on multiple levels (organizing links I come across as well as finding new links in the first place!). If you’re interested, you can check out my bookmarks. Some other interesting functionality:

  • Every page you view on del.icio.us has an RSS feed, so you can subscribe to feeds you like and read them along with your favorite news sites, blogs, &c.
  • One interesting thing you can do with tags is to create a continually updated set of links directed at one specific person. For instance, let’s say I’m always finding links that I think my brother would enjoy. I can bookmark them with the tag “attn: goober” and send him the link, which will always be updated with the latest links I’ve sent him (and he could subscribe to the RSS for that page too).
  • del.icio.us/popular/ shows the pages that are being bookmarked most frequently – a good way to keep up with the leading edge. You can also add a tag to see only popular items for that tag. For example, to keep up with the most popular links about blogs, you could try del.icio.us/popular/blogs.
  • There’s a lot of integration with Mozilla/Firefox, which is one reason for the service’s popularity.
  • There also appears to be a lot of development that leverages del.icio.us data for other uses or in other applications.
  • del.icio.us picks your nose for you! Ok, er, it doesn’t actually do that (and um, even if it did, would anyone use that feature?), but it does lots of other things too. Go sign up and check it out.

Again, it’s a very useful site once you figure out what you’re doing, and I have a few ideas that might show up on the blog (eventually). It should be particularly useful when I attempt to do something like this or this again. The system is far from perfect, and it’s difficult to tell where some of the driving concepts are really going, but it certainly seems like there’s something interesting and very useful going on here.

The important thing about del.icio.us is not that it was designed to create the perfect information resource, but rather an efficient system of collaboration. It’s a systemic improvement; as such, the improvement in information output is an emergent property of internet use. Syndication, aggregation, and filtering on the internet still need to improve considerably, but this seems like a step in the right direction.