Sunday, January 29, 2006
Insert clever title for what is essentially a post full of links.
Again short on time, so just a few links turned up by the chain-smoking monkey research staff who actually run the blog:
Sunday, January 22, 2006
Time is short this week, so just a quick pointer towards an old Collision Detection post in which Clive Thompson talks about iPods and briefly digresses into some differences between Apple and Microsoft computers:
Back in the early days of Macintoshes, Apple engineers would reportedly get into arguments with Steve Jobs about creating ports to allow people to add RAM to their Macs. The engineers thought it would be a good idea; Jobs said no, because he didn't want anyone opening up a Mac. He'd rather they just throw out their Mac when they needed new RAM, and buy a new one.
The concept of being "good enough" presents a few interesting dynamics. One problem is, of course, how do you know what's "good enough" and what's just a piece of crap? Another interesting thing about the above anecdote is that "good enough" boils down to something that's customizable.
One thing I've been thinking about a lot lately is that some problems aren't meant to have perfect solutions. I see a lot of talk about problems that are incredibly complex as if they really aren't that complex. Everyone is trying to "solve" these problems, but as I've noted many times, we don't so much solve problems as we trade one set of problems for another (with the hope that the new set of problems is more favorable than the old). As Michael Crichton noted in a recent speech on Complexity:
...one important assumption most people make is the assumption of linearity, in a world that is largely non-linear. ... Our human predisposition [is to] treat all systems as linear when they are not. A linear system is a rocket flying to Mars. Or a cannonball fired from a cannon. Its behavior is quite easily described mathematically. A complex system is water gurgling over rocks, or air flowing over a bird’s wing. Here the mathematics are complicated, and in fact no understanding of these systems was possible until the widespread availability of computers.
Everyone seems to expect a simple, linear solution to many of the complex problems we face, but I'm not sure such a thing is really possible. I think perhaps what we're looking for is a Nonesuch Beast; it doesn't exist. What are these problems? I think one such problem is the environment, as mentioned in Crichton's speech, but there are really tons of other problems. The Nonesuch Beast article above mentions a few scenarios, all of which I'm familiar with because of my job: Documentation and Metrics. Another is one I often talk about on this blog: the need for better information analysis. In case all my long-winded talk on the subject hasn't made it clear yet, I don't think there's any simple solution to that problem.
As such, we have to settle for systems that are "good enough" like Wikipedia and Google. As Shamus Young notes in response to my posts last week, "deciding what is 'good enough' is a bit abstract: It depends on what you want to do with the emergent data, and what your standards are for usefulness." Indeed, and it really depends on the individual using the system. Wikipedia, though, is really just a specific example of the "good enough" wiki system, which can be used for any number of applications. As I mentioned last week, Wikipedia has run into some issues because people expect an encyclopedia to be accurate, but other wiki systems don't necessarily suffer from the same issues.
I think Wiki systems belong to a certain class of applications that are so generic, simple, and easy to use that people want to use them for all sorts of specialized purposes. Another application that fits this mold is Excel. Excel is an incredibly powerful application, but it's generic and simple enough that people use it to create all sorts of ad hoc applications that take advantage of some of the latent power in Excel. I look around my office, and I see people using Excel in many varied ways, some of which are not obvious uses of a spreadsheet program. I think we're going to see something similar with Wikis in the future (though Wikis may be used for different problems like documentation and collaboration). All this despite Wikis' obvious and substantial drawbacks. Wikis aren't "the solution" but they might be "good enough" for now.
Well, that turned out to be longer than I thought. There's a lot more to discuss here, but it will have to wait... another busy week approaches.
Wednesday, January 18, 2006
On Sunday, I wrote about cheating in probabilistic systems, but one thing I left out was that these systems are actually neutral systems. A while ago, John Robb (quoting the Nicholas Carr post I referenced) put it well:
To people, "optimization" is a neutral term. The optimization of a complex mathematical, or economic, system may make things better for us, or it may make things worse. It may improve society, or degrade it. We may not be able to apprehend the ends, but that doesn't mean the ends are going to be good.He's exactly right. Evolution and emergent intelligence doesn't naturally flow towards some eschatological goodness. It moves forward under its own logic. It often solves problems we don't want solved. For example, in global guerrilla open source warfare, this emergent community intelligence is slowly developing forms of attack (such as systems disruption), that make it an extremely effective foe for nation-states.
Like all advances in technology, the progress of self-organizing systems and emergent results can be used for good or for ill. In the infamous words of Buckethead:
Like the atom, the flyswatter can be a force for great good or great evil.
Indeed.
Tuesday, January 17, 2006
Happy Birthday, Ben
Today is Ben Franklin's 300th birthday. In keeping with the theme of tradeoffs and compromise that often adorns this blog, and since Franklin himself has also been a common subject, here is a quote from Franklin's closing address to the Constitutional Convention in Philadelphia:
I confess that I do not entirely approve this Constitution at present; but sir, I am not sure I shall never approve it: For, having lived long, I have experienced many instances of being obliged, by better information or fuller consideration, to change opinions even on important subjects, which I once thought right, but found to be otherwise. It is therefore that, the older I grow, the more apt I am to doubt my own judgment, and to pay more respect to the judgment of others.
There are some people today (and even in Franklin's time) who seem to think of compromise as some sort of fundamental evil, but it appears to me to be an essential part of democracy.
Update 1.18.06: Mister Snitch points to The Benjamin Franklin Tercentenary, an excellently designed site dedicated to Franklin's 300th birthday...
Sunday, January 15, 2006
Cheating Probabilistic Systems
Shamus Young makes some interesting comments regarding last week's post on probabilistic systems. He makes an important distinction between weblogs, which have no central point of control ("The weblog system is spontaneous and naturally occurring."), and the other systems I mentioned, which do. Systems like the ones used by Google or Amazon are centrally controlled and usually reside on a particular set of servers. Shamus then makes the observation that such centralization lends itself to "cheating." He uses Amazon as an example:
You’re a company like Amazon.com. You buy a million red widgets and a million blue widgets. You make a better margin on the blue ones, but it turns out that the red widgets are just a little better in quality. So the feedback for red is a little better. Which leads to red being recommended more often than blue, which leads to better sales, more feedback, and even more recommendations. Now you’re down to your last 100,000 red but you still have 500,000 blue.
His post focuses mostly on malicious uses of the system by its owners. This is certainly a worry, but one thing worth noting is that no one really thinks these systems should be all that trustworthy. The reason the system works is that we all hold a certain degree of skepticism about it. Wikipedia, for instance, works best when you use it as a starting point. If you use it as the final authority, you're going to get burned at some point. By their nature, probabilistic systems produce less consistent results than traditional systems, and so people trust them less. The reason people still use such systems is that they can scale to handle the massive amounts of information being thrown at them (which is where traditional systems begin to break down).
Today Wikipedia offers 860,000 articles in English - compared with Britannica's 80,000 and Encarta's 4,500. Tomorrow the gap will be far larger.
You're much more likely to find what you're looking for at Wikipedia, even though the quality of any individual entry at Wikipedia ranges from poor and inaccurate to excellent and helpful. As I mentioned in my post, this lack of trustworthiness isn't necessarily bad, so long as it's disclosed up front. For instance, the problems that Wikipedia is facing are related to the fact that some people consider everything they read there to be very trustworthy. Wikipedia's policy of writing entries from a neutral point of view tends to exacerbate this (which is why the policy is a controversial one). Weblogs do not suffer from this problem because they are written in overtly subjective terms, and thus it is blatantly obvious that you're getting a biased view that should be taken with a grain of salt. Of course, that also makes it more difficult to glean useful information from weblogs, which is why Wikipedia's policy of writing entries from a neutral point of view isn't necessarily wrong (once again, it's all about tradeoffs).
Personally, Amazon's recommendations rarely convince me to buy something. Generally, I make the decision independently. For instance, in my last post I mentioned that Amazon recommended the DVD set of the Firefly TV series based on my previous purchases. At that point, I'd already determined that I wanted to buy that set and thus Amazon's recommendation wasn't so much convincing as it was convenient. Which is the point. By tailoring their featured offerings towards a customer's preferences, Amazon stands to make more sales. They use the term "recommendations," but that's probably a bit of a misnomer. Chances are, they're things we already know about and want to buy, hence it makes more sense to promote those items... When I look at my recommendations page, many items are things I already know I want to watch or read (and sometimes even buy, which is the point).
So is Amazon cheating with its recommendations? I don't know, but it doesn't really matter that much because I don't use their recommendations as an absolute guide. Also, if Amazon is cheating, all that really means is that Amazon is leaving room for a competitor to step up and provide better recommendations (and from my personal experience working on such a site, retail websites are definitely moving towards personalized product offerings).
One other thing to consider, though, is that it isn't just Amazon or Google that could be cheating. Gaming Google's search algorithms has actually become a bit of an industry. Wikipedia is under constant assault from spammers who abuse the openness of the system for their own gain. Amazon may have set their system up to favor items that give them a higher margin (as Shamus notes), but it's also quite possible that companies go on Amazon and write glowing reviews for their own products, etc... in an effort to get their products recommended.
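Shamus's red/blue widget scenario is essentially a feedback loop: better feedback leads to more recommendations, which lead to more sales and even more feedback. The toy model below is a sketch under loudly stated assumptions. The `simulate` function, the quality numbers, and the rule that an item is recommended in direct proportion to its accumulated positive feedback are all hypothetical, not a description of how Amazon actually works. It shows how a small quality edge compounds over time, and how a quiet ranking tilt toward the higher-margin item (the `margin_boost` parameter, also hypothetical) can overwhelm that edge.

```python
"""Toy model of the red/blue widget feedback loop (hypothetical numbers)."""


def simulate(rounds, customers=1000, quality=None, margin_boost=1.0):
    """Return the red widget's recommendation share at the start of each round.

    quality: assumed fraction of buyers who leave positive feedback, per product.
    margin_boost: hypothetical multiplier an owner could quietly apply to the
    blue widget's ranking score (1.0 means no tilt at all).
    """
    if quality is None:
        quality = {"red": 0.80, "blue": 0.75}  # red is "just a little better"
    # Start with one unit of positive feedback each, so neither item is favored.
    feedback = {"red": 1.0, "blue": 1.0}
    shares = []
    for _ in range(rounds):
        # Ranking score: accumulated feedback, optionally tilted toward blue.
        score = {"red": feedback["red"], "blue": feedback["blue"] * margin_boost}
        red_share = score["red"] / (score["red"] + score["blue"])
        shares.append(red_share)
        # Customers buy in proportion to how often each item is recommended...
        sales = {"red": customers * red_share, "blue": customers * (1 - red_share)}
        # ...and satisfied buyers add feedback, which feeds the next ranking.
        for product in feedback:
            feedback[product] += sales[product] * quality[product]
    return shares


if __name__ == "__main__":
    fair = simulate(20)
    tilted = simulate(20, margin_boost=1.2)
    print(f"no tilt:   red share {fair[0]:.3f} -> {fair[-1]:.3f}")
    print(f"1.2x tilt: red share {tilted[0]:.3f} -> {tilted[-1]:.3f}")
```

With no tilt, red's share rises every round because its feedback grows slightly faster. With a 1.2x tilt, blue's effective rate (0.75 × 1.2 = 0.9) beats red's 0.8, so red's share starts below half and keeps shrinking even though red is the better product. Either way, the loop amplifies whatever is fed into it, which is why a little skepticism about the output is warranted.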
The whole point is that these systems aren't trustworthy. That doesn't mean they're not useful, it just means that we shouldn't totally trust them. You aren't supposed to trust them. Ironically, acknowledging that fact makes them more useful.
In response to Chris Anderson's The Probabilistic Age post, Nicholas Carr takes a skeptical view of these systems and wonders what the broader implications are:
By providing a free, easily and universally accessible information source at an average quality level of 5, will Wikipedia slowly erode the economic incentives to produce an alternative source with a quality level of 9 or 8 or 7? Will blogging do the same for the dissemination of news? Does Google-surfing, in the end, make us smarter or dumber, broader or narrower? Can we really put our trust in an alien logic's ability to create a world to our liking? Do we want to be optimized?
These are great questions, but I think it's worth noting that these new systems aren't really meant to replace the old ones. In Neal Stephenson's The System of the World, the character Daniel Waterhouse ponders how new systems supplant older systems:
"It has been my view for some years that a new System of the World is being created around us. I used to suppose that it would drive out and annihilate any older Systems. But things I have seen recently ... have convinced me that new Systems never replace old ones, but only surround and encapsulate them, even as, under a microscope, we may see that living within our bodies are animalcules, smaller and simpler than us, and yet thriving even as we thrive. ... And so I say that Alchemy shall not vanish, as I always hoped. Rather, it shall be encapsulated within the new System of the World, and become a familiar and even comforting presence there, though its name may change and its practitioners speak no more about the Philosopher's Stone." (page 639)
And so these new probabilistic systems will never replace the old ones, but only surround and encapsulate them...
Sunday, January 08, 2006
Amazon's Recommendations are Probabilistic
Amazon.com is a fascinating website. It's one of the first eCommerce websites, but it started with a somewhat unique strategy. The initial launch of the site included such a comprehensive implementation of functionality that there are sites today that are still struggling to catch up. Why? Because much of the functionality that Amazon implemented early and continued to improve didn't directly attempt to solve the problems most retailers face: What products do I offer? How often do we change our offerings? And so on. Instead, Amazon attempted to set up a self-organizing system based on past usage and user preferences.
For the first several years of Amazon's existence, they operated at a net loss due to high initial setup costs. Competitors who didn't have such expenses seemed to be doing better. Indeed, Amazon's infamous recommendations were often criticized, and anyone who has used Amazon regularly has certainly had the experience of wondering how in the world they managed to recommend something so horrible. But over time, Amazon's recommendations engine has gained steam and produced better and better recommendations. This is due, in part, to improvements in the system (in terms of the information collected, the analysis of that information, and the technology used to do both of those things). Other factors include the growth of both Amazon's customer base and their product offerings, both of which improved the quality of their recommendations.
As I've written about before, the important thing about Amazon's system is that it doesn't directly solve retailing problems, it sets up a system that allows for efficient collaboration. By studying purchase habits, product ratings, common wishlist items, etc... Amazon is essentially allowing its customers to pick recommendations for one another. As their customer base and product offerings grow, so does the quality of their recommendations. It's a self-organizing system, and recommendations are the emergent result. Many times, Amazon makes connections that I would have never made. For instance, a recent recommendation for me was the DVD set of the Firefly TV series. When I checked to see why (this openness is an excellent feature), it told me that it was recommended because I had also purchased Neal Stephenson's Baroque Cycle books. This is a connection I probably never would have made on my own, but once I saw it, it made sense.
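The post doesn't describe Amazon's actual algorithm, so the following is only a minimal sketch of the general idea that customers "pick recommendations for one another": an item-to-item co-purchase counter ("customers who bought X also bought Y"). It also reports which owned item triggered each suggestion, in the spirit of the "why was this recommended" feature mentioned above. The function names `build_co_purchases` and `recommend`, and the sample order histories, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchases(orders):
    """Count how often each ordered pair of items shows up in one customer's history."""
    co_counts = defaultdict(Counter)
    for items in orders.values():
        for a, b in permutations(set(items), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(customer_items, co_counts, top_n=3):
    """Suggest items the customer doesn't own, each paired with the owned item that explains it."""
    owned = set(customer_items)
    scores = Counter()
    reasons = {}
    for owned_item in sorted(owned):
        for candidate, count in co_counts[owned_item].items():
            if candidate in owned:
                continue
            scores[candidate] += count
            # Keep the owned item with the strongest link, for a "because you bought..." note.
            best = reasons.get(candidate)
            if best is None or count > co_counts[best][candidate]:
                reasons[candidate] = owned_item
    # Highest total co-purchase count first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, reasons[item]) for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical order histories, for illustration only.
    orders = {
        "alice": ["Quicksilver", "The Confusion", "Firefly DVD"],
        "bob": ["Quicksilver", "Firefly DVD"],
        "carol": ["Quicksilver", "Cryptonomicon"],
    }
    me = ["Quicksilver", "The Confusion"]
    for item, because in recommend(me, build_co_purchases(orders)):
        print(f"{item} (because you bought {because})")
```

Nobody decides that Firefly belongs next to the Baroque Cycle; the link emerges from other customers' purchases, and every new order makes the counts a little more informative, which is the "improves with size" property discussed below.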
Of course, the system isn't perfect. Truth be told, it probably won't ever be perfect, but overall, I'd bet that it's still better than any manual process.
Chris Anderson (a writer for Wired who has been exploring the Long Tail concept) has an excellent post on his blog concerning these systems, which he refers to as "probabilistic systems:"
When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy. But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. These probabilistic systems aren't perfect, but they are statistically optimized to excel over time and large numbers. They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.
Anderson's post is essentially a response to critics of probabilistic systems like Wikipedia, Google, and blogs, all of which have come under fire because of their less-than-perfect emergent results. He does an excellent job summarizing the advantages and disadvantages of these systems and it is highly recommended reading. I reference it for several reasons. It seems that Amazon's website qualifies as a probabilistic system, and so the same advantages and disadvantages Anderson observes apply. It also seems that Anderson's post touches on a few themes that often appear on this blog.
First is that human beings rarely solve problems outright. Instead, we typically seek to exchange one set of disadvantages for another in the hopes that the new set is more desirable than the old. Solving problems is all about tradeoffs. As Anderson mentions, a probabilistic system "sacrifices perfection at the microscale for optimization at the macroscale." Is this tradeoff worth it?
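Anderson's "slop at the microscale, optimization at the macroscale" tradeoff has a simple statistical core, which a toy model can illustrate. The sketch below assumes (hypothetically) that many contributors each estimate some true value with independent, unbiased noise; the function name `micro_vs_macro` and all parameters are illustrative. As the number of contributors grows, the worst individual contribution tends to get no better, while the error of the aggregate shrinks roughly in proportion to one over the square root of the count.

```python
import random
import statistics


def micro_vs_macro(n_contributors, true_value=5.0, noise=2.0, seed=0):
    """Toy model: each contributor estimates true_value with uniform noise in [-noise, noise].

    Returns (worst single-contributor error, error of the aggregate mean).
    Assumes errors are independent and unbiased, which gaming or cheating breaks.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = [true_value + rng.uniform(-noise, noise) for _ in range(n_contributors)]
    worst_individual = max(abs(e - true_value) for e in estimates)
    aggregate_error = abs(statistics.fmean(estimates) - true_value)
    return worst_individual, aggregate_error


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        worst, aggregate = micro_vs_macro(n)
        print(f"{n:>6} contributors: worst entry off by {worst:.3f}, aggregate off by {aggregate:.4f}")
```

The independence assumption is doing a lot of work here: if contributors share a bias, or someone deliberately skews their input, averaging more of them does not wash the error out.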
Another common theme on this blog is the need for better information analysis capabilities. Last week, I examined a study on "visual working memory," and it became apparent that one thing that is extremely important when facing a large amount of information is the ability to figure out what to ignore. In information theory, this is framed in terms of the signal-to-noise ratio (though I'm using the term informally here). One of the biggest challenges facing us is an increase in the quantity of information we are presented with. In the modern world, we're literally saturated in information, so the ability to separate useful information from false or irrelevant information has become much more important.
Naturally, these two themes interact. As I concluded last week's post: "Like any other technological advance, systems that help us better analyze information will involve tradeoffs." While Amazon, Wikipedia, Google or blogs may not be perfect, they do provide a much deeper look into a wider variety of subjects than their predecessors.
Is Wikipedia "authoritative"? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it's not infallible either; indeed, it's a lot more flawed than we usually give it credit for. [Emphasis Mine]
The bad thing about probabilistic systems is that they sacrifice perfection on the microscale. Any individual entry at Wikipedia may be less reliable than its Britannica counterpart (though not necessarily), and so we need to take any single entry with a grain of salt.
The same is true for blogs, no single one of which is authoritative. As I put it in this post, "blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail--it is, by definition, variable and diverse." But collectively they are proving more than an equal to mainstream media. You just need to read more than one of them before making up your own mind.
I once wrote a series of posts concerning this subject, starting with how the insights of reflexive documentary filmmaking are being used on blogs. Put simply, Reflexive Documentaries achieve a higher degree of objectivity by embracing and acknowledging their own biases and agenda. Ironically, by acknowledging their own subjectivity, these films are more objective and reliable. Probabilistic systems would also benefit from such acknowledgements. Blogs seem to excel at this, though much of the trouble facing Wikipedia and other such systems seems to be that people aren't aware of their subjective nature and thus assume a greater degree of objectivity than is really warranted.
It's obvious that probabilistic systems are not perfect, but that is precisely why they work. Are they worth the tradeoffs? Personally, I think they are, provided that such systems properly disclose their limitations. I also think it's worth noting that such systems will not fully replace non-probabilistic systems. One commonly referenced observation about Wikipedia, for instance, is that it "should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts."
Thursday, January 05, 2006
On the lighter side
You may be familiar with my long-winded, more serious style, but I thought this blond joke would be a welcome change of pace. Best. Joke. Evar. [via Chizumatic, whose lack of permalinks adds extra irony]
Sunday, January 01, 2006
Analysis and Ignorance
A common theme on this blog is the need for better information analysis capabilities. There's nothing groundbreaking about the observation, which is probably why I keep running into stories that seemingly confirm the challenge we're facing. A little while ago, Boing Boing pointed to a study on "visual working memory" in which the people who did well weren't better at remembering things than other people - they were better at ignoring unimportant things.
"Until now, it's been assumed that people with high capacity visual working memory had greater storage but actually, it's about the bouncer – a neural mechanism that controls what information gets into awareness," Vogel said.
In Feedback and Analysis, I examined an aspect of how the human eye works:
So the brain gets some input from the eye, but it's sending significantly more information towards the eye than it's receiving. This implies that the brain is doing a lot of processing and extrapolation based on the information it's been given. It seems that the information gathering part of the process, while important, is nowhere near as important as the analysis of that data. Sound familiar?
Back in high school (and to a lesser extent, college), there were always people who worked extremely hard, but still couldn't manage to get good grades. You know, the people who would spend 10 hours studying for a test and still bomb it. I used to infuriate these people. I spent comparatively little time studying, and I did better than them. Now, there were a lot of reasons for this, and most of them don't have anything to do with me being smarter than anyone else. One thing I found was that if I paid attention in class, took good notes, and spent an honest amount of effort on homework, I didn't need to spend that much time cramming before a test (shocking revelation, I know). Another thing was that I knew what to study. I didn't waste time memorizing things that weren't necessary. In other words, I was good at figuring out what to ignore.
Analysis of the data is extremely important, but you need to have the appropriate data to start with. When you think about it, much of analysis is really just figuring out what is unimportant. Once you remove the noise, you're left with the signal and you just need to figure out what that signal is telling you. The problem right now is that we keep seeing new and exciting ways to collect more and more information without a corresponding increase in analysis capabilities. This is an important technical challenge that we'll have to overcome, and I think we're starting to see the beginnings of a genuine solution. At this point another common theme on this blog will rear its ugly head. Like any other technological advance, systems that help us better analyze information will involve tradeoffs. More on this subject later this week...
This page contains entries posted to the Kaedrin Weblog in January 2006.
Copyright © 1999 - 2012 by Mark Ciocco.