Science & Technology

The Public Domain

I got curious about the Public Domain recently and was surprised by what I found. On the first day of each year, Public Domain Day celebrates the moment when copyrights expire, enter the Public Domain, and join their brethren, such as the plays of Shakespeare, the music of Mozart, and the books of Dickens. Once in the Public Domain, a work can be freely copied, remixed, translated into other languages, and adapted into stage plays, movies, or other media, free from restrictions. Because they are free to use, they can live on in perpetuity.

Of course, rights are based on jurisdiction, so not all countries will benefit equally every year. In 2015, our neighbors up north in Canada celebrated the entrance of the writings of Rachel Carson, Ian Fleming, and Flannery O’Connor into the Public Domain (along with hundreds of others). I’d be curious how a James Bond movie made in Canada would fare here in the U.S., as they now have the right to make such a movie. Speaking of the U.S., how many works do you think entered our Public Domain this year?

Not a single published work will enter the Public Domain this year. Next year? Nope! In fact, no published work will enter the Public Domain until 2019 – assuming Congress does not, once again, extend the copyright term even longer than it is now (currently the author’s lifetime plus 70 years), which is how we ended up in this situation in the first place.
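The arithmetic behind that 2019 date is simple enough to sketch. As I understand the current US rules (this is a rough sketch, not legal advice), works published between 1923 and 1977 get 95 years of protection from publication, and a work enters the Public Domain on the January 1 after its copyright expires – which is why the drought finally ends in 2019:

```python
# Sketch of US Public Domain entry for works published 1923-1977,
# assuming the 95-year term from the 1998 extension. The work enters
# the Public Domain on January 1 of the year after the term expires.

def us_public_domain_year(publication_year, term_years=95):
    """Year a 1923-1977 US publication enters the Public Domain."""
    return publication_year + term_years + 1

print(us_public_domain_year(1923))  # 2019: the first post-freeze batch
```

Run it for later years and the backlog just marches forward one year at a time – unless, of course, the term gets extended again.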

I’ve harped on this sort of thing before, so I won’t belabor the point. I was just surprised that the Public Domain was so dead in the United States. Even works that gained notoriety for being accidentally let into the Public Domain, like It’s a Wonderful Life, are being clamped down on. Ironically, It’s a Wonderful Life only became famous once it was in the Public Domain and thus free to televise (frequent airings led to popularity). In the 1990s, the original copyright holder seized on some obscure court precedents and reasserted their rights based on the original musical score and the short story on which the film was based. The details are murky, but the result is crystal clear: the film is rarely aired on TV anymore because NBC claims exclusive rights (and airs it only a couple times a year), and derivative works, like a planned sequel, are continually blocked.

I don’t know of a solution, but I did want to reflect on what the year could have brought us. There goes my plans for a Vertigo remake!

The Myth of Digital Distribution

The movie lover’s dream service would be something we could subscribe to that would give us a comprehensive selection of movies to stream. This service is easy to conceive, and it’s such an alluring idea that it makes people want to eschew tried-and-true distribution methods like DVDs and Blu-Ray. We’ve all heard the arguments before: physical media is dead, streaming is the future. When I made the move to Blu-Ray about 6 years ago, I estimated that it would take at least 10 years for a comprehensive streaming service to become feasible. The more I see, the more I think that I drastically underestimated that timeline… and am beginning to feel like it might never happen at all.

MGK illustrates the problem well with this example:

this is the point where someone says “but we’re all going digital instead” and I get irritated by this because digital is hardly an answer. First off, renting films – and when you “buy” digital movies, that’s what you’re doing almost every single time – is not the same as buying them. Second, digital delivery is getting more and more sporadic as rights get more and more expensive for distributors to purchase.

As an example, take Wimbledon, a charming little 2004 sports film/romcom starring Paul Bettany and Kirsten Dunst. I am not saying Wimbledon is an unsung treasure or anything; it’s a lesser offering from the Working Title factory that cranks out chipper British romcoms, a solid B-grade movie: well-written with a few flashes of inspiration, good performances all around (including a younger Nikolaj Coster-Waldau before he became the Kingslayer) and mostly funny, although Jon Favreau’s character is just annoying. But it’s fun, and it’s less than a decade old. It should be relatively easy to catch digitally, right? But no. It’s not anywhere. And there are tons of Wimbledons out there.

Situations like this are an all too common occurrence, and not just with movies. It turns out that content owners can’t be bothered with a title unless it’s either new or in the public domain. This graph from a Rebecca Rosen article nicely illustrates the black hole that our extended copyright regime creates:

Books available by decade

Rosen explains:

[The graph] reveals, shockingly, that there are substantially more new editions available of books from the 1910s than from the 2000s. Editions of books that fall under copyright are available in about the same quantities as those from the first half of the 19th century. Publishers are simply not publishing copyrighted titles unless they are very recent.

The books that are the worst affected by this are those from pretty recent decades, such as the 80s and 90s, for which there is presumably the largest gap between what would satisfy some abstract notion of people’s interest and what is actually available.

More interpretation:

This is not a gently sloping downward curve! Publishers seem unwilling to sell their books on Amazon for more than a few years after their initial publication. The data suggest that publishing business models make books disappear fairly shortly after their publication and long before they are scheduled to fall into the public domain. Copyright law then deters their reappearance as long as they are owned. On the left side of the graph before 1920, the decline presents a more gentle time-sensitive downward sloping curve.

This is absolutely absurd, though it’s worth noting that the graph doesn’t control for used books (which are generally pretty easy to find on Amazon). And while content owners don’t seem to be rushing to digitize their back catalogs, future generations theoretically won’t face the same gap we’re seeing now. I suspect the 80s and 90s will remain a trouble spot, but anything published today gets put on digital/streaming services, so stuff from 2010 onward should theoretically be available on an indefinite basis.

Of course, intellectual property law being what it is, I’m sure that new proprietary formats and readers will render old digital copies obsolete, and once again, consumers will be hard pressed to see that 15 year old movie or book ported to the latest-and-greatest channel. It’s a weird and ironic state of affairs when the content owners are so greedy in hoarding and protecting their works, yet so unwilling to actually, you know, profit from them.

I don’t know what the solution is here. There have been some interesting ideas about having copyright expire for books that have been out of print for a certain period of time (say, 5-10 years), but that would only work now – again, future generations will theoretically have those digital versions available. They may be in a near obsolete format, but they’re available! Sensible copyright reform doesn’t seem likely to pass, and while it would be nice to take a page from the open source playbook, I seriously doubt that content owners would ever be that forward thinking.

As MGK noted, DVD ushered in an era of amazing availability, but much of that stuff has gone out of print, and we somehow appear to be regressing from that.

Kindle Updates

I have, for the most part, been very pleased with using my Kindle Touch to read over the past couple years. However, while it got the job done, I felt like there were a lot of missed opportunities, especially when it came to metadata and personal metrics.

Well, Amazon just released a new update to their Kindle software, and mixed in with the usual (i.e. boring) updates to features I don’t use (like “Whispersync” or Parental Controls), there was this little gem:

The Time To Read feature uses your reading speed to let you know how much time is left before you finish your chapter or before you finish your book. Your specific reading speed is stored only on your Kindle Touch; it is not stored on Amazon servers.

Hot damn, that’s exactly what I was asking for! Of course, it’s all locked down and you can’t really see what your reading speed is (or plot it over time, or by book, etc…), but this is the single most useful update to a device like this that I think I’ve ever encountered. Indeed, the fact that it tells you how much time until you finish both your chapter and the entire book is extremely useful, and it addresses my initial curmudgeonly complaints about the Kindle’s hatred of page numbers and love of percentage.

Time to Read in Action

Will finish this book in about 4 hours!

Measuring book length by time sidesteps the variability problem by giving you a personalized measurement that is relevant and intuitive. No more futzing with the wild variability in page numbers or Amazon’s bizarre location system; you can just peek at the remaining time, and it’s all good.

And I love that they give a time to read for both the current chapter and the entire book. One of the frustrating things about reading an ebook is that you never really know how long it will take to read a chapter. With a physical book, you can easily flip ahead and see where the chapter ends. Now, ebooks have that personalized time, which is perfect.
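Amazon hasn’t published how Time To Read actually works, but the core arithmetic is presumably trivial: track your speed locally, divide what’s left by it. A hypothetical sketch (the word counts and speed here are made-up illustration values, not anything the Kindle exposes):

```python
# Hypothetical sketch of a "Time To Read" estimate: words remaining
# divided by a locally tracked reading speed. Chapter vs. whole-book
# estimates just change the words_remaining input.

def time_to_read(words_remaining, words_per_minute):
    """Return estimated remaining reading time in minutes."""
    if words_per_minute <= 0:
        raise ValueError("reading speed must be positive")
    return words_remaining / words_per_minute

# e.g. 60,000 words left at 250 words per minute:
minutes = time_to_read(60_000, 250)
print(f"About {minutes / 60:.1f} hours left")  # About 4.0 hours left
```

The clever part isn’t the division – it’s quietly measuring your actual words-per-minute in the background, which is exactly the data I wish they’d let us see.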

I haven’t spent a lot of time with this new feature, but so far, I love it. I haven’t done any formal tracking, but it seems accurate, too (it seems like I’m reading faster than it says, but it’s close). It even seems to recognize when you’ve taken a break (though I’m not exactly sure of that). Of course, I would love it if Amazon would allow us access to the actual reading speed data in some way. I mean, I can appreciate their commitment to privacy, and I don’t think that needs to change either; I’d just like to be able to see some reports on my actual reading speed. Plot it over time, see how different books impact speed, and so on. Maybe I’m just a data visualization nerd, but think of the graphs! I love this update, but they’re still only scratching the surface here. There’s a lot more there for the taking. Let’s hope we’re on our way…

The State of Streaming

So Netflix has had a good first quarter, exceeding expectations and crossing the $1 Billion revenue threshold. Stock prices have been skyrocketing, going from sub 100 to over 200 in just the past 4-5 months. Their subscriber base continues to grow, and fears that people would use the free trial to stream exclusive content like House of Cards, then bolt from the service seem unfounded. However, we’re starting to see a fundamental shift in the way Netflix is doing business here. For the first time ever, I’m seeing statements like this:

As we continue to focus on exclusive and curated content, our willingness to pay for non-exclusive, bulk content deals declines.

I don’t like the sound of that, but then, the cost of non-exclusive content seems to keep rising at an absurd rate, and well, you know, it’s not exclusive. The costs have risen to somewhere on the order of $2 billion per year on content licensing and original shows. So statements like this seem like a natural outgrowth of that cost:

As we’ve gained experience, we’ve realized that the 20th documentary about the financial crisis will mostly just take away viewing from the other 19 such docs, and instead of trying to have everything, we should strive to have the best in each category. As such, we are actively curating our service rather than carrying as many titles as we can.

And:

We don’t and can’t compete on breadth with Comcast, Sky, Amazon, Apple, Microsoft, Sony, or Google. For us to be hugely successful we have to be a focused passion brand. Starbucks, not 7-Eleven. Southwest, not United. HBO, not Dish.

This all makes perfect sense from a business perspective, but as a consumer, this sucks. I don’t want to have to subscribe to 8 different services to watch 8 different shows that seem interesting to me. Netflix’s statements and priorities seem to be moving, for the first time, away from a goal of providing a streaming service with a wide, almost comprehensive selection of movies and television. Instead, we’re getting a more curated approach coupled with original content. That wouldn’t be the worst thing ever, but Netflix isn’t the only one playing this game. Amazon just released 14 pilot episodes for their own exclusive content. I’m guessing it’s only a matter of time before Hulu joins this roundelay (and for all I know, they’re already there – I’ve just hated every experience I’ve had with Hulu so much that I don’t really care to look into it). HBO is already doing its thing with HBO Go, which exclusively streams their shows. How many other streaming services will I have to subscribe to if I want to watch TV (or movies) in the future? Like it or not, fragmentation is coming. And no one seems to be working on a comprehensive solution anymore (at least, not in a monthly subscription model – Amazon and iTunes have pretty good a la carte options). This is frustrating, and I feel like there’s a big market for this thing, but at the same time, content owners seem to be overcharging for their content. If Netflix’s crappy selection costs $2 billion a year, imagine what something even remotely comprehensive would cost (easily 5-10 times that amount, which is clearly not feasible).

Incidentally, Netflix’s third exclusive series, Hemlock Grove, premiered this past weekend. I tried to watch the first episode, but I fell asleep. What I remember was pretty schlocky and not particularly inspiring… but I have a soft spot for cheesy stuff like this, so I’ll give it another chance. Still, the response seems a bit mixed on this one. I did really end up enjoying House of Cards, but I’m not sure how much I’m going to stick with Hemlock Grove.

What’s in a Book Length?

I mentioned recently that book length is something that’s been bugging me. It seems that we have a somewhat elastic relationship with length when it comes to books. The traditional indicator of book length is, of course, page number… but due to variability in font size, type, spacing, format, media, and margins, the hallowed page number may not be as concrete as we’d like. Ebooks theoretically provide an easier way to maintain a consistent measurement across different books, but it doesn’t look like anyone’s delivered on that promise. So how are we to know the lengths of our books? Fair warning, this post is about to get pretty darn nerdy, so read on at your own peril.

In terms of page numbers, books can vary wildly. Two books with the same number of pages might be very different in terms of actual length. Let’s take two examples: Gravity’s Rainbow (784 pages) and Harry Potter and the Goblet of Fire (752 pages). Looking at page number alone, you’d say that Gravity’s Rainbow is only slightly longer than Goblet of Fire. With the help of the magical internets, let’s take a closer look at the print inside the books (click image for a bigger version):

Pages from Gravitys Rainbow and Harry Potter and the Goblet of Fire

As you can see, there is much more text on the page in Gravity’s Rainbow. Harry Potter has a smaller canvas to start with (at least, in terms of height), but larger margins, more line spacing, and I think even a slightly larger font. I don’t believe it would be an exaggeration to say that when you take all this into account, the Harry Potter book is probably less than half the length of Gravity’s Rainbow. I’d estimate it somewhere on the order of 300-350 pages. And that’s even before we get into things like vocabulary and paragraph breaks (which I assume would also serve to inflate Harry Potter’s length.) Now, this is an extreme example, but it illustrates the variability of page numbers.

Ebooks present a potential solution. Because ereaders have different sized screens and even allow the reader to choose font sizes and other display options, page numbers start to seem irrelevant. So ebook makers devised what are called reflowable documents, which adapt their presentation to the output device. For example, Amazon’s Kindle uses an ebook format that is reflowable. It does not (usually) feature page numbers, instead relying on a percentage indicator and the mysterious “Location” number.

The Location number is meant to be consistent, no matter what formatting options you’re using on your ereader of choice. Sounds great, right? Well, the problem is that the Location number is pretty much just as arbitrary as page numbers. It is, of course, more granular than a page number, so you can easily skip to the exact location on multiple devices, but as for what actually constitutes a single “Location Number”, that is a little more tricky.

In looking around the internets, it seems there is distressingly little information about what constitutes an actual Location. According to this thread on Amazon, someone claims that: “Each location is 128 bytes of data, including formatting and metadata.” This rings true to me, but unfortunately, it also means that the Location number is pretty much meaningless.
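If that forum claim is right, estimating a book’s Location count from its file size would be a one-liner – which also shows exactly why it breaks down, since formatting, metadata, and embedded images all inflate the byte count without adding readable text. A sketch, assuming the unverified 128-bytes-per-Location figure:

```python
# Naive Location estimate from file size, assuming the unverified
# claim that 1 Location = 128 bytes of source data. Formatting,
# metadata, and images make this wildly optimistic for real books.

def estimated_locations(file_size_kb):
    """Estimated total Locations for an ebook of the given size in KB."""
    return (file_size_kb * 1024) // 128

print(estimated_locations(512))  # a 512 KB ebook -> 4096 Locations
```

For a plain-text-only ebook this might land in the right ballpark; for anything with illustrations, the estimate drifts badly, which matches what I saw in my spot check below.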

The elastic relationship we have with book length is something I’ve always found interesting, but what made me want to write this post was when I wanted to pick a short book to read in early December. I was trying to make my 50 book reading goal, so I wanted something short. In looking through my book queue, I saw Alfred Bester’s classic SF novel The Stars My Destination. It’s one of those books I consistently see at the top of best SF lists, so it’s always been on my radar, and looking at Amazon, I saw that it was only 236 pages long. Score! So I bought the ebook version and fired up my Kindle only to find that in terms of locations, it’s the longest book I have on my Kindle (as of right now, I have 48 books on there). This is when I started looking around at Locations and trying to figure out what they meant. As it turns out, while the Location numbers provide a consistent reference within the book, they’re not at all consistent across books.

I did a quick spot check of 6 books on my Kindle, looking at total Location numbers, total page numbers (resorting to print version when not estimated by Amazon), and file size of the ebook (in KB). I also added a column for Locations per page number and Locations per KB. This is an admittedly small sample, but what I found is that there is little consistency among any of the numbers. The notion of each Location being 128 bytes of data seems useful at first, especially when you consider that the KB information is readily available, but because that includes formatting and metadata, it’s essentially meaningless. And the KB number also includes any media embedded in the book (i.e. illustrations crank up the KB, which distorts any calculations you might want to do with that data).

It turns out that The Stars My Destination will probably end up being relatively short, as the page numbers would imply. There’s a fair amount of formatting within the book (which, by the way, doesn’t look so hot on the Kindle), and doing spot checks of how many Locations I pass when cycling to the next screen, it appears that this particular ebook is going at a rate of about 12 Locations per cycle, while my previous book was going at a rate of around 5 or 6 per cycle. In other words, while the total Locations for The Stars My Destination were nearly twice what they were for my previously read book, I’m also cycling through Locations at double the rate. Meaning that, basically, this is the same length as my previous book.

Various attempts have been made to convert Location numbers to page numbers, with low degrees of success. This is due to the generally elastic nature of a page, combined with the inconsistent size of Locations. For most books, it seems like dividing the Location numbers by anywhere from 12-16 (the linked post posits dividing by 16.69, but the books I checked mostly ranged from 12-16) will get you a somewhat accurate page number count that is marginally consistent with print editions. Of course, for The Stars My Destination, that won’t work at all. For that book, I have to divide by 40.86 to get close to the page number.
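The divide-by-N heuristic is easy to express in code; the catch, as noted above, is that N itself varies per book. A sketch using the linked post’s 16.69 divisor as a default (the 5,000-Location example book is hypothetical):

```python
# Rough Location-to-page conversion using the divide-by-N heuristic.
# N is roughly 12-16 for most books I checked (the linked post says
# 16.69), but outliers like The Stars My Destination need ~41.

def locations_to_pages(total_locations, locations_per_page=16.69):
    """Approximate print page count from a Kindle Location total."""
    return round(total_locations / locations_per_page)

# Hypothetical example: a book spanning 5,000 Locations
print(locations_to_pages(5000))  # 300 pages, give or take
```

Which is to say: the conversion works fine once you already know the right divisor for a given book, and the only way to know that is to have the print edition handy – defeating the purpose.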

Why is this important at all? Well, there’s clearly an issue with ebooks in academia, because citations are so important for that sort of work. Citing a location won’t get readers of a paper anywhere close to a page number in a print edition (whereas, even using differing editions, you can usually track down the quote relatively easily if a page number is referenced). On a personal level, I enjoy reading ebooks, but one of the things I miss is the easy and instinctual notion of figuring out how long a book will take to read just by looking at it. Last year, I was shooting for reading quantity, so I wanted to tackle shorter books (this year, I’m trying not to pay attention to length as much and will be tackling a bunch of large, forbidding tomes, but that’s a topic for another post)… but there really wasn’t an easily accessible way to gauge the length. As we’ve discovered, both page numbers and Location numbers are inconsistent. In general, the larger the number, the longer the book, but as we’ve seen, that can be misleading in certain edge cases.

So what is the solution here? Well, we’ve managed to work with variable page numbers for thousands of years, so maybe no solution is really needed. A lot of newer ebooks even contain page numbers (despite the variation in display), so if we can find a way to make that more consistent, that might help make things a little better. But the ultimate solution would be to use something like Word Count. That’s a number that might not be useful in the midst of reading a book, but if you’re really looking to determine the actual length of the book, Word Count appears to be the best available measurement. It would also be quite easily calculated for ebooks. Is it perfect? Probably not, but it’s better than page numbers or location numbers.
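Word count really is trivial to compute from an ebook’s text – even a naive whitespace split gives a rough, format-independent measure of length:

```python
# Minimal word count: a naive whitespace split. Good enough as a
# format-independent length metric, though punctuation handling and
# hyphenation would need more care in a real implementation.

def word_count(text):
    """Count whitespace-separated tokens in the given text."""
    return len(text.split())

print(word_count("Call me Ishmael. Some years ago..."))  # 6
```

An ereader already has the full text in hand, so surfacing this number (or precomputing it server-side and shipping it with the download) would cost essentially nothing.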

In the end, I enjoy using my Kindle to read books, but I wish they’d get on the ball with this sort of stuff. If you’re still reading this (Kudos to you) and want to read some more babbling about ebooks and where I think they should be going, check out my initial thoughts and my ideas for additional metadata and the gamification of reading. The notion of ereaders really does open up a whole new world of possibilities… it’s a shame that Amazon and other ereader companies keep their platforms so locked down and uninteresting. Of course, reading is its own reward, but I really feel like there’s a lot more we can be doing with our ereader software and hardware.

Web browsers I have known, 1996-2012

Jason Kottke recently recapped all of the browsers he used as his default for the past 18 years. It sounded like fun, so I’m going to shamelessly steal the idea and list out my default browsers for the past 16 years (prior to 1996, I was stuck in the dark ages of dialup AOL – but once I went away to college and discovered the joys of T1/T3 connections, my browsing career started in earnest, so that’s when I’m starting this list).

  • 1996 – Netscape Navigator 3 – This was pretty much the uncontested king of browsers at the time, but its reign would be short. I had a copy of IE3 (I think?) on my computer too, but I almost never used it…
  • 1997-1998 – Netscape Communicator 4 – Basically Netscape Navigator 4, but the Communicator was a whole suite of applications which appealed to me at the time. I used it for email and even to start playing with some HTML editing (though I would eventually abandon everything but the browser from this suite). IE4 did come out sometime in this timeframe and I used it occasionally, but I think I stuck with NN4 way longer than I probably should have.
  • 1999-2000 – Internet Explorer 5 – With the release of IE5 and the increasing issues surrounding NN4, I finally jumped ship to Microsoft. I was never particularly comfortable with IE though, and so I was constantly looking for alternatives and trying new things. I believe early builds of Mozilla were available, and I kept downloading the updates in the hopes that it would allow me to dispense with IE, but it was still early in the process for Mozilla. This was also my first exposure to Opera, which at the time wasn’t that remarkable (we’re talking version 3.5 – 4 here) except that, as usual, they were ahead of the curve on tabbed browsing (a mixed blessing, as monitor resolutions at the time weren’t great). Opera was also something you had to pay for at the time, and a lot of sites didn’t work in Opera. This would all change at the end of 2000, though, with the release of Opera 5.
  • 2001 – Opera 5 – This browser changed everything for me. It was the first “free” Opera browser available, although the free version was ad-supported (quite annoying, but it was easy enough to get rid of the ads). The thing that was revolutionary about this browser, though, was mouse gestures. It was such a useful feature, and Opera’s implementation was (and quite frankly, still is) the best, smoothest implementation of the functionality I’ve seen. At this point, I was working at a website, so for work, I was still using IE5 and IE6 as my primary browser (because at the time, they represented something like 85-90% of the traffic to our site). I was also still experimenting with the various Mozilla-based browsers at the time as well, but Opera was my default for personal browsing. Of course, no one codes for Opera, so there were plenty of sites that I’d have to fire up IE for (this has always been an issue with Opera).
  • 2002-2006 – Opera 6/7/8/9 – I pretty much kept rolling with Opera during this timeframe. Again, for my professional use, IE6/IE7 was still a must, but in 2004, Firefox 1.0 launched, so that added another variable to the mix. I wasn’t completely won over by the initial Firefox offerings, but it was the first new browser in a long time that I thought had a bright future. It also provided a credible alternative for when Opera crapped out on a weirdly coded page. However, as web standards started to actually be implemented, Opera’s issues became fewer as time went on…
  • 2007 – Firefox 2/Opera 9 – It was around this time that Firefox started to really assert itself in my personal and professional usage. I still used Opera a lot for personal usage, but for professional purposes, Firefox was a simple must. At the time, I was embroiled in a year-long site redesign project for my company, and I was doing a ton of HTML/CSS/JavaScript development… Firefox was an indispensable tool at the time, mostly due to extensions like Firebug and the Web-Developer Toolbar. I suppose I should note that Safari first came to my attention at this point, mostly for troubleshooting purposes. I freakin’ hate that browser.
  • 2008-2011 – Firefox/Opera – After 2007, there was a slow, inexorable drive towards Firefox. Opera kept things interesting with a feature they call Speed Dial (and quite frankly, I like that feature much better than what Chrome and recent versions of Firefox have implemented), but the robust and mature list of extensions for Firefox was really difficult to compete with, especially when I was trying to get stuff done. Chrome also started to gain popularity in this timeframe, but while I loved how well it loaded Ajax and other JavaScript-heavy features, I could never really get comfortable with the interface. Firefox still afforded more control, and Opera’s experience was generally better.
  • 2012-Present – Firefox – Well, I think it’s pretty telling that I’m composing this post on Firefox. That being said, I still use Opera for simple browsing purposes semi-frequently. Indeed, I usually have both browsers open at all times on my personal computer. At work, I’m primarily using Firefox, but I’m still forced to use IE8, as our customers tend to still prefer IE (though the percentage is much less these days). I still avoid Safari like the plague (though I do sometimes need to troubleshoot and I suppose I do use Mobile Safari on my phone). I think I do need to give Chrome a closer look, as it’s definitely more attractive these days…

Well, there you have it. I do wonder if I’ll ever get over my stubborn love for Opera, a browser that almost no one but me uses. They really do manage to keep up with the times, and have even somewhat recently allowed Firefox and Chrome style extensions, though I think it’s a little too late for them. FF and Chrome just have a more robust community surrounding their development than Opera. I feel like it’s a browser fated to die at some point, but I’ll probably continue to use it until it does… So what browser do you use?

More Disgruntled, Freakish Reflections on ebooks and Readers

While I have some pet peeves with the Kindle, I’ve mostly found it to be a good experience. That being said, there are some things I’d love to see in the future. These aren’t really complaints, as some of this stuff isn’t yet available, but there are a few opportunities afforded by the electronic nature of eBooks that would make the whole process better.

  • The Display – The electronic ink display that the basic Kindles use is fantastic… for reading text. Once you get beyond simple text, things are a little less fantastic. Things like diagrams, artwork, and photography aren’t well represented in e-ink, and even in color readers (like the iPad or Kindle Fire), there are issues with resolution and formatting that often show up in eBooks. Much of this comes down to technology and cost, both of which are improving quickly. Once stuff like IMOD displays start to deliver on their promise (low power consumption, full color, readable in sunlight, easy on the eyes, capable of supporting video, etc…), we should see a new breed of reader.

    I’m not entirely sure how well this type of display will work, at least initially. For instance, how will it compare to the iPad 3’s display? What’s the resolution like? How much will it cost? And so on. Current implementations aren’t full color, and I suspect that future iterations will go through a phase where the tech isn’t quite there yet… but I think it will be good enough to move forward. I think Amazon will most certainly jump on this technology when it becomes feasible (both from a technical and cost perspective). I’m not sure if Apple would switch though. I feel like they’d want a much more robust and established display before they committed.

  • General Metrics and Metadata – While everyone would appreciate improvements in device displays, I’m not sure how important this would be. Maybe it’s just me, but I’d love to see a lot more in the way of metadata and flexibility, both about the book and about device usage. With respect to the book itself, this gets to the whole page number issue I was whinging about in my previous post, but it’s more than that. I’d love to see a statistical analysis of what I’m reading, on both individual and collective levels.

    I’m not entirely sure what this looks like, but it doesn’t need to be rocket science. Simple Flesch-Kincaid grades seem like an easy enough place to start, and they would be pretty simple to implement. Calculating such things for my entire library (or a subset of my library), or ranking my library by grade (or similar sorting methods) would be interesting. I don’t know that this would provide a huge amount of value, but I would personally find it very illuminating and fun to play around with. Individual works wouldn’t even require any processing power on the reader; the grade could be part of the download. Doing calculations across your collective library might be a little more complicated, but even that could probably be done in the cloud.

    Other metadata would also be interesting to view. For example, Goodreads will graph your recently read books by year of publication – a lot of analysis could be done along those lines about your collection (or a sub-grouping of it). Groupings by decade, genre, or reading level would all be interesting to see.

  • Personal Metrics and Metadata – Basically, I’d like to have a way to track my reading speed. For whatever reason, this is something I’m always trying to figure out for myself. I’ve never gone through the process of actually recording my reading habits and speeds because it would be tedious and manual and maybe not even all that accurate. But now that I’m reading books in an electronic format, there’s no reason why the reader couldn’t keep track of what I’m reading, when I’m reading, and how fast I’m reading. My anecdotal experience suggests that I read anywhere from 20-50 pages an hour, depending mostly on the book. As mentioned in the previous post, a lot of this has to do with the arbitrary nature of page numbers, so perhaps standardizing to a better metric (words per minute or something like that) would normalize my reading speed.

    Knowing my reading speed and graphing changes over time could be illuminating. I’ve played around a bit with speed reading software, and the results are interesting, but not dramatic. In any case, one thing that would be really interesting to know when reading a book would be how much time you have left before you finish. Instead of having 200 pages, maybe you have 8 hours of reading time left.

    Combining my personal data with the general data could also yield some interesting results. Maybe I read trashy SF written before 1970 much faster than more contemporary literary fiction. Maybe I read long books faster than short books. There are a lot of possibilities here.

    There are a few catches to this whole personal metrics thing though. You’d need a way to account for breaks and interruptions. I might spend three hours reading tonight, but I’m sure I’ll take a break to get a glass of water or answer a phone call, etc… There’s not really an easy way around this, though there could be mitigating factors like when the reader goes to sleep mode or something like that. Another problem is that one device can be used by multiple people, which would require some sort of profile system. That might be fine, but it also adds a layer of complexity to the interface that I’m sure most companies would like to avoid. The biggest and most concerning potential issue is that of privacy. I’d love to see this information about myself, but would I want Amazon to have access to it? On the other hand, being able to aggregate data from all Kindles might prove interesting in its own right. Things like average reading speed, number of books read in a year, and so on. All interesting and useful info.

    This would require an openness and flexibility that Amazon has not yet demonstrated. It’s encouraging that the Kindle Fire runs a flavor of Android (an open source OS), but on the other hand, it’s a forked version that I’m sure isn’t as free (as in speech) as I’d like (and from what I know, the Fire is partially limited by its hardware). Expecting comprehensive privacy controls from Amazon seems naive.

    I’d like to think that these metrics would be desirable to a large audience of readers, but I really have no idea what the mass market appeal would be. It’s something I’d actually like to see in a lot of other places too. Video games, for instance, provide a lot of opportunity for statistics, and some games provide a huge amount of data on your gaming habits (be it online or in a single player mode). Heck, half the fun of sports games (or sports in general) is tracking the progress of your players (particularly prospects). Other games are baffling in how little depth they provide. People should be playing meta-games like Fantasy Baseball, but with MLB The Show providing the data instead of real life.

  • The Gamification of Reading – Much of the above wanking about metrics could probably be summarized as a way to make reading a game. The metrics mentioned above readily lend themselves to point scores, social-app-like badges, and leaderboards. I don’t know that this would necessarily be a good thing, but it could make for an intriguing system. There’s an interesting psychology at work in systems like this, and I’d be curious to see if someone like Amazon could make reading more addictive. Assuming most people don’t try to abuse the system (though there will always be a cohort that will attempt to exploit stuff like this), it could ultimately lead to beneficial effects for individuals who “play” the game competitively with their friends. Again, this isn’t necessarily a good thing. Perhaps the gamification of reading will lead to a sacrifice of comprehension in the name of speed, or other unintended side effects. Still, it would be nice to see the “gamification of everything” used for something other than a way for companies to trick customers into buying their products.
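
The readability stats mentioned above really are simple to compute. Here’s a minimal sketch in Python of the standard Flesch-Kincaid grade-level formula, using a naive vowel-group syllable counter (real implementations use pronunciation dictionaries or smarter heuristics):

```python
import re

def count_syllables(word):
    # Naive heuristic: count groups of consecutive vowels (min 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

sample = "The cat sat on the mat. It was a sunny day."
print(round(flesch_kincaid_grade(sample), 2))  # → -0.57 (i.e., very easy text)
```

Since the formula only needs word, sentence, and syllable counts, a store really could compute this server-side and ship the grade along with the book, exactly as suggested above.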
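
The “hours of reading time left” idea above is likewise just arithmetic once the device tracks a reading speed. A hypothetical sketch (the word counts and speed are invented for illustration):

```python
def time_remaining(words_left, words_per_minute):
    """Estimate reading time left, given a measured reading speed."""
    minutes = words_left / words_per_minute
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins}m"

# A 90,000-word novel, two-thirds unread, at a measured 250 wpm:
print(time_remaining(60000, 250))  # → 4h 0m
```

A word-based metric like this sidesteps the arbitrary page-number problem entirely, since words per minute means the same thing regardless of font size or formatting.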

As previously mentioned, the need for improved displays is a given (and not just for ereaders). But assuming these nutty metrics (and the gamification of reading) are an appealing concept, I’d like to think that it would provide an opening for someone to challenge Amazon in the market. An open, flexible device using a non-DRMed format and tied to a common store would be very nice. Throw in some game elements, add a great display, and you’ve got something close to my ideal reader. Unfortunately, it doesn’t seem like we’re all that close just yet. Maybe in 5-10 years? Seems possible, but it’s probably more likely that Amazon will continue its dominance.

Zemanta

Last week, I looked at commonplace books and various implementation solutions. Ideally, I wanted something open and flexible that would also provide some degree of analysis in addition to the simple data aggregation most tools provide. I wanted something that would take into account a wide variety of sources in addition to my own writing (on this blog, for instance). Most tools provide a search capability of some kind, but I was hoping for something more advanced. Something that would make connections between data, or find similarities with something I’m currently writing.

At first glance, Zemanta seemed like a promising candidate. It’s a “content suggestion engine” specifically built for blogging and it comes pre-installed on a lot of blogging software (including Movable Type). I just had to activate it, which was pretty simple. Theoretically, it continually scans a post in progress (like this one) and provides content recommendations, ranging from simple text links defining key concepts (i.e. links to Wikipedia, IMDB, Amazon, etc…), to imagery (much of which seems to be integrated with Flickr and Wikipedia), to recommended blog posts from other folks’ blogs. One of the things I thought was really neat was that I could input my own blogs, which would then give me more personalized recommendations.

Unfortunately, results so far have been mixed. There are some things I really like about Zemanta, but it’s pretty clearly not the solution I’m looking for. Some assorted thoughts:

  • Zemanta will only work when using the WYSIWYG Rich Text editor, which turns out to be a huge pain in the arse.  I’m sure lots of people are probably fine with that, but I’ve been editing my blog posts in straight HTML for far too long. I suppose this is more of a hangup on my end than a problem with Zemanta, but it’s definitely something I find annoying.  When I write a post in WYSIWYG format, I invariably switch it back to no formatting and jump through a bunch of hoops getting the post to look like what I want.
  • The recommended posts haven’t been very useful so far. Some of the external choices are interesting, but so far, nothing has really helped me in writing my posts. I was really hoping that loading my blog into Zemanta would add a lot of value, but it turns out that Zemanta only scanned my recent posts, and it recommended most of them, which doesn’t help much. I know what I’ve written recently; what I was hoping for was that Zemanta would point out some post I wrote in 2005 along similar lines. (In my previous post on Taxonomy Platforms, I specifically referenced the titles of some of my old blog posts, but since they were old, Zemanta didn’t find or recommend them. Even more annoying, when writing this post, the Taxonomy Platforms post wasn’t one of the recommended articles despite my specifically mentioning it. Update: It has it now, but it didn’t seem to appear until after I’d already gone through the trouble of linking it…) It appears that Zemanta is basing all of this on my RSS feed, which makes sense, but I wish there were a way to upload my full archives, as that might make the tool a little more powerful…
  • The recommendations seem to be based on a relatively simplistic algorithm. A good search engine will index data and learn associations between individual words by tracking their frequency and how close they are to other words.  Zemanta doesn’t seem to do that.  In my previous post, I referenced famous beer author Michael Jackson. What did Zemanta recommend?  Lots of pictures and articles about the musician, nothing about the beer journalist. I don’t know if I’m expecting too much out of the system, but it would be nice if the software would pick up on the fact that this guy’s name was showing up near lots of beer talk, with nary a reference to music. It’s probably too much to hope that my specifically calling out that I was talking about “the beer critic, not the pop star” would influence the system (and indeed, my reference to “pop star” may have influenced the recommendations, despite the fact that I was trying to negate that).
  • The “In-Text Links”, on the other hand, seem to come in quite handy. I actually leveraged several of them in my past few posts, and they were very easy to use. Indeed, I particularly appreciated their integration with Amazon, where I could enter my associates ID, and the links that were inserted were automatically generated with my ID. This is normally a pretty intensive process involving multiple steps that has been simplified down to the press of a button.  Very well done, and most of the suggestions there were very relevant.
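
The disambiguation I’m wishing for above isn’t exotic, either: even a crude co-occurrence score over nearby words can separate the beer critic from the pop star. A toy sketch in Python (the sense names and context-word lists are invented purely for illustration):

```python
# Toy word-sense scoring: pick the sense whose known context words
# overlap most with the words surrounding the ambiguous name.
SENSES = {
    "beer critic": {"beer", "ale", "stout", "brewery", "hops", "critic"},
    "pop star": {"music", "album", "thriller", "dance", "singer"},
}

def best_sense(context):
    words = set(context.lower().split())
    scores = {sense: len(words & cues) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)

print(best_sense("Michael Jackson wrote wonderfully about beer and ale styles"))
# → beer critic
```

A real engine would learn those context sets automatically from word co-occurrence statistics rather than hand-coding them, but the principle (score a name’s neighbors against each candidate sense) is the same one I’m complaining Zemanta seems to lack.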

I will probably continue to play with Zemanta, but I suspect it won’t last much longer in my rotation. It provides some value, but it’s ultimately not as convenient as I’d like, and its analysis and recommendation functions aren’t as useful as I’d hoped.

I’ve also been playing around with Evernote more and more, and I feel like that could be a useful tool, despite the fact that it doesn’t really offer any sort of analysis (though it does have a simple search function). There’s at least one third party, though, that seems to be positioning itself as an analysis tool that will integrate with Evernote.  That tool is called Topicmarks.  Unfortunately, I seem to be having some issues integrating my Evernote data with that service. At this rate, I don’t know that I’ll find a great tool for what I want, but it’s an interesting subject, and I’m guessing it will become more and more important as time goes on. We’re living in the Information Age; it seems only fair that our aggregation and analysis tools get more sophisticated.

Enhanced by Zemanta

Commonplacing

During the Enlightenment, most intellectuals kept what’s called a Commonplace Book. Basically, folks like John Locke or Mark Twain would curate transcriptions of interesting quotes from their readings. It was a personalized record of interesting ideas that the author encountered. When I first heard about the concept, I immediately started thinking of how I could implement one… which is when I realized that I’ve actually been keeping one, more or less, for the past decade or so on this blog. It’s not very organized, though, and it’s something that’s been banging around in my head for the better part of the last year or so.

Locke was a big fan of Commonplace Books, and he spent years developing an intricate system for indexing his books’ content. It was, of course, a ridiculous and painstaking process, but it worked. Fortunately for us, this is exactly the sort of thing that computer systems excel at, right? The reason I’m writing this post is a small confluence of events that has led me to consider creating a more formal Commonplace Book. Despite my earlier musing on the subject, this blog doesn’t really count. It’s not really organized correctly, and I don’t publish all the interesting quotes that I find. Even if I did, it’s not really in a format that would do me much good. So I’d need to devise another plan.

Why do I need a plan at all? What’s the benefit of a commonplace book? Well, I’ve been reading Steven Johnson’s book Where Good Ideas Come From: The Natural History of Innovation and he mentions how he uses a computerized version of the commonplace book:

For more than a decade now, I have been curating a private digital archive of quotes that I’ve found intriguing, my twenty-first century version of the commonplace book. … I keep all these quotes in a database using a program called DEVONthink, where I also store my own writing: chapters, essays, blog posts, notes. By combining my own words with passages from other sources, the collection becomes something more than just a file storage system. It becomes a digital extension of my imperfect memory, an archive of all my old ideas, and the ideas that have influenced me.

This DEVONthink software certainly sounds useful. It’s apparently got this fancy AI that will generate semantic connections between quotes and what you’re writing. It’s advanced enough that many of those connections seem to be subtle and “lyrical”, finding connections you didn’t know you were looking for. It sounds perfect except for the fact that it only runs on Mac OSX. Drats. It’s worth keeping in mind in case I ever do make the transition from PC to Mac, but it seems like lunacy to do so just to use this application (which, for all I know, will be useless to me).

By sheer happenstance, I’ve also been playing around with Pinterest lately, and it occurs to me that it’s a sort of commonplace book, albeit one with more of a narrow focus on images and video (and recipes?) than quotes. There are actually quite a few sites like that. I’ve been curating a large selection of links on Delicious for years now (1600+ links on my account). Steven Johnson himself has recently contributed to a new web startup called Findings, which is primarily concerned with book quotes. All of this seems rather limiting, and quite frankly, I don’t want to be using 7 completely different tools to do the same thing, but for different types of media.

I also took a look at Tumblr again, this time evaluating it from a commonplacing perspective. There are some really nice things about the interface and the ease with which you can curate your collection of media. The problem, though, is that their archiving system is even more useless than most blog software. It’s not quite the hell that is Twitter archives, but that’s a pretty low bar. Also, as near as I can tell, the data is locked up on their server, which means that even if I could find some sort of indexing and analysis tool to run through my data, I won’t really be able to do so (Update: apparently Tumblr does have a backup tool, but only for use with OSX. Again!? What is it with you people? This is the internet, right? How hard is it to make this stuff open?)

Evernote shows a lot of promise and probably warrants further examination. It seems to be the go-to alternative for lots of researchers and writers. It’s got a nice cloud implementation with a robust desktop client and the ability to export data as I see fit. I’m not sure if its search will be as sophisticated as what I ultimately want, but it could be an interesting tool.

Ultimately, I’m not sure the tool I’m looking for exists. DEVONthink sounds pretty close, but it’s hard to tell how it will work without actually using the damn thing. The ideal would be a system where you can easily maintain a whole slew of data and metadata, to the point where I could be writing something (say a blog post or a requirements document for my job) and the tool would suggest relevant quotes/posts based on what I’m writing. This would probably be difficult to accomplish in real-time, but a “Find related content” feature would still be pretty awesome. Anyone know of any alternatives?
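
For what it’s worth, the “find related content” feature I keep describing is basically a document similarity problem, and a bare-bones version can be sketched with nothing fancier than bag-of-words cosine similarity. The notes below are invented for illustration; real tools like DEVONthink presumably do something far more sophisticated:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two texts, treated as bags of words."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa)
    norm = (math.sqrt(sum(v * v for v in wa.values()))
            * math.sqrt(sum(v * v for v in wb.values())))
    return dot / norm if norm else 0.0

# A hypothetical commonplace-book archive.
notes = [
    "Locke kept a commonplace book with an elaborate index",
    "Notes on brewing stout and porter at home",
    "Twain curated quotes from his reading in notebooks",
]

draft = "curating interesting quotes from reading"
# Rank stored notes by similarity to the draft in progress.
ranked = sorted(notes, key=lambda n: cosine_similarity(draft, n), reverse=True)
print(ranked[0])
```

Run on the draft sentence, this surfaces the Twain note first, since it shares the most vocabulary. The “lyrical” connections Johnson describes would need semantic analysis beyond raw word overlap, but even this naive approach beats a plain keyword search for “find related content.”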

Enhanced by Zemanta

Update: Zemanta! I completely forgot about this. It comes installed by default with my blogging software, but I had turned it off a while ago because it took forever to load and was never really that useful. It’s basically a content recommendation engine, pulling content from lots of internet sources (notably Wikipedia, Amazon, Flickr and IMDB). It’s also grown considerably in the time since I’d last used it, and it now features a truckload of customization options, including the ability to separate general content recommendations from your own, personally curated sources. So far, I’ve only connected my two blogs to the software, but it would be interesting if I could integrate Zemanta with Evernote, Delicious, etc… I have no idea how great the recommendations will be (or how far back it will look on my blogs), but this could be exactly what I was looking for. Even if integration with other services isn’t working, I could probably create myself another blog just for quotes, and then use that blog with Zemanta. I’ll have to play around with this some more, but I’m intrigued by the possibilities.

SOPA Blues

I was going to write the annual arbitrary movie awards tonight, but since the web has apparently gone on strike, I figured I’d spend a little time talking about that instead. Many sites, including the likes of Wikipedia and Reddit, have instituted a complete blackout as part of a protest against two ill-conceived pieces of censorship legislation currently being considered by the U.S. Congress (these laws are called the Stop Online Piracy Act and Protect Intellectual Property Act, henceforth to be referred to as SOPA and PIPA). I can’t even begin to pretend that blacking out my humble little site would accomplish anything, but since a lot of my personal and professional livelihood depends on the internet, I suppose I can’t ignore this either.

For the uninitiated, if the bills known as SOPA and PIPA become law, many websites could be taken offline involuntarily, without warning, and without due process of law, based on little more than an alleged copyright owner’s unproven and uncontested allegations of infringement1. The reason Wikipedia is blacked out today is that it depends solely on user-contributed content, which makes it a ripe target for overzealous copyright holders. Sites like Google haven’t blacked themselves out, but have staged a bit of a protest as well, because under the provisions of the bills, even just linking to a site that infringes upon copyright is grounds for action (and thus search engines have a vested interest in defeating these bills). You could argue that these bills are well intentioned, and from what I can tell, their original purpose seemed to be more about foreign websites and DNS, but the road to hell is paved with good intentions and, as written, these bills are completely absurd.

Lots of other sites have been registering their feelings on the matter. ArsTechnica has been posting up a storm. Shamus has a good post on the subject which is followed by a lively comment thread. But I think Aziz hits the nail on the head:

Looks like the DNS provisions in SOPA are getting pulled, and the House is delaying action on the bill until February, so it’s gratifying to see that the activism had an effect. However, that activism would have been put to better use to educate people about why DRM is harmful, why piracy should be fought not with law but with smarter pro-consumer marketing by content owners (lowered prices, more options for digital distribution, removal of DRM, fair use, and ubiquitous time-shifting). Look at the ridiculous limitations on Hulu Plus – even if you’re a paid subscriber, some shows won’t air episodes until the week after, old episodes are not always available, some episodes can only be watched on the computer and are restricted from mobile devices. These are utterly arbitrary limitations on watching content that just drive people into the pirates’ arms.

I may disagree with some of the other things in Aziz’s post, but the above paragraph is important, and for some reason, people aren’t talking about this aspect of the story. Sure, some folks are disputing the numbers, but few are pointing out the things that IP owners could be doing instead of legislation. For my money, the most important thing that IP owners have forgotten is convenience. Aziz points out Hulu, which is one of the worst services I’ve ever seen in terms of being convenient or even just intuitive to customers. I understand that piracy is frustrating for content owners and artists, but this is not the way to fight piracy. It might be disheartening to acknowledge that piracy will always exist, but it probably will, so we’re going to have to figure out a way to deal with it. The one thing we’ve seen work is convenience. Despite the fact that iTunes had DRM, it was loose enough and convenient enough that it became a massive success (it now doesn’t have DRM, which is even better). People want to spend money on this stuff, but more often than not, content owners are making it harder on the paying customer than on the pirate. SOPA/PIPA is just the latest example of this sort of thing.

I’ve already written about my thoughts on Intellectual Property, Copyright and DRM, so I encourage you to check that out. And if you’re so inclined, you can find out what senators and representatives are supporting these bills, and throw them out in November (or in a few years, if need be). I also try to support companies or individuals that put out DRM-free content (for example, Louis CK’s latest concert video has been made available, DRM free, and has apparently been a success).

Intellectual Property and Copyright is a big subject, and I have to be honest in that I don’t have all the answers. But the way it works right now just doesn’t seem right. A copyrighted work released just before I was born (i.e. Star Wars) probably won’t enter the public domain until after I’m dead (I’m generally an optimistic guy, so I won’t complain if I do make it to 2072, but still). Both protection and expiration are important parts of the way copyright works in the U.S. It’s a balancing act, to be sure, but I think the pendulum has swung too far in one direction. Maybe it’s time we swing it back. Now if you’ll excuse me, I’m going to participate in a different kind of blackout to protest SOPA.

1 – Thanks to James for the concise description. There are lots of much longer and better-sourced descriptions of the shortcomings of this bill and the issues surrounding it, so I won’t belabor the point here.