Science & Technology

The Public Domain

I got curious about the Public Domain recently and was surprised by what I found. On the first day of each year, Public Domain Day celebrates the moment when copyrights expire, enter the Public Domain, and join their brethren, such as the plays of Shakespeare, the music of Mozart, and the books of Dickens. Once in the Public Domain, a work can be freely copied, remixed, translated into other languages, and adapted into stage plays, movies, or other media, free from restrictions. Because they are free to use, they can live on in perpetuity.

Of course, rights are based on jurisdiction, so not all countries will benefit equally every year. In 2015, our neighbors up north in Canada celebrated the entrance of the writings of Rachel Carson, Ian Fleming, and Flannery O’Connor into the Public Domain (along with hundreds of others). I’d be curious how a James Bond movie made in Canada would fare here in the U.S., as they now have the right to make such a movie. Speaking of the U.S., how many works do you think entered our Public Domain this year?

Not a single published work will enter the Public Domain this year. Next year? Nope! In fact, no published work will enter the Public Domain until 2019 – assuming Congress does not, once again, extend the copyright term even longer than it is now (currently the author’s lifetime plus 70 years), which is how we ended up in this situation in the first place.
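The arithmetic behind that 2019 date is simple enough to sketch. As I understand the current US rules (this is a rough sketch, not legal advice), works published between 1923 and 1977 get 95 years of protection from publication, and a work enters the Public Domain on the January 1 after its copyright expires – which is why the drought finally ends in 2019:

```python
# Sketch of US Public Domain entry for works published 1923-1977,
# assuming the 95-year term from the 1998 extension. The work enters
# the Public Domain on January 1 of the year after the term expires.

def us_public_domain_year(publication_year, term_years=95):
    """Year a 1923-1977 US publication enters the Public Domain."""
    return publication_year + term_years + 1

print(us_public_domain_year(1923))  # 2019: the first post-freeze batch
```

Run it for later years and the backlog just marches forward one year at a time – unless, of course, the term gets extended again.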

I’ve harped on this sort of thing before, so I won’t belabor the point. I was just surprised that the Public Domain was so dead in the United States. Even works that gained notoriety for being accidentally let into the Public Domain, like It’s a Wonderful Life, are being clamped down on. Ironically, It’s a Wonderful Life only became famous once it was in the Public Domain and thus free to televise (frequent airings led to popularity). In the 1990s, the original copyright holder seized on some obscure court precedents and reasserted their rights based on the original musical score and the short story on which the film was based. The details are murky, but the result is crystal clear: the film is rarely aired on TV anymore because NBC claims exclusive rights (and airs it only a couple times a year), and derivative works, like a planned sequel, are continually blocked.

I don’t know of a solution, but I did want to reflect on what the year could have brought us. There goes my plans for a Vertigo remake!

The Myth of Digital Distribution

The movie lover’s dream service would be something we could subscribe to that would give us a comprehensive selection of movies to stream. This service is easy to conceive, and it’s such an alluring idea that it makes people want to eschew tried-and-true distribution methods like DVDs and Blu-Ray. We’ve all heard the arguments before: physical media is dead, streaming is the future. When I made the move to Blu-Ray about 6 years ago, I estimated that it would take at least 10 years for a comprehensive streaming service to become feasible. The more I see, the more I think that I drastically underestimated that timeline… and am beginning to feel like it might never happen at all.

MGK illustrates the problem well with this example:

this is the point where someone says “but we’re all going digital instead” and I get irritated by this because digital is hardly an answer. First off, renting films – and when you “buy” digital movies, that’s what you’re doing almost every single time – is not the same as buying them. Second, digital delivery is getting more and more sporadic as rights get more and more expensive for distributors to purchase.

As an example, take Wimbledon, a charming little 2004 sports film/romcom starring Paul Bettany and Kirsten Dunst. I am not saying Wimbledon is an unsung treasure or anything; it’s a lesser offering from the Working Title factory that cranks out chipper British romcoms, a solid B-grade movie: well-written with a few flashes of inspiration, good performances all around (including a younger Nikolaj Coster-Waldau before he became the Kingslayer) and mostly funny, although Jon Favreau’s character is just annoying. But it’s fun, and it’s less than a decade old. It should be relatively easy to catch digitally, right? But no. It’s not anywhere. And there are tons of Wimbledons out there.

Situations like this are an all too common occurrence, and not just with movies. It turns out that content owners can’t be bothered with a title unless it’s either new or in the public domain. This graph from a Rebecca Rosen article nicely illustrates the black hole that our extended copyright regime creates:

Books available by decade

Rosen explains:

[The graph] reveals, shockingly, that there are substantially more new editions available of books from the 1910s than from the 2000s. Editions of books that fall under copyright are available in about the same quantities as those from the first half of the 19th century. Publishers are simply not publishing copyrighted titles unless they are very recent.

The books that are the worst affected by this are those from pretty recent decades, such as the 80s and 90s, for which there is presumably the largest gap between what would satisfy some abstract notion of people’s interest and what is actually available.

More interpretation:

This is not a gently sloping downward curve! Publishers seem unwilling to sell their books on Amazon for more than a few years after their initial publication. The data suggest that publishing business models make books disappear fairly shortly after their publication and long before they are scheduled to fall into the public domain. Copyright law then deters their reappearance as long as they are owned. On the left side of the graph before 1920, the decline presents a more gentle time-sensitive downward sloping curve.

This is absolutely absurd, though it’s worth noting that the graph doesn’t control for used books (which are generally pretty easy to find on Amazon). And while content owners don’t seem to be rushing to digitize their back catalogs, future generations theoretically won’t face the same gap we’re seeing now. I suspect the 80s and 90s will remain a trouble spot, but anything published today gets put on digital/streaming services, so stuff from 2010 onward should theoretically be available on an indefinite basis.

Of course, intellectual property law being what it is, I’m sure that new proprietary formats and readers will render old digital copies obsolete, and once again, consumers will be hard pressed to see that 15 year old movie or book ported to the latest-and-greatest channel. It’s a weird and ironic state of affairs when the content owners are so greedy in hoarding and protecting their works, yet so unwilling to actually, you know, profit from them.

I don’t know what the solution is here. There have been some interesting ideas about having copyright expire for books that have been out of print for a certain period of time (say, 5-10 years), but that would only work now – again, future generations will theoretically have those digital versions available. They may be in a near obsolete format, but they’re available! Sensible copyright reform doesn’t seem likely to pass, and while it would be nice to take a page from the open source playbook, I seriously doubt that content owners would ever be that forward thinking.

As MGK noted, DVD ushered in an era of amazing availability, but much of that stuff has gone out of print, and we somehow appear to be regressing from that.

Kindle Updates

I have, for the most part, been very pleased with using my Kindle Touch to read over the past couple years. However, while it got the job done, I felt like there were a lot of missed opportunities, especially when it came to metadata and personal metrics.

Well, Amazon just released a new update to their Kindle software, and mixed in with the usual (i.e. boring) updates to features I don’t use (like “Whispersync” or Parental Controls), there was this little gem:

The Time To Read feature uses your reading speed to let you know how much time is left before you finish your chapter or before you finish your book. Your specific reading speed is stored only on your Kindle Touch; it is not stored on Amazon servers.

Hot damn, that’s exactly what I was asking for! Of course, it’s all locked down and you can’t really see what your reading speed is (or plot it over time, or by book, etc…), but this is the single most useful update to a device like this that I think I’ve ever encountered. Indeed, the fact that it tells you how much time until you finish both your chapter and the entire book is extremely useful, and it addresses my initial curmudgeonly complaints about the Kindle’s hatred of page numbers and love of percentage.

Time to Read in Action

Will finish this book in about 4 hours!

Measuring book length by time sidesteps the variability problem by giving you a personalized measurement that is relevant and intuitive. No more futzing with the wild variability in page numbers or Amazon’s bizarre location system; you can just peek at the remaining time, and it’s all good.

And I love that they give a time to read for both the current chapter and the entire book. One of the frustrating things about reading an ebook is that you never really know how long it will take to read a chapter. With a physical book, you can easily flip ahead and see where the chapter ends. Now, ebooks have that personalized time, which is perfect.
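Amazon hasn’t published how Time To Read actually works, but the core arithmetic is presumably trivial: track your speed locally, divide what’s left by it. A hypothetical sketch (the word counts and speed here are made-up illustration values, not anything the Kindle exposes):

```python
# Hypothetical sketch of a "Time To Read" estimate: words remaining
# divided by a locally tracked reading speed. Chapter vs. whole-book
# estimates just change the words_remaining input.

def time_to_read(words_remaining, words_per_minute):
    """Return estimated remaining reading time in minutes."""
    if words_per_minute <= 0:
        raise ValueError("reading speed must be positive")
    return words_remaining / words_per_minute

# e.g. 60,000 words left at 250 words per minute:
minutes = time_to_read(60_000, 250)
print(f"About {minutes / 60:.1f} hours left")  # About 4.0 hours left
```

The clever part isn’t the division – it’s quietly measuring your actual words-per-minute in the background, which is exactly the data I wish they’d let us see.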

I haven’t spent a lot of time with this new feature, but so far, I love it. I haven’t done any formal tracking, but it seems accurate, too (it seems like I’m reading faster than it says, but it’s close). It even seems to recognize when you’ve taken a break (though I’m not exactly sure of that). Of course, I would love it if Amazon would allow us access to the actual reading speed data in some way. I mean, I can appreciate their commitment to privacy, and I don’t think that needs to change either; I’d just like to be able to see some reports on my actual reading speed. Plot it over time, see how different books impact speed, and so on. Maybe I’m just a data visualization nerd, but think of the graphs! I love this update, but they’re still only scratching the surface here. There’s a lot more there for the taking. Let’s hope we’re on our way…

The State of Streaming

So Netflix has had a good first quarter, exceeding expectations and crossing the $1 Billion revenue threshold. Stock prices have been skyrocketing, going from sub 100 to over 200 in just the past 4-5 months. Their subscriber base continues to grow, and fears that people would use the free trial to stream exclusive content like House of Cards, then bolt from the service seem unfounded. However, we’re starting to see a fundamental shift in the way Netflix is doing business here. For the first time ever, I’m seeing statements like this:

As we continue to focus on exclusive and curated content, our willingness to pay for non-exclusive, bulk content deals declines.

I don’t like the sound of that, but then, the cost of non-exclusive content seems to keep rising at an absurd rate, and well, you know, it’s not exclusive. The costs have risen to somewhere on the order of $2 billion per year on content licensing and original shows. So statements like this seem like a natural outgrowth of that cost:

As we’ve gained experience, we’ve realized that the 20th documentary about the financial crisis will mostly just take away viewing from the other 19 such docs, and instead of trying to have everything, we should strive to have the best in each category. As such, we are actively curating our service rather than carrying as many titles as we can.

And:

We don’t and can’t compete on breadth with Comcast, Sky, Amazon, Apple, Microsoft, Sony, or Google. For us to be hugely successful we have to be a focused passion brand. Starbucks, not 7-Eleven. Southwest, not United. HBO, not Dish.

This all makes perfect sense from a business perspective, but as a consumer, this sucks. I don’t want to have to subscribe to 8 different services to watch 8 different shows that seem interesting to me. Netflix’s statements and priorities seem to be moving, for the first time, away from a goal of providing a streaming service with a wide, almost comprehensive selection of movies and television. Instead, we’re getting a more curated approach coupled with original content. That wouldn’t be the worst thing ever, but Netflix isn’t the only one playing this game. Amazon just released 14 pilot episodes for their own exclusive content. I’m guessing it’s only a matter of time before Hulu joins this roundelay (and for all I know, they’re already there – I’ve just hated every experience I’ve had with Hulu so much that I don’t really care to look into it). HBO is already doing its thing with HBO Go, which exclusively streams their shows. How many other streaming services will I have to subscribe to if I want to watch TV (or movies) in the future? Like it or not, fragmentation is coming. And no one seems to be working on a comprehensive solution anymore (at least, not in a monthly subscription model – Amazon and iTunes have pretty good a la carte options). This is frustrating, and I feel like there’s a big market for this thing, but at the same time, content owners seem to be overcharging for their content. If Netflix’s crappy selection costs $2 billion a year, imagine what something even remotely comprehensive would cost (easily 5-10 times that amount, which is clearly not feasible).

Incidentally, Netflix’s third exclusive series, Hemlock Grove, premiered this past weekend. I tried to watch the first episode, but I fell asleep. What I remember was pretty schlocky and not particularly inspiring… but I have a soft spot for cheesy stuff like this, so I’ll give it another chance. Still, the response seems a bit mixed on this one. I did really end up enjoying House of Cards, but I’m not sure how much I’m going to stick with Hemlock Grove.

What’s in a Book Length?

I mentioned recently that book length is something that’s been bugging me. It seems that we have a somewhat elastic relationship with length when it comes to books. The traditional indicator of book length is, of course, page number… but due to variability in font size, type, spacing, format, media, and margins, the hallowed page number may not be as concrete as we’d like. Ebooks theoretically provide an easier way to maintain a consistent measurement across different books, but it doesn’t look like anyone’s delivered on that promise. So how are we to know the lengths of our books? Fair warning, this post is about to get pretty darn nerdy, so read on at your own peril.

In terms of page numbers, books can vary wildly. Two books with the same number of pages might be very different in terms of actual length. Let’s take two examples: Gravity’s Rainbow (784 pages) and Harry Potter and the Goblet of Fire (752 pages). Looking at page number alone, you’d say that Gravity’s Rainbow is only slightly longer than Goblet of Fire. With the help of the magical internets, let’s take a closer look at the print inside the books (click image for a bigger version):

Pages from Gravitys Rainbow and Harry Potter and the Goblet of Fire

As you can see, there is much more text on the page in Gravity’s Rainbow. Harry Potter has a smaller canvas to start with (at least, in terms of height), but larger margins, more line spacing, and I think even a slightly larger font. I don’t believe it would be an exaggeration to say that when you take all this into account, the Harry Potter book is probably less than half the length of Gravity’s Rainbow. I’d estimate it somewhere on the order of 300-350 pages. And that’s even before we get into things like vocabulary and paragraph breaks (which I assume would also serve to inflate Harry Potter’s length.) Now, this is an extreme example, but it illustrates the variability of page numbers.

Ebooks present a potential solution. Because ereaders have different sized screens and even allow the reader to choose font sizes and other display options, page numbers start to seem irrelevant. So ebook makers devised what are called reflowable documents, which adapt their presentation to the output device. For example, Amazon’s Kindle uses an ebook format that is reflowable. It does not (usually) feature page numbers, instead relying on a percentage indicator and the mysterious “Location” number.

The Location number is meant to be consistent, no matter what formatting options you’re using on your ereader of choice. Sounds great, right? Well, the problem is that the Location number is pretty much just as arbitrary as page numbers. It is, of course, more granular than a page number, so you can easily skip to the exact location on multiple devices, but as for what actually constitutes a single “Location Number”, that is a little more tricky.

In looking around the internets, it seems there is distressingly little information about what constitutes an actual Location. According to this thread on Amazon, someone claims that: “Each location is 128 bytes of data, including formatting and metadata.” This rings true to me, but unfortunately, it also means that the Location number is pretty much meaningless.
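If that forum claim is right, estimating a book’s Location count from its file size would be a one-liner – which also shows exactly why it breaks down, since formatting, metadata, and embedded images all inflate the byte count without adding readable text. A sketch, assuming the unverified 128-bytes-per-Location figure:

```python
# Naive Location estimate from file size, assuming the unverified
# claim that 1 Location = 128 bytes of source data. Formatting,
# metadata, and images make this wildly optimistic for real books.

def estimated_locations(file_size_kb):
    """Estimated total Locations for an ebook of the given size in KB."""
    return (file_size_kb * 1024) // 128

print(estimated_locations(512))  # a 512 KB ebook -> 4096 Locations
```

For a plain-text-only ebook this might land in the right ballpark; for anything with illustrations, the estimate drifts badly, which matches what I saw in my spot check below.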

The elastic relationship we have with book length is something I’ve always found interesting, but what made me want to write this post was when I wanted to pick a short book to read in early December. I was trying to make my 50 book reading goal, so I wanted something short. In looking through my book queue, I saw Alfred Bester’s classic SF novel The Stars My Destination. It’s one of those books I consistently see at the top of best SF lists, so it’s always been on my radar, and looking at Amazon, I saw that it was only 236 pages long. Score! So I bought the ebook version and fired up my Kindle only to find that in terms of locations, it’s the longest book I have on my Kindle (as of right now, I have 48 books on there). This is when I started looking around at Locations and trying to figure out what they meant. As it turns out, while the Location numbers provide a consistent reference within the book, they’re not at all consistent across books.

I did a quick spot check of 6 books on my Kindle, looking at total Location numbers, total page numbers (resorting to print version when not estimated by Amazon), and file size of the ebook (in KB). I also added a column for Locations per page number and Locations per KB. This is an admittedly small sample, but what I found is that there is little consistency among any of the numbers. The notion of each Location being 128 bytes of data seems useful at first, especially when you consider that the KB information is readily available, but because that includes formatting and metadata, it’s essentially meaningless. And the KB number also includes any media embedded in the book (i.e. illustrations crank up the KB, which distorts any calculations you might want to do with that data).

It turns out that The Stars My Destination will probably end up being relatively short, as the page numbers would imply. There’s a fair amount of formatting within the book (which, by the way, doesn’t look so hot on the Kindle), and doing spot checks of how many Locations I pass when cycling to the next screen, it appears that this particular ebook is going at a rate of about 12 Locations per cycle, while my previous book was going at a rate of around 5 or 6 per cycle. In other words, while the total Locations for The Stars My Destination were nearly twice what they were for my previously read book, I’m also cycling through Locations at double the rate. Meaning that, basically, this is the same length as my previous book.

Various attempts have been made to convert Location numbers to page numbers, with low degrees of success. This is due to the generally elastic nature of a page, combined with the inconsistent size of Locations. For most books, it seems like dividing the Location numbers by anywhere from 12-16 (the linked post posits dividing by 16.69, but the books I checked mostly ranged from 12-16) will get you a somewhat accurate page number count that is marginally consistent with print editions. Of course, for The Stars My Destination, that won’t work at all. For that book, I have to divide by 40.86 to get close to the page number.
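The divide-by-N heuristic is easy to express in code; the catch, as noted above, is that N itself varies per book. A sketch using the linked post’s 16.69 divisor as a default (the 5,000-Location example book is hypothetical):

```python
# Rough Location-to-page conversion using the divide-by-N heuristic.
# N is roughly 12-16 for most books I checked (the linked post says
# 16.69), but outliers like The Stars My Destination need ~41.

def locations_to_pages(total_locations, locations_per_page=16.69):
    """Approximate print page count from a Kindle Location total."""
    return round(total_locations / locations_per_page)

# Hypothetical example: a book spanning 5,000 Locations
print(locations_to_pages(5000))  # 300 pages, give or take
```

Which is to say: the conversion works fine once you already know the right divisor for a given book, and the only way to know that is to have the print edition handy – defeating the purpose.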

Why is this important at all? Well, there’s clearly an issue with ebooks in academia, because citations are so important for that sort of work. Citing a location won’t get readers of a paper anywhere close to a page number in a print edition (whereas, even using differing editions, you can usually track down the quote relatively easily if a page number is referenced). On a personal level, I enjoy reading ebooks, but one of the things I miss is the easy and instinctual notion of figuring out how long a book will take to read just by looking at it. Last year, I was shooting for reading quantity, so I wanted to tackle shorter books (this year, I’m trying not to pay attention to length as much and will be tackling a bunch of large, forbidding tomes, but that’s a topic for another post)… but there really wasn’t an easily accessible way to gauge the length. As we’ve discovered, both page numbers and Location numbers are inconsistent. In general, the larger the number, the longer the book, but as we’ve seen, that can be misleading in certain edge cases.

So what is the solution here? Well, we’ve managed to work with variable page numbers for thousands of years, so maybe no solution is really needed. A lot of newer ebooks even contain page numbers (despite the variation in display), so if we can find a way to make that more consistent, that might help make things a little better. But the ultimate solution would be to use something like Word Count. That’s a number that might not be useful in the midst of reading a book, but if you’re really looking to determine the actual length of the book, Word Count appears to be the best available measurement. It would also be quite easily calculated for ebooks. Is it perfect? Probably not, but it’s better than page numbers or location numbers.
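Word count really is trivial to compute from an ebook’s text – even a naive whitespace split gives a rough, format-independent measure of length:

```python
# Minimal word count: a naive whitespace split. Good enough as a
# format-independent length metric, though punctuation handling and
# hyphenation would need more care in a real implementation.

def word_count(text):
    """Count whitespace-separated tokens in the given text."""
    return len(text.split())

print(word_count("Call me Ishmael. Some years ago..."))  # 6
```

An ereader already has the full text in hand, so surfacing this number (or precomputing it server-side and shipping it with the download) would cost essentially nothing.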

In the end, I enjoy using my Kindle to read books, but I wish they’d get on the ball with this sort of stuff. If you’re still reading this (Kudos to you) and want to read some more babbling about ebooks and where I think they should be going, check out my initial thoughts and my ideas for additional metadata and the gamification of reading. The notion of ereaders really does open up a whole new world of possibilities… it’s a shame that Amazon and other ereader companies keep their platforms so locked down and uninteresting. Of course, reading is its own reward, but I really feel like there’s a lot more we can be doing with our ereader software and hardware.

Web browsers I have known, 1996-2012

Jason Kottke recently recapped all of the browsers he used as his default for the past 18 years. It sounded like fun, so I’m going to shamelessly steal the idea and list out my default browsers for the past 16 years (prior to 1996, I was stuck in the dark ages of dialup AOL – but once I went away to college and discovered the joys of T1/T3 connections, my browsing career started in earnest, so that’s when I’m starting this list).

  • 1996 – Netscape Navigator 3 – This was pretty much the uncontested king of browsers at the time, but its reign would be short. I had a copy of IE3 (I think?) on my computer too, but I almost never used it…
  • 1997-1998 – Netscape Communicator 4 – Basically Netscape Navigator 4, but the Communicator was a whole suite of applications which appealed to me at the time. I used it for email and even to start playing with some HTML editing (though I would eventually abandon everything but the browser from this suite). IE4 did come out sometime in this timeframe and I used it occasionally, but I think I stuck with NN4 way longer than I probably should have.
  • 1999-2000 – Internet Explorer 5 – With the release of IE5 and the increasing issues surrounding NN4, I finally jumped ship to Microsoft. I was never particularly comfortable with IE though, and so I was constantly looking for alternatives and trying new things. I believe early builds of Mozilla were available, and I kept downloading the updates in the hopes that it would allow me to dispense with IE, but it was still early in the process for Mozilla. This was also my first exposure to Opera, which at the time wasn’t that remarkable (we’re talking version 3.5 – 4 here) except that, as usual, they were ahead of the curve on tabbed browsing (a mixed blessing, as monitor resolutions at the time weren’t great). Opera was also something you had to pay for at the time, and a lot of sites didn’t work in Opera. This would all change at the end of 2000, though, with the release of Opera 5.
  • 2001 – Opera 5 – This browser changed everything for me. It was the first “free” Opera browser available, although the free version was ad-supported (quite annoying, but it was easy enough to get rid of the ads). The thing that was revolutionary about this browser, though, was mouse gestures. It was such a useful feature, and Opera’s implementation was (and quite frankly, still is) the best, smoothest implementation of the functionality I’ve seen. At this point, I was working at a website, so for work, I was still using IE5 and IE6 as my primary browser (because at the time, they represented something like 85-90% of the traffic to our site). I was also still experimenting with the various Mozilla-based browsers at the time as well, but Opera was my default for personal browsing. Of course, no one codes for Opera, so there were plenty of sites that I’d have to fire up IE for (this has always been an issue with Opera).
  • 2002-2006 – Opera 6/7/8/9 – I pretty much kept rolling with Opera during this timeframe. Again, for my professional use, IE6/IE7 was still a must, but in 2004, Firefox 1.0 launched, so that added another variable to the mix. I wasn’t completely won over by the initial Firefox offerings, but it was the first new browser in a long time that I thought had a bright future. It also provided a credible alternative for when Opera crapped out on a weirdly coded page. However, as web standards started to actually be implemented, Opera’s issues became fewer as time went on…
  • 2007 – Firefox 2/Opera 9 – It was around this time that Firefox started to really assert itself in my personal and professional usage. I still used Opera a lot for personal usage, but for professional purposes, Firefox was a simple must. At the time, I was embroiled in a year-long site redesign project for my company, and I was doing a ton of HTML/CSS/JavaScript development… Firefox was an indispensable tool at the time, mostly due to extensions like Firebug and the Web-Developer Toolbar. I suppose I should note that Safari first came to my attention at this point, mostly for troubleshooting purposes. I freakin’ hate that browser.
  • 2008-2011 – Firefox/Opera – After 2007, there was a slow, inexorable drive towards Firefox. Opera kept things interesting with a feature they call Speed Dial (and quite frankly, I like that feature much better than what Chrome and recent versions of Firefox have implemented), but the robust and mature list of extensions for Firefox was really difficult to compete with, especially when I was trying to get stuff done. Chrome also started to gain popularity in this timeframe, but while I loved how well it loaded Ajax and other JavaScript-heavy features, I could never really get comfortable with the interface. Firefox still afforded more control, and Opera’s experience was generally better.
  • 2012-Present – Firefox – Well, I think it’s pretty telling that I’m composing this post on Firefox. That being said, I still use Opera for simple browsing purposes semi-frequently. Indeed, I usually have both browsers open at all times on my personal computer. At work, I’m primarily using Firefox, but I’m still forced to use IE8, as our customers tend to still prefer IE (though the percentage is much less these days). I still avoid Safari like the plague (though I do sometimes need to troubleshoot and I suppose I do use Mobile Safari on my phone). I think I do need to give Chrome a closer look, as it’s definitely more attractive these days…

Well, there you have it. I do wonder if I’ll ever get over my stubborn love for Opera, a browser that almost no one but me uses. They really do manage to keep up with the times, and have even somewhat recently allowed Firefox and Chrome style extensions, though I think it’s a little too late for them. FF and Chrome just have a more robust community surrounding their development than Opera. I feel like it’s a browser fated to die at some point, but I’ll probably continue to use it until it does… So what browser do you use?

More Disgruntled, Freakish Reflections on ebooks and Readers

While I have some pet peeves with the Kindle, I’ve mostly found it to be a good experience. That being said, there are some things I’d love to see in the future. These aren’t really complaints, as some of this stuff isn’t yet available, but there are a few opportunities afforded by the electronic nature of eBooks that would make the whole process better.

  • The Display – The electronic ink display that the basic Kindles use is fantastic… for reading text. Once you get beyond simple text, things are a little less fantastic. Things like diagrams, artwork, and photography aren’t well represented in e-ink, and even in color readers (like the iPad or Kindle Fire), there are issues with resolution and formatting that often show up in eBooks. Much of this comes down to technology and cost, both of which are improving quickly. Once stuff like IMOD displays start to deliver on their promise (low power consumption, full color, readable in sunlight, easy on the eyes, capable of supporting video, etc…), we should see a new breed of reader.

    I’m not entirely sure how well this type of display will work, at least initially. For instance, how will it compare to the iPad 3’s display? What’s the resolution like? How much will it cost? And so on. Current implementations aren’t full color, and I suspect that future iterations will go through a phase where the tech isn’t quite there yet… but I think it will be good enough to move forward. I think Amazon will most certainly jump on this technology when it becomes feasible (both from a technical and cost perspective). I’m not sure if Apple would switch though. I feel like they’d want a much more robust and established display before they committed.

  • General Metrics and Metadata – While everyone would appreciate improvements in device displays, I’m not sure how important this would be. Maybe it’s just me, but I’d love to see a lot more in the way of metadata and flexibility, both about the book and about device usage. With respect to the book itself, this gets to the whole page number issue I was whinging about in my previous post, but it’s more than that. I’d love to see a statistical analysis of what I’m reading, on both individual and collective levels.

    I’m not entirely sure what this looks like, but it doesn’t need to be rocket science. Simple Flesch-Kincaid grades seem like an easy enough place to start, and they would be pretty simple to implement. Calculating such things for my entire library (or a subset of my library), or ranking my library by grade (or similar sorting methods) would be interesting. I don’t know that this would provide a huge amount of value, but I would personally find it very illuminating and fun to play around with. Individual works wouldn’t even require any processing power on the reader; the grade could be part of the download. Doing calculations across your collective library might be a little more complicated, but even that could probably be done in the cloud.

    Other metadata would also be interesting to view. For example, Goodreads will graph your recently read books by year of publication – a lot of analysis could be done along those lines about your collection (or a sub-grouping of it). Groupings by decade, genre, or reading level would all be interesting to see.

  • Personal Metrics and Metadata – Basically, I’d like to have a way to track my reading speed. For whatever reason, this is something I’m always trying to figure out for myself. I’ve never gone through the process of actually recording my reading habits and speeds because it would be tedious and manual and maybe not even all that accurate. But now that I’m reading books in an electronic format, there’s no reason why the reader couldn’t keep track of what I’m reading, when I’m reading, and how fast I’m reading. My anecdotal experience suggests that I read anywhere from 20-50 pages an hour, depending mostly on the book. As mentioned in the previous post, a lot of this has to do with the arbitrary nature of page numbers, so perhaps standardizing to a better metric (words per minute or something like that) would normalize my reading speed.

    Knowing my reading speed and graphing changes over time could be illuminating. I’ve played around a bit with speed reading software, and the results are interesting, but not dramatic. In any case, one thing that would be really interesting to know when reading a book would be how much time you have left before you finish. Instead of having 200 pages, maybe you have 8 hours of reading time left.

    Combining my personal data with the general data could also yield some interesting results. Maybe I read trashy SF written before 1970 much faster than more contemporary literary fiction. Maybe I read long books faster than short books. There are a lot of possibilities here.

    There are a few catches to this whole personal metrics thing though. You’d need a way to account for breaks and interruptions. I might spend three hours reading tonight, but I’m sure I’ll take a break to get a glass of water or answer a phone call, etc… There’s not really an easy way around this, though there could be mitigating factors like when the reader goes to sleep mode or something like that. Another problem is that one device can be used by multiple people, which would require some sort of profile system. That might be fine, but it also adds a layer of complexity to the interface that I’m sure most companies would like to avoid. The biggest and most concerning potential issue is that of privacy. I’d love to see this information about myself, but would I want Amazon to have access to it? On the other hand, being able to aggregate data from all Kindles might prove interesting in its own right. Things like average reading speed, number of books read in a year, and so on. All interesting and useful info.

    This would require an openness and flexibility that Amazon has not yet demonstrated. It’s encouraging that the Kindle Fire runs a flavor of Android (an open source OS), but on the other hand, it’s a forked version that I’m sure isn’t as free (as in speech) as I’d like (and from what I know, the Fire is partially limited by its hardware). Expecting comprehensive privacy controls from Amazon seems naive.

    I’d like to think that these metrics would be desirable to a large audience of readers, but I really have no idea what the mass market appeal would be. It’s something I’d actually like to see in a lot of other places too. Video games, for instance, provide a lot of opportunity for statistics, and some games provide a huge amount of data on your gaming habits (be it online or in a single player mode). Heck, half the fun of sports games (or sports in general) is tracking the progress of your players (particularly prospects). Other games are baffling in how little depth they provide. People should be playing meta-games like Fantasy Baseball, but with MLB The Show providing the data instead of real life.

  • The Gamification of Reading – Much of the above wanking about metrics could probably be summarized as a way to make reading a game. The metrics mentioned above readily lend themselves to point scores, social-app-like badges, and leaderboards. I don’t know that this would necessarily be a good thing, but it could make for an intriguing system. There’s an interesting psychology at work in systems like this, and I’d be curious to see if someone like Amazon could make reading more addictive. Assuming most people don’t try to abuse the system (though there will always be a cohort that will attempt to exploit stuff like this), it could ultimately lead to beneficial effects for individuals who “play” the game competitively with their friends. Again, this isn’t necessarily a good thing. Perhaps the gamification of reading will lead to a sacrifice of comprehension in the name of speed, or other unintended side effects. Still, it would be nice to see the “gamification of everything” used for something other than a way for companies to trick customers into buying their products.
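
The readability stats mentioned above really are simple to compute. Here’s a minimal sketch in Python of the standard Flesch-Kincaid grade-level formula, using a naive vowel-group syllable counter (real implementations use pronunciation dictionaries or smarter heuristics):

```python
import re

def count_syllables(word):
    # Naive heuristic: count groups of consecutive vowels (min 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

sample = "The cat sat on the mat. It was a sunny day."
print(round(flesch_kincaid_grade(sample), 2))  # → -0.57 (i.e., very easy text)
```

Since the formula only needs word, sentence, and syllable counts, a store really could compute this server-side and ship the grade along with the book, exactly as suggested above.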
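
The “hours of reading time left” idea above is likewise just arithmetic once the device tracks a reading speed. A hypothetical sketch (the word counts and speed are invented for illustration):

```python
def time_remaining(words_left, words_per_minute):
    """Estimate reading time left, given a measured reading speed."""
    minutes = words_left / words_per_minute
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins}m"

# A 90,000-word novel, two-thirds unread, at a measured 250 wpm:
print(time_remaining(60000, 250))  # → 4h 0m
```

A word-based metric like this sidesteps the arbitrary page-number problem entirely, since words per minute means the same thing regardless of font size or formatting.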

As previously mentioned, the need for improved displays is a given (and not just for ereaders). But assuming these nutty metrics (and the gamification of reading) are an appealing concept, I’d like to think that it would provide an opening for someone to challenge Amazon in the market. An open, flexible device using a non-DRMed format and tied to a common store would be very nice. Throw in some game elements, add a great display, and you’ve got something close to my ideal reader. Unfortunately, it doesn’t seem like we’re all that close just yet. Maybe in 5-10 years? Seems possible, but it’s probably more likely that Amazon will continue its dominance.

Zemanta

Last week, I looked at commonplace books and various implementation solutions. Ideally, I wanted something open and flexible that would also provide some degree of analysis in addition to the simple data aggregation most tools provide. I wanted something that would take into account a wide variety of sources in addition to my own writing (on this blog, for instance). Most tools provide a search capability of some kind, but I was hoping for something more advanced. Something that would make connections between data, or find similarities with something I’m currently writing.

At first glance, Zemanta seemed like a promising candidate. It’s a “content suggestion engine” specifically built for blogging and it comes pre-installed on a lot of blogging software (including Movable Type). I just had to activate it, which was pretty simple. Theoretically, it continually scans a post in progress (like this one) and provides content recommendations, ranging from simple text links defining key concepts (i.e. links to Wikipedia, IMDB, Amazon, etc…), to imagery (much of which seems to be integrated with Flickr and Wikipedia), to recommended blog posts from other folks’ blogs. One of the things I thought was really neat was that I could input my own blogs, which would then give me more personalized recommendations.

Unfortunately, results so far have been mixed. There are some things I really like about Zemanta, but it’s pretty clearly not the solution I’m looking for. Some assorted thoughts:

  • Zemanta will only work when using the WYSIWYG Rich Text editor, which turns out to be a huge pain in the arse.  I’m sure lots of people are probably fine with that, but I’ve been editing my blog posts in straight HTML for far too long. I suppose this is more of a hangup on my end than a problem with Zemanta, but it’s definitely something I find annoying.  When I write a post in WYSIWYG format, I invariably switch it back to no formatting and jump through a bunch of hoops getting the post to look like what I want.
  • The recommended posts haven’t been very useful so far. Some of the external choices are interesting, but so far, nothing has really helped me in writing my posts. I was really hoping that loading my blog into Zemanta would add a lot of value, but it turns out that Zemanta only scanned my recent posts, and it recommended most of them, which doesn’t help much. I know what I’ve written recently; what I was hoping for was that Zemanta would point out some post I wrote in 2005 along similar lines. (In my previous post on Taxonomy Platforms, I specifically referenced the titles of some of my old blog posts, but since they were old, Zemanta didn’t find or recommend them. Even more annoying, when writing this post, the Taxonomy Platforms post wasn’t one of the recommended articles despite my specifically mentioning it. Update: It has it now, but it didn’t seem to appear until after I’d already gone through the trouble of linking it…) It appears that Zemanta is basing all of this on my RSS feed, which makes sense, but I wish there were a way to upload my full archives, as that might make the tool a little more powerful…
  • The recommendations seem to be based on a relatively simplistic algorithm. A good search engine will index data and learn associations between individual words by tracking their frequency and how close they are to other words.  Zemanta doesn’t seem to do that.  In my previous post, I referenced famous beer author Michael Jackson. What did Zemanta recommend?  Lots of pictures and articles about the musician, nothing about the beer journalist. I don’t know if I’m expecting too much out of the system, but it would be nice if the software would pick up on the fact that this guy’s name was showing up near lots of beer talk, with nary a reference to music. It’s probably too much to hope that my specifically calling out that I was talking about “the beer critic, not the pop star” would influence the system (and indeed, my reference to “pop star” may have influenced the recommendations, despite the fact that I was trying to negate that).
  • The “In-Text Links”, on the other hand, seem to come in quite handy. I actually leveraged several of them in my past few posts, and they were very easy to use. Indeed, I particularly appreciated their integration with Amazon, where I could enter my associates ID, and the links that were inserted were automatically generated with my ID. This is normally a pretty intensive process involving multiple steps that has been simplified down to the press of a button.  Very well done, and most of the suggestions there were very relevant.
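
The disambiguation I’m wishing for above isn’t exotic, either: even a crude co-occurrence score over nearby words can separate the beer critic from the pop star. A toy sketch in Python (the sense names and context-word lists are invented purely for illustration):

```python
# Toy word-sense scoring: pick the sense whose known context words
# overlap most with the words surrounding the ambiguous name.
SENSES = {
    "beer critic": {"beer", "ale", "stout", "brewery", "hops", "critic"},
    "pop star": {"music", "album", "thriller", "dance", "singer"},
}

def best_sense(context):
    words = set(context.lower().split())
    scores = {sense: len(words & cues) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)

print(best_sense("Michael Jackson wrote wonderfully about beer and ale styles"))
# → beer critic
```

A real engine would learn those context sets automatically from word co-occurrence statistics rather than hand-coding them, but the principle (score a name’s neighbors against each candidate sense) is the same one I’m complaining Zemanta seems to lack.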

I will probably continue to play with Zemanta, but I suspect it won’t last much longer in my rotation. It provides some value, but it’s ultimately not as convenient as I’d like, and its analysis and recommendation functions aren’t as useful as I’d hoped.

I’ve also been playing around with Evernote more and more, and I feel like that could be a useful tool, despite the fact that it doesn’t really offer any sort of analysis (though it does have a simple search function). There’s at least one third party, though, that seems to be positioning itself as an analysis tool that will integrate with Evernote.  That tool is called Topicmarks.  Unfortunately, I seem to be having some issues integrating my Evernote data with that service. At this rate, I don’t know that I’ll find a great tool for what I want, but it’s an interesting subject, and I’m guessing it will become more and more important as time goes on. We’re living in the Information Age; it seems only fair that our aggregation and analysis tools get more sophisticated.

Enhanced by Zemanta

Commonplacing

During the Enlightenment, most intellectuals kept what’s called a Commonplace Book. Basically, folks like John Locke or Mark Twain would curate transcriptions of interesting quotes from their readings. It was a personalized record of interesting ideas that the author encountered. When I first heard about the concept, I immediately started thinking of how I could implement one… which is when I realized that I’ve actually been keeping one, more or less, for the past decade or so on this blog. It’s not very organized, though, and it’s something that’s been banging around in my head for the better part of the last year or so.

Locke was a big fan of Commonplace Books, and he spent years developing an intricate system for indexing his books’ content. It was, of course, a ridiculous and painstaking process, but it worked. Fortunately for us, this is exactly the sort of thing that computer systems excel at, right? The reason I’m writing this post is a small confluence of events that has led me to consider creating a more formal Commonplace Book. Despite my earlier musing on the subject, this blog doesn’t really count. It’s not really organized correctly, and I don’t publish all the interesting quotes that I find. Even if I did, it’s not really in a format that would do me much good. So I’d need to devise another plan.

Why do I need a plan at all? What’s the benefit of a commonplace book? Well, I’ve been reading Steven Johnson’s book Where Good Ideas Come From: The Natural History of Innovation and he mentions how he uses a computerized version of the commonplace book:

For more than a decade now, I have been curating a private digital archive of quotes that I’ve found intriguing, my twenty-first century version of the commonplace book. … I keep all these quotes in a database using a program called DEVONthink, where I also store my own writing: chapters, essays, blog posts, notes. By combining my own words with passages from other sources, the collection becomes something more than just a file storage system. It becomes a digital extension of my imperfect memory, an archive of all my old ideas, and the ideas that have influenced me.

This DEVONthink software certainly sounds useful. It’s apparently got this fancy AI that will generate semantic connections between quotes and what you’re writing. It’s advanced enough that many of those connections seem to be subtle and “lyrical”, finding connections you didn’t know you were looking for. It sounds perfect except for the fact that it only runs on Mac OSX. Drats. It’s worth keeping in mind in case I ever do make the transition from PC to Mac, but it seems like lunacy to do so just to use this application (which, for all I know, will be useless to me).

By sheer happenstance, I’ve also been playing around with Pinterest lately, and it occurs to me that it’s a sort of commonplace book, albeit one with more of a narrow focus on images and video (and recipes?) than quotes. There are actually quite a few sites like that. I’ve been curating a large selection of links on Delicious for years now (1600+ links on my account). Steven Johnson himself has recently contributed to a new web startup called Findings, which is primarily concerned with book quotes. All of this seems rather limiting, and quite frankly, I don’t want to be using 7 completely different tools to do the same thing, but for different types of media.

I also took a look at Tumblr again, this time evaluating it from a commonplacing perspective. There are some really nice things about the interface and the ease with which you can curate your collection of media. The problem, though, is that their archiving system is even more useless than most blog software. It’s not quite the hell that is Twitter archives, but that’s a pretty low bar. Also, as near as I can tell, the data is locked up on their server, which means that even if I could find some sort of indexing and analysis tool to run through my data, I won’t really be able to do so (Update: apparently Tumblr does have a backup tool, but only for use with OSX. Again!? What is it with you people? This is the internet, right? How hard is it to make this stuff open?)

Evernote shows a lot of promise and probably warrants further examination. It seems to be the go-to alternative for lots of researchers and writers. It’s got a nice cloud implementation with a robust desktop client and the ability to export data as I see fit. I’m not sure if its search will be as sophisticated as what I ultimately want, but it could be an interesting tool.

Ultimately, I’m not sure the tool I’m looking for exists. DEVONthink sounds pretty close, but it’s hard to tell how it will work without actually using the damn thing. The ideal would be a system where you can easily maintain a whole slew of data and metadata, to the point where I could be writing something (say a blog post or a requirements document for my job) and the tool would suggest relevant quotes/posts based on what I’m writing. This would probably be difficult to accomplish in real-time, but a “Find related content” feature would still be pretty awesome. Anyone know of any alternatives?
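
For what it’s worth, the “find related content” feature I keep describing is basically a document similarity problem, and a bare-bones version can be sketched with nothing fancier than bag-of-words cosine similarity. The notes below are invented for illustration; real tools like DEVONthink presumably do something far more sophisticated:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two texts, treated as bags of words."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa)
    norm = (math.sqrt(sum(v * v for v in wa.values()))
            * math.sqrt(sum(v * v for v in wb.values())))
    return dot / norm if norm else 0.0

# A hypothetical commonplace-book archive.
notes = [
    "Locke kept a commonplace book with an elaborate index",
    "Notes on brewing stout and porter at home",
    "Twain curated quotes from his reading in notebooks",
]

draft = "curating interesting quotes from reading"
# Rank stored notes by similarity to the draft in progress.
ranked = sorted(notes, key=lambda n: cosine_similarity(draft, n), reverse=True)
print(ranked[0])
```

Run on the draft sentence, this surfaces the Twain note first, since it shares the most vocabulary. The “lyrical” connections Johnson describes would need semantic analysis beyond raw word overlap, but even this naive approach beats a plain keyword search for “find related content.”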

Enhanced by Zemanta

Update: Zemanta! I completely forgot about this. It comes installed by default with my blogging software, but I had turned it off a while ago because it took forever to load and was never really that useful. It’s basically a content recommendation engine, pulling content from lots of internet sources (notably Wikipedia, Amazon, Flickr and IMDB). It’s also grown considerably in the time since I’d last used it, and it now features a truckload of customization options, including the ability to separate general content recommendations from your own, personally curated sources. So far, I’ve only connected my two blogs to the software, but it would be interesting if I could integrate Zemanta with Evernote, Delicious, etc… I have no idea how great the recommendations will be (or how far back it will look on my blogs), but this could be exactly what I was looking for. Even if integration with other services isn’t working, I could probably create myself another blog just for quotes, and then use that blog with Zemanta. I’ll have to play around with this some more, but I’m intrigued by the possibilities.

SOPA Blues

I was going to write the annual arbitrary movie awards tonight, but since the web has apparently gone on strike, I figured I’d spend a little time talking about that instead. Many sites, including the likes of Wikipedia and Reddit, have instituted a complete blackout as part of a protest against two ill-conceived pieces of censorship legislation currently being considered by the U.S. Congress (these laws are called the Stop Online Piracy Act and Protect Intellectual Property Act, henceforth to be referred to as SOPA and PIPA). I can’t even begin to pretend that blacking out my humble little site would accomplish anything, but since a lot of my personal and professional livelihood depends on the internet, I suppose I can’t ignore this either.

For the uninitiated, if the bills known as SOPA and PIPA become law, many websites could be taken offline involuntarily, without warning, and without due process of law, based on little more than an alleged copyright owner’s unproven and uncontested allegations of infringement1. The reason Wikipedia is blacked out today is that it depends solely on user-contributed content, which makes it a ripe target for overzealous copyright holders. Sites like Google haven’t blacked themselves out, but have staged a bit of a protest as well, because under the provisions of the bills, even just linking to a site that infringes upon copyright is grounds for action (and thus search engines have a vested interest in defeating these bills). You could argue that these bills are well intentioned, and from what I can tell, their original purpose seemed to be more about foreign websites and DNS, but the road to hell is paved with good intentions and, as written, these bills are completely absurd.

Lots of other sites have been registering their feelings on the matter. ArsTechnica has been posting up a storm. Shamus has a good post on the subject which is followed by a lively comment thread. But I think Aziz hits the nail on the head:

Looks like the DNS provisions in SOPA are getting pulled, and the House is delaying action on the bill until February, so it’s gratifying to see that the activism had an effect. However, that activism would have been put to better use to educate people about why DRM is harmful, why piracy should be fought not with law but with smarter pro-consumer marketing by content owners (lowered prices, more options for digital distribution, removal of DRM, fair use, and ubiquitous time-shifting). Look at the ridiculous limitations on Hulu Plus – even if you’re a paid subscriber, some shows won’t air episodes until the week after, old episodes are not always available, some episodes can only be watched on the computer and are restricted from mobile devices. These are utterly arbitrary limitations on watching content that just drive people into the pirates’ arms.

I may disagree with some of the other things in Aziz’s post, but the above paragraph is important, and for some reason, people aren’t talking about this aspect of the story. Sure, some folks are disputing the numbers, but few are pointing out the things that IP owners could be doing instead of legislation. For my money, the most important thing that IP owners have forgotten is convenience. Aziz points out Hulu, which is one of the worst services I’ve ever seen in terms of being convenient or even just intuitive to customers. I understand that piracy is frustrating for content owners and artists, but this is not the way to fight piracy. It might be disheartening to acknowledge that piracy will always exist, but it probably will, so we’re going to have to figure out a way to deal with it. The one thing we’ve seen work is convenience. Despite the fact that iTunes had DRM, it was loose enough and convenient enough that it became a massive success (it now doesn’t have DRM, which is even better). People want to spend money on this stuff, but more often than not, content owners are making it harder on the paying customer than on the pirate. SOPA/PIPA is just the latest example of this sort of thing.

I’ve already written about my thoughts on Intellectual Property, Copyright and DRM, so I encourage you to check that out. And if you’re so inclined, you can find out what senators and representatives are supporting these bills, and throw them out in November (or in a few years, if need be). I also try to support companies or individuals that put out DRM-free content (for example, Louis CK’s latest concert video has been made available, DRM free, and has apparently been a success).

Intellectual Property and Copyright is a big subject, and I have to be honest in that I don’t have all the answers. But the way it works right now just doesn’t seem right. A copyrighted work released just before I was born (i.e. Star Wars) probably won’t enter the public domain until after I’m dead (I’m generally an optimistic guy, so I won’t complain if I do make it to 2072, but still). Both protection and expiration are important parts of the way copyright works in the U.S. It’s a balancing act, to be sure, but I think the pendulum has swung too far in one direction. Maybe it’s time we swing it back. Now if you’ll excuse me, I’m going to participate in a different kind of blackout to protest SOPA.

1 – Thanks to James for the concise description. There are lots of much longer and better-sourced descriptions of the shortcomings of this bill and the issues surrounding it, so I won’t belabor the point here.