Dilettante's Ball

Tuesday, January 17, 2006 Moving this party elsewhere

Although I've never actually had too much of a problem with Blogger (except for a hideous spam invasion during Access 2005), I've decided to move this soapbox over to WordPress on the Code4lib server.

So, from now on, I'll be at http://dilettantes.code4lib.org/.

All (none) of you that read this, take note.

Sunday, December 25, 2005 Time to start polishing off my helmet for the playoff run

No, that's not a euphemism for anything.

Pittsburgh's dominance in the first quarter was so complete that it produced some almost unbelievable stats. In the opening period, the Steelers outgained the Browns 196-1; had 162 passing yards to minus-2 for Cleveland; and led in first downs 9-0.

(from espn.com)

Bring it on, y'all.

Saturday, December 24, 2005 To Free or not to Free?

Last week I was reading Dorothea Salo's posting about OCLC's report on library branding, and it got me to thinking about this a bit.

In particular, I thought about her comment:

I would want to trial-balloon a Deep Web play in my next survey, if I were OCLC. I would want to know how many people have heard of the Deep Web, what they think is in it, whether they think information useful to them is in it, whether they would access it through their libraries if they could. This moves away from free-vs.-paid and toward exclusive-vs.-nonexclusive. People like the idea of being privileged. If the library is a place that privileges them, I think they'll go for it. Special-collections and archives get a boost in this campaign, too; access to rare or unique information is the ultimate in privilege.

I see a tension here. While many push for an "information wants to be free" model, this would, inherently, devalue the role of the organization that makes it free. In fact, to take her quote even further, this is especially true of special collections and archives.

Allow me to explain.

Users aren't particularly discriminating about where they get their information. Our students or faculty don't really care if the article or research they are looking at comes to them courtesy of Georgia Tech or if it was found in Citeseer. They are more likely to say they found something in "Google Scholar" than to name the institutional repository of the school that is actually serving it. The more open the information is, the less exclusive our collection becomes and the less leverage and value we hold (at least conforming to our traditional model).

With special collections, this is especially true. Special collections are "special" because they are "unique". Libraries spend a lot of money curating these collections. Historically, this has enjoyed a fairly good ROI because it distinguishes the library (and therefore, larger institution) as something "special" itself. These materials are exclusive to that particular institution and give value to the collection.

However, there is pressure to digitize and publish these collections. If all of these collections are digitized and published, we have a bunch of silos strewn about the internet, requiring users to know about them and find them before they can use them. Since it is a lot of work to digitize and mark up these collections, there's not a terribly good return for the effort.

To improve findability, the collections need to be aggregated with other similar collections to increase their exposure. The result is improved awareness and accessibility, but at the same time it dilutes exclusiveness and branding. Whoever provides the aggregation/discovery service gets the benefit of the content, so some of the content providers (inherently) must lose.

So, what does this mean? It should not prevent us from making our collections more open and accessible. That would run counter to our mission. However, we need to start thinking of ways to generate value when our information is free. There are plenty of ways of doing that, such as tailoring services that aggregate the "free" information for our communities, or building systems that can use the information in unique and specialized ways.

There is a large cultural shift that needs to take place to realize this future, however. We still place a lot of emphasis (way too much, really) on the size and uniqueness of our collections. With a world of information available (or a lot of it, at any rate), it's not so much an issue of how many books you have in your building, but how you are able to harness all the good data and present it in useful and meaningful ways. There aren't easy metrics for this. ARL can't just count book spines and annual budget. Serious consideration needs to be given to what a library is utilizing from the collection outside its walls, and how.

Wednesday, December 21, 2005 Rails Resolver Router: on Rails!

Since my foray into Python a couple of months ago, I've been enjoying branching out into new languages.

I had pitched the concept of a link resolver router for the state universal catalog to a committee I sit on (this group talks about SFX links in the 856 tag and whatnot). The problem with making links for a publicly available resource point to your institutional resolver is just that. It's pointing to your institutional resolver, despite the fact that your audience could be coming from anywhere. This is an even bigger problem in a venue such as a universal catalog, since there's not really a "home institution" to point a resolver link to, anyway. OCLC and UKOLN both have resolver routers, and OCLC's certainly is an option, but I don't feel comfortable with the possibility that all of our member institutions might have to pay for the service (in the future). My other problem with OCLC's service is that you can only belong to one institution and I have never liked that (especially as more and more institutions have link resolvers).

So, in this committee I mentioned that it would be pretty simple to make a router, and since I was having trouble getting people to understand what exactly I was talking about, I decided to make a proof-of-concept. And, since I was making a proof-of-concept, I thought it'd be fun to try it in Ruby on Rails.

Now, a resolver router is about the simplest concept possible. It doesn't really do anything but take requests and pass them off to the appropriate resolver. It's a resolver abstraction layer, if you will. I thought this was a nice, small project to try to cut my Ruby teeth on. There's a little bit of a database, a little bit of AJAX. It's also useful, unlike making a cookbook from a tutorial or something.
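To make the "take requests and pass them off" idea concrete, here is a minimal sketch in Python (not the Rails code of the actual proof-of-concept). The resolver keys and base URLs are made-up placeholders, and `route` is a hypothetical helper name; it assumes the citation arrives as an OpenURL-style query string.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical resolver registry. The keys and base URLs are placeholders,
# not the real addresses of any institution's link resolver.
RESOLVERS = {
    "inst_a": "https://resolver.example.edu/a",
    "inst_b": "https://resolver.example.org/b",
}


def route(query_string, chosen, resolvers=RESOLVERS):
    """Return one link per chosen resolver, each carrying the same citation.

    The router does no resolving of its own: it re-encodes the incoming
    citation parameters and appends them to each selected base URL.
    """
    params = parse_qsl(query_string, keep_blank_values=True)
    encoded = urlencode(params)
    links = {}
    for key in chosen:
        base = resolvers.get(key)
        if base is None:
            continue  # silently skip resolvers we have no record of
        links[key] = f"{base}?{encoded}"
    return links
```

Calling `route("genre=article&volume=12", ["inst_a", "inst_b"])` would give back two links, one per resolver, which is roughly what the "choose between your communities" page needs to render.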

It took about three days to make this. After you pick your resolver (Choose a couple! Add your own!), you'll be taken to a page to choose between your various communities for appropriate copy.

I chose this particular citation because it shows the very huge limitation of link resolvers (if you choose Georgia Tech's resolver and Emory's resolver, for instance); despite the fact that this is freely available, it does not appear in my resolver. That's not really the use case I envision, though. I am thinking more of a case like my co-worker, Heather, who should have access to Georgia Tech's collection, Florida State's resources (she's in grad school there), Richland County Public Library (she lives in Columbia, SC), and the University of South Carolina (where her husband is a librarian). The resolver router alleviates the need to search for a given citation in the various communities (indeed to even have to think of or know where to look within those communities).

Sometime later this winter, I'll have an even better use case. I'll keep that under wraps for now.

Now, my impression of Ruby on Rails... For a project like this, it is absolutely amazing. I cannot believe I was able to learn the language from scratch and implement something that works (with this amount of functionality) in such a short amount of time. By bypassing the need to create the "framework" for the application, you can just dive into implementation.

In fact, I think my time to implementation would have been even faster if the number of resources/tutorials out there didn't suck out loud. Most references point to these tutorials to get you started, but they really aren't terribly helpful. They explain nothing about why they are doing what they are doing in them. I found this blog posting to be infinitely more useful. Her blog in general is going in my aggregator, I think.

When it comes to learning Ruby, this is a masterful work of art... but... not terribly useful if you just want to look things up. I recommend this for that.

Anyway, I am so impressed with Ruby on Rails that I am planning on using it (currently) for the "alternative opac project", which is now being code named "Communicat". More on this shortly (although I did actually develop the database schema today).

Thursday, December 01, 2005 Back to the whiteboard for inspiration

Paul Miller (of Talis) was kind enough to point out that Talis already has a product named Alto.

In fact, it's their ILS.

D'oh.

Given that Talis' corporate name is "composer-based" (and, I don't know, that they came up with the name waaaaaaaaaaaaaaaaaaaay before I did) I suppose I can relinquish the name "Alto" :)

So, uh, any suggestions for an appropriate name... send 'em to me!

Wednesday, November 30, 2005 Envisioning Alto

I have mentioned here several times the "alternative to the catalog" project I am trying to implement at Tech. One of the problems that I've had is naming the project something that lets people realize what I'm talking about, without the political hairiness of saying "catalog replacement" (since that's technically not true, anyway).

In a meeting two weeks ago (about subject guides), I was drawing the concept of this project on the whiteboard of our conference room. It's been up ever since and in the middle, I had written "ALTOpac" because that was an easy way to loosely describe it in a way that the uninitiated in the room could envision where I was starting from. Sitting in another meeting today, the capitalized letters jumped out at me: ALTO. It means nothing.

And I like that. Of course it still doesn't explain what it's about. That's what subtitles are for.

Now, let me explain what the hell Alto is and what it is supposed to do.

Alto is a "community-based collection builder and search engine".

Come to think of it, that might not actually clear anything up.

Let's back up a bit, shall we?

To say searching the catalog is "searching our collection" is quite arbitrary and false. Metasearch doesn't really solve this problem, since you'd still only point the metasearch engine at certain assets and it's non-trivial to make relationships between assets. Metasearch is part of the solution, but hardly the panacea.

Again, our "collection" is an ambiguous term and shouldn't be solely determined by our collection development policies/budget. It is our opinion that if something is important enough to be added to a reserves list (even a web page), it should technically be part of our collection. I would not, however, say it should be cataloged (and that's why this isn't a catalog replacement project, see?). If an item is even bookmarked (via a local social bookmarking service, such as unalog or connotea) it should then become part of our collection. A 1927 engineering textbook from Purdue's catalog? Index it! If a member of our community finds it important enough to want to come back to and share with a group, it's important enough for us to aggregate into our "collection". Relevance comes later (keep reading, if you're interested).

There are also relationships that our community (for the sake of argument, let's start with "Georgia Tech") builds that are highly relevant for finding connections between disparate "things". So, the items put on reserve for a particular course have an umbrella of commonality between them that should be utilized for anyone that runs across any of these items. The relevance ranking should be even greater for a user that happens to be a member of the group in question (for instance, is enrolled in the class).
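As a rough illustration of that group-aware ranking, here is a small Python sketch. Everything in it is an assumption made for the example: the `Item` structure, the idea of a precomputed `base_score`, and the boost multiplier of 2.0 are all hypothetical, not a description of how Alto works.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    base_score: float                         # relevance from the search engine
    groups: set = field(default_factory=set)  # e.g. courses with this item on reserve


def score(item, user_groups, member_boost=2.0):
    """Boost an item when the user belongs to a group that uses it."""
    if item.groups & set(user_groups):
        return item.base_score * member_boost
    return item.base_score


def rank(items, user_groups):
    """Sort items from most to least relevant for this particular user."""
    return sorted(items, key=lambda i: score(i, user_groups), reverse=True)
```

With this, the same result set comes back in a different order for a student enrolled in a course than for an anonymous visitor, which is the behavior the paragraph above describes.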

If Alto has a citation management-esque feature in it, users can very specifically group relevant resources together based on a project. Resources can be anything: books, websites, articles, searches, chat transcripts, trails, you name it.

And all of this should feed the "relevance beast", as it were.

So that's some background. Given that we'll have some formal subject classifications for these objects (from the OPAC or from metasearch or whatever), we should be able to bridge the formal to folksonomy to make sense of how people have classified their saved things.

We can then begin to cluster search results. Format, subject, concept, group, policies... All of these can be browsed after the search begins. The search results will be a combination of metadata objects and library content. If some of the results appear in a given "subject guide", the guide will be a suggested resource (and will, in turn, push some resources into the result set).

The goal is to open the silos we have created around our resources/services. It would break down the ambiguity between "collections", "services" and "policies" since they're all interrelated.

How do we plan to do this? Glad you asked (you're still reading, right?)!

We've exported all of the bib records from our catalog. The plan is to use METS as our wrapper around MODS. We'll then harvest our institutional repository and index our website. That's a pretty good base to start with. All of this is stored in a dbXML database and indexed with Lucene.
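For readers unfamiliar with the METS-around-MODS arrangement, here is a minimal Python sketch of what such a wrapper looks like. It is deliberately incomplete: it builds only a descriptive metadata section and omits the structural map that a full METS document needs, and the function name, record ID, and title are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods_in_mets(title, record_id):
    """Build a METS element whose dmdSec wraps a one-field MODS record."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": record_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    # The MODS record itself lives inside mets:xmlData.
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return mets


if __name__ == "__main__":
    record = wrap_mods_in_mets("An example title", "dmd-0001")
    print(ET.tostring(record, encoding="unicode"))
```

The appeal of the wrapper is that other descriptive formats (or annotations, reviews, and so on) can sit alongside the MODS record in the same envelope later.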

If users want to harvest a collection from citeseer or OAIster, that will be available and will become part of our collection. Annotations, links to reviews, links to content to index will all be made available.

I'm leaving a lot out and glossing some of this over... but it starts to put the idea on "paper" for me to come back to later.

I sound my barbaric YAWP over the walls of my cubicle.

I woke up at 4:30 this morning.

One could easily write this off to a variety of stresses: an article I have no business writing; a conference I have no business helping organize; a huge project that I am having problems getting started on; a house that I apparently haven't sunk enough money in to move into yet; a house that I can't drag far enough away from the railroad tracks to sell; the usual burden that is "the holidays"... sure one could try to pin it on any of those.

But I woke up thinking about (meaning that I was dreaming about) something I read recently from Richard Wallis on Panlibus, Talis' 'blog:

Well yes, the current generation of ILS systems were not built with Web Services everywhere. To put it bluntly, who will pay the salaries of the developers who are going to develop these services for you to consume?

Strange thing to dream about, I know. However, when I think about this one quote, it pisses me off to no end. The University System of Georgia pays Endeavor over $500,000 a year for the privilege of running an ILS that they haven't invested any innovation in for years. Granted, we are 35 libraries, so it's not like we're all paying that ransom, but, on the flip side, we're probably also getting a discount for the very fact that we are so large.

Then, to think we are but a percentage of Endeavor's total customer base...

WHERE IS THAT MONEY GOING, RICHARD?

Of course, I realize that Talis is in no way related to Endeavor, but I cannot imagine their pricing is so radically different that their coffers have no shillings to pay for developers.

Besides, they must already have developers, right? Maybe you need to hire developers with vision.

So, to this argument, I call bullshit.

The other thing that struck me (again, apparently in my dream) is the apologetic tone I see quite frequently (recent example here, lots of others floating about) that shifts the blame for our stagnant, crappy Integrated Library Systems to us, the customers, instead of to our vendors. The argument goes that we, the libraries, have asked for the wrong things for the ILS and the poor vendors (poor, poor vendors) had their hands tied, literally tied, trying to keep up with our demands, leaving them unable to incorporate any sort of innovation in the last 15-20 years. Besides, they'd say, if they came up with something different, libraries might not want the change.

What (successful) technology company has ever relied on RFPs for their innovation? Are Google's hands tied until some customer says, "Hey, can you make a web based 'maps' site? You know what we need? A new way to do threaded email."? How about Intel? Microsoft?

No. These companies realize that they need to innovate to survive. To stagnate or half-ass is the kiss of death. See Novell. For a more dramatic example, see Apple.

No, it's time we stop taking it like abused spouses from our vendors. You know, maybe we did overcook the porkchop and maybe we do open our mouths too much, but that's no reason to have a black eye. If a handful of the better funded libraries were to help found something like the Apache Foundation for library software, our abusive husbands might start to treat us as partners rather than punching bags. I think I might know a good place to look for talent.

(In truth, our rottweiler woke me up, but the dream still stands).

Sunday, November 20, 2005 Library 1.7.02-4 pre 6

I really, really hate this Library 2.0 meme for a couple of reasons.

1) All of our problems will not, in fact, be solved with AJAX and web interfaces

2) In fact many of our problems cannot be solved by technology at all (try doing interesting and meaningful and different work with the current body of MARC records out there and see what I mean)

3) This quest for 2.0 would be better served if "2.0" was a milestone on the journey to "Library 4.5" -- I mean, come on folks, let's get back into innovating.

4) I think it trivializes some actually exciting and useful work that I fear will continue to fly under the radar because it's not "Web 2.0" enough.

Maybe hype is necessary to rally the troops, but I really wish vision would get more attention.
