Dilettante's Ball: November 2005

Wednesday, November 30, 2005 Envisioning Alto

I have mentioned here several times the "alternative to the catalog" project I am trying to implement at Tech. One of the problems that I've had is naming the project something that lets people realize what I'm talking about, without the political hairiness of saying "catalog replacement" (since that's technically not true, anyway).

In a meeting two weeks ago (about subject guides), I was drawing the concept of this project on the whiteboard of our conference room. It's been up ever since and in the middle, I had written "ALTOpac" because that was an easy way to loosely describe it in a way that the uninitiated in the room could envision where I was starting from. Sitting in another meeting today, the capitalized letters jumped out at me: ALTO. It means nothing.

And I like that. Of course it still doesn't explain what it's about. That's what subtitles are for.

Now, let me explain what the hell Alto is and what it is supposed to do.

Alto is a "community-based collection builder and search engine".

Come to think of it, that might not actually clear anything up.

Let's back up a bit, shall we?

To say searching the catalog is "searching our collection" is quite arbitrary and false. Metasearch doesn't really solve this problem, since you'd still only point the metasearch engine at certain assets and it's non-trivial to make relationships between assets. Metasearch is part of the solution, but hardly the panacea.

Again, our "collection" is an ambiguous term and shouldn't be solely determined by our collection development policies/budget. It is our opinion that if something is important enough to be added to a reserves list (even a web page), it should technically be part of our collection. I would not, however, say it should be cataloged (and that's why this isn't a catalog replacement project, see?). If an item is even bookmarked (via a local social bookmarking service, such as unalog or connotea) it should then become part of our collection. A 1927 engineering textbook from Purdue's catalog? Index it! If a member of our community finds it important enough to want to come back to and share with a group, it's important enough for us to aggregate into our "collection". Relevance comes later (keep reading, if you're interested).

There are also relationships that our community (for the sake of argument, let's start with "Georgia Tech") builds that are highly relevant for finding connections between disparate "things". So, the items put on reserve for a particular course have an umbrella of commonality between them that should be utilized for anyone that runs across any of these items. The relevance ranking should be even greater for a user that happens to be a member of the group in question (for instance, is enrolled in the class).
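To make the ranking idea concrete, here is a minimal sketch in Python. Nothing here is Alto code: the `Item` class, the `rank` function, and the two boost factors are all made up for illustration, and the real weighting would need tuning. It assumes each item carries a base search score plus the set of groups (say, course reserve lists) it belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    base_score: float                          # score from the plain text search
    groups: set = field(default_factory=set)   # e.g. course reserve lists


def rank(items, user_groups, group_boost=1.2, member_boost=1.5):
    """Order items by score, boosting grouped items (hypothetical weights).

    Any item that belongs to a group gets a small boost; the boost is
    larger when the searching user is a member of one of those groups.
    """
    def score(item):
        if item.groups & user_groups:
            return item.base_score * member_boost
        if item.groups:
            return item.base_score * group_boost
        return item.base_score

    return sorted(items, key=score, reverse=True)


reserve_text = Item("Reserve textbook", 1.0, {"ME3015"})
web_page = Item("Unaffiliated web page", 1.3)

for_outsider = rank([reserve_text, web_page], user_groups=set())
for_student = rank([reserve_text, web_page], user_groups={"ME3015"})
```

In this toy case the web page comes first for an outsider, while the reserve item moves to the top for someone enrolled in the (invented) course.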

If Alto has a citation management-esque feature in it, users can very specifically group relevant resources together based on a project. Resources can be anything: books, websites, articles, searches, chat transcripts, trails, you name it.

And all of this should feed the "relevance beast", as it were.

So that's some background. Given that we'll have some formal subject classifications for these objects (from the OPAC or from metasearch or whatever), we should be able to bridge the formal classifications to the folksonomy to make sense of how people have classified their saved things.

We can then begin to cluster search results. Format, subject, concept, group, policies... All of these can be browsed after the search begins. The search results will be a combination of metadata objects and library content. If some of the results appear in a given "subject guide", the guide will be a suggested resource (and will, in turn, push some resources into the result set).
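As a rough illustration of clustering after the search, the sketch below groups a result set by any one facet. The result dictionaries, their field names, and the `cluster` function are assumptions for the sake of the example, not a description of how Alto stores anything.

```python
from collections import defaultdict


def cluster(results, facet):
    """Group result titles by the values of one facet (e.g. format, subject).

    A result may carry a single value or a list of values for a facet;
    results with no value land in a "(none)" bucket.
    """
    clusters = defaultdict(list)
    for result in results:
        values = result.get(facet) or ["(none)"]
        if isinstance(values, str):
            values = [values]
        for value in values:
            clusters[value].append(result["title"])
    return dict(clusters)


# Hypothetical result set mixing catalog records and bookmarked pages.
results = [
    {"title": "Strength of Materials", "format": "book",
     "subject": ["engineering", "mechanics"]},
    {"title": "Course wiki page", "format": "website",
     "subject": ["engineering"]},
    {"title": "Saved search: beam deflection", "format": "search"},
]

by_format = cluster(results, "format")
by_subject = cluster(results, "subject")
```

The same result set can be re-clustered by whichever facet the user picks, which is the "browse after the search begins" part.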

The goal is to open the silos we have created around our resources/services. It would break down the ambiguity between "collections", "services" and "policies" since they're all interrelated.

How do we plan to do this? Glad you asked (you're still reading, right?)!

We've exported all of the bib records from our catalog. The plan is to use METS as our wrapper around MODS. We'll then harvest our institutional repository and index our website. That's a pretty good base to start with. All of this is stored in a dbXML database and indexed with Lucene.
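For the curious, wrapping a MODS record in METS looks roughly like the sketch below, using only Python's standard library. The function name, the record ID, and the single title field are invented for illustration; a real export would carry the full MODS record, and a schema-valid METS document also needs a `structMap`, which is left out here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods_in_mets(record_id, title):
    """Build a bare METS element whose dmdSec wraps a one-field MODS record."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID=f"dmd-{record_id}")
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")

    # The descriptive metadata itself: a MODS record with only a title.
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return mets


# "b0001" is a made-up record ID.
doc = wrap_mods_in_mets("b0001", "Strength of Materials")
xml_text = ET.tostring(doc, encoding="unicode")
```

The resulting `xml_text` is the kind of document that would then be stored and handed to the indexer.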

If users want to harvest a collection from CiteSeer or OAIster, that will be available and will become part of our collection. Annotations, links to reviews, links to content to index will all be made available.

I'm leaving a lot out and glossing some of this over... but it starts to put the idea on "paper" for me to come back to later.

I sound my barbaric YAWP over the walls of my cubicle.

I woke up at 4:30 this morning.

One could easily write this off to a variety of stresses: an article I have no business writing; a conference I have no business helping organize; a huge project that I am having problems getting started on; a house that I apparently haven't sunk enough money in to move into yet; a house that I can't drag far enough away from the railroad tracks to sell; the usual burden that is "the holidays"... sure one could try to pin it on any of those.

But I woke up thinking about (meaning that I was dreaming about) something I read recently from Richard Wallis on Panlibus, Talis' 'blog:

Well yes, the current generation of ILS systems were not built with Web Services everywhere. To put it bluntly, who will pay the salaries of the developers who are going to develop these services for you to consume?

Strange thing to dream about, I know. However, when I think about this one quote, it pisses me off to no end. The University System of Georgia pays Endeavor over $500,000 a year for the privilege of running an ILS that they haven't invested any innovation in for years. Granted, we are 35 libraries, so it's not like we're all paying that ransom, but, on the flip side, we're probably also getting a discount for the very fact that we are so large.

Then, to think we are but a percentage of Endeavor's total customer base...

WHERE IS THAT MONEY GOING, RICHARD?

Of course, I realize that Talis is in no way related to Endeavor, but I cannot imagine their pricing is so radically different that their coffers have no shillings to pay for developers.

Besides, they must already have developers, right? Maybe you need to hire developers with vision.

So, to this argument, I call bullshit.

The other thing that struck me (again, apparently in my dream) is the apologetic tone I see quite frequently (recent example here, lots of others floating about) that shifts the blame for our stagnant, crappy Integrated Library Systems to us, the customers, instead of to our vendors. The argument goes that we, the libraries, have asked for the wrong things for the ILS and the poor vendors (poor, poor vendors) had their hands too tied, literally tied, trying to keep up with our demands to be able to incorporate any sort of innovation in the last 15-20 years. Besides, they'd say, if they came up with something different, libraries might not want the change.

What (successful) technology company has ever relied on RFPs for their innovation? Are Google's hands tied until some customer says, "Hey, can you make a web-based 'maps' site? You know what we need? A new way to do threaded email."? How about Intel? Microsoft?

No. These companies realize that they need to innovate to survive. To stagnate or half-ass is the kiss of death. See Novell. For a more dramatic example, see Apple.

No, it's time we stop taking it like abused spouses from our vendors. You know, maybe we did overcook the porkchop and maybe we do open our mouths too much, but that's no reason to have a black eye. If a handful of the better funded libraries were to help found something like the Apache Foundation for library software, our abusive husbands might treat us as partners rather than punching bags. I think I might know a good place to look for talent.

(In truth, our rottweiler woke me up, but the dream still stands).

Sunday, November 20, 2005 Library 1.7.02-4 pre 6

I really, really hate this Library 2.0 meme for a couple of reasons.

1) All of our problems will not, in fact, be solved with AJAX and web interfaces

2) In fact many of our problems cannot be solved by technology at all (try doing interesting and meaningful and different work with the current body of MARC records out there and see what I mean)

3) This quest for 2.0 would be better served if "2.0" was a milestone on the journey to "Library 4.5" -- I mean, come on folks, let's get back into innovating.

4) I think it trivializes some actually exciting and useful work that I fear will continue to fly under the radar because it's not "Web 2.0" enough.

Maybe hype is necessary to rally the troops, but I really wish vision would get more attention.
