Dilettante's Ball

Exposing ourselves - what libraries can learn from the flasher community
Last Thursday/Friday I decided to put my development time where my mouth is and actually try to fiddle around with A9's Opensearch. Earlier in the week, I had told my boss that I wanted to do this and that I didn't think it would take much time or energy.

Her reaction was:
  1. What is Opensearch?
  2. Ok, now that I know what Opensearch is, why on earth would anybody choose to search Georgia Tech's catalog in their web search?
At the time, my argument centered around the fact that I was only using our catalog because it would be a fairly simple exercise to get working with Opensearch. If it seemed viable and simple to do, the real project would be to get our DSpace repository searchable this way. Since my proposal wasn't really gaining much momentum with her, I tried to sweeten the pot by noting that our environment was set up exactly like the Library of Congress's (SRW/U via Yazproxy to a Voyager database), and if they found it useful... well, wouldn't it make Tech look good to have the LoC using something we developed?

Ok, I realize this definitely makes me seem like some sort of Iago whispering weaselly ideas into Othello's ear, but there is some truth to it, and if the end product is a success (or even if the LoC is passingly interested in it), I think there's some "good to be gotten". Having an agenda isn't necessarily wrong. Using other people's desire to increase our profile in the community to further that agenda probably is, though. Oh well... enough of my cravenness.

So, in an effort to expand my horizons a bit and to try to make this a little more portable (you know, just in case the Library of Congress is interested...), I decided to try to develop this Opensearch to SRU thingy in python. Dan and Ed have been advocating python for a while now (as has my friend Tom... apparently python is heavily used in XBox hacks). The #code4lib sprint at ALA will probably be python-based, so I thought I'd better start getting familiar with the language a bit. I had dived into python a couple of months ago to try my hand at unalog development, but reality stepped in and dragged me back to PHP. I always do better if I have an actual objective, anyway.

By the time I left for a mini-vacation (ah, so therapeutic!) on Friday, I had a mostly working prototype (thanks to Dan for some python pointers). In fact, it was a completely working prototype except that it wasn't encoding XML entities (so, keep it in lower ASCII, folks!), which is, of course, less than ideal. I should be able to fix that today.
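For the entity problem, the standard library already does the work. Here is a minimal sketch, assuming the prototype builds its response by string formatting; the `rss_item` helper and the RSS-style item layout are my own illustration, not the actual prototype code.

```python
from xml.sax.saxutils import escape


def rss_item(title, link):
    """Build one RSS-style <item>, escaping &, < and > in the text.

    Hypothetical helper: the element layout is an assumption made
    for illustration, not taken from the real prototype.
    """
    return "<item><title>%s</title><link>%s</link></item>" % (
        escape(title),
        escape(link),
    )


def to_utf8(xml_text):
    """Encode the finished document as UTF-8 so non-ASCII titles survive."""
    return xml_text.encode("utf-8")
```

Escaping each value at the point where it is dropped into the markup, and encoding once at the very end, keeps the two concerns separate.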

Right now the query requires either "keyword anywhere" or minimal knowledge of CQL. The nice thing about CQL is that it actually makes quite a bit of sense.
  • author=Hemingway
  • subject=biology and title="Introduction to biology"
This syntax either needs to be made apparent in the Opensearch column description (less than ideal) or it needs to be translated from however A9 would define this sort of thing.
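The translation step could be sketched roughly like this. It assumes a very crude rule (anything containing "=" is already CQL, anything else is a keyword search sent as a quoted bare term) and a made-up endpoint, `SRU_BASE`. The parameter names are the standard SRU searchRetrieve ones; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real SRU server address.
SRU_BASE = "http://sru.example.edu/voyager"


def to_cql(user_input):
    """Turn whatever the user typed into a CQL query.

    Crude heuristic (an assumption, not a real parser): input that
    contains "=" is passed through as CQL; anything else becomes a
    quoted bare term, which the server matches against its default index.
    """
    text = user_input.strip()
    if "=" in text:
        return text
    return '"%s"' % text.replace('"', '\\"')


def sru_url(user_input, start=1, count=10):
    """Build an SRU searchRetrieve URL for the given search box input."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": to_cql(user_input),
        "startRecord": start,
        "maximumRecords": count,
    }
    return SRU_BASE + "?" + urlencode(params)
```

With something like this in front of the SRU server, a user who knows CQL can type `author=Hemingway`, and everyone else can just type words.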

On my train ride in this morning, I began to think about the conversation with my boss about this project again. Even if our potential user base is relatively small, isn't this exactly the sort of thing we want them to be able to do? Whatever search engine someone is using, wouldn't we want to be able to show relevant resources from our own collection alongside its results? Yes, the user would need to add the column (strike one!) to their A9 search results (strike two! They're at Google!), but if it's easy enough to implement, why wouldn't we offer this? A metasearch would certainly be a better way to expose our collections, but I don't have access to that right now and I haven't figured out how A9 deals with access-controlled content, anyway.

One of the goals of the redesigned OPAC project is to create human-parseable, crawler-friendly urls so these avenues of discovery can be opened. There is no good technical reason for web opacs to place the session information in the url (my guess is that it is for backwards compatibility with cookieless browsers).

Wouldn't these urls make a lot more sense:
  • http://gil.gatech.edu/isbn/0632044160
  • http://gil.gatech.edu/author/Wilson, David L./title/Introduction to biology
  • http://gil.gatech.edu/issn/1465-7392
  • http://gil.gatech.edu/title/Nature cell biology
urlencoded, of course.
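A rough sketch of how such urls could be generated and then mapped back to a CQL query follows. The helper names are hypothetical, and it assumes the path segment names (isbn, author, title, issn) are also valid index names on the SRU server, which I haven't verified.

```python
from urllib.parse import quote, unquote


def friendly_url(base, *pairs):
    """Build /index/value/index/value... with each value urlencoded."""
    parts = []
    for index, value in pairs:
        parts.append(index)
        # safe="" so that "/" and "," inside a value are encoded too
        parts.append(quote(value, safe=""))
    return base.rstrip("/") + "/" + "/".join(parts)


def path_to_cql(path):
    """Map a friendly path back to CQL, joining the pairs with "and".

    Assumes segment names double as CQL index names (an assumption).
    """
    segments = [s for s in path.split("/") if s]
    pairs = zip(segments[0::2], segments[1::2])
    return " and ".join(
        '%s="%s"' % (index, unquote(value)) for index, value in pairs
    )
```

For example, `path_to_cql("/isbn/0632044160")` gives `isbn="0632044160"`, which could be handed straight to the SRU server.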

Things like this seem like such simple ways to give the outside world a little better access to our collections.


At 1:15 AM, December 26, 2006, Anonymous said...

That is so great!
It's also possible to run python in parallel on SMP: Parallel Python

