Dilettante's Ball: Librarians are arrogant asses

Thursday, August 11, 2005

Despite our waning patronage (both physical and virtual), librarians never cease criticizing the barbarism of the unwashed masses for not sharing their love of rich metadata.

"Dumbing down the catalog"
"I don't think it's too much to ask a student to learn what the library catalog is"
"Thousands of hits"
"Did A9 even bother to look at SRW/U?"

Let's take the first (widely used) statement. A system that can take a natural-language query and present the user, early in the result set, with many of the things they are looking for is not dumb. "Hemingway, Ernest" is dumb. Not understanding what I, the user, mean when I type "Ernest Hemingway" is dumb. We apply this standard to librarians, so why not to the catalog? A librarian doesn't require patrons to know exactly what they're looking for before helping them with a reference question, yet we expect those same patrons to form a perfect Boolean query to isolate that rare manuscript (acquired in 1963 and largely unheard of) that would be the "perfect complement to their term paper".

Number two: I don't expect a student to learn how to use a slide rule, either. They don't need to know what double-clutching is. It wouldn't be the end of the world if they had never seen a typewriter's correction ribbon. Technology makes awkward systems obsolete.

As for the "thousands of hits" meme (which Alane Wilson argued against quite convincingly), how many hits would a user get if all of our databases were searched simultaneously? And what if users are getting a much smaller set of results only because they're looking in the wrong place? I am seldom unhappy with my Google results as a starting place.

Should A9 have looked at SRW/U? Does SRW/U really make any sense whatsoever to 95% of the world outside of libraries? Why doesn't the SRW/U crowd try to work with the OpenSearch community? Why? Because we say ours is better, so the other shouldn't be trifled with. To be clear, it's possible to layer OpenSearch on top of SRU; Georgia Tech does it. Is one superior to the other? SRW/U is certainly more sophisticated. Despite what you will read to the contrary, however, OpenSearch is much, much easier to implement. If you know the metadata schema an SRW/U server returns, a simple SRU client is possible, but, as with Z39.50 before it, nothing constrains what you might get back from an arbitrary SRW/U server. OpenSearch, while certainly limited and limiting, has a somewhat different purpose from SRW/U. SRW/U is a protocol for searching for and retrieving metadata. OpenSearch is a spec for searching for and retrieving search results. This may sound redundant, but there is a nuanced difference. No matter the OpenSearch source, the results will always look the same, so they are very simple to integrate into a display (though not so simple to do anything else with). While SRW/U is definitely more versatile, transforming your results to OpenSearch has its advantages. But this is a hard sell to the library world, because the "metadata isn't rich enough".
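To make the layering concrete, here is a minimal sketch of the response side of an OpenSearch-on-SRU bridge. It is not Georgia Tech's implementation. It reads an SRU 1.1 searchRetrieveResponse whose records are in Dublin Core and rewrites it as an RSS 2.0 feed carrying the OpenSearch totalResults, startIndex and itemsPerPage elements. It assumes the SRU 1.1 and OpenSearch 1.1 namespaces and Dublin Core records, and the function name sru_to_opensearch_rss is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed: SRU 1.1 response, Dublin Core, and OpenSearch 1.1.
SRW_NS = "http://www.loc.gov/zing/srw/"
DC_NS = "http://purl.org/dc/elements/1.1/"
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("openSearch", OS_NS)


def sru_to_opensearch_rss(sru_xml, start_index, feed_title, feed_link):
    """Rewrite an SRU searchRetrieveResponse (Dublin Core records) as OpenSearch RSS."""
    root = ET.fromstring(sru_xml)
    total = root.findtext(f"{{{SRW_NS}}}numberOfRecords", default="0").strip()
    records = root.findall(f"{{{SRW_NS}}}records/{{{SRW_NS}}}record")

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Search results from {feed_title}"
    # The paging elements a client can rely on, whatever the backend.
    ET.SubElement(channel, f"{{{OS_NS}}}totalResults").text = total
    ET.SubElement(channel, f"{{{OS_NS}}}startIndex").text = str(start_index)
    ET.SubElement(channel, f"{{{OS_NS}}}itemsPerPage").text = str(len(records))

    for record in records:
        # dc:title and dc:identifier sit inside recordData, so search descendants.
        item = ET.SubElement(channel, "item")
        title = record.findtext(f".//{{{DC_NS}}}title", default="(untitled)")
        link = record.findtext(f".//{{{DC_NS}}}identifier", default="")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

Note how much of each record the transform discards: any display that understands RSS can show the result, but anything richer than a title and a link has to come from the SRU response itself.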

It's time we stopped scorning and ignoring the outside world, because they are doing fine without us. Aaron Krowne notes that a huge amount of scholarly content is freely available, which further weakens our position in society and makes it all the more important that we co-opt popular culture rather than ridicule it. Our standards are great... now let's see how they can interface with the real world.

14 Comments:

At 1:10 AM, August 12, 2005, Anonymous said...: I've been trying to say this for quite some time. The OPAC is already dumb, and it's up to the user to make up the difference. It's about making the OPAC smarter so it's not such a pain in the ass to use. Few people get it, though.
At 1:17 AM, August 12, 2005, Lorcan Dempsey said...: You may already have seen http://www.loc.gov/z3950/agency/zing/forum-june05-output/notes.html for some discussion between the SRU folks and A9 regarding OpenSearch.
At 9:07 AM, August 12, 2005, carol o said...: We largely agree, so I'm just going to pick on one of your early statements. :)

I think it's entirely possible to have rich metadata (and personally I'd love it if standard library metadata were even richer) and still have smart interfaces that aren't ridiculously demanding of users. I'm not sure you meant it this way, but the two aren't mutually exclusive. To take your particular example... Bibliographic records in the catalog could be improved, sure (and FRBR will be a nice step in the right direction), but the search mechanisms, the tools we use to search the catalog, should be able to intelligently re-form a query for "Ernest Hemingway" so it searches for "Hemingway, Ernest." (Either that, or when the 100/600/700 fields are indexed, names should be indexed in both direct and inverted order. Same diff. Whatever.) Ditto with cases such as "civilisation" vs. "civilization." If it's spelled one way in the catalog, it should be findable using alternate spellings as well. And actually, I think III's ILS has a synonym file just for this purpose.
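Carol's two fixes can be sketched in a few lines. This is a minimal illustration, not how III or any other ILS actually does it: it indexes a personal-name heading under both direct and inverted order and folds variant spellings through a small synonym table before lookup. It assumes headings of the simple "Surname, Forenames" form (no dates or other subfields), and the table contents and function names are hypothetical.

```python
# Hypothetical variant-spelling table; a real catalog would maintain a far larger one.
VARIANT_SPELLINGS = {"civilisation": "civilization", "colour": "color"}


def normalize(text):
    """Lower-case, drop commas, collapse whitespace and fold variant spellings."""
    words = text.lower().replace(",", " ").split()
    return " ".join(VARIANT_SPELLINGS.get(word, word) for word in words)


def name_keys(heading):
    """Index keys for a "Surname, Forenames" heading in direct and inverted order."""
    if "," in heading:
        surname, forenames = (part.strip() for part in heading.split(",", 1))
    else:
        *rest, surname = heading.split()
        forenames = " ".join(rest)
    return {normalize(f"{forenames} {surname}"), normalize(f"{surname} {forenames}")}


def build_index(headings):
    """Map every normalized key to the set of headings it should retrieve."""
    index = {}
    for heading in headings:
        for key in name_keys(heading):
            index.setdefault(key, set()).add(heading)
    return index


def lookup(index, query):
    """Normalize the user's query the same way the headings were indexed."""
    return index.get(normalize(query), set())
```

With this index, a query for either "Ernest Hemingway" or "Hemingway, Ernest" returns the same heading, and "civilisation" and "civilization" normalize to the same key.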

In short: Metadata=awesome. The tools we have=suck. Adapting our tools to the outside world can only improve them and therefore=nifty. And in far fewer words=we agree.
At 9:37 AM, August 12, 2005, Ross said...: Lorcan, thanks for pointing me towards this. I hadn't seen it (I've only recently joined the ZNG list). After reading it, I'm still not sure how I feel about this topic, but I'm glad that it was at least mentioned. I note that, since the Koha example appears on that page, GT already had its OpenSearch-to-SRU bridge working (and advertised).

Carol, you're right, I don't think they're mutually exclusive, either. I think there are places for a lot of metadata... there are other places where I just want to be able to consistently display search results. It's the extremely rich metadata (although, duly noted, not always as rich as we'd hope) that makes this possible. I think the point I'm making is that we need to abstract the user from the metadata a bit. We can expose it if we want to, but it's not necessary in all cases (especially if the result points back to the full record).
At 12:22 AM, August 15, 2005, Anonymous said...: I agree with your point about how complicated the library profession has made searching, especially in the catalog. Searches should look more like "Ernest Hemingway", not "Hemingway, Ernest". Yes, dumb.

At the same time, there are different levels of information retrieval. Some are simple. The results do not have to be exact, nor do they have to be comprehensive. Day-to-day information retrieval falls into this category most of the time. On the other hand, there is scholarship -- the fodder of theses and dissertations. This kind of information retrieval is more demanding. True, it is not required by most people, but for the people to whom it is required the more sophisticated search and retrieval techniques really are imperative.

SRW/U is a search and retrieve protocol that can easily handle both day-to-day searching tasks and the more specialized ones, and SRW/U is only as complicated as one makes it. OpenSearch, at the present time, only supports free-text searching using single words or phrases, and it is not possible to limit search results to specific fields/indexes. Nor is it possible to create sophisticated queries. Without such features, the end user must rely on a really "intelligent" back end to read the user's mind and then rank and present results.

It is untrue that SRW/U is designed to return only metadata. Yes, in most instances SRW/U results include only metadata, but there is absolutely no reason why results cannot include entire records or even entire texts. The results of SRW/U searches are limited only by what can be stuffed into an XML stream.

Nor do I agree that OpenSearch is easier to implement. Both protocols take an input. The input is encapsulated into a URL and sent to a server. The server interprets the input, processes it, and returns the results as an XML stream. Both systems require some sort of client to send the query and transform the results. Both systems require some sort of server to interpret the query and search an index. Just because OpenSearch has fewer options does not mean it is easier to implement. Conversely, just because SRW/U has more options does not mean it is more difficult.

--
Eric Lease Morgan
University Libraries of Notre Dame
At 1:26 PM, August 15, 2005, Ross said...: Eric, you keep banging this drum, but I'm still not buying it.

OpenSearch as implemented via A9.com allows no sophisticated searching, but this is not a limitation of the spec. In fact, the spec is pretty ambiguous on this front. Our OpenSearch server is more than happy to receive CQL in the query string and you'll get back the same result set that you'd get if you tried the same query against the SRU server via any other method.
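The pass-through Ross describes can be sketched as a simple parameter mapping. This is an illustration under assumptions, not the Georgia Tech code: the endpoint URL and function name are hypothetical, and the SRU side uses the SRU 1.1 searchRetrieve parameter names. The OpenSearch searchTerms value is forwarded untouched as the SRU query, so a plain keyword and a full CQL expression both reach the server exactly as typed.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; substitute your own server's base URL.
SRU_BASE = "http://sru.example.org/catalog"


def opensearch_to_sru(search_terms, start_index=1, count=10, schema="dc"):
    """Map OpenSearch-style request parameters onto an SRU 1.1 searchRetrieve URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": search_terms,        # passed through unchanged: keywords or CQL
        "startRecord": start_index,   # OpenSearch startIndex and SRU startRecord both default to 1
        "maximumRecords": count,
        "recordSchema": schema,
    }
    return f"{SRU_BASE}?{urlencode(params)}"
```

Because nothing is rewritten on the way through, the OpenSearch interface returns the same result set as a direct SRU request with the same CQL.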

We use OpenSearch as an output format because it's a lot easier for Joe Blogger (and Jane Miscellaneous Internet Content Provider) to accommodate RSS in their random web site than it is to handle whatever the output will be from the various SRU servers they may want to link to.

I am under no circumstances endorsing OpenSearch as a metasearch alternative. I am, however, trying to spread the word that it's not a half-bad way to get our collections exposed to the 99% of the world that isn't directly employed by a library. There is an ever-expanding number of RSS clients and toolkits out there, making simple OpenSearch clients possible for the outside world. To date I have seen two simple SRU clients (yours and Mike Taylor's): one requires an XSLT stylesheet, and the other requires either a stylesheet or some sort of programmatic intervention before it can easily be included in a web site. Neither can easily compensate for varying metadata between SRU hosts. This is not exactly the way to open ourselves up to library-friendly non-libraries.
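To illustrate the point about RSS toolkits, here is a minimal sketch of the kind of client a blogger might write. It reads any OpenSearch RSS response, whichever SRU server or search engine produced it, and renders a short HTML list. It assumes the OpenSearch 1.1 namespace and plain RSS 2.0 items with a title and link; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"  # assumed OpenSearch 1.1 namespace


def read_results(rss_xml):
    """Return (total_results, [(title, link), ...]) from an OpenSearch RSS response."""
    channel = ET.fromstring(rss_xml).find("channel")
    total = int(channel.findtext(f"{{{OS_NS}}}totalResults", default="0"))
    items = [(item.findtext("title", default=""), item.findtext("link", default=""))
             for item in channel.findall("item")]
    return total, items


def render_html(rss_xml, limit=5):
    """Render the first few results as an HTML list for a sidebar or blog post."""
    total, items = read_results(rss_xml)
    lines = [f"<p>{total} results</p>", "<ul>"]
    for title, link in items[:limit]:
        # Escape both values, since titles and URLs come from a remote server.
        lines.append(f'<li><a href="{escape(link)}">{escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

The same two functions work unchanged against every OpenSearch source, which is exactly what a client written against varying SRU metadata cannot promise.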

What I don't understand is why this has to be a zero-sum game. What are we giving up by transforming our search results to a format that others can use?
At 2:57 PM, August 15, 2005, Anonymous said...: As an RSS-related follow-along...

I was recently in an OPAC committee meeting where RSS was brought up as a helpful way of notifying patrons of new items (basically a saved search). One individual was concerned about this because of the difficulty he saw in using RSS.

The next day I installed the IE7 beta and found that RSS is integrated into the browser. Upon further investigation I found that the next version of Windows (currently named "Vista") will integrate RSS into the OS -- you'll see your feeds on the desktop (i.e., Active Desktop that works). Microsoft sees RSS as _as important_ as web browsing.

Given that level of accessibility, usability and ubiquity, I think we would be sorely remiss not to integrate RSS into our core services (where appropriate).

I personally am quite pleased with the OpenSearch project and see it as a nice, easy way to add needed functionality to my own projects. I'll likely be building SRU/W interfaces as well -- where needed -- but don't see this as an either/or situation. I'm more than glad to run as many interfaces to the system as is reasonable/supportable -- trapped data is worthless data.
At 8:27 AM, August 16, 2005, Anonymous said...: RSS is too narrowly defined to handle the widest spectrum of data types necessary for scholarly research. Yes, RSS can accommodate much, if not most, of people's information needs, but RSS is not much richer than Dublin Core, and if I want to share content that is better described by more than title, creator, description, and subjects, then I will need something more robust. It would be difficult to describe datasets, paintings, music, maps, flowers, furniture, computer programs, etc. accurately and thoroughly with RSS.

SRW/U clients accept SRW/U output and can transform the results into RSS. See the Ockham Alerting Service as an example. Yes, you need XSLT to transform SRW/U results, just as you need something to read MARC records in communication format before you can save them in your database. You also need a computer program to read RSS streams -- an RSS reader. SRW/U is simply more expressive in its query syntax and more flexible in its output options.

I don't see why I should run two different services when one will satisfy more needs -- SRW/U.

--
Eric Lease Morgan
University Libraries of Notre Dame
At 10:00 AM, August 16, 2005, Ross said...: Eric, you have not yet explained why this is a zero-sum game. Neither I nor anybody else I know who has implemented or toyed with OpenSearch is arguing for OpenSearch over SRU.

As I have repeatedly stated, we use it as an alternative output format for our SRU server.

If you can find a large subset of examples of lay web services that give one iota of a damn about a "wide spectrum of data types", I will:

A) be shocked
B) point them to our SRU server

I have no problem running two servers (although I'm only really running one... since as I've said, OpenSearch runs on top of SRU), if that means I can potentially get my collections out to more people.

Hell, I might even run three.

Or four.

Eventually idealism has to give way to pragmatism.
At 8:25 AM, August 17, 2005, Anonymous said...: Okay, I can live with that.

--
Eric Lease Morgan
At 9:06 PM, August 17, 2005, Anonymous said...: I also don't see the OpenSearch vs. SRU argument. As stated above, OpenSearch, as it's currently pushed, is more of a delivery format. If people can subscribe to search results, which I think they should be able to, then I think OpenSearch is a nice format to do it with. I had a simple script that created RSS feeds, and it only took a few extra lines to make it OpenSearch compatible.
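The "few extra lines" can look something like this. It is a minimal sketch, assuming an existing RSS 2.0 feed with a single channel element and the OpenSearch 1.1 namespace. The function name and its arguments are hypothetical, and the values must come from whatever search backend already produced the feed.

```python
import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"  # assumed OpenSearch 1.1 namespace
ET.register_namespace("openSearch", OS_NS)


def add_opensearch(rss_xml, total_results, start_index, items_per_page):
    """Add the OpenSearch paging elements to an existing RSS 2.0 feed."""
    rss = ET.fromstring(rss_xml)
    channel = rss.find("channel")
    extras = (("totalResults", total_results),
              ("startIndex", start_index),
              ("itemsPerPage", items_per_page))
    # Insert at the top of the channel so the paging data precedes the items.
    for position, (name, value) in enumerate(extras):
        element = ET.Element(f"{{{OS_NS}}}{name}")
        element.text = str(value)
        channel.insert(position, element)
    return ET.tostring(rss, encoding="unicode")
```

Everything already in the feed is left as it was, which is why this works as an output option on top of any existing search engine.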

That I have a simple search interface was my choice and was not dictated by OpenSearch. As others have said, you can have as complex a back end as you want. I think the draw of OpenSearch is that it's fairly easy to implement as an output mechanism for whatever system you happen to have running. You can see this by browsing some of the "channels" currently available.

I do see the argument that OpenSearch doesn't have all of the URL options some people would like and that is a valid concern. As pointed out in the link above there are some interesting arguments for advanced sorting, etc. Right now I think OpenSearch is pushed as a delivery format and not a separate service. I think very few people are writing separate search backends for OpenSearch and are instead just modifying their current engines to allow the output in that format.

Again, if you offer RSS for searches, I see no reason not to offer RSS with the OpenSearch extensions, no matter what back end you choose.
At 10:53 AM, August 19, 2005, Anonymous said...: Hi -- very interesting discussion!

I just wanted to add that I'm definitely inclined to figure out a way to make the query side of future versions of OpenSearch more flexible.

In the scenario I am currently exploring, a search engine would be able to declare (via the OpenSearch description document) any number of supported query types. One of those query types might be the "native" (and simple) OpenSearch query syntax. Another might be SRU. Still another might be XQuery.

We deliberately kept OpenSearch simple, but no one objects to figuring out ways of enabling different communities in extending it for special niche purposes. As it is, the simple query format and the RSS results work very well for a rather large number of search engines. But that's not to say that the framework shouldn't be leveraged on other, more sophisticated, scenarios.

And please check out our A9 Developer Blog, where I write often about OpenSearch -- and please add your thoughts about where we can take this project in the future.

Cheers,

-DeWitt (the guy who shepherds the OpenSearch spec)
At 11:21 PM, August 19, 2005, Ross said...: DeWitt, these are interesting proposals (I learned of your blog about a month ago and have been following it, btw). I'm not sure I have any good recommendations for how a more complex search should work. Maybe all searches should default to keyword anywhere (there are worse defaults out there), and then another argument could be passed that allows a more granular search -- if the request syntax/schema is defined somewhere, it's much more likely to be translatable into whatever my particular search supports.
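Ross's suggestion (a keyword default plus an optional, more granular argument) could be translated into CQL along these lines. This is a hypothetical sketch, not part of any OpenSearch version: the fields argument is an invented parameter, the index names in any call are assumptions about what a given server supports, and a bare term with no index is left for the server to interpret with its default index, as CQL allows.

```python
def cql_term(value):
    """Quote a term for CQL, backslash-escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(keywords, fields=None):
    """Combine default keywords with optional index-specific clauses using 'and'.

    A bare term carries no index, so the server applies its default index;
    each (index, value) pair in the hypothetical fields mapping narrows the search.
    """
    clauses = [cql_term(keywords)] if keywords else []
    for index, value in (fields or {}).items():
        clauses.append(f"{index} = {cql_term(value)}")
    return " and ".join(clauses)
```

Quoting every term matters here: without it, a title such as "old man and the sea" would have its "and" parsed as a Boolean operator.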

It's very cool that you would be interested in what the library (and other) communities would like from OpenSearch. I really wish the reverse were true (but you've obviously read this post). I, personally, think it's important to keep OpenSearch simple (although some stricter guidelines might be useful), but I would really like to see it maintain its current level of compatibility with SRU (because I don't want to make a lot of work for myself).
At 6:49 AM, August 22, 2005, Anonymous said...: I don't know what you're saying, but you say it so well.

Your pal,

Gaughin
