I drove to work today. Normally, I take the train, but for several stupid reasons, I decided to drive.
Atlanta drivers have a disdain for "rules" and "traffic laws" that makes me want to scream. They will pass you in an exit-only lane only to hold up that lane when they try to merge back into traffic 3 cars ahead. Four or five cars will plow through the intersection after the light has changed. The turn signal is apparently a sign of weakness.
So, because some fathead wants to get 6 car lengths ahead, we all sit and suffer in some of the worst traffic in the country. By actually following the rules, you:
- Are forced to sit in the traffic caused by those who break the rules
- Are probably more of a liability on the road, because when everyone else is doing the "wrong" thing, you become the unpredictable one by being different.
Which brings me to metasearch (and a jarring segue). I am currently not sitting in a meeting in Macon to decide which metasearch product the state is going to go with. And, really, it doesn't matter that much.
Although I won't name any names, we were down to two candidates:
- A "traditional" metasearch that uses standards like Z39.50, SRW/U, etc. to search
- A metasearch that is based on screen scraping
While the research libraries in the state were leaning towards #1, there was something gnawing at us that was hard to deny.
#2 really mopped the floor with #1 as a federated search engine. Not only is it dramatically faster, it can search over 95% of our databases (as opposed to #1, which is in the 30-40% range... and searches even those slowly).
Still, there is other functionality in #1 that makes it desirable (mostly revolving around workflow, integration into an academic environment, and integration with our link resolvers). As a cross-database searcher, however, #2 is clearly the winner.
What this brings me to is... how did we get to this point? Why is it so much easier, and why do we get better results, when we "break the rules"? We have invested a lot of time, thought, and energy into creating our standards... how can it possibly be easier to screen scrape results pages than to use the tools we have created?
I blame libraries first. Securing access via Z39.50, XML gateway, API, etc. has never been a particularly high priority. Metasearch is not only a "systems" issue; it also needs to be looked upon as a collection development issue. If two vendors offer "Compendex" and only one of them makes it available through means outside the native web interface, that vendor really should be considered the more desirable option, unless its native web interface is "the suck" (technical term). Along with a whole host of other factors, of course. Still, I think non-native access is a very low priority in collection development decisions.
I blame the vendors next. First of all, many of them don't offer any sort of alternative access. Secondly, when they do, it's an afterthought.
I have been toying with robcaSSon's federated search project, unhelpfully supplying suggestions when he asks #code4lib for help on particular problems. What Rob has written so far is very cool (though unfinished and therefore not publicly available), but it struck me how slowly it searched Academic Search Premier and Business Source Premier (that's Rob's "canned" query -- those two dbs with the keyword search "hamid karzai and heroin").
In the native interface, searching across those two dbs is nearly instantaneous... it's basically just waiting for the browser to render the tables that takes any time. In Rob's interface, the same search takes 5+ seconds (and this is with no load on Rob's end, since it's not in production... so real-world performance would probably be worse). Now, as we learned from metasearch product #1, this is sadly respectable in the metasearch arena. It's still bad, though, and I wanted to figure out why it took so much longer.
Using Index Data's handy yaz-client, I fired up a Z39.50 session to EBSCO's Z39.50 server to investigate. Searching "hamid karzai and heroin" took a little over 4 seconds. Hmm. 4 seconds?! So I did a search for "female genital mutilation". 0.2 seconds. Hmm. I did the original search again. 0.05 seconds. Wow. I exited out of yaz-client, reopened the connection, and did it all again. Basically the same pattern.
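If you want to poke at this yourself, the session looked roughly like the sketch below. The host, port, and database name are placeholders (EBSCO gives you the real connection details with your subscription), and the PQF form of the query is just my guess at how the keyword search maps; yaz-client's `open` and `find` are all you need to see the timing for yourself.

```
$ yaz-client
Z> open tcp:z3950.example.com:210/AcademicSearchPremier
Z> find @and "hamid karzai" heroin
Z> find "female genital mutilation"
Z> find @and "hamid karzai" heroin
Z> quit
```

The first find after the open is the slow one; every find after that on the same connection comes back almost immediately.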
So, apparently, it's the first search in a session that's the problem. And that sucks, because inherently every search in a metasearch is the first search in the session. Certainly some connections can be cached, but that raises the complexity of the application and, no matter what, not everything can be cached all the time.
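To make the "caching connections" point concrete, here's a minimal sketch in Python of what a metasearch app would have to bolt on. The connect function is deliberately left as a parameter because it stands in for whatever Z39.50 client library you happen to use (this isn't anyone's real API); the point is just the extra machinery of holding, reusing, and expiring a session per target.

```python
import time

MAX_IDLE = 300  # assumption: drop a cached session after 5 idle minutes

class ConnectionCache:
    """Keep one open Z39.50 session per target so only the very first
    search against that target pays the session-setup penalty."""

    def __init__(self, connect):
        # `connect(host, port, db)` is a hypothetical stand-in for your
        # Z39.50 library's session-opening call.
        self._connect = connect
        self._conns = {}  # (host, port, db) -> (connection, last_used)

    def get(self, host, port, db):
        key = (host, port, db)
        conn, last_used = self._conns.get(key, (None, 0.0))
        if conn is None or time.time() - last_used > MAX_IDLE:
            # Missing or stale: pay the expensive first-search cost now.
            conn = self._connect(host, port, db)
        self._conns[key] = (conn, time.time())
        return conn
```

And even this ignores targets that drop idle sessions on their end, per-user authentication, and connections that die mid-search -- which is exactly the complexity I'm complaining about.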
Now, yazproxy would be perfect for dealing with this. It could maintain the session information and at the same time transform the output to XML. Everybody wins! Well, except I can't get it to work. I guess that's a bit of a hindrance...
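For the record, what I'm trying to do is roughly this: run yazproxy in front of the target and point the metasearch clients at the proxy, which keeps the backend Z39.50 sessions alive between requests (and can massage records into XML along the way). The target address below is a placeholder, and obviously I haven't verified this end to end, since I can't get the thing running.

```
# Listen on port 9000 and forward to the real Z39.50 target (placeholder address).
$ yazproxy -t tcp:z3950.example.com:210/SomeDatabase @:9000
```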
So, again, by trying to do right and follow the standards our community has set, we are left behind by the sloppy, inexact searching of the screen scrapers. Ultimately, though, we all lose, because screen scraping can only go so far. The richness of services we can layer on top of a screen scraper has far less depth than what we can build on a structured search.
And laying on the horn doesn't really help...