sru xslt support

sru xslt support

Bruce D'Arcus-3
OK, I'm working on SRU support on my end (Matthias has been working
on it on his).  Here's what I'm looking at:

      <xsl:when test="$bibdb='sru'">
        <xsl:copy-of
          select="doc(concat($server_url,
          'version=1.1&amp;query=bib.citekey+any+',
          $citekeys,
          '&amp;operation=searchRetrieve&amp;recordSchema=mods&amp;recordPacking=xml&amp;startRecord=1&amp;maximumRecords=9999',
          $authentication))"/>
      </xsl:when>

So the citekeys variable is the same as the existing one, and I'm
thinking I need to add two more: server_url (for the base url) and
authentication.

Any thoughts?  I want to keep things flexible, but also easy to handle
(and so simple).

Bruce
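
For comparison, here is a minimal sketch of the same URL construction outside XSLT, written in Python. The helper name build_sru_url, the example host, and the choice to append "?" to the base URL are assumptions made for illustration; the parameter names are the ones used in this thread. It is meant only to show how the pieces (base URL, cite keys, optional authentication token) fit together, not how any particular implementation does it.

```python
from urllib.parse import urlencode

def build_sru_url(server_url, citekeys, auth_token=None):
    """Return a searchRetrieve URL for the given cite keys (hypothetical helper)."""
    # CQL query: match any of the cite keys as whole words.
    cql = 'bib.citekey any "{}"'.format(" ".join(citekeys))
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("query", cql),
        ("recordSchema", "mods"),
        ("recordPacking", "xml"),
        ("startRecord", "1"),
        ("maximumRecords", "9999"),
    ]
    if auth_token is not None:
        # Extension parameter name as it appears later in this thread.
        params.append(("x-info-2-auth1.0-authenticationToken", auth_token))
    # urlencode escapes spaces, quotes and other characters unsafe in URLs.
    return server_url + "?" + urlencode(params)

# The host below is a placeholder, not a real SRU server.
url = build_sru_url("http://example.org/sru.php", ["Smith1992a", "Smith1992b"])
print(url)
```

Building the parameter list first and encoding it in one step avoids hand-escaping each fragment, which is where stray quotes and ampersands tend to creep in.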



Re: sru xslt support

Matthias Steffens
On 29-May-2005 at 21:05 -0400 Bruce D'Arcus wrote:

>       <xsl:when test="$bibdb='sru'">
>         <xsl:copy-of
>           select="doc(concat($server_url,
>           version=1.1&query=bib.citekey+any+,
>           $citekeys, &operation=searchRetrieve&recordSchema=mods \
>           &recordPacking=xml&startRecord=1&maximumRecords=9999,
>           $authentication))"/>
>       </xsl:when>
>
> So the citekeys variable is the same as the existing one

As we've discussed earlier, I think that individual cite keys must be
enclosed by anchors to provide for exact field matches if the 'any'
relation is used:

     "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"

Matthias



Re: sru xslt support

Dr Robert Sanderson
On Mon, 30 May 2005, Matthias Steffens wrote:

> On 29-May-2005 at 21:05 -0400 Bruce D'Arcus wrote:
> As we've discussed earlier, I think that individual cite keys must be
> enclosed by anchors to provide for exact field matches if the 'any'
> relation is used:
>     "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"

I don't think it makes any difference, actually.

any, all and adjacency are word relations.  If all you have in the field
is a single word, then it will act like an exact equality relation.

If you have a cite key with a space, you'll still end up with potentially
incorrect results from:

     foo all "^the first key^  ^a second key^"

which means:

     foo = ^the and foo = first and foo = key^ and foo = ^a and foo =
     second and foo = key^

Smith1992a will not match Smith1992abcdef.  The word anchors only say
'this word must be at the beginning or end of the field'.

Rob

       ,'/:.          Dr Robert Sanderson ([hidden email])
     ,'-/::::.        http://www.csc.liv.ac.uk/~azaroth/
   ,'--/::(@)::.      Dept. of Computer Science, Room 805
,'---/::::::::::.    University of Liverpool
____/:::::::::::::.
I L L U M I N A T I  Cheshire3 IR System:  http://www.cheshire3.org/



Re: sru xslt support

Matthias Steffens
On 30-May-2005 at 11:02 +0100 Dr Robert Sanderson wrote:

> > On 29-May-2005 at 21:05 -0400 Bruce D'Arcus wrote:
> > As we've discussed earlier, I think that individual cite keys must be
> > enclosed by anchors to provide for exact field matches if the 'any'
> > relation is used:
> >     "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"
>
> I don't think it makes any difference, actually.
>
> any, all and adjacency are word relations.  If all you have in the field
> is a single word, then it will act like an exact equality relation.

True, but what if your query contains cite keys that would match
multiple keys in your database, like:

  any "Smith1992 Mitchell1995"

and you have the following cite keys in your database:

  "Smith1992"
  "Smith1992Univariate"
  "Smith1992Multiple"
  "Mitchell1995"
  "JeffriesMitchell1995"

Without the anchors, the query keys won't be unique. That's a serious
problem since we can't (and shouldn't) make assumptions about other
people's cite key syntax.

> If you have a cite key with a space, you'll still end up with
> potentially incorrect results from:
>
>      foo all "^the first key^  ^a second key^"
>
> which means:
>
>      foo = ^the and foo = first and foo = key^ and foo = ^a and
>      foo = second and foo = key^

Ok, I see. Btw, for me one of the most confusing things in CQL is
that spaces mean completely different things depending on context
and on which relation is used. That concept is pretty different
from all other search languages I've come across so far, and I find
it hard to grasp. Of course, that doesn't mean it's bad ;-), it's
just easy to get trapped by that.

I think that when querying for multiple cite keys we should then not
use the 'any' relation but multiple 'exact' statements connected
with 'or' instead (which isn't as neat as using 'any "..."' since
it gets pretty wordy :-/).

This problem was also the reason why I was asking for an 'anyexact'
relation which would ease things for us quite a bit, IMHO.

Matthias



Re: sru xslt support

Matthias Steffens
In reply to this post by Dr Robert Sanderson
On 30-May-2005 at 11:02 +0100 Dr Robert Sanderson wrote:

> On Mon, 30 May 2005, Matthias Steffens wrote:
> > I think that individual cite keys must be enclosed by anchors to
> > provide for exact field matches if the 'any' relation is used:
> >     "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"

> any, all and adjacency are word relations.  If all you have in the
> field is a single word, then it will act like an exact equality
> relation.

Maybe I'm not understanding it correctly, here. From the CQL
information on the SRW web site I learned that:

  bib.citekey any "Smith1992 Mitchell1995"

would resolve to:

  bib.citekey="Smith1992" and bib.citekey="Mitchell1995"

and that the equals sign means "contains". That, in turn, means that
other cite keys which contain the search term would be matched as
well.

If search terms used with the 'any' relation only match whole
words, then the explanations on the web site are somewhat misleading,
IMHO.

And how does CQL define a word? What about international characters
or a hyphen ('-')?

Thanks, Matthias



Re: sru xslt support

Matthew J. Dovey
In reply to this post by Bruce D'Arcus-3
> Maybe I'm not understanding it correctly, here. From the CQL
> information on the SRW web site I learned that:
>
>   bib.citekey any "Smith1992 Mitchell1995"
>
> would resolve to:
>
>   bib.citekey="Smith1992" and bib.citekey="Mitchell1995"

Not quite - try:

bib.citekey = "Smith1992" or bib.citekey = "Mitchell1995"
 
> and that the equals sign means "contains". That, in turn, means that
> other cite keys which contain the search term would be matched as
> well.

I don't think so, from the SRW/CQL pages:

"= is used:
For word adjacency, when the term is a list of words. That is to say
that the words appear in that order with no others intervening.
Otherwise, for exact equality of value"

So Smith1992aabb would not be a match for

bib.citekey = "Smith1992" or bib.citekey = "Mitchell1995"

Or

bib.citekey any "Smith1992 Mitchell1995"


For Smith1992aabb to be a match you need the stem modifier

So

bib.citekey =/stem "Smith1992" or bib.citekey =/stem "Mitchell1995"

Or

bib.citekey any/stem "Smith1992 Mitchell1995"

 
> If search terms used with the 'any' relation do only match whole
> words, then the explanations on the web site are somehow misleading,

Could you point out where the misleading bits are? As far as we are
aware it is fairly clear, however we are always open to
corrections/amendments/clarifications etc!

Matthew Dovey
(Technical Editor - SRW)
Oxford University



Re: sru xslt support

Bruce D'Arcus
Matthias -- are you still on digest?  If yes, it might be good to
change that for these kinds of discussions.

Anyway, if someone could settle the syntax I should be using, that'd be
great.  I was so far assuming:

        bib:citekey+any+"^Smith1992a^+^Smith1992b^"

The anchors are trivial to add of course.

So, any issues with my other decisions?

Bruce




Re: sru xslt support

Mike Taylor
In reply to this post by Matthias Steffens
> Date: Mon, 30 May 2005 12:38:50 +0200
> From: Matthias Steffens <[hidden email]>
>
>>> I think that individual cite keys must be enclosed by anchors to
>>> provide for exact field matches if the 'any' relation is used:
>>> "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"
>>
>> any, all and adjacency are word relations.  If all you have in the
>> field is a single word, then it will act like an exact equality
>> relation.
>
> Maybe I'm not understanding it correctly, here. From the CQL
> information on the SRW web site I learned that:
>
> bib.citekey any "Smith1992 Mitchell1995"
>
> would resolve to:
>
> bib.citekey="Smith1992" and bib.citekey="Mitchell1995"

(That "and" should be "or" -- presumably a typo?)

> and that the equals sign means "contains".

Yes.

> That, in turn, means that other cite keys which contain the search
> term would be matched as well.

No.  "=" is doing word matching, not substring matching.

        bib.citekey = Smith1992

will find "Smith1992" but _not_ "Smith1992a".  If you _want_ to find
those, you'll need to use a wildcard:

        bib.citekey = Smith1992*

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  Join the ASCII ribbon campaign against HTML mail -
         http://arc.pasp.de/

--
Listen to free demos of soundtrack music for film, TV and radio
        http://www.pipedreaming.org.uk/soundtrack/




Re: sru xslt support

Matthias Steffens
On 30-May-2005 at 12:34 +0100 Mike Taylor wrote:

> > bib.citekey any "Smith1992 Mitchell1995"
> > would resolve to:
> > bib.citekey="Smith1992" and bib.citekey="Mitchell1995"
>
> (That "and" should be "or" -- presumably a typo?)

Oops, yes that was a typo.

> > and that the equals sign means "contains".

> > That, in turn, means that other cite keys which contain the
> > search term would be matched as well.
>
> No.  "=" is doing word matching, not substring matching.

Ah, ok. I somehow got this wrong when reading the CQL documentation.
Sorry for the confusion. Then the anchors aren't necessary, of course.

Thanks, Matthias
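
The word-matching behaviour settled on above can be illustrated with a small Python sketch. This is a toy model, not a CQL implementation: it assumes the engine splits a field into words on whitespace only, and the function names words, matches_any and matches_all are made up for this example. The cite keys are the ones from the earlier message.

```python
def words(field):
    # Assumption: the engine splits fields on whitespace only.
    return field.split()

def matches_any(field, term):
    """True if the field contains at least one word of the term."""
    field_words = set(words(field))
    return any(w in field_words for w in words(term))

def matches_all(field, term):
    """True if the field contains every word of the term."""
    field_words = set(words(field))
    return all(w in field_words for w in words(term))

citekeys = ["Smith1992", "Smith1992Univariate", "Smith1992Multiple",
            "Mitchell1995", "JeffriesMitchell1995"]

# Whole-word comparison: longer keys that merely contain the term do not match.
hits = [k for k in citekeys if matches_any(k, "Smith1992 Mitchell1995")]
print(hits)  # ['Smith1992', 'Mitchell1995']
```

Because each cite key is a single word, 'any' behaves like a list of exact comparisons joined by 'or', which is why the anchors add nothing here.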



Re: sru xslt support

Matthias Steffens
In reply to this post by Bruce D'Arcus
On 30-May-2005 at 7:33 -0400 Bruce D'Arcus wrote:

> Matthias -- are you still on digest?  If yes, it might be good to
> change that for these kinds of discussions.

Yes, you're correct and I should change that. (especially since the
digest sometimes seems to get the chronological order of postings
wrong which makes it impossible to follow a conversation)

Matthias



Re: sru xslt support

Dr Robert Sanderson
In reply to this post by Matthias Steffens
>>>     "^Smith1992a^ ^Smith1992b^ ^Mitchell1995a^"
>>
>> I don't think it makes any difference, actually.
>> any, all and adjacency are word relations.  If all you have in the field
>> is a single word, then it will act like an exact equality relation.

> True, but what if your query contains cite keys that would match
> multiple keys in your database, like:
>  any "Smith1992 Mitchell1995"

> and you have the following cite keys in your database:
>  "Smith1992Univariate"
>  "Smith1992Multiple"

You would match them with Smith1992* using the default masking characters.
You could also use a regular expression (for example) to match them if you
specified a different masking algorithm.
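
As a rough illustration of masking, the Python sketch below translates a masked term into a regular expression and tests whole words against it. It assumes that '*' stands for any run of characters and '?' for exactly one character, and it ignores backslash escapes and the '^' anchor entirely; mask_to_regex and word_matches are hypothetical names used only here.

```python
import re

def mask_to_regex(term):
    """Translate a term with * and ? masking into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")           # any run of characters, including none
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))

def word_matches(word, term):
    # fullmatch means the pattern must cover the whole word.
    return mask_to_regex(term).fullmatch(word) is not None

print(word_matches("Smith1992Univariate", "Smith1992*"))  # True
print(word_matches("Smith1992a", "Smith1992"))            # False
```

Without a masking character the term has to equal the whole word, so truncation only happens when the query asks for it.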



>> If you have a cite key with a space, you'll still end up with
>> potentially incorrect results from:
>>
>>      foo all "^the first key^  ^a second key^"
>>
>> which means:
>>
>>      foo = ^the and foo = first and foo = key^ and foo = ^a and
>>      foo = second and foo = key^


> Ok, I see. Btw, for me it's one of the most confusing things in CQL
> that spaces do mean completely different things depending on context
> and which relation is used.

Yes, the distinction is primarily whether the field is to be treated as a
single string or a list of words.

Rob




Re: sru xslt support

Dr Robert Sanderson
In reply to this post by Matthias Steffens
>  bib.citekey any "Smith1992 Mitchell1995"
> would resolve to:
>  bib.citekey="Smith1992" and bib.citekey="Mitchell1995"

> and that the equals sign means "contains".

= means (currently) word adjacency when applied to strings.  So = with one
term is the same as any or all with one term -- the field contains the
word given.

> And how does CQL define a word? What about international characters
> or a hyphen ('-')?

It doesn't.  It's up to the search engine to determine the best way to
turn a field into a list of words.

Rob
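
Since word splitting is left to the search engine, the same cite key can produce different word lists on different servers. The Python sketch below contrasts two hypothetical tokenizers (neither is taken from any real SRU server) on a hyphenated key of the kind used later in this thread.

```python
import re

def split_on_whitespace(field):
    # Keeps hyphenated keys intact as a single word.
    return field.split()

def split_on_non_alphanumerics(field):
    # Treats hyphens and other punctuation as word breaks. The ASCII-only
    # character class would also split on accented letters.
    return [w for w in re.split(r"[^0-9A-Za-z]+", field) if w]

def contains_word(field, word, tokenizer):
    """True if the tokenizer yields the given word for this field."""
    return word in tokenizer(field)

key = "NW2000-0207"
print(split_on_whitespace(key))         # ['NW2000-0207']
print(split_on_non_alphanumerics(key))  # ['NW2000', '0207']
```

With the second tokenizer a query for NW2000 alone would match this record, so clients that rely on exact cite key matches need to know how the target server splits words.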





Re: sru xslt support

Dr Robert Sanderson
In reply to this post by Matthew J. Dovey
> For Smith1992aabb to be a match you need the stem modifier
> So
> bib.citekey =/stem "Smith1992" or bib.citekey =/stem "Mitchell1995"
> Or
> bib.citekey any/stem "Smith1992 Mitchell1995"

Or more appropriately, * on the end, as stem is used for linguistic
stemming (à la the Porter algorithm).

Rob




Re: sru xslt support

Dr Robert Sanderson
In reply to this post by Bruce D'Arcus
> Anyway, if someone could settle the syntax I should be using, that'd be
> great.  I was so far assuming:
> bib:citekey+any+"^Smith1992a^+^Smith1992b^"

Assuming that the context set has a short name of 'bib':

      bib.citekey any "Smith1992a Smith1992b"

Plus escaping of all characters that are not URL-safe, such as space and ".

Rob
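
The escaping step can be sketched in Python with the standard library's urllib.parse.quote. Passing safe="" is a choice made for this example so that every reserved character is percent-encoded; the query string is the one given above.

```python
from urllib.parse import quote

cql = 'bib.citekey any "Smith1992a Smith1992b"'

# Percent-encode everything except letters, digits and "_.-~".
escaped = quote(cql, safe="")
print(escaped)  # bib.citekey%20any%20%22Smith1992a%20Smith1992b%22
```

The dot in bib.citekey is left alone because it is an unreserved URL character; only the spaces and double quotes change.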




Re: sru xslt support

Bruce D'Arcus-3
On 6/1/05, Dr Robert Sanderson <[hidden email]> wrote:

>
> > Anyway, if someone could settle the syntax I should be using, that'd be
> > great.  I was so far assuming:
> >       bib:citekey+any+"^Smith1992a^+^Smith1992b^"
>
> Assuming that the context set has a short name of 'bib':
>
>       bib.citekey any "Smith1992a Smith1992b"
>
> Plus escaping on all non URL okay characters such as space and "

Here's what I'm currently outputting:

version=1.1&query=bib.citekey%20any%20"Tilly2000a,%20Thrift1990a,%20Tilly2002a,%20Veer1996a,%20Tremblay2001a,%20NW2000-0207,%20NW2000-0424a"&operation=searchRetrieve"recordSchema=mods&recordPacking=xml&startRecord=1&maximumRecords=9999&x-info-2-auth1.0-authenticationToken=

I still need to finish the authentication support, and a server to
test against.  Matthias, are you able to support these queries yet?

Bruce



Re: sru xslt support

Matthias Steffens
On 03-Jun-2005 at 7:38 -0400 Bruce D'Arcus wrote:

> On 6/1/05, Dr Robert Sanderson <[hidden email]> wrote:
> >
> > > Anyway, if someone could settle the syntax I should be using, that'd be
> > > great.  I was so far assuming:
> > >       bib:citekey+any+"^Smith1992a^+^Smith1992b^"
> >
> > Assuming that the context set has a short name of 'bib':
> >
> >       bib.citekey any "Smith1992a Smith1992b"
> >
> > Plus escaping on all non URL okay characters such as space and "
>
> Here's what I'm currently outputting:
>
> version=1.1&query=bib.citekey%20any%20"Tilly2000a,%20Thrift1990a,%
> 20Tilly2002a,%20Veer1996a,%20Tremblay2001a,%20NW2000-0207,%20NW2000
> -0424a"&operation=
> searchRetrieve"recordSchema=mods&recordPacking=xml&startRecord=1&
> maximumRecords=9999&x-info-2-auth1.0-authenticationToken=

I assume the commas shouldn't be in the above search term?

> I still need to finish the authentication support, and a server to
> test against.  Matthias, are you able to support these queries yet?

Almost. :-) I haven't yet found time to fix my incorrect
interpretation of the equals relation (i.e. '=' should match only full
words, not sub-strings). The authentication token isn't supported
either, but I suppose this is easy to implement. I hope to finish
these things over the weekend.

Ultimately, I should rewrite my simple CQL parser as suggested by Rob
in an earlier email.

Matthias



Re: sru xslt support

Bruce D'Arcus
On Jun 3, 2005, at 8:29 AM, Matthias Steffens wrote:

> I assume the commas shouldn't be in the above search term?

Right.
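
A minimal Python sketch of the clean-up: turn a comma-separated cite key list into the space-separated quoted term that the 'any' relation expects. The helper name citekey_term is hypothetical, and the sketch assumes that cite keys themselves contain neither commas nor spaces.

```python
def citekey_term(raw):
    """Turn 'A, B, C' (or 'A B C') into the quoted CQL term '"A B C"'."""
    # Replace commas with spaces, then split on any run of whitespace.
    keys = raw.replace(",", " ").split()
    return '"{}"'.format(" ".join(keys))

term = citekey_term("Tilly2000a, Thrift1990a, NW2000-0207")
print(term)  # "Tilly2000a Thrift1990a NW2000-0207"
```

The resulting term still needs URL escaping before it goes into the query parameter.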

> I hope to finish these things over the weekend.

OK, I'll do the same.  Let me know when you're ready and we can do a
test/demo.  I want to announce this project formally next week if
possible.

Bruce




Re: sru xslt support

Matthias Steffens
On 03-Jun-2005 at 9:23 -0400 Bruce D'Arcus wrote:

> > I hope to finish these things over the weekend.
>
> OK, I'll do the same.  Let me know when you're ready and we can do
> a test/demo.

Will do.

> I want to announce this project formally next week if possible.

Great!


[Btw: I got an "Undelivered Mail" error with my last email that I
 cc-ed to you directly:

 "<[hidden email]>: host
    /var/imap/socket/lmtpprox[/var/imap/socket/lmtpprox] said: 552 5.2.2
    Over quota (reported by server2.internal in RCPT TO) (in reply to end
    of DATA command)"]

Matthias



Re: sru xslt support

Bruce D'Arcus-3
On Jun 3, 2005, at 9:44 AM, Matthias Steffens wrote:

> [Btw: I got an "Undelivered Mail" error with my last email that I
>  cc-ed to you directly:
>
>  "<[hidden email]>: host
>     /var/imap/socket/lmtpprox[/var/imap/socket/lmtpprox] said: 552 5.2.2
>     Over quota (reported by server2.internal in RCPT TO) (in reply to end
>     of DATA command)"]

I don't know what the deal is with fastmail. Matthew and I have both
seen the same.

GMail is a bit safer I think.

Bruce




SRU - authenticationToken

Matthias Steffens
In reply to this post by Dr Robert Sanderson
Hi,

In the hope of integrating with xbib, I'm trying to implement support
for the 'x-...-authenticationToken' parameter in an SRU query:

  sru.php?version=1.1&query=bib.citekey=Mock2003Diss
  &x-info-2-auth1.0-authenticationToken=email=[hidden email]

The problem is that none of the Mac OS X browsers I've tried (Safari,
Firefox, Mozilla, Camino, Opera) seems to pass the token correctly:

  x-info-2-auth1.0-authenticationToken

The dot seems to be the culprit. If I remove the dot everything works
as expected. Or could this be a problem with PHP/Apache? I'm using
Apache/1.3.33 (Darwin) PHP/5.0.4.

I appreciate any hints.

Thanks, Matthias
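
One possible explanation, offered as an assumption rather than a confirmed diagnosis: PHP documents that dots in incoming variable names are replaced with underscores, so the parameter may be arriving under the name x-info-2-auth1_0-authenticationToken instead of being dropped by the browser. The Python sketch below parses a raw query string directly, which keeps the dot, and shows the renamed key such a server-side lookup would need. The email value is a placeholder, and php_style_name is a hypothetical helper that only imitates the documented renaming.

```python
from urllib.parse import parse_qs

PARAM = "x-info-2-auth1.0-authenticationToken"

def php_style_name(name):
    # Imitates PHP's renaming of incoming variable names:
    # dots and spaces become underscores.
    return name.replace(".", "_").replace(" ", "_")

# Placeholder address; the real one is hidden in the archive.
raw_query = ("version=1.1&query=bib.citekey%3DMock2003Diss"
             "&" + PARAM + "=email%3Duser%40example.org")

# Parsing the raw query string keeps the parameter name unchanged.
params = parse_qs(raw_query)
token = params[PARAM][0]
print(token)                  # email=user@example.org
print(php_style_name(PARAM))  # x-info-2-auth1_0-authenticationToken
```

If this is the cause, reading the raw query string on the server, or looking the value up under the underscored name, would be two ways to recover the token without changing the parameter name.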

