Re: SRU


Matthias Steffens
Hi Bruce,

[cc-ing to the xbiblio-devel list]

I've checked the resources available at the SRW/U home page and I
think that a preliminary (i.e. limited) implementation of a simple SRU
server would not be too difficult. The code & examples given by Mike
Taylor would be especially helpful:

 <http://sru.miketaylor.org.uk/>

However, I would appreciate to have more syntax/implementation
examples etc. Any pointers?

On 16 May 2005 at 15:28 -0400 Bruce D'Arcus wrote:

> Rob --it's Matthias Steffens (cc-ing), who is affiliated with the
> Bibliophile project. My understanding is that he'd start by just
> supporting this specific query, so I'm wanting to standardize on the
> precise syntax.

> > >> Something like:
> > >>
> > >>   http://localhost:8081/biblio?operation=searchRetrieve&version=1.1
> > >>     &query=cite.key+any+"Smith1992a+Smith1992b+Mitchell1995a"
> > >>     &recordSchema=mods
> > >>     &startRecord=1
> > >>     &maximumRecords=9999

Parsing the query would be a breeze. Delivering a correctly formed
<srw:searchRetrieveResponse> would involve quite some work, though.
It's certainly doable but I fear I won't have the time to implement
it right away. Anyhow, speaking of the distant future, I'm really
willing to support a full SRU solution.
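
To illustrate why parsing is the easy part, here is a minimal Python
sketch that pulls the cite keys out of the request quoted above. It
assumes the query always has the single shape: index relation "terms".
A real server would need a proper CQL parser. The function name, the
regular expression and the default of 10 for maximumRecords are
illustrative assumptions, not part of refbase or of the SRU
specification.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches only the one CQL shape discussed here: index relation "term term ..."
SIMPLE_CQL = re.compile(r'^\s*([\w.]+)\s+(any|all|=)\s+"([^"]*)"\s*$')


def parse_sru_request(url):
    """Return the search terms and paging parameters of a simple searchRetrieve URL."""
    # parse_qs decodes "+" to a space and returns a list per parameter name.
    params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
    if params.get("operation") != "searchRetrieve":
        raise ValueError("only searchRetrieve is handled in this sketch")
    match = SIMPLE_CQL.match(params.get("query", ""))
    if match is None:
        raise ValueError('query is not of the form: index relation "terms"')
    index, relation, terms = match.groups()
    return {
        "index": index,
        "relation": relation,
        "terms": terms.split(),
        "record_schema": params.get("recordSchema"),
        "start_record": int(params.get("startRecord", 1)),
        # The fallback of 10 is an arbitrary choice made for this sketch.
        "maximum_records": int(params.get("maximumRecords", 10)),
    }


request = parse_sru_request(
    "http://localhost:8081/biblio?operation=searchRetrieve&version=1.1"
    '&query=cite.key+any+"Smith1992a+Smith1992b+Mitchell1995a"'
    "&recordSchema=mods&startRecord=1&maximumRecords=9999"
)
print(request["index"], request["relation"], request["terms"])
```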

As a  preliminary solution, would it be possible for citeproc to send
a URL query (be it SRU or whatever) and have plain MODS XML returned?
Citeproc expects MODS and refbase outputs MODS. So the most simple
solution would be if refbase could return raw MODS XML data (without
any <srw:searchRetrieveResponse> data wrapped around it).

Of course, I agree that in the long run support of proper standards
like SRU (and CQL) is the way to go. But for now it would be cool if
I could just send plain MODS.

This works already. You can try it out yourself. To do so, log in at:

 <http://polaris.ipoe.uni-kiel.de/refs/index.php>

using

  email: [hidden email]
  pwd:   guest

then click the links below (the links won't work if not logged in):

  Return records no. 623, 21654 and 23961 in MODS XML format:

 <http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=%5E(21654%7C623%7C23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=xml>

This should return 3 records in MODS XML format. Change 'exportType'
to 'text' (or 'html') to have it rendered as plain text (or wrapped
into html):

 <http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=%5E(21654%7C623%7C23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=text>
 <http://polaris.ipoe.uni-kiel.de/refs/show.php?serial=%5E(21654%7C623%7C23961)$&submit=Export&exportFormatSelector=MODS%20XML&exportType=html>
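
A client could assemble such export URLs programmatically. The Python
sketch below uses only the parameter names visible in the links above
(serial, submit, exportFormatSelector, exportType); the function name
is made up for illustration. Note that it percent-encodes the
parentheses and the "$" as well, which decodes to the same serial
pattern as the hand-written links.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

BASE_URL = "http://polaris.ipoe.uni-kiel.de/refs/show.php"


def refbase_export_url(serials, export_format="MODS XML", export_type="xml",
                       base_url=BASE_URL):
    """Build a show.php export URL for the given record serial numbers."""
    # The serial parameter takes a regular expression, as in the links above.
    serial_pattern = "^(" + "|".join(str(serial) for serial in serials) + ")$"
    params = {
        "serial": serial_pattern,
        "submit": "Export",
        "exportFormatSelector": export_format,
        "exportType": export_type,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


for export_type in ("xml", "text", "html"):
    print(refbase_export_url([21654, 623, 23961], export_type=export_type))
```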

Querying of cite keys is supported as well but you must be logged in
as a regular user to do so (since cite keys are unique to every user).

Btw, 'show.php' supports also other output formats (RIS, Endnote &
Bibtex via bibutils) and querying of many other fields. Here's an
example:

  Return all database entries (in Endnote format wrapped into HTML)
  where the title field contains "situ" and where the author field
  contains "mock", excluding any duplicate records:

 <http://polaris.ipoe.uni-kiel.de/refs/show.php?title=situ&author=mock&without=dups&submit=Export&exportFormatSelector=Endnote&exportType=html>

This should return 2 records...

Now, to gain citeproc integration I would just need to convert a
citeproc query into a refbase query (similar to the above) and
incorporate the returned results. As discussed before, another
pathway would be to directly send a MODS XML file to citeproc and
display the returned results.

I'm eager to do this...

Matthias



Re: SRU

Bruce D'Arcus-3
On Tue, 17 May 2005 01:51:03 +0200, "Matthias Steffens"
<[hidden email]> said:

> However, I would appreciate to have more syntax/implementation
> examples etc. Any pointers?

Rob Sanderson and Mike Taylor are really the guys to ask about
documentation on SRU/W.  Maybe this is another place to start?

http://www.loc.gov/z3950/agency/zing/srw/sru-simple.html

I don't have my bookmarks handy here ;-)

> As a  preliminary solution, would it be possible for citeproc to send
> a URL query (be it SRU or whatever) and have plain MODS XML returned?
> Citeproc expects MODS and refbase outputs MODS. So the most simple
> solution would be if refbase could return raw MODS XML data (without
> any <srw:searchRetrieveResponse> data wrapped around it).

Yes, but it would be preferable if we could get it close as possible
to SRU (for practical reasons, I don't want to have to support a
different query protocol for every database).
 
> Querying of cite keys is supported as well but you must be logged in
> as a regular user to do so (since cite keys are unique to every user).

Hmm ... I think this is a problem, and I don't just mean for citeproc.
I blogged about the ID issue about six months ago, and Mike Taylor, Rob
Sanderson and I chatted about it at length somewhat more recently (in
February IIRC).

There's two issues:

1)  I see no way to get the XSLT processor to be able to login.  Perhaps
instead there might be another way to bind a citekey to a user in the
url so the data can be returned without being logged in?

2)  the bigger issue which is that citekeys aren't a good solution in a
multi-user context (I use them, so am not throwing stones; am just
remembering they're less-than-ideal).

I think Mike, Rob and I concluded that an ideal solution would code
multiple ids, and allow the database to resolve them.

This starts to get complicated though, but IIRC an SRU-like solution can
make it easier.

> This should return 2 records...
>
> Now, to gain citeproc integration I would just need to convert a
> citeproc query into a refbase query (similar to the above) and
> incorporate the returned results. As discussed before, another
> pathway would be to directly send a MODS XML file to citeproc and
> display the returned results.
>
> I'm eager to do this...

Cool!

BTW, did you manage to get it working on the command line with the
included DocBook sample?

You may have noted I included an SRU example.  I've not actually tried
it, but the biblioref elements point to ISBNs in the LoC database,
which is accessible over SRU.

Perhaps I ought to get that working tomorrow as a demo!

Bruce



Re: SRU

Mike Taylor
> Date: Mon, 16 May 2005 20:46:47 -0400
> From: Bruce D'Arcus <[hidden email]>
>
> > However, I would appreciate to have more syntax/implementation
> > examples etc. Any pointers?
>
> Rob Sanderson and Mike Taylor are really the guys to ask about
> documentation on SRU/W.

That's me -- hi!  (Er.  I'm not Rob, though.)

>> As a preliminary solution, would it be possible for citeproc to
>> send a URL query (be it SRU or whatever) and have plain MODS XML
>> returned?  Citeproc expects MODS and refbase outputs MODS. So the
>> most simple solution would be if refbase could return raw MODS XML
>> data (without any <srw:searchRetrieveResponse> data wrapped around
>> it).
>
> Yes, but it would be preferable if we could get it close as possible
> to SRU (for practical reasons, I don't want to have to support a
> different query protocol for every database).

I'm sorry that SRU forces you to deal with this extra layer of XML;
but I'm sure you'll easily see how much more flexibility that gives
you, and how that might well come in useful down the line.  The
<searchRetrieveResponse> element gives the server a way to wrap
multiple records and a place to include diagnostics, session data,
extra response data, etc.  You don't have to use all (or any) of this,
but by committing to SRU you're making it available for more
sophisticated subsequent versions of your client.
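
On the client side, getting from the wrapped response back to bare MODS
is a short step. The Python sketch below assumes the namespace URIs and
the MODS schema identifier as I recall them from SRU 1.1 and MODS 3
(worth checking against the specifications); the sample response and
the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions based on SRU 1.1 and MODS 3.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "mods": "http://www.loc.gov/mods/v3",
}

SAMPLE_RESPONSE = """\
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/mods-v3.0</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <mods xmlns="http://www.loc.gov/mods/v3">
          <titleInfo><title>Example title</title></titleInfo>
        </mods>
      </srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def extract_mods(response_xml):
    """Return (total hit count, list of <mods> elements) from a response."""
    root = ET.fromstring(response_xml)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = root.findall("srw:records/srw:record/srw:recordData/mods:mods", NS)
    return total, records


total, records = extract_mods(SAMPLE_RESPONSE)
print(total, records[0].findtext("mods:titleInfo/mods:title", namespaces=NS))
```

The same parse could later be extended to read diagnostics or extra
response data without changing how the records are extracted.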

> 2) the bigger issue which is that citekeys aren't a good solution in
> a multi-user context (I use them, so am not throwing stones; am just
> remembering they're less-than-ideal).

This is indeed a big problem if you're referring to the same thing
that I think you are.  Let me check whether you are.  Consider three
papers:

        Janensch, W.  1929a.  Material und Formengehalt der Sauropoden
        in der Ausbeute der Tendaguru-Expedition.  Palaeontographica
        (Suppl. 7) 2:1-34.

        Janensch, W.  1929b.  Die Wirbelsaule der Gattung
        Dicraeosaurus. Palaeontographica (Suppl. 7) 2:39-133.

        Janensch, W.  1929c.  Magensteine bei Sauropoden der
        Tendaguru-Schichten. Palaeontographica (Suppl. 7) 2:135-144.

If one paper cites the first two, then it will call them Janensch1929a
and Janensch1929b; if another paper cites the second and third, it
will call _them_ Janensch1929a and Janensch1929b.  So the citation
keys have two different meanings to the different papers, and the
second of the three papers has two different citation keys.
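
The scenario can be reproduced with a few lines of Python. This is only
a sketch of the usual "surname, year, letter" convention applied per
citing paper; the function name and data layout are made up for
illustration.

```python
from collections import defaultdict
from string import ascii_lowercase


def assign_cite_keys(cited_papers):
    """Assign <surname><year>[a|b|...] keys within ONE citing paper.

    cited_papers is a list of (surname, year, title) tuples in citation order.
    Returns a dict mapping each title to its key in that paper.
    """
    groups = defaultdict(list)
    for surname, year, title in cited_papers:
        groups[(surname, year)].append(title)
    keys = {}
    for (surname, year), titles in groups.items():
        for position, title in enumerate(titles):
            # A letter is only needed when author and year clash.
            suffix = ascii_lowercase[position] if len(titles) > 1 else ""
            keys[title] = f"{surname}{year}{suffix}"
    return keys


MATERIAL = ("Janensch", 1929, "Material und Formengehalt der Sauropoden")
WIRBELSAULE = ("Janensch", 1929, "Die Wirbelsaule der Gattung Dicraeosaurus")
MAGENSTEINE = ("Janensch", 1929, "Magensteine bei Sauropoden")

first_paper = assign_cite_keys([MATERIAL, WIRBELSAULE])
second_paper = assign_cite_keys([WIRBELSAULE, MAGENSTEINE])

# The same article ends up with two different keys.
print(first_paper[WIRBELSAULE[2]], second_paper[WIRBELSAULE[2]])
```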

Is that the non-globality problem you're referring to?

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  "Any sufficiently complicated C program contains an ad-hoc,
         informally-specified, bug-ridden, slow implementation of half
         of Common Lisp" -- Greenspun's Tenth Rule of Programming.

--
Listen to free demos of soundtrack music for film, TV and radio
        http://www.pipedreaming.org.uk/soundtrack/




Re: SRU

Bruce D'Arcus-3
On May 17, 2005, at 3:55 AM, Mike Taylor wrote:

>> 2) the bigger issue which is that citekeys aren't a good solution in
>> a multi-user context (I use them, so am not throwing stones; am just
>> remembering they're less-than-ideal).
>
> This is indeed a big problem if you're referring to the same thing
> that I think you are.  Let me check whether you are.  Consider three
> papers:
>
> Janensch, W.  1929a.  Material und Formengehalt der Sauropoden
> in der Ausbeute der Tendaguru-Expedition.  Palaeontographica
> (Suppl. 7) 2:1-34.
>
> Janensch, W.  1929b.  Die Wirbelsaule der Gattung
> Dicraeosaurus. Palaeontographica (Suppl. 7) 2:39-133.
>
> Janensch, W.  1929c.  Magensteine bei Sauropoden der
> Tendaguru-Schichten. Palaeontographica (Suppl. 7) 2:135-144.
>
> If one paper cites the first two, then it will call them Janensch1929a
> and Janensch1929b; if another paper cites the second and third, it
> will call _them_ Janensch1929a and Janensch1929b.  So the citation
> keys have two different meanings to the different papers, and the
> second of the three papers has two different citation keys.
>
> Is that the non-globality problem you're referring to?

The issue is pretty simple really: one user might have Janensch1929a
for the natural language id (e.g. citekey) and another might have
janensch29.  We really need a more general solution, and perhaps we
ought to think about how to best achieve that within an SRU framework?

Bruce




Re: SRU

Matthias Steffens
In reply to this post by Matthias Steffens
Hi Bruce,

On 16 May 2005 at 20:46 -0400 Bruce D'Arcus wrote:

> On Tue, 17 May 2005 01:51:03 +0200, "Matthias Steffens"
> <[hidden email]> said:
>
> > However, I would appreciate to have more syntax/implementation
> > examples etc. Any pointers?
>
> Rob Sanderson and Mike Taylor are really the guys to ask about
> documentation on SRU/W.  Maybe this is another place to start?
>
> http://www.loc.gov/z3950/agency/zing/srw/sru-simple.html

Thanks, those examples as well as Rob's PDF on SRW-1.1[1] are
definitely helpful.

[1] <http://srw.cheshire3.org/SRW-1.1.pdf>

> > As a  preliminary solution, would it be possible for citeproc to
> > send a URL query (be it SRU or whatever) and have plain MODS XML
> > returned? Citeproc expects MODS and refbase outputs MODS. So the
> > most simple solution would be if refbase could return raw MODS
> > XML data (without any <srw:searchRetrieveResponse> data wrapped
> > around it).
>
> Yes, but it would be preferable if we could get it close as
> possible to SRU (for practical reasons, I don't want to have to
> support a different query protocol for every database).

That's understandable.

> > Querying of cite keys is supported as well but you must be logged
> > in as a regular user to do so (since cite keys are unique to
> > every user).
>
> Hmm ... I think this is a problem, and I don't just mean for
> citeproc.

> There's two issues:
>
> 1)  I see no way to get the XSLT processor to be able to login.

With regard to refbase, a user is normally requested to login before
he can access features like export, etc. Anyhow, I agree that a SRU
service should work without requiring a user to log into the database.

> Perhaps instead there might be another way to bind a citekey to a
> user in the url so the data can be returned without being logged in?

Yes, that's possible. For another user-specific field refbase does
already allow every user (logged in or not) to query this field if a
'userID' parameter is given in the query URL.

However, I'd prefer if querying of user-specific cite keys is only
allowed to users who are logged in and then only allow them to query
their own cite keys. There are other truly unique identifiers which
ensure unique record identity on a database-wide or even global basis
(see below).

> 2)  the bigger issue which is that citekeys aren't a good solution
> in a multi-user context (I use them, so am not throwing stones; am
> just remembering they're less-than-ideal).

Yes, I agree. This is why refbase allows a user to only access his
own cite keys (and only when logged in).

To uniquely identify records on a database-wide level refbase uses
simply a record serial number. Our database users include these
serial numbers within the body text along with a preformatted
citation string (like Steffens et al 2004a {1234}). Unlike citeproc
refbase doesn't offer to reformat these citation strings, though, but
offers to extract all citations from a text and build an appropriate
references list from it.

To uniquely identify records on a global basis I'd prefer the DOI
identifier (if available). refbase supports DOIs and I plan to allow
a SRU interface to query for these DOI numbers. OpenURLs would be
another good candidate as truly unique identifier, plus it offers
better backwards compatibility than the DOI system (IIRC).

> I think Mike, Rob and I concluded that an ideal solution would code
> multiple ids, and allow the database to resolve them.

Yes, that sounds reasonable. The database could prefer truly unique
identifiers (DOI & OpenURL) if present and may otherwise fall back to
database- or user-specific identifiers.
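
As a rough Python sketch of that fallback order (not refbase code): the
record layout, field names and sample identifiers below are all
hypothetical, chosen only to show global identifiers being tried before
the database serial and the per-user cite key.

```python
# Identifier types in order of preference: global first, then database-wide.
PRIORITY = ("doi", "openurl", "serial")


def resolve(identifiers, records, user_id=None):
    """Return the first record matching the most trustworthy identifier given.

    identifiers may contain any of 'doi', 'openurl', 'serial', 'citekey'.
    A cite key is only consulted last, and only together with a user id.
    """
    for field in PRIORITY:
        wanted = identifiers.get(field)
        if wanted is None:
            continue
        for record in records:
            if record.get(field) == wanted:
                return record
    cite_key = identifiers.get("citekey")
    if cite_key is not None and user_id is not None:
        for record in records:
            if record.get("cite_keys", {}).get(user_id) == cite_key:
                return record
    return None


# Hypothetical sample data.
RECORDS = [
    {"serial": 623, "doi": "10.1234/example.623",
     "cite_keys": {"alice": "Smith1992a", "bob": "smith92"}},
    {"serial": 21654, "doi": None,
     "cite_keys": {"alice": "Mitchell1995a"}},
]

print(resolve({"doi": "10.1234/example.623"}, RECORDS)["serial"])
print(resolve({"citekey": "smith92"}, RECORDS, user_id="bob")["serial"])
print(resolve({"citekey": "smith92"}, RECORDS))  # no user id, so no match
```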

> BTW, did you manage to get it working on the command line with the
> included DocBook sample?

Not yet (I'm not in front of my machine right now but I will report
back when I've tried it out).

Matthias



Re: SRU

Mike Taylor
In reply to this post by Bruce D'Arcus-3
> Date: Tue, 17 May 2005 08:54:16 -0400
> From: Bruce D'Arcus <[hidden email]>
>
>>> 2) the bigger issue which is that citekeys aren't a good solution
>>> in a multi-user context (I use them, so am not throwing stones; am
>>> just remembering they're less-than-ideal).
>>
>> This is indeed a big problem if you're referring to the same thing
>> that I think you are.  Let me check whether you are.  Consider
>> three papers:
>>
>> Janensch, W.  1929a.  Material und Formengehalt der Sauropoden
>> in der Ausbeute der Tendaguru-Expedition.  Palaeontographica
>> (Suppl. 7) 2:1-34.
>>
>> Janensch, W.  1929b.  Die Wirbelsaule der Gattung
>> Dicraeosaurus. Palaeontographica (Suppl. 7) 2:39-133.
>>
>> Janensch, W.  1929c.  Magensteine bei Sauropoden der
>> Tendaguru-Schichten. Palaeontographica (Suppl. 7) 2:135-144.
>>
>> If one paper cites the first two, then it will call them
>> Janensch1929a and Janensch1929b; if another paper cites the second
>> and third, it will call _them_ Janensch1929a and Janensch1929b.  So
>> the citation keys have two different meanings to the different
>> papers, and the second of the three papers has two different
>> citation keys.
>
> The issue is pretty simple really: one user might have Janensch1929a
> for the natural language id (e.g. citekey) and another might have
> janensch29.

Right, OK, you don't even need the complexity of multiple potentially
identical citations to raise the problem.  Although of course trivial
solutions just say something like "always use the first author's
surname with the first letter capitalised followed by the four-digit
year and, if necessary, a discriminator" -- and _they_ are tripped up
by the scenario I outlined.

> We really need a more general solution, and perhaps we ought to
> think about how to best achieve that within an SRU framework?

Surely -- surely! -- someone has already faced and solved this
problem?  They _must_ have!  Mustn't they?  It seems much too core a
problem for us to be facing up to now, in the 21st century.

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  "I never make predictions and I never will" -- Paul Gascoigne.

--
Listen to free demos of soundtrack music for film, TV and radio
        http://www.pipedreaming.org.uk/soundtrack/




Re: SRU

Mike Taylor
In reply to this post by Matthias Steffens
> Date: Tue, 17 May 2005 15:04:36 +0200
> From: Matthias Steffens <[hidden email]>
>
>> 2) the bigger issue which is that citekeys aren't a good solution
>> in a multi-user context (I use them, so am not throwing stones; am
>> just remembering they're less-than-ideal).
>
> To uniquely identify records on a database-wide level refbase uses
> simply a record serial number. Our database users include these
> serial numbers within the body text along with a preformatted
> citation string (like Steffens et al 2004a {1234}). Unlike citeproc
> refbase doesn't offer to reformat these citation strings, though,
> but offers to extract all citations from a text and build an
> appropriate references list from it.
>
> To uniquely identify records on a global basis I'd prefer the DOI
> identifier (if available). refbase supports DOIs and I plan to allow
> a SRU interface to query for these DOI numbers. OpenURLs would be
> another good candidate as truly unique identifier, plus it offers
> better backwards compatibility than the DOI system (IIRC).

I really think that database IDs, and DOIs, are solving a different
problem from the one that we face here.  All I want is the ability to
type
        \cite{WilsonSereno1998}
into my document and know what it will be resolved to.  If I have to
type
        \cite{doi:3244.1327/324326hjg2132}
instead, then ... well, it's just not a solution.

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  "People will accept your ideas much more readily if you tell
         them Benjamin Franklin said it first" -- David H. Comins.

--
Listen to free demos of soundtrack music for film, TV and radio
        http://www.pipedreaming.org.uk/soundtrack/




Re: SRU

Dr Robert Sanderson
In reply to this post by Bruce D'Arcus-3

> Rob Sanderson and Mike Taylor are really the guys to ask about

Hi  :)


>> As a  preliminary solution, would it be possible for citeproc to send
>> a URL query (be it SRU or whatever) and have plain MODS XML returned?

It's probably pretty easy to create a couple of templates to wrap the data
in -- one for the response and one for each record.
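
A minimal Python sketch of those two templates, one function per
wrapper. The element names, namespace URIs and the MODS schema
identifier are given as I recall them from SRU 1.1 and should be
checked against the specification; the function names are
illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and schema identifiers (SRU 1.1 / MODS 3).
SRW_NS = "http://www.loc.gov/zing/srw/"
MODS_NS = "http://www.loc.gov/mods/v3"
MODS_SCHEMA = "info:srw/schema/1/mods-v3.0"
ET.register_namespace("srw", SRW_NS)
ET.register_namespace("mods", MODS_NS)


def srw(tag):
    """Return the namespace-qualified name of an SRW element."""
    return "{%s}%s" % (SRW_NS, tag)


def wrap_record(mods_element, position):
    """Template 1: wrap one MODS record in an <srw:record>."""
    record = ET.Element(srw("record"))
    ET.SubElement(record, srw("recordSchema")).text = MODS_SCHEMA
    ET.SubElement(record, srw("recordPacking")).text = "xml"
    ET.SubElement(record, srw("recordData")).append(mods_element)
    ET.SubElement(record, srw("recordPosition")).text = str(position)
    return record


def wrap_response(mods_elements, start_record=1, total_hits=None):
    """Template 2: wrap the records in an <srw:searchRetrieveResponse>."""
    if total_hits is None:
        # Assume the page holds every hit unless the caller says otherwise.
        total_hits = len(mods_elements)
    response = ET.Element(srw("searchRetrieveResponse"))
    ET.SubElement(response, srw("version")).text = "1.1"
    ET.SubElement(response, srw("numberOfRecords")).text = str(total_hits)
    records = ET.SubElement(response, srw("records"))
    for offset, mods_element in enumerate(mods_elements):
        records.append(wrap_record(mods_element, start_record + offset))
    return response


mods = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Example title</title></titleInfo></mods>"
)
response = wrap_response([mods])
print(ET.tostring(response, encoding="unicode"))
```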


> 1)  I see no way to get the XSLT processor to be able to login.  Perhaps
> instead there might be another way to bind a citekey to a user in the
> url so the data can be returned without being logged in?

SRU works via an authentication token included in each request.  How this
token is acquired is out of the scope of the protocol, so you're free to
do it however you like.


> 2)  the bigger issue which is that citekeys aren't a good solution in a
> multi-user context (I use them, so am not throwing stones; am just
> remembering they're less-than-ideal).

Looking at OpenURL as a way of identifying (rather than searching for)
items might be useful, or at least of interest.

Feel free to ask about whatever WRT SRW/U you need to know :)

Rob



       ,'/:.          Dr Robert Sanderson ([hidden email])
     ,'-/::::.        http://www.csc.liv.ac.uk/~azaroth/
   ,'--/::(@)::.      Dept. of Computer Science, Room 805
,'---/::::::::::.    University of Liverpool
____/:::::::::::::.
I L L U M I N A T I  Cheshire3 IR System:  http://www.cheshire3.org/



Re: SRU

Dr Robert Sanderson
In reply to this post by Mike Taylor
On Tue, 17 May 2005, Mike Taylor wrote:
>> From: Bruce D'Arcus <[hidden email]>
>> Rob Sanderson and Mike Taylor are really the guys to ask about
> That's me -- hi!  (Er.  I'm not Rob, though.)

That's me :)


> extra response data, etc.  You don't have to use all (or any) of this,
> but by committing to SRU you're making it available for more
> sophisticated subsequent versions of your client.

And possibly equally importantly, you're making your service available for
use by generic tools and toolkits rather than very specific ones.  It
makes it interoperable with services at OCLC, the Library of Congress, etc
etc etc.

As far as citekeys go, I don't think there's a reasonable solution which
isn't user customised.  If the UI is to include a short non globally
unique string, then it's not globally unique.  Pithy, I know, but that's
the way of it.

A user profile of citekey to article unique id seems to be the most
appropriate solution.
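
In concrete terms, such a profile is just a per-user lookup table from
the short key a user types to an identifier the database understands.
A small Python sketch, in which the profile contents and the
"record:1234" identifier are invented for illustration:

```python
def expand_cite_keys(cite_keys, profile):
    """Translate a user's cite keys into unique ids using that user's profile.

    Returns (resolved, unknown): a dict of key -> unique id, and the list of
    keys the profile does not know about.
    """
    resolved, unknown = {}, []
    for key in cite_keys:
        if key in profile:
            resolved[key] = profile[key]
        else:
            unknown.append(key)
    return resolved, unknown


# Two users call the same article by different names (hypothetical ids).
PROFILES = {
    "user_a": {"Janensch1929a": "record:1234"},
    "user_b": {"janensch29": "record:1234"},
}

print(expand_cite_keys(["Janensch1929a"], PROFILES["user_a"]))
print(expand_cite_keys(["janensch29", "Wilson1998"], PROFILES["user_b"]))
```

Only the unique ids would then need to travel in the query; the short
keys stay a private convenience of each user.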

Rob

       ,'/:.          Dr Robert Sanderson ([hidden email])
     ,'-/::::.        http://www.csc.liv.ac.uk/~azaroth/
   ,'--/::(@)::.      Dept. of Computer Science, Room 805
,'---/::::::::::.    University of Liverpool
____/:::::::::::::.
I L L U M I N A T I  Cheshire3 IR System:  http://www.cheshire3.org/



Re: SRU

Bruce D'Arcus-3
In reply to this post by Mike Taylor
On May 17, 2005, at 9:27 AM, Mike Taylor wrote:

> I really think that database IDs, and DOIs, are solving a different
> problem from the one that we face here.  All I want is the ability to
> type
> \cite{WilsonSereno1998}
> into my document and know what it will be resolved to.  If I have to
> type
> \cite{doi:3244.1327/324326hjg2132}
> instead, then ... well, it's just not a solution.

Indeed!  This is exactly the tension.

Bruce




Re: SRU

Matthias Steffens
In reply to this post by Matthias Steffens
Hi Mike & Rob,

thanks for your comments! I'm already convinced about SRW/U :-),
the problem is rather finding the development time for implementation.

On 17 May 2005 at 14:27 +0100 Mike Taylor wrote:

> > To uniquely identify records on a global basis I'd prefer the DOI
> > identifier (if available). [...] OpenURLs would be another good
> > candidate as truly unique identifier

> I really think that database IDs, and DOIs, are solving a different
> problem from the one that we face here.  All I want is the ability to
> type
> \cite{WilsonSereno1998}
> into my document and know what it will be resolved to.  If I have to
> type
> \cite{doi:3244.1327/324326hjg2132}
> instead, then ... well, it's just not a solution.

Ah, yes, now I understand. Seems I was only thinking in terms of the
database, not in terms of the user.

If cite keys are required to be user-specific and a SRU client sends
a userID along with its query, then these cite keys could be resolved
easily by the database to get truly unique IDs (like DOI or OpenURL).

If I understand Rob correctly, he's suggesting the same:

On 17 May 2005 at 14:42 +0100 Dr Robert Sanderson wrote:

> As far as citekeys go, I don't think there's a reasonable solution
> which isn't user customised.  If the UI is to include a short non
> globally unique string, then it's not globally unique.  Pithy, I
> know, but that's the way of it.
>
> A user profile of citekey to article unique id seems to be the most
> appropriate solution.

Regards, Matthias



Re: SRU

Bruce D'Arcus-3
In reply to this post by Matthias Steffens
On May 17, 2005, at 9:04 AM, Matthias Steffens wrote:

> However, I'd prefer if querying of user-specific cite keys is only
> allowed to users who are logged in and then only allow them to query
> their own cite keys.

The details really come down to the user experience.  The above works
in the case of the user who wants to access the web interface and print
records.  I don't think it works that well (?) for the user working on
a document and just wanting it to be automatically -- and transparently
-- formatted.

Let's see if we can be concrete:

We want to build the OpenOffice bib interface around these standards.  
So let's imagine that happens, and that RefBase supports SRU and can
then become a plug-in for OOo.

How does authenticated communication between OOo and RefBase then
happen?  The user would then be viewing the RB DB through an OOo GUI.

BTW, Mike gave us a LaTeX citation example.  It's worth noting perhaps
that XSLT 2.0 makes it possible to parse non-XML documents ;-)

>> I think Mike, Rob and I concluded that an ideal solution would code
>> multiple ids, and allow the database to resolve them.
>
> Yes, that sounds reasonable. The database could prefer truly unique
> identifiers (DOI & OpenURL) if present and may otherwise fall back to
> database- or user-specific identifiers.

Yes, that's the idea.

Bruce




Re: SRU

Bruce D'Arcus-3
In reply to this post by Dr Robert Sanderson
On May 17, 2005, at 9:42 AM, Dr Robert Sanderson wrote:

> A user profile of citekey to article unique id seems to be the most
> appropriate solution.

Can you explain this Rob in jargon-free language?

Bruce




Re: SRU

Mike Taylor
In reply to this post by Matthias Steffens
> Date: Tue, 17 May 2005 16:06:29 +0200
> From: Matthias Steffens <[hidden email]>
>
> thanks for your comments! I'm already convinced about SRW/U :-),
> the problem is rather finding the development time for implementation.

Well, this is good.  You'll find the time, SRU is such fun!  :-)

>>> To uniquely identify records on a global basis I'd prefer the DOI
>>> identifier (if available). [...] OpenURLs would be another good
>>> candidate as truly unique identifier
>>
>> I really think that database IDs, and DOIs, are solving a different
>> problem from the one that we face here.  All I want is the ability to
>> type
>> \cite{WilsonSereno1998}
>> into my document and know what it will be resolved to.  If I have to
>> type
>> \cite{doi:3244.1327/324326hjg2132}
>> instead, then ... well, it's just not a solution.
>
> Ah, yes, now I understand. Seems I was only thinking in terms of the
> database, not in terms of the user.

Right.

> If cite keys are required to be user-specific and a SRU client sends
> a userID along with its query, then these cite keys could be
> resolved easily by the database to get truly unique IDs (like DOI or
> OpenURL).

Ye-es.  But I don't think that's enormously helpful, since it requires
the server side to hold information that's specific to the client (or,
rather, the user that the client represents).

> If I understand Rob correctly, he's suggesting the same:

I don't think so:

> Date: Tue, 17 May 2005 14:42:26 +0100 (BST)
> From: Dr Robert Sanderson <[hidden email]>
>
> As far as citekeys go, I don't think there's a reasonable solution
> which isn't user customised.  If the UI is to include a short non
> globally unique string, then it's not globally unique.  Pithy, I
> know, but that's the way of it.

I don't know.  You could go a long way by using something like
"Janensch1929p39" (where 39 is the first page number).  DOIs are a
kind of ID that is truly globally unique, but the price you pay for
that is total opacity; Janensch1929p39 is a "nearly unique" ID that is
easy to remember, type, and indeed make up on the spot.  That's a darned
useful combination of properties, and we would be silly to let the
absence of Total Guaranteed Uniqueness blind us to that.

Try this.  When you look up a citation key of the form <name><year>,
the database returns an exact hit if it has one, but says "be more
specific" if it has more than one, in which case you have to try again
with the "p<page>" suffix.  If that has multiple hits, then you need
to make a yet more specific citation key: append the initials to get
"Janensch1929p39W".  You could make a global database of every article
ever published anywhere and still get very, very few clashes using
this simple scheme.  I for one would very much appreciate access to
that database!
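
The lookup just described can be sketched in Python as follows. The
record layout, the function names and the exception class are
assumptions made for illustration; the three sample records are the
Janensch papers from earlier in the thread.

```python
class AmbiguousCiteKey(Exception):
    """Raised when a cite key matches more than one record."""


def accepted_keys(record):
    """Return the three progressively more specific keys for a record."""
    base = f"{record['surname']}{record['year']}"
    paged = f"{base}p{record['first_page']}"
    return {base, paged, paged + record["initials"]}


def look_up(cite_key, records):
    """Return the single matching record, None if there is no match,
    or raise AmbiguousCiteKey so the caller can retry with a longer key."""
    hits = [record for record in records if cite_key in accepted_keys(record)]
    if not hits:
        return None
    if len(hits) > 1:
        raise AmbiguousCiteKey(
            f"be more specific: {cite_key} matches {len(hits)} records"
        )
    return hits[0]


RECORDS = [
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 1,
     "title": "Material und Formengehalt der Sauropoden"},
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 39,
     "title": "Die Wirbelsaule der Gattung Dicraeosaurus"},
    {"surname": "Janensch", "initials": "W", "year": 1929, "first_page": 135,
     "title": "Magensteine bei Sauropoden"},
]

try:
    look_up("Janensch1929", RECORDS)
except AmbiguousCiteKey as error:
    print(error)

print(look_up("Janensch1929p39", RECORDS)["title"])
```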

> A user profile of citekey to article unique id seems to be the most
> appropriate solution.

That would not be bad.  Then instead of having to maintain my own
local bibliography file that says:

        ZOOM:
          authors:
            - firstname: Mike
              surname: Taylor
            - firstname: Sebastian
              surname: Hammer
            - firstname: Ashley
              surname: Sanders
            - firstname: Adam
              surname: Dickmeiss
            - firstname: Rob
              surname: Sanderson
            - firstname: Aaron
              surname: Lav
          title: ZOOM: The Z39.50 Object-Orientation Model, v1.4
          year: 2004
          url: http://zoom.z3950.org/

I could just say

        ZOOM: doi:1234.5678/foobarbaz

The problem is that DOIs are not free to come by, in general.  Which
makes them a bad choice of unique ID for some purposes.  Maybe
identifier URIs would be better, since anyone can have one.  But then
we need to be sure people don't confuse them with actionable URLs in
the case of web-based articles such as the ZOOM AAPI.

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  I celebrated as only we English know how, by popping into
         an office supplies shop on the way home and buying flat-pack
         furniture.

--
Listen to free demos of soundtrack music for film, TV and radio
        http://www.pipedreaming.org.uk/soundtrack/





Re: SRU

Matthias Steffens
In reply to this post by Matthias Steffens
On 17 May 2005 at 16:18 +0100 Mike Taylor wrote:

> > Date: Tue, 17 May 2005 16:06:29 +0200
> > From: Matthias Steffens <[hidden email]>
> >
> > If cite keys are required to be user-specific and a SRU client
> > sends a userID along with its query, then these cite keys could
> > be resolved easily by the database to get truly unique IDs (like
> > DOI or OpenURL).
>
> Ye-es.  But I don't think that's enormously helpful, since it
> requires the server side to hold information that's specific to the
> client (or, rather, the user that the client represents).

That's correct. While this may work well for bibliographic databases
(as developed by the members of the bibliophile initiative) I
understand that it may not work equally well in other applications.

> You could go a long way by using something like "Janensch1929p39"
> (where 39 is the first page number).

That sounds reasonable. Personally I'm using
<name><year><descriptive-word-of-title> (or something similar) since
it's more intuitive than page number to me. But, obviously, that
wouldn't work in a multiple user environment since every user could
come up with another word from the title.

> Try this.  When you look up a citation key of the form
> <name><year>, the database returns an exact hit if it has one, but
> says "be more specific" if it has more than one, in which case you
> have to try again with the "p<page>" suffix.  If that has multiple
> hits, then you need to make a yet more specific citation key:
> append the initials to get "Janensch1929p39W".

Sounds clever. But what if a user previously searched for
"Janensch1929" and it returned an exact hit? Now he uses this cite
key in all of his documents. After five years another record appears
in the database that would have the same identifier ("Janensch1929").
That would require the user to correct all of his documents, which may
not be feasible. In order to be truly useful, cite keys
should remain unique no matter what happens.
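
A minimal sketch of the lookup under discussion, and of how a key that once gave an exact hit can later become ambiguous. The record fields, function names and the exception are assumptions made for illustration, and the second Janensch paper (first page 1) is hypothetical.

```python
from dataclasses import dataclass


class AmbiguousKey(Exception):
    """Raised when a cite key matches more than one record."""


@dataclass(frozen=True)
class Record:
    surname: str
    initials: str
    year: int
    first_page: int


def keys_for(rec):
    # The three key forms from the discussion, least to most specific.
    base = f"{rec.surname}{rec.year}"
    paged = f"{base}p{rec.first_page}"
    return (base, paged, paged + rec.initials)


def lookup(key, records):
    # A record matches if the key equals any of its three forms.
    matches = [r for r in records if key in keys_for(r)]
    if not matches:
        raise KeyError(key)
    if len(matches) > 1:
        raise AmbiguousKey(f"be more specific: {key!r} matches {len(matches)} records")
    return matches[0]


db = [Record("Janensch", "W", 1929, 39)]
print(lookup("Janensch1929", db))        # exact hit while only one record exists

db.append(Record("Janensch", "W", 1929, 1))  # hypothetical later addition
try:
    lookup("Janensch1929", db)
except AmbiguousKey as err:
    print(err)                           # the old short key no longer resolves
print(lookup("Janensch1929p39", db))     # the longer form still does
```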

Matthias



Re: SRU

Mike Taylor
> Date: Tue, 17 May 2005 18:16:09 +0200
> From: Matthias Steffens <[hidden email]>
>
>> You could go a long way by using something like "Janensch1929p39"
>> (where 39 is the first page number).
>
> That sounds reasonable. Personally I'm using
> <name><year><descriptive-word-of-title> (or something similar) since
> it's more intuitive than page number to me. But, obviously, that
> wouldn't work in a multiple user environment since every user could
> come up with another word from the title.

Agreed on both counts: the descriptive word is nicer, but I think that
"nearly unique"ness is worth trying for, which makes first-page a
better bet.

>> Try this.  When you look up a citation key of the form
>> <name><year>, the database returns an exact hit if it has one, but
>> says "be more specific" if it has more than one, in which case you
>> have to try again with the "p<page>" suffix.  If that has multiple
>> hits, then you need to make a yet more specific citation key:
>> append the initials to get "Janensch1929p39W".
>
> Sounds clever.

Why, thank you!  :-)

> But what if a user did previously search for "Janensch1929" and it
> returned an exact hit. Now he uses this cite key in all of his
> documents. After five years another record appears in the database
> that would have the same identifier ("Janensch1929").  That would
> require the user to correct all of his documents which may not be
> feasible after all. In order to be truly useful cite keys should
> remain unique no matter what happens.

Perhaps.  We could have the database remember which was the first
Janensch 1929 paper it was told about, and have that one remain in use
whenever plain "Janensch1929" is used?
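
As a rough sketch of that rule (the class, its methods and the record IDs are hypothetical, not an existing API), the database could pin each short key to whichever record claimed it first:

```python
class PinnedKeys:
    """Keeps each short cite key bound to the first record registered for it."""

    def __init__(self):
        self._first_seen = {}  # short key -> record ID

    def register(self, short_key, record_id):
        # setdefault only stores the value if the key is not yet present,
        # so a later record can never displace the first one.
        return self._first_seen.setdefault(short_key, record_id)

    def resolve(self, short_key):
        return self._first_seen[short_key]


pins = PinnedKeys()
pins.register("Janensch1929", "record-1")   # first paper the database hears about
pins.register("Janensch1929", "record-2")   # a later paper with the same short key
print(pins.resolve("Janensch1929"))         # still the first record
```

The later paper would then have to be cited by a longer key such as the "p<page>" form.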

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  "I know if I had one of those [Pterosaur crests], women would
         be throwing themselves at me, convinced of my sexual prowess
         and reproductive fitness, and men would shrink from me, sure
         that my strength greatly exceeded theirs" -- Chris Bennett.






Re: SRU

Matthias Steffens
On 17 May 2005 at 17:26 +0100 Mike Taylor wrote:

> > But what if a user did previously search for "Janensch1929" and it
> > returned an exact hit. Now he uses this cite key in all of his
> > documents. After five years another record appears in the database
> > that would have the same identifier ("Janensch1929").  That would
> > require the user to correct all of his documents which may not be
> > feasible after all. In order to be truly useful cite keys should
> > remain unique no matter what happens.
>
> Perhaps.  We could have the database remember which was the first
> Janensch 1929 paper it was told about, and have that one remain in use
> whenever plain "Janensch1929" is used?

Yep, that would be a good rule. Still, it would only work within the
scope of *one* database. What if there are two or more databases
where different Janensch articles from 1929 were identified as
"Janensch1929"?

Matthias



Re: SRU

Mike Taylor
> Date: Tue, 17 May 2005 19:00:09 +0200
> From: Matthias Steffens <[hidden email]>
>
>>> But what if a user did previously search for "Janensch1929" and it
>>> returned an exact hit. Now he uses this cite key in all of his
>>> documents. After five years another record appears in the database
>>> that would have the same identifier ("Janensch1929").  That would
>>> require the user to correct all of his documents which may not be
>>> feasible after all. In order to be truly useful cite keys should
>>> remain unique no matter what happens.
>>
>> Perhaps.  We could have the database remember which was the first
>> Janensch 1929 paper it was told about, and have that one remain in
>> use whenever plain "Janensch1929" is used?
>
> Yep, that would be a good rule. Still, it would only work within the
> scope of *one* database. What if there are two or more databases
> where different Janensch articles from 1929 were identified as
> "Janensch1929"?

Hmm.  I think you're right.

So are you saying we need to use "Janensch1929p39W" from the get-go?

(If so, then "JanenschW1929p39" would be less offensive.)

 _/|_ ___________________________________________________________________
/o ) \/  Mike Taylor  <[hidden email]>  http://www.miketaylor.org.uk
)_v__/\  "I saw one of the Democrat delegates, a middle-aged man, driving
         a bumper car while talking on his cellphone.  It's time for the
         Revolution" -- Dave Barry.





Re: SRU

Bruce D'Arcus-3
On May 17, 2005, at 1:29 PM, Mike Taylor wrote:

> So are you saying we need to use "Janensch1929p39W" from the get-go?
>
> (If so, then "JanenschW1929p39" would be less offensive.)

For reference, there's a long thread in the comments on this:

http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2004/11/27/citation-ids

Bruce




Re: SRU

Matthias Steffens
On 17 May 2005 at 18:29 +0100 Mike Taylor wrote:

> >> Perhaps.  We could have the database remember which was the
> >> first Janensch 1929 paper it was told about, and have that one
> >> remain in use whenever plain "Janensch1929" is used?
> >
> > Yep, that would be a good rule. Still, it would only work within
> > the scope of *one* database. What if there are two or more
> > databases where different Janensch articles from 1929 were
> > identified as "Janensch1929"?

> So are you saying we need to use "Janensch1929p39W" from the get-go?
>
> (If so, then "JanenschW1929p39" would be less offensive.)

Well, I fear that there are cases where even "Janensch1929p39W" isn't
unique. In fact, any cite key that does not include all of the
important bibliographic source info may fail.

E.g., I can imagine that in, say, year 2000 there were two authors
named "William Miller" whose articles started on page 39. I.e.,
"Miller2000p39W" wouldn't be truly unique. It would require at least
the addition of a journal and a volume identifier to solve this
case.

In our database we have the same problem with naming of files that
are associated with a given database entry. We use the DOI if
available, otherwise we use file names like

  Angel1994Nature367p126.pdf
  Thomas+Dieckmann2004Science276p394.pdf
  AdamsEtal2001MarBiol138p281.pdf

However, while these names *may* be unique (who knows if they really
are!?) they are ugly when used as cite keys within a document -- but
they are still better than a DOI number, IMHO.
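
For illustration, names of this shape could be generated along the following lines. This is only a sketch: the author rules (one surname as is, two joined with "+", three or more shortened to the first surname plus "Etal") are inferred from the three examples above, and the function names and the placeholder co-author names are made up.

```python
def author_part(surnames):
    # Assumed rules, inferred from the three example file names.
    if not surnames:
        raise ValueError("at least one author surname is required")
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return "+".join(surnames)
    return surnames[0] + "Etal"


def composite_key(surnames, year, journal_abbrev, volume, first_page):
    # Drop whitespace from the journal abbreviation ("Mar Biol" -> "MarBiol").
    journal = "".join(journal_abbrev.split())
    return f"{author_part(surnames)}{year}{journal}{volume}p{first_page}"


print(composite_key(["Angel"], 1994, "Nature", 367, 126))
print(composite_key(["Thomas", "Dieckmann"], 2004, "Science", 276, 394))
# "CoauthorB" and "CoauthorC" are placeholders; only the first surname is used.
print(composite_key(["Adams", "CoauthorB", "CoauthorC"], 2001, "Mar Biol", 138, 281))
```

Appending ".pdf" gives the file name. The key itself adds journal and volume to the author, year and page, which is what the "Miller2000p39W" case above would need.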

Matthias

