Handling DOIs in a processor

fbennett
Dan has flagged an issue with DOIs:

    https://github.com/Juris-M/citeproc-js/issues/57

To sum up, some styles prepend the URL stub to a DOI for display as a
hard-coded prefix, and in that case the (citeproc-js) processor wraps
only the DOI itself in an anchor.

It would be cleaner to wrap the entire URL. Hacking the processor to
recognize the URL stub and merge it into the field content could be
done, but I would be more comfortable if that and any associated
behavior were specified in CSL. We would be talking about constructs
like this:

<text variable="DOI" prefix="https://doi.org/"/>
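
To make the difference concrete, here is a minimal Python sketch (not citeproc-js code) of the two renderings. The function names and the example DOI are assumptions for illustration, as is the guess that the existing anchor's href already points at the resolver URL.

```python
from html import escape

DOI_RESOLVER = "https://doi.org/"  # the stub used in the CSL example above


def link_doi_only(doi: str) -> str:
    """Behavior described above: the stub stays plain text, only the DOI is linked."""
    url = DOI_RESOLVER + doi
    return f'{escape(DOI_RESOLVER)}<a href="{escape(url)}">{escape(doi)}</a>'


def link_whole_url(doi: str) -> str:
    """Proposed behavior: the anchor wraps the complete URL."""
    url = DOI_RESOLVER + doi
    return f'<a href="{escape(url)}">{escape(url)}</a>'


doi = "10.1000/182"  # example DOI, chosen only for illustration
print(link_doi_only(doi))
print(link_whole_url(doi))
```

In the first rendering the visible text is split between plain text and link text; in the second the reader sees one clickable URL.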

As things stand at present in the citeproc-js processor, there are
several parts to the issue:

(1) Would it be safe for the processor to always strip a DOI URL
prefix stub from the initial input? That would ensure a consistent
starting point, to avoid returning a corrupted URL.

(2) If "yes" to (1), should non-DOI prefixes also be stripped? (If
"no," citeproc-js would need to avoid affixing the URL stub in this
case.)

(3) Should special handling be introduced for a prefix hard-coded as
in the above example? (If the prefix is always to be
"https://doi.org/", it would be cleaner to have a proper CSL attribute
that triggers its inclusion.)

Thoughts most welcome!

FB

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: Handling DOIs in a processor

skarcher
Administrator
1) So what you're saying is -- don't print the prefix at all, and then print a URLified DOI? That's always safe, yes. There's no other reason for that prefix than to URLify.
2) Could you give an example? You mean http-type prefixes? I don't think we have any others.
3) Yes, absolutely. Now that everyone agrees how DOIs should be displayed, there should be a CSL function for it -- we'll likely still have to allow the plain form, but something like <text variable="DOI" as-link="true"/> is appropriate, I think.

(In reply to Frank Bennett's message of Fri, Dec 8, 2017 at 9:07 PM.)

--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Re: Handling DOIs in a processor

rmzelle
Administrator
"(1) Would it be safe for the processor to always strip a DOI URL
prefix stub from the initial input?"

I would go one step further and strip any prefix from the DOI
variable values: not only URL prefixes of the form
"https?://(dx\.)?doi\.org/" but also a preceding "doi:"
(case-insensitive).
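
A minimal Python sketch of the stripping rule described above, for illustration only; it is not taken from citeproc-js. The function name is hypothetical, and handling stacked prefixes such as "doi: https://doi.org/..." is an added assumption.

```python
import re

# Leading forms to strip: "doi:" labels and http(s)://(dx.)doi.org/ stubs,
# matched case-insensitively. The trailing "+" also removes stacked
# prefixes (an assumption, not something the thread asks for).
_DOI_PREFIX = re.compile(
    r"^\s*(?:doi:\s*|https?://(?:dx\.)?doi\.org/)+",
    re.IGNORECASE,
)


def strip_doi_prefix(value: str) -> str:
    """Return the bare DOI, with any recognized leading prefix removed."""
    return _DOI_PREFIX.sub("", value).strip()


for raw in (
    "https://doi.org/10.1000/182",
    "http://dx.doi.org/10.1000/182",
    "DOI: 10.1000/182",
    "10.1000/182",
):
    print(strip_doi_prefix(raw))
```

With the input normalized this way, a style can add its own prefix without risking a doubled stub such as "https://doi.org/https://doi.org/...".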

"(2) If "yes" to (1), should non-DOI prefixes also be stripped? (If
"no," citeproc-js would need to avoid affixing the URL stub in this
case.)"

I think you miswrote "non-DOI" here (like Sebastian, I'm confused). If
you mean "non-URL prefixes", the answer would be "yes" (per above).

"(3) Should special handling be introduced for a prefix hard-coded as
in the above example? (If the prefix is always to be
"https://doi.org/", it would be cleaner to have a proper CSL attribute
that triggers its inclusion.)"

Probably yes. I'm not sure if we just need to be able to URLify the
DOI and URL variables, or need something more general (I vaguely
recall somebody on the Zotero forums who wished to hyperlink either
the title or entire bibliographic entry for a custom style).

Rintze


(In reply to Sebastian Karcher's message of Fri, Dec 8, 2017 at 9:45 PM.)

Re: Handling DOIs in a processor

rmzelle
Administrator
By the way, I checked the CSL style repository to see which DOI
prefixes are in use (with counts, using egrep on macOS):

$ egrep -h -i -o '\prefix=".*?doi.*?"' *.csl | sort | uniq -c | sort -bgr
 191 prefix="doi:"
  92 prefix="https://doi.org/"
  68 prefix="doi: "
  52 prefix="Doi: "
  43 prefix="DOI: "
  25 prefix=" doi:"
  15 prefix=", doi:"
  15 prefix=", doi: "
  11 prefix=". doi:"
  10 prefix="DOI "
  10 prefix=" doi: "
   9 prefix=", DOI: "
   9 prefix=" DOI: "
   8 prefix=". doi: "
   7 prefix="DOI&#160;"
   7 prefix=". DOI: "
   6 prefix="DOI: https://doi.org/"
   5 prefix=", DOI:"
   5 prefix="       |doi="
   4 prefix="DOI:"
   4 prefix=" https://doi.org/"
   4 prefix=" &lt;https://doi.org/"
   3 prefix="doi:/"
   2 prefix="doi: https://doi.org/"
   2 prefix="doi "
   2 prefix="DOI-"
   2 prefix=". https://doi.org/"
   2 prefix="(doi:"
   2 prefix="&lt;https://doi.org/"
   2 prefix=" DOI:"
   2 prefix=" DOI: https://doi.org/"
   2 prefix=" (doi:"
   2 prefix=" &lt;doi:"
   1 prefix="https://doi.org"
   1 prefix="doi.org/"
   1 prefix="[https://doi.org/"
   1 prefix="[doi: "
   1 prefix="DOI:https://doi.org/"
   1 prefix="; https://doi.org/"
   1 prefix="; doi:"
   1 prefix="; DOI: "
   1 prefix=": doi:"
   1 prefix=". DOI:https://doi.org/"
   1 prefix="- doi: "
   1 prefix=", https://doi.org/"
   1 prefix=", doi: https://doi.org/"
   1 prefix="(DOI&#160;: "
   1 prefix="&lt;doi:"
   1 prefix="&lt;a href=&quot;https://doi.org/"
   1 prefix="&amp;#10;https://doi.org/"
   1 prefix=" doi : "
   1 prefix=" [https://doi.org/"
   1 prefix=" [doi:"
   1 prefix=" DOI={"
   1 prefix=" DOI:https://doi.org/"
   1 prefix=" DOI "
   1 prefix=" Available at https://doi.org/"
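
The lazy ".*?" quantifier is not part of POSIX extended regular expressions, so the command above may behave differently with other grep implementations. Roughly the same tally can be sketched in Python. This is an approximation, not a reproduction of the command: the function name is an assumption, and [^"]* stands in for the lazy match so that each hit stays inside one attribute value.

```python
import re
from collections import Counter
from pathlib import Path

# A prefix attribute whose value mentions "doi" in any case.
PREFIX_RE = re.compile(r'prefix="[^"]*doi[^"]*"', re.IGNORECASE)


def count_doi_prefixes(texts):
    """Tally every matching prefix="..." attribute across the given style texts."""
    counts = Counter()
    for text in texts:
        counts.update(PREFIX_RE.findall(text))
    return counts


if __name__ == "__main__":
    # Assumes the current directory holds the .csl files to scan.
    styles = (p.read_text(encoding="utf-8") for p in sorted(Path(".").glob("*.csl")))
    for prefix, n in count_doi_prefixes(styles).most_common():
        print(f"{n:4d} {prefix}")
```

Run from the root of a checkout of the style repository, this prints the counts in descending order, like the listing above.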

Rintze

(In reply to Rintze Zelle's message of Sat, Dec 9, 2017 at 10:52 PM.)