Re: Proposed change to CSL input XML

toreilly

After giving more thought to my proposed CSL element, I would like to modify the proposal. I would like to rename the proposed element from <serials> to <containers>. Other changes are listed in my answers to the questions that Bruce raises.


Bruce's reply to my previous proposal raises three questions:


QUESTIONS

1) Is the proposal necessary?

2) Could the problem be solved by extending the features of the <group> element?

3) Could the proposal be generalized beyond its niche applicability?


ANSWERS

(1) The proposal is necessary in my opinion.

Frank Bennett has done a lot of incredible work with citeproc-js and Multilingual Zotero. In addition to maintaining the processor, he added experimental support for parallel citations for legal users. However, the lengths he has had to go to in order to achieve that support demonstrate why the <containers> construct should be adopted.

"In CSL-M, parallel citations are produced when the item types of two adjacent citations match, the items are of a legal type, and their titles and dates also match." (Bennett, F. "Citations out of the Box", p. 78).

This approach introduces several problems. First, storing information about the same item in two different citations makes it possible to store inconsistent information. Second, it relies on the processor to infer a relationship between two citations instead of relying on an explicit data structure. This means that CSL processors need complicated code to support parallel citations, which limits adoption of the feature. Third, as far as I understand, style creators have no way to give instructions about how parallel citations should be styled.

In contrast, supporting a <containers> element would help ensure data integrity, would be easier for processors to implement, and would give style creators the flexibility to target users of parallel citations, if they choose.
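To make the first problem concrete, here is a minimal Python sketch of the matching rule quoted above. It is only a paraphrase of that one sentence, not code from citeproc-js; the function name, the LEGAL_TYPES set, and the sample records are all assumptions made for illustration.

```python
# Hypothetical set of "legal" item types; the list CSL-M really uses may differ.
LEGAL_TYPES = {"legal_case", "legislation", "bill", "treaty"}


def inferred_parallel(first: dict, second: dict) -> bool:
    """Paraphrase of the quoted rule: two adjacent items are treated as
    parallel when their types match, the type is legal, and their
    titles and dates also match."""
    return (
        first.get("type") == second.get("type")
        and first.get("type") in LEGAL_TYPES
        and first.get("title") == second.get("title")
        and first.get("issued") == second.get("issued")
    )


# Two separately stored records for the same decision (illustrative data).
wis_2d = {
    "type": "legal_case",
    "title": "Czapinski v. St. Francis Hosp., Inc.",
    "issued": 2000,
    "container-title": "Wis. 2d",
    "volume": "236",
    "page": "316",
}
nw_2d = {**wis_2d, "container-title": "N.W.2d", "volume": "613", "page": "120"}

# Same decision again, but the title was typed differently in this record.
nw_2d_inconsistent = {**nw_2d, "title": "Czapinski v. St. Francis Hospital, Inc."}

print(inferred_parallel(wis_2d, nw_2d))
print(inferred_parallel(wis_2d, nw_2d_inconsistent))
```

Because the relationship is only inferred from matching fields, a small inconsistency between the two records is enough to stop them being recognised as the same case. A single item holding all of its reports would not have that failure mode.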

(2) Extending the <group> element to support the proposed features would NOT be an advisable solution.

Just like the proposed <containers> element, the <group> element is primarily used for item data that is logically related. However, its defining behavior is that it "implicitly acts as a conditional" (http://docs.citationstyles.org/en/stable/specification.html#group): a <group> element is not rendered if it calls at least one variable and all of the variables it calls are empty. When a <group> element is rendered, it can apply a prefix before the content, a suffix after the content, and a delimiter between its child elements.
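For readers less familiar with that rule, the conditional behavior can be sketched roughly as follows. This is a simplified Python model written for this discussion, not processor code: the tuple-based child representation and the render_group function are assumptions, and macros, terms, and nested groups are ignored.

```python
def render_group(children, item, prefix="", suffix="", delimiter=""):
    """Simplified model of a <group>.

    children: list of ("variable", name) or ("literal", text) tuples
              (a hypothetical stand-in for the group's child elements).
    item:     dict of item data keyed by variable name.
    """
    called = [name for kind, name in children if kind == "variable"]
    # The implicit conditional: suppress the whole group when it calls
    # at least one variable and every called variable is empty.
    if called and not any(item.get(name) for name in called):
        return ""
    parts = []
    for kind, value in children:
        text = item.get(value, "") if kind == "variable" else value
        if text:
            parts.append(str(text))
    return prefix + delimiter.join(parts) + suffix


reporter_group = [
    ("variable", "volume"),
    ("variable", "container-title"),
    ("variable", "page"),
]
print(render_group(
    reporter_group,
    {"volume": "236", "container-title": "Wis. 2d", "page": "316"},
    delimiter=" ",
))
# An item with none of the three variables: prefix and suffix are dropped too.
print(repr(render_group(reporter_group, {"title": "x"}, prefix=" (", suffix=")")))
```

Note that the group only ever sees one flat value per variable. Nothing in this model lets it repeat itself once per report.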

The <group> element has a clearly defined, useful role in styling data. However, there are several features that the <group> element lacks compared to the proposed <containers> element.
First, the <group> element does not have a variable name. The <group> element acts as a collection of variables, but it is not a variable itself.
Second, because the <group> element does not correspond to a variable, it also does not support iterating through complex data. The <names> element and <date> element are examples of elements that support complex data, that is, data represented in a nested structure. A <names> element iterates through every name associated with its variable. The proposed <containers> element is functionally closer to the <names> element than to the <group> element.
In fact, the <containers> element could fairly be described as a more flexible <names> element. Just as you would not want to extend the <group> element to directly render names, I think it would be just as unwise to use <group> to directly render information that a <containers> element should render.

(3) The proposal could be generalized beyond supporting only parallel citations for legal users.

I think that the element should be named a <containers> element, instead of being named <serials>. The allowed sub-elements of <containers> would be <container> elements. This would indicate to style designers that the element is to be used as a "container" for any pieces of information that are logically related and that are repeatable.

Each <container> element would be composed of <container-part> elements. The <container-part> elements are directly analogous to <text> elements of a CSL style. <container-part> elements should follow the variable-naming conventions for Standard Variables from Appendix IV of the CSL specification. (http://docs.citationstyles.org/en/stable/specification.html#standard-variables).

EXAMPLE REPRESENTATION

<containers name="variable-name">
    <container>
        <container-part name="text-variable-name" />
        .....
    </container>
</containers>
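
To show what iterating over such a variable might look like, here is a short Python sketch that renders the reporter portion of the Czapinski citation from my earlier message. The "reporter" variable name, the way each report is split into volume / container-title / page / locator, and both function names are assumptions for illustration. This is not my citeproc-js patch.

```python
item = {
    "type": "legal_case",
    "title": "Czapinski v. St. Francis Hosp., Inc.",
    # Hypothetical "reporter" variable: one entry per parallel report.
    "reporter": [
        {"volume": "2000", "container-title": "WI", "page": "80", "locator": "86"},
        {"volume": "236", "container-title": "Wis. 2d", "page": "316", "locator": "319"},
        {"volume": "613", "container-title": "N.W.2d", "page": "120", "locator": "122"},
    ],
}


def render_container(container, part_names, part_delimiter=" "):
    """Render one <container>: its non-empty parts, then its own pincite."""
    parts = [container[name] for name in part_names if container.get(name)]
    text = part_delimiter.join(parts)
    locator = container.get("locator")
    return f"{text}, {locator}" if locator else text


def render_containers(item, variable, part_names, delimiter=", "):
    """Iterate over every entry stored under `variable`, much as <names>
    iterates over every name stored under its variable."""
    entries = item.get(variable, [])
    return delimiter.join(render_container(c, part_names) for c in entries)


print(render_containers(item, "reporter", ["volume", "container-title", "page"]))
```

The order of the rendered reports simply follows the order of the entries in the item data.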


ADDITIONAL THOUGHTS
1) THE PROPOSAL IS PARTIALLY BACKWARDS COMPATIBLE

If adopted, new CSL styles would be able to process old data. I have already written a patch for citeproc-js that supports <containers> elements in a CSL style sheet (81 lines of pretty simple code). My implementation first looks for item data under the variable name of the <containers> element. If that variable name is not found, the processor then looks for item data using the variable names of the <container-part> elements. This means that styles that use <containers> elements can still fully process item data, even if that data was not encoded to explicitly support <containers> elements.
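
The lookup order described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the citeproc-js patch itself; resolve_containers and the "reporter" variable name are hypothetical.

```python
def resolve_containers(item, containers_name, part_names):
    """Return the list of container entries a style should iterate over.

    1. Prefer data stored explicitly under the <containers> variable name.
    2. Otherwise fall back to the flat variables named by the
       <container-part> elements and treat them as a single container.
    """
    explicit = item.get(containers_name)
    if explicit:
        return list(explicit)
    flat = {name: item[name] for name in part_names if item.get(name)}
    return [flat] if flat else []


part_names = ["volume", "container-title", "page"]

# Old-style data: one flat set of variables, no "reporter" key.
old_item = {"type": "legal_case", "volume": "236",
            "container-title": "Wis. 2d", "page": "316"}

# New-style data: reports stored explicitly under the containers variable.
new_item = {"type": "legal_case", "reporter": [
    {"volume": "236", "container-title": "Wis. 2d", "page": "316"},
    {"volume": "613", "container-title": "N.W.2d", "page": "120"},
]}

print(resolve_containers(old_item, "reporter", part_names))
print(len(resolve_containers(new_item, "reporter", part_names)))
```

With this fallback, a new style sees old data as an item with exactly one container, so it renders the same output an equivalent <group> would have produced.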


2) THE PROPOSAL IS NOT FULLY BACKWARDS COMPATIBLE, AND IS NOT COMPATIBLE BETWEEN STYLES. THE POSSIBLE VARIABLE NAMES FOR <CONTAINERS> ELEMENTS SHOULD BE SPECIFIED

Old style sheets would not be able to process new data that targets the <containers> features of CSL, unless the possible variable names for <containers> elements are constrained. Constraining the variable names would also be required for compatibility between new CSL styles.

It is hard to work out which variable names should be allowed for <containers> elements without anticipating every use case. If I could speculate about a possible solution: drawing inspiration from Bibframe's model (https://www.loc.gov/bibframe/docs/bibframe2-model.html), perhaps the variable name for <containers> elements should be limited to names such as "Instances", "Events", and "Subjects".


- Tom




From: [hidden email] <[hidden email]>
Sent: Friday, December 30, 2016 6:29 PM
To: [hidden email]
Subject: xbiblio-devel Digest, Vol 116, Issue 1
 
Today's Topics:

   1. Re: Proposed change to CSL input XML (Bruce D'Arcus)
   2. citeproc-java 1.0.0 has just been released! (Michel Krämer)
   3. Incorrectly changed title case (Joseph Reagle)
   4. Re: Incorrectly changed title case (Sebastian Karcher)


----------------------------------------------------------------------

Message: 1
Date: Tue, 04 Oct 2016 02:03:41 +0000
From: "Bruce D'Arcus" <[hidden email]>
Subject: Re: [xbiblio-devel] Proposed change to CSL input XML
To: "[hidden email]"
        <[hidden email]>
Message-ID:
        <CAF-FPGMy1qdB_9pdwqH4mwJfJ01e2N6WN2gm4NSohu+[hidden email]>
Content-Type: text/plain; charset="utf-8"

Whenever a case like this comes up, I try to generalize.

So the proposal is fine, except I wonder if there's really need for a new
element to support this, and if it might be generalized.

Thinking out loud, this is basically a situation where there is more than
one "container". Maybe one or more new attributes on cs:group?

On Mon, Sep 26, 2016, 12:56 PM Thomas O'Reilly <[hidden email]> wrote:

> Sebastian seems to have a solid grasp on the issues. I also often share
> his frustration with legal citations. Just so that everyone in the
> conversation knows what the deal is about legal citations, I have included
> some background.
>
> Legal decisions are published in serial publications called "reporters".
> In the U.S. there are two main commercial reporters - LexisNexis and
> Westlaw - and a smattering of government reporters for various courts. The
> commercial reporters are objectively better. They are comprehensive, and
> they are heavily annotated; everyone I ever met in the profession uses
> either LexisNexis or Westlaw. However, reporters are expensive, and they
> consume a lot of space in an office, so law offices usually subscribe to
> either Lexis or Westlaw, but not both. Parallel citations were designed so
> that legal readers could benefit from reading a document, even if they had
> a different reporter than the one preferred by the writer. The practical
> importance of parallel citations is diminishing as the legal world moves
> into the digital environment, but parallel citations are still a
> requirement in many jurisdictions and journals.
>
>
> What would my proposal look like in an actual CSL style? I am not an
> expert in CSL, but I'll give it my best shot. The most important use case
> for a "serials" variable is for parallel citations. I will show how an
> existing CSL style could be modified to add support for parallel citations.
>
> My example is taken from "bluebook.csl". The excerpt below shows the
> styling instructions for creating the long form citation for legal opinions.
>
>
>
> *Bluebook.csl *
>       <else-if type="legal_case">
>         <text variable="title" suffix=", " font-variant="normal"/>
>         <text variable="number" suffix=", "/>
>         <group delimiter=" ">
>           <text variable="volume"/>
>           <text variable="container-title"/>
>           <text variable="page"/>
>         </group>
>         <text variable="locator" prefix=", "/>
>         <group prefix=" (" suffix=")" delimiter=" ">
>           <text variable="authority"/>
>           <date variable="issued">
>             <date-part name="month" form="short" suffix=" "/>
>             <date-part name="day" suffix=", "/>
>             <date-part name="year"/>
>           </date>
>         </group>
>       </else-if>
>
>
> My proposed modification would be to allow CSL styles to encapsulate
> essential information about a serial publication within a <serials>
> element. The <serials> block would have one or more <serial> children which
> contained the text variables.
>
> *Modified-Bluebook.csl*
>
>       <else-if type="legal_case">
>         <text variable="title" suffix=", " font-variant="normal"/>
>         <text variable="number" suffix=", "/>
>         <serials variable="reporter">
>           <serial delimiter=", ">
>             <group delimiter=" ">
>               <text variable="volume"/>
>               <text variable="container-title"/>
>               <text variable="page"/>
>             </group>
>             <text variable="locator" prefix=", "/>
>           </serial>
>         </serials>
>         <group prefix=" (" suffix=")" delimiter=" ">
>           <text variable="authority"/>
>           <date variable="issued">
>             <date-part name="month" form="short" suffix=" "/>
>             <date-part name="day" suffix=", "/>
>             <date-part name="year"/>
>           </date>
>         </group>
>       </else-if>
>
>
> There are two things that I want to point out about the
> modified-bluebook.csl. One, parallel citations are separated by commas, so
> the <serial> element has a delimiter attribute.
>
> Two, the "locator" text variable has to be inside the <serial> element -
> which is something I did not consider in my first post. Each parallel
> citation has to have its own "pincite" or "locator".
>
> A citation rendered with the modified-bluebook styling would appear as
> follows.
>
> "Czapinski v. St. Francis Hosp., Inc., 2000 WI 80, 86, 236 Wis. 2d 316,
> 319, 613 N.W.2d 120, 122. (Wis. 2000)"
>
> In closing:
>
> I recognize that CSL should not adopt any changes that are not supported
> by the major stakeholders. However, I am very interested in your feedback
> on the technical merits of the proposal.
>
> - Tom O'Reilly
>
------------------------------

Message: 2
Date: Sat, 19 Nov 2016 14:52:26 +0100
From: Michel Krämer <[hidden email]>
Subject: [xbiblio-devel] citeproc-java 1.0.0 has just been released!
To: [hidden email]
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=us-ascii

Dear CSL community!

I'm happy to announce the next release of citeproc-java. Version 1.0.0 comes with many new features and improved performance:

http://michel-kraemer.github.io/citeproc-java/

The highlights in this release are:

* citeproc-java is compatible with Java 8
* It now has remote connectors for Zotero and Mendeley
* Improved performance and memory usage
* Improved command line tool
* New interactive shell
* etc.

On macOS you can install the command line tool with Homebrew:

  brew tap michel-kraemer/citeproc-java
  brew install citeproc-java

Other installation options are available:

https://michel-kraemer.github.io/citeproc-java/download/

citeproc-java uses citeproc-js under the hood. It wouldn't have been possible to develop it without this great library. So, thanks to Frank Bennett and all contributors for their great work!

Any feedback is appreciated.

Cheers,
Michel


------------------------------

Message: 3
Date: Fri, 30 Dec 2016 17:30:42 -0500
From: Joseph Reagle <[hidden email]>
Subject: [xbiblio-devel] Incorrectly changed title case
To: [hidden email],
        [hidden email]
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=windows-1252

Hello all, when I use the following YAML entry with chicago-fullnote-bibliography.csl the title is rendered as:

“The IKettle, the Eleven-Hour Struggle to Make a Cup of Tea, and Why It Was All About Data, Analytics and Connecting Things Together,”

Because "iKettle" is a mixed-cased proper noun, it should not be changed IMHO. Is this expected behavior? Is it the result of the specification, actual CSL, or citeproc? Any tips appreciated.


pandoc 1.19.1
pandoc-citeproc 0.10.3

---
references:
- id: Rittman2016ieh
  type: post-weblog
  genre: Web log message
  author:
  - family: "Rittman"
    given: "Mark"
  container-title: "Medium"
  issued:
    year: 2016
    month: 10
    day: 12
  title: "The iKettle, the eleven-hour struggle to make a cup of tea, and why it was all about data, analytics and connecting things together"
  URL: "https://medium.com/mark-rittman/the-story-behind-the-ikettle-the-eleven-hour-struggle-to-make-a-cup-of-tea-and-why-it-was-all-769144d12d7\\#.au2p9rjbz"
  accessed:
    year: 2016
    month: 10
    day: 17

...



------------------------------

Message: 4
Date: Fri, 30 Dec 2016 18:04:12 -0500
From: Sebastian Karcher <[hidden email]>
Subject: Re: [xbiblio-devel] Incorrectly changed title case
To: development discussion for xbiblio
        <[hidden email]>
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

I think the specs (and citeproc-js) have this right:
The iKettle, the Eleven-Hour Struggle to Make a Cup of Tea, and Why It Was
All about Data, Analytics and Connecting Things Together.

This follows the 2nd sentence of point 2. here:
http://docs.citationstyles.org/en/stable/specification.html#title-case-conversion
So I think this is a pandoc-citeproc bug.

On Fri, Dec 30, 2016 at 5:30 PM, Joseph Reagle <[hidden email]>
wrote:

> Hello all, when I use the following YAML entry with
> chicago-fullnote-bibliography.csl the title is rendered as:
>
> “The IKettle, the Eleven-Hour Struggle to Make a Cup of Tea, and Why It
> Was All About Data, Analytics and Connecting Things Together,”
>
> Because "iKettle" is a mixed-cased proper noun, it should not be changed
> IMHO. Is this expected behavior? Is it the result of the specification,
> actual CSL, or citeproc? Any tips appreciated.
>
>
> pandoc 1.19.1
> pandoc-citeproc 0.10.3
>
> ---
> references:
> - id: Rittman2016ieh
>   type: post-weblog
>   genre: Web log message
>   author:
>   - family: "Rittman"
>     given: "Mark"
>   container-title: "Medium"
>   issued:
>     year: 2016
>     month: 10
>     day: 12
>   title: "The iKettle, the eleven-hour struggle to make a cup of tea, and
> why it was all about data, analytics and connecting things together"
>   URL: "https://medium.com/mark-rittman/the-story-behind-the-
> ikettle-the-eleven-hour-struggle-to-make-a-cup-of-tea-and-why-it-was-all-
> 769144d12d7\\#.au2p9rjbz"
>   accessed:
>     year: 2016
>     month: 10
>     day: 17
>
> ...
>



--
Sebastian Karcher, PhD
www.sebastiankarcher.com

rmzelle
Administrator
Hi Tom,

So, to briefly condense your posts on this topic (for myself and others): Juris-M (formerly Multilingual Zotero, MLZ) extends the official CSL specification in a number of ways to improve support for legal citations. One of these extensions is the support for parallel citations, where a single document is published redundantly in multiple outlets. Juris-M assumes that each so-called report is stored as a separate item, and automatically collapses reports to the same document if they are cited directly next to each other. See pages 6 and 78 of http://citationstylist.org/public/mlzbook.pdf for more context and examples.

Your proposal is to store these separate reports as a single item, and to make it possible in CSL to properly render the information from the various reporters of each item (each report will have its own values for fields like "container-title", "volume", "section", etc.).

After reading your proposal, I have a few questions:

* assuming that we store all the publication information from the various publishers/reporters in a single item, in some form of ordered array, what are the exact formatting requirements for parallel citations? For example, for an item with multiple reports, is it ever necessary to be able to:
    - cite a subset of the reports?
    - control the order of the reports?
* for the old-timers here: does anybody know if there are any good discussions here or on the Zotero forums about hierarchical item types? Do we have an overview of the various cases where hierarchical item types would help? E.g. https://www.zotero.org/support/requested_features only mentions "chapters as sub-items of an edited volume". I'm wondering if there is a lot of functional overlap between hierarchical item types and parallel legal citations.

Rintze

P.S. Tom, it looks like every time you respond you start a new thread in this mailing list, which makes this discussion harder to follow (this discussion is a continuation of http://xbiblio-devel.2463403.n2.nabble.com/Proposed-change-to-CSL-input-XML-specification-tp7579492.html and http://xbiblio-devel.2463403.n2.nabble.com/Proposed-change-to-CSL-input-XML-tp7579502.html). Could you try to reply directly next time? You should be able to do this by replying to the thread via http://xbiblio-devel.2463403.n2.nabble.com/, or by subscribing to the mailing list at https://lists.sourceforge.net/lists/listinfo/xbiblio-devel, after which you should receive future mailing list emails in your inbox.

On Tue, Jan 3, 2017 at 7:29 PM, Thomas O'Reilly <[hidden email]> wrote:

[...]