Developing a CSL processor

Developing a CSL processor

Rönkkö Mikko
Hi

I decided to develop a simple CSL processor to convert Zotero JSON strings to APA citations. The code will be used in ZotPad, and once the processor works, I will publish the code as a separate project on GitHub.

I am using JSON strings from the Zotero server as data and validating the output against formatted citations from the Zotero server. The citations are formatted using the APA style from https://github.com/citation-style-language/styles/blob/master/apa.csl using the CSL 1.0.1 specification.

I am using the following bibliography item as my test data

Cadogan, J. W., & Lee, N. (Forthcoming). Improper Use of Endogenous Formative Variables. <i>Journal of Business Research</i>.

There is one thing that I do not understand. In the APA style (lines 429-434) there is a group

        <group delimiter=". ">
          <text macro="author"/>
          <text macro="issued"/>
          <text macro="title" prefix=" "/>
          <text macro="container"/>
        </group>

The macro "author" has a names element with initialize-with=". " and the macro "issued" contains a group with prefix " (". Now to my understanding, this means that

- The "author" macro will end with ". "    [Cadogan, J. W., & Lee, N.]
- The "issued" macro will start with " ("   [ (Forthcoming)]
- The macros are delimited with ". "

This results in a bibliographic item that starts with

Cadogan, J. W., & Lee, N..  (Forthcoming).

This is obviously not correct. There should not be a double period followed by a double space, but I do not understand which part of the formatting logic is incorrect. 

Mikko



------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://ad.doubleclick.net/clk;258768047;13503038;j?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: Developing a CSL processor

Sebastian Karcher
Where are you trying this out? In Zotero?


--
------
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University


Re: Developing a CSL processor

Rönkkö Mikko
The data are from the Zotero server using the read API, and I am running it on my Mac (the Xcode console, to be exact).


Re: Developing a CSL processor

fbennett
On Fri, Sep 21, 2012 at 5:58 AM, Rönkkö Mikko <[hidden email]> wrote:
> The data are from zotero server using the read API and I am running it on my
> Mac. (XCode console to be exact.)

Is the output with the extra period and space what you get back from
the API call?


Re: Developing a CSL processor

Sebastian Karcher
In reply to this post by Rönkkö Mikko
Sorry, I still feel like I'm missing the question: what produces the outcome? Is that your implementation, citeproc.js or citeproc-node? (Well, I just see I'm not the only one, which makes me feel better.)

There are obviously two periods and two spaces in the logic - one each in the delimiter and in the initialization - but I'm pretty sure citeproc deals with that elegantly.


Re: Developing a CSL processor

Charles Parnot
Mikko, the processor you write will have to deal with duplicated punctuation. It seems this is the issue here?

--
Charles Parnot
[hidden email]
twitter: @cparnot
http://mekentosj.com




Re: Developing a CSL processor

fbennett
In reply to this post by Rönkkö Mikko

Mikko,

Below I've assumed that the output is from your project code. If I
have it backwards, let me know.

You have the logic right. That's the literal result you will get from
flattening the structure without anything more:

  [author ending in "."] + ". "{delimiter} + " ("{prefix} + [issued]
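
To make the literal flattening concrete, here is a minimal Python sketch. The helper name render_group and the closing ")" suffix on the issued group are assumptions for illustration only; they are not taken from apa.csl or from any existing processor.

```python
def render_group(children, delimiter):
    """Join the non-empty rendered children with the group delimiter.

    Hypothetical helper: a real processor would work on CSL nodes,
    not on strings that are already rendered.
    """
    return delimiter.join(child for child in children if child)


# Output of the "author" macro: initials carry their own trailing period.
author = "Cadogan, J. W., & Lee, N."
# Output of the "issued" macro: prefix " (" from the thread, suffix ")" assumed.
issued = " (" + "Forthcoming" + ")"

flattened = render_group([author, issued], ". ")
print(repr(flattened))
```

The printed string contains two periods followed by two spaces, which is exactly the start of the bibliography item reported in the original question.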

Double punctuation needs to be culled by the processor. It's a little
tricky, since formatting (italics etc) might lie between the two
periods, depending on the style. There is also potential interaction
with quote marks, depending on whether or not the style has
punctuation-in-quotes set true or false. For those reasons, the cull
function can't work on the output string: it needs to analyse the
nested structure before collapsing to identify "adjacent" punctuation.
With content strings, delimiters and affixes in the mix, it's pretty
hair-raising. The citeproc-js code for this is heavily tested and
seems to work quite well, but I would be hard-pressed to explain
exactly how it works.
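
As a rough illustration of why the cull has to happen before the structure is collapsed, the sketch below keeps the output as a list of formatted runs and compares punctuation across run boundaries. This is not how citeproc-js does it; the Run class, the function names and the abbreviated journal title are all hypothetical, and quote marks and punctuation-in-quotes are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A piece of output text plus its formatting (hypothetical structure)."""
    text: str
    italic: bool = False


def cull_adjacent_periods(runs):
    """Drop a run's leading period when the previous visible character was a period."""
    out = []
    last_char = ""
    for run in runs:
        text = run.text
        if last_char == "." and text.startswith("."):
            text = text[1:]
        if text:
            out.append(Run(text, run.italic))
            last_char = text[-1]
    return out


def to_html(runs):
    """Serialize runs; italic runs are wrapped in <i>...</i>."""
    return "".join(f"<i>{r.text}</i>" if r.italic else r.text for r in runs)


# An italic title ending in a period, followed by a ". " delimiter.
runs = [Run("J. Bus. Res.", italic=True), Run(". "), Run("Next")]

print(to_html(runs))                         # the two periods are split by "</i>"
print(to_html(cull_adjacent_periods(runs)))  # duplicate period removed before serializing
```

In the serialized string the two periods are no longer adjacent characters, so a search for ".." on the output would miss them; comparing the runs before serialization does not have that problem.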

Concerning spaces, there was a long discussion a couple of years back
concerning whether extraneous spaces added by affixes should be
considered style bugs:

  http://xbiblio-devel.2463403.n2.nabble.com/how-much-bugged-a-style-may-be-tt5784767.html#none

That thread does not reflect well on me, I'm afraid. The point made by
Andrea (and, I think, Bruce) is perfectly valid: double-space issues
*can* be eliminated by more careful construction of CSL code, and
should be. It is also true that masking double spaces in the processor
gives a green light to sloppy coding. That said, the amount of work
required to eliminate all potential extra spaces from the CSL
repository would be pretty staggering. At the end of the day, we're
kind of stuck with this problem.

Double spaces are hard to catch in the processor for the same reason:
you have to work on the nested structure before it is flattened into
an output string. It's a little simpler because you can assume input
strings will not have leading or trailing spaces; but tracking spaces
across affix and delimiter attributes across multiple nested layers is
still a challenge.

If you are only going to process one style in one output format and a
single locale, you may be able to fix things up by running a regular
expression over the output string. That wouldn't work as a general
solution, though.
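
For that single-style, single-format case, a cleanup pass might look like the following Python sketch. The function name tidy is hypothetical, and the patterns assume plain-text output in which no markup ever sits between the duplicated characters.

```python
import re


def tidy(output: str) -> str:
    """Collapse a doubled period and runs of spaces in an already flattened string.

    Only safe under the assumptions stated above; runs of three or more
    periods (for example an ellipsis in a title) are left untouched.
    """
    output = re.sub(r"(?<!\.)\.\.(?!\.)", ".", output)  # exactly two periods -> one
    output = re.sub(r" {2,}", " ", output)              # two or more spaces -> one
    return output


print(tidy("Cadogan, J. W., & Lee, N..  (Forthcoming)."))
```

With HTML output this breaks down as soon as a closing tag lands between the two periods, which is why it cannot serve as the general solution.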

Sorry for the long response. Hope it helps!

Frank



Re: Developing a CSL processor

Rönkkö Mikko
Hi

Thanks for the response.

On Sep 21, 2012, at 0:20, Frank Bennett wrote:

> Mikko,
>
> Below I've assumed that the output is from your project code. If I
> have it backwards, let me know.

You are correct.

The problem was that my implementation produces incorrect bibliography items even though it follows the CSL specification (or rather a subset of it that is sufficient to produce bibliography items in the APA style). I did not know that strictly following the specification does not by itself result in correct formatting, and that the processor needs to "be smart" about spaces and punctuation. I could not find this in the documentation. Now that I know, it should not be difficult to fix.

I posted my code to https://github.com/mronkko/CSLProcessor 

At this point the goal is to format single citations and single bibliography items using the APA style. In the future I may make it more generic.

Mikko


>
> You have the logic right. That's the literal result you will get from
> flattening the structure without anything more:
>
>  [author ending in "."] + ". "{delimiter} + " ("{prefix} + [issued]
>
> Double punctuation needs to be culled by the processor. It's a little
> tricky, since formatting (italics etc) might lie between the two
> periods, depending on the style. There is also potential interaction
> with quote marks, depending on whether or not the style has
> punctuation-in-quotes set true or false. For those reasons, the cull
> function can't work on the output string: it needs to analyse the
> nested structure before collapsing to identify "adjacent" punctuation.
> With content strings, delimiters and affixes in the mix, it's pretty
> hair-raising. The citeproc-js code for this is heavily tested and
> seems to work quite well, but I would be hard-pressed to explain
> exactly how it works.
>
> Concerning spaces, there was a long discussion a couple of years back
> concerning whether extraneous spaces added by affixes should be
> considered style bugs:
>
>  http://xbiblio-devel.2463403.n2.nabble.com/how-much-bugged-a-style-may-be-tt5784767.html#none
>
> That thread does not reflect well on me, I'm afraid. The point made by
> Andrea (and, I think, Bruce) is perfectly valid: double-space issues
> *can* be eliminated by more careful construction of CSL code, and
> should be. It is also true that masking double spaces in the processor
> gives a green light to sloppy coding. That said, the amount of work
> required to eliminate all potential extra spaces from the CSL
> repository would be pretty staggering. At the end of the day, we're
> kind of stuck with this problem.
>
> Double spaces are hard to catch in the processor for the same reason:
> you have to work on the nested structure before it is flattened into
> an output string. It's a little simpler because you can assume input
> strings will not have leading or trailing spaces; but tracking spaces
> across affix and delimiter attributes across multiple nested layers is
> still a challenge.
>
> If you are only going to process one style in one output format and a
> single locale, you may be able to fix things up by running a regular
> expression over the output string. That wouldn't work as a general
> solution, though.
>
> Sorry for the long response. Hope it helps!
>
> Frank
>
>
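
For the narrow case Frank describes (one style, one output format, one locale), the regular-expression fix-up could look roughly like the sketch below. The function name `tidy` and both patterns are assumptions for illustration, not anything taken from citeproc-js, and they would also collapse a legitimate ellipsis.

```python
import re

def tidy(text):
    """Collapse doubled periods and runs of spaces in an already-flattened string."""
    # A period followed by one or more further periods (optionally separated
    # by whitespace) becomes a single period.
    text = re.sub(r"\.(\s*\.)+", ".", text)
    # Any run of two or more spaces becomes one space.
    text = re.sub(r" {2,}", " ", text)
    return text.strip()

fixed = tidy("Cadogan, J. W., & Lee, N..  (Forthcoming).")
print(fixed)

# Markup between the two periods hides them from the pattern, which is the
# limitation described above: this string comes back unchanged.
unchanged = tidy("<i>Journal of Business Research.</i>.")
print(unchanged)
```

The second call shows why this cannot be a general solution: once formatting tags sit between the periods, the output string no longer shows them as adjacent.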

Re: Developing a CSL processor

fbennett
On Fri, Sep 21, 2012 at 4:16 PM, Rönkkö Mikko <[hidden email]> wrote:

> The problem was that my implementation produces incorrect bibliography items even though it follows the CSL specification (or at least a subset of it that is sufficient to produce bibliography items in the APA style). I did not know that strictly following the specification does not result in correct formatting, and that the processor also needs to "be smart" about spaces and punctuation. I could not find this in the documentation, but now that I know, it should not be difficult to fix.
>
> I posted my code to https://github.com/mronkko/CSLProcessor
>
> At this point the goal is to format single citations and single bibliography items using the APA style. In the future I may make it more generic.
>
> Mikko

This may be more distraction than you need at this point, but just in case ...

There is a set of test fixtures covering space-suppression in the
citeproc-js sources (scroll down to the fixtures prefixed with
"spaces_"):

  https://bitbucket.org/fbennett/citeproc-js/src/5cc7cff350ee/tests/fixtures/local

I didn't put the tests into the main test suite, because the
discussion I linked above was inconclusive about whether it would be
appropriate to recognise space-suppression in the official
specification. The main test suite is here:

  https://bitbucket.org/bdarcus/citeproc-test

The future of CSL processor testing probably lies in work by Sylvester
Keil, which is here:

  https://github.com/citation-style-language/test-suite

(The repository above hasn't been updated in a while, but Sylvester
recently indicated that there will be activity there once he has
reached a milestone in his current work on citeproc-ruby.)

Frank


Re: Developing a CSL processor

Robert Knight
> For those reasons, the cull function can't work on the output string:
> it needs to analyse the nested structure before collapsing to identify "adjacent"
> punctuation. With content strings, delimiters and affixes in the mix,
> it's pretty hair-raising.

W3C specifications often include pseudo-algorithms that
implementations should follow.
Perhaps it would make sense to try and do the same in the CSL spec?
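
As a starting point for what such a pseudo-algorithm might look like, here is a rough Python sketch of culling on a structure rather than on the output string. Everything in it is an assumption for illustration: the `Segment` type, the `cull` and `to_html` names, and the rule itself (drop a period or space that repeats the previous visible character). It is not how citeproc-js works, it ignores punctuation-in-quotes, and it would also collapse an ellipsis.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    italic: bool = False

def cull(segments):
    """Drop a period or space that repeats the previous visible character."""
    result = []
    last = ""  # last character kept so far, tracked across segment boundaries
    for seg in segments:
        kept = []
        for ch in seg.text:
            if ch in ". " and ch == last:
                continue  # duplicate period or space: skip it
            kept.append(ch)
            last = ch
        if kept:
            result.append(Segment("".join(kept), seg.italic))
    return result

def to_html(segments):
    """Serialize segments, wrapping italic ones in <i> tags."""
    return "".join(f"<i>{s.text}</i>" if s.italic else s.text for s in segments)

# Hypothetical pre-flattening output: content, delimiters and affixes in order.
segments = [
    Segment("Cadogan, J. W., & Lee, N."),
    Segment(". "),
    Segment(" (Forthcoming)"),
    Segment(". "),
    Segment("Journal of Business Research.", italic=True),
    Segment("."),
]
html = to_html(cull(segments))
print(html)
```

Because the formatting is only applied after culling, the period inside the italics and the one after it are seen as adjacent. A real algorithm would still have to say which of the two survives.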

Regards,
Rob.


Re: Developing a CSL processor

Charles Parnot
Hi Rob,

While the idea of pseudo-algorithms is attractive, I very much like the idea of fixtures being the specification instead. In the case of bibliographic software, this seems like a really good fit, as it's easy to discuss the output and compare it to what's actually in books and articles (or to common sense). The fixtures can be read by people who are not programmers, and this is a big plus as well: you can show them to, and discuss them with, non-technical people who know the field. Discussing the "algorithms" to get to the actual results is not as useful IMO. Or at least it should not come first, and should only be formalized when enough examples of the issue at hand have been produced. The disambiguation process is another one of the hair-raising issues.
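
A small sketch of what "fixtures as the specification" could look like in practice, in Python. The fixture layout here is hypothetical (it is not the citeproc-js or test-suite fixture format), and `naive_render` is only a stand-in for a processor under test.

```python
# Hypothetical fixtures: readable input/expected pairs, with no algorithm implied.
FIXTURES = [
    {"name": "double_period",
     "pieces": ["Lee, N.", ". ", "(Forthcoming)"],
     "expected": "Lee, N. (Forthcoming)"},
    {"name": "double_space",
     "pieces": ["Lee, N.", " ", " (Forthcoming)"],
     "expected": "Lee, N. (Forthcoming)"},
]

def naive_render(pieces):
    """Stand-in processor: concatenates the pieces without any culling."""
    return "".join(pieces)

def run_fixtures(fixtures, render_fn):
    """Return the names of fixtures whose rendered output differs from expected."""
    return [f["name"] for f in fixtures if render_fn(f["pieces"]) != f["expected"]]

failures = run_fixtures(FIXTURES, naive_render)
print(failures)
```

Each fixture can be read and argued about without knowing how the processor gets there, and a naive processor fails both of them.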

My 2 cents :-)

Charles




--
Charles Parnot
[hidden email]
twitter: @cparnot
http://mekentosj.com




Re: Developing a CSL processor

rmzelle
Administrator
On Fri, Sep 21, 2012 at 9:13 AM, Charles Parnot
<[hidden email]> wrote:
> Discussing the "algorithms" to get to the actual results is not as useful IMO. Or at least it should not come first, and should only be formalized once enough examples of the issue at hand have been produced. The disambiguation process is another one of those hair-raising issues.

Same goes for parsing of raw dates or unstructured names. These kinds
of things mostly evolve based on user feedback, so I agree with
Charles that test fixtures should play an important role here. Based
on those, we indeed might be able to extract some standardized rules.

Rintze


Re: Developing a CSL processor

Bruce D'Arcus-3
In reply to this post by Charles Parnot

Are these mutually exclusive though?

On Sep 21, 2012 9:14 AM, "Charles Parnot" <[hidden email]> wrote:
Hi Rob,

While the idea of pseudo-algorithms is attractive, I very much like the idea of fixtures being the specification instead. In the case of bibliographic software, this seems like a really good fit, as it's easy to discuss the output and compare it to what's actually in books and articles (or to common sense). The fixtures can be read by people who are not programmers, and this is a big plus as well: you can show them to, and discuss them with, non-technical people who know the field. Discussing the "algorithms" to get to the actual results is not as useful IMO. Or at least it should not come first, and should only be formalized once enough examples of the issue at hand have been produced. The disambiguation process is another one of those hair-raising issues.

My 2 cents :-)

Charles



On Sep 21, 2012, at 11:44 AM, Robert Knight <[hidden email]> wrote:

>> For those reasons, the cull function can't work on the output string:
>> it needs to analyse the nested structure before collapsing to identify "adjacent"
>> punctuation. With content strings, delimiters and affixes in the mix,
>> it's pretty hair-raising.
>
> W3C specifications often include pseudo-algorithms that
> implementations should follow.
> Perhaps it would make sense to try and do the same in the CSL spec?
>
> Regards,
> Rob.
>
> On 21 September 2012 10:22, Frank Bennett <[hidden email]> wrote:
>> On Fri, Sep 21, 2012 at 4:16 PM, Rönkkö Mikko <[hidden email]> wrote:
>>> Hi
>>>
>>> Thanks for the response.
>>>
>>> On Sep 21, 2012, at 0:20 , Frank Bennett wrote:
>>>
>>>> On Fri, Sep 21, 2012 at 4:47 AM, Rönkkö Mikko <[hidden email]> wrote:
>>>>> Hi
>>>>>
>>>>> I decided to develop a simple CSL processor to convert Zotero JSON strings
>>>>> to APA citations. The code will be used in ZotPad and after the processor
>>>>> works, I will publish the code as a separate project on GitHub.
>>>>>
>>>>> I am using JSON strings from the Zotero server as data and validating the
>>>>> output against formatted citations from the Zotero server. The citations are
>>>>> formatted using the APA style from
>>>>> https://github.com/citation-style-language/styles/blob/master/apa.csl using
>>>>> the CSL 1.0.1 specification.
>>>>>
>>>>> I am using the following bibliography item as my test data
>>>>>
>>>>> Cadogan, J. W., & Lee, N. (Forthcoming). Improper Use of Endogenous
>>>>> Formative Variables. <i>Journal of Business Research</i>.
>>>>>
>>>>> There is one thing that I do not understand. In the APA style (lines
>>>>> 429-434) there is a group
>>>>>
>>>>>       <group delimiter=". ">
>>>>>         <text macro="author"/>
>>>>>         <text macro="issued"/>
>>>>>         <text macro="title" prefix=" "/>
>>>>>         <text macro="container"/>
>>>>>       </group>
>>>>>
>>>>>
>>>>> The macro "author" has a names element with initialize-with=". " and the
>>>>> macro "issued" contains a group with prefix " (". Now to my understanding,
>>>>> this means that
>>>>>
>>>>> - The "author" macro will end with ". "    [Cadogan, J. W., & Lee, N.]
>>>>> - The "issued" macro will start with " ("   [ (Forthcoming)]
>>>>> - The macros are delimited with ". "
>>>>>
>>>>> This results in a bibliographic item that starts with
>>>>>
>>>>> Cadogan, J. W., & Lee, N..  (Forthcoming).
>>>>>
>>>>> This is obviously not correct. There should not be a double period followed
>>>>> by a double space, but I do not understand which part of the formatting
>>>>> logic is incorrect.
>>>>>
>>>>> Mikko
>>>>
>>>> Mikko,
>>>>
>>>> Below I've assumed that the output is from your project code. If I
>>>> have it backwards, let me know.
>>>
>>> You are correct.
>>>
>>> The problem was that my implementation produces incorrect bibliography items even though it follows the CSL specification (or rather a subset of it that is sufficient to produce bibliography items in the APA style). I did not know that strictly following the specification does not result in correct formatting, and that the processor needs to "be smart" about spaces and punctuation. I could not find this in the documentation. But now that I know this, it should not be difficult to fix.
>>>
>>> I posted my code to https://github.com/mronkko/CSLProcessor
>>>
>>> At this point the goal is to format single citations and single bibliography items using the APA style. In the future I may make it more generic.
>>>
>>> Mikko
>>
>> This may be more distraction than you need at this point, but just in case ...
>>
>> There is a set of test fixtures covering space-suppression in the
>> citeproc-js sources (scroll down to the fixtures prefixed with
>> "spaces_"):
>>
>>  https://bitbucket.org/fbennett/citeproc-js/src/5cc7cff350ee/tests/fixtures/local
>>
>> I didn't put the tests into the main test suite, because the
>> discussion I linked above was inconclusive about whether it would be
>> appropriate to recognise space-suppression in the official
>> specification. The main test suite is here:
>>
>>  https://bitbucket.org/bdarcus/citeproc-test
>>
>> The future of CSL processor testing probably lies in work by Sylvester
>> Keil, which is here:
>>
>>  https://github.com/citation-style-language/test-suite
>>
>> (The repository above hasn't been updated in a while, but Sylvester
>> recently indicated that there will be activity there once he has
>> reached a milestone in his current work on citeproc-ruby.)
>>
>> Frank
>>
>>>
>>>
>>>>
>>>> You have the logic right. That's the literal result you will get from
>>>> flattening the structure without anything more:
>>>>
>>>> [author ending in "."] + ". "{delimiter} + " ("{prefix} + [issued]
>>>>
>>>> Double punctuation needs to be culled by the processor. It's a little
>>>> tricky, since formatting (italics etc) might lie between the two
>>>> periods, depending on the style. There is also potential interaction
>>>> with quote marks, depending on whether or not the style has
>>>> punctuation-in-quotes set true or false. For those reasons, the cull
>>>> function can't work on the output string: it needs to analyse the
>>>> nested structure before collapsing to identify "adjacent" punctuation.
>>>> With content strings, delimiters and affixes in the mix, it's pretty
>>>> hair-raising. The citeproc-js code for this is heavily tested and
>>>> seems to work quite well, but I would be hard-pressed to explain
>>>> exactly how it works.
>>>>
>>>> Concerning spaces, there was a long discussion a couple of years back
>>>> concerning whether extraneous spaces added by affixes should be
>>>> considered style bugs:
>>>>
>>>> http://xbiblio-devel.2463403.n2.nabble.com/how-much-bugged-a-style-may-be-tt5784767.html#none
>>>>
>>>> That thread does not reflect well on me, I'm afraid. The point made by
>>>> Andrea (and, I think, Bruce) is perfectly valid: double-space issues
>>>> *can* be eliminated by more careful construction of CSL code, and
>>>> should be. It is also true that masking double spaces in the processor
>>>> gives a green light to sloppy coding. That said, the amount of work
>>>> required to eliminate all potential extra spaces from the CSL
>>>> repository would be pretty staggering. At the end of the day, we're
>>>> kind of stuck with this problem.
>>>>
>>>> Double spaces are hard to catch in the processor for the same reason:
>>>> you have to work on the nested structure before it is flattened into
>>>> an output string. It's a little simpler because you can assume input
>>>> strings will not have leading or trailing spaces; but tracking spaces
>>>> across affix and delimiter attributes across multiple nested layers is
>>>> still a challenge.
>>>>
>>>> If you are only going to process one style in one output format and a
>>>> single locale, you may be able to fix things up by running a regular
>>>> expression over the output string. That wouldn't work as a general
>>>> solution, though.
>>>>
>>>> Sorry for the long response. Hope it helps!
>>>>
>>>> Frank
>>>>
>>>>
>>>>>
>>>>>
>>>>>

--
Charles Parnot
[hidden email]
twitter: @cparnot
http://mekentosj.com




Re: Developing a CSL processor

Charles Parnot

On Sep 21, 2012, at 16:03, "Bruce D'Arcus" <[hidden email]> wrote:

Are these mutually exclusive though?

Nope, but I still feel the fixtures and use cases should come first, and should guide the initial client implementation. Some logic may hopefully come out of it, and that helps with writing better documentation, and yes, why not, some pseudo-algorithms, which can be very useful for new implementations. Edge cases abound, however, and some of the logic is really convoluted. For CSL, I feel like fixtures should be holding the truth.

In the case of HTML or CSS, the specifications are written so that they are as unambiguous as possible, and don't have to follow rules set by crazy librarians 30 years ago ;)
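
To make the "fixtures first" idea concrete, here is a minimal sketch in Python of what a fixture-driven check could look like. Everything in it is an assumption made for illustration: the Fixture layout is not the citeproc-test file format, and render_group is a hypothetical stand-in for a processor, handling only the one collision discussed in this thread (a delimiter starting with a period after content that already ends in one).

```python
from dataclasses import dataclass


@dataclass
class Fixture:
    """One human-readable test case (hypothetical layout, not the citeproc-test format)."""
    name: str
    parts: list[str]   # already-rendered child elements of a group
    delimiter: str     # the group's delimiter attribute
    expected: str      # the output that reviewers agreed on


def render_group(parts: list[str], delimiter: str) -> str:
    """Hypothetical processor under test: join the parts, dropping the
    delimiter's leading punctuation mark when the text so far already ends with it."""
    out = ""
    for part in parts:
        if not part:
            continue  # empty elements contribute neither text nor delimiter
        if out:
            delim = delimiter
            if delim and delim[0] in ".,;:" and out.endswith(delim[0]):
                delim = delim[1:]
            out += delim
        out += part
    return out


def run_fixtures(fixtures: list[Fixture]) -> list[str]:
    """Return the names of the fixtures whose rendered output differs from the expectation."""
    return [f.name for f in fixtures
            if render_group(f.parts, f.delimiter) != f.expected]


FIXTURES = [
    Fixture("author_then_issued",
            ["Cadogan, J. W., & Lee, N.", "(Forthcoming)"], ". ",
            "Cadogan, J. W., & Lee, N. (Forthcoming)"),
    Fixture("plain_titles", ["Title", "Journal"], ". ", "Title. Journal"),
]

# An empty list means every fixture passed.
print(run_fixtures(FIXTURES))
```

The point of the layout is that the expected strings can be reviewed by someone who never reads render_group; the implementation is free to change as long as the list of failing fixtures stays empty.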

Charles








Re: Developing a CSL processor

rmzelle
Administrator
In reply to this post by fbennett
Frank recently added some tests to catalog the current citeproc-js
behavior when it comes to punctuation suppression:

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyPlain.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesIn.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesOut.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyField.txt

These tests don't cover suppression of duplicated spaces (as discussed below,
there are already older "spaces_..." fixtures for that), and they only cover
punctuation added by prefixes and suffixes, plus punctuation that is part
of the variable field content (punctuation added by group delimiters, for
example, isn't tested). I tried to pull these new results together
in a spreadsheet:

https://docs.google.com/spreadsheet/ccc?key=0AoKgWUjfrk4_dE9vNmVQeElmQkhPbzFKd1ZVWUticVE&usp=sharing

With this as a starting point, I hope we can agree on specific rules
for punctuation suppression so that we can include some guidance on
this topic in the CSL specification. These rules will likely have to
be very precise, and take into account the origin of the punctuation
(variable field content, affixes, group delimiters, group affixes,
etc.).
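
As a rough illustration of what "taking the origin into account" could mean in code, here is a small Python sketch. The Origin categories, the PRIORITY table and the suppress function are all hypothetical; the precedence chosen (field content over affixes over delimiters) is only an assumption for the example, not a rule from the CSL specification or a description of citeproc-js behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    FIELD = "field"          # punctuation that is part of variable content
    AFFIX = "affix"          # prefix or suffix attribute
    DELIMITER = "delimiter"  # group delimiter


@dataclass
class Chunk:
    text: str
    origin: Origin


# Hypothetical precedence: on a collision, the chunk with the lower priority
# loses its copy of the duplicated mark. Not taken from any specification.
PRIORITY = {Origin.FIELD: 2, Origin.AFFIX: 1, Origin.DELIMITER: 0}
PUNCT = ".,;:"


def suppress(chunks: list[Chunk]) -> str:
    """Flatten chunks, dropping one mark wherever the same punctuation
    character ends the previous chunk and starts the next one."""
    kept: list[Chunk] = []  # only ever holds chunks with non-empty text
    for chunk in chunks:
        cur = Chunk(chunk.text, chunk.origin)  # copy, so the input is not mutated
        if kept and cur.text:
            prev = kept[-1]
            if prev.text[-1] in PUNCT and prev.text[-1] == cur.text[0]:
                if PRIORITY[prev.origin] >= PRIORITY[cur.origin]:
                    cur.text = cur.text[1:]
                else:
                    prev.text = prev.text[:-1]
                    if not prev.text:
                        kept.pop()
        if cur.text:
            kept.append(cur)
    return "".join(c.text for c in kept)


example = [
    Chunk("Cadogan, J. W., & Lee, N.", Origin.FIELD),
    Chunk(". ", Origin.DELIMITER),
    Chunk("(Forthcoming)", Origin.FIELD),
]
print(suppress(example))
```

This only compares identical adjacent marks and ignores formatting and quote marks, so it is far simpler than what a real processor needs; it is meant to show where an agreed rule table would plug in.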

Sebastian, could you remind me whether CMoS has any clear rules on
punctuation suppression?

Rintze

On Fri, Sep 21, 2012 at 5:22 AM, Frank Bennett <[hidden email]> wrote:

> On Fri, Sep 21, 2012 at 4:16 PM, Rönkkö Mikko <[hidden email]> wrote:
> >> The problem was that my implementation produces incorrect bibliography items even though it follows the CSL specification (or rather a subset of it that is sufficient to produce bibliography items in the APA style). I did not know that strictly following the specification does not result in correct formatting, and that the processor needs to "be smart" about spaces and punctuation. I could not find this in the documentation. But now that I know this, it should not be difficult to fix.
>>
>> I posted my code to https://github.com/mronkko/CSLProcessor
>>
>> At this point the goal is to format single citations and single bibliography items using the APA style. In the future I may make it more generic.
>>
>> Mikko
>
> This may be more distraction than you need at this point, but just in case ...
>
> There is a set of test fixtures covering space-suppression in the
> citeproc-js sources (scroll down to the fixtures prefixed with
> "spaces_"):
>
>   https://bitbucket.org/fbennett/citeproc-js/src/5cc7cff350ee/tests/fixtures/local
>
> I didn't put the tests into the main test suite, because the
> discussion I linked above was inconclusive about whether it would be
> appropriate to recognise space-suppression in the official
> specification. The main test suite is here:
>
>   https://bitbucket.org/bdarcus/citeproc-test
>
> The future of CSL processor testing probably lies in work by Sylvester
> Keil, which is here:
>
>   https://github.com/citation-style-language/test-suite
>
> (The repository above hasn't been updated in a while, but Sylvester
> recently indicated that there will be activity there once he has
> reached a milestone in his current work on citeproc-ruby.)
>
> Frank
>
>>
>>
>>>
>>> You have the logic right. That's the literal result you will get from
>>> flattening the structure without anything more:
>>>
>>>  [author ending in "."] + ". "{delimiter} + " ("{prefix} + [issued]
>>>
>>> Double punctuation needs to be culled by the processor. It's a little
>>> tricky, since formatting (italics etc) might lie between the two
>>> periods, depending on the style. There is also potential interaction
>>> with quote marks, depending on whether or not the style has
>>> punctuation-in-quotes set true or false. For those reasons, the cull
>>> function can't work on the output string: it needs to analyse the
>>> nested structure before collapsing to identify "adjacent" punctuation.
>>> With content strings, delimiters and affixes in the mix, it's pretty
>>> hair-raising. The citeproc-js code for this is heavily tested and
>>> seems to work quite well, but I would be hard-pressed to explain
>>> exactly how it works.
>>>
>>> Concerning spaces, there was a long discussion a couple of years back
>>> concerning whether extraneous spaces added by affixes should be
>>> considered style bugs:
>>>
>>>  http://xbiblio-devel.2463403.n2.nabble.com/how-much-bugged-a-style-may-be-tt5784767.html#none
>>>
>>> That thread does not reflect well on me, I'm afraid. The point made by
>>> Andrea (and, I think, Bruce) is perfectly valid: double-space issues
>>> *can* be eliminated by more careful construction of CSL code, and
>>> should be. It is also true that masking double spaces in the processor
>>> gives a green light to sloppy coding. That said, the amount of work
>>> required to eliminate all potential extra spaces from the CSL
>>> repository would be pretty staggering. At the end of the day, we're
>>> kind of stuck with this problem.
>>>
>>> Double spaces are hard to catch in the processor for the same reason:
>>> you have to work on the nested structure before it is flattened into
>>> an output string. It's a little simpler because you can assume input
>>> strings will not have leading or trailing spaces; but tracking spaces
>>> across affix and delimiter attributes across multiple nested layers is
>>> still a challenge.
>>>
>>> If you are only going to process one style in one output format and a
>>> single locale, you may be able to fix things up by running a regular
>>> expression over the output string. That wouldn't work as a general
>>> solution, though.
>>>
>>> Sorry for the long response. Hope it helps!
>>>
>>> Frank
>>>
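
To make the point about culling on the nested structure concrete, here is a minimal Python sketch. It is not how citeproc-js does it (that code is not shown in this thread); the `Node`, `Markup`, `pieces` and `cull` names are hypothetical, quote marks and punctuation-in-quotes are ignored, and every period is treated alike, so a legitimate ellipsis in field content would also be collapsed. The idea is only that formatting markers are kept apart from visible text, so that two periods count as adjacent even when markup sits between them.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Markup:
    """Formatting marker such as "<i>"; invisible when checking adjacency."""
    tag: str


@dataclass
class Node:
    """Hypothetical rendering node: children plus affixes and a delimiter."""
    children: List[Union["Node", str]] = field(default_factory=list)
    prefix: str = ""
    suffix: str = ""
    delimiter: str = ""
    italic: bool = False


Piece = Union[str, Markup]


def pieces(node: Union[Node, str]) -> List[Piece]:
    """Flatten the tree into ordered pieces, keeping markup separate from text."""
    if isinstance(node, str):
        return [node] if node else []
    rendered = [p for p in (pieces(c) for c in node.children) if p]
    if not rendered:
        return []  # an empty node emits neither affixes nor delimiters
    out: List[Piece] = [node.prefix] if node.prefix else []
    if node.italic:
        out.append(Markup("<i>"))
    for i, child in enumerate(rendered):
        if i and node.delimiter:
            out.append(node.delimiter)
        out.extend(child)
    if node.italic:
        out.append(Markup("</i>"))
    if node.suffix:
        out.append(node.suffix)
    return out


def flatten_naive(seq: List[Piece]) -> str:
    """Literal concatenation, with no punctuation or space suppression."""
    return "".join(p.tag if isinstance(p, Markup) else p for p in seq)


def cull(seq: List[Piece]) -> str:
    """Drop a period or space that repeats the last visible character."""
    out: List[str] = []
    last = ""  # last visible (non-markup) character emitted so far
    for piece in seq:
        if isinstance(piece, Markup):
            out.append(piece.tag)  # markup does not reset adjacency
            continue
        for ch in piece:
            if ch in ". " and ch == last:
                continue
            out.append(ch)
            last = ch
    return "".join(out)


# The APA group from the start of the thread, modelled by hand.
group = Node(
    delimiter=". ",
    children=[
        "Cadogan, J. W., & Lee, N.",
        Node(children=["Forthcoming"], prefix=" (", suffix=")"),
        Node(children=["Improper Use of Endogenous Formative Variables"], prefix=" "),
        Node(children=["Journal of Business Research"], italic=True),
    ],
)
entry = Node(children=[group], suffix=".")

print(flatten_naive(pieces(entry)))
print(cull(pieces(entry)))
```

The naive version reproduces the "N..  (Forthcoming)" output reported at the top of the thread, while `cull` collapses both the doubled period and the doubled space. A case such as an italic title that already ends in a period, followed by a "." suffix, is also handled, because the closing `</i>` is skipped when looking for the previous visible character.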

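For the single-style, single-format shortcut mentioned at the end of the quoted message, a regular-expression pass over the finished string might look like the sketch below. The `tidy` function is a hypothetical name, and the patterns assume plain-text output in which any run of periods or spaces is an artifact rather than intended content.

```python
import re


def tidy(plain: str) -> str:
    """Collapse repeated periods and repeated spaces in a plain-text entry."""
    plain = re.sub(r"\.{2,}", ".", plain)  # ".." -> "."
    plain = re.sub(r" {2,}", " ", plain)   # runs of spaces -> one space
    return plain.strip()


print(tidy("Cadogan, J. W., & Lee, N..  (Forthcoming)."))

# Limitation: once markup sits between the two periods they are no longer
# adjacent in the string, so this pass leaves the duplicate in place.
print(tidy("<i>Some title.</i>."))
```

The second call shows why this does not generalise: the string-level pattern cannot see that the two periods around `</i>` are visually adjacent.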
_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: Developing a CSL processor

Sebastian Karcher
"A period (aside from an abbreviating period; see 6.117) never accompanies a question mark or an exclamation point. The latter two marks, being stronger, take precedence over the period. This principle continues to apply when the question mark or exclamation point is part of the title of a work, as in the final example"
CMoS 6.118

"When a title ending with a question mark or an exclamation mark would normally be followed by a period, the period is omitted; see"
CMoS 14.105
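
As a rough illustration of the rule in these two passages, a processor could decide whether to append a terminal period by looking at the last character of the preceding text. This is only a sketch under simplifying assumptions: `add_period` is a hypothetical helper, it works on plain strings without markup or quote marks, and it also skips the period when the text already ends in one (the double-period case from earlier in the thread).

```python
# Marks that take precedence over a following period.
STRONGER_THAN_PERIOD = ("?", "!")


def add_period(text: str) -> str:
    """Append a period unless the text already ends in '?', '!' or '.'."""
    stripped = text.rstrip()
    if stripped.endswith(STRONGER_THAN_PERIOD) or stripped.endswith("."):
        return stripped
    return stripped + "."


print(add_period("Who Cares?"))
print(add_period("Improper Use of Endogenous Formative Variables"))
```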


On Fri, Aug 23, 2013 at 7:01 PM, Rintze Zelle <[hidden email]> wrote:
Frank recently added some tests to catalog the current citeproc-js
behavior when it comes to punctuation suppression:

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyPlain.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesIn.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesOut.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyField.txt

These tests don't cover suppression of duplicated spaces (as discussed below,
there are already older "space_..." unit tests), and they only cover
punctuation added by prefixes and suffixes, plus punctuation that is part
of the variable field content (punctuation added by group delimiters,
for example, isn't tested). I tried to pull these new results together
in a spreadsheet:

https://docs.google.com/spreadsheet/ccc?key=0AoKgWUjfrk4_dE9vNmVQeElmQkhPbzFKd1ZVWUticVE&usp=sharing

With this as a starting point, I hope we can agree on specific rules
for punctuation suppression so that we can include some guidance on
this topic in the CSL specification. These rules will likely have to
be very precise, and take into account the origin of the punctuation
(variable field content, affixes, group delimiters, group affixes,
etc.).

Sebastian, could you remind me whether CMoS has any clear rules on
punctuation suppression?

Rintze

On Fri, Sep 21, 2012 at 5:22 AM, Frank Bennett <[hidden email]> wrote:
> On Fri, Sep 21, 2012 at 4:16 PM, Rönkkö Mikko <[hidden email]> wrote:
>> The problem was that my implementation produces incorrect bibliography items even though it follows the CSL specification (or rather, a subset of the specification that is sufficient to produce bibliography items in the APA style). I did not know that strictly following the specification does not result in correct formatting, and that the processor needs to "be smart" about spaces and punctuation. I could not find this in the documentation. But now that I know this, it should not be difficult to fix.
>>
>> I posted my code to https://github.com/mronkko/CSLProcessor
>>
>> At this point the goal is to format single citations and single bibliography items using the APA style. In the future I may make it more generic.
>>
>> Mikko
>
> This may be more distraction than you need at this point, but just in case ...
>
> There is a set of test fixtures covering space-suppression in the
> citeproc-js sources (scroll down to the fixtures prefixed with
> "spaces_"):
>
>   https://bitbucket.org/fbennett/citeproc-js/src/5cc7cff350ee/tests/fixtures/local
>
> I didn't put the tests into the main test suite, because the
> discussion I linked above was inconclusive about whether it would be
> appropriate to recognise space-suppression in the official
> specification. The main test suite is here:
>
>   https://bitbucket.org/bdarcus/citeproc-test
>
> The future of CSL processor testing probably lies in work by Sylvester
> Keil, which is here:
>
>   https://github.com/citation-style-language/test-suite
>
> (The repository above hasn't been updated in a while, but Sylvester
> recently indicated that there will be activity there once he has
> reached a milestone in his current work on citeproc-ruby.)
>
> Frank
>




--
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University
