CSL spec and test cases

Brecht Machiels-2
Hello,

I've been doing some work on my citeproc-py  
(https://github.com/brechtm/citeproc-py) and have written down some  
questions/remarks about some of the tests and the CSL spec. Note that I  
could simply be misunderstanding/misinterpreting things for some of these.

* the CSL spec is contradictory about number detection
>>> Tests whether the given variables **contain numeric content**.
versus
>>> Content is considered numeric if it **solely consists of numbers**.
>>> For example, "2nd" tests "true" whereas "second" and "2nd edition"  
>>> test "false".
does not seem to agree with condition_IsNumeric

* Chicago page range format: what to do with five or more digits?

* Which values are allowed for the "page" input field? I see multiple  
ranges can also be specified. I think the CSL spec should, in general,  
also define the format of the input fields. Personally, I would opt for a  
structured format (like the date fields) as opposed to a string-format  
(the page field). Individual CSL processors can still convert a  
string-formatted field to the structured data. This would require changes  
to the tests.

* Shouldn't "page-first" be a number variable? It is used with number in  
page_NumberPageFirst

* The spec doesn't say anything about the nested groups special case.  
variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with  
the CSL spec:
>>> cs:group and its child elements are suppressed if a) at least one
>>> rendering element in cs:group calls a variable (either directly or via
>>> a macro), and b) all variables that are called are empty.
In the group in the else section only the title variable is called. For  
ITEM-3, this variable is empty, so the group should be suppressed, but it  
isn't.
Should a nested group always act as if it's (successfully) calling a  
variable? If so, the spec should mention this.

* I seem to remember citeproc-js postprocesses its output to remove  
duplicate affixes. The CSL spec doesn't say anything about this, AFAIK.  
What's the official stance on this? I would personally avoid doing this,  
unless the spec includes an unambiguous definition on how this should work.

* locale_TitleCaseGarbageLangEnglishLocale: is "en" a valid locale? If so,  
and default-locale="en", which locale should we use?

* textcase_SkipNameParticlesInTitleCase (1): I believe this behavior is  
not part of the CSL spec, is it?

* textcase_SkipNameParticlesInTitleCase (2): the result doesn't seem to  
follow the CSL spec. The 'a' after the colon should be capitalized:
>>> In both cases, stop words are lowercased, unless they are the first or
>>> last word in the string, or follow a colon.

* date_VariousInvalidDates: why is 'Spring' in the output?

* page_Chicago: is the example S input data correct? It strikes me as a  
confusing way of representing a page range (in addition to saving only a  
single digit).

* A large number of tests test functionality that is not in the CSL spec,  
but is provided by citeproc-js (raw dates, static ordering, literal names,  
...). I think these should be indicated as such, or perhaps moved to a  
separate directory. This would make it easier to check other CSL
processors' compatibility.

I hope you can find the time to answer these.

Thanks,
Brecht


_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: CSL spec and test cases

rmzelle
Administrator
On Thu, Aug 8, 2013 at 1:12 PM, Brecht Machiels <[hidden email]> wrote:
> * the CSL spec is contradictory about number detection
>>>> Tests whether the given variables **contain numeric content**.
> versus
>>>> Content is considered numeric if it **solely consists of numbers**.
>>>> For example, "2nd" tests "true" whereas "second" and "2nd edition"
>>>> test "false".
> does not seem to agree with condition_IsNumeric

The behavior of "is-numeric" changed in CSL 1.0.1. See
http://citationstyles.org/downloads/release-notes-csl101.html#numbers

I can see how the current description in the specification might be
somewhat confusing, but it is meant to agree with
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/condition_IsNumeric.txt.
In "Tests whether the given variables contain numeric content."
(http://citationstyles.org/downloads/specification.html#choose), I
mean to say that the test is against the entire string contents of
each variable. In a string like "2nd edition", the "edition" substring
means that the entire string is non-numeric.
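To make the "entire string" reading concrete, here is a minimal Python sketch of such a test. It is not the citeproc-js or citeproc-py implementation. It assumes that a "number" is a run of digits with optional letters attached directly (so "2nd" counts), and that several numbers may be joined by a comma, hyphen, or ampersand; the names `is_numeric`, `_NUMBER` and `_NUMERIC` are made up for this illustration.

```python
import re

# One "number": digits with optional letters attached directly, e.g. "2", "2nd", "D2".
# (Assumption: this is only one possible reading of the CSL 1.0.1 wording.)
_NUMBER = r"[A-Za-z]*\d+[A-Za-z]*"

# Several numbers may be joined by a comma, hyphen, or ampersand (also an assumption).
_NUMERIC = re.compile(rf"\s*{_NUMBER}(?:\s*[,&-]\s*{_NUMBER})*\s*")


def is_numeric(value) -> bool:
    """Return True only if the *entire* value is numeric content."""
    return _NUMERIC.fullmatch(str(value)) is not None
```

With this sketch, "2nd" tests true, while "second" and "2nd edition" test false, because the "edition" substring prevents a match on the whole string.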

> * Chicago page range format: what to do with five or more digits?

The specification currently links to
http://www.aahn.org/guidelines.html, but it seems like the content we
relied on moved to http://www.aahn.org/stylesheet.html . The latter
page shows an excerpt from CMoS that we almost copied verbatim.
Sebastian, could you check if CMoS 16th edition gives any guidance on
number ranges of 5 or more digits?

> * Which values are allowed for the "page" input field? I see multiple
> ranges can also be specified. I think the CSL spec should, in general,
> also define the format of the input fields. Personally, I would opt for a
> structured format (like the date fields) as opposed to a string-format
> (the page field). Individual CSL processors can still convert a
> string-formatted field to the structured data. This would require changes
> to the tests.

This would presumably involve describing the JSON format used by
citeproc-js in more detail. See
http://blog.martinfenner.org/2013/08/08/csl-is-more-than-citation-styles/
for a relevant discussion on this topic.

> * Shouldn't "page-first" be a number variable? It is used with number in
> page_NumberPageFirst

See https://github.com/citation-style-language/schema/issues/9. I
think Frank prefers to render "page" and "page-first" with cs:number,
but that's currently not kosher CSL.

> * The spec doesn't say anything about the nested groups special case.
> variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with
> the CSL spec:
>>>> cs:group and its child elements are suppressed if a) at least one
>>>> rendering element in cs:group calls a variable (either directly or via
>>>> a macro), and b) all variables that are called are empty.
> In the group in the else section only the title variable is called. For
> ITEM-3, this variable is empty, so the group should be suppressed, but it
> isn't.
> Should a nested group always act as if it's (successfully) calling a
> variable? If so, the spec should mention this.

I think Frank already has an opinion on this, but I can't find the
discussion. I think the test
(https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/variables_TitleShortOnShortTitleNoTitleCondition.txt)
describes the desired behavior, in which case the specification should
indeed be amended. This is somewhat related to the open issue
https://github.com/citation-style-language/schema/issues/104
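For reference, the rule as quoted from the specification can be sketched in Python as below. This covers only the literal wording (conditions a and b) and deliberately ignores the nested-group special case under discussion, since its intended behavior is not settled in this thread. The function name `group_is_suppressed` and the idea of passing in the values of the called variables are assumptions for illustration.

```python
from typing import Optional, Sequence


def group_is_suppressed(called: Sequence[Optional[str]]) -> bool:
    """Apply the quoted rule to the values of the variables a group calls.

    `called` holds one entry per variable called by the group's rendering
    elements (directly or via a macro); None or "" means the variable is empty.
    """
    if not called:
        # Condition a) fails: no variable is called, so the group is rendered.
        return False
    # Condition b): suppress only if every called variable is empty.
    return all(not value for value in called)
```

Under this literal reading, a group that only calls an empty "title" variable is suppressed, which is the outcome Brecht expected for ITEM-3.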

> * I seem to remember citeproc-js postprocesses its output to remove
> duplicate affixes. The CSL spec doesn't say anything about this, AFAIK.
> What's the official stance on this? I would personally avoid doing this,
> unless the spec includes an unambiguous definition on how this should work.

I'm convinced that CSL processors need to do some suppression of
duplicated punctuation. Frank just prepared some tests that describe
the current behavior in citeproc-js, and I hope to write up some
requirements for the specification in the next few weeks based on
those. See

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyPlain.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesIn.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesOut.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyField.txt

> * locale_TitleCaseGarbageLangEnglishLocale: is "en" a valid locale? If so,
> and default-locale="en", which locale should we use?

http://citationstyles.org/downloads/specification.html#locale-fallback
discusses this: "If the chosen output locale is a language (e.g.
"de"), the (primary) dialect is used in step 1 (e.g. "de-DE")."

The table above that line mentions that "en-US" is the primary dialect for "en".
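A processor could resolve a bare language code along these lines. This is only a sketch: the mapping contains just the two pairs mentioned in this message, and the names `PRIMARY_DIALECTS` and `resolve_locale` are invented for the example.

```python
# Only the two language/dialect pairs mentioned in this thread; the table in
# the specification is longer.
PRIMARY_DIALECTS = {"de": "de-DE", "en": "en-US"}


def resolve_locale(default_locale: str) -> str:
    """Map a bare language code to its primary dialect; leave dialects as-is."""
    if "-" in default_locale:
        return default_locale
    # Unknown languages are returned unchanged here (an assumption of this sketch).
    return PRIMARY_DIALECTS.get(default_locale, default_locale)
```

So default-locale="en" would start the fallback with "en-US".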

> * textcase_SkipNameParticlesInTitleCase (1): I believe this behavior is
> not part of the CSL spec, is it?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/textcase_SkipNameParticlesInTitleCase.txt

Correct.

> * textcase_SkipNameParticlesInTitleCase (2): the result doesn't seem to
> follow the CSL spec. The 'a' after the colon should be capitalized:
>>>> In both cases, stop words are lowercased, unless they are the first or
>>>> last word in the string, or follow a colon.

It seems like it should
(http://citationstyles.org/downloads/specification.html#title-case-conversion).
Frank?
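As an illustration of the quoted stop-word rule only, here is a simplified Python sketch. It does not cover the rest of title-case conversion, the stop-word set is a small illustrative subset rather than the list in the specification, and `title_case` and `STOP_WORDS` are hypothetical names.

```python
# Illustrative subset only; the specification defines its own stop-word list.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def title_case(text: str) -> str:
    """Capitalize words, lowercasing stop words except in the three quoted cases."""
    words = text.split()
    result = []
    for i, word in enumerate(words):
        is_first_or_last = i == 0 or i == len(words) - 1
        follows_colon = i > 0 and words[i - 1].endswith(":")
        if word.lower() in STOP_WORDS and not (is_first_or_last or follows_colon):
            result.append(word.lower())
        else:
            # Capitalize the first character and keep the rest untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

Under this reading, "the art of war: a history" becomes "The Art of War: A History", with the "a" after the colon capitalized.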

> * date_VariousInvalidDates: why is 'Spring' in the output?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/date_VariousInvalidDates.txt

Don't know. I think you can ignore this unit test. Frank?

> * page_Chicago: is the example S input data correct? It strikes me as a
> confusing way of representing a page range (in addition to saving only a
> single digit).

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/page_Chicago.txt

Looks unambiguous to me.

> * A large number of tests test functionality that is not in the CSL spec,
> but is provided by citeproc-js (raw dates, static ordering, literal names,
> ...). I think these should be indicated as such, or perhaps moved to a
> separate directory. This would make it easier to check other CSL
> processors' compatibility.

Sylvester Keil proposed using a Cucumber format for unit tests, which
would allow tests to be tagged:
https://github.com/inukshuk/citeproc-ruby/blob/1c420de0f7a86b7c35782dee86ce62cbebb47ab9/features/condition/is_numeric.feature

If somebody else helps with the technical infrastructure, I'd be happy
to help reclassify the existing unit tests.

Rintze


Re: CSL spec and test cases

David Lawrence
Re: Page range

The fore-matter in books and some journals is usually in Roman numerals. Is this observation relevant?

David



Re: CSL spec and test cases

rmzelle
Administrator
What exactly led you to this remark? The discussion about the "is-numeric" test?

My guess is that citeproc-js doesn't currently parse roman numerals in
its input data, and just treats it as text, which should work
reasonably well.

On Thu, Aug 8, 2013 at 4:05 PM, David Lawrence <[hidden email]> wrote:
> Re: Page range
>
> The fore-matter in books and some journals is usually in Roman numerals. Is
> this observation relevant?
>
>
> David


Re: CSL spec and test cases

Sebastian Karcher
In reply to this post by David Lawrence
CMoS page range specs don't change for ranges with more than 4 digits, i.e. "Use two digits unless more are needed to include all changed parts":
12345-46
12345-678
12345-6789

The separate rules for multiples of 100 and for the first nine numbers after each multiple also remain,
i.e. cite all digits when the first number is a multiple of 100:
12300-12345
and only the changed digit(s) when the first number is one of the nine numbers thereafter:
12301-8
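The rules above could be sketched in Python roughly as follows. This is an illustration, not code from any CSL processor. Two choices are assumptions that the message above does not cover: ranges whose first number is below 100, and ranges whose two numbers differ in length, are written out in full. The function name `chicago_page_range` is made up, and a plain hyphen is used as in the examples.

```python
def chicago_page_range(first: int, last: int) -> str:
    """Abbreviate the second number of a page range, Chicago style."""
    a, b = str(first), str(last)
    if first == last:
        return a
    # Multiples of 100 keep all digits. Writing out numbers below 100 and
    # ranges of unequal length in full is an assumption of this sketch.
    if first < 100 or first % 100 == 0 or len(a) != len(b):
        return f"{a}-{b}"
    # Count the leading digits that both numbers share.
    common = 0
    while a[common] == b[common]:
        common += 1
    changed = b[common:]
    if first % 100 < 10:
        # First nine numbers after a multiple of 100: changed part only.
        return f"{a}-{changed}"
    # Otherwise two digits, unless more are needed for all changed parts.
    keep = max(2, len(changed))
    return f"{a}-{b[-keep:]}"
```

For the examples in this message, `chicago_page_range(12345, 12346)` gives "12345-46" and `chicago_page_range(12301, 12308)` gives "12301-8".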



--
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University


Re: CSL spec and test cases

Sebastian Karcher
Re Roman numerals: treating them as text works for CMoS, which always wants full page ranges for Roman numerals.


On Thu, Aug 8, 2013 at 2:46 PM, Sebastian Karcher <[hidden email]> wrote:
CMoS page range specs don't change for ranges with more than 4 digits, i.e. "Use two digits unless more are needed to include all changed parts"
12345-46
12345-678
12345-6789

and the different rules for multiples of hundred and the first nine digits thereafter remain,
i.e. cite all digits when dealing with multiples of hundred
12300-12345
and only the changed digit(s) for the first ten thereafter
12301-8



On Thu, Aug 8, 2013 at 2:05 PM, David Lawrence <[hidden email]> wrote:
Re: Page range

The fore-matter in books and some journals is usually in Roman numerals. Is this observation relevant?

David


On Thu, Aug 8, 2013 at 12:27 PM, Rintze Zelle <[hidden email]> wrote:
On Thu, Aug 8, 2013 at 1:12 PM, Brecht Machiels <[hidden email]> wrote:
> * the CSL spec is contradictory about number detection
>>>> Tests whether the given variables **contain numeric content**.
> versus
>>>> Content is considered numeric if it **solely consists of numbers**.
>>>> For example, "2nd" tests "true" whereas "second" and "2nd edition"
>>>> test "false".
> does not seem to agree with condition_IsNumeric

The behavior of "is-numeric" changed in CSL 1.0.1. See
http://citationstyles.org/downloads/release-notes-csl101.html#numbers

I can see how the current description in the specification might be
somewhat confusing, but it is meant to agree with
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/condition_IsNumeric.txt.
In "Tests whether the given variables contain numeric content."
(http://citationstyles.org/downloads/specification.html#choose), I
mean to say that the test is against the entire string contents of
each variable. In a string like "2nd edition", the "edition" substring
means that the entire string is non-numeric.

> * Chicago page range format: what to do with five or more digits?

The specification currently links to
http://www.aahn.org/guidelines.html, but it seems like the content we
relied on moved to http://www.aahn.org/stylesheet.html . The latter
page shows an excerpt from CMoS that we almost copied verbatim.
Sebastian, could you check if CMoS 16th edition gives any guidance on
number ranges of 5 or more digits?

> * Which values are allowed for the "page" input field? I see multiple
> ranges can also be specified. I think the CSL spec should, in general,
> also define the format of the input fields. Personally, I would opt for a
> structured format (like the date fields) as opposed to a string-format
> (the page field). Individual CSL processors can still convert a
> string-formatted field to the structured data. This would require changes
> to the tests.

This would presumably involve describing the JSON format used by
citeproc-js in more detail. See
http://blog.martinfenner.org/2013/08/08/csl-is-more-than-citation-styles/
for a relevant discussion on this topic.

> * Shouldn't "page-first" be a number variable? It is used with number in
> page_NumberPageFirst

See https://github.com/citation-style-language/schema/issues/9. I
think Frank prefers to render "page" and "page-first" with cs:number,
but that's currently not kosher CSL.

> * The spec doesn't say anything about the nested groups special case.
> variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with
> the CSL spec:
>>>> cs:group and its child elements are suppressed if a) at least one
>>>> rendering element in cs:group calls a variable (either directly or via
>>>> a macro), and b) all variables that are called are empty.
> In the group in the else section only the title variable is called. For
> ITEM-3, this variable is empty, so the group should be suppressed, but it
> isn't.
> Should a nested group always act as if it's (successfully) calling a
> variable? If so, the spec should mention this.

I think Frank already has an opinion on this, but I can't find the
discussion. I think the test
(https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/variables_TitleShortOnShortTitleNoTitleCondition.txt)
describes the desired behavior, in which case the specification should
indeed be amended. This is somewhat related to the open issue
https://github.com/citation-style-language/schema/issues/104
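
For reference, a literal reading of the quoted suppression rule can be sketched in Python as follows. This covers only the rule as the specification words it; it does not model nested groups, which is exactly the case under discussion. The function name and the item data are hypothetical.

```python
def group_is_suppressed(called_variables, item):
    """Literal reading of the quoted cs:group rule (nested groups NOT modelled).

    called_variables: names of the variables called by rendering elements
                      inside the group, directly or via a macro.
    item:             mapping of variable name -> value for one reference.
    """
    # (a) at least one rendering element must call a variable
    if not called_variables:
        return False
    # (b) every called variable must be empty or missing
    return all(not item.get(name) for name in called_variables)


# Hypothetical item with a short title but no title:
item = {"title-short": "Short"}
print(group_is_suppressed(["title"], item))
```

Under this reading, a group that calls only "title" is suppressed for an item without a title, which is the behavior Brecht expected and the test apparently does not show.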

> * I seem to remember citeproc-js postprocesses its output to remove
> duplicate affixes. The CSL spec doesn't say anything about this, AFAIK.
> What's the official stance on this? I would personally avoid doing this,
> unless the spec includes an unambiguous definition on how this should work.

I'm convinced that CSL processors need to do some suppression of
duplicated punctuation. Frank just prepared some tests that describe
the current behavior in citeproc-js, and I hope to write up some
requirements for the specification in the next few weeks based on
those. See

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyPlain.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesIn.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesOut.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyField.txt

> * locale_TitleCaseGarbageLangEnglishLocale: is "en" a valid locale? If so,
> and default-locale="en", which locale should we use?

http://citationstyles.org/downloads/specification.html#locale-fallback
discusses this: "If the chosen output locale is a language (e.g.
"de"), the (primary) dialect is used in step 1 (e.g. "de-DE")."

The table above that line mentions that "en-US" is the primary dialect for "en".
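
As a small Python sketch of that first step: a bare language code is mapped to its primary dialect before any locale lookup. Only the two pairs mentioned in this message are included; the function name is hypothetical, and passing unknown codes through unchanged is an assumption, not something the specification states.

```python
# Primary dialects mentioned in this thread; the specification's table has more.
PRIMARY_DIALECT = {
    "en": "en-US",
    "de": "de-DE",
}


def resolve_locale(requested: str) -> str:
    """Map a bare language code to its primary dialect.

    Dialects such as "de-AT" are returned as they are. Unknown language
    codes are also returned unchanged (an assumption of this sketch).
    """
    if "-" in requested:
        return requested
    return PRIMARY_DIALECT.get(requested, requested)


print(resolve_locale("en"))
```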

> * textcase_SkipNameParticlesInTitleCase (1): I believe this behavior is
> not part of the CSL spec, is it?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/textcase_SkipNameParticlesInTitleCase.txt

Correct.

> * textcase_SkipNameParticlesInTitleCase (2): the result doesn't seem to
> follow the CSL spec. The 'a' after the colon should be capitalized:
>>>> In both cases, stop words are lowercased, unless they are the first or
>>>> last word in the string, or follow a colon.

It seems like it should
(http://citationstyles.org/downloads/specification.html#title-case-conversion).
Frank?

> * date_VariousInvalidDates: why is 'Spring' in the output?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/date_VariousInvalidDates.txt

Don't know. I think you can ignore this unit test. Frank?

> * page_Chicago: is the example S input data correct? It strikes me as a
> confusing way of representing a page range (in addition to saving only a
> single digit).

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/page_Chicago.txt

Looks unambiguous to me.

> * A large number of tests test functionality that is not in the CSL spec,
> but is provided by citeproc-js (raw dates, static ordering, literal names,
> ...). I think these should be indicated as such, or perhaps moved to a
> separate directory. This would make it easier to check the other CSL
> processor's compatibility.

Sylvester Keil proposed using a Cucumber format for unit tests, which
would allow tests to be tagged:
https://github.com/inukshuk/citeproc-ruby/blob/1c420de0f7a86b7c35782dee86ce62cbebb47ab9/features/condition/is_numeric.feature

If somebody else helps with the technical infrastructure, I'd be happy
to help reclassify the existing unit tests.

Rintze

------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel






--
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University




Re: CSL spec and test cases

rmzelle
Administrator
In reply to this post by Sebastian Karcher
But according to "If numbers are four digits long and three digits
change, use all digits", you would have:

> 1234-46
> 1234-1678

So I'd expect
> 12345-46
> 12345-12678
> 12345-16789

In http://citationstyles.org/downloads/specification.html#appendix-v-page-range-formats,
shouldn't we write something like
"If numbers are *four or more* digits long and *three or more* digits
change, use all digits" ?

Rintze

On Thu, Aug 8, 2013 at 4:46 PM, Sebastian Karcher
<[hidden email]> wrote:

> CMoS page range specs don't change for ranges with more than 4 digits, i.e.
> "Use two digits unless more are needed to include all changed parts"
> 12345-46
> 12345-678
> 12345-6789
>
> and the different rules for multiples of hundred and the first nine digits
> thereafter remain,
> i.e. cite all digits when dealing with multiples of hundred
> 12300-12345
> and only the changed digit(s) for the first ten thereafter
> 12301-8


Re: CSL spec and test cases

Brecht Machiels-2
Hi,

On Thu, 08 Aug 2013 23:39:39 +0200, Rintze Zelle  
<[hidden email]> wrote:
> In  
> http://citationstyles.org/downloads/specification.html#appendix-v-page-range-formats,
> shouldn't we write something like
> "If numbers are *four or more* digits long and *three or more* digits
> change, use all digits" ?

Yes, this is the reason why I asked in the first place. I should  
probably have mentioned that.

For now, in citeproc-py, if the start and end page numbers have fewer  
than two digits in common, it uses the expanded form.

12345-468
12345-13576
123456-5614

I'm not sure which I prefer. As long as it's clearly defined, I'm happy :)
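
The rule described above could be sketched in Python roughly like this. It is an illustration of the stated rule, not the actual citeproc-py code; the function name is hypothetical, "common digits" is taken to mean shared leading digits, and the full end pages in the example (12468, 13576, 125614) are inferred from the abbreviated forms listed above.

```python
def abbreviate_range(first: int, last: int, min_common: int = 2) -> str:
    """Drop the shared leading digits of the end page, but only when at
    least `min_common` leading digits are shared; otherwise keep the
    expanded form. (Sketch of the rule described above.)"""
    a, b = str(first), str(last)
    if len(a) != len(b):
        return f"{a}-{b}"
    common = 0
    while common < len(a) and a[common] == b[common]:
        common += 1
    # Too few shared digits, or identical pages: keep the expanded form.
    if common < min_common or common == len(a):
        return f"{a}-{b}"
    return f"{a}-{b[common:]}"


for first, last in ((12345, 12468), (12345, 13576), (123456, 125614)):
    print(abbreviate_range(first, last))
```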

Cheers,
Brecht



Re: CSL spec and test cases

Brecht Machiels-2
In reply to this post by rmzelle
Hello,

Thank you, Rintze, for the clarifications and pointers to relevant  
information. These should prove helpful.

Brecht




Re: CSL spec and test cases

Sebastian Karcher
In reply to this post by Brecht Machiels-2
Sorry, I was on vacation.


On Sun, Aug 11, 2013 at 4:27 AM, Brecht Machiels <[hidden email]> wrote:
>> In
>> http://citationstyles.org/downloads/specification.html#appendix-v-page-range-formats,
>> shouldn't we write something like
>> "If numbers are *four or more* digits long and *three or more* digits
>> change, use all digits" ?
>
> Yes, this is the reason why I asked in the first place. I should
> probably have mentioned that.
>
> For now, in citeproc-py, if the start and end page numbers have fewer
> than two digits in common, it uses the expanded form.
>
> 12345-468
> 12345-13576
> 123456-5614
>
> I'm not sure which I prefer. As long as it's clearly defined, I'm happy :)

The current CSL spec (and thus Brecht's implementation in citeproc-py) is incorrect according to the current CMoS.
Here are three examples from the relevant section (9.60):
1496–500
11564–615
12991–3001

That is, even when only one digit stays the same, only the changing digits are displayed after the en dash.
Since we call this rule "Chicago", we should change this in the spec (and implementers should update their processors accordingly).
According to the manual, these rules have never changed, so we must have gotten that wrong at some point. Sorry for never catching that.
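
To make the rule concrete, here is a rough Python sketch of the CMoS behavior as described in this thread: all digits after a multiple of one hundred, only the changed part for the nine numbers after a multiple of one hundred, and otherwise two digits unless more are needed to include all changed parts. The function name is hypothetical. Treating first pages below 100, and ranges whose ends differ in length, as "use all digits" is an assumption not discussed above.

```python
EN_DASH = "\u2013"


def chicago_page_range(first: int, last: int) -> str:
    """Abbreviate a page range following the CMoS rules quoted in this thread."""
    a, b = str(first), str(last)
    full = f"{a}{EN_DASH}{b}"
    # Assumption: no abbreviation for odd input or ends of different length.
    if last <= first or len(a) != len(b):
        return full
    # Below 100 (assumption) or a multiple of one hundred: use all digits.
    if first < 100 or first % 100 == 0:
        return full
    # Find the first digit that changes; it exists because a != b.
    i = 0
    while a[i] == b[i]:
        i += 1
    changed = b[i:]
    # x01 through x09: use the changed part only.
    if first % 100 < 10:
        return f"{a}{EN_DASH}{changed}"
    # Otherwise: two digits, or more if needed to cover all changed digits.
    keep = max(len(changed), 2)
    return f"{a}{EN_DASH}{b[-keep:]}"


for first, last in ((1496, 1500), (11564, 11615), (12991, 13001)):
    print(chicago_page_range(first, last))
```

With these inputs the sketch reproduces the three section 9.60 examples above, as well as the earlier 12345-46, 12300-12345, and 12301-8 cases (with an en dash in place of the hyphen).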



 

> Cheers,
> Brecht





--
Sebastian Karcher
Ph.D. Candidate
Department of Political Science
Northwestern University
