locale files - Addendum


locale files - Addendum

Paulo Ney de Souza
Since we may be close to a change anyway, there is no reason not to make the scheme general enough to handle any locale file. The last open issue is languages that can be written in more than one script. Sometimes the different scripts are used in different countries, and the string la-CO (language-Country) completely defines the locale, but sometimes it does NOT. Azeri is an example: it is written in Arabic script in Iran and in both Cyrillic and Latin script in Azerbaijan, so a country string alone cannot identify the locale.

For those cases, the common string to define a locale is la-Scrp-CO, and the most commonly used locales that require the script to be specified are:

az_Cyrl_AZ
az_Latn_AZ

ha_Arab_NG
ha_Latn_NG

mn_Cyrl_MN
mn_Mong_CN

sr_Cyrl_BA
sr_Latn_BA

sr_Cyrl_CS
sr_Latn_CS

sr_Cyrl_ME
sr_Latn_ME

sr_Cyrl_RS
sr_Latn_RS

uz_Cyrl_UZ
uz_Latn_UZ

zh_Hans_HK
zh_Hant_HK

zh_Hans_MO
zh_Hant_MO

In the case of CSL, the change would just be the ability to accept this more general string as the name of a locale file, and of the corresponding language file (-sr_Cyrl.xml and -sr-Latn.xml).
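
As a rough illustration of what accepting the more general string could involve, the sketch below splits a tag of the form la[-Scrp][-CO] into its parts and normalizes the casing. The pattern, the function names, and the decision to accept both "_" and "-" as separators are assumptions made for this example; nothing here is taken from an existing CSL processor.

```python
import re

# Hypothetical pattern for the three-part tags discussed above:
# 2-3 letter language, optional 4-letter script, optional 2-letter country.
TAG_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}))?$"
)


def parse_locale(tag):
    """Split a tag such as 'sr_Cyrl_RS' into (language, script, region)."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("not a la[-Scrp][-CO] tag: %r" % tag)
    script = match.group("script")
    region = match.group("region")
    return (
        match.group("lang").lower(),         # language in lower case
        script.title() if script else None,  # script in title case
        region.upper() if region else None,  # country in upper case
    )


def normalize(tag):
    """Rebuild the tag with hyphens and conventional casing."""
    return "-".join(part for part in parse_locale(tag) if part)
```

With this sketch, `normalize("zh_hans_cn")` gives `"zh-Hans-CN"`, and a plain `"pt-BR"` still parses with the script slot left empty, so existing two-part names would keep working.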

That is definitely my last posting on the subject for a while. I'll come back with a tool that lets someone create and save a locale, and with a way to produce all the factored common language files, depending on the adoption of the proposals.

Paulo Ney


_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: locale files - Addendum

rmzelle
Administrator
On Mon, Oct 7, 2013 at 7:45 AM, Paulo Ney de Souza <[hidden email]> wrote:
In the case of CSL, the change would just be the ability to accept this more general string as the name of a locale file, and of the corresponding language file (-sr_Cyrl.xml and -sr-Latn.xml).

This actually has come up once, but I never accepted the pull request because I didn't know how to deal with this issue. See https://github.com/citation-style-language/locales/pull/46#issuecomment-10812215 (somebody contributed a Latin version of Serbian, while we already have Cyrillic).

In all these cases, should we always include the script in the file name (and "xml:lang" attribute within the locale)? Or should we designate one script (e.g. Serbian Cyrillic) as the "primary" script and assign it "sr" instead of "sr-cyrl"?

Rintze


Re: locale files - Addendum

fbennett
On Mon, Oct 7, 2013 at 10:21 PM, Rintze Zelle <[hidden email]> wrote:

> On Mon, Oct 7, 2013 at 7:45 AM, Paulo Ney de Souza <[hidden email]>
> wrote:
>>
>> In the case of CSL, the change would just be the ability to accept this
>> more general string as the name of a locale file, and of the corresponding
>> language file (-sr_Cyrl.xml and -sr-Latn.xml).

So the idea is to follow RFC 5646?

        http://tools.ietf.org/html/rfc5646

>
>
> This actually has come up once, but I never accepted the pull request
> because I didn't know how to deal with this issue. See
> https://github.com/citation-style-language/locales/pull/46#issuecomment-10812215
> (somebody contributed a Latin version of Serbian, while we already have
> Cyrillic).
>
> In all these cases, should we always include the script in the file name
> (and "xml:lang" attribute within the locale)? Or should we designate one
> script (e.g. Serbian Cyrillic) as the "primary" script and assign it "sr"
> instead of "sr-cyrl"?

That's specified in RFC 5646. Languages can be associated with a
primary script, in which case it is omitted from the tag (or in this
case, the filename). It's at 2.2.3, para 4 (reference to
'Suppress-Script').

Following the RFC would probably be best, but we would have to think
about how to specify the fallback behaviour.
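
For reference, the companion matching document RFC 4647 describes a "lookup" scheme that falls back by progressively truncating subtags from the end of the requested tag. The sketch below is one possible reading of that scheme, not a complete implementation; the function names are made up for this example, and the "en-US" default is an assumption about what a CSL processor would use as a last resort.

```python
def lookup_candidates(tag):
    """Yield progressively shorter prefixes of a language tag."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # A single-character subtag (e.g. the 'x' that introduces a
        # private-use sequence) cannot end a tag, so drop it as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(requested, available, default="en-US"):
    """Return the first truncation of `requested` found in `available`."""
    by_lower = {name.lower(): name for name in available}
    for candidate in lookup_candidates(requested):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return default  # assumed last-resort locale
```

Truncation only ever removes subtags. In this sketch a request for "sr-Latn-RS" can reach "sr-Latn" and then "sr", but a request for "sr-RS" never reaches "sr-Cyrl-RS", which is the kind of case any fallback specification would have to address.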

Frank


Re: locale files - Addendum

Paulo Ney de Souza
In reply to this post by rmzelle
I would vote to include the script in the filename because that is what Linux, Python, PHP, and other open source projects do. It is not a standard yet, but it is fast becoming one ...

Paulo Ney



Re: locale files - Addendum

Paulo Ney de Souza
In reply to this post by fbennett


On Mon, Oct 7, 2013 at 10:45 AM, Frank Bennett <[hidden email]> wrote:

So the idea is to follow RFC 5646?

        http://tools.ietf.org/html/rfc5646

We should try to follow the RFC as closely as possible; it is the best guidance we have.
 

> In all these cases, should we always include the script in the file name
> (and "xml:lang" attribute within the locale)? Or should we designate one
> script (e.g. Serbian Cyrillic) as the "primary" script and assign it "sr"
> instead of "sr-cyrl"?

That's specified In RFC 5646. Languages can be associated with a
primary script, in which case it is omitted from the tag (or in this
case, the filename). It's at 2.2.3, para 4 (reference to
'Suppress-Script').


My reading of the RFC is that one MAY omit the script subtag or not, depending on a judgement of whether "it adds no distinguishing value to the tag". Some language-locale combinations have a preferred script, examples being:

   Han Traditional in Taiwan

   Han Simplified in Mainland China

Others, like Serbian, obviously depend on the region, and there is NO preferred script that could be inferred from the location tag. Since one would have to come up with a uniform scheme (to make it easier for all), the approach in use is to INCLUDE the script tag. That is how it is done in virtually all Linux distributions, Python, Perl, PHP and several other open source projects with large i18n efforts.

Paulo Ney
 


Re: locale files - Addendum

fbennett
On Mon, Oct 7, 2013 at 11:19 PM, Paulo Ney de Souza <[hidden email]> wrote:

> Others, like Serbian, obviously depend on the region, and there is NO
> preferred script that could be inferred from the location tag. Since one
> would have to come up with a uniform scheme (to make it easier for all),
> the approach in use is to INCLUDE the script tag. That is how it is done in
> virtually all Linux distributions, Python, Perl, PHP and several other open
> source projects with large i18n efforts.

I don't have a strong view on file naming. We'll need to work out how
fallback works, though.



Re: locale files - Addendum

Paulo Ney de Souza
It would make sense NOT to change from one script to another while doing a fall-back (unless it is the ultimate English substitution), so the scheme of

    zh-Hans-CN  -->  zh-Hans  --> en-US

seems the most appropriate, or even better:

    zh-Hans-CN  -->  zh-Hans  --> en

after the general English file is created.
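
The chain above can be sketched as a small function that shortens the tag but never drops the script, then appends the English last resort. This is only an illustration under stated assumptions: the tag is hyphen-separated with the script (if any) in second position, and the name `fallback_chain` and its `last_resort` parameter are invented for the example.

```python
def fallback_chain(tag, last_resort="en"):
    """Return the locales to try, in order, without changing script."""
    subtags = tag.split("-")
    # A four-letter alphabetic second subtag is taken to be the script.
    has_script = (
        len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    )
    # Never truncate below language-Script when a script is present.
    floor = 2 if has_script else 1
    chain = [
        "-".join(subtags[:n]) for n in range(len(subtags), floor - 1, -1)
    ]
    if last_resort not in chain:
        chain.append(last_resort)
    return chain
```

Here `fallback_chain("zh-Hans-CN")` gives `["zh-Hans-CN", "zh-Hans", "en"]`, while a tag without a script such as "pt-BR" still shortens to "pt" before reaching English.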

PN



Re: locale files - Addendum

rmzelle
Administrator
To elaborate on my concern, for languages with multiple scripts, we have to figure out what happens if a user specifies "zh-CN", and we only have "zh-hans-cn" and "zh-hant-cn" files. What then?

Rintze


Re: locale files - Addendum

Paulo Ney de Souza
That is the most important reason why it is better for the fall-back to be defined inside the XML file. In this particular case we would have a small file

zh-CN

that would define a fall-back to zh-Hans-CN (the most natural choice in this case), or directly to the language file zh-Hans.xml
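
To make the idea concrete, the sketch below follows explicitly declared fall-backs until it reaches a locale that has real content. The dictionary stands in for whatever each small XML file would declare; its entries, the function name `resolve`, and the cycle check are all assumptions made for this illustration, since no XML syntax for the declaration has been agreed.

```python
# Hypothetical fall-back declarations, standing in for whatever each small
# XML file would declare. The keys and targets are illustrative only.
DECLARED_FALLBACK = {
    "zh-CN": "zh-Hans-CN",
    "zh-Hans-CN": "zh-Hans",
}


def resolve(tag, has_content, declared=DECLARED_FALLBACK):
    """Follow declared fall-backs until a locale with content is found.

    Returns None when the chain ends without reaching such a locale.
    """
    seen = []
    current = tag
    while current not in has_content:
        if current in seen:
            # Declared fall-backs can loop, so a processor has to guard.
            raise ValueError(
                "fall-back cycle: " + " -> ".join(seen + [current])
            )
        seen.append(current)
        if current not in declared:
            return None
        current = declared[current]
    return current
```

With only a "zh-Hans" file present, `resolve("zh-CN", {"zh-Hans"})` walks both declarations and returns "zh-Hans". The cycle check hints at the extra work a processor takes on once fall-backs are data rather than a fixed rule.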

Paulo Ney



Re: locale files - Addendum

rmzelle
Administrator
Similar to dependent CSL styles? I'm a bit worried about the burden to implement that.

Rintze


Re: locale files - Addendum

Paulo Ney de Souza
I'll write the files and send them to you. Once a schema is decided, I can change a few lines in a script that will write all of them!

Paulo Ney



Re: locale files - Addendum

rmzelle
Administrator
I meant the burden for implementing the scheme in CSL processors. Making the files and changing the schema are the easy parts.

Rintze


Re: locale files - Addendum

ajlyon
This all sounds like we are trying to implement something parallel to RFC 5646 / BCP 47. This is a standard that we should embrace as far as possible, and it has some implicit ideas of partial matching and fallback, but we'd be adding substantial ad-hoc semantics by trying to define a new set of fallback relationships.

I've been lurking on the IETF Languages mailing list for several years (working group for RFC 5646 language subtags and stomping grounds of language tagging experts), since Frank and I worked on getting some new subtags approved by that body, and it strikes me that we are doing something wrong if we are coming up with things like zh-Hans -> zh-Hant -> en.

I know that a substantial body of useful localization data is encoded in CLDR (http://cldr.unicode.org/), but I don't know if we'd find this there.

The basic logic of substituting strings from successively fuzzier matches makes sense at the surface, but in practice it's going to be technically difficult and probably hard to debug. I'd recommend _not_ supporting fallback, except perhaps to the guaranteed-complete locale of English.

Some logical fallbacks could be hard to detect for a user debugging strange translations, or potentially offensive. There are cases where languages are similar and we could save duplicated strings by having pt-PT and pt-BR both inherit from a common pt, or one from the other, but that would make it difficult to determine, for example, how complete the Brazilian Portuguese translation is. Other failovers between scripts could be politically fraught: Hans and Hant, or all instances of Latn/Cyrl.

RFC 5646 also brings up other questions about failover: do we want to be handling macrolanguages? The example of zh here has thus far ignored the fact that zh isn't even the correct subtag for these locale files. We should be using cmn, as zh is a macrolanguage encompassing several subtags (http://people.w3.org/rishida/utils/subtags/index.php?lookup=zh&submit=Look+up).

I think that the answer across the board is that no, we don't want to handle failover. Our goal should be fuzzy matching to get the best single locale file for the user's desired locale. We should use scarce engineering resources to make sure that that component does work, so that we match the style- or user-specified locale of ru-alalc97 to ru-Latn-alalc97 if available, or to best-matches like ru-RU if not, so that we match cmn-Hant to whatever zh we have available. That's the level we have to get right. String-level failover is out of spec and bound to be extremely confusing for implementers, localizers and users.
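
A minimal sketch of that kind of matching follows: it picks one locale file for the requested tag and never merges strings from several. The scoring rule (count shared subtags, prefer an exact primary-language match), the tiny macrolanguage table, the function name `best_match`, and the "en-US" default are all assumptions made for illustration, not an agreed algorithm.

```python
# Hypothetical, deliberately tiny table of macrolanguage membership.
MACROLANGUAGE = {"cmn": "zh", "yue": "zh"}


def best_match(requested, available, default="en-US"):
    """Pick the single available locale that shares the most subtags."""
    wanted = requested.lower().split("-")
    languages = {wanted[0], MACROLANGUAGE.get(wanted[0], wanted[0])}
    best, best_score = default, -1
    for name in sorted(available):  # sorted so that ties are deterministic
        subtags = name.lower().split("-")
        if subtags[0] not in languages:
            continue  # never cross to an unrelated language
        # One point per shared script, region, or variant subtag.
        score = len(set(wanted[1:]) & set(subtags[1:]))
        if subtags[0] == wanted[0]:
            score += 1  # exact primary language beats a macrolanguage
        if score > best_score:
            best, best_score = name, score
    return best
```

In this sketch "ru-alalc97" selects "ru-Latn-alalc97" when it is available and "ru-RU" otherwise, and "cmn-Hant" selects whichever zh file exists. Ties are broken alphabetically here, which is arbitrary; a real matcher would need better data, for instance the likely-subtags information in CLDR, to choose between two zh files.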


On Mon, Oct 7, 2013 at 9:16 AM, Rintze Zelle <[hidden email]> wrote:
I meant the burden for implementing the scheme in CSL processors. Making the files and changing the schema are the easy parts.

Rintze


On Mon, Oct 7, 2013 at 12:14 PM, Paulo Ney de Souza <[hidden email]> wrote:
I'll write the files and send them to you. Once a schema is decided I can change a few lines in a script that will write all of them!

Paulo Ney


On Mon, Oct 7, 2013 at 1:11 PM, Rintze Zelle <[hidden email]> wrote:
Similar to dependent CSL styles? I'm a bit worried about the burden to implement that.

Rintze


On Mon, Oct 7, 2013 at 11:46 AM, Paulo Ney de Souza <[hidden email]> wrote:
That is the most important reason why it is better for the fall-back file to be defined inside the XML file. In this particular case we would have a small file

zh-CN

that would define a fall-back to zh-Hans-CN (the most natural choice in this case), or directly to the language file zh-Hans.xml.
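
A rough sketch of how a processor might follow such a pointer, in Python. This assumes a hypothetical in-memory representation in which each locale file is a dict with an optional "fallback" key and an optional "terms" dict; no such attribute exists in the current CSL schema, and the function name is made up for illustration.

```python
def resolve_locale(name, files):
    """Follow 'fallback' pointers from stub files until a file with terms is found."""
    visited = []
    while name not in visited:
        visited.append(name)
        entry = files[name]  # KeyError if the target file is missing
        if entry.get("terms"):
            return name
        if "fallback" not in entry:
            raise ValueError(f"{name} has neither terms nor a fallback")
        name = entry["fallback"]
    # We came back to a file we already visited: the stubs point in a circle.
    raise ValueError("fallback cycle: " + " -> ".join(visited + [name]))


# Hypothetical data: zh-CN is a small stub that only names its fall-back.
files = {
    "zh-CN": {"fallback": "zh-Hans-CN"},
    "zh-Hans-CN": {"terms": {"and": "和"}},
}
print(resolve_locale("zh-CN", files))
```

The cycle check is the part that resembles the burden of dependent styles: every processor would have to guard against stubs that point at each other or at missing files.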

Paulo Ney


On Mon, Oct 7, 2013 at 12:05 PM, Rintze Zelle <[hidden email]> wrote:
To elaborate on my concern, for languages with multiple scripts, we have to figure out what happens if a user specifies "zh-CN", and we only have "zh-hans-cn" and "zh-hant-cn" files. What then?

Rintze

On Mon, Oct 7, 2013 at 10:46 AM, Paulo Ney de Souza <[hidden email]> wrote:
It would make sense NOT to change from one script to another while doing a fall-back (unless it is the ultimate English substitution), so the scheme of

    zh-Hans-CN  -->  zh-Hans  --> en-US

seems more appropriate, or even better:

    zh-Hans-CN  -->  zh-Hans  --> en

after the English general file is created.
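
The chain above could be generated mechanically. The following Python sketch is only an illustration under stated assumptions: the function name and the "en" last resort are hypothetical, and the rule "never drop the script subtag" is one reading of the scheme, not something a CSL processor is known to implement.

```python
def fallback_chain(tag, last_resort="en"):
    """Drop trailing subtags one at a time, but never drop the script."""
    parts = tag.replace("_", "-").split("-")
    # A four-letter alphabetic second subtag is treated as a script (Hans, Cyrl, ...).
    has_script = len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha()
    shortest = 2 if has_script else 1
    chain = ["-".join(parts[:n]) for n in range(len(parts), shortest - 1, -1)]
    if last_resort not in chain:
        chain.append(last_resort)
    return chain


print(fallback_chain("zh-Hans-CN"))  # ['zh-Hans-CN', 'zh-Hans', 'en']
print(fallback_chain("pt-BR"))       # ['pt-BR', 'pt', 'en']
```

Because the script is kept, a request for sr-Cyrl-RS would never silently pick up strings from an sr-Latn file; the only cross-script step is the final one to English.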

PN



Re: locale files - Addendum

fbennett
On Tue, Oct 8, 2013 at 1:47 AM, Avram Lyon <[hidden email]> wrote:

> This all sounds like we are trying to implement something parallel to RFC
> 5646 / BCP 47. This is a standard that we should embrace as possible, and it
> has some implicit ideas of partial matching and fallback, but we'd be adding
> substantial ad-hoc semantics by trying to define a new set of fallback
> relationships.
>
> I've been lurking on the IETF Languages mailing list for several years
> (working group for RFC 5646 language subtags and stomping grounds of
> language tagging experts), since Frank and I worked on getting some new
> subtags approved by that body, and it strikes me that we are doing something
> wrong if we are coming up with things like zh-Hans -> zh-Hant -> en.
>
> I know that a substantial body of useful localization data is encoded in
> CLDR (http://cldr.unicode.org/), but I don't know if we'd find this there.
>
> The basic logic of substituting strings from successively fuzzier matches
> makes sense on the surface, but in practice it's going to be technically
> difficult and probably hard to debug. I'd recommend _not_ supporting
> fallback, except perhaps to the guaranteed-complete locale of English.
>
> Some logical fallbacks could be hard to detect for a user debugging strange
> translations, or potentially offensive. There are cases where languages are
> similar and we could save duplicated strings by having pt-PT and pt-BR both
> inherit from a common pt, or one from the other, but that would make it
> difficult to determine, for example, how complete the Brazilian Portuguese
> translation is. Other failovers between scripts could be politically
> fraught: Hans and Hant, or all instances of Latn/Cyrl.
>
> RFC 5646 also brings up other questions about failover -- do we want to be
> handling macrolanguages?
> The example of zh here has thus far ignored the fact that zh isn't even a
> correct subtag for these locale files; we should be using cmn, as zh is a
> macrolanguage encompassing several subtags
> (http://people.w3.org/rishida/utils/subtags/index.php?lookup=zh&submit=Look+up).
>
> I think that the answer across the board is that no, we don't want to handle
> failover. Our goal should be fuzzy matching to get the best single locale
> file for the user's desired locale. We should use scarce engineering
> resources to make sure that that component does work, so that we match the
> style- or user-specified locale of ru-alalc97 to ru-Latn-alalc97 if
> available, or to best-matches like ru-RU if not, so that we match cmn-Hant
> to whatever zh we have available. That's the level we have to get right.
> String-level failover is out of spec and bound to be extremely confusing for
> implementers, localizers and users.

String-level fallback is required by the CSL spec:

    http://citationstyles.org/downloads/specification.html#locale

If a script specifier is added to the mix, we will need to work out
how fallback works with it, if only for the limited case of term
overrides embedded in a style.
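
For illustration only, string-level fallback amounts to looking a term up in an ordered list of sources and taking the first hit. The Python sketch below assumes a hypothetical list of (origin, terms) pairs already sorted by priority; the exact order is whatever the spec section linked above prescribes, and the names and strings here are made up.

```python
def lookup_term(name, sources):
    """Return (text, origin) from the first source that defines the term."""
    for origin, terms in sources:
        if name in terms:
            return terms[name], origin
    raise KeyError(name)


# Hypothetical priority list for a de-AT style; contents are illustrative.
sources = [
    ("in-style override (de-AT)", {"et-al": "u. a."}),
    ("locales-de-AT.xml", {"and": "und"}),
    ("locales-en-US.xml", {"and": "and", "et-al": "et al.", "editor": "editor"}),
]

for term in ("et-al", "and", "editor"):
    print(term, "->", lookup_term(term, sources))
```

Returning the origin alongside the string is a small design choice that would help with the debugging concern raised above: a user can see which file a surprising term came from. A script specifier would only change how the list of sources is built, not the lookup itself.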


_______________________________________________
xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel