Embedding citation-specific metadata in PDF files


johnmie
digi-libris Reader can export all metadata of an object, including CSL-specific variables (those that cannot be mapped 1:1 to Dublin Core terms, e.g. pageRange, event, genre), as an XMP sidecar file. This file can be imported into PDF files (with Acrobat.exe), and other software might be able to read it.

Until now we have stored these CSL variables as attribute/value pairs under custom entries which appear in Acrobat.exe under File >> Properties >> Additional Metadata >> Advanced and are stored in the XMP file as

<rdf:Description rdf:about="" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:citation_pageRange>7-9</pdfx:citation_pageRange>
</rdf:Description>


I am now considering changing this to use a proper CSL namespace, which could look like

<rdf:Description rdf:about="" xmlns:cs="http://purl.org/net/xbiblio/csl/">
<cs:pageRange>7-9</cs:pageRange>
</rdf:Description>


Unfortunately, this URL either returns a 404 error or automatically redirects to http://citationstyles.org, so there is no way to see a list of variables.

What do you recommend?

   1 stay with pdfx
   2 change to xbiblio, even though the latter does not resolve to a valid namespace document
   3 register a new domain with purl.org (under purl.org/digi/csl/ or similar)
   4 as number 3 above, but use a proprietary prefix (e.g. digicita: or similar)

Re: Embedding citation-specific metadata in PDF files

rmzelle
Administrator
Since nobody is responding, my two cents: I would pick option 4 for now.

Rintze

xbiblio-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

Re: Embedding citation-specific metadata in PDF files

Bruce D'Arcus-3
In reply to this post by johnmie
You could do option 5: use bibo?



View this message in context: http://xbiblio-devel.2463403.n2.nabble.com/Embedding-citation-specific-metadata-in-PDF-files-tp7579100.html
Sent from the xbiblio-devel mailing list archive at Nabble.com.


Re: Embedding citation-specific metadata in PDF files

aurimas
In reply to this post by rmzelle
On Wed, Jun 25, 2014 at 1:17 PM, johnmie <[hidden email]> wrote:
> digi-libris Reader <http://digi-libris.com>   can export all Metadata of an
> object including CSL-specific variables (those that cannot be mapped 1:1 to
> Dublin Core Terms. e.g. pageRange, event, genre etc.) as XMP sidecar file
> which can be imported into PDF files (with Acrobat.exe) and which other
> software might be able to read.

Where are these CSL-specific variables coming from? I don't see any CSL spec (neither CSL documentation, nor CSL JSON format) defining pageRange.
 
>
> Until now we have stored these CSL variables as attribute/value pairs under
> custom entries which appear in Acrobat.exe under /File >> Properties >>
> Additional Metadata >> Advanced/ and are stored in the XMP file as
>
> /<rdf:Description rdf:about="" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
> <pdfx:citation_pageRange>7-9</pdfx:citation_pageRange>
> </rdf:Description>/
>
> I am now considering changing this to include a proper CSL namespace which
> could look line
> /
> <rdf:Description rdf:about="" xmlns:cs="http://purl.org/net/xbiblio/csl/">
> <cs:pageRange>7-9</cs:pageRange>
> </rdf:Description>/
>
> but unfortunately this URL returns a 404 error or automatically re-directs
> you to http://citationstyles.org. No way to see a list of variables.

Namespaces are not required to resolve to a valid page (I agree that it may be useful though). For all intents and purposes they're just some globally-unique string.
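
To illustrate the point about namespaces being plain identifiers, here is a minimal Python sketch (not part of digi-libris or any CSL tooling). It assumes the cs:pageRange element from the quoted snippet, wrapped in an rdf:RDF root so it parses on its own. The parser never fetches the namespace URL; it only compares it as a string, and the prefix bound to it is irrelevant.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# The namespace URI is never dereferenced; it is only compared as a string.
CSL_NS = "http://purl.org/net/xbiblio/csl/"

# Fragment modelled on the snippet quoted above.
XMP_A = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="" xmlns:cs="{CSL_NS}">
    <cs:pageRange>7-9</cs:pageRange>
  </rdf:Description>
</rdf:RDF>"""

# Same data with a different prefix bound to the same namespace URI.
XMP_B = XMP_A.replace("xmlns:cs=", "xmlns:anything=").replace(
    "cs:pageRange", "anything:pageRange"
)


def read_page_range(xmp: str):
    """Return the text of the pageRange element in CSL_NS, or None."""
    root = ET.fromstring(xmp)
    element = root.find(f".//{{{CSL_NS}}}pageRange")
    return None if element is None else element.text


print(read_page_range(XMP_A))
print(read_page_range(XMP_B))
```

Both calls find the same value, even though the URL itself returns a 404 or a redirect when opened in a browser.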
 

Re: Embedding citation-specific metadata in PDF files

Robert Knight
> digi-libris Reader <http://digi-libris.com>   can export all Metadata of an
> object including CSL-specific variables (those that cannot be mapped 1:1 to
> Dublin Core Terms. e.g. pageRange, event, genre etc.)

Have you considered mapping to PRISM as well? [1] That fills in a
number of gaps in Dublin Core and is already in use by several
publishers. Mendeley will read PRISM metadata from PDFs in addition to
Dublin Core and I think Papers does as well. I'm not sure if Zotero
can?

[1] http://www.prismstandard.org/specifications/2.1/PRISM_prism_namespace_2.1.pdf


Re: Embedding citation-specific metadata in PDF files

aurimas
PRISM 3.0 has been published as well. Zotero (I can't speak for other reference managers) does not yet recognize the new spec/namespace, but we'll get there soon. In either case, Zotero does not read metadata directly from PDFs because, from what we've seen, that metadata is very unreliable (though this may change in the future).



Re: Embedding citation-specific metadata in PDF files

johnmie
In reply to this post by rmzelle
Thanks, Rintze, for relaunching the debate.

I have adopted option 4 as follows.
Citation-relevant variables will be stored on export in XMP sidecar files as

<rdf:Description rdf:about="" xmlns:citation="http://purl.org/digilib/cita/">
<citation:title>Embedded Metadata add Value to Scientific Publications</citation:title>
<citation:number-of-pages>3</citation:number-of-pages>
<citation:original-publisher-place>Geneva</citation:original-publisher-place>
...
</rdf:Description>

Those that cannot be mapped to DC will also be carried under pdfx as custom attribute/value pairs with the 'citation_' prefix, the same prefix already in use in many HTML files.
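
As a rough sketch of that export (not the actual digi-libris code), the Python below builds both descriptions with the standard library. It assumes the citation prefix is bound to http://purl.org/digilib/cita/; the function name build_citation_xmp and the choice of which variables count as not mappable to DC are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CITA_NS = "http://purl.org/digilib/cita/"
PDFX_NS = "http://ns.adobe.com/pdfx/1.3/"

# Ask ElementTree to serialise with readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("citation", CITA_NS)
ET.register_namespace("pdfx", PDFX_NS)


def build_citation_xmp(variables, unmapped):
    """Serialise citation variables, duplicating unmapped ones under pdfx.

    `variables` maps variable names to values; `unmapped` lists the names
    assumed (hypothetically) to have no Dublin Core equivalent.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")

    # All citation variables go into the custom namespace.
    cita = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    for name, value in variables.items():
        ET.SubElement(cita, f"{{{CITA_NS}}}{name}").text = str(value)

    # Unmapped variables are also carried as pdfx:citation_<name> pairs.
    pdfx = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    for name in unmapped:
        ET.SubElement(pdfx, f"{{{PDFX_NS}}}citation_{name}").text = str(variables[name])

    return ET.tostring(rdf, encoding="unicode")


xmp = build_citation_xmp(
    {
        "title": "Embedded Metadata add Value to Scientific Publications",
        "number-of-pages": 3,
        "original-publisher-place": "Geneva",
    },
    unmapped=["number-of-pages", "original-publisher-place"],
)
print(xmp)
```

A real sidecar file would also need the usual XMP packet wrapper around the rdf:RDF element, which is left out here.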

To aurimas: You are absolutely right, 'pageRange' is not in the CSL specification. It is a convenience variable I have used, but it exports as 'page' and not as 'pageRange'. My fault, sorry for the misleading typo.

To Robert: I'd be happy to include a bridge for PRISM variables if this is a widely used standard. Just show me a mapping list and the purl.org entry to use.

Re: Embedding citation-specific metadata in PDF files

Robert Knight
In reply to this post by aurimas
> In either case, Zotero does not read metadata directly from PDFs, because, from what we've seen,
> the metadata is very unreliable (though this may change in the future).

The main problem we observed was that the same Dublin Core fields that
are used for article metadata are also filled in by PDF generation
software using generic defaults - for example the filename of the
source document (Word, LaTeX etc.) as dc:title and the name of the
software that created the PDF as dc:creator.

In Mendeley we apply some simple heuristics based on a comparison of
the metadata with the actual content of the first few pages of the PDF
to decide whether or not to use that metadata.

The presence of PRISM fields is also a useful indicator since they are
more domain specific and less likely to be populated with other data
than the DC fields.
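
A very rough Python sketch of that kind of check follows. This is not Mendeley's actual implementation; the function name, the flat dict of fields, the 0.6 threshold, and the decision to treat any PRISM field as sufficient evidence are all assumptions made for illustration.

```python
import re


def _words(text):
    """Lower-cased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def looks_like_article_metadata(xmp_fields, first_pages_text, min_overlap=0.6):
    """Guess whether embedded metadata describes the article itself.

    `xmp_fields` is a flat dict such as {"dc:title": "...", "prism:doi": "..."}.
    """
    # Assumption: any PRISM field is taken as a sign of publisher-supplied metadata.
    if any(key.startswith("prism:") for key in xmp_fields):
        return True

    title_words = _words(xmp_fields.get("dc:title", ""))
    if not title_words:
        return False

    # Fraction of title words that also appear in the first pages of the PDF.
    page_words = _words(first_pages_text)
    overlap = len(title_words & page_words) / len(title_words)
    return overlap >= min_overlap


page_text = "Embedded Metadata add Value to Scientific Publications. Abstract ..."
print(looks_like_article_metadata({"dc:title": "Microsoft Word - draft_v3.docx"}, page_text))
print(looks_like_article_metadata(
    {"dc:title": "Embedded Metadata add Value to Scientific Publications"}, page_text))
```

The first call rejects a title that is just a source filename, and the second accepts a title that actually appears on the first page.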


Re: Embedding citation-specific metadata in PDF files

Robert Knight
> to robert: be happy to include a bridge for PRISM variables if this is a
> widely used standard. Just show me a mapping list and the purl.org entry to
> use.

I'm not sure if there is an existing purl.org entry. The example at
http://www.prismstandard.org/resources/mod_prism.html uses a
prismstandard.org URL for the namespace. There is a PURL
'/rss/1.0/modules/prism/' which points to the aforementioned
mod_prism.html resource but you want one which points to the
namespace?

I don't have a list of PRISM -> CSL mappings directly to hand, but the
fields that we recognize, and which I believe map straightforwardly to
CSL in most cases, are:

"prism:aggregationType", "prism:copyright", "prism:doi", "prism:edition",
"prism:endingPage", "prism:genre", "prism:issn", "prism:issueIdentifier",
"prism:issueName", "prism:keyword", "prism:location", "prism:number",
"prism:organization", "prism:pageRange", "prism:publicationDate",
"prism:publicationName", "prism:section", "prism:startingPage",
"prism:volume", "prism:url"

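For what it's worth, a possible starting point for such a bridge might look like the Python sketch below. The mapping is a guess rather than an agreed or published one: the CSL variable names on the right exist in the CSL specification, but which PRISM field should feed which variable (for example prism:number as issue) is an assumption, and fields without an obvious counterpart are simply left out.

```python
# Hypothetical PRISM -> CSL variable mapping; unverified against either spec's
# intended semantics, and deliberately incomplete.
PRISM_TO_CSL = {
    "prism:doi": "DOI",
    "prism:edition": "edition",
    "prism:genre": "genre",
    "prism:issn": "ISSN",
    "prism:keyword": "keyword",
    "prism:number": "issue",
    "prism:pageRange": "page",
    "prism:publicationDate": "issued",
    "prism:publicationName": "container-title",
    "prism:section": "section",
    "prism:startingPage": "page-first",
    "prism:url": "URL",
    "prism:volume": "volume",
}


def prism_to_csl(prism_fields):
    """Copy the PRISM fields that have a mapped CSL variable into a new dict.

    Values are copied as-is; a real converter would still need to turn
    dates such as prism:publicationDate into CSL's date structure.
    """
    item = {}
    for key, value in prism_fields.items():
        csl_name = PRISM_TO_CSL.get(key)
        if csl_name is not None:
            item[csl_name] = value
    return item


print(prism_to_csl({
    "prism:volume": "12",
    "prism:pageRange": "7-9",
    "prism:copyright": "not mapped in this sketch",
}))
```

Unmapped fields such as prism:copyright are silently dropped here; a bridge could instead keep them under their original PRISM names.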
On 14 July 2014 10:29, Robert Knight <[hidden email]> wrote:

>> In either case, Zotero does not read metadata directly from PDFs, because, from what we've seen,
>> the metadata is very unreliable (though this may change in the future).
>
> The main problem we observed was that the same Dublin Core fields that
> are used for article metadata are also filled in by PDF generation
> software using generic defaults - for example the filename of the
> source document (Word, LaTeX etc.) as dc:title and the name of the
> software that created the PDF as dc:creator.
>
> In Mendeley we apply some simple heuristics based on a comparison of
> the metadata with the actual content of the first few pages of the PDF
> to decide whether or not to use that metadata.
>
> The presence of PRISM fields is also a useful indicator since they are
> more domain specific and less likely to be populated with other data
> than the DC fields.
>
> On 14 July 2014 07:11, Aurimas Vinckevicius <[hidden email]> wrote:
>> PRISM 3.0 has been published as well, though Zotero (I can't speak for other
>> managers) does not yet recognize the new spec/namespace, but we'll get there
>> soon. In either case, Zotero does not read metadata directly from PDFs,
>> because, from what we've seen, the metadata is very unreliable (though this
>> may change in the future).
>>
>>
>> On Mon, Jul 14, 2014 at 12:36 AM, Robert Knight <[hidden email]>
>> wrote:
>>>
>>> > digi-libris Reader <http://digi-libris.com>   can export all Metadata of
>>> > an
>>> > object including CSL-specific variables (those that cannot be mapped 1:1
>>> > to
>>> > Dublin Core Terms. e.g. pageRange, event, genre etc.)
>>>
>>> Have you considered mapping to PRISM as well? [1] That fills in a
>>> number of gaps in Dublin Core and is already in use by several
>>> publishers. Mendeley will read PRISM metadata from PDFs in addition to
>>> Dublin Core and I think Papers does as well. I'm not sure if Zotero
>>> can?
>>>
>>> [1]
>>> http://www.prismstandard.org/specifications/2.1/PRISM_prism_namespace_2.1.pdf
>>>
>>> On 14 July 2014 02:48, Aurimas Vinckevicius <[hidden email]> wrote:
>>> > On Sun, Jul 13, 2014 at 8:30 PM, Rintze Zelle <[hidden email]>
>>> > wrote:
>>> >>
>>> >> Since nobody is responding, my two cents: I would pick option 4 for
>>> >> now.
>>> >>
>>> >> Rintze
>>> >>
>>> >> On Wed, Jun 25, 2014 at 1:17 PM, johnmie <[hidden email]> wrote:
>>> >> > digi-libris Reader <http://digi-libris.com>   can export all Metadata
>>> >> > of
>>> >> > an
>>> >> > object including CSL-specific variables (those that cannot be mapped
>>> >> > 1:1
>>> >> > to
>>> >> > Dublin Core Terms. e.g. pageRange, event, genre etc.) as XMP sidecar
>>> >> > file
>>> >> > which can be imported into PDF files (with Acrobat.exe) and which
>>> >> > other
>>> >> > software might be able to read.
>>> >
>>> >
>>> > Where are these CSL-specific variables coming from? I don't see any CSL
>>> > spec
>>> > (neither CSL documentation, nor CSL JSON format) defining pageRange.
>>> >
>>> >>
>>> >> >
>>> >> > Until now we have stored these CSL variables as attribute/value pairs
>>> >> > under
>>> >> > custom entries which appear in Acrobat.exe under /File >> Properties
>>> >> > >>
>>> >> > Additional Metadata >> Advanced/ and are stored in the XMP file as
>>> >> >
>>> >> > /<rdf:Description rdf:about=""
>>> >> > xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
>>> >> > <pdfx:citation_pageRange>7-9</pdfx:citation_pageRange>
>>> >> > </rdf:Description>/
>>> >> >
>>> >> > I am now considering changing this to include a proper CSL namespace
>>> >> > which
>>> >> > could look like
>>> >> > /
>>> >> > <rdf:Description rdf:about=""
>>> >> > xmlns:cs="http://purl.org/net/xbiblio/csl/">
>>> >> > <cs:pageRange>7-9</cs:pageRange>
>>> >> > </rdf:Description>/
>>> >> >
>>> >> > but unfortunately this URL returns a 404 error or automatically
>>> >> > re-directs
>>> >> > you to http://citationstyles.org. No way to see a list of variables.
>>> >
>>> >
>>> > Namespaces are not required to resolve to a valid page (I agree that it
>>> > may
>>> > be useful though). For all intents and purposes they're just some
>>> > globally-unique string.
>>> >
>>> >>
>>> >> >
>>> >> > What do you recommend:
>>> >> >
>>> >> >    1 stay with pdfx
>>> >> >    2 change to xbiblio even though the latter does not reveal a valid
>>> >> > namespace
>>> >> >    3 register a new domain with purl.org (under purl.org/digi/csl/ or
>>> >> > similar)
>>> >> >    4 as nbr 3 above but use a proprietary prefix (e.g. digicita: or
>>> >> > similar)
>>> >> > ?
>>
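The heuristic described in the quoted message above (compare the embedded metadata with the text of the first few pages before trusting it) could look roughly like the sketch below. This is not Mendeley's actual implementation: the function names, the word-overlap measure, and the 0.8 threshold are all assumptions made for illustration.

```python
import re


def normalise(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_is_plausible(dc_title, first_pages_text, threshold=0.8):
    """Return True if most words of the embedded title occur in the page text.

    `threshold` is an assumed cut-off, not a value taken from any real tool.
    """
    title_words = normalise(dc_title)
    if not title_words:
        return False  # an empty title carries no usable information
    page_words = set(normalise(first_pages_text))
    hits = sum(1 for word in title_words if word in page_words)
    return hits / len(title_words) >= threshold
```

A title left behind by the PDF generator, such as the filename of the source document, would typically share few words with the first page and be rejected, while a real article title would normally appear there almost verbatim.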


Re: Embedding citation-specific metadata in PDF files

aurimas
In reply to this post by johnmie

Probably just a typo, but your namespace declaration doesn't match the prefix you are using.

More importantly, if the idea behind using this namespace URI is to offer interoperability between software, then I'm not sure this is helpful. I've never seen the http://purl.org/digilib/cita/ namespace used in this context (though it may be) and I can't find any documentation for it. Can anyone point to a reference?

I think your best choice for interoperability would be to use a common, rich vocabulary, like PRISM, with a namespace URI that is official/widely used (e.g. http://prismstandard.org/namespaces/basic/2.1/ or a different version).

If you don't care about interoperability, then I guess it doesn't matter at all.

Aurimas
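
To illustrate the prefix mismatch mentioned above: a namespace-aware XML parser rejects an element whose prefix has no matching `xmlns:` declaration. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names are made up for this example, and the `rdf` namespace declaration is added here only so that the fragment can be parsed on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CITA_NS = "http://purl.org/digilib/cita/"  # URI taken from the quoted snippet


def parse_description(declared_prefix):
    """Parse an rdf:Description that declares `declared_prefix` for CITA_NS
    but always uses the `citation:` prefix on its property element."""
    xml = (
        f'<rdf:Description xmlns:rdf="{RDF_NS}" rdf:about="" '
        f'xmlns:{declared_prefix}="{CITA_NS}">'
        "<citation:number-of-pages>3</citation:number-of-pages>"
        "</rdf:Description>"
    )
    return ET.fromstring(xml)


def is_well_formed(declared_prefix):
    """Return True if the fragment parses, False on a parse error."""
    try:
        parse_description(declared_prefix)
        return True
    except ET.ParseError:
        return False
```

With `declared_prefix="cs"` the `citation:` prefix is unbound and parsing fails; with `declared_prefix="citation"` the element parses and its tag expands to the namespace URI plus the local name.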

On Jul 14, 2014 4:21 AM, "johnmie" <[hidden email]> wrote:
>
> Thanks Rintze for relaunching the debate
>
> I have adopted option four as follows:
> Citation relevant variables will be stored on export in XMP sidecar files as
>
> <rdf:Description rdf:about="" xmlns:cs="http://purl.org/digilib/cita/">
> <citation:title>Embedded Metadata add Value to Scientific
> Publications</citation:title>
> <citation:number-of-pages>3</citation:number-of-pages>
> <citation:original-publisher-place>Geneva</citation:original-publisher-place>
> ...
> </rdf:Description>
>
> and those which cannot be mapped to DC will also be carried under pdfx as
> custom attribute/value pairs with the 'citation_' prefix, same as already in
> use in many HTML files.
>
> to aurimas: You are absolutely right, 'pagerange' is not in the CSL
> specification. It is a convenience variable I have used, but it exports as
> 'page' and not as 'pagerange'. My fault, sorry for the misleading typo.
>
> to robert: be happy to include a bridge for PRISM variables if this is a
> widely used standard. Just show me a mapping list and the purl.org entry to
> use.
>

Re: Embedding citation-specific metadata in PDF files

johnmie
In reply to this post by johnmie
Actually, it now reads <rdf:Description rdf:about=""
xmlns:citation="http://purl.org/digilib/cita/">; the cs prefix came from a previous
test version.

The purl.org/digilib/cita/ link is new and the updated version of digi-libris reader has not yet been uploaded. This is why you have not yet seen it anywhere. Remember I had asked the original question only a few days ago and did not get any meaningful suggestions until Rintze re-launched the debate.

If you click on our purl link you will be directed to our Citation Variables appendix which documents all the variables we use and how CSL variables are mapped.

The prismstandard link points to an errata page which in turn returns 404. But from what I have seen on another prism page, many of the variables do not map 1:1 to either CSL or DC and will be treated as custom variables.

john m.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Embedding citation-specific metadata in PDF files

aurimas
OK, so you're basically establishing a new schema that closely parallels the variables used in the CSL documentation (those are effectively _not_ under the http://purl.org/net/xbiblio/csl namespace, because they are never used as QNames in that context).

The only thing that is not entirely clear to me from the documentation is which verbs (in the RDF sense) fall under the "http://purl.org/digilib/cita/" namespace. Is it all of the terms listed under "Variables used in CSL styles", except for the ones marked with * and ** (since those are mapped to the dc/dcterms namespaces)? Or is it all of the terms on that page?

IMO, since you're establishing a new namespace anyway, it would make sense to add all of the CSL variables to this namespace with no exceptions (you're not forced to use them in the digibib export anyway). I would also go with a namespace URI that makes this relationship clear (e.g. "http://purl.org/net/xbiblio/csl-vars#", "http://purl.org/net/digibib/csl-vars#" or something similar).
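
As a small illustration of why the namespace URI matters more than the prefix: in RDF/XML the property URI is the namespace URI concatenated with the element's local name, and the prefix is only a local abbreviation. The sketch below uses one of the URIs suggested above purely as an example value; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# One of the URIs suggested above, used here only as an illustration.
CSL_VARS_NS = "http://purl.org/net/xbiblio/csl-vars#"


def expanded_names(prefix):
    """Parse a one-property description written with `prefix` and return
    the namespace-expanded tag of each child element."""
    xml = (
        f'<rdf:Description xmlns:rdf="{RDF_NS}" rdf:about="" '
        f'xmlns:{prefix}="{CSL_VARS_NS}">'
        f"<{prefix}:page>7-9</{prefix}:page>"
        "</rdf:Description>"
    )
    return [child.tag for child in ET.fromstring(xml)]


def term_uri(tag):
    """Turn ElementTree's '{namespace}local' form into the property URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local
```

Whether the document says `cs:page` or `citation:page`, a consumer sees the same expanded name, so two tools interoperate only if they agree on the namespace URI and the local names.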



Re: Embedding citation-specific metadata in PDF files

johnmie
In reply to this post by johnmie
digi-libris distinguishes between 5 types of variables:
   1. those that can be mapped to dc or dcterms,
   2. those that can be mapped to CSL and dc/dcterms,
   3. those that can only be mapped to CSL,
   4. those imported and re-mappable such as ris, bib, MARC21, Prism (planned) etc.,
   5. those that cannot be mapped to any of the above (which become custom attribute/value pairs).

On export, all CSL variables are included under the digilib namespace. In addition, those also available as dc or dcterms are duplicated under the respective namespace, and all other CSL variables are duplicated as custom attribute/value pairs under pdfx with the 'citation_' prefix. Thus they are visible even to individuals who may not have adequate software to read all of the embedded metadata.
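
The export rule described above can be sketched as follows. This is only an illustration under stated assumptions, not digi-libris code: the `DC_EQUIVALENTS` table is a hypothetical two-entry stand-in for the real mapping, the function name is made up, and the sketch ignores the fact that XMP stores some Dublin Core values (such as dc:title) in rdf:Alt or rdf:Seq containers rather than as plain text.

```python
from xml.sax.saxutils import escape

# Hypothetical, deliberately tiny mapping of CSL variables to Dublin Core
# terms; the real table would come from the Citation Variables appendix.
DC_EQUIVALENTS = {
    "title": ("dc", "title"),
    "publisher": ("dc", "publisher"),
}


def export_properties(csl_item):
    """Return XMP property elements (as strings) for one CSL item.

    Every variable is written under the citation: prefix. Variables with a
    Dublin Core equivalent are duplicated there; all others are duplicated
    under pdfx with the 'citation_' name prefix.
    """
    lines = []
    for name, value in csl_item.items():
        text = escape(str(value))  # escape &, < and > for XML
        lines.append(f"<citation:{name}>{text}</citation:{name}>")
        if name in DC_EQUIVALENTS:
            prefix, term = DC_EQUIVALENTS[name]
            lines.append(f"<{prefix}:{term}>{text}</{prefix}:{term}>")
        else:
            lines.append(f"<pdfx:citation_{name}>{text}</pdfx:citation_{name}>")
    return lines
```

The duplication means that a reader that understands only Dublin Core or only the pdfx custom fields still sees every value, at the cost of storing each value twice.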