[Digikam-devel] Encoding for IPTC comments

[Digikam-devel] Encoding for IPTC comments

Leonid Zeitlin
Hello, Digikam developers,
I am an active and happy user of Digikam. One feature I like and use is the
ability to save image comments to EXIF and IPTC tags. This way, image
comments entered in Digikam are also usable in other applications I use
under Windows, e.g. Google Picasa, IrfanView or ACDSee. Unfortunately, I
face a problem when comments are not in English. I am a Russian speaker and
naturally want to annotate my pictures in Russian. However, Digikam will
only save ASCII characters in the IPTC comment. This may be in line with the
IPTC standard (although I'm not sure: as far as I can see, the standard says
that a caption consists of "graphic characters", where "graphic characters"
are defined as "characters that have visual representation"), but in any case
Picasa and ACDSee happily read and write non-ASCII characters in the IPTC
caption. These Windows applications of course use the Windows character set,
CP1251 in my case, which Digikam won't read. The end result is no
interoperability between Digikam and the Windows world for Russian
image captions.

I understand and share the concern about following the IPTC standard, but
interoperability with popular image manipulation programs is equally
important to me, and I don't really see IPTC insisting on ASCII.

To solve this problem I have come up with small patches for Digikam and
kipi-plugins, which I would like to offer to the Digikam community for review
and comment. This is what they do:
1. Add an option, "IPTC Encoding", to the Metadata page of the Configure
Digikam dialog. By default it is ASCII, and Digikam's current behavior is
preserved.
2. If the option is set to a non-ASCII encoding, Digikam will read and write
IPTC tags in this encoding. The Metadataedit KIPI plugin will do the same.
3. The setting is stored in kdeglobals (rather than digikamrc), since it's
used not only by Digikam itself, but also by the Digikam kioslaves and any
application that loads the Metadataedit KIPI plugin.
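
As a rough illustration of points 1 and 2, here is a minimal Python sketch of
reading and writing a caption in a configurable encoding. It is not taken from
the patches: the function names, the DEFAULT_IPTC_ENCODING constant, and the
choice to substitute '?' for characters the encoding cannot represent are all
assumptions made for the example.

```python
# Hypothetical sketch: names and the '?' substitution are assumptions,
# not code from the Digikam or kipi-plugins patches.
DEFAULT_IPTC_ENCODING = "ascii"  # mirrors the "ASCII by default" option


def caption_to_bytes(text: str, encoding: str = DEFAULT_IPTC_ENCODING) -> bytes:
    """Encode a caption for storage in an IPTC tag.

    Characters the chosen encoding cannot represent become '?'.
    """
    return text.encode(encoding, errors="replace")


def caption_from_bytes(raw: bytes, encoding: str = DEFAULT_IPTC_ENCODING) -> str:
    """Decode the raw bytes of an IPTC tag using the configured encoding."""
    return raw.decode(encoding, errors="replace")


caption = "Привет"

# With the default setting, the Cyrillic text cannot be represented.
ascii_bytes = caption_to_bytes(caption)

# With the option set to CP1251, each letter is stored as one byte.
cp1251_bytes = caption_to_bytes(caption, "cp1251")
round_trip = caption_from_bytes(cp1251_bytes, "cp1251")
```

With the option left at ASCII the Russian text is lost on write, while with
CP1251 the bytes decode back to the original caption, provided the reader is
configured with the same encoding.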

The patches are made against Digikam 0.9.0 and kipi-plugins 0.1.3 sources.

I welcome any feedback about my patches and hope to see them in Digikam one
day.

Thanks,
  Leonid
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel

digikam-0.9.0-iptc-encoding-lz.patch (9K)
kipi-plugins-0.1.3-iptc-encoding-lz.patch (1K)

Re: [Digikam-devel] Encoding for IPTC comments

Gilles Caulier-2
Hi Leonid,

First of all, thanks for your help. All contributions are very much appreciated.

I know about the IPTC character encoding problem. See this bug report:

http://bugs.kde.org/show_bug.cgi?id=132244

... and the solution is described in that report.

About your patch: never build a source code patch against a stable release;
always use the current implementation from svn. See here:

http://www.digikam.org/?q=contrib

With the current implementation, I cannot use your patch directly, because I
have created a new shared library named libkexiv2, which is an Exiv2 interface
for digiKam and kipi-plugins. This removes all the duplicated code in the
Digikam::DMetadata and KipiPlugins::Exiv2Iface classes.

libkexiv2 is in the same place as kipi-plugins in svn:

http://websvn.kde.org/trunk/extragear/libs/libkexiv2

To apply the solution explained in bug report #132244, libkexiv2 needs to
be patched.

The other parts of your big patch sound good (widgets, settings, etc.). Of course,
I need to test them in depth (:=)))

Please revise your patch. Thanks in advance for your help.

Regards

Gilles Caulier


On Tuesday 6 February 2007 22:02, Leonid Zeitlin wrote:

> [...]

Re: [Digikam-devel] Encoding for IPTC comments

Leonid Zeitlin
Hi Gilles,
Thanks for your reply. I didn't realize the code was refactored since
0.9.0 release. I will get the latest from SVN and adapt my patch to
the new code.

Regarding the discussion in bug #132244: I tried setting
Iptc.Envelope.CharacterSet with the exiv2 command-line utility and then
saving Iptc.Application2.Caption in UTF-8, as described there. I
found that both Photoshop and IrfanView didn't decode the UTF-8 and showed
it as is (unreadable), while Picasa simply didn't recognize the
presence of a caption at all. I also saw that Picasa doesn't set the
Iptc.Envelope.CharacterSet tag. Therefore I think my approach is
orthogonal to what is discussed there and would still be a good
feature.
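
To show what "as is (unreadable)" means in practice, here is a small Python
sketch. It assumes that a viewer which ignores Iptc.Envelope.CharacterSet
interprets the caption bytes in the Windows code page (CP1251 on a Russian
system); that matches my understanding of the behavior, but it is an
assumption about those programs, not something taken from their code.

```python
# Assumption: the viewer ignores the envelope tag and treats the caption
# bytes as CP1251, the code page of a Russian Windows installation.
caption = "Привет"

# What gets written when the caption is saved as UTF-8: two bytes per letter.
utf8_bytes = caption.encode("utf-8")

# What such a viewer would display: every byte becomes one CP1251 character.
shown_by_viewer = utf8_bytes.decode("cp1251", errors="replace")

# The text is garbled but not destroyed; reversing the wrong step recovers it.
recovered = shown_by_viewer.encode("cp1251").decode("utf-8")
```

The six-letter caption turns into twelve unrelated characters on screen, which
is why a UTF-8 caption does not help when the goal is interoperability with
these viewers.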

I will get back to you once I update the patch to the latest code.

Thanks,
  Leonid

On 2/6/07, Caulier Gilles <[hidden email]> wrote:

> [...]

Re: [Digikam-devel] Encoding for IPTC comments

Gilles Caulier-2
On Wednesday 7 February 2007 12:37, you wrote:
> Hi Gilles,
> Thanks for your reply. I didn't realize the code was refactored since
> 0.9.0 release. I will get the latest from SVN and adapt my patch to
> the new code.
>
> Regarding the discussion in bug #132244. I tried setting
> Iptc.Envelope.CharacterSet with exiv2 command-line utility and then
> saving Iptc.Application2.Caption in UTF8 as described there.

Yes, this tag needs to be set according to the charset encoding used. I
recommend providing two charsets (I have not checked your patch yet):

- ASCII
- UTF-8

With the latter, all languages will be supported.

> I've
> found that both Photoshop and IrfanView didn't decode UTF and showed
> it as is (unreadable), while Picasa simply didn't recognize the
> presence of caption at all. I also saw that Picasa doesn't set
> Iptc.Envelope.CharacterSet tag.

Yes, I suspected this problem after reading some web sites on the subject.

> Therefore I think my approach is
> orthogonal to what is discussed there and still would be a good
> feature.

Yes, that looks fine to me, but the envelope tag needs to be set accordingly.

>
> I will get back to you once I update the patch to the latest code.

Fine with me. Please post it to Bugzilla entry #132244. That is better than
the mailing list (which limits attachment size)...

Gilles

Re: [Digikam-devel] Encoding for IPTC comments

Leonid Zeitlin
Ok, I will post to bugzilla.

One note about setting Iptc.Envelope.CharacterSet though. I am afraid
CP1251 is not even registered in the "International Registry for Coded
Character Sets", and probably some other charsets that could be used
by KDE aren't either. Therefore setting this tag is not always
possible.

Thanks,
  Leonid

On 2/7/07, Caulier Gilles <[hidden email]> wrote:

> [...]

Re: [Digikam-devel] Encoding for IPTC comments

Gilles Caulier-2
On Wednesday 7 February 2007 15:15, Leonid Zeitlin wrote:
> Ok, I will post to bugzilla.
>
> One note about setting Iptc.Envelope.CharacterSet though. I am afraid
> CP1251 is not even registered in the "International Registry for Coded
> Character Sets", and probably some other charsets that could be used
> by KDE aren't either. Therefore setting this tag is not always
> possible.

Use UTF-8 instead.

See B.K.O #132244. Andreas, the author of the Exiv2 library, has posted a link
there that explains the value of the IPTC envelope tag. From the page:

http://www.annocpan.org/~BETTELLI/Image-MetaData-JPEG-0.15/lib/Image/MetaData/JPEG/TagLists.pod

... in the "IPTC data (Editorial information and envelope record)" section, you
can read:

«
...
  4) This dataset selects a character set, for use in character oriented
     datasets in records 2-6, according to the "International Register of
     Coded Character Sets" (ISO/IEC 2022 and ISO/IEC 2375, see for instance
     L<http://www.itscj.ipsj.or.jp/ISO-IR/>), and typically consist of the
     escape control character followed by one or more graphic characters.
     For instance, "\033/A" refers to ISO-8859-1 (latin-1) and "\033%G" refers
     to UTF-8 (a Unicode encoding).
...
»
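
As a sketch of how a reader could honor this dataset, the Python snippet below
maps the two escape sequences quoted above to codec names and falls back to a
configured encoding when the tag is absent or unknown. The table contains only
the two sequences from the quoted text; the function name decode_caption and
the fallback parameter are assumptions made for the example, not libkexiv2 or
Exiv2 API.

```python
from typing import Optional

ESC = b"\x1b"  # the escape control character, written "\033" in the quote

# Only the two sequences mentioned in the quoted text; a real table
# would need more entries from the ISO register.
CHARSET_BY_ESCAPE = {
    ESC + b"/A": "iso-8859-1",
    ESC + b"%G": "utf-8",
}


def decode_caption(raw: bytes,
                   character_set: Optional[bytes],
                   fallback: str = "ascii") -> str:
    """Decode caption bytes using the envelope charset when it is known.

    `character_set` is the raw value of the envelope dataset, or None if
    the file has none. Unknown or missing values use `fallback`
    (hypothetical: this would be the user-configured encoding).
    """
    codec = fallback
    if character_set is not None:
        codec = CHARSET_BY_ESCAPE.get(character_set, fallback)
    return raw.decode(codec, errors="replace")


utf8_caption = decode_caption("Привет".encode("utf-8"), ESC + b"%G")
latin1_caption = decode_caption(b"caf\xe9", ESC + b"/A")
untagged_caption = decode_caption("Привет".encode("cp1251"), None,
                                  fallback="cp1251")
```

The last call covers files written by programs that do not set the envelope
tag at all: there the reader has nothing to go on except the configured
fallback.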

I propose providing one option in the digiKam setup and the MetadataEdit
kipi-plugin to set the charset encoding to ASCII or UTF-8.

Gilles

Re: [Digikam-devel] Encoding for IPTC comments

Leonid Zeitlin
On 2/8/07, Caulier Gilles <[hidden email]> wrote:

> Use UTF-8 instead.
[...]
>
> I propose providing one option in the digiKam setup and the MetadataEdit
> kipi-plugin to set the charset encoding to ASCII or UTF-8.
>
> Gilles
>

Hi Gilles,
Using UTF-8 would allow Digikam to save IPTC comments in international
languages, but it won't achieve interoperability with
Picasa/IrfanView/Photoshop etc., and that was my primary goal. Those Windows
programs, as far as I can see from my testing, write and display IPTC
comments as is, without any encoding/decoding. Therefore, for
interoperability with them, comments should be in whatever encoding
Windows is using, and that is not UTF-8 (for Russian it is CP1251).
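
The reverse direction can be sketched the same way. The Python snippet below
assumes a Windows program has written a Russian caption as raw CP1251 bytes
and checks which encodings a reader could decode it with; the helper name
can_decode is made up for the example.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` is valid in `encoding` (strict decoding)."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Assumption: this is what a Windows program on a Russian system writes.
written_by_windows_app = "Привет".encode("cp1251")

results = {
    encoding: can_decode(written_by_windows_app, encoding)
    for encoding in ("ascii", "utf-8", "cp1251")
}
```

Neither an ASCII-only nor a UTF-8-only reader can make sense of these bytes,
so offering only those two choices would not let Digikam read captions
written by the Windows programs.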

Thanks,

 Leonid

Re: [Digikam-devel] Encoding for IPTC comments

Gilles Caulier-2
On Thursday 8 February 2007 13:26, you wrote:

> On 2/8/07, Caulier Gilles <[hidden email]> wrote:
> > Use UTF-8 instead.
>
> [...]
>
> > I propose providing one option in the digiKam setup and the MetadataEdit
> > kipi-plugin to set the charset encoding to ASCII or UTF-8.
> >
> > Gilles
>
> Hi Gilles,
> Using UTF-8 would allow Digikam to save IPTC comments in international
> languages, but it won't achieve interoperability with
> Picasa/IrfanView/Photoshop etc., and that was my primary goal. Those
> Windows programs, as far as I can see from my testing, write and display
> IPTC comments as is, without any encoding/decoding. Therefore, for
> interoperability with them, comments should be in whatever encoding
> Windows is using, and that is not UTF-8 (for Russian it is CP1251).

Do you mean that Picasa/IrfanView/Photoshop support an IPTC charset like CP1251
but not UTF-8?

But you said before that CP1251 is not even registered in
the "International Registry for Coded Character Sets" (:=)))... It makes no
sense for commercial photo applications not to follow at least the
specifications defined in the standards...

Which version of Photoshop do you use?

Gilles

Re: [Digikam-devel] Encoding for IPTC comments

Leonid Zeitlin
On 2/8/07, Caulier Gilles <[hidden email]> wrote:

>
> Do you mean that Picasa/IrfanView/Photoshop support an IPTC charset like CP1251
> but not UTF-8?
>
> But you said before that CP1251 is not even registered in
> the "International Registry for Coded Character Sets" (:=)))... It makes no
> sense for commercial photo applications not to follow at least the
> specifications defined in the standards...
>
> Which version of Photoshop do you use?
>
> Gilles
>

Gilles, I am trying to say that these Windows applications do not
attempt to recode the IPTC values in any way. They take them as is,
i.e. as being in whatever charset Windows is using. For the Russian
language, Windows uses the code page 1251 charset. And yes, Microsoft
doesn't care to register its code pages with ISO. I guess that applies
to all Windows code pages, not only the Russian (Cyrillic) one.

Photoshop version 7.

Thanks,
  Leonid

Re: Encoding for IPTC comments

Leonid Zeitlin
Hi Gilles,
It looks like IptcWidget in Digikam SVN is not yet converted to use
libkexiv2. Is it going to be updated?

Thanks,
  Leonid

Re: Encoding for IPTC comments

Leonid Zeitlin
On 2/9/07, Caulier Gilles <[hidden email]> wrote:

> On Friday 9 February 2007 14:26, you wrote:
> > Hi Gilles,
> > It looks like IptcWidget in Digikam SVN is not yet converted to use
> > libkexiv2. Is it going to be updated?
>
> Yes, of course, but I need to add more methods to libkexiv2 for that,
> especially one to extract a list of tags from the metadata.
>
> It's not simple, and I'm currently busy with other parts. It is still on my
> TODO list, unless you want to do it, of course (:=)))
>
> Gilles


Hi Gilles,
I see. Well, at the moment I don't feel I'm up to this task :-). But I
am going to try to apply my encoding patch at the libkexiv2 level. If
by the time I do that I feel comfortable with libkexiv2 and exiv2, I
may even try approaching IptcWidget.

Thanks,
  Leonid




