Jpeg Comments and encodings

Matthias Keller

Hi all

I've been using digiKam for a long time, but one thing I keep stumbling over is interoperability between the various forms of JPEG comments.
I usually view my files in digiKam and Gwenview, as well as in Photoshop and FastStone Image Viewer on Windows.
So far I haven't found an acceptable way to tag my images so that the comment displays correctly in most of these programs.

I found this old thread explaining some charsets of the various fields:
http://mail.kde.org/pipermail/digikam-users/2006-October/002116.html
It says:
- JFIF is converted from latin1
- EXIF UserComment may provide a charset, else some 'autodetection' takes place
- IPTC is converted from latin1
- XMP wasn't supported then...

With some testing I found that digiKam reads the tags in the following order:
- Xmp.dc.description
- Xmp.exif.UserComment
- Xmp.tiff.ImageDescription
- JFIF Comment ("Jpeg comment")
- Exif.Photo.UserComment
- Iptc.Application2.Caption (envelope encoding not honored)
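
To make that lookup order concrete, here is a minimal Python sketch of a "first non-empty tag wins" rule. It only models what I observed from the outside and is not digiKam's actual code; the `READ_ORDER` list, the `first_comment` helper and the "JFIF Comment" dictionary key are names made up for this illustration.

```python
# Observed read order (from my tests, not from the digiKam sources).
# "JFIF Comment" is a hypothetical label; the JPEG comment has no tag key.
READ_ORDER = [
    "Xmp.dc.description",
    "Xmp.exif.UserComment",
    "Xmp.tiff.ImageDescription",
    "JFIF Comment",
    "Exif.Photo.UserComment",
    "Iptc.Application2.Caption",
]


def first_comment(tags):
    """Return (tag name, value) for the first non-empty tag in READ_ORDER."""
    for name in READ_ORDER:
        value = tags.get(name)
        if value:
            return name, value
    return None


# An image carrying both an Exif and an XMP comment: the XMP one is shown.
example = {
    "Exif.Photo.UserComment": "from Exif",
    "Xmp.dc.description": "from XMP",
}
print(first_comment(example))
```

This is why, in my test below, the GUI shows the Xmp.dc.description text even though the Exif comment is present too.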

All the Xmp.*.* tags seem to be read and written as UTF-8, which is correct as far as I know.
However, the JFIF comment is also written as UTF-8, which is at least questionable: as far as I know the standard doesn't define any charset for it at all (and this behaviour also seems to have changed since the 2006 discussion above, where it was converted from latin1).

a) Now when we come to EXIF, things get hairy:
I've prepared a JPEG file with exiv2 and inserted an Exif.Photo.UserComment using Unicode (read back with exiv2 -pv image.jpg; I've included the complete tag name in the comment text so I can recognize where it comes from later on):
0x9286 Photo UserComment Undefined 88 charset="Unicode" Commentwithäöü. (Exif.Photo.UserComment)

Now when viewing in digiKam, the Xmp.dc.description tag is used in the GUI since it's present as well. If I change the text and save again, the comment shows up as:
0x9286 Photo UserComment Undefined 23 charset="Ascii" Commentwith��.

Thus the text was converted to ISO-8859-1 but the charset is declared as Ascii. Isn't that wrong, since the content is definitely not ASCII but ISO-8859-1? Why doesn't digiKam use charset="Unicode"?
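
To show what I mean at the byte level, here is a small Python sketch of how I understand the Exif UserComment layout: an 8-byte character code followed by the text. The `build_user_comment` helper is made up for this mail, the little-endian byte order for the Unicode case is an assumption on my part, and the "mislabeled" value is only my reconstruction of what digiKam seems to write.

```python
# 8-byte character-code prefixes defined by Exif for UserComment.
CHARSET_CODES = {
    "ascii": b"ASCII\x00\x00\x00",
    "unicode": b"UNICODE\x00",
}


def build_user_comment(text, charset):
    """Hypothetical helper: prefix the encoded text with its Exif character code."""
    if charset == "unicode":
        payload = text.encode("utf-16-le")  # assumption: little-endian byte order
    else:
        payload = text.encode("ascii")  # raises UnicodeEncodeError for umlauts
    return CHARSET_CODES[charset] + payload


full = "Commentwithäöü. (Exif.Photo.UserComment)"
short = "Commentwithäöü."

# What I wrote with exiv2: Unicode prefix plus two bytes per character.
unicode_value = build_user_comment(full, "unicode")
print(len(unicode_value))  # 88

# My reconstruction of the value after saving in digiKam:
# ISO-8859-1 bytes behind an "ASCII" prefix.
mislabeled = CHARSET_CODES["ascii"] + short.encode("latin-1")
print(len(mislabeled))  # 23

# A strict ASCII encoder refuses the same text.
try:
    build_user_comment(short, "ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```

Both lengths match the 88 and 23 bytes that exiv2 reports above, which is why I believe the second value really is ISO-8859-1 text carrying an Ascii label.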

b) Iptc.Application2.Caption:
According to that mail from 2006, IPTC data is always encoded and decoded as latin1, though elsewhere I have read that one can (or should) set Iptc.Envelope.CharacterSet to declare the character set used. digiKam appears to ignore this...
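
What I would have expected is roughly the following, sketched in Python. As far as I understand it, Iptc.Envelope.CharacterSet holds an ISO 2022 escape sequence, and ESC % G announces UTF-8. The `decode_iptc_caption` function is hypothetical, and falling back to latin1 just mirrors the behaviour described in the 2006 mail.

```python
# ISO 2022 escape sequence that declares UTF-8 in Iptc.Envelope.CharacterSet.
UTF8_ESCAPE = b"\x1b%G"


def decode_iptc_caption(raw, envelope_charset=None):
    """Hypothetical decoder: honor the envelope charset, else assume latin1."""
    if envelope_charset == UTF8_ESCAPE:
        return raw.decode("utf-8")
    return raw.decode("latin-1")


caption = "äöü".encode("utf-8")

# With the envelope honored, the umlauts survive.
print(decode_iptc_caption(caption, UTF8_ESCAPE))  # äöü

# Ignoring the envelope and assuming latin1 yields mojibake.
print(decode_iptc_caption(caption))  # Ã¤Ã¶Ã¼
```

The second call is the kind of garbling I get when another program writes a UTF-8 caption with the envelope set and the reader still assumes latin1.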

c) Question about Xmp "lang"
One thing I still do not understand is the lang="..." attribute in XMP comments: what exactly does it mean? Is it just there to allow multiple entries in different languages? Does it affect the encoding at all, or is that really always UTF-8?

Thank you very much

Matt
