Hi all
I've been using digiKam for a long time, but one thing I stumble over again and again is interoperability between the various forms of JPEG comments. I usually view my files in digiKam and Gwenview, as well as in Photoshop and FastStone Image Viewer on Windows. So far I haven't found an acceptable way to tag my images so that the comments display correctly most of the time.

I found this old thread explaining the charsets of the various fields:
http://mail.kde.org/pipermail/digikam-users/2006-October/002116.html

It says:
- JFIF is converted from latin1
- EXIF UserComment may provide a charset, else some 'autodetection' takes place
- IPTC is converted from latin1
- XMP wasn't supported then...

With some testing I found that digiKam reads the tags in the following order:
- Xmp.dc.description
- Xmp.exif.UserComment
- Xmp.tiff.ImageDescription
- JFIF Comment ("Jpeg comment")
- Exif.Photo.UserComment
- Iptc.Application2.Caption (envelope encoding not honored)

All the Xmp.*.* tags seem to be read and written as UTF-8, which is correct as far as I know. However, the JFIF comment is also written as UTF-8, which is at least questionable, since as far as I know the standard doesn't define any charset at all (and the behaviour also seems to have changed since the above discussion in 2006).

a) Now when we come to EXIF, things get hairy:
I've prepared a JPEG file with exiv2 and inserted an Exif.Photo.UserComment using Unicode. (I've added the complete tag name to the comment so I can recognize later where it comes from.) Reading with exiv2 -pv image.jpg gives:

0x9286 Photo UserComment Undefined 88 charset="Unicode" Commentwithäöü. (Exif.Photo.UserComment)

Now when viewing in digiKam, the Xmp.dc.description tag is used in the GUI since it's present as well. If I change the text and save again, the comment shows up as:

0x9286 Photo UserComment Undefined 23 charset="Ascii" Commentwith���.
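A minimal Python sketch that reproduces the two byte counts above (88 and 23). It assumes the 8-byte character-code prefixes defined by the Exif specification (`UNICODE\0` and `ASCII\0\0\0`) followed by the encoded text. The helper name `user_comment` is made up for illustration, and the UTF-16 byte order chosen here is an assumption that does not affect the length.

```python
# Sketch under stated assumptions: Exif UserComment = 8-byte charset ID + text bytes.
# `user_comment` is a hypothetical helper, not an exiv2 or digiKam function.

ORIGINAL = "Commentwithäöü. (Exif.Photo.UserComment)"  # text written with exiv2
RESAVED = "Commentwithäöü."                            # text after saving in digiKam


def user_comment(text: str, charset_id: bytes, codec: str) -> bytes:
    """Build a raw UserComment payload: charset ID followed by the encoded text."""
    return charset_id + text.encode(codec)


# charset="Unicode": two bytes per character (all characters are in the BMP).
unicode_payload = user_comment(ORIGINAL, b"UNICODE\x00", "utf-16-le")

# charset="Ascii" as written on re-save: one byte per character, i.e. Latin-1.
latin1_payload = user_comment(RESAVED, b"ASCII\x00\x00\x00", "latin-1")

print(len(unicode_payload))  # 88
print(len(latin1_payload))   # 23

# The text part of the second payload is not valid ASCII ...
body = latin1_payload[8:]
try:
    body.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print(is_ascii)  # False

# ... and a reader that assumes UTF-8 gets one replacement character per umlaut.
as_utf8 = body.decode("utf-8", errors="replace")
print(as_utf8)
```

If this reading is right, the second payload carries Latin-1 bytes under an ASCII label, and anything that decodes it as UTF-8 shows three replacement characters, as in the output above.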
Thus the text was converted to ISO-8859-1 and the charset specified as Ascii. Isn't that wrong, since it's definitely not ASCII but ISO-8859-1? Why doesn't digiKam use charset="Unicode"?

b) Iptc.Application2.Caption:
According to that mail from 2006, IPTC data is always encoded/decoded as latin1, though elsewhere I've read that one can (or should) set Iptc.Envelope.CharacterSet to specify the character set used. This appears to be ignored by digiKam...

c) Question about XMP "lang":
One thing I still don't understand is the lang="..." attribute in XMP comments. What exactly is its meaning? Is it just there to allow multiple entries in different languages? Does it affect the encoding at all, or is it really always UTF-8?

Thank you very much
Matt
_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users