Hello,

The Digikam handbook says that captions are UTF-8 compatible (see the extract below); however, I found some pictures with "Ìåäâåäü ïîñë" symbols that do not make sense to me. I went through all the settings and did not notice any related setting. Please advise if I am missing something.

Extract from the handbook:

>> 2.2.6.2 Comment View
>> The caption view can be used to type or paste in a caption of unlimited size (see note below).
>> The text is UTF-8 compatible, meaning that all special characters are allowed.
>> ...
>> With digiKam you can enter unlimited amounts of text using internationalized alphabet (UTF-8) as caption.

Best regards,
Andrey Goreev
On Tuesday 17 January 2017 06:26:46 CET, Andrey Goreev wrote:
> Hello,
>
> Digikam handbook says that captions are UTF-8 compatible (see the text
> extract below) however I found some pictures with "Ìåäâåäü ïîñë"
> symbols that do not make sense to me.
> I went through all the settings and did not notice any related setting.
> Please advise if I am missing something.

That fragment looks like you had some UTF-8 text in the captions that is then displayed by a program that does not understand UTF-8 (but uses an 8-bit character set). A series like that could be a text fragment in a non-Latin script, where each group of odd characters stands for a letter.

Note that, while Digikam can handle UTF-8, only XMP tags can store it; IPTC and EXIF are limited to 8 bits/char (and thus would give the kind of strings you quoted).

I can't be more precise, as you don't specify where you saw that text (within Digikam's caption editor, its metadata viewer, using an external tag viewer/editor, or ...).

Remco
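Remco's guess that this is a non-Latin script shown through the wrong character set can be checked with a short Python sketch. The sample word and the two codepages (Latin-1 for the viewer, Windows-1251 for the original Cyrillic text) are assumptions for illustration, not something taken from digiKam's code. Under those assumptions, the sample from the first message matches the single-byte case exactly (one odd character per letter), while UTF-8 misread as 8-bit would give two characters per letter.

```python
# Sketch: two ways a Cyrillic caption can turn into "odd symbols".
# Assumption: the sample word and the codepages (Latin-1, Windows-1251) are
# illustrative guesses, not taken from digiKam's code.

def misread(text: str, stored_as: str, shown_as: str) -> str:
    """Store text with one encoding, then display the bytes with another."""
    return text.encode(stored_as).decode(shown_as)


def fix_single_byte_mojibake(garbled: str, shown_as: str = "latin-1",
                             real: str = "cp1251") -> str:
    """Undo a misread: get the raw bytes back, then decode them correctly."""
    return garbled.encode(shown_as).decode(real)


word = "Медведь"
# Case 1: UTF-8 bytes shown by an 8-bit program -> 2 characters per letter.
print(len(misread(word, "utf-8", "latin-1")))    # 14
# Case 2: Windows-1251 bytes shown as Latin-1 -> 1 character per letter.
print(misread(word, "cp1251", "latin-1"))        # Ìåäâåäü
# Reversing case 2 on the sample from the first message:
print(fix_single_byte_mojibake("Ìåäâåäü ïîñë"))  # Медведь посл
```

The recovered text ("Медведь посл", roughly "Bear, after...", cut off mid-word) suggests, though does not prove, that these captions started life as Windows-1251 text rather than UTF-8.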
Hello Remco,

Digikam shows the caption under the thumbnail as well as in the right panel:

Properties -> digiKam properties/Caption
Metadata -> EXIF/Image Description; IPTC/Caption (IPTC/Character Set shows UTF-8); XMP/Description, XMP/User comment; XMP/Image description
Captions -> Description/Captions

Here is an extract from the output of the ExifTool -a -G1 -s command:

[File] Comment : ├â┬Ø├â┬¼├â┬«
[IFD0] ImageDescription : ├Ø├¼├«
[ExifIFD] UserComment : ├â┬Ø├â┬¼├â┬«
[XMP-tiff] ImageDescription : ├â┬Ø├â┬¼├â┬«
[XMP-exif] UserComment : ├â┬Ø├â┬¼├â┬«
[XMP-acdsee] Notes : ├â┬Ø├â┬¼├â┬«
[XMP-dc] Description : ├â┬Ø├â┬¼├â┬«
[IPTC] Caption-Abstract : ├â┬Ø├â┬¼├â┬«

Best regards,
Andrey Goreev
On Tuesday 17 January 2017 08:48:30 CET, Andrey Goreev wrote:
> Hello Remco,
>
> Digikam shows the caption under the thumbnail as well as in the right panel:
> Properties -> digiKam properties/Caption
> Metadata -> EXIF/Image Description; IPTC/Caption (IPTC/Character Set shows
> UTF-8); XMP/Description, XMP/User comment; XMP/Image description;
> Captions -> Description/Captions

I know where to find the captions within Digikam. What wasn't clear to me is where _you_ saw that mutilated UTF-8.

> Here is an extract from the output of ExifTool -a -G1 -s command:
>
> [File] Comment : ├â┬Ø├â┬¼├â┬«
> [IFD0] ImageDescription : ├Ø├¼├«
> [ExifIFD] UserComment : ├â┬Ø├â┬¼├â┬«
> [XMP-tiff] ImageDescription : ├â┬Ø├â┬¼├â┬«
> [XMP-exif] UserComment : ├â┬Ø├â┬¼├â┬«
> [XMP-acdsee] Notes : ├â┬Ø├â┬¼├â┬«
> [XMP-dc] Description : ├â┬Ø├â┬¼├â┬«
> [IPTC] Caption-Abstract : ├â┬Ø├â┬¼├â┬«

Even stranger: this doesn't even look like the original string you posted, almost as if your terminal uses something like the IBM850 codepage.

So what seems to have happened: somewhere in your chain, a UTF-8 string was interpreted using an 8-bit character encoding. And it looks like your terminal does the same thing...

To give you an idea what I'm talking about (hoping the strings pass...):

UTF-8 string: æâ¢az#&ˇÉÉŠ
same coded as ISO-8859-15: Êââaz#&ËÃÃÅ
same coded as CP1254: æâ¢az#&ˇÉÉŠ
same coded as IBM850: ├ª├ó├é┬óaz#&╦ç├ë├ë┼á

(The last three are different codepages, i.e. different ways to assign character glyphs to 8-bit values; that was the standard before UTF-8 became more or less generally used.)

Note that the 4 ASCII characters in the middle (az#&) survive intact: those are coded on 7 bits, and UTF-8 uses the same encoding as ASCII for the first 128 characters (codes 0-127). After that, the codes differ (UTF-8 can use up to 4 bytes per character).

Note that all of these examples use the exact same bytes, just interpreted differently... (this would be even more striking with UTF-8 text in the Cyrillic or Greek alphabet, but I don't have such a keyboard handy)
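Remco's point that the bytes never change, only their interpretation, can be reproduced in Python with the Cyrillic case he mentions. The sample text below is hypothetical (a Cyrillic word plus the same ASCII run), since the strings in the message may not have survived the mail archive intact; `iso8859_15`, `cp1254` and `cp850` are Python's codec names for the codepages he lists.

```python
# Sketch: the same bytes read through several codepages.
# The sample text is hypothetical; the codec names are Python's.
text = "Медведь az#&"
raw = text.encode("utf-8")  # the bytes that would actually be stored

codecs_to_try = ("utf-8", "iso8859_15", "cp1254", "cp850")
for codec in codecs_to_try:
    # errors="replace": a byte with no character in this codepage becomes U+FFFD
    shown = raw.decode(codec, errors="replace")
    print(f"{codec:>10}: {shown}")
    # ASCII (codes 0-127) is encoded identically everywhere, so it survives
    assert shown.endswith(" az#&")

# UTF-8 needs 1 to 4 bytes per character
for ch in ("a", "é", "М", "€", "😀"):
    print(ch, len(ch.encode("utf-8")))
```

Only the `utf-8` line shows the Cyrillic word; the three 8-bit codepages each render its 14 bytes as a different run of Latin letters or box-drawing characters, while " az#&" comes through unchanged in all four.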
Remco,

Normally I would paste æâ¢az#&ˇÉÉŠ into a text file, view it (F3) in Total Commander or a similar program, change the codepage and get the normal text. It did not work this time.

Well, since there are not many images in my library with this issue and I don't really care about the captions, I guess I will just delete the symbols and move on.

Thank you for your help anyway!

Best regards,
Andrey Goreev
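One plausible reason the usual codepage switch failed: the ExifTool values in this thread look as if they were mis-decoded more than once, and a single switch only undoes one layer. The Python sketch below rests on assumptions not established in the thread: the console printed ExifTool's output as cp850 (Remco's IBM850 guess), each extra layer is UTF-8 misread as Latin-1, and the original caption was Windows-1251 Cyrillic. The helper names `peel_utf8_layers` and `recover` are hypothetical.

```python
# Sketch: why one codepage switch may not be enough.
# Assumptions (not established in the thread): the console printed ExifTool's
# output as cp850, each extra layer is a UTF-8 -> Latin-1 misreading, and the
# original caption was Windows-1251 Cyrillic.

def peel_utf8_layers(s: str) -> list:
    """Re-read Latin-1 characters as UTF-8 bytes until that stops working."""
    layers = [s]
    while True:
        try:
            decoded = s.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break            # no further UTF-8 layer
        if decoded == s:
            break            # pure ASCII: nothing left to undo
        s = decoded
        layers.append(s)
    return layers


def recover(console_text: str) -> str:
    stored = console_text.encode("cp850").decode("utf-8")  # undo the console
    layers = peel_utf8_layers(stored)
    print(f"{len(layers)} layer(s): {[repr(x) for x in layers]}")
    return layers[-1].encode("latin-1").decode("cp1251")   # single-byte guess


print(recover("├â┬Ø├â┬¼├â┬«"))  # XMP/IPTC value -> Эмо
print(recover("├Ø├¼├«"))      # IFD0 value     -> Эмо
```

Under these assumptions both the XMP/IPTC value and the shorter IFD0 value decode to the same Cyrillic fragment, with the XMP/IPTC copy carrying one extra UTF-8 layer. That would also explain why the two values look different in the ExifTool listing.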