[digiKam-users] ImageDescription field

meku

[digiKam-users] ImageDescription field

I discovered that my UTF-8 captions appear to be corrupted using Digikam-5.9.0, but only in the Exif.Image.ImageDescription field.

Using the exiv2 command line, it appears I can write a UTF-8 caption to this field.

I tried loading up Digikam-6.0.0 and it appears to ignore the field when writing, even though the default settings in Metadata>Advanced are set to write to the field.

Is this an issue with Digikam or is this a limitation of EXIF?

Gilles Caulier

Re: ImageDescription field

Hi,

Exif does not support any special character encoding, such as UTF-8.

Using XMP is the right way.

Note: IPTC is limited to Latin-1 (extended ASCII). Take care with that too.

Gilles Caulier
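
A minimal Python sketch of the limits described in this reply, taking the three rules exactly as stated here: Exif ASCII-type tags hold only 7-bit ASCII, IPTC is limited to Latin-1, and XMP (stored as Unicode text) can hold any caption. The helper name `fields_that_can_hold` is hypothetical and is not part of digiKam or exiv2.

```python
def fields_that_can_hold(caption: str) -> list[str]:
    """Return the metadata families that can store `caption` without loss,
    under the limits stated in this reply (hypothetical helper)."""
    fields = ["XMP"]  # XMP text is Unicode, so any caption fits

    # Exif ASCII-type tags such as ImageDescription: 7-bit ASCII only.
    if caption.isascii():
        fields.append("Exif")

    # IPTC: limited to Latin-1, per the note above.
    try:
        caption.encode("latin-1")
    except UnicodeEncodeError:
        pass
    else:
        fields.append("IPTC")

    return fields


for caption in ["Sunset", "Remco Viëtor", "ミスタードーナツ"]:
    print(f"{caption!r}: {fields_that_can_hold(caption)}")
```

Under these rules, a Japanese caption like the one in this thread only fits in XMP, while an accented Latin name still fits in IPTC but not in Exif.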

2018-04-19 9:43 GMT+02:00 meku <[hidden email]>:

> Is this an issue with Digikam or is this a limitation of EXIF?

meku

Re: ImageDescription field

Strangely, the exiv2 command line appears to work, e.g.:

exiv2 -M"set Exif.Image.ImageDescription 'ミスタードーナツ'" FILE.JPG

I filed a bug about the Exif.Image.ImageDescription field not being updated in DK6: https://bugs.kde.org/show_bug.cgi?id=393283
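
To see why an ASCII-only field is a problem for this caption, here is a small Python sketch of the bytes involved. It assumes the terminal passes the argument to exiv2 as UTF-8 and that exiv2 stores those bytes unchanged; both are assumptions, not something confirmed in this thread.

```python
caption = "ミスタードーナツ"
raw = caption.encode("utf-8")  # bytes a UTF-8 terminal would hand to exiv2

print(len(caption), "characters ->", len(raw), "bytes")

# Every byte is >= 0x80, so none of them is a valid 7-bit ASCII code.
print(all(b >= 0x80 for b in raw))

# Decoding the same bytes as UTF-8 restores the caption, which is why it
# "appears" to work when the reading program also assumes UTF-8.
print(raw.decode("utf-8") == caption)
```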

On 19 April 2018 at 17:48, Gilles Caulier <[hidden email]> wrote:

> Exif does not support any special character encoding, such as UTF-8.

Remco Viëtor

Re: ImageDescription field

On Thursday 19 April 2018 10:43:41 CEST meku wrote:

> Strangely, the exiv2 command line appears to work, e.g.:
> exiv2 -M"set Exif.Image.ImageDescription 'ミスタードーナツ'" FILE.JPG

I think the key word here is "appears". According to the standard, Exif
ASCII tags such as ImageDescription can only use (7-bit) ASCII characters, but
that does not mean that programs reading and writing the tags scrupulously
respect that.
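
What respecting the standard could look like on the writing side, as a hedged Python sketch: the Exif ASCII type stores 7-bit ASCII codes followed by a terminating NUL byte, so a strict writer can refuse anything else instead of passing bytes through. `to_exif_ascii` is a hypothetical helper, not an exiv2 or digiKam function.

```python
def to_exif_ascii(text: str) -> bytes:
    """Encode text for an Exif ASCII-type tag such as ImageDescription.

    Hypothetical strict writer: rejects anything outside 7-bit ASCII and
    appends the NUL terminator used by the Exif ASCII type.
    """
    if "\x00" in text:
        # An embedded NUL would end the string early for any C-style reader.
        raise ValueError("embedded NUL would truncate the value")
    try:
        data = text.encode("ascii")  # raises for any code point above 0x7F
    except UnicodeEncodeError as exc:
        raise ValueError(f"not 7-bit ASCII: {text!r}") from exc
    return data + b"\x00"


print(to_exif_ascii("Sunset"))  # b'Sunset\x00'

try:
    to_exif_ascii("ミスタードーナツ")
except ValueError as err:
    print("refused:", err)
```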

At least in standard C and C++, the easiest approach is to take the string the
user gives as a sequence of bytes and write that to the metadata, then read the
contents back from the metadata as a sequence of bytes, all without worrying
about the encoding (which is not all that straightforward in those languages).
Somewhere there must be a translation to the encoding the user wants, but
that's not the job of the library handling the metadata.
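
The byte-oriented approach described above can be sketched in Python with a dictionary standing in for the metadata block. The names `store`, `write_tag` and `read_tag` are hypothetical; the point is only that the "library" never encodes or decodes, and the caller chooses the encoding on both sides.

```python
# Hypothetical stand-in for a metadata library that treats tag values as
# opaque bytes: nothing is encoded or decoded inside it.
store: dict[str, bytes] = {}


def write_tag(key: str, value: bytes) -> None:
    store[key] = bytes(value)  # stored exactly as given


def read_tag(key: str) -> bytes:
    return store[key]  # returned exactly as stored


# The application, not the library, picks the encoding for writing...
caption = "ミスタードーナツ"
write_tag("Exif.Image.ImageDescription", caption.encode("utf-8"))

# ...and must pick the same one again for reading.
print(read_tag("Exif.Image.ImageDescription").decode("utf-8") == caption)
```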

As long as there aren't any unexpected \000 bytes in such a sequence, that may
appear to work correctly, *as long as the same encoding is used on writing and
on reading*. But if the encodings for reading and writing differ, you'll get
garbled output, and *no* sure way to get the correct encoding (though you can
find an encoding that's 'close enough').
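
A short Python sketch of both failure modes mentioned here: the same bytes decoded with two different encodings, and a C-style reader that stops at the first \000 byte. `read_c_string` is a hypothetical helper that mimics how C treats NUL-terminated strings.

```python
# Written by a program using UTF-8: b'Vi\xc3\xabtor'
raw = "Viëtor".encode("utf-8")

print(raw.decode("utf-8"))    # same encoding on both sides: Viëtor
print(raw.decode("latin-1"))  # read back as ISO-8859-1: ViÃ«tor


def read_c_string(data: bytes) -> bytes:
    """Mimic a C reader: everything after the first NUL byte is lost."""
    return data.split(b"\x00", 1)[0]


print(read_c_string(b"Caption\x00rest"))             # b'Caption'
# UTF-16 puts a zero byte after every ASCII letter, so only "S" survives.
print(read_c_string("Sunset".encode("utf-16-le")))  # b'S'
```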

And the character encoding can change between writing and reading without the
user realising it: a few years ago, my Linux distro switched to UTF-8 as the
default, but a lot of older files are in one of the ISO-8859 encodings. Result:
those appear garbled for any character outside the ASCII range. And it might
get even worse between operating systems.
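
The distro switch described here runs the other way: bytes written in ISO-8859-1 are later read as UTF-8. A hedged Python sketch, including a heuristic `decode_best_effort` (hypothetical) that tries strict UTF-8 first and falls back to ISO-8859-1. The fallback is only a guess at a "close enough" encoding; it will silently produce wrong characters for data that was really in another ISO-8859 variant.

```python
# Written by an older program in ISO-8859-1 (Latin-1): b'Vi\xebtor'
legacy = "Viëtor".encode("iso-8859-1")

# A program that now assumes UTF-8: 0xEB is invalid here, so with
# errors="replace" it becomes U+FFFD while the ASCII letters survive.
print(legacy.decode("utf-8", errors="replace"))


def decode_best_effort(data: bytes) -> str:
    """Heuristic only: strict UTF-8 first, then ISO-8859-1.

    ISO-8859-1 maps every byte to some character, so the fallback never
    fails, but it can be wrong if the data used another encoding.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")


print(decode_best_effort(legacy))                    # Viëtor
print(decode_best_effort("Viëtor".encode("utf-8")))  # Viëtor
```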

Personally, I think it might be a good thing if Digikam 6 refuses to write
non-ASCII data to Exif tags, provided the information can be written to the
corresponding XMP tags (which, AFAIK, is always possible).
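
This proposal can be expressed as a small write policy. A hedged Python sketch: `fields_to_write` is hypothetical, the key names follow exiv2's naming style, and mapping the caption to `Xmp.dc.description` is an illustrative assumption (in XMP that property is a language alternative, simplified here to a plain string).

```python
def fields_to_write(caption: str) -> dict[str, str]:
    """Hypothetical policy: XMP always, Exif only for plain ASCII."""
    # XMP is always written: it can hold any Unicode caption.
    fields = {"Xmp.dc.description": caption}
    # Exif only gets a copy when the caption is plain 7-bit ASCII.
    if caption.isascii():
        fields["Exif.Image.ImageDescription"] = caption
    return fields


print(fields_to_write("Sunset"))
print(fields_to_write("ミスタードーナツ"))
```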

Remco

Gilles Caulier

Re: ImageDescription field

2018-04-19 11:27 GMT+02:00 Remco Viëtor <[hidden email]>:

> As long as there aren't any unexpected \000 bytes in such a sequence, that may
> appear to work correctly, *as long as the same encoding is used on writing and
> on reading*. But if the encodings for reading and writing differ, you'll get
> garbled output, and *no* sure way to get the correct encoding (though you can
> find an encoding that's 'close enough').

The metadata encapsulation is already in the digiKam source code here:

https://cgit.kde.org/digikam.git/tree/core/libs/dmetadata/metaengine_p.cpp#n390

Gilles Caulier

meku

Re: ImageDescription field

In reply to this post by Remco Viëtor

The caption saved with exiv2 was readable on the command line AND in Digikam, but saving in Digikam would replace the caption with junk.

If Digikam 6 refuses to write non-ASCII data to Exif, maybe that is not such a bad thing, but it must be careful to DELETE the old Exif data in this case; otherwise you can end up with old captions or other data hanging around in Exif when the user expected them to be overwritten.
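
Skipping the Exif write is not the same as clearing it. A hedged Python sketch on a dictionary standing in for the file's existing metadata; `apply_caption` is hypothetical, and the key names are assumptions in exiv2's naming style.

```python
def apply_caption(metadata: dict[str, str], caption: str) -> None:
    """Hypothetical update step: write the caption to XMP, and either
    update or DELETE the Exif copy so no stale caption is left behind."""
    metadata["Xmp.dc.description"] = caption
    if caption.isascii():
        metadata["Exif.Image.ImageDescription"] = caption
    else:
        # Refusing to write is not enough: the old value would survive.
        metadata.pop("Exif.Image.ImageDescription", None)


meta = {"Exif.Image.ImageDescription": "Old caption"}
apply_caption(meta, "ミスタードーナツ")
print(meta)  # only the XMP caption remains
```

Using `pop` with a default makes the step safe whether or not the file already had an Exif caption.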
