[Bug 195508] New: UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

           Summary: UTF-8 characters in XMP should be synced to IPTC after
                    conversion to printable ASCII
           Product: digikam
           Version: 0.10.0
          Platform: Ubuntu Packages
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: general
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]


Version:           0.10.0 (using KDE 4.2.2)
OS:                Linux
Installed from:    Ubuntu Packages

The original IPTC standard allows only printable ASCII characters.

When UTF-8 characters are used in digiKam (e.g. in author, copyright, or
keywords), they are synced to IPTC incorrectly: most unknown characters are
replaced by a question mark, while some characters survive (I assume those
defined in the ISO-8859-1 / Latin-1 set).

I would expect non-ASCII text to be transliterated to its ASCII equivalent,
if possible.

See the screenshot here:
http://www.milan-knizek.net/files/tmp/digikam_01.png

It shows both the UTF-8 console and the digiKam output, as well as the iconv
command used for transliteration.
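
For readers without the screenshot, the transliteration idea can be approximated as follows. This is a minimal Python sketch, not what digiKam or iconv does: it assumes that dropping combining marks after Unicode NFKD decomposition is an acceptable stand-in for iconv-style transliteration, and the function name `to_printable_ascii` is made up for illustration.

```python
import unicodedata


def to_printable_ascii(text, placeholder="?"):
    """Approximate transliteration: decompose, drop accents, keep printable ASCII."""
    # NFKD splits "š" into "s" plus a combining caron (U+030C), and so on.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        # Printable ASCII is the range 0x20..0x7E; anything else gets the placeholder.
        out.append(ch if " " <= ch <= "~" else placeholder)
    return "".join(out)


print(to_printable_ascii("Kašpárek"))  # Kasparek
```

Letters without a Unicode decomposition (for example "ø") still end up as the placeholder in this sketch, so it is only a partial answer.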

(Ignore the repeated keyword "Kašpárek" in the IPTC data displayed by digiKam;
this seems to be another bug, reported earlier by someone else.)

--
Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel

[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Gilles Caulier-4
https://bugs.kde.org/show_bug.cgi?id=195508


Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|general                     |Metadata





[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #1 from Gilles Caulier <caulier gilles gmail com>  2009-06-07 10:21:57 ---
Milan,

This is the code:

http://lxr.kde.org/source/KDE/kdegraphics/libs/libkexiv2/libkexiv2/kexiv2iptc.cpp#357

The constraint comes from here:

QString::toAscii(): http://doc.trolltech.com/4.5/qstring.html#toAscii

It is part of the Qt4 API.

Gilles Caulier


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Bugzilla from mikmach@wp.pl
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #2 from Mikolaj Machowski <mikmach wp pl>  2009-06-07 12:55:35 ---
According to Metadata Working Group guidelines data should be written back to
IPTC in UTF-8.

http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
page 28:
 If the IPTC-IIM has not been written in UTF-8 before, a robust Changer SHOULD
convert all properties to UTF-8 and write the corresponding identifier for
UTF-8 to the 1:90 DataSet.
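
To make the quoted recommendation concrete, here is a minimal Python sketch of what writing the UTF-8 identifier to the 1:90 DataSet amounts to at the byte level. It assumes the standard IIM dataset layout (tag marker 0x1C, record number, dataset number, two-byte big-endian length) and the ISO 2022 escape sequence ESC % G for UTF-8. The helper names are hypothetical, and a real application would normally go through a metadata library rather than assemble bytes by hand.

```python
import struct

UTF8_ESCAPE = b"\x1b%G"  # ISO 2022 escape sequence announcing UTF-8


def iim_dataset(record, dataset, payload):
    """Serialize one IIM dataset with a standard (short) length field."""
    if len(payload) > 0x7FFF:
        raise ValueError("extended datasets are not handled in this sketch")
    return struct.pack(">BBBH", 0x1C, record, dataset, len(payload)) + payload


def iim_block_utf8(keywords):
    """Emit 1:90 (Coded Character Set) followed by 2:25 (Keywords) in UTF-8."""
    block = iim_dataset(1, 90, UTF8_ESCAPE)
    for kw in keywords:
        block += iim_dataset(2, 25, kw.encode("utf-8"))
    return block


print(iim_block_utf8(["Kašpárek"]).hex())
```

The first eight bytes of the output are the 1:90 dataset itself; the keyword follows as raw UTF-8.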


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Milan Knizek
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #3 from Milan Knizek <knizek volny cz>  2009-06-07 21:47:42 ---
Gilles,

thanks for the explanation.

Not being a programmer, I assume that it would be easier to change digiKam to
use UTF-8 for IPTC, as proposed by Mikolaj, than to change the Qt4 API.

In the meantime, I will stick with pure ASCII text in XMP, since I want to have
it synced with IPTC, at least for the foreseeable future.


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Michal Thoma
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508


Michal Thoma <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]




--- Comment #4 from Michal Thoma <michal thoma cz>  2009-06-13 23:03:55 ---
I am not a programmer, but in the linked Qt4 documentation I read:

--
If a codec has been set using QTextCodec::setCodecForCStrings(), it is used to
convert Unicode to 8-bit char; otherwise this function does the same as
toLatin1().
--

That seems to be what is happening: instead of converting to ASCII, it converts
characters to Latin-1 (and thus leaves some characters that are illegal in IPTC).
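
The effect described here can be reproduced outside Qt. The following Python sketch is only an analogy: it assumes that encoding to Latin-1 with a "?" replacement behaves like the `toLatin1()` fallback quoted from the Qt documentation, and it does not call Qt at all.

```python
keyword = "Kašpárek"

# "á" (U+00E1) exists in Latin-1 and survives as byte 0xE1;
# "š" (U+0161) does not exist in Latin-1 and is replaced by "?".
latin1_bytes = keyword.encode("latin-1", errors="replace")
print(latin1_bytes)  # b'Ka?p\xe1rek'

# A strict 7-bit check shows why an ASCII-only reader may reject the field.
is_ascii = all(b < 0x80 for b in latin1_bytes)
print(is_ascii)  # False
```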

My RAW converter, RawTherapee, crashes because of illegal characters present in
IPTC fields... I hope to see this resolved somehow.


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Bugzilla from dani@zubinet.org
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508


Daniel Zuberbühler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]





[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Bugzilla from jhaugex@online.no
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508


Jostein Hauge <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]




--- Comment #5 from Jostein Hauge <jhaugex online no>  2010-03-04 22:16:16 ---
I can add that this issue creates problems when exporting pictures from digiKam
to Gallery (http://gallery.menalto.com/). Tags containing non-English
characters become corrupted.

Is this bug causing the problem, or should Gallery accept UTF-8 encoded IPTC?


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508


Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]




--- Comment #6 from Gilles Caulier <caulier gilles gmail com>  2010-03-05 08:29:47 ---
Definitely, IPTC does not accept UTF-8. Use XMP instead, which supports it.

Gilles Caulier


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Milan Knizek
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #7 from Milan Knizek <knizek volny cz>  2010-03-05 21:50:59 ---
The trouble is that the UTF-8 strings are converted to Latin-1 and some
characters are corrupted. This does not seem to be a bug in Qt4; it is a
feature of the above-mentioned function.

Is it possible to use some other convert-to-7-bit-ASCII function that takes
care of transliteration, as iconv does?


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Bugzilla from jhaugex@online.no
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #8 from Jostein Hauge <jhaugex online no>  2010-03-06 01:21:45 ---
While waiting for a real solution, is there any easy way to make a script that
converts the strings to ASCII without losing the non-English characters?
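
As a stopgap along these lines, a small script can transliterate keywords before they are written. The Python sketch below is an illustration under stated assumptions, not a digiKam feature: letters such as "ø" or "æ" have no Unicode decomposition, so a hand-written fallback table (hypothetical and incomplete) is applied first. Note that transliteration does lose the original characters; it merely replaces them with readable ASCII instead of "?".

```python
import unicodedata

# Hypothetical, incomplete table for letters that NFKD does not decompose.
FALLBACK = str.maketrans({
    "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE",
    "ß": "ss", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D",
})


def ascii_keywords(keywords):
    """Return ASCII-only versions of the given keywords."""
    result = []
    for kw in keywords:
        kw = unicodedata.normalize("NFKD", kw.translate(FALLBACK))
        # Encoding with "ignore" drops combining marks and anything unmapped.
        result.append(kw.encode("ascii", "ignore").decode("ascii"))
    return result


print(ascii_keywords(["Blåbærsyltetøy", "Kašpárek"]))
# ['Blabaersyltetoy', 'Kasparek']
```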


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Bugzilla from timid3000@gmail.com
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508


Kévin FERRARE <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]




--- Comment #9 from Kévin FERRARE <timid3000 gmail com>  2011-01-25 07:02:15 ---
IPTC can support UTF-8 with the CodedCharacterSet tag


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508





--- Comment #10 from Gilles Caulier <caulier gilles gmail com>  2011-01-25 07:34:19 ---
No. IPTC does not officially support UTF-8 in its specification; XMP does. It's
not the same... This is why XMP was created by Adobe (it is not the only
problem, of course; there is also the string length limitation in IPTC).

Gilles Caulier


[Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Marcel Wiesweg
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

Marcel Wiesweg <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #11 from Marcel Wiesweg <[hidden email]> ---
Coming back to this report, there are some questions for Andreas.
Indeed, exiv2 seems to do some charset detection in its IPTC
implementation, with detectCharset returning "UTF-8" or "ASCII".
- Are the std::strings returned from IptcData in this encoding?
- What would a return value of 0 tell us?
- Writing: do the std::strings added to the IPTC data need to be in the same
encoding?
- Is there a way to set/convert the encoding, possibly with the Coded Character
Set 1:90 tag as mentioned in the MWG guidance, or is this left to the
application (read all strings, convert them, set
"Iptc.Envelope.CharacterSet" to the cryptic "\033%G" value, whatever that is)?
(I believe we don't want to do that, though, but rather write IPTC as 7-bit
ASCII everywhere.)
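
A detection heuristic of the kind discussed here can be sketched in a few lines. The Python below is not exiv2's implementation and makes no claim about what its detectCharset actually returns. It is one plausible reading, assuming that a declared 1:90 value wins and that `None` stands for "neither ASCII nor valid UTF-8". The function name and signature are hypothetical.

```python
from typing import Iterable, Optional

UTF8_ESCAPE = b"\x1b%G"  # the "\033%G" value mentioned above


def guess_iptc_charset(values: Iterable[bytes],
                       coded_character_set: Optional[bytes] = None) -> Optional[str]:
    """Return "UTF-8", "ASCII", or None if the raw values fit neither."""
    if coded_character_set == UTF8_ESCAPE:
        return "UTF-8"  # explicitly declared in 1:90
    values = list(values)
    if all(v.isascii() for v in values):
        return "ASCII"
    try:
        for v in values:
            v.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return None  # e.g. Latin-1 bytes: the caller has to guess or ask
    return "UTF-8"
```

Under this reading, a caller that gets `None` back cannot safely decode the strings and would have to fall back to a default such as Latin-1.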


[digikam] [Bug 195508] UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

--- Comment #12 from Gilles Caulier <[hidden email]> ---
Andreas,

Did you see the previous comment from Marcel?

Gilles Caulier


[digikam] [Bug 195508] Syncing IPTC with UTF-8 characters from XMP after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|UTF-8 characters in XMP     |Syncing IPTC with UTF-8
                   |should be synced to IPTC    |characters from XMP after
                   |after conversion to         |conversion to printable
                   |printable ASCII             |ASCII


[digikam] [Bug 195508] Syncing IPTC with UTF-8 characters from XMP after conversion to printable ASCII

Gilles Caulier-4
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #13 from Gilles Caulier <[hidden email]> ---
Alan,

We are still missing feedback from Andreas on this report. See the questions
from Marcel in comment #11.

thanks in advance

Gilles


[digikam] [Bug 195508] Syncing IPTC with UTF-8 characters from XMP after conversion to printable ASCII

Alan Pater
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

--- Comment #14 from Alan Pater <[hidden email]> ---
I can't answer for Andreas, but my understanding is that UTF-8 is allowed and
optional in IPTC-IIM. My own tests within exiv2 show that unicode characters
are preserved when syncing between XMP and IPTC. I probably missed some cases
though, as I was not explicitly looking for cases where it did not. I don't
think converting is needed. If unicode exists in XMP,  it can be preserved in
IPTC.

This is way over my head technically, but the IPTC spec (version 3, October
1995) says:

1:90 Coded Character Set
Optional, not repeatable, up to 32 octets, consisting of the
escape control character, and graphic characters.
One or more escape sequences for the announcement of the
code extension facilities used in the data which follows, for the
initial designation of the G0, G1, G2 and G3 graphic character
sets and the initial invocation of the graphic set (7 bits) or the
left-hand and the right-hand graphic set (8 bits) and for the initial
invocation of the C0 (7 bits) or of the C0 and the C1 control
character sets (8 bits) in use for data fields in records 2-6 and 8.
Follows the ISO 2022 standard. The recognised graphic
repertoire and control function repertoire are listed in Appendix
C.
The announcement of the code extension facilities, if
transmitted, must appear in this data set. Designation and
invocation of graphic and control function sets (shifting) may be
transmitted anywhere where the escape and the other
necessary control characters are permitted. However, it is
recommended to transmit in this data set an initial designation
and invocation, i.e. to define all designations and the shift status
currently in use by transmitting the appropriate escape
sequences and locking-shift functions.
If 1:90 is omitted, the default for records 2-6 and 8 is ISO 646
IRV (7 bits) or ISO 4873 DV (8 bits). Record 1 shall always use
ISO 646 IRV or ISO 4873 DV respectively.
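
Read from the consumer's side, the quoted text implies that a reader should check 1:90 before decoding records 2-6 and 8. A minimal Python sketch of that rule follows. It assumes the standard IIM dataset layout (marker 0x1C, record, dataset, two-byte big-endian length) and distinguishes only two cases: the UTF-8 escape sequence ESC % G and "anything else". Treating the latter as 7-bit ASCII with replacement is a simplification made here, not something the spec text prescribes, and the function names are hypothetical.

```python
import struct

UTF8_ESCAPE = b"\x1b%G"


def parse_iim(block):
    """Yield (record, dataset, payload) for each standard-length dataset."""
    pos = 0
    while pos + 5 <= len(block):
        marker, record, dataset, length = struct.unpack_from(">BBBH", block, pos)
        if marker != 0x1C or length & 0x8000:
            break  # not a dataset, or an extended dataset (unsupported here)
        pos += 5
        yield record, dataset, block[pos:pos + length]
        pos += length


def decode_keywords(block):
    """Decode 2:25 keywords using the encoding announced in 1:90, if any."""
    encoding = "ascii"  # default when 1:90 is omitted (ISO 646 IRV, 7 bits)
    keywords = []
    for record, dataset, payload in parse_iim(block):
        if (record, dataset) == (1, 90) and payload == UTF8_ESCAPE:
            encoding = "utf-8"
        elif (record, dataset) == (2, 25):
            keywords.append(payload.decode(encoding, errors="replace"))
    return keywords
```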


[digikam] [Bug 195508] HUB : Syncing IPTC with UTF-8 characters from XMP after conversion to printable ASCII

bugzilla_noreply
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Metadata                    |Metadata-Hub
            Summary|Syncing IPTC with UTF-8     |HUB : Syncing IPTC with
                   |characters from XMP after   |UTF-8 characters from XMP
                   |conversion to printable     |after conversion to
                   |ASCII                       |printable ASCII

--- Comment #15 from [hidden email] ---
This entry is eligible for the GSoC 2016 project:

https://community.kde.org/GSoC/2016/Ideas#Project:_digiKam_MetadataHub_improvements


[digikam] [Bug 195508] HUB : Syncing IPTC with UTF-8 characters from XMP after conversion to printable ASCII

bugzilla_noreply
In reply to this post by Milan Knizek
https://bugs.kde.org/show_bug.cgi?id=195508

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |wishlist
