https://bugs.kde.org/show_bug.cgi?id=195508
Summary: UTF-8 characters in XMP should be synced to IPTC after conversion to printable ASCII
Product: digikam
Version: 0.10.0
Platform: Ubuntu Packages
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: general
AssignedTo: [hidden email]
ReportedBy: [hidden email]

Version: 0.10.0 (using KDE 4.2.2)
OS: Linux
Installed from: Ubuntu Packages

The original IPTC standard allows only printable ASCII characters. When UTF-8 characters are used in Digikam (e.g. in author, copyright, keywords), they are synced to IPTC incorrectly: most unsupported characters are replaced by a question mark, while some characters survive (I assume those defined in the ISO-8859-1 / Latin-1 set).

I would expect non-ASCII text to be transliterated to its ASCII equivalent, if possible.

See the screenshot here: http://www.milan-knizek.net/files/tmp/digikam_01.png
It shows both the UTF-8 console and the Digikam output, and also the iconv command used for transliteration.

(Ignore the repeated keyword "Kašpárek" in IPTC displayed by Digikam; this seems to be another bug reported by someone else earlier.)

--
Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel
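Editor's note: the transliteration requested in this report can be sketched in a few lines. The sketch below is an illustration only, not digiKam or iconv code; the function name `to_printable_ascii` and the `?` placeholder are assumptions. It decomposes accented letters with Unicode NFKD, drops the combining accents, and replaces anything that is still not printable ASCII.

```python
import unicodedata


def to_printable_ascii(text, placeholder="?"):
    """Approximate transliteration to printable ASCII (hypothetical helper)."""
    out = []
    # NFKD splits an accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        # Printable ASCII runs from space (0x20) to tilde (0x7E).
        out.append(ch if " " <= ch <= "~" else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(to_printable_ascii("Kašpárek"))
```

Letters without a Unicode decomposition (for example "Ł") still end up as the placeholder, so a real implementation would likely need an additional mapping table, as iconv's transliteration has.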
Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|general                     |Metadata
--- Comment #1 from Gilles Caulier <caulier gilles gmail com> 2009-06-07 10:21:57 ---
Milan,

This is the code:

http://lxr.kde.org/source/KDE/kdegraphics/libs/libkexiv2/libkexiv2/kexiv2iptc.cpp#357

The constraint comes from QString::toAscii():

http://doc.trolltech.com/4.5/qstring.html#toAscii

It is part of the Qt4 API.

Gilles Caulier
--- Comment #2 from Mikolaj Machowski <mikmach wp pl> 2009-06-07 12:55:35 ---
According to the Metadata Working Group guidelines, data should be written back to IPTC in UTF-8.

http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf

Page 28:

If the IPTC-IIM has not been written in UTF-8 before, a robust Changer SHOULD convert all properties to UTF-8 and write the corresponding identifier for UTF-8 to the 1:90 DataSet.
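Editor's note: as a rough illustration of the MWG recommendation quoted above, the sketch below builds raw IIM DataSets by hand: a 1:90 DataSet carrying the ISO 2022 escape sequence for UTF-8 (ESC % G), followed by 2:25 (Keywords) DataSets encoded as UTF-8. The helper names are hypothetical, the output is not a complete IIM stream, and per-field length limits are not enforced; real code would go through a metadata library such as exiv2.

```python
import struct

IIM_TAG_MARKER = 0x1C      # every IIM DataSet starts with this octet
UTF8_ESCAPE = b"\x1b%G"    # ESC % G: ISO 2022 announcement of UTF-8


def iim_dataset(record, dataset, data):
    """Encode one standard DataSet: marker, record, dataset, 2-byte length, data."""
    if len(data) > 0x7FFF:
        raise ValueError("standard DataSets hold at most 32767 octets")
    header = struct.pack(">BBBH", IIM_TAG_MARKER, record, dataset, len(data))
    return header + data


def keywords_as_utf8_iim(keywords):
    """Return a 1:90 DataSet followed by one 2:25 DataSet per keyword."""
    out = iim_dataset(1, 90, UTF8_ESCAPE)
    for keyword in keywords:
        out += iim_dataset(2, 25, keyword.encode("utf-8"))
    return out


if __name__ == "__main__":
    print(keywords_as_utf8_iim(["Kašpárek"]).hex(" "))
```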
--- Comment #3 from Milan Knizek <knizek volny cz> 2009-06-07 21:47:42 ---
Gilles, thanks for the explanation.

Not being a programmer, I assume that it would be easier to change Digikam to use UTF-8 for IPTC, as proposed by Mikolaj, than to change the Qt4 API.

In the meantime, I will stick with pure ASCII text in XMP, since I want to have it synced with IPTC, at least for the foreseeable future.
Michal Thoma <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #4 from Michal Thoma <michal thoma cz> 2009-06-13 23:03:55 ---
I am not a programmer, but in the linked Qt4 documentation I read:

--
If a codec has been set using QTextCodec::setCodecForCStrings(), it is used to convert Unicode to 8-bit char; otherwise this function does the same as toLatin1().
--

This seems to be what is happening: instead of converting to ASCII, it converts characters to Latin-1 (and thus leaves some characters that are illegal in IPTC). My RAW converter RawTherapee crashes because of illegal characters present in IPTC fields...

I hope to see this resolved somehow.
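Editor's note: the symptom described here can be reproduced outside Qt. The sketch below uses Python's `latin-1` codec as a stand-in for the Latin-1 fallback quoted from the Qt4 documentation; it is not Qt code, and the `?` replacement simply mirrors what the reporter observed. It shows that "š" is lost while "á" survives as a byte above 0x7F, which is not 7-bit ASCII. The function names are hypothetical.

```python
def latin1_fallback(text):
    """Encode as Latin-1, replacing characters outside Latin-1 with '?'."""
    return text.encode("latin-1", errors="replace")


def non_ascii_bytes(data):
    """Return the byte values that fall outside 7-bit ASCII."""
    return [b for b in data if b > 0x7F]


if __name__ == "__main__":
    encoded = latin1_fallback("Kašpárek")
    print(encoded)                    # the 's with caron' is replaced by '?'
    print(non_ascii_bytes(encoded))   # the 'a with acute' remains as one 8-bit byte
```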
Daniel Zuberbühler <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
Jostein Hauge <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #5 from Jostein Hauge <jhaugex online no> 2010-03-04 22:16:16 ---
I can add that this issue creates problems when exporting pictures from Digikam to Gallery (http://gallery.menalto.com/). Tags containing non-English characters become corrupted.

Is this bug causing the problem, or should Gallery accept UTF-8 encoded IPTC?
Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #6 from Gilles Caulier <caulier gilles gmail com> 2010-03-05 08:29:47 ---
Definitely, IPTC does not accept UTF-8. Use XMP instead, which supports it.

Gilles Caulier
--- Comment #7 from Milan Knizek <knizek volny cz> 2010-03-05 21:50:59 ---
The trouble is that the UTF-8 strings are converted to Latin-1 and some characters are corrupted. This does not seem to be a bug in Qt4; it is a feature of the above-mentioned function.

Is it possible to use some other convert-to-7-bit-ASCII function that takes care of transliteration, like iconv?
--- Comment #8 from Jostein Hauge <jhaugex online no> 2010-03-06 01:21:45 ---
While waiting for a real solution, is there an easy way to write a script that converts the strings to ASCII without losing the non-English characters?
Kévin FERRARE <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #9 from Kévin FERRARE <timid3000 gmail com> 2011-01-25 07:02:15 ---
IPTC can support UTF-8 with the CodedCharacterSet tag.
--- Comment #10 from Gilles Caulier <caulier gilles gmail com> 2011-01-25 07:34:19 ---
No. IPTC does not officially support UTF-8 in its specification. XMP does. It's not the same...

This is why XMP was created by Adobe (it is not the only problem, of course; there are also the string length limits in IPTC).

Gilles Caulier
Marcel Wiesweg <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #11 from Marcel Wiesweg <[hidden email]> ---
Coming back to this report, there are some questions for Andreas:

Indeed, exiv2 seems to be doing some charset detection in the IPTC implementation, with detectCharset returning "UTF-8" or "ASCII".

- Are the std::strings returned from the IptcData in this encoding?
- What would a return value of 0 tell us?
- Writing: are the std::strings added to IPTC data expected to be in the same encoding?
- Is there a way to set/convert the encoding, possibly with the Coded Character Set 1:90 tag as mentioned in the MWG guidance, or is this left to the application (read all strings, convert them, set "Iptc.Envelope.CharacterSet" to the cryptic "\033%G" value, whatever that is)? (I believe we don't want to do that, though, but rather write IPTC as 7-bit ASCII everywhere.)
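Editor's note: to make the question about charset detection more concrete, here is one way such a check could work. It is a guess for illustration and makes no claim about how exiv2 implements detectCharset. The hypothetical function `detect_charset` returns "ASCII" when every field is 7-bit, "UTF-8" when the non-ASCII fields are all valid UTF-8, and None when the encoding cannot be determined (for example, Latin-1 bytes).

```python
def detect_charset(values):
    """Classify raw IPTC field values as "ASCII", "UTF-8", or None (unknown)."""
    saw_non_ascii = False
    for raw in values:
        if all(byte < 0x80 for byte in raw):
            continue  # pure 7-bit data is valid in either charset
        saw_non_ascii = True
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return None  # 8-bit data that is not UTF-8, e.g. Latin-1
    return "UTF-8" if saw_non_ascii else "ASCII"


if __name__ == "__main__":
    print(detect_charset([b"Kasparek"]))
    print(detect_charset(["Kašpárek".encode("utf-8")]))
    print(detect_charset([b"Ka?p\xe1rek"]))
```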
--- Comment #12 from Gilles Caulier <[hidden email]> ---
Andreas,

Did you see the previous comment from Marcel?

Gilles Caulier
Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|UTF-8 characters in XMP     |Syncing IPTC with UTF-8
                   |should be synced to IPTC    |characters from XMP after
                   |after conversion to         |conversion to printable
                   |printable ASCII             |ASCII
Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #13 from Gilles Caulier <[hidden email]> ---
Alan,

We are still missing feedback from Andreas on this report. See Marcel's questions in comment #11.

Thanks in advance

Gilles
--- Comment #14 from Alan Pater <[hidden email]> ---
I can't answer for Andreas, but my understanding is that UTF-8 is allowed and optional in IPTC-IIM. My own tests within exiv2 show that unicode characters are preserved when syncing between XMP and IPTC. I probably missed some cases though, as I was not explicitly looking for cases where it did not.

I don't think converting is needed. If unicode exists in XMP, it can be preserved in IPTC.

This is way over my head technically, but the IPTC spec (version 3, October 1995) says:

1:90 Coded Character Set

Optional, not repeatable, up to 32 octets, consisting of the escape control character, and graphic characters.

One or more escape sequences for the announcement of the code extension facilities used in the data which follows, for the initial designation of the G0, G1, G2 and G3 graphic character sets and the initial invocation of the graphic set (7 bits) or the left-hand and the right-hand graphic set (8 bits) and for the initial invocation of the C0 (7 bits) or of the C0 and the C1 control character sets (8 bits) in use for data fields in records 2-6 and 8. Follows the ISO 2022 standard.

The recognised graphic repertoire and control function repertoire are listed in Appendix C.

The announcement of the code extension facilities, if transmitted, must appear in this data set. Designation and invocation of graphic and control function sets (shifting) may be transmitted anywhere where the escape and the other necessary control characters are permitted. However, it is recommended to transmit in this data set an initial designation and invocation, i.e. to define all designations and the shift status currently in use by transmitting the appropriate escape sequences and locking-shift functions.

If 1:90 is omitted, the default for records 2-6 and 8 is ISO 646 IRV (7 bits) or ISO 4873 DV (8 bits). Record 1 shall always use ISO 646 IRV or ISO 4873 DV respectively.
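Editor's note: read from the consumer side, the 1:90 rules quoted in comment #14 suggest a simple decision: use UTF-8 when 1:90 announces it with ESC % G, and fall back to the 7-bit default when 1:90 is absent. The sketch below is a simplified illustration with hypothetical function names. It treats ISO 646 IRV as plain ASCII, ignores the 8-bit ISO 4873 default, and rejects every other ISO 2022 escape sequence.

```python
UTF8_ESCAPE = b"\x1b%G"  # ESC % G: ISO 2022 announcement of UTF-8


def codec_for_application_records(coded_character_set):
    """Pick a Python codec from the raw 1:90 value (None if 1:90 is absent)."""
    if coded_character_set is None:
        return "ascii"  # simplification of the ISO 646 IRV default
    if coded_character_set == UTF8_ESCAPE:
        return "utf-8"
    raise ValueError("unsupported 1:90 escape sequence: %r" % coded_character_set)


def decode_field(raw, coded_character_set=None):
    """Decode one field; undecodable bytes become U+FFFD instead of raising."""
    codec = codec_for_application_records(coded_character_set)
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    print(decode_field("Kašpárek".encode("utf-8"), UTF8_ESCAPE))
    print(decode_field(b"Kasparek"))
```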
[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Metadata                    |Metadata-Hub
            Summary|Syncing IPTC with UTF-8     |HUB : Syncing IPTC with
                   |characters from XMP after   |UTF-8 characters from XMP
                   |conversion to printable     |after conversion to
                   |ASCII                       |printable ASCII

--- Comment #15 from [hidden email] ---
This entry is eligible for the GSoC 2016 project:

https://community.kde.org/GSoC/2016/Ideas#Project:_digiKam_MetadataHub_improvements
[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |wishlist