Database migration error due to 4byte UTF in filename

Brian J Hoskins
digiKam experts,

I have encountered an error in database migration (SQLite to MySQL) which I would like to share.

During the database migration, when the "Copy Images..." step is progressing, I receive the following complaint from the migration tool:

-----
Error while converting the database.
Details: Incorrect string value: '\xF0\x9F\x92\xA6\xF0\x9F...' for column 'name' at row 1
-----

The message points to the 4-byte UTF-8 sequence F0 9F 92 A6. I looked that up, and it is an emoji character, U+1F4A6 (:sweat_drops:). See below:
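For anyone who wants to repeat that lookup, the bytes quoted in the error can be decoded directly. This is a minimal Python sketch; it only decodes the first four bytes shown in the message (the rest is truncated with "..." and is left alone here).

```python
# The first four bytes quoted in the MySQL error message.
raw = b"\xF0\x9F\x92\xA6"

# Decode them as UTF-8 to recover the original character.
ch = raw.decode("utf-8")

print(f"U+{ord(ch):04X}")  # prints U+1F4A6
print(len(raw))            # prints 4: one character, four bytes
```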

http://www.fileformat.info/info/unicode/char/1f4a6/index.htm

This is what happens when you allow children to name files.

I wonder whether this error occurs because the MySQL database is limited to three-byte UTF-8 characters. If so, my suggestions are:

* Modify the database schema to allow 4-byte UTF-8 characters
OR...
* Create a more user-friendly error message for this event (most users would probably not be able to decipher the current one).
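The second suggestion could be approached by checking file names before they are written. Below is a rough Python sketch, assuming (as suspected above) that the target MySQL column uses a character set limited to three bytes per character, such as MySQL's `utf8` (also called `utf8mb3`) rather than `utf8mb4`. The helpers `find_four_byte_chars` and `friendly_error` are hypothetical illustrations, not part of digiKam.

```python
from typing import List, Optional


def find_four_byte_chars(name: str) -> List[str]:
    # Code points above U+FFFF (outside the Basic Multilingual Plane)
    # are exactly the ones that need 4 bytes in UTF-8.
    return [c for c in name if ord(c) > 0xFFFF]


def friendly_error(name: str) -> Optional[str]:
    # Hypothetical message a migration tool could show instead of the raw
    # "Incorrect string value" error; returns None when the name is fine.
    bad = find_four_byte_chars(name)
    if not bad:
        return None
    codes = ", ".join(f"U+{ord(c):04X}" for c in bad)
    return (f"The file name {name!r} contains characters ({codes}) "
            "that the target database cannot store. Rename the file and retry.")


print(friendly_error("holiday\U0001F4A6.jpg"))
print(friendly_error("holiday.jpg"))  # None
```

Running such a check over all album paths before the "Copy Images..." step would let the tool list every offending file at once, rather than stopping at the first failing row.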

I would imagine this is a rare error among digiKam users. It's possible I'm the only one who has encountered it, so it is low priority for sure.

Thanks,

Brian.

Re: Database migration error due to 4byte UTF in filename

Gilles Caulier
UTF-8 character encoding is supported by the SQL schema for both MySQL and SQLite.

This means that each character can be encoded in 1 to 4 bytes.
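To illustrate the variable-length encoding described here, this small Python sketch prints how many UTF-8 bytes a few sample characters need; the sample characters are arbitrary choices, with the last one being the emoji from the original report.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F4A6"]  # A, é, €, and the emoji

# Number of bytes each character occupies when encoded as UTF-8.
lengths = {s: len(s.encode("utf-8")) for s in samples}

for s, n in lengths.items():
    print(f"U+{ord(s):04X}: {n} byte(s)")
# U+0041: 1 byte(s)
# U+00E9: 2 byte(s)
# U+20AC: 3 byte(s)
# U+1F4A6: 4 byte(s)
```

Whether a particular database column can store the 4-byte case depends on the character set that column was created with.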

So I think your problem is not with the database support itself.

First point: the port from Qt4 to Qt5 fixed the MySQL encoding, changing it from ASCII Latin-1 to UTF-8. This was done in the 5.0.0 release. So the first check is to look at your table properties and verify that UTF-8 is used properly.

Second point: support for extra characters such as emoji was completed in the Unicode 9.0 standard (June 2016), and I'm not sure whether older SQL versions were compatible with it.

In fact, this depends on your emoji's character value, which must exist in the specific Unicode version supported.

To summarize:

1/ I agree that emoji should be deprecated in file paths.
2/ The error reported by digiKam must be improved. Please open a bug report in Bugzilla about this topic, in the database section.
3/ The digiKam handbook should gain a section explaining the database restrictions on character encoding (and certainly other restrictions, of course).

Gilles Caulier

(In reply to Brian J Hoskins, 2017-01-08 11:52 GMT+01:00)