https://bugs.kde.org/show_bug.cgi?id=262452
Summary: duplicate uniqueHash (image hash) in database, wrong thumb on images
Product: digikam
Version: 1.7.0
Platform: Ubuntu Packages
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: Database
AssignedTo: [hidden email]
ReportedBy: [hidden email]

Version: 1.7.0 (using KDE 4.4.5)
OS: Linux

One raw file was processed multiple times with UFRaw and saved as TIFFs under different names. The renditions are visually very different, and running md5sum at the command line gives a different checksum for each of them. In the digiKam database, however, most of the renditions show the wrong thumbnail.

To narrow this down, I created a test database with only 8 images: 2 raw files, one TIFF from the first raw file, several TIFFs (visually very different renditions) from the second raw file, and one JPEG from the second raw file (probably not produced by UFRaw). In digikam4.db there are 8 entries in the Images table, 5 of which share the same uniqueHash, and thumbnails-digikam.db contains only 4 thumbnails. Right-clicking a thumbnail and selecting "Edit" opens the correct image file, as does opening the preview.

I then used UFRaw to produce 3 TIFFs and 2 JPEGs from the other raw file. The JPEGs got different uniqueHashes, but the TIFFs all share the same one, giving 13 images in the database and only 7 distinct uniqueHashes.

Reproducible: Always

Steps to Reproduce:
1. Put a raw file into a directory.
2. Open the raw file with UFRaw and produce a TIFF.
3. Repeat this a couple more times, making the images look wildly different so there is no question that they are not the same image. Save each one under a different name.
4. Open digiKam and rescan the directory (or import a new collection if it is under a different root).

Actual Results:
Use an SQLite database browser to inspect the digiKam data and thumbnail databases. There is an entry in the Images table for each TIFF, but they all share the same uniqueHash. Initially the images may or may not have different thumbnails, but after some use the thumbnails collapse, so that all images with the same uniqueHash end up with the same thumbnail.

Expected Results:
I'd expect each TIFF rendition of the original raw file, saved under a different name, to have a truly unique uniqueHash and its own correct thumbnail.

JPEGs from UFRaw don't seem to have this problem. I haven't checked other TIFF-producing software yet (but I will). Inspecting a couple of the UFRaw-produced TIFFs with exiftool, it looks like UFRaw 0.16 copies all the metadata from the raw file into the TIFF, so at a quick glance the metadata of the two images is identical. If uniqueHash depends on metadata, that could be the source of the problem.

Since MD5 is itself subject to hash collisions, it seems to me that in a large image database, computing MD5 hashes over only part of the image is not a good idea, even apart from the current issue. As already stated, the actual MD5 hashes of the images, as calculated by md5sum at the command line, are all different. (A move to SHA-1 over the whole image would probably be overkill, and I probably don't know enough about hashes to even make these statements.)
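The collision described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, a hash computed over only the first 100 KB of a file; the function names and the 100 KB limit are illustrative and are not digiKam's actual code. Two files that share a large identical leading block (for example copied metadata or a TIFF header) get the same partial hash, even though their whole-file MD5 sums, as md5sum would report them, differ.

```python
import hashlib


def prefix_md5(data: bytes, limit: int = 100 * 1024) -> str:
    """Hypothetical partial hash: MD5 over only the first `limit` bytes."""
    return hashlib.md5(data[:limit]).hexdigest()


def full_md5(data: bytes) -> str:
    """Whole-file MD5, the same digest md5sum prints for the file."""
    return hashlib.md5(data).hexdigest()


# Two synthetic "renditions": an identical 200 KB leading block (standing in
# for copied metadata / header data) followed by different pixel data.
header = bytes(range(256)) * 800  # 204,800 bytes, identical in both
render_a = header + b"\x10" * 4096
render_b = header + b"\xf0" * 4096

print(prefix_md5(render_a) == prefix_md5(render_b))  # True: partial hashes collide
print(full_md5(render_a) == full_md5(render_b))      # False: whole files differ
```

Whatever exact region the old hash covered, any scheme that only looks at a region the renditions have in common will map them to the same value.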
Marcel Wiesweg <[hidden email]> changed:

What       | Removed     | Added
Status     | UNCONFIRMED | RESOLVED
Resolution |             | DUPLICATE

--- Comment #1 from Marcel Wiesweg <marcel wiesweg gmx de> 2011-01-07 23:39:33 ---
Thanks a lot for your research. This is indeed a known problem, and it is solved for future versions.

1) This usually happens with TIFF images without metadata. The header of such files contains several kilobytes of (pretty useless) line offsets. I have not seen a JPEG that is affected.

2) Computing the hash over the whole file is a major performance problem: scanning would take much longer. The old hash covered 99.9% of cases; we'll see what the new algorithm brings.

3) Some other problems in the context of renaming are probably unrelated.

*** This bug has been marked as a duplicate of bug 210353 ***
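The performance trade-off in point 2 can be sketched as follows. This is a hypothetical bounded-cost scheme, not digiKam's old or new algorithm: the name `head_tail_hash`, the 100 KB window, and the use of MD5 are all assumptions. It mixes the file size with the first and last windows of the file, so the number of bytes read stays constant however large the file is. The demo also shows the remaining blind spot: a change confined to the middle of the file.

```python
import hashlib
import os
import tempfile

CHUNK = 100 * 1024  # hypothetical window size, not digiKam's actual value


def head_tail_hash(path: str, chunk: int = CHUNK) -> str:
    """Hypothetical bounded-cost hash over file size + first/last `chunk` bytes.

    At most 2 * chunk bytes are read, however large the file is.
    """
    size = os.path.getsize(path)
    h = hashlib.md5()
    h.update(str(size).encode("ascii"))  # mix in the file size
    with open(path, "rb") as f:
        h.update(f.read(chunk))  # head window
        if size > chunk:
            # Tail window; start no earlier than `chunk` so no byte is read twice.
            f.seek(max(chunk, size - chunk))
            h.update(f.read())
    return h.hexdigest()


# Demo with synthetic files: identical 200 KB leading block, different endings.
header = bytes(range(256)) * 800  # 204,800 bytes
contents = {
    "a.tif": header + b"\x10" * 4096,
    "b.tif": header + b"\xf0" * 4096,
    # Same as a.tif except one byte that lies between the head and tail windows.
    "c.tif": header[:104000] + b"X" + header[104001:] + b"\x10" * 4096,
}
hashes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, data in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        hashes[name] = head_tail_hash(path)

print(hashes["a.tif"] != hashes["b.tif"])  # True: the tails differ
print(hashes["a.tif"] == hashes["c.tif"])  # True: a middle-only change is missed
```

Reading a fixed number of bytes per file keeps collection scans fast; the price is that files differing only outside both windows still collide, which is the kind of rare case a partial hash accepts.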
--- Comment #2 from Elle Stone <l elle stone gmail com> 2011-01-08 00:03:42 ---
Hi Marcel,

Regarding "This is a problem, known and solved (for the future). 1) This happens usually with TIFF images without metadata.": in fact the affected images, TIFFs output by UFRaw 0.16 and 0.17, have a LOT of metadata, namely all the metadata that was in the raw file (.cr2). If one uses exiftool to add, e.g., copyright information, keywords, contact information, location, etc. to one's raw files (which I do, in fact), there can be a whole lot of metadata in a raw file.

Suspecting that a wealth of metadata could be the problem, I used exiftool to strip all the metadata from the UFRaw-produced TIFFs. When I added the stripped TIFFs to the digiKam database, they all had unique hashes and proper thumbnails.

Is the version of digiKam with this bug fixed available somewhere?

Elle Stone
Gilles Caulier <[hidden email]> changed:

What | Removed | Added
CC   |         | [hidden email]

--- Comment #3 from Gilles Caulier <caulier gilles gmail com> 2011-01-08 10:16:59 ---
Elle,

Because Marcel is currently working on the Google Summer of Code 2010 branch, I think it's fixed in 2.0.0.

Gilles
--- Comment #4 from Elle Stone <l elle stone gmail com> 2011-01-08 13:21:35 ---
Gilles, thanks. Can 2.0.0 be run alongside the current digiKam rather than in place of it?

Elle
--- Comment #5 from Marcel Wiesweg <marcel wiesweg gmx de> 2011-01-08 15:42:53 ---
digiKam 1.x does not know the new hash, so it will not open the database once you have converted it to the new hash with 2.0. For this reason the conversion must be done explicitly: there is an Update button at the bottom of the Database panel in the Settings dialog. Without this conversion, both versions can operate on the same database, but your problem is not fixed.
[hidden email] changed:

What      | Removed  | Added
Component | Database | Database-Thumbs