[Bug 262452] New: duplicate uniqueHash (image hash) in database, wrong thumb on images

[Bug 262452] New: duplicate uniqueHash (image hash) in database, wrong thumb on images

Elle Stone
https://bugs.kde.org/show_bug.cgi?id=262452

           Summary: duplicate uniqueHash (image hash) in database, wrong
                    thumb on images
           Product: digikam
           Version: 1.7.0
          Platform: Ubuntu Packages
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: Database
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]


Version:           1.7.0 (using KDE 4.4.5)
OS:                Linux

One raw file was processed multiple times by ufraw and output as tifs with
different names. These images are visually very different renditions. They also
all have different md5sums when running md5sum at the command line.

In the digikam database, most of the renditions have the wrong thumb. So I
created a test database with only 8 images, 2 raw files, one tiff from one of
the raw files, several tiffs (visually very different renditions from each
other) from the other raw file, and one jpeg from the raw file (probably not
produced by ufraw). In digikam4.db there are 8 entries in the Images table, 5
of which have the same uniqueHash. In thumbnails-digikam.db there are only 4
thumbs.

Right-clicking on the thumbs and selecting "edit" does open the correct image
file, as does opening the preview.

So I used ufraw to produce 3 tifs and 2 jpegs from the other raw file. The
jpegs got different uniqueHashes, the tifs all share the same uniqueHash,
giving me 13 images in the database, and only 7 uniqueHashes.

Reproducible: Always

Steps to Reproduce:
Put a raw file into a directory. Open the raw file with ufraw and produce a tif.
Do this a couple more times, making the images look wildly different so there is
no question that the images are not the same. Save each time under a different
name. Then open digikam and rescan the directory (or import a new collection if
it is under a different root).

Actual Results:  
Use SQLite database browser to inspect the digikam data and thumbs databases.
You'll see an entry in the images table for each tif, but they'll all share the
same uniqueHash. Initially the images may or may not have different thumbs, but
play around, the thumbs will collapse, so that all the images with the same
uniqueHash now have the same thumb.
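
The inspection described above can also be scripted instead of done by hand in a
database browser. The following is a minimal sketch, assuming the Images table
has a "name" column next to the uniqueHash column mentioned in this report (the
"name" column is an assumption, and find_duplicate_hashes is a made-up helper,
not part of digikam):

```python
import sqlite3


def find_duplicate_hashes(conn):
    """Return (uniqueHash, count, names) for every hash shared by 2+ images.

    Assumes an Images table with 'name' and 'uniqueHash' columns; the
    'name' column is an assumption about the schema, not taken from the report.
    """
    return conn.execute(
        """
        SELECT uniqueHash, COUNT(*), GROUP_CONCAT(name, ', ')
        FROM Images
        WHERE uniqueHash IS NOT NULL
        GROUP BY uniqueHash
        HAVING COUNT(*) > 1
        ORDER BY COUNT(*) DESC
        """
    ).fetchall()
```

To try it, open a copy of the database with sqlite3.connect("digikam4.db"), pass
the connection to find_duplicate_hashes, and print the returned rows. Working on
a copy avoids touching the database while digikam is running.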

Expected Results:  
I'd expect each tif-rendition/version of the original raw file, saved under
different names, would have truly unique uniqueHashes, and would have their own
correct thumbs.

jpegs from ufraw don't seem to have this problem. I haven't checked other
tif-producing software (but I will). Using exiftool to inspect a couple of the
ufraw-produced tifs, it looks like ufraw 0.16 copies all the raw file data over
to the tiff, so all the metadata in the two images looks (upon quick glance) to
be identical. If uniqueHash is depending on metadata to generate uniqueHashes,
then that could be the source of the problem.

As md5 in itself is subject to hash collisions, it seems to me that in a large
image database, using only a part of the image to calculate md5 hashes is not
such a good idea, even apart from the current issue. As already stated, the
actual md5 hashes of the images, as calculated by md5sum at the command line,
are all different. (Probably a move to sha1 (over the whole image) would be
overkill. And probably I don't know enough about hashes to even make these
statements.)
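
To illustrate the concern, here is a small sketch of how a hash over only the
leading part of a file can collide while the whole-file md5 does not. This is
not digikam's actual hashing code: the prefix length, the helper names, and the
synthetic file contents are all made up for the example.

```python
import hashlib


def partial_md5(data: bytes, prefix_len: int) -> str:
    """MD5 over only the first prefix_len bytes (hypothetical partial hash)."""
    return hashlib.md5(data[:prefix_len]).hexdigest()


def full_md5(data: bytes) -> str:
    """MD5 over the whole content, as md5sum computes it for a file."""
    return hashlib.md5(data).hexdigest()


# Two synthetic "files": identical leading block (standing in for copied
# metadata), followed by different image data.
PREFIX_LEN = 4096  # hypothetical value, chosen only for this example
shared_header = b"M" * PREFIX_LEN
file_a = shared_header + b"pixels of rendition A"
file_b = shared_header + b"pixels of rendition B"

print(partial_md5(file_a, PREFIX_LEN) == partial_md5(file_b, PREFIX_LEN))  # True
print(full_md5(file_a) == full_md5(file_b))  # False
```

If the hashed region of two files is byte-identical, the partial hashes match no
matter how different the remaining image data is.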

--
Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel

[Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

Marcel Wiesweg
https://bugs.kde.org/show_bug.cgi?id=262452


Marcel Wiesweg <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE




--- Comment #1 from Marcel Wiesweg <marcel wiesweg gmx de>  2011-01-07 23:39:33 ---
Thanks a lot for your research, indeed this is a problem, known and solved (for
the future).

1) This happens usually with TIFF images without metadata. The header of such
files contains several kilobytes of (pretty useless) line offsets. I have not
seen a JPEG that is affected.

2) Computing the hash over the whole file is a major performance problem -
scanning would take much longer. The old hash covered 99.9% of cases, we'll see
what the new algorithm brings.

3) Some other problems in the context of renaming are probably unrelated.

*** This bug has been marked as a duplicate of bug 210353 ***


[Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

Elle Stone
https://bugs.kde.org/show_bug.cgi?id=262452





--- Comment #2 from Elle Stone <l elle stone gmail com>  2011-01-08 00:03:42 ---
Hi Marcel,

Regarding, "This is a problem, known and solved (for the future). 1)
This happens usually with TIFF images without metadata."

In fact the affected images, tiffs output by UFRaw 0.16 and 0.17, have
a LOT of metadata, all the metadata that was in the raw file (.cr2).
If one were to use exiftool to add e.g. copyright information, keywords,
contact information, location, etc. to one's raw files (which I do, in
fact) there could be a whole lot of metadata in a raw file.

Suspecting that a wealth of metadata could be the problem, I used
exiftool to strip out all the metadata in the UFRaw-produced tiffs,
and when I added the stripped tiffs to the digikam database, the
stripped tiffs all had unique hashes and proper thumbs.

Is the future version of digikam with this bug fixed available somewhere?

Elle Stone


[Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

Gilles Caulier-4
https://bugs.kde.org/show_bug.cgi?id=262452


Gilles Caulier <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]




--- Comment #3 from Gilles Caulier <caulier gilles gmail com>  2011-01-08 10:16:59 ---
Elle,

Because Marcel is currently working on the Google Summer of Code 2010 branch, I
think it's fixed in 2.0.0.

Gilles


[Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

Elle Stone
https://bugs.kde.org/show_bug.cgi?id=262452





--- Comment #4 from Elle Stone <l elle stone gmail com>  2011-01-08 13:21:35 ---
Gilles, thanks. Can 2.0.0 be run alongside rather than in place of
current digikam?

Elle


[Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

Marcel Wiesweg
https://bugs.kde.org/show_bug.cgi?id=262452





--- Comment #5 from Marcel Wiesweg <marcel wiesweg gmx de>  2011-01-08 15:42:53 ---
1.x does not know the new hash, so it will not open the database once you have
converted it to use the new hash with 2.0. For this reason you need to convert
explicitly: there is an Update button at the bottom of the Database panel in
the Settings dialog. Without this conversion, both versions can operate on the
same db, but your problem is not fixed.


[digikam] [Bug 262452] duplicate uniqueHash (image hash) in database, wrong thumb on images

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=262452

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Database                    |Database-Thumbs
