|
Hi,
I am trying to write some code in a scripting language to extract the data for a given picture from the digikam database file. (I want to keep my raw-format picture files unmodified by digikam, but sometimes I need to automatically convert some of them to JPEG files for export outside the digikam directory, and I need to extract the information required for IPTC from the database. So I need to identify the id of a picture in the database.)

Unfortunately, the algorithm for calculating the uniqueHash appears to be sort of weird. What I have found so far (from the undocumented source) is that the uniqueHash is an MD5 sum of the concatenation of

- the exif section of the picture
- the first 8192 bytes of the picture file
- the length of the picture file written as a decimal number

With this I could correctly calculate the uniqueHash for JPEG images, but not for raw images.

Raw images are usually based on the TIFF file format. Exif data are, afaik, TIFF entries. Therefore TIFF files do not (unlike JPEG) have a separate Exif section, but have Exif tags (which are in fact TIFF tags) interwoven with the whole file.

How exactly is the uniqueHash calculated for these files?

regards
Hadmut
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel
|
Hi again,
could anyone please point out how exactly the uniqueHash is calculated for the different sorts of pictures (the middle part with the exif data), and what design criteria led to the decision to use hash(first 8kb, exif, file length)?

regards
Hadmut
|
http://lxr.kde.org/source/extragear/graphics/digikam/libs/dimg/dimg.cpp#2107
which points to:
http://lxr.kde.org/source/extragear/graphics/digikam/libs/dimg/loaders/dimgloader.cpp#204
... to compute it.

Gilles Caulier

2010/4/21 Hadmut Danisch <[hidden email]>:
> Hi again,
>
> could anyone please point out how exactly the uniqueHash is calculated
> for the different sorts of pictures (the middle part with the exif
> data), and what design criteria led to the decision to use
> hash(first 8kb, exif, file length) ?
>
> regards
> Hadmut
|
In reply to this post by Bugzilla from hadmut@danisch.de
> Hi again,
>
> could anyone please point out how exactly the uniqueHash is calculated
> for the different sorts of pictures (the middle part with the exif
> data),

libexiv2 is able to deliver us a data packet which contains the Exif information packed as for inclusion in a JPEG file. Technically, that is the easiest way to get a hash on this information.

> and what design criteria led to the decision to use
> hash(first 8kb, exif, file length) ?

1. We want a hash.
2. A hash over the complete file is too slow.
3. We need parts of the file that are as unique as possible.
4. The exif info typically contains the creation date, which is pretty unique, and photographic parameters like aperture and shutter speed.
5. The first 8kb: It's not 0, it's not the full file, it's in between. It's small enough to be fast. In the end, an arbitrary decision.
6. The file length is pretty unique for compressed formats, because it depends on the compression entropy of the image data. It also carries at least the smallest possible amount of information about the end of the file, while we calculate the hash on the beginning.
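
For anyone scripting this, the recipe as described at the top of this thread (MD5 over the Exif packet, then the first 8192 bytes of the file, then the file length as a decimal number) could be sketched in Python roughly as below. This is only a sketch under assumptions, not digikam's actual code: the function name `unique_hash` is made up for illustration, the Exif packet must be supplied by the caller (for raw files it would have to be the same JPEG-style packet that libexiv2 produces, which this sketch does not generate), and encoding the length as plain ASCII digits is a guess that should be checked against dimgloader.cpp.

```python
import hashlib
import os

# "the first 8192 bytes of the picture file", as stated in the thread
HEAD_BYTES = 8192


def unique_hash(path: str, exif_packet: bytes) -> str:
    """Return the hex MD5 of exif_packet + first 8192 file bytes + file length.

    exif_packet is assumed to be the Exif data packed as for inclusion in a
    JPEG file; obtaining it is outside the scope of this sketch.
    """
    md5 = hashlib.md5()

    # 1. The Exif packet (may be empty if the file carries no Exif data).
    md5.update(exif_packet)

    # 2. The first 8192 bytes of the file (fewer if the file is shorter).
    with open(path, "rb") as f:
        md5.update(f.read(HEAD_BYTES))

    # 3. The file length written as a decimal number (ASCII digits assumed).
    md5.update(str(os.path.getsize(path)).encode("ascii"))

    return md5.hexdigest()
```

If the result for a JPEG matches the uniqueHash stored in the database but the result for a raw file does not, the most likely difference is the Exif packet passed in, since the other two inputs do not depend on the file format.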
