Disclaimer: this is probably not the right list to ask this. If so, just let me know. Also, I'm not subscribed, so please CC me in the answers.

I'm trying to write a script that is able to take an image already in digikam's database and resize it, apply the same tags as the original, and possibly remove the original. So far the idea is that this script will be independent of digikam, touching its database when needed. So I checked the database structure and it looks OK, except for the md5sum. I tried to reimplement DImgLoader::uniqueHashV2() in libs/dimg/loaders/dimgloader.cpp:329, and even after reimplementing it in Python with the same libraries (Qt4's md5) and copying the algorithm line by line, the value stored in the database differs from the one my script computes. Am I missing something? For comparison, I attach the script I wrote.

--
(Not so) Random fortune:
19:39 < m4rgin4l> for being a suck-up
19:40 < m4rgin4l> or, as I like to call it: academic social engineering

_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users

Attachment: resize.py (1K)
It'll be better to send this message to the developers' list, I think.
Marie-Noëlle

2013/6/17 Marcos Dione <[hidden email]>
--
My latest photos are in my gallery. Do you know Image Fixe, the photo club of Saint Jean du Gard? And explore the Cévennes my way with Cévennes Plurielles.
> Disclaimer: this is probably not the right list to ask this. If so,
> just let me know. Also, I'm not subscribed, so please CC me in the
> answers.
>
> I'm trying to write a script that is able to take an image already
> in digikam's database and resize it, apply the same tags as the
> original, and possibly remove the original. So far the idea is that this
> script will be independent of digikam, touching its database when
> needed. So I checked the database structure and it looks OK, except for
> the md5sum. I tried to reimplement DImgLoader::uniqueHashV2() in
> libs/dimg/loaders/dimgloader.cpp:329, and even after reimplementing it in
> Python with the same libraries (Qt4's md5) and copying the algorithm line
> by line, the value stored in the database differs from the one my script
> computes. Am I missing something? For comparison, I attach the script I
> wrote.

That's the fun of a hash... Well, I don't know.

For debugging, I would record the binary data you feed into the hash to a file, once from Python and once from C++, and compare the two files. If they differ, you'll be able to locate the problem. If not, there's a difference in the hash implementation, but I doubt that.

Marcel
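The debugging approach Marcel suggests could be sketched in Python roughly as follows. This is only an illustration: `RecordingMd5` and `first_difference` are hypothetical helper names, and `hashlib` stands in for the Qt MD5 class used by the original script. The wrapper keeps a copy of every byte fed to the hash, so the recorded bytes can be written to a file and compared with a dump produced on the C++ side.

```python
import hashlib


class RecordingMd5:
    """MD5 wrapper (hypothetical helper) that keeps a copy of every byte fed to it."""

    def __init__(self):
        self._md5 = hashlib.md5()
        self.fed = bytearray()  # everything passed to update(), in order

    def update(self, data):
        self.fed.extend(data)
        self._md5.update(data)

    def hexdigest(self):
        return self._md5.hexdigest()


def first_difference(a, b):
    """Return the first offset at which a and b differ, or None if they are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other: they diverge where the shorter one ends.
        return min(len(a), len(b))
    return None
```

Writing `bytes(recorder.fed)` to a file on the Python side and the equivalent buffer on the C++ side, then calling `first_difference` on the two dumps, shows where the hash inputs start to diverge, if they do.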
On Monday 17 June 2013 19:24:45 Marcel Wiesweg wrote:
>
> > Disclaimer: this is probably not the right list to ask this. If so,
> > just let me know. Also, I'm not subscribed, so please CC me in the
> > answers.
> >
> > I'm trying to write a script that is able to take an image already
> > in digikam's database and resize it, apply the same tags as the
> > original, and possibly remove the original. So far the idea is that this
> > script will be independent of digikam, touching its database when
> > needed. So I checked the database structure and it looks OK, except for
> > the md5sum. I tried to reimplement DImgLoader::uniqueHashV2() in
> > libs/dimg/loaders/dimgloader.cpp:329, and even after reimplementing it in
> > Python with the same libraries (Qt4's md5) and copying the algorithm line
> > by line, the value stored in the database differs from the one my script
> > computes. Am I missing something? For comparison, I attach the script I
> > wrote.
>
> That's the fun of a hash... Well, I don't know.
> For debugging, I would record the binary data you feed into the hash to
> a file, once from Python and once from C++, and compare the two files.
> If they differ, you'll be able to locate the problem. If not, there's a
> difference in the hash implementation, but I doubt that.
>
> Marcel

According to the code, the same hashing routine is used (not only the same algorithm, but actually the same implementation).

There is one difference between the two routines, though:
- in the Digikam C++ routine, a data block is only fed to the hash if data was actually read
- in the Python routine, this check is omitted, and the data block is added to the data to be hashed /unconditionally/.

For the second data block (the last 100 kB), as there is a seek just before it, that could make a difference if the file is <100 kB:
- in C++, the file is probably in an error state, so no data will be read, and the second data block will not be fed to the hash routine.
- in Python, the data block /is/ fed, but will probably contain rubbish if the file is <100 kB...

Also, if the Python script changes anything in the metadata (e.g. by recording the correct image size...), the first 100 kB will differ.

Remco
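The effect of the missing check can be illustrated with a small Python sketch. It is not the digiKam routine and not the attached script: `hash_guarded` and `hash_unguarded` are hypothetical names, only a single read is modelled, the buffer is zero-filled rather than containing arbitrary leftovers, and `hashlib` is used instead of the Qt MD5 class. The 100 kB default is the block size mentioned above.

```python
import hashlib
import io

BLOCK = 100 * 1024  # block size mentioned in the thread


def hash_guarded(stream, block=BLOCK):
    """Feed the hash only the bytes that the read actually returned."""
    md5 = hashlib.md5()
    buf = bytearray(block)
    n = stream.readinto(buf)
    if n:  # skip the block entirely when nothing was read
        md5.update(buf[:n])
    return md5.hexdigest()


def hash_unguarded(stream, block=BLOCK):
    """Feed the whole buffer to the hash, whether or not the read filled it."""
    md5 = hashlib.md5()
    buf = bytearray(block)
    stream.readinto(buf)  # return value ignored: this is the omitted check
    md5.update(buf)
    return md5.hexdigest()
```

For a stream at least as long as the block, both functions hash the same bytes. For a shorter stream, the unguarded version also hashes the unused tail of the buffer, so the two digests differ.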
> For the second data block (the last 100 kB), as there is a seek just
> before it, that could make a difference if the file is <100 kB:
> - in C++, the file is probably in an error state, so no data will be
>   read, and the second data block will not be fed to the hash routine.
> - in Python, the data block /is/ fed, but will probably contain rubbish if
>   the file is <100 kB...

Interesting observation. Anyway, even if this were a bug, we won't change it, in order to keep the hash stable.

>
> Also, if the Python script changes anything in the metadata (e.g. by
> recording the correct image size...), the first 100 kB will differ.

That is intentional.
On Tuesday 18 June 2013 21:00:44 Marcel Wiesweg wrote:
>
> > For the second data block (the last 100 kB), as there is a seek just
> > before it, that could make a difference if the file is <100 kB:
> > - in C++, the file is probably in an error state, so no data will be
> >   read, and the second data block will not be fed to the hash routine.
> > - in Python, the data block /is/ fed, but will probably contain rubbish
> >   if the file is <100 kB...
>
> Interesting observation. Anyway, even if this were a bug, we won't change
> it, in order to keep the hash stable.

The C++ version seems to me to do the correct thing, in that it doesn't feed data to the hash generator if the file doesn't provide the data...

What I meant to show was that the two routines are /not/ identical, in that they can feed different data to the hash generator, and in that case /should/ end up with a different hash value.

Remco

P.S. There might be a situation where the hash isn't stable: if the data buffer isn't initialised, and not completely filled by the file reads, the end of the buffer could differ between two calls on the same file, and thus the hash value could differ (as the full buffer is sent to the generator).
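The instability described in the P.S. can be simulated in Python, under the assumption that leftover buffer contents are modelled by pre-filling a `bytearray`, since Python has no uninitialised memory. `digest_of_reused_buffer` is a hypothetical helper and `hashlib` again stands in for the Qt MD5 class.

```python
import hashlib


def digest_of_reused_buffer(data, buf, whole_buffer):
    """Copy data into a caller-supplied buffer and return its MD5 hex digest.

    whole_buffer=True hashes every byte of buf, including whatever was
    already there beyond len(data); whole_buffer=False hashes only the
    bytes that were copied in.
    """
    n = min(len(data), len(buf))
    buf[:n] = data[:n]
    fed = buf if whole_buffer else buf[:n]
    return hashlib.md5(fed).hexdigest()
```

With the same `data` and two buffers holding different leftover bytes, `whole_buffer=True` yields two different digests, while `whole_buffer=False` yields the same digest both times.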