I need to find and then delete duplicates, but there are two issues:

- I need to keep the file names of both the deleted and the remaining files, and they have to stay associated with each other.
- The volumes are large (10,000 files in total, with 2-4 duplicates each), so I cannot copy the file names by hand. Equally, I cannot go through the duplicates interface and manually remove the images one by one.

I will keep or delete files based on their path names. So is there a way to get a list of the images and their duplicates, path names included?

Francis
_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users
Hi, Francis,
I just dealt with the same issue, only with far fewer duplicates than you need to deal with. As my solution was roundabout, I was hoping someone would chime in with a better way to find the duplicate images.

My first clue that there were in fact duplicate images in my database (I mean exactly duplicate, down to the metadata; only the file name and sometimes the file path were different) came after I created a new, clean digikam database (I had already written the metadata to the appropriate images before archiving the previous digikam database). When I closed digikam and inspected the database using SQLite Database Browser, to my surprise there were more images in the database than there were UniqueHashes. Upon investigation, it turned out that some of the images were in fact duplicates. Some of these duplicate images were in the same directory with slightly different names; some had inadvertently, somewhere along the way, been created in the wrong directory.

I used SQLite Database Browser to locate the images. It wasn't easy. You can click on "File", then "Export", then "Table as csv file", to get a comma-separated listing of the contents of each table in the database. If you export enough tables and pull them all into a spreadsheet, you can use Images and thumbid and FilePath, along with the UniqueHash, to locate all the images with duplicate UniqueHashes. As I was only dealing with about 10 duplicates out of 6,000 images, tracking them down by hand and verifying visually was not such a chore, given that the spreadsheet I created from the exported database tables told me where to look.

In your case, if you really have lots and lots of duplicates, and if nobody comes up with a way to use digikam to track them down, all is not lost, but you'll end up doing a lot more work with the exported tables than I had to. You can use SQLite Database Browser to locate the duplicates and make a list. Then you can use exiftool (or maybe exiv2?) at the command line to move all the duplicates to a new directory, if that will help. I myself have never used exiftool to move files listed in a spreadsheet, but I understand that it can easily do so. Also, the exiftool forum is very friendly and answers questions quickly. I'd advise doing a lot of testing on a small set of files before using exiftool on your real files, as getting the syntax wrong can wreak major havoc. If you decide to go the exiftool route, I can help you figure out the syntax to move images on a list.

I know the above suggestions are not easy or quick, and I really hope someone else has an easier answer. It seems unreasonable that digikam happily creates identical UniqueHashes for more than one image without issuing a warning and a list of affected files. Also, if your duplicate images don't have exactly the same metadata, they probably won't generate identical UniqueHashes.

Elle Stone
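The export-to-spreadsheet step above can also be scripted against the SQLite file directly. Below is a minimal sketch in Python: it groups rows of an Images table by a shared uniqueHash column. The table and column names here are assumptions modelled on digikam's schema and may differ between digikam versions, so verify them against your own database (e.g. in SQLite Database Browser) before pointing the script at it. The demo runs against a tiny in-memory database built with the same assumed schema.

```python
import sqlite3
from collections import defaultdict

def find_duplicate_hashes(conn):
    """Return {uniqueHash: [file names]} for hashes shared by more
    than one image. Assumes an Images table with 'uniqueHash' and
    'name' columns (check your digikam version's actual schema)."""
    groups = defaultdict(list)
    rows = conn.execute(
        "SELECT uniqueHash, name FROM Images "
        "WHERE uniqueHash IN ("
        "  SELECT uniqueHash FROM Images"
        "  GROUP BY uniqueHash HAVING COUNT(*) > 1)"
    )
    for h, name in rows:
        groups[h].append(name)
    return dict(groups)

# Demo on an in-memory database mimicking the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Images (id INTEGER, name TEXT, uniqueHash TEXT)")
conn.executemany(
    "INSERT INTO Images VALUES (?, ?, ?)",
    [(1, "IMG_001.jpg", "aaa"),
     (2, "IMG_001-copy.jpg", "aaa"),
     (3, "IMG_002.jpg", "bbb")],
)
dupes = find_duplicate_hashes(conn)
print(dupes)  # e.g. {'aaa': ['IMG_001.jpg', 'IMG_001-copy.jpg']}
```

Once you have that dictionary, writing it out as a CSV of "keeper, duplicate" path pairs (or feeding the duplicate paths to a file-moving tool) is straightforward, and avoids doing the matching by hand in a spreadsheet.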
At 2011-01-21 13:52, Elle Stone wrote:
>I used the SQLite Database Browser to locate the images. It wasn't
>easy. You can click on "File", then "Export", then "Table as csv
>file", to get a comma-separated listing of the contents of each table
>in the database. If you export enough tables and pull them all into a
>spreadsheet, you can use Images and thumbid and FilePath, along with
>the UniqueHash, to locate all the images with duplicate UniqueHashes.

Thanks for the answer. In your case, where you had exact matches, this could work well. However, I have to find duplicates that do not necessarily have the same resolution, or that have been processed (masks & curves), so the metadata and hash codes will be different.

Any other suggestions from anyone?

Francis
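For near-duplicates like these, where the pixels have been resized or edited, exact hashes cannot match; what is needed is a perceptual measure of similarity. One common standalone technique (not digikam's own method, which I am not describing here) is an "average hash": shrink each image to a tiny grid, threshold each cell against the mean brightness, and compare the resulting bit patterns. The pure-Python sketch below illustrates the idea on nested lists of grayscale values; for real photos you would load and decode the files with an image library first.

```python
def average_hash(pixels, hash_size=8):
    """Perceptual 'average hash' of a grayscale image given as a list
    of rows of brightness values (0-255). The image is shrunk to
    hash_size x hash_size by block averaging, then each cell becomes 1
    if it is brighter than the overall mean. Images differing only in
    resolution or mild processing tend to produce similar hashes."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        for c in range(hash_size):
            # Average the block of source pixels mapped to this cell.
            r0, r1 = r * h // hash_size, max((r + 1) * h // hash_size, r * h // hash_size + 1)
            c0, c1 = c * w // hash_size, max((c + 1) * w // hash_size, c * w // hash_size + 1)
            block = [pixels[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]

def hamming(a, b):
    """Number of differing bits; a small distance suggests duplicates."""
    return sum(x != y for x, y in zip(a, b))

# Demo: the same horizontal gradient at two resolutions hashes identically.
big   = [[(x * 255) // 31 for x in range(32)] for _ in range(32)]
small = [[(x * 255) // 15 for x in range(16)] for _ in range(16)]
dist = hamming(average_hash(big), average_hash(small))
print(dist)  # 0 despite the different image sizes
```

Hashing every image once, then sorting the (hash, path) pairs by distance, gives exactly the kind of path-name list asked for at the top of the thread, including pairs where one copy was downsized or retouched.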