I need to find and then delete duplicates, but there are two issues:

- I need to keep the file names of both the deleted and the remaining files, and they have to stay associated with each other.
- The volumes are large (10,000 files in total, with 2-4 duplicates each), so I cannot copy the file names by hand. Equally, I cannot go through the duplicates interface and manually remove the images one by one.

I will keep or delete files based on their path names. So is there a way to get a list of the images and their duplicates, path names included?

Francis
_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users
Hi, Francis,
I just dealt with the same issue, only with far fewer duplicates than you need to deal with. As my solution was roundabout, I was hoping someone would chime in with a better way to find the duplicate images.

My first clue that there were in fact duplicate images in my database (I mean exactly duplicate, down to the metadata; only the file name and sometimes the file path were different) came after I created a new, clean digikam database (I had already written the metadata to the appropriate images before archiving the previous digikam database). When I closed digikam and inspected the database using SQLite Database Browser, to my surprise there were more images in the database than there were UniqueHashes. Upon investigation, it turned out that some of the images were in fact duplicates. Some of these duplicate images were in the same directory with slightly different names; some had inadvertently, somewhere along the way, been created in the wrong directory.

I used SQLite Database Browser to locate the images. It wasn't easy. You can click on "File", then "Export", then "Table as csv file", to get a comma-separated listing of the contents of each table in the database. If you export enough tables and pull them all into a spreadsheet, you can use Images and thumbid and FilePath, along with the UniqueHash, to locate all the images with duplicate UniqueHashes. As I was only dealing with about 10 duplicates out of 6,000 images, tracking them down by hand and verifying visually was not such a chore, given that the spreadsheet I created from the exported database tables told me where to look.

In your case, if you really have lots and lots of duplicates, and if nobody comes up with a way to use digikam to track them down, all is not lost, but you'll end up doing a lot more work with the exported tables than I had to. You can use SQLite Database Browser to locate the duplicates and make a list. Then you can use exiftool (or maybe exiv2?) at the command line to move all the duplicates to a new directory, if that will help. I myself have never used exiftool to move files listed in a spreadsheet, but I understand that it can easily do so. Also, the exiftool forum is very friendly and answers questions quickly. I'd advise doing a lot of testing on a small set of files before using exiftool on your real files, as getting the syntax wrong can wreak major havoc. If you decide to go the exiftool route, I can help you figure out the syntax to move images on a list.

I know the above suggestions are not easy or quick, and I really hope someone else has an easier answer. It seems unreasonable that digikam happily creates identical UniqueHashes for more than one image without issuing a warning and a list of affected files. Also, if your duplicate images don't have exactly the same metadata, they probably won't generate identical UniqueHashes.

Elle Stone
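The export-to-spreadsheet step above can also be scripted against the SQLite file directly. Below is a minimal sketch in Python: it groups rows of an Images table by a shared uniqueHash column. The table and column names here are assumptions modelled on digikam's schema and may differ between digikam versions, so verify them against your own database (e.g. in SQLite Database Browser) before pointing the script at it. The demo runs against a tiny in-memory database built with the same assumed schema.

```python
import sqlite3
from collections import defaultdict

def find_duplicate_hashes(conn):
    """Return {uniqueHash: [file names]} for hashes shared by more
    than one image. Assumes an Images table with 'uniqueHash' and
    'name' columns (check your digikam version's actual schema)."""
    groups = defaultdict(list)
    rows = conn.execute(
        "SELECT uniqueHash, name FROM Images "
        "WHERE uniqueHash IN ("
        "  SELECT uniqueHash FROM Images"
        "  GROUP BY uniqueHash HAVING COUNT(*) > 1)"
    )
    for h, name in rows:
        groups[h].append(name)
    return dict(groups)

# Demo on an in-memory database mimicking the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Images (id INTEGER, name TEXT, uniqueHash TEXT)")
conn.executemany(
    "INSERT INTO Images VALUES (?, ?, ?)",
    [(1, "IMG_001.jpg", "aaa"),
     (2, "IMG_001-copy.jpg", "aaa"),
     (3, "IMG_002.jpg", "bbb")],
)
dupes = find_duplicate_hashes(conn)
print(dupes)  # e.g. {'aaa': ['IMG_001.jpg', 'IMG_001-copy.jpg']}
```

Once you have that dictionary, writing it out as a CSV of "keeper, duplicate" path pairs (or feeding the duplicate paths to a file-moving tool) is straightforward, and avoids doing the matching by hand in a spreadsheet.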
At 2011-01-21 13:52, Elle Stone wrote:
>I used the SQLite Database Browser to locate the images. It wasn't
>easy. You can click on "File", then "Export", then "Table as csv
>file", to get a comma-separated listing of the contents of each table
>in the database. If you export enough tables and pull them all into a
>spreadsheet, you can use Images and thumbid and FilePath, along with
>the UniqueHash, to locate all the images with duplicate UniqueHashes.

Thanks for the answer. In your case, where you had exact matches, this could work well. However, I have to find duplicates that do not necessarily have the same resolution, or that have been processed (masks & curves), so the metadata and hash codes will be different.

Any other suggestions from anyone?

Francis
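For near-duplicates like these, where the pixels have been resized or edited, exact hashes cannot match; what is needed is a perceptual measure of similarity. One common standalone technique (not digikam's own method, which I am not describing here) is an "average hash": shrink each image to a tiny grid, threshold each cell against the mean brightness, and compare the resulting bit patterns. The pure-Python sketch below illustrates the idea on nested lists of grayscale values; for real photos you would load and decode the files with an image library first.

```python
def average_hash(pixels, hash_size=8):
    """Perceptual 'average hash' of a grayscale image given as a list
    of rows of brightness values (0-255). The image is shrunk to
    hash_size x hash_size by block averaging, then each cell becomes 1
    if it is brighter than the overall mean. Images differing only in
    resolution or mild processing tend to produce similar hashes."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        for c in range(hash_size):
            # Average the block of source pixels mapped to this cell.
            r0, r1 = r * h // hash_size, max((r + 1) * h // hash_size, r * h // hash_size + 1)
            c0, c1 = c * w // hash_size, max((c + 1) * w // hash_size, c * w // hash_size + 1)
            block = [pixels[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]

def hamming(a, b):
    """Number of differing bits; a small distance suggests duplicates."""
    return sum(x != y for x, y in zip(a, b))

# Demo: the same horizontal gradient at two resolutions hashes identically.
big   = [[(x * 255) // 31 for x in range(32)] for _ in range(32)]
small = [[(x * 255) // 15 for x in range(16)] for _ in range(16)]
dist = hamming(average_hash(big), average_hash(small))
print(dist)  # 0 despite the different image sizes
```

Hashing every image once, then sorting the (hash, path) pairs by distance, gives exactly the kind of path-name list asked for at the top of the thread, including pairs where one copy was downsized or retouched.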