[digiKam-users] fuzzy search for duplicates - how to use?

[digiKam-users] fuzzy search for duplicates - how to use?

Uwe Haider
Hi all!

I am trying to clean up my collection with the fuzzy search for
duplicates.

First I have to build the fingerprints.
The second step is to mark the folders/tags in which to search.

I don't understand the "restrictions":

What is the pull-down that restricts to "only selected tab" / "one of" /
"both" / "albums but not tags" / "tags but not albums" for?

And the next pull-down, the restriction "none" / "restrict to reference
album" / "exclude reference album"?

What is the "reference album": the first or the oldest album in the
album list? Can I select a reference album?

In the results list, some pictures in several albums are marked as
reference, but I can't see why. There are albums in different album
trees marked as reference....

After finding the duplicates I want to delete them all. The result list
is sorted by album and date. All albums together contain ~ 250.000
pictures. I expect to get ~ 50.000 duplicates - hitting "delete" that
often will make for strong fingers :-(

Can it run automatically?

How do you use this feature?

Thanks for your advice.....
--
Uwe Haider
[hidden email]


Re: fuzzy search for duplicates - how to use?

Mario Frank
Hi Uwe,

I will answer inline.

On 07.01.19 at 22:16, Uwe Haider wrote:

> hi together!
>
> I try to clean up my collection with the fuzzy search for duplicates.
>
> First I have to build the fingerprints.
> Second step is to mark the folder/tags where to search.
>
> I don't understand the "restrictions":
>
> What is the pull down restrict to "only selected tab" / "one of" /
> "both" / "albums but not tags" / "tags but not albums" for ??
This pull-down menu lets you search for duplicates in both albums and
tags.
Suppose you selected Album1 and Album2 in the albums tab and Tag1 and
Tag2 in the tags tab,
and there are images that have Tag1 but are not in Album1 or Album2.
If you are currently in the albums tab, "only selected tab" will scan
only the images in Album1 and Album2.
If you choose "one of", all images that are in Album1, Album2, Tag1 or
Tag2 are scanned (mathematical union).
If you choose "both", all images that are in both the albums and the
tags are scanned (mathematical intersection).
If you choose "albums but not tags", only images that are in the albums
but have neither Tag1 nor Tag2 are scanned (mathematical difference).
"tags but not albums" is analogous to "albums but not tags".

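As an illustration of the modes described above, here is a minimal
Python sketch. It is not digiKam code: the image IDs, the dictionaries
and the function name images_to_scan are made up for this example, and
it assumes that every selected album or tag can be treated as a plain
set of images.

```python
# Hypothetical selection: album/tag name -> set of image IDs.
albums = {"Album1": {1, 2, 3}, "Album2": {3, 4}}
tags = {"Tag1": {2, 5}, "Tag2": {4, 6}}


def images_to_scan(mode, albums, tags, current_tab="albums"):
    """Return the set of image IDs that a given restriction mode would scan."""
    in_albums = set().union(*albums.values())  # everything in the selected albums
    in_tags = set().union(*tags.values())      # everything with a selected tag

    if mode == "only selected tab":
        return in_albums if current_tab == "albums" else in_tags
    if mode == "one of":               # union
        return in_albums | in_tags
    if mode == "both":                 # intersection
        return in_albums & in_tags
    if mode == "albums but not tags":  # difference
        return in_albums - in_tags
    if mode == "tags but not albums":  # difference, the other way round
        return in_tags - in_albums
    raise ValueError(f"unknown mode: {mode}")


for mode in ("only selected tab", "one of", "both",
             "albums but not tags", "tags but not albums"):
    print(mode, "->", sorted(images_to_scan(mode, albums, tags)))
```

With this sample data, image 5 has Tag1 but sits in neither album, so
it is scanned with "one of" and "tags but not albums" but not with
"only selected tab" (albums tab) or "both".
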
>
> Next pull down restriction "none" "restrict to reference album" /
> "exclude reference album" ??
>
> What is the "reference Album" the first or the oldest album in the
> album list? Can I select a reference album?
This pull-down menu lets you restrict the set of images with which an
image is compared.
Say you have one image in Album1 and one image in Album2.
If you choose "none", the images are compared.
If you choose "restrict to reference album", the images are not compared.
If you choose "exclude reference album", the images are compared.

In brief, the "reference album" is the album of the image for which
duplicates are searched.
So the reference album is chosen automatically.
>
> In the results list some pictures in several albums are marked as
> reference. But I can't see why? There are albums in different album
> trees marked as reference....
Can you describe this more precisely? I am not sure I understand what
you mean.
>
> After finding the duplicates I want to delete all. The result list is
> sorted by album and date. All albums together are containing ~ 250.000
> pictures. I expect to get ~ 50.000 duplicates - hit "delete" will make
> strong fingers :-(
>
> Can it run automatic?
Automatic deletion is not implemented, although it is technically not
really complicated.
But there was a long discussion, and the problem is how to choose the
images to delete.
There are too many possible criteria, e.g. file type ("I want to have
PNG only"),
resolution, file size and so on.

Regards,
Mario

>
> How to you use this feature?
>
> Thanks for your advice.....




Re: fuzzy search for duplicates - how to use?

Uwe Haider
Many thanks Mario,

for your explanations.

On 08.01.19 at 10:23, Mario Frank wrote:
[...]

I don't understand the part with the "reference album":
(please have a look inline)

>>
>> Next pull down restriction "none" "restrict to reference album" /
>> "exclude reference album" ??
>>
>> What is the "reference Album" the first or the oldest album in the
>> album list? Can I select a reference album?
> This pull down menu gives you the possibility to restrict the images
> with which an image is compared.
> If you have an image in Album1 and one image in Album 2.
> If you choose "none", the images are compared.
> If you choose "restrict to reference album", the images are not compared.

That seems senseless to me????

> If you choose "exclude reference album", the images are compared.

Seems to function as "none"???

>
> To make it brief, the "reference Album" is the album of the image for
> which the duplicates are searched.

I don't select an image to look for its duplicates... I mark several
albums to compare them. My goal is to delete all duplicate images, move
the differing images into the correct album and delete the empty
"leftover albums". How is the reference album chosen by digiKam? Is it
the first marked album?


>> In the results list some pictures in several albums are marked as
>> reference. But I can't see why? There are albums in different album
>> trees marked as reference....
> Can you describe this more precise? I am not sure I understand what you
> mean.

OK, my English is a leftover from school... from the last century :-)
I'll try my very best:

I have chosen 67 albums from 2 different album trees and searched for
duplicates. The result list is ordered by date and separated by album.

For the first xxx images there were 3 albums with duplicates; images
in the second album are marked as "reference".

Suddenly, from image xxx+1 on, the reference image is located in the
first album. No new search, just a change in the ordering of the result
list.

I want to clean up my album structure, so I decide to delete the albums
in the "wrong" position in my trees. The reference is not important for
this decision. But I would like to understand how it works...


--
Uwe Haider
[hidden email]


Re: fuzzy search for duplicates - how to use?

Mario Frank
Hi Uwe,

You're welcome. I will answer inline again.

On 08.01.19 at 22:48, Uwe Haider wrote:

> Many thanks Mario,
>
> for your explanations.
>
> On 08.01.19 at 10:23, Mario Frank wrote:
> [...]
>
> I don't understand the part with the "reference album":
> (please have a look inline)
>
>>>
>>> Next pull down restriction "none" "restrict to reference album" /
>>> "exclude reference album" ??
>>>
>>> What is the "reference Album" the first or the oldest album in the
>>> album list? Can I select a reference album?
>> This pull down menu gives you the possibility to restrict the images
>> with which an image is compared.
>> If you have an image in Album1 and one image in Album 2.
>> If you choose "none", the images are compared.
>> If you choose "restrict to reference album", the images are not
>> compared.
>
> Seems to be senseless for me????
The example was too trivial to explain the functionality well, I think.

Okay: Suppose you select Album 1 and Album 2.
In Album 1, you have two images, Image 1 and Image 2.
Image 2 is a near-duplicate of Image 1 with 90 % similarity (e.g. you
shot a series of photos).
In Album 2, you have a duplicate of Image 1, let's call it Image 3.

When selecting "none", the result will be Image 1, Image 2 and Image 3.
When selecting "restrict to reference album", the result will be Image 1
and Image 2.
This option is good if you have huge albums with potentially many
duplicates.

When selecting "exclude reference album", the result will be Image 1 and
Image 3.
This option is good if you do not care about duplicates in the same album.

>
>> If you choose "exclude reference album", the images are compared.
>
> Seems to function as "none"???
>
>>
>> To make it brief, the "reference Album" is the album of the image for
>> which the duplicates are searched.
>
> I don't select an image to look for its duplicates... I mark several
> albums to compare them. My goal is to delete all duplicate images,move
> the different images in the correct album and delete the empty
> "leftovers-albums". How is the reference album chosen by digikam? Is
> it the first marked album?
You are right, you do not select images.
But if you select an album, a duplicates search is done for every
(reference) image in this album,
and for each of these duplicates searches, the reference album is the
album in which the image is located.
So, in the example above, the reference album of Image 1 and Image 2 is
Album 1
and the reference album of Image 3 is Album 2.
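
The Album 1 / Album 2 example can also be written down as a small
Python sketch. It only mirrors the behaviour described in this thread
and is not digiKam's implementation: the Image class, the similarity
table, the 90 % threshold and the function find_duplicates are all
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str
    album: str


# The three images from the example above.
image1 = Image("Image 1", "Album 1")
image2 = Image("Image 2", "Album 1")
image3 = Image("Image 3", "Album 2")
all_images = [image1, image2, image3]

# Assumed pairwise similarities (Image 3 is an exact copy of Image 1).
similarity = {
    frozenset(("Image 1", "Image 2")): 0.9,
    frozenset(("Image 1", "Image 3")): 1.0,
    frozenset(("Image 2", "Image 3")): 0.9,
}


def find_duplicates(reference, candidates, restriction="none", threshold=0.9):
    """Search duplicates of one reference image; its album is the reference album."""
    result = [reference]
    for other in candidates:
        if other == reference:
            continue
        same_album = other.album == reference.album
        if restriction == "restrict to reference album" and not same_album:
            continue  # only compare inside the reference album
        if restriction == "exclude reference album" and same_album:
            continue  # only compare with images from other albums
        pair = frozenset((reference.name, other.name))
        if similarity.get(pair, 0.0) >= threshold:
            result.append(other)
    return result


for restriction in ("none", "restrict to reference album",
                    "exclude reference album"):
    names = [img.name for img in find_duplicates(image1, all_images, restriction)]
    print(f"{restriction}: {names}")
```

Running the search with Image 3 as the reference instead makes Album 2
the reference album, which is why references can show up in different
albums within one result list.
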

>
>
>>> In the results list some pictures in several albums are marked as
>>> reference. But I can't see why? There are albums in different album
>>> trees marked as reference....
>> Can you describe this more precise? I am not sure I understand what you
>> mean.
>
> OK, my english is a leftover from school... from the las centura :-) I
> try my very best:
That's okay. :)

>
> I have chosen 67 albums from 2 different album trees and searched the
> duplicates. The result list is ordered by date and separated by album.
>
> For the first xxx images the there were 3 albums with duplicates,
> images in the second album are marked as "reference".
>
> Suddenly... >xxx+1 image the reference image is located in the first
> album. No new search, just a chance in the ordering of the result list.
>
> I want to clean my album structure so I decide to delete the albums in
> the "wrong" position in my trees. The reference is not important for
> this decision. But I like to understand how it works...
If you chose those 67 albums and get duplicates, you get a list of
duplicates results (result list) in the left part of the panel,
with a thumbnail, the filename of the image, the count of duplicates and
the average similarity.
Each of these duplicates results has one "reference image" (or in German
"Referenzbild"), which is the one
with the filename given in the left panel.

If you click on a duplicates result, you get the duplicates and the
result image (just for comparison) in the right panel.
The right panel is usually organised by album (but you can change this),
and you should have one and only one
reference image in the right panel.

The right panel is not sorted by the reference image. Its sorting
depends on the view configuration you have chosen.
If the panel is separated by albums, you get the albums in
lexicographical order.
The album content itself is sorted by the option you chose in view->sort
entries (in German "Einträge sortieren").
So the reference image can be shown in any album if you separate the
view by albums.

I hope I understood you correctly. Otherwise, can you send some
screenshots?
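
To make the ordering concrete, here is a rough Python sketch of a view
that is separated by album. The album paths, the tuple layout and the
function grouped_view are hypothetical and only imitate the behaviour
described here; they are not taken from digiKam.

```python
from itertools import groupby

# Hypothetical duplicates result: (album, filename, is_reference).
result = [
    ("B/2010-holiday", "img_002.jpg", True),
    ("A/2010-holiday", "img_002.jpg", False),
    ("C/backup", "img_002.jpg", False),
]


def grouped_view(items, sort_key=lambda item: item[1]):
    """Group by album in lexicographical order, then sort inside each album."""
    by_album = sorted(items, key=lambda item: item[0])
    return [
        (album, sorted(group, key=sort_key))
        for album, group in groupby(by_album, key=lambda item: item[0])
    ]


for album, images in grouped_view(result):
    for _, filename, is_reference in images:
        marker = " (reference)" if is_reference else ""
        print(f"{album}: {filename}{marker}")
```

Here the reference image ends up in the second group simply because
its album name sorts second; whether an image is the reference plays no
role in the ordering.
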

>
>




Re: fuzzy search for duplicates - how to use?

Uwe Haider
Thanks again Mario,

another step further.... your explanations are very good; using the
fuzzy search tool is not so trivial.

My problem is not the search for duplicate images. It seems I am
searching for duplicate albums....

The duplicated albums are spread all over my collection (failed
backup routine). I mark all duplicated albums in the collection (all
from 2010), search for duplicates, delete the duplicate images in the
wrongly positioned albums, move the single images to the right places
and delete the empty wrongly positioned albums. That's the plan...

Current status: 67 albums are marked; after two rounds of duplicate
search & deleting the two duplicates found for each image, the next
search again found ~ 70 duplicated files? This time only one duplicate
for each image....

Which search settings would be best? Is there a way to find all
duplicate images in the marked albums in the first search? Or must I
rebuild the fingerprints after every search & delete? Or must I perform
a database cleanup after every search & delete?

Is the fingerprint of an image with tags/geotags/faces equal to the
fingerprint of the same image without any tags?

btw:
marking albums in the album tree to select them for the search is a bit
curious to me: I mark a first-level album for the search and the
second-level albums in this first-level album are not included in the
search....? The search run was very fast :-)

Same behaviour in the tag tree. Bug or feature? Or is there a trick like
Ctrl + mouse or something like that?

best regards
Uwe
On 09.01.19 at 09:31, Mario Frank wrote:

> [...]

--
Uwe Haider
[hidden email]