[digikam] [Bug 375573] New: Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

            Bug ID: 375573
           Summary: Don't reset/destroy context after deleting one image
                    among a set of duplicates
           Product: digikam
           Version: 5.5.0
          Platform: Other
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: Searches-Fuzzy
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

I have a collection with many similar images (~20 takes per shot) and the goal
is to quickly go through duplicates and delete all but one of the images.

After I click Find Duplicates, I click on "Ref. images", then double click on
the first thumbnail to go into the Preview. As I'm navigating with left/right
arrow through the duplicates, I press Shift+Delete to delete one of them that's
clearly worse than the ones I've seen so far. I would like to repeat this
process until I end up with only one image.

The problem is that after I delete an image, digiKam unhelpfully displays
"Failed to load image" and kicks me out of that duplicate set. The keyboard
focus is also lost.

What would make a lot more sense is to let me navigate with the arrows through
the other images in the set of duplicates, and keep deleting them.

I've tried Alt+3 to set flags, and while this is a workaround, it's
unnecessary, I think. I don't see a good reason to take the user out of the
flow of deleting duplicates in a set after they've deleted the first one.

--
You are receiving this mail because:
You are the assignee for the bug.

[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

Mario Frank <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #1 from Mario Frank <[hidden email]> ---
Hey Dan,

There was a bug before 5.4 with quite a long discussion
(https://bugs.kde.org/show_bug.cgi?id=261417). In short:
when an image in a duplicates album is deleted, the count of duplicates for
this album has to be adjusted. Otherwise, we provide wrong information. Also,
the deleted image may be a member of other duplicates albums. Thus, they have
to be adjusted, too. Some of the albums may even vanish if the deleted image
was the only duplicate of the reference image.

Following this, I took the most performant approach: all duplicates albums that
contained the image are rescanned for duplicates and then refreshed.
This may take some time, depending on the images involved. During this time,
the image view loses the connection to the duplicates album, because the album
does not exist during the rescan, only afterwards.

So, what you experience is the lost connection.

I agree that the workflow is interrupted in this case. If only one duplicates
album needs to be adjusted, just decrementing the image count would be
feasible. But as soon as another duplicates album becomes dirty through the
deletion, a rescan should definitely be done, I think.
Delaying the rescan would technically be possible. The problem here is that we
cannot estimate how much time a user typically needs before the rescan runs.
If a duplicates album has 100 items and you delete one image per second, the
delay is okay. But a 10-second delay, for example, may again interrupt the
workflow of users.

Any comments/opinions on this?

[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

--- Comment #2 from Dan Dascalescu <[hidden email]> ---
Hey Mario,

Thank you for the explanation. I understand the tradeoff - accuracy in
reporting the number of dupes, vs. speedy processing. The solution I propose
revolves around lazy calculation - does the user care more about a precise
number shown next to the album *when they get to see it*, or about being able
to move on to examine the other duplicates in the cluster?

I mentioned "when they get to see it" because after the user deletes one of the
duplicates, the list of duplicate clusters in the left pane always scrolls to
the top (IMO this could be improved to try to keep the scroll position, but
digiKam probably just re-sorts the list), so if they were working on a
duplicate cluster below the fold (i.e. if they have scrolled down at all), the
number of duplicates in that album won't be visible anyway. In fact, when you
deal with many clusters of duplicates, only those items at the top, according
to the sort order (Ref. images filename, # of items, or Avg. similarity) will
be visible.

Not sure what you meant by "one duplicates album" (needs to be adjusted) - did
you mean a cluster (in DUFF terminology, http://duff.dreda.org/) of duplicates
(which may be spread across different albums), or an album that contains
duplicates, so the count of items in the album needs to be adjusted? In the
latter case, that count is even farther from the user's attention, because the
user is in the Fuzzy tab, vs. in the Albums tab. Could the recalculation of
counts be done only once, when the user leaves the Fuzzy tab?
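
The lazy calculation asked about here can be sketched with a dirty flag. This
is only a minimal illustration under stated assumptions: `DuplicateCluster`,
`markDeleted` and `itemCount` are hypothetical names, not digiKam classes or
methods, and the real count lives in the internal search albums.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for one entry in the duplicates list (not a digiKam class).
class DuplicateCluster
{
public:
    explicit DuplicateCluster(std::set<long long> imageIds)
        : m_imageIds(std::move(imageIds)) {}

    // Deleting an image only records the change; nothing is recounted yet.
    void markDeleted(long long imageId)
    {
        if (m_imageIds.count(imageId) > 0 && m_deleted.insert(imageId).second)
        {
            m_dirty = true;
        }
    }

    // The count is refreshed only when somebody asks for it, for example when
    // the entry becomes visible or the user leaves the tab.
    std::size_t itemCount()
    {
        if (m_dirty)
        {
            assert(m_deleted.size() <= m_imageIds.size());
            m_cachedCount = m_imageIds.size() - m_deleted.size();
            m_dirty = false;
            ++m_recalculations;
        }
        return m_cachedCount;
    }

    // Number of times the count was actually recomputed (for illustration).
    int recalculations() const { return m_recalculations; }

private:
    std::set<long long> m_imageIds;
    std::set<long long> m_deleted;
    std::size_t m_cachedCount = 0;
    bool m_dirty = true;
    int m_recalculations = 0;
};
```

With this shape, deleting several images in a row costs one recount at the
moment the number is next displayed, rather than one recount per deletion.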

Also, there are two different scenarios I see when it comes to deleting
duplicates:

1) Deleting images in duplicate clusters one by one, while the user looks at
the picture in Preview Mode, to examine it at as large a size as possible.
In this case, only one image is deleted at a time. Would counts be easier to
decrement in this case?

2) Staying in Thumbnails or Table, selecting multiple images, and deleting them
at once.

Finally, a question about "the deleted image may be member of other duplicates
albums" (this relates to the cluster vs. album distinction) - is the duplicate
relationship transitive? I mean, if images A and B are dupes within the
similarity range, and B is part of another cluster of duplicates, A should be
part of that cluster too, which means only two counts need to be updated: the
number of dupes in that cluster, and the number of items in the album the image
belongs to.

[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

--- Comment #3 from Mario Frank <[hidden email]> ---
Hey Dan,

I will answer inline, since a few things came to mind.

(In reply to Dan Dascalescu from comment #2)
> Hey Mario,
>
> Thank you for the explanation. I understand the tradeoff - accuracy in
> reporting the number of dupes, vs. speedy processing. The solution I propose
> revolves around lazy calculation - does the user care more about a precise
> number shown next to the album *when they get to see it*, or about being able
> to move on to examine the other duplicates in the cluster?

I would expect the latter to be more important than the accuracy. Thus,
delaying is an option for me.

>
> I mentioned "when they get to see it" because after the user deletes one of
> the duplicates, the list of duplicate clusters in the left pane always
> scrolls to the top (IMO this could be improved to try to keep the scroll
> position, but digiKam probably just re-sorts the list), so if they were
> working on a duplicate cluster below the fold (i.e. if they have scrolled
> down at all), the number of duplicates in that album won't be visible
> anyway. In fact, when you deal with many clusters of duplicates, only those
> items at the top, according to the sort order (Ref. images filename, # of
> items, or Avg. similarity) will be visible.

Okay, let's switch to your terminology. By duplicates albums, we mean
what you call duplicates clusters (internally called search albums), i.e.
the entries in the left table - one duplicates album is one entry here.
Scrolling to the top is really annoying. This could be resolved,
but I will come to that later.

>
> Not sure what you meant by "one duplicates album" (needs to be adjusted) -
> did you mean a cluster (in DUFF terminology, http://duff.dreda.org/) of
> duplicates (which may be spread across different albums), or an album that
> contains duplicates, so the count of items in the album needs to be
> adjusted? In the latter case, that count is even farther from the user's
> attention, because the user is in the Fuzzy tab, vs. in the Albums tab.
> Could the recalculation of counts be done only once, when the user leaves
> the Fuzzy tab?
>
> Also, there are two different scenarios I see when it comes to deleting
> duplicates:
>
> 1) Deleting images in duplicate clusters one by one, while the user looks at
> the picture in Preview Mode, to examine it at as large a size as
> possible. In this case, only one image is deleted at a time. Would counts be
> easier to decrement in this case?

Yes, this was my first approach when I tried to fix the referenced bug.
But since the image should also vanish from other duplicates clusters, I would
have had to decrement the count there, too. The count of images is defined in
the internal search albums as the number of image ids, and the cluster list
does not know how many of those images still exist.
Nevertheless, it is technically possible to let the cluster list know which
images still exist and which do not. But then the average similarity is no
longer correct, as it is calculated over the complete set of images.
This could also be solved, because I introduced the similarities between images
in the database shortly before the release of 5.4.

>
> 2) Staying in Thumbnails or Table, selecting multiple images, and deleting
> them at once.
>
> Finally, a question about "the deleted image may be member of other
> duplicates albums" (this relates to the cluster vs. album distinction) - is
> the duplicate relationship transitive? I mean, if images A and B are dupes
> within the similarity range, and B is part of another cluster of duplicates,
> A should be part of that cluster too, which means only two counts need to be
> updated: the number of dupes in that cluster, and the number of items in the
> album the image belongs to.

Theoretically, you are right. If image A is a duplicate of reference images
B and C, the images B and C have *some* similarity, too. But it is like audio
streams: if stream a is part of streams b and c, the latter two have *some*
similarity at *some* position. Perhaps the similar parts make up only 2 %.
Depending on the given similarity range, this similarity is ignored. We cannot
use transitive closures here.
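
A toy example may make the non-transitivity concrete. It is only a sketch
under stated assumptions: the one-dimensional "fingerprint" values and the
function `similarityPercent` are made up for illustration and are not how
digiKam compares image fingerprints.

```cpp
#include <cassert>
#include <cstdlib>

// Toy similarity in percent between two one-dimensional "fingerprints" (0..100).
// Illustrative assumption only; not digiKam's fingerprint comparison.
int similarityPercent(int a, int b)
{
    assert(a >= 0 && a <= 100 && b >= 0 && b <= 100);
    return 100 - std::abs(a - b);
}

// Two images count as duplicates if their similarity reaches the lower bound
// of the chosen similarity range.
bool isDuplicate(int a, int b, int minSimilarityPercent)
{
    return similarityPercent(a, b) >= minSimilarityPercent;
}
```

With fingerprints B = 0, A = 8 and C = 16 and a lower bound of 90 %, A is a
duplicate of both B and C (92 % each), but B and C reach only 84 % and are
therefore not duplicates of each other.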

So, to sum up:
if we have duplicates cluster A and we delete an image that is also part of
duplicates cluster B, we need to update both clusters in some way: by
rescanning or by decrementing counts.
If we delete the reference image of cluster A itself, the cluster currently
vanishes. As a consequence, the internal search album is removed and you lose
context. This problem was not addressed in the referenced bug, and it is a real
disturbance in the workflow.

I would thus propose the following: removing an image in a duplicates album
should signal the list of duplicates clusters to update. The count of images
in each cluster is recalculated by checking which images still exist. At the
same time, the new average similarity is calculated from the similarities of
the remaining images to the reference image.
All duplicates clusters that contain only one image are removed from the list,
as they are no longer relevant. All of this should be technically quite easy
to implement before the release of 5.5.
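
The proposed update can be sketched roughly as follows. This is not the
digiKam implementation: `Cluster`, `refreshCluster` and `updateClusterList`
are hypothetical names, similarities are modelled as plain percentages, and
dropping a cluster whose reference image was deleted is an assumption of this
sketch rather than part of the proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of one entry in the duplicates list (not digiKam's classes).
struct Cluster
{
    long long referenceId = 0;
    // Similarity in percent of each duplicate to the reference image, keyed by image id.
    std::map<long long, double> duplicates;
    std::size_t itemCount = 0;       // reference image plus remaining duplicates
    double averageSimilarity = 0.0;  // mean similarity of the remaining duplicates
};

// Recompute count and average similarity from the images that still exist.
void refreshCluster(Cluster& cluster, const std::set<long long>& deletedIds)
{
    // Precondition of this sketch: the reference is not listed among its own duplicates.
    assert(cluster.duplicates.count(cluster.referenceId) == 0);

    std::size_t remaining = 0;
    double sum = 0.0;
    for (const auto& entry : cluster.duplicates)
    {
        if (deletedIds.count(entry.first) == 0)
        {
            ++remaining;
            sum += entry.second;
        }
    }
    cluster.itemCount = remaining + 1;
    cluster.averageSimilarity = (remaining > 0) ? sum / static_cast<double>(remaining) : 0.0;
}

// Refresh every cluster, then drop the ones that are no longer relevant.
std::vector<Cluster> updateClusterList(std::vector<Cluster> clusters,
                                       const std::set<long long>& deletedIds)
{
    for (Cluster& cluster : clusters)
    {
        refreshCluster(cluster, deletedIds);
    }
    clusters.erase(
        std::remove_if(clusters.begin(), clusters.end(),
                       [&deletedIds](const Cluster& c) {
                           // Assumption: a cluster without its reference image is dropped, too.
                           return c.itemCount < 2 || deletedIds.count(c.referenceId) > 0;
                       }),
        clusters.end());
    return clusters;
}
```

No rescan is involved: only the stored similarities of the surviving images
are read, which is why those similarity values have to stay available after a
deletion.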

What do the other devs think?

If this is confirmed, I would do that once I have finished my small garbage
collection project.

[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

[hidden email] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #4 from [hidden email] ---
Mario,

I read your proposal in comment #3 and it sounds fine to me.

Gilles

[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=375573

Mario Frank <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Latest Commit|                            |https://commits.kde.org/dig
                   |                            |ikam/7ceca1f172828e48b47c50
                   |                            |88b61b2452b7820e52
   Version Fixed In|                            |5.5.0
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Mario Frank <[hidden email]> ---
Git commit 7ceca1f172828e48b47c5088b61b2452b7820e52 by Mario Frank.
Committed on 04/02/2017 at 15:47.
Pushed by mfrank into branch 'master'.

We do not rescan for duplicates if an image is deleted any more.
Instead, all duplicates albums in left pane are updated, i.e. the items count,
and average similarity are recalculated. If only one duplicate is left,
the duplicates album is hidden. This solves the problem of losing context
due to the rebuild of the SAlbums. I see no other good technical
possibility of preserving the context since the SAlbums are deleted
automatically.
Also, the similarities to images are not deleted any more. Otherwise the
calculation of the average similarity would be wrong. We will take care of the
similarity values in garbage collection branch.
FIXED-IN: 5.5.0

M  +2    -1    NEWS
M  +9    -20   libs/album/albummanager.cpp
M  +1    -1    libs/album/albummanager.h
M  +17   -0    libs/database/item/imageinfo.cpp
M  +5    -0    libs/database/item/imageinfo.h
M  +18   -0    utilities/fuzzysearch/findduplicatesalbum.cpp
M  +4    -0    utilities/fuzzysearch/findduplicatesalbum.h
M  +65   -16   utilities/fuzzysearch/findduplicatesalbumitem.cpp
M  +10   -0    utilities/fuzzysearch/findduplicatesalbumitem.h
M  +4    -13   utilities/fuzzysearch/findduplicatesview.cpp
M  +1    -1    utilities/fuzzysearch/findduplicatesview.h

https://commits.kde.org/digikam/7ceca1f172828e48b47c5088b61b2452b7820e52
