https://bugs.kde.org/show_bug.cgi?id=369051
Bug ID: 369051
Summary: Too low similarity threshold in fuzzy/duplicate search bloats the results with potentially unwished high-similarity results
Product: digikam
Version: 5.1.0
Platform: Archlinux Packages
OS: Linux
Status: UNCONFIRMED
Severity: wishlist
Priority: NOR
Component: Searches-Fuzzy
Assignee: [hidden email]
Reporter: [hidden email]

When you have many pictures, including variants of one picture at different quality levels (e.g. due to resizing, conversion or collage creation), the lower-quality variants may only be found with a low similarity threshold (e.g. 45 %). But the result set will then contain all pictures with a similarity between 45 % and 100 %, which can make the search for low-quality variants frustrating. Being able to specify a maximum similarity may solve the problem.

Reproducible: Always

Steps to Reproduce:
1. Have many series pictures you want to keep and some lower-quality variants you want to get rid of.
2. Start a duplicate search with, let's say, 40 %.

Actual Results:
You get all pictures with a similarity above 40 %.

Expected Results:
This is how it is designed to work. But an option to specify a maximum similarity would be more convenient. I have implemented and tested this, and I can provide a patch file against the master branch. Here is the local commit message describing the implementation:

"Extended the findduplicatesview and fuzzysearchview with an additional QSpinBox which denotes the maximum similarity. The new QSpinBox has a minimum value that is the current value of the minimal similarity threshold. When the minimum threshold is altered, the range of the new QSpinBox is updated. If the minimum threshold is increased beyond the current value of the new QSpinBox, the value of the new QSpinBox is increased automatically. In the fuzzysearchview, altering the maximum similarity also triggers the rebuild of the similar images album.

The extension can be highly valuable if you knowingly want to ignore almost identical images but want to find images that have a similarity of, let's say, 50-60 %, due to resizing, cropping or something similar, without bloating your image pane."

-- 
You are receiving this mail because:
You are the assignee for the bug.
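To make the requested behaviour concrete, here is a minimal C++ sketch of filtering search results to a closed similarity interval instead of a single lower threshold. It uses only the standard library; the function name `filterBySimilarityRange` and the assumption that scores are fractions between 0 and 1 keyed by image id are illustrative, not taken from digiKam's haariface code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper (not digiKam's haariface code): keep only the candidate
// images whose similarity score lies in the closed interval [minSim, maxSim].
// Scores are assumed to be fractions in [0.0, 1.0], keyed by image id.
std::vector<long long> filterBySimilarityRange(const std::map<long long, double>& scores,
                                               double minSim, double maxSim)
{
    assert(minSim <= maxSim); // an inverted interval is a caller error

    std::vector<long long> kept;
    for (const auto& [imageId, similarity] : scores)
    {
        // Both bounds are inclusive, so a 45 % score is kept by a 45-60 % range.
        if (similarity >= minSim && similarity <= maxSim)
        {
            kept.push_back(imageId);
        }
    }
    return kept; // ascending image id order, because std::map is sorted by key
}
```

With a range of 45 % to 60 %, a near-identical copy scoring 99 % drops out while the resized or cropped variants in that band remain, which is exactly the "ignore almost identical images" case described above.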
--- Comment #1 from Mario Frank <[hidden email]> ---
Created attachment 101176 --> https://bugs.kde.org/attachment.cgi?id=101176&action=edit
The patch for introducing a similarity interval
[hidden email] changed:

What    |Removed                                 |Added
----------------------------------------------------------------------------
CC      |                                        |[hidden email]
Summary |Too low similarity threshold in         |Too low similarity threshold in
        |fuzzy/duplicate search bloats the       |fuzzy/duplicate search bloats the
        |results with potentially unwished       |results with potentially unwished
        |high-similarity results                 |high-similarity results [patch]
--- Comment #2 from [hidden email] ---
Mario,

The patch is very interesting and well implemented. I plan to introduce your code after 5.3.0.

Q: currently, the icon view of the fuzzy search results is not sorted by average similarity; all items found are mixed. It could be a good idea to sort the items in this view, as this would increase usability. Your viewpoint?

Best,
Gilles Caulier
--- Comment #3 from Mario Frank <[hidden email]> ---
Hey Gilles,

that is good news. I agree that ordering would improve usability. As I understand it, you mean the list of results in the left pane, where the reference image and the count of similar images are shown. But introducing an order here means changing the signatures of the functions in haariface. Since QMap automatically keeps its entries sorted by key, we could use this to introduce an order to the result set. One quite easy way would be to wrap the QMap<qlonglong,QList<qlonglong>> as the value of an average-similarity map. This would surely increase memory consumption during the search, but the automatic ordering by similarity would avoid a significant increase in runtime.

After a quick look at the source code with grep, I found no possible conflicts with other files concerning the definition of the result set, so changing the return value types in haariface should most likely be safe. Should I propose another patch for this issue?
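A minimal sketch of the ordering idea from comment #3, using `std::map` in place of Qt's `QMap` so it compiles without Qt. The type aliases and function names are hypothetical. One difference worth noting: `QMap` iterates in ascending key order, so a Qt version would have to iterate backwards (or negate the key) to list the most similar groups first; the sketch gets that order directly with `std::greater<double>`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical plain-C++ counterpart of QMap<qlonglong, QList<qlonglong>>:
// reference image id -> ids of the images found similar to it.
using DuplicatesMap = std::map<long long, std::vector<long long>>;

// Outer key: average similarity of a result group. std::greater<double> makes
// iteration run from most to least similar; the map keeps keys sorted on insert,
// so no separate sort pass is needed.
using OrderedResults = std::map<double, DuplicatesMap, std::greater<double>>;

// Store one result group under its average similarity. Groups sharing the same
// average land in the same inner map, so they do not overwrite each other
// unless they also share the reference id.
void addGroup(OrderedResults& results, double avgSimilarity,
              long long referenceId, std::vector<long long> similarIds)
{
    assert(avgSimilarity >= 0.0 && avgSimilarity <= 1.0);
    results[avgSimilarity][referenceId] = std::move(similarIds);
}

// Reference ids in descending order of average similarity; ties are broken by
// ascending reference id (the inner map's key order).
std::vector<long long> referencesBySimilarity(const OrderedResults& results)
{
    std::vector<long long> order;
    for (const auto& bucket : results)
    {
        for (const auto& group : bucket.second)
        {
            order.push_back(group.first);
        }
    }
    return order;
}
```

The extra outer map is the memory cost mentioned in the comment; in exchange, each insertion is logarithmic and the result set is already ordered when the search finishes.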
--- Comment #4 from [hidden email] ---
Yes, another patch in another report, please.

Thanks in advance,
Gilles
[hidden email] changed:

What             |Removed     |Added
----------------------------------------------------------------------------
Version Fixed In |            |5.4.0
Latest Commit    |            |http://commits.kde.org/digikam/afe577f0b297a343ab412ce95c1f75303edfb18b
Status           |UNCONFIRMED |RESOLVED
Resolution       |---         |FIXED

--- Comment #5 from [hidden email] ---
Git commit afe577f0b297a343ab412ce95c1f75303edfb18b by Gilles Caulier.
Committed on 10/11/2016 at 04:48.
Pushed by cgilles into branch 'master'.

Apply big patch #101176 from Mario Frank

This one extended the findduplicatesview and fuzzysearchview with an additional QSpinBox which denotes the maximum similarity. The new QSpinBox has a minimum value that is the current value of the minimal similarity threshold. When the minimum threshold is altered, the range of the new QSpinBox is updated. If the minimum threshold is increased beyond the current value of the new QSpinBox, the value of the new QSpinBox is increased automatically. In the fuzzysearchview, altering the maximum similarity also triggers the rebuild of the similar images album.

The extension can be highly valuable if you knowingly want to ignore almost identical images but want to find images that have a similarity of, let's say, 50-60 %, due to resizing, cropping or something similar, without bloating your image pane.

FIXED-IN: 5.4.0
CCMAIL: [hidden email]

M  +2    -0    app/utils/searchmodificationhelper.cpp
M  +1    -0    app/utils/searchmodificationhelper.h
M  +4    -3    libs/database/dbjobs/dbjob.cpp
M  +16   -5    libs/database/dbjobs/dbjobinfo.cpp
M  +7    -3    libs/database/dbjobs/dbjobinfo.h
M  +27   -16   libs/database/haar/haariface.cpp
M  +9    -8    libs/database/haar/haariface.h
M  +9    -2    libs/database/item/imagelister.cpp
M  +53   -25   utilities/fuzzysearch/findduplicatesview.cpp
M  +1    -0    utilities/fuzzysearch/findduplicatesview.h
M  +58   -11   utilities/fuzzysearch/fuzzysearchview.cpp
M  +2    -1    utilities/fuzzysearch/fuzzysearchview.h
M  +16   -10   utilities/maintenance/duplicatesfinder.cpp
M  +2    -2    utilities/maintenance/duplicatesfinder.h

http://commits.kde.org/digikam/afe577f0b297a343ab412ce95c1f75303edfb18b
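The spin-box coupling described in the commit message boils down to one invariant: minimum <= maximum, with the maximum control never allowed below the current minimum. The sketch below models that logic in plain C++ rather than with `QSpinBox`, so it runs without Qt; the struct, its member names and the 40/100 defaults are hypothetical, and it is not the code from the applied patch. In a Qt widget, one plausible wiring would be to connect the minimum spin box's `valueChanged(int)` signal to a slot that calls `setMinimum()` on the maximum spin box.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the two coupled spin boxes (not the actual Qt widgets).
// Values are whole percentages; the defaults are arbitrary, for illustration.
struct SimilarityRange
{
    static constexpr int kUpperLimit = 100;

    int minSimilarity = 40;
    int maxSimilarity = 100;

    // Lowest value the maximum control may take: the current minimum threshold.
    int maxControlLowerBound() const { return minSimilarity; }

    // Changing the minimum moves the maximum control's lower bound; if the new
    // minimum exceeds the current maximum, the maximum is raised to match.
    void setMinSimilarity(int value)
    {
        minSimilarity = std::clamp(value, 0, kUpperLimit);
        maxSimilarity = std::max(maxSimilarity, minSimilarity);
        assert(minSimilarity <= maxSimilarity);
    }

    // The maximum is kept inside [minSimilarity, kUpperLimit].
    void setMaxSimilarity(int value)
    {
        maxSimilarity = std::clamp(value, minSimilarity, kUpperLimit);
    }
};
```

Because `setMinSimilarity` raises the maximum instead of rejecting the input, the user can move the lower threshold freely and the range always stays valid.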
--- Comment #6 from [hidden email] ---
Mario,

Your patch is now applied to the current implementation and will be available in the next release, 5.4.0.

The next step for me is to review your new patch from bug #372217. Note that your next patch should certainly close bug #302923 (please confirm).

In parallel, can you check what can be done to further improve the duplicate search tool regarding:

- bug #261417: the search album counter is not updated.
- bug #353331: this one can most likely be closed, as we can limit the search to a specific physical or virtual album. Please just review to confirm.
- bug #207188: as I remember, the algorithm that computes image fingerprints takes color content into account (otherwise it would make no sense...). So I'm not sure this report is valid...
- bug #274360: I cannot figure out why some kinds of image types would be ignored. All image formats supported by digiKam are processed during fingerprint computation and searches.

Again, thanks for your contributions. I appreciate the quality of your patches, which are a pleasure to review.
--- Comment #7 from [hidden email] ---
>Next step for me is to review your new patch from bug #372217. Note that your
>next patch must close certainly bug #302923 (please confirm).

To answer myself: your patch from bug #372217 cannot solve bug #302923, because it is dedicated to sorting the search albums in the left sidebar, not the icon view in the center. I would appreciate a patch to the icon-view model/view that makes it possible to sort by similarity level.

Thanks in advance,
Gilles Caulier
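Sorting the icon view by similarity level, as requested in comment #7, comes down to a descending comparison on each item's similarity score. The sketch below uses a hypothetical `SimilarItem` record and `std::stable_sort`; in digiKam's Qt model/view code such a comparison would more plausibly live in an override of `QSortFilterProxyModel::lessThan()`, but that is an assumption about where it could fit, not a description of any actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for one icon-view entry; digiKam's real model classes differ.
struct SimilarItem
{
    long long imageId;
    double    similarity; // assumed fraction in [0.0, 1.0] relative to the reference image
};

// Order items from most to least similar. std::stable_sort keeps the existing
// relative order of items with equal similarity, so ties stay predictable.
void sortBySimilarityDescending(std::vector<SimilarItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const SimilarItem& a, const SimilarItem& b)
                     {
                         return a.similarity > b.similarity;
                     });
}
```

A stable sort is a deliberate choice here: items with equal scores keep whatever secondary order the view already had (for example by name or date), instead of being shuffled on every refresh.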
--- Comment #8 from Mario Frank <[hidden email]> ---
Hey Gilles,

Many thanks for your assessment of the quality of my patches. I will try to fix what I can. Some of the "bugs" do not seem hard to fix; others could be more complex.
--- Comment #9 from Mario Frank <[hidden email]> ---
By the way: the CCMAIL is incorrect. The correct one is [hidden email]. If the dot is a problem, just use [hidden email].
Wolfgang Scheffner <[hidden email]> changed:

What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |[hidden email]

--- Comment #10 from Wolfgang Scheffner <[hidden email]> ---
Before I update the doc accordingly: shouldn't the label now be changed to "Similarity range", or at least "Thresholds"?
--- Comment #11 from Mario Frank <[hidden email]> ---
I agree, Wolfgang. "Similarity range" is a better description here. Moreover, I just realised that it is not possible to set a range in the maintenance dialog. I will open a new report covering both parts and submit a patch.