https://bugs.kde.org/show_bug.cgi?id=376661
Bug ID: 376661
Summary: When importing ~200,000 video files Digikam crashes in about 2-5 seconds of starting.
Product: digikam
Version: 5.4.0
Platform: MS Windows
OS: MS Windows
Status: UNCONFIRMED
Severity: crash
Priority: NOR
Component: Import-Scanner
Assignee: [hidden email]
Reporter: [hidden email]
Target Milestone: ---

I added a bunch of folders that contain ~200,000 video files and hit refresh to scan them into the database. Digikam crashes after about 2-5 seconds. This is repeatable. Digikam will not add 200,000 video files.

--
You are receiving this mail because: You are the assignee for the bug.
[hidden email] changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |[hidden email]

--- Comment #1 from [hidden email] ---
Is this reproducible with the 5.5.0 pre-release?

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

Gilles Caulier
--- Comment #2 from Poz <[hidden email]> ---
Yes, the same thing happens with the 5.5.0 pre-release.
[hidden email] changed:

What    |Removed |Added
----------------------------------------------------------------------------
Version |5.4.0   |5.5.0

--- Comment #3 from [hidden email] ---
Maik,

Which solution can we apply to fix this entry?

1/ Disable autocompletion in the tree search field, and ask the Qt team to open up the QCompleter API so that we can use the methods that are currently private.

2/ Reuse KCompletion by backporting its classes into digiKam core, with an API adjusted for digiKam.

Gilles
Maik Qualmann <[hidden email]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |[hidden email]

--- Comment #4 from Maik Qualmann <[hidden email]> ---
Gilles, I think you mean Bug 368468. This bug has a different cause, possibly a crash in Exiv2.

Regarding Bug 368468: QCompleter is not the performance problem; that part is fixed with a QTimer. The main problem is that adding items to the QTreeView gets slower and slower.

Maik
--- Comment #5 from Maik Qualmann <[hidden email]> ---
An edit function for the first few minutes after posting a comment would not be bad...

Maik
--- Comment #6 from [hidden email] ---
Poz,

We need a debugger backtrace to investigate in detail. See this page for instructions: https://www.digikam.org/contrib

Gilles Caulier
--- Comment #7 from [hidden email] ---
Maik,

In comment #4 you mention that adding items to the QTreeView gets slower. Where exactly is the problem located? Did you profile the execution time with Valgrind?

Is it in the digiKam tree view item widget implementation?
In the digiKam model populated from the DB?
In the DB interface that fetches the data hosted in the widget?
In the Qt5 implementation?

Gilles
--- Comment #8 from [hidden email] ---
Maik,

At my office I wrote a fast shared-memory mapping viewer in Qt5 using the QTreeView/item classes. I create the items in the tree view with no data, and I populate all items in a separate thread because it takes a lot of time. At the end I call a tree view update in the main thread (X11 is not re-entrant).

It's very fast, and the number of items in the tree view is very large (more than 1000 entries).

Can we do the same in digiKam?

Gilles
--- Comment #9 from Poz <[hidden email]> ---
Still running the 5.5.0 pre-release.

Okay, so I went to https://www.digikam.org/contrib and tried a few things with limited success. I will try more tomorrow.

First, gdb on Windows is not working well. I type in 'catch throw' and get back 'Catchpoint 1 (throw)', which seems good. Then I type in 'run' and get back:

Starting program:
No executable specified, use `target exec'.

Not sure what to do here??

The second thing I tried is the third-party debug tool from Sysinternals:
https://technet.microsoft.com/en-us/sysinternals/bb896647.aspx

Looks like some bad stuff happening for about 10.2 seconds before it crashes:

00000009 1.02899146 [17040] digikam.general: Trying to load Embedded preview with libraw
00000010 1.02921200 [17040] digikam.rawengine: Failed to load embedded RAW preview
00000011 1.02923596 [17040] digikam.general: Trying to load half preview with libraw
00000012 1.02927971 [17040] digikam.general: Trying to load Embedded preview with Exiv2
00000013 1.04443121 [17040] digikam.dimg: "Removed file path and name" : QIMAGE file identified
00000014 1.04464126 [17040] digikam.dimg.qimage: Can not load " "Removed file path and name" " using DImg::QImageLoader!
00000015 1.04492271 [17040] digikam.general: mimetype = "" ext = "MOV"
00000016 1.04507148 [17040] digikam.general: Cannot create thumbnail for "Removed file path and name"
00000017 1.04512084 [17040] digikam.general: Thumbnail is null for "Removed file path and name"

I removed the file path and name for privacy reasons. This repeats for various videos until the crash, and takes about 2/10ths of a second per loop (judging from the snippet I gave you). The video file types vary: avi, flv, mov, mp4, and more. The example above just happens to be mov.

This happens before the loops start, when I hit refresh:

00000005 0.91890234 [17040] digikam.general: Using 8 CPU core to run threads
00000006 0.91933465 [17040] digikam.general: Action Thread run 1 new jobs
00000007 0.93396312 [17040] digikam.general: Cancel Main Thread
00000008 0.93400776 [17040] digikam.general: One job is done

I will try to get more info tomorrow.

Also, two other questions. I turned off the album sync at startup because it was crashing. How do I start it manually? I thought that is what refresh does, but apparently refresh only updates the thumbnails.

Also, is it possible to do the FUZZY search on the thumbnails to find potential duplicates? This is my real intent. I want to cut those 200,000 videos down to 100,000. If not, is this a future feature? Can it be one? High demand, I think.
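Note: the timing of one pass through the loop can be read directly from the DebugView output quoted above. The following is a minimal sketch, assuming the second column is seconds since the start of the capture; the names `parse_line` and `elapsed` are made up for this illustration and are not part of digiKam or DebugView.

```python
import re

# Line layout as seen in the excerpt above:
# <sequence> <timestamp> [<pid>] <category>: <message>
LINE_RE = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+\[(\d+)\]\s+([\w.]+):\s+(.*)$")


def parse_line(line):
    """Return (seq, seconds, pid, category, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    seq, seconds, pid, category, message = match.groups()
    return int(seq), float(seconds), int(pid), category, message


def elapsed(lines):
    """Time between the first and the last parseable line (assumed to be seconds)."""
    stamps = [parsed[1] for parsed in map(parse_line, lines) if parsed is not None]
    return stamps[-1] - stamps[0] if stamps else 0.0


# Three lines copied from the excerpt: start, middle and end of one pass.
sample = [
    '00000009 1.02899146 [17040] digikam.general: Trying to load Embedded preview with libraw',
    '00000015 1.04492271 [17040] digikam.general: mimetype = "" ext = "MOV"',
    '00000017 1.04512084 [17040] digikam.general: Thumbnail is null for "Removed file path and name"',
]

print(round(elapsed(sample), 4))  # prints 0.0161
```

Under that assumption, the quoted pass from line 9 to line 17 spans roughly 0.016 seconds. Feeding the full log to `elapsed` gives the total time up to the crash.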
--- Comment #10 from Poz <[hidden email]> ---
I spent some more time trying to figure out how to provide more data. While running the debugger I also found this line:

[11624] digikam.metaengine: Exiv2 ( 3 ) : Xmp.video.Metadata dataLength was found to be larger than 5000 entries considered invalid; not read.

If there is anything else I can do to help debug this, let me know! Thank you.
--- Comment #11 from [hidden email] ---
The XMP warning is not the problem, but it is known that Exiv2 has many problems with video files.

I recommend not trying to scan your huge collection in one go. Start with a fresh database and add the video files in chunks, step by step, until the crash appears. The goal is to isolate the file that triggers the problem. After that, report the problem to the Exiv2 bugzilla, with the identified video file attached for investigation.

As the digiKam Windows installer includes the current Exiv2 source code, we can rebuild digiKam for Windows with the latest fixes from Exiv2.

As for your problem with GDB under Windows: if the command-line version won't start digiKam (even though it works on my VM with Windows 7), you need to open a console and go to the directory where the gdb and digikam executables are installed (it's the same directory). After that it's simple. See this generic page for details:

http://stackoverflow.com/questions/4671900/how-do-i-use-the-mingw-gdb-debugger-to-debug-a-c-program-in-windows

Gilles Caulier
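Note: the chunk-by-chunk isolation suggested above can be organised as a bisection. This is only a sketch. `crashes` is a hypothetical callback that stands for "start from a fresh database, add this batch of files, scan, and report whether digiKam crashed"; digiKam itself offers no such function, so in practice each call is a manual test. The sketch also assumes that a single file is enough to trigger the crash.

```python
def find_crashing_file(files, crashes):
    """Narrow `files` down to one file for which `crashes([file])` is True.

    `crashes(batch)` is a hypothetical callback that returns True when
    scanning `batch` into a fresh database reproduces the crash.
    Returns None if the full list does not crash, or if the crash only
    happens with a combination of files.
    """
    if not crashes(files):
        return None
    while len(files) > 1:
        middle = len(files) // 2
        first_half, second_half = files[:middle], files[middle:]
        if crashes(first_half):
            files = first_half
        elif crashes(second_half):
            files = second_half
        else:
            # Neither half crashes alone, so bisection cannot isolate one file.
            return None
    return files[0]


# Simulated run: one known-bad file hidden among 200,000 names.
all_files = [f"video_{i:06d}.mov" for i in range(200_000)]
bad_file = "video_123456.mov"
print(find_crashing_file(all_files, lambda batch: bad_file in batch))
# prints video_123456.mov
```

Because the list is halved on every round, about 18 rounds are enough for 200,000 files, which is far fewer manual imports than adding fixed-size chunks one after another.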
--- Comment #12 from [hidden email] ---
>Also, is it possible to do the FUZZY search on the thumbnails to find
>potential duplicates? This is my real intent. I want to cut those 200,000
>videos down to 100,000.
>If not, is this a future feature? Can it be one? High demand, I think.

Poz,

The fuzzy search currently works only with still images. A similar function for video would need an algorithm that creates a fingerprint of the first frame of the video, so that it can be compared later against the DB.

This is how the fuzzy tool works today: a simplified wavelet matrix is computed from the still image, and the matrices are compared with each other to find similarities.

For video we need a new matrix that carries the spatial information of the video. Not impossible, but complex to write and test.

Gilles Caulier
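Note: the idea of "compute a simplified wavelet matrix per image, then compare matrices" can be illustrated with a toy Haar decomposition. This is not digiKam's implementation, which is not shown in this thread. The function names, the `keep` parameter and the overlap-based score are assumptions made for the sketch, and the input is a small grayscale image given as a list of rows whose width and height are powers of two.

```python
def haar_1d(values):
    """Full 1-D Haar decomposition of a list whose length is a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + details
        n = half
    return out


def haar_2d(image):
    """Apply the 1-D decomposition to every row, then to every column."""
    rows = [haar_1d(row) for row in image]
    columns = [haar_1d(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def signature(image, keep=8):
    """Positions and signs of the `keep` largest non-zero coefficients.

    The coefficient at (0, 0) is the overall average brightness and is ignored.
    """
    coefficients = haar_2d(image)
    cells = [
        (abs(value), r, c, value > 0)
        for r, row in enumerate(coefficients)
        for c, value in enumerate(row)
        if (r, c) != (0, 0) and value != 0
    ]
    cells.sort(reverse=True)
    return {(r, c, positive) for _, r, c, positive in cells[:keep]}


def similarity(sig_a, sig_b):
    """Share of signature entries the two images have in common (0.0 to 1.0)."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


left_right = [[10, 10, 200, 200] for _ in range(4)]             # dark left, bright right
brighter = [[p + 10 for p in row] for row in left_right]        # same picture, brighter
top_bottom = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]         # dark top, bright bottom

print(similarity(signature(left_right), signature(brighter)))    # prints 1.0
print(similarity(signature(left_right), signature(top_bottom)))  # prints 0.0
```

The brighter copy scores 1.0 because only the ignored average coefficient changes, while the rotated layout shares no coefficient with the original. A video fingerprint would have to add the time dimension on top of this, which is the hard part discussed in the thread.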
--- Comment #13 from Poz <[hidden email]> ---
Are the thumbnails not readily available to do the fuzzy search on? I know they are not the biggest, but I think they are big enough, or perhaps there is a setting to render them at a slightly higher resolution... That is how I imagined it would work anyway: since the thumbnails would already be generated, half the work of fuzzy searching videos is already done...
Mario Frank <[hidden email]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |[hidden email]

--- Comment #14 from Mario Frank <[hidden email]> ---
(In reply to Poz from comment #13)
> Are the thumbnails not readily available to do the fuzzy search on? I know
> they are not the biggest but I think they are big enough, or if there is a
> setting to render them a slightly higher resolution... That is how I
> imagined it would work anyways, since the thumbnails would already be
> generated, half the work is already done to fuzzy search videos...

Hey Poz,

Sadly, it is not that easy. The fuzzy search creates a signature from images, and this does not carry over to videos. Videos are considerably more complex, because the signature must be created in a uniform way for all videos. If videos have black frames at the beginning, the search would lead to results which are, let's say, rubbish.

The most stable way I see is to take, from every video, the first frame that is not plain, i.e. not single-coloured. But this means we would have to generate images until we find the first appropriate frame, which would slow down fingerprint generation significantly.

A stable implementation is not trivial here. I will think about it more closely over the weekend.

Best,
Mario
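Note: "take the first frame that is not plain" could look roughly like the sketch below. Decoding the video is left out: `frames` is assumed to be any iterable of grayscale frames (lists of pixel rows) produced by some decoder. The names and the `tolerance` and `max_frames` parameters are invented for the illustration.

```python
def is_plain(frame, tolerance=8):
    """True when all grayscale pixels of the frame lie within `tolerance` of each other."""
    pixels = [pixel for row in frame for pixel in row]
    return max(pixels) - min(pixels) <= tolerance


def first_non_plain_frame(frames, max_frames=250):
    """Return (index, frame) of the first non-plain frame, or None.

    Gives up after `max_frames` frames so that a video that is dark
    throughout does not force decoding the whole file.
    """
    for index, frame in enumerate(frames):
        if index >= max_frames:
            break
        if not is_plain(frame):
            return index, frame
    return None


black = [[0, 0], [0, 0]]
almost_black = [[0, 3], [2, 1]]      # noise within the tolerance still counts as plain
content = [[0, 120], [200, 40]]

print(first_non_plain_frame([black, almost_black, content])[0])  # prints 2
```

The cost concern raised above is visible here: every frame before the first usable one has to be decoded and inspected, so videos with long fades pay for each skipped frame.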
--- Comment #15 from Mario Frank <[hidden email]> ---
Hi again,

This will be quite a long text, sorry. But I want to make the problems as clear as possible. I thought about the fuzzy search for videos a bit more during my train journey.

In fact, even the first non-plain frame is worthless. If a user really wants to use digiKam as a catalogue for videos (which is not the primary scope of digiKam, IMHO), he will potentially have videos that have the same beginning, i.e. the same intro, but are different videos. Thus the first non-plain frame will also potentially lead to rubbish.

I remember that I found some tools for finding video duplicates. The process they applied was to take the first n images of a video and compare them to all others. A rather bad process IMO, as with m videos you generate n*m images and then have to run the comparison. This is awfully bad from the point of view of complexity theory, and in practice the process is, as can be expected, awfully slow. Nevertheless, it is probably the best way to really recognise duplicate videos.

So one way could be to generate a fingerprint over the first or last n images (which slows down fingerprint generation extremely). This is still not robust, as many videos may have the same intro (at least for the first m seconds, i.e. about m*25 frames). Usual intros take many seconds, so a *rather* stable approach would be to take 1000 frames. As you can imagine, this is a large amount of data to compute fingerprints for. Just imagine your 200,000 videos: fingerprinting them would mean generating 200,000,000 images. Every image must be generated, which is not a constant-time process but at least a linear-time one. So even with 1000 videos I would expect computation time to be measured in hours, not minutes.

Let's look at it from the other side: outros are far more distinct than intros, so a lower number n can be taken, e.g. 100. This reduces the time quite a lot, but is probably still not satisfying.

If there are no intros/outros, or only short ones, a few images should be sufficient and the process could work quite well. But we cannot predict how the videos are structured.

The FPS count may/will differ from video to video, so working on frames explicitly may again lead to low-quality results. The best way would therefore be to take the first/last n seconds, and then the complexity cannot really be estimated. Also, I think users should decide for themselves how many seconds are taken (configuration) and whether the beginning or the end should be taken (configuration again).

So, *if* this feature should be implemented, I see the following options for users:

1) Take the first non-plain frame for fingerprinting (fast, probably not suitable for e.g. cinema movies)
2) Take the first n seconds for fingerprinting (probably awfully slow, may be suitable for e.g. cinema movies, overkill for self-produced movies)
3) Take the last n seconds for fingerprinting (probably slow, probably suitable for e.g. cinema movies, less overkill for self-produced movies)

In more precise algorithmic terms, we would need an adaptation of the fingerprints maintenance stage:

Option 1: take the first non-plain frame for video fingerprints
Option 2: take the Option(number n) Option(first,last) seconds for video fingerprinting.

Changing the current options *must* trigger deletion of the current video fingerprints, as otherwise fingerprints generated in different ways would coexist, which leads to wrong results (unless "rebuild all fingerprints" is chosen). Then the fuzzy search could probably work without adaptations, but I am not completely sure whether it would work out of the box.

Best,
Mario
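Note: the frame counts and the "changing options must invalidate existing fingerprints" rule from the comment above can be written down as a small sketch. Nothing here exists in digiKam. The class, its fields and the 25 fps default are assumptions taken from the figures in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoFingerprintOptions:
    """Hypothetical settings mirroring the options listed in the comment above."""
    mode: str = "first_non_plain"   # or "seconds"
    seconds: int = 0                # only used when mode == "seconds"
    from_end: bool = False          # False: first n seconds, True: last n seconds


def frames_to_decode(video_count, seconds, fps=25.0):
    """Number of frames that must be decoded for the whole collection."""
    return round(video_count * seconds * fps)


def must_rebuild(stored, current):
    """Fingerprints made with different options cannot be compared, so any change forces a rebuild."""
    return stored != current


# 1000 frames at 25 fps correspond to 40 seconds per video.
print(frames_to_decode(200_000, seconds=40))   # prints 200000000
# 100 frames at 25 fps correspond to 4 seconds per video.
print(frames_to_decode(200_000, seconds=4))    # prints 20000000

old = VideoFingerprintOptions(mode="seconds", seconds=40)
new = VideoFingerprintOptions(mode="seconds", seconds=4, from_end=True)
print(must_rebuild(old, new))                  # prints True
```

Expressing the sample length in seconds rather than frames keeps the setting meaningful across videos with different frame rates. The real frame rate of each file would replace the fixed `fps` value.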
--- Comment #16 from [hidden email] ---
Mario,

At my office we capture infrared fixed-shot sequences of events in a Tokamak, to catch physical malfunctions during experiments. A video can last more than 2 minutes in HD, no more. More than 20 experiments can be run in a day. All videos are stored losslessly in a database. There are no camera movements; only the plasma inside the machine changes the content. Depending on the experiment parameters, the video content will be different.

We have a process to recognize similar videos in the database. It is written in Matlab. As far as I know, the process cuts the first frames where there is nothing (black hole) until the light begins. After that, a wavelet fingerprint is computed from a flat image taken from some frames inside the video. The video is not analyzed as a whole; instead, the algorithm tries to detect the edge of a change and adjusts the fingerprint by parsing a section of the movie. This is how the spatial (temporal) dimension is processed.

For each file, the fingerprint gives the average similarity of the video compared to the others. When physicists want to look into experiments, they just take a video made with given Tokamak settings and check whether another one is similar. The goal is to see whether physical events are similar even if the parameters are different.

Of course, it's a special use case, as the videos are static shots with changing content, but I think the process is not too bad if we want to apply it to a small section of DSC movies.

Note: I know just the theory. The code is not available, of course.
--- Comment #17 from Poz <[hidden email]> ---
Wow, the discussion here is fantastic. Thank you for the time and thought!

So yes, the approach I suggested of just using the thumbnails is clearly not robust enough given the wide array of video content out there. I think a lot of the problems come from very uniform videos, for example ones with standard intros or outros.

My case has very non-uniform videos (without any intros or outros), where I can run through Windows Explorer and find duplicates myself simply by looking at the thumbnails, so I know from simple observation that at least 20% are duplicates. The problem is that it is too much work to go through that many files and click each one individually. I have used Digikam before on photos for duplicates and was amazed at how well it worked, so naturally I thought, 'man, I wish I could get digikam to access these thumbnails for me, I could get rid of 95%+ of these duplicates in a day'. I know there could be false positives, but I could live with 1% or something like that. To further cut down on false positives there could be a video length option of +-X seconds (default at 2 or something).

I currently use http://www.alldup.de/alldup_help/alldup.php

The content method works very well, with, I would say, less than 0.001% false positives. But it misses very, very much. It can take up to 48 hours to run, but it builds a database, so it only compares new files added to the search.

I even use the file size method. For large files this works very well. Smaller files (<10 MB?) tend to have more false positives. Unfortunately, due to different compression and file types, this does not catch them all either.

I think in the end, until computer hardware is faster, video duplicate searches will require a number of different methods and some user input. Until then, that is what we have to work with and around. I was just hoping for another way to slim down this video database. Thumbnails seemed like low-hanging fruit.
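Note: the "+-X seconds" video length option proposed above amounts to a post-filter on candidate duplicate pairs. A minimal sketch, assuming the candidate pairs come from some earlier matching step (thumbnail similarity, file size, and so on) and that the duration of each file in seconds is already known. The function name and the data layout are illustrative only.

```python
def filter_by_duration(pairs, durations, tolerance=2.0):
    """Keep only candidate pairs whose lengths differ by at most `tolerance` seconds.

    `pairs` is a list of (file_a, file_b) tuples; `durations` maps each
    file name to its length in seconds.
    """
    return [
        (a, b)
        for a, b in pairs
        if abs(durations[a] - durations[b]) <= tolerance
    ]


durations = {"holiday.avi": 61.0, "holiday_copy.mp4": 61.8, "concert.mov": 245.0}
candidates = [("holiday.avi", "holiday_copy.mp4"), ("holiday.avi", "concert.mov")]

print(filter_by_duration(candidates, durations))
# prints [('holiday.avi', 'holiday_copy.mp4')]
```

The default of 2 seconds follows the value suggested in the comment. Re-encoded copies can differ slightly in length, so a tolerance of zero would discard real duplicates.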
--- Comment #18 from [hidden email] ---
OK, I disabled video metadata support in the Exiv2 shared library used by the Windows installer. The new version can be downloaded from the GDrive repository in a few minutes:

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

Can you reproduce the problem with this version? Typically, the video files will be registered in the database, but the video metadata will not be parsed to populate the database.

Thanks in advance for your feedback.

Gilles Caulier
--- Comment #19 from Poz <[hidden email]> ---
I tried the version you just posted, with video metadata support disabled in the Exiv2 shared library.

It allows me to import all of the video files! Success!

However, they all appear as gray boxes with no thumbnails. Perhaps this is a separate issue?