I recently (re-)discovered digiKam for my huge collection of pictures. Initially I had a few issues getting the databases completed due to hangs and crashes. Almost all of them had an ugly cause: over time, some files on an external hard disk had become corrupted without notice. I fixed the apparent show-stoppers by monitoring the error messages on the command line from which I started the program and moving/removing the bad files, but that is a clumsy workflow. Moreover, I realized that there are more errors in the collection, e.g. image files with 0 bytes, apparently good files that "file xy.jpg" merely reports as "data", garbage file names, and more, all scattered across several TB.
I've been thinking about how to systematically analyze all files in order to identify the broken ones and recollect them from other backups. I'm thinking of scans with "find -size 0" or "file xy.cr2" to read the magic bytes, but I suspect I will at least have to read the EXIF data as well (maybe even load the image?), and I have to handle jpg, gif, ppm, raw data from different cameras, whatever. digiKam already has all the tools on board, and while scanning for new items it already reports (at least some) bad files.
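To illustrate what I mean by a systematic scan, here is a rough sketch combining the two checks mentioned above (zero-byte files and mismatched magic bytes). The accepted "file" output prefixes and the extension list are assumptions and would need to be extended for each camera format:

```shell
#!/bin/bash
# Hypothetical sketch of a corruption scan over a photo tree.
# Flags (a) zero-byte files and (b) image files whose content,
# according to file(1), does not look like any known image format.
scan_photos() {
    local root="$1"
    # (a) zero-byte files are certainly broken
    find "$root" -type f -size 0 -printf '%p: ZERO BYTES\n'
    # (b) non-empty files with image extensions but implausible magic bytes
    find "$root" -type f -size +0c \
         \( -iname '*.jpg' -o -iname '*.gif' -o -iname '*.ppm' -o -iname '*.cr2' \) -print0 |
    while IFS= read -r -d '' f; do
        t="$(file -b "$f")"
        case "$t" in
            JPEG*|GIF*|Netpbm*|Canon*|TIFF*) ;;   # plausible image content, skip
            *) printf '%s: BAD MAGIC (%s)\n' "$f" "$t" ;;
        esac
    done
}
```

This only covers the cheap checks; truncated-but-valid-header files would additionally need an EXIF read or a full decode to be caught, which is exactly where digiKam's existing machinery would help.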
Would it be possible to write a dedicated log file for such cases along with a reason for the failure?
I assume I'm not the only one facing data loss, so I guess this would be helpful for others as well. I'm also thinking that a sort filter "bad file" might be helpful ... at least if the broken files happen to be stored in the database as well (are they?). It would allow subsequent selection and handling with a script (it took me some time to find out how that works, but it is a nice feature).