Hi,
one important issue with digital images is the question of backup
(e.g. CD/DVD or other optical media, several separate hard disks,
off-site hard disks, ...). Another (maybe often overlooked?) aspect is
whether the data (both on the master disk and on the backups) are
still correct. It may well happen that files simply get corrupted on
the hard disk. (I recently had such an experience, where fortunately
an old backup on CD allowed me to recover the few affected files.)

So my question is: how do you ensure the correctness of your data?
What methods are usable, and could one maybe integrate/provide part
of the needed tools inside digikam?

One approach might be to use a hash value (e.g. md5):

- Digikam could compute a hash value for every image and store it
  inside digikam's database. This would allow an additional tool to
  periodically check for any possible changes (= corruption) of images.
- Of course, if an image gets changed (e.g. by adding comments,
  ratings, tags or other metadata), the hash needs to be recomputed by
  the photo management application. ((Another possibility is to only
  compute the hash of the image data itself, but I think that a hash
  of the full file is better.))
- Also, one might even think of checking the hash before editing an
  image, to ensure that it did not get corrupted. ((And maybe, for the
  paranoid: even after saving a file one could compare it with the
  data in memory?))

For backups one could add a file with all the hash values for the
files, or each image file could be supplemented by a *.hash file.
Again, with a (simple) tool these hash values could be recomputed and
compared.

While maybe not yet fully sophisticated, this would already be better
than blindly believing that all files on the hard disk are still ok ;-).

Are there any other important aspects digikam would need in order to
enable checks of data integrity?

Note that this is to some extent related to
- "Md5 Checksums to identify pictures"
  http://bugs.kde.org/show_bug.cgi?id=110066
- "Uniquely identifying each image in a collection of images"
  http://bugs.kde.org/show_bug.cgi?id=125736
- "backup on dvd (and maybe sync with dvd-ram?)"
  http://bugs.kde.org/show_bug.cgi?id=113715

Any comments, thoughts, suggestions are very welcome!

Best, Arnd
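As an illustration of the hash-and-store idea above, a minimal sketch
assuming Python and a standalone SQLite file (the database path, table
and column names are made up for illustration; digikam's real database
is not touched here):

    import hashlib
    import os
    import sqlite3

    ROOT = os.path.expanduser("~/Pictures")          # hypothetical album root
    DB = os.path.expanduser("~/.image-hashes.db")    # hypothetical hash database

    def md5_of_file(path, chunk_size=1 << 20):
        """Compute the md5 of the full file, reading it in chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    con = sqlite3.connect(DB)
    con.execute("""CREATE TABLE IF NOT EXISTS filehashes
                   (path TEXT PRIMARY KEY, md5 TEXT, mtime REAL)""")

    for dirpath, _dirs, files in os.walk(ROOT):
        for name in files:
            path = os.path.join(dirpath, name)
            con.execute("INSERT OR REPLACE INTO filehashes VALUES (?, ?, ?)",
                        (path, md5_of_file(path), os.path.getmtime(path)))

    con.commit()
    con.close()

A separate check pass would re-run md5_of_file() for every stored row
and flag files whose hash changed although the recorded modification
time did not.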
On Wednesday 16 January 2008, Arnd Baecker wrote:
> Hi,
>
> one important issue with digital images is the question of backup
> (e.g. CD/DVD or other optical media, several separate hard disks,
> off-site hard disks, ...).

I use rsync to an off-site server. My mother uses rsync to a removable
disk. I work for a company that will be providing affordable remote
backup as a service to end users.

> Another (maybe often overlooked?) aspect is whether the data (both on
> the master disk and on the backups) are still correct. It may well
> happen that files simply get corrupted on the hard disk.

Sure, that's unlikely but not impossible.

> (I recently had such an experience, where fortunately an old backup
> on CD allowed me to recover the few affected files.)
>
> So my question is: how do you ensure the correctness of your data?

Use proper hardware with ECC memory. No overclocked/non-ECC systems.

> What methods are usable, and could one maybe integrate/provide part
> of the needed tools inside digikam?
>
> One approach might be to use a hash value (e.g. md5):
>
> - Digikam could compute a hash value for every image and store it
>   inside digikam's database. This would allow an additional tool to
>   periodically check for any possible changes (= corruption) of
>   images.

But the application should not do this.

The reason why we use a modern OS on modern hardware is that the OS
and the hardware work together to provide us with everything from
hardware abstraction to error correction.

ZFS provides the extra layer of error detection, along with error
correction, that you seek. I believe time would be better spent
lobbying for a port of that technology to Linux (one way or another -
yes, I am aware that there are complications), rather than trying to
patch a little bit of that functionality into a single application
(which will not help the general case).

Anyway, that's my take on the issue. I believe any improvement that is
needed, is needed on a general, system-wide scale, not on a
single-application scale.

--
Jakob Østergaard Hegelund
We're wandering off topic a little here...
* [hidden email] [2008-01-17 09:29:59]
> Still, how would ZFS allow to check the integrity of backed-up files
> in comparison with those on the master hard-disk?

From the "above the filesystem" viewpoint ZFS doesn't provide anything
special here. You can use fingerprinting, checksums, etc.

The benefit of ZFS is that you can create a filesystem that has
redundant storage, and ZFS ensures the validity of the data in each
copy that it keeps (using checksums).

In essence, with ZFS you don't worry about the validity of the data at
the application layer; the filesystem does it for you.

dme.
--
David Edmondson, http://dme.org
On Thu, 17 Jan 2008, David Edmondson wrote:
> We're wandering off topic a little here... ;-)
>
> * [hidden email] [2008-01-17 09:29:59]
> > Still, how would ZFS allow to check the integrity of backed-up
> > files in comparison with those on the master hard-disk?
>
> From the "above the filesystem" viewpoint ZFS doesn't provide
> anything special here. You can use fingerprinting, checksums, etc.
>
> The benefit of ZFS is that you can create a filesystem that has
> redundant storage, and ZFS ensures the validity of the data in each
> copy that it keeps (using checksums).
>
> In essence, with ZFS you don't worry about the validity of the data
> at the application layer; the filesystem does it for you.

So ZFS sounds really good!

Now (trying to get back to the main topic): would some checksum
system, integrated into digikam, be useful for ensuring data integrity
of backups? I think it wouldn't be too difficult to implement
something like this (I briefly discussed it with Marcel on IRC, and
with digikam >= 0.10 such additions to the database will be easy).

Note that it might come with a bit of a speed penalty when
images/metadata get changed; however, this could be made configurable.

Any further ideas/opinions?

Best, Arnd
On Thu, 17 Jan 2008, Arnd Baecker wrote:
[...]
> Would some checksum system, integrated into digikam, be useful for
> ensuring data integrity of backups? I think it wouldn't be too
> difficult to implement something like this (I briefly discussed it
> with Marcel on IRC, and with digikam >= 0.10 such additions to the
> database will be easy). Note that it might come with a bit of a
> speed penalty when images/metadata get changed; however, this could
> be made configurable.

So, in order not to just talk about stuff but to try it out, I set up
two python scripts which

A) generate a recursive tree which contains, for each file below
   digikam's root (e.g. ~/Pictures), a corresponding md5sum *.hash
   file;

B) check, for each file in the backup, whether the checksum matches.

Interestingly, in my case this already revealed around 500 files which
did not match. (In this particular case it was essentially a user
error, because I changed the metadata (GPS info) for those files, but
without changing the file date. As I used rsync such that it would not
copy over these files, the backup went out of sync.)

So without a hash comparison I would never have noticed the
inconsistency!

Well, in my opinion we should get some tools to enable checks of data
integrity into digikam itself ...

Any thoughts/comments/suggestions/... are welcome, to flesh out the
ideas of what would be necessary/what makes sense/...!

Best, Arnd
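Roughly, scripts A) and B) could look like the sketch below (assuming
Python; the *.hash convention, the ~/Pictures root and the
command-line backup path are illustrative, not the actual script that
was later sent around off-list):

    import hashlib
    import os
    import sys

    def md5_of_file(path):
        """md5 of the complete file contents, read in chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def generate_hashes(root):
        """A) Write a <name>.hash file next to every file below root."""
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(".hash"):
                    continue
                path = os.path.join(dirpath, name)
                with open(path + ".hash", "w") as out:
                    out.write(md5_of_file(path) + "\n")

    def verify_backup(root, backup_root):
        """B) Compare each backup copy against the hash recorded for the master."""
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(".hash"):
                    continue
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                with open(os.path.join(root, rel + ".hash")) as f:
                    recorded = f.read().strip()
                if md5_of_file(os.path.join(backup_root, rel)) != recorded:
                    print("MISMATCH:", rel)

    if __name__ == "__main__":
        # usage: python check_backup.py /path/to/backup
        generate_hashes(os.path.expanduser("~/Pictures"))
        verify_backup(os.path.expanduser("~/Pictures"), sys.argv[1])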
On Jan 21, 2008 1:44 PM, Arnd Baecker <[hidden email]> wrote:
> On Thu, 17 Jan 2008, Arnd Baecker wrote:
> [...]

Hello Arnd,

What options are you passing to rsync? If you give it the '-c' option,
rsync will skip based on a checksum instead of mod-time and size. This
would at least make your backup consistent with your master. However,
it would not avoid the original-corrupted-then-backed-up issue you
brought up earlier.

As I think about this, it sounds like implementing an SCM. Basically,
you want to know whether a file has changed on disk, with or, in your
case, without intention. In theory, when you have a new file you would
'check it in' to the picture repository. If you make changes, you
'check in' the new version of the file. In your case a "check-in"
would be to create a check-sum of the file.

This leads me to thinking about the "Versioned image" request that is
already in digikam. Perhaps a single solution would handle both cases?

Best Regards,
Gerry
On Monday 21 January 2008, Arnd Baecker wrote:
> [...]
> So, in order not to just talk about stuff but to try it out, I set up
> two python scripts which
>
> A) generate a recursive tree which contains, for each file below
>    digikam's root (e.g. ~/Pictures), a corresponding md5sum *.hash
>    file;
>
> B) check, for each file in the backup, whether the checksum matches.
>
> Interestingly, in my case this already revealed around 500 files
> which did not match.
> [...]

Arnd, can you send me the script? I'd like to try too.

I just read that strigi is doing exactly what we want, comparing files
with sha1. Maybe sha1 is faster than md5?

Strigi creates a sha1 of every file and stores it in its DB. Then it
checks for file date changes and, if yes, runs sha1 to see if the file
really has changed before grepping it thoroughly.

Gerhard
--
><((((º> ¸.·´¯`·... ><((((º> ¸.·´¯`·...¸ ><((((º>
http://www.gerhard.fr
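The strigi strategy Gerhard describes (cheap file-date check first,
the expensive rehash only when the date changed) could be sketched
roughly like this, assuming Python and an in-memory dict standing in
for strigi's database:

    import hashlib
    import os

    def sha1_of_file(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def files_to_reindex(paths, known):
        """Yield files whose content really changed.

        `known` maps path -> (mtime, sha1) from the previous run; the
        cheap mtime comparison gates the expensive sha1 recomputation.
        """
        for path in paths:
            mtime = os.path.getmtime(path)
            old = known.get(path)
            if old is not None and old[0] == mtime:
                continue                  # date unchanged: assume content unchanged
            digest = sha1_of_file(path)
            if old is None or old[1] != digest:
                yield path                # new file, or content really changed
            known[path] = (mtime, digest)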
On Mon, 21 Jan 2008, Gerhard Kulzer wrote:
[...]
> Arnd, can you send me the script? I'd like to try too.

Done (off-list, it is really not meant for general consumption ... ;-)

> I just read that strigi is doing exactly what we want, comparing
> files with sha1. Maybe sha1 is faster than md5?

No idea. Maybe we should do a speed test at some point ;-)

> Strigi creates a sha1 of every file and stores it in its DB. Then it
> checks for file date changes and, if yes, runs sha1 to see if the
> file really has changed before grepping it thoroughly.

Looking at
  http://strigi.sourceforge.net/?q=features
it does not seem to support images?

I don't yet fully see how strigi will finally fit into "the" solution;
this is definitely something to look at in more detail! Thanks for the
pointer.

Best, Arnd
On Mon, 21 Jan 2008, Gerry Patterson wrote:
> On Jan 21, 2008 1:44 PM, Arnd Baecker <[hidden email]> wrote:
> [...]
>
> Hello Arnd,
>
> What options are you passing to rsync? If you give it the '-c'
> option, rsync will skip based on a checksum instead of mod-time and
> size. This would at least make your backup consistent with your
> master.

Yes, I should have used that. I did not do so because I feared that
this would take much longer, but I never verified this belief ...

> However, it would not avoid the original-corrupted-then-backed-up
> issue you brought up earlier.

It seems to be something which happens more often than one thinks.
At least Gerhard told me that he has this problem frequently ...

> As I think about this, it sounds like implementing an SCM. Basically,
> you want to know whether a file has changed on disk, with or, in
> your case, without intention. In theory, when you have a new file
> you would 'check it in' to the picture repository. If you make
> changes, you 'check in' the new version of the file. In your case a
> "check-in" would be to create a check-sum of the file.

Yes, this sounds like what we will need!

> This leads me to thinking about the "Versioned image" request that
> is already in digikam. Perhaps a single solution would handle both
> cases?

It depends a lot on how the versioning of images will be realized.
But this should definitely be kept in mind!

Thanks a lot for your comments!

Best, Arnd
On Tuesday 22 January 2008, Arnd Baecker wrote:
> On Mon, 21 Jan 2008, Gerhard Kulzer wrote:
> [...]
> > I just read that strigi is doing exactly what we want, comparing
> > files with sha1. Maybe sha1 is faster than md5?
>
> No idea. Maybe we should do a speed test at some point ;-)

Both sha1 and md5 are designed to make it difficult to create a file
with a specific checksum. This is necessary for applications like
digital signatures, but it usually comes at a significant performance
(and complexity) premium.

CRCs, on the other hand, were meant to catch what you're trying to
catch, and will usually be a lot faster.

A CRC64 should be more than sufficient to catch any of the mismatches
you're looking for (a CRC32, such as reported by the cksum command,
would probably be good enough for most purposes as well). And it will
definitely be much, much faster than the cryptographically secure
hashes.

--
Jakob Østergaard Hegelund
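The speed test mentioned above is easy to run; a rough sketch assuming
Python's hashlib and zlib (which algorithm wins, and by how much,
depends on the implementation and on whether the file is already in
the page cache):

    import hashlib
    import sys
    import time
    import zlib

    def crc32_of_file(path):
        crc = 0
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                crc = zlib.crc32(chunk, crc)
        return crc & 0xFFFFFFFF

    def hash_of_file(path, algorithm):
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def benchmark(path):
        for label, fn in [("crc32", crc32_of_file),
                          ("md5", lambda p: hash_of_file(p, "md5")),
                          ("sha1", lambda p: hash_of_file(p, "sha1"))]:
            start = time.time()
            fn(path)
            print("%-6s %.3f s" % (label, time.time() - start))

    if __name__ == "__main__":
        benchmark(sys.argv[1])

On a first run the timings are usually dominated by disk I/O;
repeating the run (so the file is served from the page cache) gives a
better picture of the pure checksum cost.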
On Wednesday 23 January 2008, Jakob Østergaard Hegelund wrote:
> [...]
> CRCs, on the other hand, were meant to catch what you're trying to
> catch, and will usually be a lot faster.
>
> A CRC64 should be more than sufficient to catch any of the mismatches
> you're looking for (a CRC32, such as reported by the cksum command,
> would probably be good enough for most purposes as well). And it
> will definitely be much, much faster than the cryptographically
> secure hashes.

I just came across an article by Martin Petersen from Oracle
(http://linux.sys-con.com/read/480659_1.htm, 3 pages). They implement
an end-to-end data protection mechanism using checksum metadata.

Citation: "This CRC is quite expensive to calculate compared to other
commonly used checksums. To alleviate the impact on system performance
the TCP/IP checksum algorithm is used instead. This results in an
almost negligible impact on system performance."

To me that sounds logical. I think we can stop searching here; it's
just a matter of finding out how best to implement the TCP checksum.

Gerhard
--
><((((º> ¸.·´¯`·... ><((((º> ¸.·´¯`·...¸ ><((((º>
http://www.gerhard.fr
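For reference, the checksum used by TCP/IP is the 16-bit
one's-complement Internet checksum (RFC 1071); a minimal sketch,
assuming Python 3:

    import sys

    def internet_checksum(data):
        """RFC 1071: one's-complement sum of 16-bit words, complemented."""
        if len(data) % 2:
            data += b"\x00"               # pad odd-length input with a zero byte
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
        return (~total) & 0xFFFF

    if __name__ == "__main__":
        # reads the whole file into memory; fine for a quick experiment
        with open(sys.argv[1], "rb") as f:
            print("%04x" % internet_checksum(f.read()))

Being only 16 bits wide, it detects fewer corruption patterns than a
CRC32 or CRC64; that caveat is exactly what the follow-up message
below comes back to.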
On Wednesday 23 January 2008, Jakob Østergaard Hegelund wrote:
> [...]
> A CRC64 should be more than sufficient to catch any of the mismatches
> you're looking for (a CRC32, such as reported by the cksum command,
> would probably be good enough for most purposes as well). And it
> will definitely be much, much faster than the cryptographically
> secure hashes.

I have to halfway backtrack:

Q: The TCP/IP checksum algorithm is notoriously bad at detecting
   single-bit errors. Why didn't you pick a stronger algorithm?

A: Other options were contemplated, including Fletcher and XOR. The IP
   checksum was chosen because it was already implemented. Also, the
   purpose of the checksum isn't necessarily to detect bit errors.
   Server-class systems feature error checking and correcting memory
   and buses. The main intent of the checksum is to allow verification
   that the data buffer matches the integrity metadata. And the IP
   checksum handles that fine.

Gerhard
On Tuesday 22 January 2008, Arnd Baecker wrote:
> [...]
> I don't yet fully see how strigi will finally fit into "the"
> solution; this is definitely something to look at in more detail!
> Thanks for the pointer.

Hi Arnd,
I try to summarize what we said last night on IRC, just as a public
memo.

Aim is to
a) prevent corrupt images from being saved onto disk, and to
b) detect existing corrupt files on disk
   (to prevent overwriting of potentially good backups).

Strategies like DIF and HARD are not available in the consumer market
for another couple of years, but given the increase in size, speed and
complexity of systems, consumer systems will implement some kind of
ECC (horizon ~ 3 y).

Protection on the file system level, as provided by zfs and btrfs, is
good but insufficient, as it protects the disk only and not the
transmission chain appl - OS - I/O controller - fs.

So we have to do it 'by hand' (meaning digikam).

While saving a file after modification, for a):
1. keep it in memory
2. save it to disk
3. flush disk to clear cache
   (3a. make sure all disk-internal buffers are cleared by reading
   other data the size of the disk buffer) = optional
5. run CRC checksum on file on disk and file in memory
   5a. alternative: store checksum already in metadata and save it
   with the file.
6. if mismatch, re-write the file and repeat the procedure

For problem b):
7. if 5a was used, a simple scrubbing scan can be launched, manually
   or programmed at frequency X
   7a. try to open files and look for errors produced (but this method
   is not reliable; I have images that show only the upper part, are
   corrupt and produce no error message. However, the more severe
   errors can be found)
8. generate a user alert so that one can manually check between backup
   and original.

This method may seem tedious, but has the advantage of being
independent of OS and file system; it works on nfs as well.

Gerhard
--
><((((º> ¸.·´¯`·... ><((((º> ¸.·´¯`·...¸ ><((((º>
http://www.gerhard.fr
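A minimal sketch of the save-and-verify steps 1.-6. above, assuming
Python (zlib.crc32 stands in for "CRC checksum"; the optional step 3a
and the metadata variant 5a are left out):

    import os
    import zlib

    def crc32(data):
        return zlib.crc32(data) & 0xFFFFFFFF

    def save_and_verify(path, data, max_retries=3):
        """Write `data` to `path`, flush it to disk, re-read and compare."""
        expected = crc32(data)                   # checksum of the in-memory image
        for _ in range(max_retries):
            with open(path, "wb") as f:          # step 2: save to disk
                f.write(data)
                f.flush()
                os.fsync(f.fileno())             # step 3: push past the OS cache
            with open(path, "rb") as f:
                on_disk = crc32(f.read())        # step 5: checksum the file on disk
            if on_disk == expected:
                return True
            # step 6: mismatch, re-write and repeat
        return False

Without something like the optional step 3a, the re-read is usually
served from the OS page cache rather than from the platters, so this
mainly guards the application-to-filesystem path.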
On Wed, 23 Jan 2008, Gerhard Kulzer wrote:
[... previous discussion about checksum algorithms snipped ...]

> Hi Arnd,
> I try to summarize what we said last night on IRC, just as a public
> memo.
>
> Aim is to
> a) prevent corrupt images from being saved onto disk, and to
> b) detect existing corrupt files on disk
>    (to prevent overwriting of potentially good backups).
>
> Strategies like DIF and HARD are not available in the consumer market
> for another couple of years, but given the increase in size, speed
> and complexity of systems, consumer systems will implement some kind
> of ECC (horizon ~ 3 y).
>
> Protection on the file system level, as provided by zfs and btrfs, is
> good but insufficient, as it protects the disk only and not the
> transmission chain appl - OS - I/O controller - fs.
>
> So we have to do it 'by hand' (meaning digikam).

Yes, full agreement!

> While saving a file after modification, for a):
> 1. keep it in memory
> 2. save it to disk
> 3. flush disk to clear cache
>    (3a. make sure all disk-internal buffers are cleared by reading
>    other data the size of the disk buffer) = optional
> 5. run CRC checksum on file on disk and file in memory
>    5a. alternative: store checksum already in metadata and save it
>    with the file.

Does this work? I mean: you compute the checksum, based on the file
contents. Then you add the checksum to the file, but then the file
contents changes and thus its checksum. So there is no way to embed
the correct checksum of a file in the file itself.

> 6. if mismatch, re-write the file and repeat the procedure
>
> For problem b):
> 7. if 5a was used, a simple scrubbing scan can be launched, manually
>    or programmed at frequency X
>    7a. try to open files and look for errors produced (but this
>    method is not reliable; I have images that show only the upper
>    part, are corrupt and produce no error message. However, the more
>    severe errors can be found)
> 8. generate a user alert so that one can manually check between
>    backup and original.
>
> This method may seem tedious, but has the advantage of being
> independent of OS and file system; it works on nfs as well.

OK, the next thing is a proposal for the more technical side of how to
integrate all this into digikam:

A) For every new image/file getting under digikam's control:
   compute the checksum/hash and add
   (hash, date of the hash computation, modification time of the file
   on disk)
   to the database.

B) When editing images, use the above-described procedure to ensure
   that the file is correctly written to disk.
   a) before editing: verify the hash
   b) after editing: the corresponding (hash, date of hash, mod-time)
      are stored in the database.

C) What about files which get modified/added by external tools?
   i) when digikam is running:
      All such changes are detected by KDirWatch.
      ((Is this statement correct? E.g. even if the file date is not
      changed?))
      a) addition of a new file: see A)
      b) modification of a file already in the database:
         Here a warning should be given (but not much can be done,
         right?). Apart from this: see A)
   ii) when digikam is not running:
      a) addition of a new file: see A)
      b) modification of a file already in the database:
         If the file modification time is different from the one in
         the database, this *could* be detected. However, this might
         take some additional time during the initial scanning.
         ((not sure how much time ...))
         - if such a change is detected: see i)b) above
         - if such a change is not detected: possible problem.
           This can only be detected in a full check, see D).

D) New check tool for data integrity:
   Visual side:
   - will display: oldest non-checked file
   - maybe a visual overview of files not checked (in a given
     time-window) (could look similar to the timeline ... ;-)
   - reminder on startup of digikam to perform a check at regular
     intervals (user-specified).
   Actual check:
   - just loop over all images, recompute the hash value and update
     the date of the last check in the database.
   - a quick version could just check the modification times.
   This tool should be stoppable/restartable at any time, and run in
   the background, while one can do all the normal stuff with digikam.

E) Backup
   Here we have to ensure that no "good" copies in the backup get
   destroyed by corrupted images in the main repository.
   Using just rsync does not seem possible:
   a) rsync --checksum takes a long time once the number of files is
      large
   b) it does not know about the hashes stored inside digikam's
      database.
   This is of course a pity, because normally using unix tools is
   always the best option, instead of re-inventing the wheel.
   So we have to think about this point ...

   Note that this is related to
   - "Image backup with thumbs and metadata database for fast
     searching"
     http://bugs.kde.org/show_bug.cgi?id=133638
   - "backup on dvd (and maybe sync with dvd-ram?)"
     http://bugs.kde.org/show_bug.cgi?id=113715
   - "Sync Plugin: New Syncronisation Framework KIPI Plugin"
     http://bugs.kde.org/show_bug.cgi?id=143978
   and to some extent also to
   - "Wish: Offline manager for Digikam"
     http://bugs.kde.org/show_bug.cgi?id=114539
   - "Wish: easy transport of albums, including tags, comments, etc."
     http://bugs.kde.org/show_bug.cgi?id=103201

   For the moment I think we should postpone the details for this
   point until A) - C) are implemented and tested. External tools
   could then use the information in the database to test the right
   approach for E).

Comments are very much appreciated!
(And: should we turn this into a BKO wish?)

Best, Arnd
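A sketch of the "actual check" loop of D), assuming Python and a
hypothetical table filehashes(path, hash, mtime, last_check);
digikam's real schema would of course look different:

    import hashlib
    import os
    import sqlite3
    import time

    def md5_of_file(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def full_check(db_path, quick=False):
        """Recompute hashes for all known files, oldest-checked first."""
        con = sqlite3.connect(db_path)
        rows = con.execute("SELECT path, hash, mtime FROM filehashes "
                           "ORDER BY last_check").fetchall()
        for path, stored_hash, stored_mtime in rows:
            if quick:
                # quick version: only compare modification times
                ok = (os.path.getmtime(path) == stored_mtime)
            else:
                ok = (md5_of_file(path) == stored_hash)
            if not ok:
                print("POSSIBLE CORRUPTION OR UNTRACKED CHANGE:", path)
            con.execute("UPDATE filehashes SET last_check = ? WHERE path = ?",
                        (time.time(), path))
            con.commit()    # commit per file, so the scan can be stopped/resumed
        con.close()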