Long database sync

Long database sync

Noeck
Hi,

I did Extras > Management… (Extras > Wartung… in German) in my digikam
4.9 installation from the ppa. And then I did a database sync from the
files to the database.

It takes hours (after 4h it currently reached 14%) and it shows a high
disk write rate: 14MB/s. Why does it write so much and where? The
database size does not change much (in those 4 hours it has grown by
1MB) and the image files do not seem to be touched (as expected when
doing a sync files->database). If it just changes the database contents
with 14MB/s, couldn't it do that in memory and then dump the result into
the file only once?

So where does it write to with 14MB/s?

Thanks for clarification,
Joram

_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users

Re: Long database sync

Henrique Santos Fernandes
Noeck,

I think a database sync means that you will sync the metadata in your
database to your files.

So it is going through all your pictures and writing the data into them.
That is maybe why there is so much disk activity.

But I am not sure about it.

Hope that helps!



Re: Long database sync

Noeck
In reply to this post by Noeck
By Henrique Santos Fernandes:

> So it is going through all your pictures and writing the data into them.
> That is maybe why there is so much disk activity.

Thanks for your reply. I chose "From image metadata to database" and not
"From database to image metadata", so according to my understanding,
only the database changes. And indeed the image files were not modified
(according to the file system).

In the end, the mentioned database sync took 25h with a constant write
rate of 12 - 15 MB/s. Which naively calculated sums up to 1.2 TB (!).
The database only grew from 89 MB to 91 MB.
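The naive calculation, spelled out (a sketch assuming the midpoint of the reported 12-15 MB/s, not measured data):

```python
# Back-of-the-envelope: total data written over the 25 h sync.
# 13.5 MB/s is an assumed midpoint of the reported 12-15 MB/s.
hours = 25
rate_mb_per_s = 13.5
total_mb = hours * 3600 * rate_mb_per_s
total_tb = total_mb / 1_000_000  # decimal TB
print(total_tb)  # -> 1.215, i.e. roughly the 1.2 TB quoted above
```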

My guess was that the database is updated for each file separately and
completely, no matter whether it contained changes or not. Such that all
this amount of written data replaces the previous content – this would
explain why no additional space was used (free space on disk only
reduced by a few MBs).

The thumbnails-digikam.db was not updated at the same time (no time
stamp change). It is 950 MB.

What puzzles me is that the sum of all images is only 200 GB (50k files),
so much less than the data written to disk.

Thanks for all further insights,
Joram

Re: Long database sync

Johannes Kapune
In reply to this post by Noeck
Hi Joram,

normally digiKam uses SQLite, which is a single-file database.
Each change has to be written directly to disk to prevent data loss
(and sometimes the database file is read back after each change). If you
have to make a lot of changes, it can take a lot of time.
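As an illustration (a generic SQLite sketch, not digiKam's actual code), committing each change separately forces a journal sync per change, while one transaction around many changes writes far less:

```python
import os
import sqlite3
import tempfile

# Generic SQLite sketch (not digiKam code): per-change commits versus
# one batched transaction.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, tag TEXT)")

# One commit per row: each commit syncs the journal and database file
# to disk, which is where the write amplification comes from.
for i in range(100):
    con.execute("INSERT INTO items (tag) VALUES (?)", (f"tag{i}",))
    con.commit()

# One commit for many rows: a single journal write covers all of them.
with con:  # opens a transaction, commits once on exit
    con.executemany(
        "INSERT INTO items (tag) VALUES (?)",
        [(f"batch{i}",) for i in range(100)],
    )

print(con.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # -> 200
con.close()
```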

If you start this sync, you have to wait until it comes to an end.

In later use there are normally only a few changes, and for those this
type of database works well.

Johannes


Re: Long database sync

Gilles Caulier-4
In reply to this post by Noeck
2015-05-10 23:54 GMT+02:00 Noeck <[hidden email]>:
> By Henrique Santos Fernandes:
>
>> So it is going through all your pictures and writing the data into them.
>> That is maybe why there is so much disk activity.
>
> Thanks for your reply. I chose "From image metadata to database" and not
> "From database to image metadata", so according to my understanding,
> only the database changes. And indeed the image files were not modified
> (according to the file system).

Yes, exactly, but... the image metadata must be read to re-populate the
DB. This is delegated to the Exiv2 shared library.

>
> In the end, the mentioned database sync took 25h with a constant write
> rate of 12 - 15 MB/s. Which naively calculated sums up to 1.2 TB (!).

I don't understand this value. Are you sure that 15 MB/s is for
writing and not reading?

> The database only grew from 89 MB to 91 MB.

This value is correct.

>
> My guess was that the database is updated for each file separately and
> completely, no matter whether it contained changes or not.

yes.

> Such that all
> this amount of written data replaces the previous content – this would
> explain why no additional space was used (free space on disk only
> reduced by a few MBs).

yes

>
> The thumbnails-digikam.db was not updated at the same time (no time
> stamp change). It has 950 MB.

The size can be correct, as it uses PGF wavelet compression for the
image data (before, following the freedesktop.org paper, we used PNG,
whose files grow in size easily).

>
> What puzzles me is that the sum of all images is only 200 GB (50k files)
> so much less than the data written to disk.

A possible cause is a bug with a wrong album list being passed to the
maintenance tools, discovered recently and fixed in 4.10.0.

Q: did you use the multicore option in the Maintenance dialog?

Gilles Caulier

Re: Long database sync

Noeck
Hi Gilles,

thanks for your reply and for confirming my guesses. Please find answers
to your questions inline.

>> I chose "From image metadata to database" and not
>> "From database to image metadata", so according to my understanding,
>> only the database changes. And indeed the image files were not modified
>> (according to the file system).
>
> Yes, exactly, but... the image metadata must be read to re-populate the
> DB. This is delegated to the Exiv2 shared library.

Ok. But read, not written, and the system monitor showed it as written.

>> In the end, the mentioned database sync took 25h with a constant write
>> rate of 12 - 15 MB/s. Which naively calculated sums up to 1.2 TB (!).
>
> I don't understand this value. Are you sure that 15 MB/s is for
> writing and not reading?

Yes. This is what astonishes me.

>> What puzzles me is that the sum of all images is only 200 GB (50k files)
>> so much less than the data written to disk.
>
> A possible cause is a bug with a wrong album list being passed to the
> maintenance tools, discovered recently and fixed in 4.10.0.
>
> Q: did you use the multicore option in the Maintenance dialog?

No, this was done without the multicore option. So there might be some
gain here. However, it seems pretty much I/O bound.

And I chose all albums and all tags (no selection here). I was wondering
if that scans all files twice?

Is 4.10 already in the ppa for Ubuntu? I could test it then.

Cheers,
Joram

Re: Long database sync

Gilles Caulier-4
2015-05-13 19:18 GMT+02:00 Noeck <[hidden email]>:

> And I chose all albums and all tags (no selection here). I was wondering
> if that scans all files twice?

Yes, I suspect this. Look at this entry in Bugzilla:

https://bugs.kde.org/show_bug.cgi?id=342791

This affects the Thumbnail Generator, Quality Sorter, and Fingerprints
Generator.

We must take a look at whether other maintenance tools are affected (in
your case the DB synchronizer).

Gilles Caulier

Re: Long database sync

Gilles Caulier-4
This is a copy of a private mail from another developer who has tried
to reproduce the problem:

>The metadata synchronizer processed TAlbums correctly. I have moved my
>DB to an SSD for testing. My roughly 20000 images then require
>approximately 6 minutes. One should of course not select tags, because
>otherwise it will all be synchronized twice.

Gilles Caulier


Re: Long database sync

Noeck
Dear Gilles,

I tested it once again with these settings:
all albums (checked), all tags (not checked), multicore, and only the
sync from files to the database (no other maintenance tools). The
difference from last time is: all tags is not checked and the
multicore option is selected.

I am now using digiKam 4.10.0 under KDE 4.13.3.

It is now faster and shows these numbers on average:
load: 4 on four cores
25% CPU, 45% wait
disk read 25 MB/s, write 12 MB/s
It froze about 20 times (and stopped writing and reading) but resumed
after about 10-30 seconds each time.

At 72% it stopped, and a message popped up 3 times within 5 minutes saying
»Der Prozess für das Protokoll digikamtags wurde unerwartet beendet.«,
which roughly translates as: »The process for the protocol digikamtags
was terminated unexpectedly.«

The total time was now 27:19, and the final message was something like:
»All processes terminated successfully.«
The write rate is probably just the maximum speed of the drive.
The database did not change this time (exact same number of bytes),
which makes sense, as there are no new images and no new information in
the existing images. This is now much more consistent with my
expectations. Considering the dead times, which sum up to about 15
minutes, and the fact that my collection is twice the size of the
mentioned 20000, this is quite close to the other developer's result.
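Spelled out (using the estimates above; the 15 minutes of dead time is a rough figure):

```python
# Rough consistency check against the other developer's ~6 minutes
# per 20000 images; the 15 min dead time is a rough estimate.
total_min = 27 + 19 / 60   # run time of 27:19
dead_min = 15              # freezes, estimated above
effective_min = total_min - dead_min   # actual working time
per_20000 = effective_min / 2          # collection is ~2 x 20000 images
print(round(per_20000, 1))  # -> 6.2, close to the reported ~6 minutes
```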

One more question: My database was created with digikam 2.5 and later
used with 3.5 and 4.2. The switch to 4.9 was very recent. Might it be
that it had to be rewritten drastically for that reason?

tl;dr: Much faster in the 2nd run and with 4.10. Might an old database
be the reason?

Cheers,
Joram


PS @Gilles and all involved developers: Thank you very much for this
great program! I have been using it for years now and I am very happy
with it.


