about "synchronizing XMP sidecars and the digiKam database"


M. Fioretti
Hi Elle, and list!

I've been thinking a bit about what Elle explains here:

http://ninedegreesbelow.com/photography/dam-software-metadata.html

in the section "synchronizing XMP sidecars and the digiKam database"

> I want the speed and convenience of writing only to the database,
> and the safety of writing to a sidecar file. So during any given
> tagging session I set digiKam to write only to the database. At the
> end of each tagging session I change the settings and write
> everything out to sidecar files. Setting and resetting the digiKam
> metadata settings (Bug 227814) gets a bit tedious, but not as
> tedious as waiting for digiKam to write out a tag to a whole bunch
> of files every time I make a change on the tag tree.

my question is: can't we let the _computer_ (NOT digiKam!) take care
of this automatically? What if the user NEVER ever uses DK to write
metadata to image or sidecar files, but ONLY works in the database,
and a shell script runs automatically every ten minutes and:

- if digikam is running, set a RESYNC flag

- if digikam is NOT running and RESYNC is set (i.e. at most ten minutes
  after you closed DK):

  - read the metadata from the DK database
  - write it with exiftool to image or sidecar files, depending on
    configuration
  - reset the RESYNC flag

no manual set and reset in each session, no wait for digiKam bugs to
be fixed (or new features to be implemented)...
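
The periodic check described above could be sketched roughly as follows. This is only an illustration: the flag file location, the function names and the `sync_metadata` placeholder are all hypothetical, the process name `digikam` is an assumption, and the real export step (reading the database and calling exiftool) is left out.

```shell
#!/bin/sh
# Sketch of the periodic check; meant to be run from cron.
# FLAG and sync_metadata are hypothetical placeholders.
FLAG="${FLAG:-$HOME/.digikam-resync}"

digikam_running() {
    # pgrep -x matches the process name exactly
    pgrep -x digikam >/dev/null 2>&1
}

sync_metadata() {
    # Placeholder for the real work: read the database,
    # then write image or sidecar files with exiftool.
    echo "sync would run here"
}

resync_tick() {
    if digikam_running; then
        # digiKam is open: remember that a sync is due
        touch "$FLAG"
    elif [ -e "$FLAG" ]; then
        # digiKam was closed since the last check: sync, then clear the flag
        sync_metadata && rm -f "$FLAG"
    fi
}
```

A real script would end with a call to `resync_tick` and be scheduled with a crontab entry such as `*/10 * * * * /path/to/resync.sh` (path hypothetical). Clearing the flag only after a successful sync means a failed run is retried at the next tick.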

could this work? Does it have any drawback? Oh, and of course: where
is, exactly, the documentation to study to write such a script,
without reverse-engineering the files generated by digiKam?

Marco




_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users

Re: about "synchronizing XMP sidecars and the digiKam database"

Elle Stone
>What if the user NEVER ever uses DK to write
>metadata to image or sidecar files, but ONLY works in the database,
>and a shell script runs automatically every ten minutes and:
>
>no manual set and reset in each session, no wait for digiKam bugs to
>be fixed (or new features to be implemented)...

As a personal solution to digiKam's current difficulties keeping the
database and the sidecars and/or image files synchronized, such a
script is always an option, assuming someone has the expertise to
write a script that reads a database (way out of my league).

A few DAM tools (Resource Space, for one) do use exiftool, but for
whatever good and valid reason, digiKam decided to use exiv2 instead
of exiftool, probably because of the perl dependency? So digiKam
itself is not likely to ever interact with exiftool.

>could this work? Does it have any drawback? Oh, and of course: where
>is, exactly, the documentation to study to write such a script,
>without reverse-engineering the files generated by digiKam?

Such a script would need constant updating to keep up with the latest
digiKam/exiv2. The digiKam and exiv2 source code and documentation are
always available; for low-level interactions the source code itself is
often the best documentation. Or you can write out a few test images
with all the metadata that you normally save, and check using exiftool
to see what gets written. That's probably what you mean by
reverse-engineering, but in this case I think reverse-engineering is
probably the easier way to go.
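
The inspection step mentioned above might look like the following sketch. The function name is made up; `-a`, `-G1` and `-s` are standard exiftool options that show duplicate tags, the group each tag belongs to, and the short tag names.

```shell
#!/bin/sh
# List every tag in the given files together with its group,
# so that it is visible *where* each value was written.
show_written_tags() {
    exiftool -a -G1 -s "$@"
}
```

It would be called on a test image and its sidecar, e.g. `show_written_tags test.jpg test.jpg.xmp` (file names hypothetical), once before and once after digiKam has written its metadata.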

Kind regards,
Elle

--
http://ninedegreesbelow.com - articles on open source digital photography

Re: about "synchronizing XMP sidecars and the digiKam database"

M. Fioretti
On Thu, Feb 21, 2013 11:52:24 AM -0500, Elle Stone wrote:

> A few DAM tools (Resource Space, for one) do use exiftool, but for
> whatever good and valid reason, digiKam decided to use exiv2 instead
> of exiftool, probably because of the perl dependency? So digiKam
> itself is not likely to ever interact with exiftool.

(if my assumptions are correct) it doesn't need to, and it doesn't
matter whether digiKam uses exiv2 or exiftool. digiKam writes stuff to
its database and to image/sidecar files in formats that are known or
can be known, e.g. (I'm making stuff up, just to explain my point) the
picture title may be written:

to the Sqlite or MySql database  as  "Title: 2009 birthday"
to the picture                   as  "Title = '2009 birthday'"
to the sidecar                   as  "Title => {2009 birthday}"

so it doesn't really matter _who_ wrote what with _which_ library.
There is no need to know anything at all about how digiKam _works_,
only what it writes and where.

In other words, what matters is only:

1) where the stuff is (Sqlite or MySql db) or must go (image/sidecar
file) and what the exact formatting must be in each place

2) if there is some nasty side-effect

I know how to do (1): there are standard command line tools to extract
data from either Sqlite or MySql databases, regardless of _who_ put
them there; other tools to reformat those records, and exiftool to
write the results in the right places, and I know how to use this
stuff. What I don't know is (2), and what is the fastest way to find
the formatting information: the one you mention:

> write out a few test images with all the metadata that you normally
> save, and check using exiftool to see what gets written

which is no problem, or something faster.
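
As a rough illustration of the "reformat" step, the sketch below turns one record (a title plus tag paths) into exiftool arguments. The function name is hypothetical, and the target fields `XMP-dc:Title` and `XMP-digiKam:TagsList` are assumptions; checking a few test images should confirm which fields are actually wanted.

```shell
#!/bin/sh
# Print one exiftool argument per line for a single image record.
# Usage: exiftool_args TITLE [TAGPATH...]
exiftool_args() {
    title=$1
    shift
    printf '%s\n' "-XMP-dc:Title=$title"
    for tag in "$@"; do
        # Repeating an assignment to a list-type tag writes one item each
        printf '%s\n' "-XMP-digiKam:TagsList=$tag"
    done
}
```

The output can be saved to a file and handed to exiftool with its `-@ ARGFILE` option, e.g. `exiftool_args "2009 birthday" People/Anna > args.txt` followed by `exiftool -@ args.txt photo.jpg.xmp` (file names hypothetical).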

> Such a script would need constant updating to keep up with the
> latest digiKam/exiv2.

Why? If I understand correctly, there is no more "constant updating"
to do in this than in the commands you use now. digiKam and exiv2 can
change as much as they want internally, but as long as the strings
they write remain the same we don't care, do we now?

Besides, if one has a lot of pictures, even updating the script whenever
exiv2 or digiKam _change_ should be way quicker and less boring than all
the manual configuration changes at each digiKam _session_.

Regards,
Marco

Re: about "synchronizing XMP sidecars and the digiKam database"

Elle Stone
Hmm, if I understand you, what you say makes sense.

The digiKam/exiv2 changes I had in mind were changes in how the actual
strings are written, and where they are written to. At present (or at
least last I checked, as noted in the article you mentioned) digiKam
writes some of the strings ("title", for example) to technically
incorrect places. So I've been using exiftool to pull these items from
the digikam sidecar and put them in the technically correct place in
the exiftool sidecar. And digiKam automatically writes a whole bunch
of stuff (also noted in the article) that I don't want at all, so I
don't transfer any of that stuff to the exiftool sidecar.

A configurable script (write this here, don't write that anywhere)
would be nice to have. I do get tired of changing those "write/don't
write" digiKam options.

Elle

--
http://ninedegreesbelow.com - articles on open source digital photography

Re: about "synchronizing XMP sidecars and the digiKam database"

Jean-François Rabasse
In reply to this post by M. Fioretti

Hi Marco,

Some comments :

> and a shell script runs automatically every ten minutes and:
>
> - if digikam is running, set a RESYNC flag
>
> - if digikam is NOT running and RESYNC is set (i.e. max ten minutes
>  after you closed DK)
> ...

Hmm, if your idea, as far as I understand, is to start the synchronization
job after the Digikam session, the above seems very complicated.
Why not have a two-line script, let's call it "startdk", with:

#!/bin/bash
digikam && start-my-sync-script

To work with DK, you run startdk. And when you leave DK correctly (i.e.
no crash or so) your script starts. Immediately, not 10 min later.
IMHO


On Thu, 21 Feb 2013, M. Fioretti wrote:

> In other words, what matters is only:
>
> 1) where the stuff is (Sqlite or MySql db) or must go (image/sidecar
> file) and what the exact formatting must be in each place
>
> 2) if there is some nasty side-effect
>
> I know how to do (1): there are standard command line tools to extract
> data from either Sqlite or MySql databases, regardless of _who_ put
> them there; other tools to reformat those records, and exiftool to
> write the results in the right places, and I know how to use this
> stuff. What I don't know is (2), and what is the fastest way to find
> the formatting information: the one you mention:

Right, you can extract data from any database with the proper tools,
but to be able to exploit that data you need the database schema:
how to build the list of tags associated with a given image, etc. This
implies issuing selects, joins, and so on.
So you should first dump the database schema, study it, and organise your
extraction logic.
It is technically possible, though (if it's worth the work).
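
To give an idea of what such logic involves, here is a sketch that rebuilds full tag paths from parent links. It assumes the tags can be dumped as `id|pid|name` rows, with `pid` 0 for top-level tags; that layout is an assumption to be checked against the schema of the installed version (for an SQLite database, the `.schema` command of the `sqlite3` shell prints it). The function name is made up.

```shell
#!/bin/sh
# Rebuild "Parent/Child" tag paths from rows of the form id|pid|name
# read on standard input. pid 0 is assumed to mean "no parent".
tag_paths() {
    awk -F'|' '
        { parent[$1] = $2; name[$1] = $3; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                id = order[i]
                path = name[id]
                cur = parent[id]
                # Walk up the parent links, prepending each ancestor
                while (cur != "0" && (cur in name)) {
                    path = name[cur] "/" path
                    cur = parent[cur]
                }
                print id "|" path
            }
        }'
}
```

Fed with the rows `1|0|People`, `2|1|Family` and `3|2|Anna`, it prints `1|People`, `2|People/Family` and `3|People/Family/Anna`. Associating those paths with images would take one more join, which again depends on the actual schema.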

And, about what Elle said :

>> Such a script would need constant updating to keep up with the
>> latest digiKam/exiv2.
>
> Why? If I understand correctly, there is no more "constant updating"
> to do in this than in the commands you use now. digiKam and exiv2 can
> change as much as they want internally, but as long as the strings
> they write remain the same we don't care, do we now?

Sure you care. If the schema changes across a DK version change
(it has already happened), you have to adapt your script(s).

As for point 2, possible side effects: there shouldn't be any, except on
one point: timing!
What you describe (tools to extract, then tools to reformat, then tools
to write, exiftool et al.) is all heavy processing. Scripting is powerful
but slow; exiftool is a powerful tool but slow (Perl, not compiled code).

I really doubt you can process one image that way in less than 0.5 or 1
second. It's peanuts, yes, when run once. But when you have to process
a database of 10000 images, it will take hours.
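
The order of magnitude is easy to check with a little arithmetic. The helper below is purely illustrative (its name and the millisecond parameter are made up); it uses integer shell arithmetic, so results are rounded down.

```shell
#!/bin/sh
# Estimate total run time in minutes.
# Usage: estimate_minutes IMAGE_COUNT MILLISECONDS_PER_IMAGE
estimate_minutes() {
    echo $(( $1 * $2 / 1000 / 60 ))
}

estimate_minutes 10000 500    # 0.5 s per image
estimate_minutes 10000 1000   # 1 s per image
```

This prints 83 and 166, i.e. roughly one and a half to almost three hours for 10000 images at the per-image cost estimated above.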

> Besides, if one has a lot of pictures, even updating the script whenever
> exiv2 or digiKam _change_ should be way quicker and less boring than all
> the manual configuration changes at each digiKam _session_.

Should be « way less boring », perhaps. But « quicker » ?

Regards,
Jean-François


Re: about "synchronizing XMP sidecars and the digiKam database"

Jean-François Rabasse
In reply to this post by Elle Stone

Hello,

I'd like to come back to that interesting issue that was discussed at
length this past week, metadata synchronisation. I found the
discussion interesting because it clearly shows that problems exist
and that a number of users feel concerned. But the final state now
seems to be: « So what? »

Maybe it could be useful to try doing a bit of synthesis and extract
a few guidelines and wish-lists ?

As for me, I may be wrong but..., I see three points, with only one
of major importance.


1. Sidecar files vs. metadata into image files.

My personal feeling is that it's an off-topic issue. Metadata
import/export complies with standards, XMP/RDF, and the data stream is
the same. Whether it sits in a separate sidecar file or is carried in
the image file is just a matter of taste and trust. Some users are
reluctant to write into images, because writing to RAW files is
experimental, or because they use other software that corrupts
metadata. Either way, it's a matter of personal taste and choice.

Perhaps the only thing to expect from Digikam is that reading and
writing work well in both cases, from/to image and from/to sidecar.
For image files, it does work well; for sidecar files I don't know.
My version (Digikam 2.6.0) writes correct sidecar files but doesn't
read them. Maybe this has been fixed in the meantime, and so the
issue is over.


2. Tags reloading into the DK database.

Hum, I feel this is THE major problem. It has been discussed in
detail. There are two possible behaviours: merge tags with already
existing tags, or read and replace.
Deciding whether this should be an option or not isn't that easy
without usage statistics and use cases.

Recently, Gilles said :

On Tue, 19 Feb 2013, Gilles Caulier wrote:

> The goal is to check which option must be turned on by default to
> satisfy a lots of users by default.

Sounds really wise !
My personal feeling (react if you think I'm wrong) is that the most
natural default is « replace » mode, not « merge » mode.

The rationale could be that this is the expected behaviour for all
users that manage their images with Digikam and only Digikam
(it's an all-in-one software), doing reversible operations.
It would also greatly help users to reorganise their tags
(Marie-Noëlle and certainly many other users) and to keep a
« mirror » backup of metadata, in images or sidecar files,
before installing a new version.

I think merging mostly interests users who deal with different
programs (I don't know if it's 5% of us or 95%), but these users
have, per se, different metadata management tools. And merging is
already possible outside Digikam
(exiftool -xmp-digikam:tagslist+=My/New/Tag... images or sidecars...).
And then Digikam only requires a simple re-read in replace mode,
et voilà, post-merge synchronisation is done.

** So, I vote for a « replace tags upon reading » mode as a default.
(And the fix is easy, this has been discussed, and will probably fix
80% of problems.)


3. Control over metadata export

This is a great and difficult problem. Digikam tends to export
metadata in many many places, and sometimes in a questionable way.
Cf.

On Thu, 21 Feb 2013, Elle Stone wrote:

> The digiKam/exiv2 changes I had in mind were changes in how the
> actual strings are written, and where they are written to.
> At present (or at least last I checked, as noted in the article
> you mentioned) digiKam writes some of the strings ("title",
> for example) to technically incorrect places.

I totally agree. I've already had data destroyed by Digikam
when updating a title for some JPEG images that already contained a
JPEG comment plus a different Exif:UserComment. All of that
vanished after Digikam updated the Dublin Core title and overwrote my
existing data. (Treating dc:title, exif:usercomment and
jpeg:comment as synonyms is a semantic error!)

And that's one of the reasons I've stopped allowing DK to write into
my images. Only sidecars are allowed, as it's easy afterwards
to extract what is wanted from the sidecars, put it into the images,
and choose where it goes.

But does this require urgent fixes? I'm not sure. As long as
users have the possibility to export from database to sidecars,
then to choose exactly what should go into images, it's not a real
problem. Let Digikam work in a symmetrical way between read/write,
cf. point 2, and handle metadata exchange issues separately.

Maybe what could be suggested for future releases is a more
careful export, something like:
- write image title to XMP DC title
- IF exif:usercomment is empty/unset, write the title there too,
  ELSE leave it unchanged.
- IF JPEG comment is empty/unset, write the title there too,
  ELSE leave it unchanged.

But this is a detail; the workaround is already on the
shelf: « don't touch my images », I'll do it myself :-)
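
With exiftool, this kind of "only if empty" export can already be approximated outside digiKam. The sketch below assumes the title is already present as `XMP-dc:Title` and copies it into the Exif user comment and the JPEG comment only where those are unset or empty. The function names are hypothetical, and the behaviour should be tried on copies of the images first.

```shell
#!/bin/sh
# Copy XMP-dc:Title into EXIF:UserComment, but only for files
# where EXIF:UserComment is currently unset or empty.
fill_usercomment() {
    exiftool -if 'not $EXIF:UserComment' \
        '-EXIF:UserComment<XMP-dc:Title' "$@"
}

# Same idea for the JPEG comment.
fill_jpeg_comment() {
    exiftool -if 'not $File:Comment' \
        '-File:Comment<XMP-dc:Title' "$@"
}
```

Each function takes the image files as arguments, e.g. `fill_usercomment *.jpg`. Files that already carry a value fail the `-if` condition and are left untouched.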


Regards,
Jean-François


PS: a last comment, relating to what Gilles said :

On Tue, 19 Feb 2013, Gilles Caulier wrote:

> Secondary, the big question to know is how main Photo management
> program work with tags workflow in metadata. When i said main
> applications, i said real pro photograph application, as Aperture,
> NX, LR, etc...

Big question, yes. Tag trees are special metadata in that they
have been forgotten in the standard schemas, yet they are so useful that
every software team implements them: Digikam, Adobe Lightroom,
Microsoft Photo...

My first bet is that in the next few years this will probably be
added to a standard schema. Dublin Core seems to be a good candidate
to host both dc:subject and dc:tagslist (plus support for a
controlled or managed vocabulary, as already suggested for
dc:subject).

My second bet is that the standard syntax (components separator in
tagstree, a '/' as in Digikam, a '|' as in Lightroom) will probably
be Adobe syntax. The reason is that, on earth and since the
Neanderthal era, it's always the heavier one that wins. :-)

So, all image-related software should probably be prepared to do a
migration, one day or another, and rewrite their tags to the then
standard place, under a standard syntax. We just have to wait...


Re: about "synchronizing XMP sidecars and the digiKam database"

jdd@dodin.org
Le 24/02/2013 16:15, Jean-François Rabasse a écrit :

> My personal feeling (react if you think I'm wrong) is that the most
> natural default is « replace » mode, not « merge » mode.

yes, but only as long as there is some sort of "restore images from
backup" command, where the contrary is true.

If I find corrupted images (think of disk failure) and restore them from
a backup, I don't want to lose the database metadata

jdd

--
http://www.dodin.org
http://jddtube.dodin.org/20120616-52-highway_v1115

Re: about "synchronizing XMP sidecars and the digiKam database"

Jean-François Rabasse

On Sun, 24 Feb 2013, jdd wrote:

> Le 24/02/2013 16:15, Jean-François Rabasse a écrit :
>
>> My personal feeling (react if you think I'm wrong) is that the most
>> natural default is « replace » mode, not « merge » mode.
>
> yes, but only as long as there is some sort of "restore images from
> backup" command, where the contrary is true.
>
> If I find corrupted images (think of disk failure) and restore them from
> a backup, I don't want to lose the database metadata

Hum, I'm not sure I see exactly what you mean, JD.
Data moves, in backup/restore strategies, require selecting a data source
and a data destination. And obviously, if one suspects the source to be
corrupted, the data transfer is to be avoided.
(Likewise, if your database is corrupted or messy, you won't want to
export all its metadata to images or sidecars.)

Anyway, no strategy can be based on failures, because you will never be
able to define a priori what failure or what kind of failure to expect.
Also, given a standard image file, say 6 to 8 Mbytes, and the metadata
space, say 2 or 3 Kbytes, one sees that metadata occupies less than
0.5 % of the total file size. In case of a disk failure, e.g. a bad block,
you're more likely to lose part of your image pixel data than metadata.
And your image is unusable anyway.

So, disk failure => replace or repair => restore content from a backup.
IMHO

Jean-François

Re: about "synchronizing XMP sidecars and the digiKam database"

jdd@dodin.org
Le 24/02/2013 21:19, Jean-François Rabasse a écrit :

> Hum, I'm not sure I see exactly what you mean, JD.

I recently lost a (brand new) 1 TB disk (entirely). I had all my photos
on it

but I had a previous backup.

my database is on another disk (the main one), so when I use the
new photo disk I want to copy the database metadata to the photos, not
the other way round.

photos have the very same name, but not the same content

jdd
--
http://www.dodin.org
http://jddtube.dodin.org/20120616-52-highway_v1115

Re: about "synchronizing XMP sidecars and the digiKam database"

M. Fioretti
In reply to this post by Jean-François Rabasse
On Thu, Feb 21, 2013 19:36:16 PM +0100, Jean-François Rabasse wrote:

> Hmm, if your idea, as far as I understand, is to start the synchronization
> job after the Digikam session, the above seems very complicated.
> Why not have a two-line script, let's call it "startdk", with:
>
> #!/bin/bash
> digikam && start-my-sync-script
>
> To work with DK, you run startdk. And when you leave DK correctly (i.e.
> no crash or so) your script starts. Immediately, not 10 min later.

yes, of course!

> So, you should first dump the database schema, study it, and
> organise your extracting logic.

yes, this is what I had in mind. And I am also aware that this needs
to be partially redone at every major release of digiKam, but
personally I'd still go for the "less boring" route.

Taking a look at the database schema can lead to automating other parts
of the digiKam workflow, whereas copying data semi-manually from db to
(sidecar) files and vice-versa only leads to more semi-manual copying...

> I really doubt you can process one image that way in less than 0.5 or
> 1 second. It's peanuts, yes, when run once. But when you have to
> process a database of 10000 images, it will take hours.

true! I had not thought of this, honestly. I don't have a complete
(or final...) answer.

One solution may be to go back to the "cron-job" approach instead of
"digikam && start-my-sync-script". If it works well, I don't really
care if it takes two hours in the background when the computer would
be on anyway, as long as it doesn't block _me_. This is what the nice
command is for, after all. But I'll have to try.
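
For instance, the sync job could be wrapped so that it always runs at the lowest CPU priority. This is a sketch only; the wrapper name is made up, and `start-my-sync-script` is the placeholder name used earlier in the thread.

```shell
#!/bin/sh
# Run any command at the lowest CPU scheduling priority.
run_nicely() {
    nice -n 19 "$@"
}
```

It would be used as `run_nicely start-my-sync-script`, either from a cron job or after `digikam` exits. Note that `nice` only lowers CPU priority; a job that is mostly waiting on the disk may still be noticeable.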

Other comments are still very welcome, of course.

Thanks,
Marco