Issues with Exiv2...


Issues with Exiv2...

Gilles Caulier-4
Hi all,

I am currently writing unit tests covering Exiv2 and metadata extraction.

In digiKam (DK), the metadata wrapper over Exiv2 is based on MetaEngine and its derived class DMetadata. All my tests use DMetadata, which provides the most complete API for working with metadata.

I have already found a few internal problems, and with Maik's help we have fixed them all.

But the last unit test to finalize is the most problematic one: stress-testing access to Exiv2 with multiple cores and multiple threads.

Concretely, I set up a manager that scans a huge collection of files (40,000 JPEG, PNG, TIFF, RAW, video files, etc.). Each file is read in a separate thread to extract the most important information (the fields that are stored in the database). Here, 8 cores are used, and 32 GB of RAM rules out any side effects from memory leaks.
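A minimal sketch of such a stress driver, written in standard C++ rather than digiKam's Qt classes, and an assumption about the setup rather than the actual test code. It uses a fixed pool of worker threads that pull file paths from a shared atomic index (instead of one thread per file), so every file is read exactly once while all cores stay busy. The names `runStressScan` and `readMetadata` are hypothetical; the callback stands in for the real DMetadata loading call.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stress-test driver. `readMetadata` stands in for the per-file
// extraction call under test and must itself be safe to call concurrently.
// Returns how many files were read successfully.
std::size_t runStressScan(const std::vector<std::string>& files,
                          unsigned threadCount,
                          const std::function<bool(const std::string&)>& readMetadata)
{
    assert(threadCount > 0);

    std::atomic<std::size_t> next{0};       // index of the next unclaimed file
    std::atomic<std::size_t> succeeded{0};  // number of successful reads
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            // Each fetch_add claims a unique index, so no file is read twice.
            for (std::size_t i = next.fetch_add(1); i < files.size(); i = next.fetch_add(1)) {
                if (readMetadata(files[i])) {
                    succeeded.fetch_add(1);
                }
            }
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    return succeeded.load();
}
```

On the 8-core machine described above, a call such as `runStressScan(paths, 8, loadFn)` keeps eight extraction calls in flight at all times, which is exactly the kind of concurrent pressure that exposes non-re-entrant code.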

The unit test does not touch the database; it stresses only Exiv2. That is enough, and in fact it quickly exposes Exiv2's limitations:

- The Exiv2 API is not re-entrant.
- This is not limited to the file parsers.

In digiKam, everything is multi-threaded, especially the database scanners and the maintenance jobs. Threads calling the Exiv2 API can crash suddenly and leave the application hanging for a while. Depending on the state of memory, the application may or may not crash, but in all cases thread management is broken and the application stops responding.

The only solution across the whole digiKam implementation is to wrap all calls to DMetadata in a thread-safe interface using a mutex locker. This will reduce performance, but the gain in stability is huge.
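A minimal sketch of that idea, assuming a single process-wide lock: in Qt code this would be a shared QMutex taken with QMutexLocker; here `std::mutex` and `std::lock_guard` play the same roles so the example is self-contained. The helper names `metadataMutex`, `withMetadataLock` and `demoSerializedCounter` are hypothetical, not digiKam API. One lock for all callers (rather than one per DMetadata object) matches the observation that the problem is not limited to the file parsers, since any shared internal state is then covered.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Process-wide lock shared by every caller, so no two threads are ever
// inside the (assumed non-re-entrant) metadata code at the same time.
inline std::mutex& metadataMutex()
{
    static std::mutex m;
    return m;
}

// Hypothetical helper: runs `call` while holding the global metadata lock
// and returns whatever `call` returns. The lock is released on return or
// if `call` throws, because lock_guard unlocks in its destructor.
template <typename Func>
decltype(auto) withMetadataLock(Func&& call)
{
    std::lock_guard<std::mutex> locker(metadataMutex());
    return std::forward<Func>(call)();
}

// Demonstration: many threads increment a plain int. Without the lock this
// would be a data race; with it, the final count is exact.
inline int demoSerializedCounter(int threads, int perThread)
{
    assert(threads > 0 && perThread >= 0);
    int counter = 0;  // deliberately non-atomic
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                withMetadataLock([&] { ++counter; });
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

The cost is the one mentioned above: metadata access becomes strictly sequential, so extra cores no longer speed it up. Note also that `std::mutex` (like a default QMutex) is not recursive, so a locked call must not call `withMetadataLock` again or it will deadlock.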

Best

Gilles Caulier


Re: Issues with Exiv2...

Gilles Caulier-4
A second big problem with Exiv2 is how DMetadata is allocated before use, especially when parsing large files (depending on the file MIME type).

DMetadata (and in fact the internal Exiv2 objects) must be allocated on the heap, not on the stack. Maik discovered a similar problem with objects from the latest LibRaw 0.19.

So the DMetadata wrapper needs to be held in a QScopedPointer. See my code in this unit test function:
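The linked unit test code is not reproduced here. As a rough standard-C++ illustration of the same pattern, `std::unique_ptr` stands in for QScopedPointer (both own a heap object and delete it at scope exit). One plausible reason heap allocation matters, stated as an assumption: worker threads often get a much smaller stack than the main thread, so a large object declared as a local variable can overflow it. `LargeMetadataBuffer` and `makeMetadataBuffer` are hypothetical; the 8 MiB size is arbitrary and only chosen to be too large for a typical thread stack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a metadata container with a large footprint.
// Declared as a local variable, it would need ~8 MiB of the calling
// thread's stack.
struct LargeMetadataBuffer
{
    std::array<unsigned char, 8 * 1024 * 1024> bytes{};  // zero-initialized
};

static_assert(sizeof(LargeMetadataBuffer) >= 8 * 1024 * 1024,
              "the buffer should be too large to put on a worker thread's stack");

// Allocates the object on the heap and returns unique ownership; it is
// freed automatically when the owning pointer goes out of scope, in the
// same way QScopedPointer works in Qt code.
inline std::unique_ptr<LargeMetadataBuffer> makeMetadataBuffer()
{
    return std::make_unique<LargeMetadataBuffer>();
}
```

In Qt code the equivalent line would be along the lines of `QScopedPointer<DMetadata> meta(new DMetadata);` inside the thread's work function, instead of declaring `DMetadata meta;` on the stack.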


Gilles Caulier

(In reply to Gilles Caulier's message of Wed, 7 Nov 2018 at 23:41.)


Re: Issues with Exiv2...

Gilles Caulier-4
Hi all,

With this commit:


The unit test settings can now be customized to run the check over a local collection. I would like to know whether the crash can be reproduced when metadata are read from files on other computers.
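The commit itself is not shown, so how its settings work is unknown. As a hedged illustration of pointing such a test at a local collection, the sketch below reads the root directory from an environment variable and recursively gathers image files by extension with `std::filesystem`. The variable name `METADATA_STRESS_DIR` and the functions `collectionRoot` and `collectFiles` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: the collection root comes from an environment variable
// (name invented for this sketch), falling back to `fallback` when unset.
inline fs::path collectionRoot(const fs::path& fallback)
{
    const char* env = std::getenv("METADATA_STRESS_DIR");
    return (env && *env) ? fs::path(env) : fallback;
}

// Recursively gathers regular files whose lower-cased extension (with the
// leading dot, e.g. ".jpg") is in `exts`. Results are sorted so runs on
// different machines process files in a reproducible order.
inline std::vector<fs::path> collectFiles(const fs::path& root,
                                          const std::set<std::string>& exts)
{
    std::vector<fs::path> out;
    if (!fs::is_directory(root)) {
        return out;
    }
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied)) {
        if (!entry.is_regular_file()) {
            continue;
        }
        std::string ext = entry.path().extension().string();
        std::transform(ext.begin(), ext.end(), ext.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (exts.count(ext) != 0) {
            out.push_back(entry.path());
        }
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

The resulting list of paths could then be handed to a multi-threaded reader, so that testers on other computers only need to set one variable to run the same check against their own files.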

Best

Gilles Caulier

(In reply to Gilles Caulier's message of Thu, 8 Nov 2018 at 00:18.)


Re: Issues with Exiv2...

Gilles Caulier-4
With this commit:


I introduced the first stage of consolidating multi-threaded metadata extraction with Exiv2.

Gilles Caulier

(In reply to Gilles Caulier's message of Thu, 8 Nov 2018 at 09:22.)