Gilles,
the "sharpen" feature with the refocus option takes a very long time on a 16 Mpix raw image (Nikon D800), even on a 3.4 GHz CPU. I have a 6-core CPU, but only one core is used during "sharpen". Wouldn't it be possible to parallelize sharpen so that all 6 cores work at the same time on different regions of the image, speeding up the turnaround time by a factor of 6?

Regards,
Robert
_______________________________________________
Digikam-users mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-users
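A minimal sketch of the "different regions of the image" idea, assuming the image is cut into horizontal bands of rows, one band per core. `RowBand` and `splitRows` are hypothetical names used only for illustration; they are not digiKam code. For a neighbourhood filter such as refocus, each band would probably also need to read a few rows beyond its own borders from the unmodified source image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not digiKam code): a half-open range of rows.
struct RowBand {
    std::size_t begin; // first row of the band
    std::size_t end;   // one past the last row of the band
};

// Split `height` rows into at most `parts` contiguous bands whose sizes
// differ by at most one row. Empty bands are never produced.
std::vector<RowBand> splitRows(std::size_t height, std::size_t parts)
{
    assert(parts > 0);
    std::vector<RowBand> bands;
    if (height == 0)
        return bands;
    if (parts > height)
        parts = height; // never more bands than rows

    const std::size_t base  = height / parts; // rows every band gets
    const std::size_t extra = height % parts; // first `extra` bands get one more
    std::size_t row = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t rows = base + (i < extra ? 1 : 0);
        bands.push_back({row, row + rows});
        row += rows;
    }
    return bands;
}
```

With `splitRows(height, 6)` each of the 6 cores would get one band. The ideal factor of 6 is only reached if the whole filter can be split this way and the bands take equally long.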
Hi,
I would be very pleased too if some speed improvements were on the todo list. Actually, I would love it if some of the most time-consuming algorithms were ported to CUDA too, for example sharpening, local contrast, CImg... Please, Gilles, tell us more about speed improvements planned for the future, thanks.

Regards,
--------------
Clément Moignard
Hi,

This requires rewriting the algorithm a little to parallelize the operation.
If somebody knows the relevant mathematics well, the code is here:
https://projects.kde.org/projects/extragear/graphics/digikam/repository/revisions/master/entry/libs/dimg/filters/sharp/refocusfilter.cpp#L169

Best

Gilles Caulier
In reply to this post by ultimateclem
On Wednesday 16 April 2014 19:29:56 ultimateclem wrote:
> Actually, I would love it if some of the most time-consuming algorithms
> were ported to CUDA too.

One thing about CUDA is that it is specific to NVIDIA GPUs (afaik). So a CPU version of the routine would still be needed for users of other GPUs.

Remco
I said CUDA, but I meant OpenCL (or whichever library you prefer that uses a GPU/CPU combination). I know this is a lot of work and that it is not possible right now. But sometimes, when I'm watching TV while waiting for a routine to finish, I dream of a really fast version of digiKam. I use digiKam professionally: all my pictures (more than 400 for a wedding) are processed in digiKam, so speed matters a lot to me.
Gilles, I would love to help the team, but I'm really unable to write a single line of code, sorry.

--------------
Clément Moignard
Hello,

It is not digiKam, but darktable is able to use GPGPU/OpenCL/CUDA for raw processing. I use it for photo processing and digiKam for photo management. darktable handles multithreading as well.

Regards,
Martin
Hello,

RawTherapee also runs multithreaded on compute-intensive tasks. Since all modern PCs use a multicore CPU with shared memory, the parallelization paradigm that comes in handy here is, in my opinion, OpenMP. It should not be too difficult to insert an omp parallel pragma in front of time-consuming loops and divide each loop into as many chunks as there are cores on the CPU, assigning each chunk to an individual core/thread.
Unfortunately I cannot step in and help with the coding since I don't have any C++ knowledge; furthermore, I have been out of the code-writing business for more than a decade. But I think parallelizing the time-consuming parts of digiKam might be an interesting task for a student and most valuable for us users. (There are several tutorials on the internet on how to apply OpenMP.)

Regards,
Robert
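The OpenMP suggestion above could look roughly like the following sketch. It assumes a single-channel float image stored row by row and uses a generic 3x3 sharpening kernel as a stand-in; `sharpen3x3` is a hypothetical function, not digiKam's refocus filter. The loop over rows can be shared among threads because each iteration reads only from `src` and writes only its own row of `dst`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 3x3 sharpening kernel (NOT digiKam's refocus matrix).
// Border pixels are copied unchanged from the source.
std::vector<float> sharpen3x3(const std::vector<float>& src, int width, int height)
{
    assert(width >= 0 && height >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<float> dst(src);

    // With OpenMP enabled (for example -fopenmp with GCC) the rows are
    // distributed among the available threads. Without it the pragma is
    // ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const std::size_t w = static_cast<std::size_t>(width);
            const std::size_t i = static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x);
            // Centre weight 5, the four direct neighbours weight -1.
            dst[i] = 5.0f * src[i]
                   - src[i - 1] - src[i + 1]
                   - src[i - w] - src[i + w];
        }
    }
    return dst;
}
```

The pragma is the only OpenMP-specific line; the result should be the same whether the loop runs on one thread or on several, since no two iterations write to the same pixel.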
Hi all,

Good news here.

I'm starting to review the code in the digiKam core algorithms that can be parallelized.

The goal is to use QtConcurrent::run where possible, which will simplify the multicore port without using a 3rd party library such as OpenMP. After all, we use Qt, which provides an API for multicore support.

The first filter I have ported is LocalContrast. I use it every day because of the excellent image results it generates.

Some parts of the algorithm are very time consuming, and the multicore port gives excellent results. I used a 24 Mpx image taken with my Sony A77, processed on my i7 CPU (8 cores). See the results below:

Without multicore:

digikam(12729)/digikam (core) Digikam::EditorToolThreaded::slotPreview: Preview "Local Contrast" started...
digikam(12729)/digikam (core) Digikam::DImgThreadedFilter::startFilterDirectly: "LocalContrast" :: excecution time : 15530 ms
digikam(12729)/digikam (core) Digikam::EditorToolThreaded::slotFilterFinished: Preview "Local Contrast" completed...

I can see one core used during processing.

With multicore:

digikam(12729)/digikam (core) Digikam::EditorToolThreaded::slotPreview: Preview "Local Contrast" started...
digikam(12729)/digikam (core) Digikam::DImgThreadedFilter::startFilterDirectly: "LocalContrast" :: excecution time : 7359 ms
digikam(12729)/digikam (core) Digikam::EditorToolThreaded::slotFilterFinished: Preview "Local Contrast" completed...

I can see all cores used during processing in this case.

To sum up: processing time is divided by 2 (15530 ms down to 7359 ms). That's not negligible...

This will be a good base for a student to work on other filter algorithms.

I will include this LocalContrast multicore support in the next release, 4.0.0.

Best

Gilles Caulier
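The approach described above, running independent pieces of a filter as concurrent tasks, can be sketched without Qt. The example below is an assumption-laden stand-in: it uses `std::async` from the C++ standard library instead of QtConcurrent so that it compiles on its own, and `applyGain` / `applyGainParallel` are hypothetical names with a trivial per-pixel operation, not the LocalContrast algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Stand-in for the per-pixel work of a filter: a gain clamped to 1.0.
// Purely illustrative, not the LocalContrast algorithm.
static void applyGain(std::vector<float>& pixels, std::size_t begin,
                      std::size_t end, float gain)
{
    for (std::size_t i = begin; i < end; ++i)
        pixels[i] = std::min(1.0f, pixels[i] * gain);
}

// Process the buffer in up to `tasks` contiguous chunks, one asynchronous
// task per chunk. Chunks never overlap, so no locking is needed.
void applyGainParallel(std::vector<float>& pixels, float gain, std::size_t tasks)
{
    assert(tasks > 0);
    const std::size_t chunk = (pixels.size() + tasks - 1) / tasks; // ceiling division
    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < pixels.size(); begin += chunk) {
        const std::size_t end = std::min(pixels.size(), begin + chunk);
        pending.push_back(std::async(std::launch::async, applyGain,
                                     std::ref(pixels), begin, end, gain));
    }
    for (auto& task : pending)
        task.get(); // wait for the chunk; rethrows if the task threw
}
```

In a Qt code base the `std::async` call would presumably be replaced by `QtConcurrent::run`, which returns a `QFuture` that can be waited on in a similar way.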
Hi Gilles,

sounds good; any progress here is highly appreciated, though a speedup of only a factor of 2 on an 8-core CPU calls for further improvement. BTW: OpenMP is not 3rd party software; libgomp1 (the OpenMP runtime library) is available across all Linux distributions.

Thanks and best regards,
Robert
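One way to read the "factor of 2 on 8 cores" remark is through Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that is parallelized and n the number of cores. The sketch below is only an estimate under that model: it assumes 8 equally capable cores and uses the two LocalContrast timings quoted in this thread (15530 ms and 7359 ms). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law solved for p: the fraction of the serial run time that
// must have been parallelized to observe `speedup` on `cores` cores.
double parallelFraction(double speedup, double cores)
{
    assert(speedup >= 1.0 && cores > 1.0);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores);
}

// Amdahl's law itself: expected speedup for parallel fraction p.
double amdahlSpeedup(double p, double cores)
{
    return 1.0 / ((1.0 - p) + p / cores);
}
```

With these numbers `parallelFraction(15530.0 / 7359.0, 8.0)` comes out at roughly 0.60, which would suggest that about 40% of the filter's run time is still serial. If some of the 8 cores are logical rather than physical, the estimate would change.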
2014-04-29 17:56 GMT+02:00 Robert Zeller <[hidden email]>:
> BTW: OpenMP is not 3rd party software; libgomp1 (openMP runtime library) is
> available across all Linux distributions.

For digiKam, OpenMP is a 3rd party dependency:

- digiKam does not use OpenMP.
- We don't want to extend the list of dependencies indefinitely.
- digiKam already uses Qt, of course.
- OpenMP syntax relies on preprocessor directives, which is old-style coding, not C++.

Gilles Caulier
2014-04-29 18:38 GMT+02:00 Gilles Caulier <[hidden email]>:
> For digiKam, OpenMP is a 3rd party dependency:

==> Another one: OpenMP is not multi-platform like Qt. For example, OS X does not support it...

Gilles Caulier
I found time to port the Refocus tool to multicore support.

Same conditions and same image as with the LocalContrast tool:

Without multicore:

digikam(2436)/digikam (core) Digikam::RefocusFilter::refocusImage: RefocusFilter::Compute matrix...
digikam(2436)/digikam (core) Digikam::RefocusFilter::refocusImage: RefocusFilter::Apply Matrix to image...
digikam(2436)/digikam (core) Digikam::DImgThreadedFilter::startFilterDirectly: "Refocus" :: excecution time : 31563 ms
digikam(2436)/digikam (core) Digikam::EditorToolThreaded::slotFilterFinished: Preview "Sharpen" completed...

With multicore support:

digikam(5291)/digikam (core) Digikam::RefocusFilter::refocusImage: RefocusFilter::Compute matrix...
digikam(5291)/digikam (core) Digikam::RefocusFilter::refocusImage: RefocusFilter::Apply Matrix to image...
digikam(5291)/digikam (core) Digikam::DImgThreadedFilter::startFilterDirectly: "Refocus" :: excecution time : 10080 ms
digikam(5291)/digikam (core) Digikam::EditorToolThreaded::slotFilterFinished: Preview "Sharpen" completed...

To sum up: the Refocus tool is now 3 times faster than before (31563 ms down to 10080 ms)!

Gilles Caulier
The results sound amazing! Gilles, did you spend a lot of time/energy recoding and testing this?