Threading for face detection

Threading for face detection

Aditya Bhatt
Hi,

I had some short, fragmented discussions with Gilles and Johannes about using OpenMP to speed up the batch face detection.
Here is part of the conversation I just had:

"
Aditya Bhatt wrote:
> In my latest bunch of commits, I have added a face scanner to digiKam, which
> detects faces in all images in every album. The relevant code is in
> utilities/batch/batchfacedetector.cpp.
>
> It works, but due to the nature of the algorithm, the CPU usage is very high
> and digiKam UI slows down to a crawl. (Press the "rescan all images" button
> at the top of the people sidebar at the left).

This is definitely a bug. Sounds like you are running the detection in
the main user interface thread. Instead it must be executed in (a)
separate thread(s).

> Gilles says that using OpenMP would be a good idea for this.
> I don't know anything about OpenMP, so I'd like it if you can have a look at
> the code and suggest how to parallelize it.

OpenMP is one way to parallelize things, but it was designed to work
deep inside the algorithm code, e.g. by parallelizing single loops. For
your task I suspect that a higher-level approach is better suited,
because you can simply run the algorithm as it is on several faces in
parallel. This is normally done with a task-based approach. You first
create a list of tasks, each of which would be one face to scan here,
and then hand these tasks to an executor. I don't know whether Qt
provides a thread pool, but that would be the most classical way to
build such a service: you have a synchronized queue of tasks that is
filled at one end with new faces to recognize and drained at the other
by the executor, which dispatches the tasks to several parallel worker
threads that run the recognition algorithm.

For C++ there is one especially notable library that implements a task
pattern in a very versatile way: Intel's Threading Building Blocks. We
don't use them yet in digikam but I think this is definitely the way to
go if there is nothing similar in Qt.

The main message is: don't manage threads on your own whenever you can
avoid it. Instead, focus on tasks that can be executed in parallel. If
you want to speed up a single algorithm, then OpenMP would be a solution,
but not for these high-level tasks. Also, Intel's TBB includes a way to
parallelize loops that is more or less equivalent to OpenMP.

If you want more insight into task-based approaches, you can have a
look at the java.util.concurrent package from the standard Java platform.
Executor, ExecutorService etc. are a very well designed implementation.
"
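
For reference, the task-queue pattern described in the quoted reply can be sketched with nothing but the C++ standard library. This is only an illustration under assumptions, not TBB, Qt or digiKam code: `TaskPool` and `scanAll` are hypothetical names, and the "scan" is a stand-in that squares a number instead of running a detector.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal executor: a synchronized queue of tasks that is
// drained by a fixed set of worker threads.
class TaskPool {
public:
    explicit TaskPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Lets the workers finish all queued tasks, then joins them.
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    // Producer side: push a task at one end of the queue.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wakeup_.notify_one();
    }

private:
    // Consumer side: each worker pops tasks until told to stop.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers proceed in parallel
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Stand-in for "scan every image": one task per item, each task writes
// only its own slot of the result vector, so no extra locking is needed.
std::vector<int> scanAll(const std::vector<int>& images, std::size_t workers) {
    std::vector<int> results(images.size());
    {
        TaskPool pool(workers);
        for (std::size_t i = 0; i < images.size(); ++i)
            pool.submit([&results, &images, i] { results[i] = images[i] * images[i]; });
    }  // the pool destructor waits for every submitted task
    return results;
}
```

The caller only creates tasks and never touches a thread directly, which is the point made above. A library such as TBB or a Qt thread pool would replace `TaskPool` in real code.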

Also, I'm not very knowledgeable about TBB or even OpenMP for that matter.
So I've started the thread so that Alex and Marcel can join in...

Aditya

_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel

Re: Threading for face detection

alexjironkin
Well, if OpenCV is compiled with TBB enabled (OpenCV has completely dropped support for OpenMP), then the detection code itself will be faster, because all libface does underneath is run a classifier, which I think takes about 1 sec per classifier.

So to me the bottleneck is indeed the sheer number of images to process. As suggested, it would make sense to use TBB to process them in a parallel loop, with the detection itself also using TBB, rather than trying to manage fiddly threads using OpenMP.

So I agree with the quoted conversation: use TBB to avoid the headache of managing threads and focus on tasks. I think Intel has always had decent documentation, so it should be straightforward to pick up.

Alex



If we knew what we were doing, it wouldn't be called research, would it?
-- Albert Einstein





Re: Threading for face detection

Aditya Bhatt
Don't worry, I fixed the issue in digiKam :D

I had to launch the batch detection in a separate thread, and now it no longer freezes the UI. Check it out :)
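
The message does not show how the fix was implemented. As a generic illustration only, here is a sketch of moving a slow scan off the calling thread with `std::async` from the C++ standard library. `batchDetect` and `startBatchDetect` are hypothetical names, and the sleep merely simulates a long-running scan.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a slow batch scan; returns how many items
// were processed.
int batchDetect(std::vector<int> images) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return static_cast<int>(images.size());
}

// Starts the scan on another thread and returns immediately, so the
// caller (for example a UI event handler) is not blocked.
std::future<int> startBatchDetect(std::vector<int> images) {
    return std::async(std::launch::async, batchDetect, std::move(images));
}
```

Calling `get()` on the returned future blocks until the scan is done, so a GUI thread would normally poll the future or be notified on completion instead of waiting on it.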

Aditya


Re: Threading for face detection

Aditya Bhatt
But yeah, you're right, we should have a look at TBB to use it in libface internally, to launch the 3 cascades in separate threads.

Re: Threading for face detection

alexjironkin
If it takes a sec to do each cascade and there are only 2-3 of them, then it will take longer to fork and merge than to run them in a single thread. Parallelise at the level of images instead, processing several images in parallel, similar to "parfor" in MPI.
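
A minimal sketch of what image-level parallelism could look like, using plain `std::thread` rather than TBB. `detectFaces` and `detectAll` are hypothetical names, and the detector is a placeholder that stands for running all cascades sequentially on one image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-image detector: stands for running every cascade
// one after another on a single image.
int detectFaces(int image) { return image % 3; }

// Splits the images into one contiguous chunk per hardware thread and
// processes the chunks in parallel.
std::vector<int> detectAll(const std::vector<int>& images) {
    std::vector<int> faces(images.size());

    // hardware_concurrency() may return 0, so fall back to one thread.
    std::size_t n = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    n = std::min(n, std::max<std::size_t>(1, images.size()));
    const std::size_t chunk = (images.size() + n - 1) / n;

    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < n; ++t) {
        const std::size_t begin = std::min(t * chunk, images.size());
        const std::size_t end = std::min(begin + chunk, images.size());
        threads.emplace_back([&, begin, end] {
            // Each thread writes only to its own index range.
            for (std::size_t i = begin; i < end; ++i)
                faces[i] = detectFaces(images[i]);
        });
    }
    for (std::thread& th : threads) th.join();
    return faces;
}
```

Threads are forked and joined once per batch rather than once per image, so the overhead is amortised over all images in a chunk.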


Alex
 
On 21 Jul 2010, at 20:04, Aditya Bhatt wrote:

> But yeah, you're right, we should have a look at TBB to use it in libface internally, to launch the 3 cascades in separate threads.


Re: Threading for face detection

Michael G. Hansen
In reply to this post by Aditya Bhatt
On 07/21/2010 06:18 PM, Aditya Bhatt wrote:
> Hi,
>
> I had some small fragmented discussions with Gilles, and Johannes about
> using OpenMP to speed up the batch face detection.
> Here is part of the conversation I just had:

Qt provides Qt Concurrent for thread management. You can give it a
function and a list of items to process, and it processes them based on
the number of cores in your system. I couldn't yet find a way to run
these processes at lower priority though. I am using this in GPSSync.

KDE itself provides ThreadWeaver, which IIRC provides finer-grained
control over resource usage. For example, you can say that you want only
one process at a time reading data from the hard disk, but that for
processing of in-memory data as many processes as cores would be better.
I have not used ThreadWeaver yet.
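
The resource restriction idea can be illustrated without ThreadWeaver. The sketch below uses only the C++ standard library and does not reflect ThreadWeaver's actual API: a mutex allows one "disk read" at a time while the in-memory processing runs in parallel. `diskMutex`, `loadImage`, `process` and `loadAndProcess` are hypothetical names, and for brevity it starts one thread per item instead of one per core.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical guard for the "disk": only one read may run at a time.
std::mutex diskMutex;

// Stand-in for reading an image from disk; serialized by diskMutex.
int loadImage(int id) {
    std::lock_guard<std::mutex> lock(diskMutex);
    return id * 10;
}

// Stand-in for CPU-bound work on in-memory data; not restricted, so
// several threads can be inside this function at once.
int process(int pixels) { return pixels + 1; }

std::vector<int> loadAndProcess(const std::vector<int>& ids) {
    std::vector<int> out(ids.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < ids.size(); ++i)
        threads.emplace_back([&, i] { out[i] = process(loadImage(ids[i])); });
    for (std::thread& t : threads) t.join();
    return out;
}
```

The lock is released as soon as `loadImage` returns, so a thread that has its data never holds up the next reader while it computes.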

Michael

Re: Threading for face detection

Gilles Caulier-4
2010/7/21 Michael G. Hansen <[hidden email]>:

> On 07/21/2010 06:18 PM, Aditya Bhatt wrote:
>> Hi,
>>
>> I had some small fragmented discussions with Gilles, and Johannes about
>> using OpenMP to speed up the batch face detection.
>> Here is part of the conversation I just had:
>
> Qt provides Qt Concurrent for thread management. You can give it a
> function and a list of items to process, and it processes them based on
> the number of cores in your system. I couldn't yet find a way to run
> these processes at lower priority though. I am using this in GPSSync.

Very interesting. We must use it in Batch Queue Manager in the future...

Gilles Caulier