Hi,
I am implementing a face recognition algorithm for digiKam, and wanted to use a GPGPU framework for it. But I was not able to decide which framework to use: OpenCL or CUDA (C for CUDA, specifically).

PS: I am deliberately not including any further information about either of the above frameworks, to attract unbiased opinions.

Also, I could write the part that executes on the GPU in Python, shortening the development cycle. Good Python bindings exist for both GPU programming frameworks: PyCUDA and PyOpenCL. Python functions are easily callable from within C/C++ code, as demonstrated by Link 1, Link 2 and Link 3. So, is it fine if the algorithms are implemented in Python and then called from within digiKam?

All suggestions and comments welcome.

--
regards
-------
Kunal Ghosh
Dept. of Computer Sc. & Engineering
Sir MVIT
Bangalore, India

Quote:
"Ignorance is not a sin, the persistence of ignorance is"
"If you find a task difficult today, you'll find it difficult 10yrs later too!"
"Failing to Plan is Planning to Fail"

Blog: kunalghosh.wordpress.com
Website: www.kunalghosh.net46.net
V-card: http://tinyurl.com/86qjyk

_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel
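To make the "Python called from C++" idea concrete, here is a minimal, hypothetical sketch of what the Python-side entry point could look like (the module name, function name, and return format are my own invention, not any existing digiKam API). The idea is to keep the signature to plain strings, floats, and tuples, so the C++ side can call it with very little conversion code via the CPython embedding API (PyImport_ImportModule / PyObject_CallObject):

```python
# recognizer.py - hypothetical Python-side entry point for digiKam's C++
# code to call through embedded CPython.  The real GPU work (PyCUDA or
# PyOpenCL kernels) would happen inside this function; a fixed placeholder
# result stands in for it here.

def detect_and_recognize(image_path):
    """Return a list of (name, confidence) tuples for faces in the image.

    Only simple types cross the C++/Python boundary: a str comes in,
    a list of (str, float) tuples goes out.
    """
    # Real implementation: load the image, run detection and recognition
    # kernels on the GPU, and look up the nearest gallery matches.
    return [("unknown", 0.0)]
```

On the C++ side the embedding boilerplate would be roughly `PyImport_ImportModule("recognizer")` followed by `PyObject_CallObject` on the fetched function; whether a dependency on an embedded interpreter is acceptable for digiKam is exactly the question above.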
Hi Kunal,
If you wish to use GPGPU, OpenCL is the right choice over CUDA. CUDA's API is not only proprietary, it is also specific to nVidia hardware. OpenCL is better in this sense: if the user has an nVidia card, OpenCL will apparently use CUDA internally, so OpenCL will work on all users' computers.
However, I'm not sure, but there seems to be a small problem with adoption: http://www.khronos.org/opencl/adopters/ There are some weird conditions regarding publishable usage that I'm not entirely sure about. It seems that you must gain some sort of approval and pass some conformance tests before you are allowed to say that you used OpenCL in digiKam. If one doesn't want to publish one's code, but keep it for personal/closed usage, then no royalty is due. Please correct me if I'm wrong, since this seems free as in speech/usage, but not free as in beer: there are some royalty issues if you don't pass the conformance tests.

As a side note: I had a talk with Alex about using EBGM a few days ago, and we decided not to use it for the moment. We don't have anything against the algorithm; it's just that if I did it alone, there wouldn't be enough time to implement all the algorithms and also complete the tagging part within the GSoC period. We definitely want to have eigenfaces and fisherfaces, despite their limitations. The retraining is slow only if there are more than ~400 tagged friends in the database, which is a rarity. The main concern is pose variation; for that, I plan to link multiple poses of the same person with the same ID. As a consequence, the initial accuracy while training will be lower, but after some time it should be good enough.
But still, if you can implement EBGM for libface, it'd be great :) We'd have one more algorithm in the bag. It's just that one person can't finish everything in time. So since you're willing, start EBGM then. Eigenfaces is almost complete.
PS: Hopefully someone will figure out whether OpenCL can be used in digiKam or not.

Cheers,
--
Aditya Bhatt
Blog : http://adityabhatt.wordpress.com
Face Recognition Library : http://libface.sourceforge.net
Thanks for the reply Aditya,
comments in-lined,
That is my opinion too.
Actually, there is a small interpretation error: the conditions regarding publishable usage apply only to the adopters (i.e. NVIDIA, ATI, etc., http://www.khronos.org/members/conformant/ ) who come up with OpenCL-compliant drivers / API implementations. As mentioned on the main adopters page, we as implementers should not have any problems. (Quoting from the site: http://www.khronos.org/adopters ) "Implementers - for no cost or license fees you may:"
So it is free in the sense of free speech & free beer, but asks us not to publicize it :) (without conformance testing).
Actually, freely available implementations of EBGM exist (as I had pointed out in a reply to Marcel's mail some time back); they can be found at http://malic.sourceforge.net/ and at http://www.cs.colostate.edu/evalfacerec/algorithms5.html. Also, I would like to interact with you both on IRC; which channel do you and Alex usually meet up on?

IMHO, there would be sufficient time to create the tagging widget, since it is quite easy to write plugins for digiKam. Also, the EBGM algorithm will not take more than about a month, as the free implementations already exist; I would only have to use the OpenCL API to modify the necessary portions of the existing code.
As you mentioned, your main concern is pose variation. Fisherfaces is, IMHO (from my work on face recognition over the past year), not the right way to go, for the following reasons:

1. Fisherfaces uses the same methodology as eigenfaces (which is easily seen from a preliminary survey of the subject) and doesn't yield satisfactory results on expression- and pose-variant images.

2. The only advantage of fisherfaces over eigenfaces is that it makes recognition illumination-invariant. But that's not much of a problem here: the family / group photos that digiKam will mostly encounter are taken with camera flash, or are outdoor shots, both of which result in well-lit photographs.

Also, the problem with pose and retraining of eigenfaces and fisherfaces is as follows. Assume you are training your recognition model with images of a single person. Since eigenfaces and fisherfaces rely on a nearest-neighbour classifier for recognition, you have to train the model with many images to get satisfactory recognition results. Now let us estimate how many training images of a single person we would need. Assuming a person looking straight at the camera, a 1-degree variation in pose from 0 degrees (face towards the left) to 180 degrees (face towards the right) would give 180 images. If the person also looks upwards by 1 degree (again 180 images from left to right), and we keep varying the pose, we end up with approximately 180 x 180 images of the same person. Moreover, each successive pose-varied image added to the training set increases the training time sharply. This raises two problems:

1. There will not be sufficient training images (so many pose variations of a single person are difficult to get) to obtain satisfactory results; i.e. training would take a long time.

2. Eigenfaces and fisherfaces (in general, principal-component-based models) calculate a single set of eigenfaces/fisherfaces from the whole training set, so for a well-trained recognizer the training time would be enormous.
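For reference, the nearest-neighbour recognition step being discussed can be sketched in a few lines of plain Python. The 4-pixel "faces", mean, and eigenfaces below are invented toy values; in reality they come from PCA over the training set. Every additional pose image becomes one more coefficient vector in the gallery, which is what makes dense pose sampling expensive:

```python
# Sketch of eigenface projection + nearest-neighbour classification.
# 'mean' and 'eigenfaces' would come from PCA over the training images;
# any toy values of matching length work for illustration.

def project(face, mean, eigenfaces):
    """Project a face onto the eigenfaces, giving a coefficient vector."""
    centred = [p - m for p, m in zip(face, mean)]
    return [sum(c * e for c, e in zip(centred, ef)) for ef in eigenfaces]

def nearest(coeffs, gallery):
    """Return the label whose stored coefficients are closest (Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    label, _ = min(gallery, key=lambda item: dist2(coeffs, item[1]))
    return label
```

Recognition cost grows linearly with the gallery, but recomputing the PCA basis when new tagged faces arrive is the expensive part being argued about above.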
I would love to add the implementation to libface. But looking at the similarity between eigenfaces and fisherfaces, IMHO the effort should go into getting as many genuinely different algorithms implemented as possible.

More opinions / suggestions from the digiKam community are welcome.
--
regards,
Kunal Ghosh
Hi Kunal,
Ok then. But what I'm not sure about is: if we incorporate this in libface and consequently digiKam, doesn't that amount to publicizing it? If not, then that's great with me. But please ask the others for their opinions too; we want as few dependencies as possible.
Yes, the CSU project was also my first choice for EBGM.
Actually, no; we can assume a tolerance of about 10-20 degrees. I think ~4 to 5 representative images per person should be okay for a photo management suite. What is required for a person is: a frontal face, a profile face, and a sideways face. More faces can then be added on the fly as more variations are encountered. This way, we get a denser sampling over pose as time proceeds.
And as I already said, the average number of friends/acquaintances that people would tag in photos is not so large as to noticeably slow down the re-training. And compared to eigenfaces, the PCA + LDA approach is, to some extent, able to accommodate pose variations (20-30 degrees), so fisherfaces should be fine for digiKam if the above method is followed.
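As background for the PCA + LDA discussion: the LDA half of fisherfaces chooses projection directions that maximise the Fisher ratio of between-class to within-class scatter. A toy one-dimensional illustration, with made-up feature values:

```python
# Toy illustration of the Fisher criterion behind fisherfaces (PCA + LDA):
# a feature/direction is good when the class means are far apart relative
# to the spread within each class.

def fisher_ratio(class_a, class_b):
    """(difference of means)^2 / (sum of within-class variances)."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return (mean(class_a) - mean(class_b)) ** 2 / (var(class_a) + var(class_b))

# A feature where two people are well separated scores high...
well_separated = fisher_ratio([1.0, 1.2, 0.9], [5.0, 5.1, 4.8])
# ...while an overlapping feature scores low.
overlapping = fisher_ratio([1.0, 5.0, 3.0], [1.2, 4.9, 3.1])
```

LDA picks the directions with the highest such ratio, which is what buys the illumination (and some pose) robustness over plain eigenfaces.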
Very true. But for now, the priority is to get fisherfaces up and running. It's a sort of insurance: fisherfaces should be quite easy to implement, and although less accurate than EBGM, it is good enough to be incorporated into digiKam. So we'll definitely have at least one working algorithm. Meanwhile, we might want EBGM at some point, so jump in :)
Alex and I (and the rest of the digiKam team, for that matter) don't communicate much via IRC; we use the ML more. Get yourself subscribed to the libface ML. As for me, I usually idle in #digikam, #kde, and #kde-in. I think I've talked to you before on #kde-in :)
Cheers,
Aditya
PS: Google for H-Eigenfaces (aka Hybrid-Eigenfaces). It is a nifty tweak to the original method. A professor in my university just showed me a paper that proposes this method, and it greatly solves the problem of pose variation.
In reply to this post by kunal ghosh
From my viewpoint, it's always a bad idea to force an implementation to use a specific API tied to particular hardware. Your code must compile fine without such a dependency; optionally providing a way to speed up computation is always welcome.

For example, in libraw we use OpenMP to parallelize computation during RAW demosaicing. OpenMP is GNU-compatible and available with GCC and other compilers such as MS VisualC.

Another consideration is platform compatibility: always use libraries which are available on all platforms: Linux, Mac and Windows.

My 10cts€

Gilles Caulier
In reply to this post by Aditya Bhatt
Hi Aditya,

> PS: Google for H-Eigenfaces (aka Hybrid-Eigenfaces). It is a nifty tweak to the original method. A professor in my university just showed me a paper that proposes this method, and it greatly solves the problem of pose variation.

Would it be possible for you to forward the relevant paper to me by personal mail? I don't currently have access to my college Springer/Elsevier/IEEE accounts.

--
regards,
Kunal Ghosh
In reply to this post by Aditya Bhatt
Hi Aditya,

On Sun, Apr 18, 2010 at 10:26 AM, Aditya Bhatt <[hidden email]> wrote:
> PS: Google for H-Eigenfaces (aka Hybrid-Eigenfaces). It is a nifty tweak to the original method. A professor in my university just showed me a paper that proposes this method, and it greatly solves the problem of pose variation.

As we discussed on #digikam, the paper "Pose invariant virtual classifiers from single training image using novel hybrid-eigenfaces", published at http://linkinghub.elsevier.com/retrieve/pii/S0925231210001475, mentions the following on page 6, right column, second paragraph:

[Quoting]
"It should be noted that the synthesized virtual views would strictly be under the same pose as H-eigenfaces so only those pose variations can be obtained in training images which are present in H-eigenfaces. It is required to have a face dataset consisting of different subject's face images under different viewpoints to obtain H-eigenfaces under those viewpoints. Consequently, the method relies on the availability of a generic face dataset containing face images under different pose. In this article FERET face Database [38] serves the purpose of generic dataset."
[/Quoting]

So many pose-varied images of a person are readily available in a face database such as FERET, but are difficult to get in a personal photo album. (Usage of the system over time will improve recognition results, but users may not continue using the system for that long!)

regards,
Kunal Ghosh
In reply to this post by kunal ghosh
Without knowing much about the practical status of OpenCL as of today, it seems the natural choice to me:
In reply to this post by kunal ghosh
Hi Kunal,

There was a slight error in your interpretation of their method:
They actually use the FERET database to train the coefficients for the reconstruction of a profile/side face into a frontal face. Later, those same coefficients can be used to map a non-FERET profile face to its virtual frontal equivalent. Therefore I, as a developer, can generate these coefficients using FERET's huge database, and then bundle a file containing the values with libface, for end-users to use :)
There is a very nice paper ( admittedly better-framed than the one I showed you ), that explains how GLR (Global Linear Regression) can be applied to predict the frontal face from the profile view : http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.8750&rep=rep1&type=pdf
In fact, the authors of this paper go one step further and explain LLR, or Local Linear Regression, which applies the above GLR algorithm to localized "patches" of a face to achieve much finer accuracy in rotation.
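To illustrate the regression idea in a drastically simplified, per-pixel scalar form (real GLR fits a full matrix over whole faces, and LLR does so per patch), here is a small sketch with invented toy data:

```python
# Simplified sketch of the linear-regression idea behind GLR: learn, from
# (profile, frontal) training pairs, a per-pixel least-squares coefficient,
# then use it to predict a frontal view from a new profile image.

def fit_per_pixel(pairs):
    """pairs: list of (profile, frontal) pixel lists of equal length."""
    n_pixels = len(pairs[0][0])
    coeffs = []
    for i in range(n_pixels):
        num = sum(p[i] * f[i] for p, f in pairs)   # sum of x*y
        den = sum(p[i] * p[i] for p, f in pairs)   # sum of x*x
        coeffs.append(num / den)                   # least-squares slope
    return coeffs

def predict_frontal(profile, coeffs):
    """Apply the learned coefficients to synthesize a frontal view."""
    return [c * p for c, p in zip(profile, coeffs)]
```

The predicted frontal face can then be fed to an ordinary recognizer, which is what makes the approach attractive for pose normalization.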
So as I see it, this method is well-suited for the problem at hand.
Cheers,
Aditya
PS: Combining the above algorithm with LDA should give even better results, solving both pose and illumination variation.
