Regarding face recognition in digikam GSoC project

Regarding face recognition in digikam GSoC project

kunal ghosh
Hello sirs,
I am interested in implementing the face recognition idea proposed on the GSoC KDE 2010 ideas page. I am working on a proposal
for it and have the following first draft.
Kindly point out any shortcomings so that I can improve it.

Title: Project to implement a face recognition engine in digikam for automatic face recognition and tagging.
Motivation: I have always been a big fan and supporter of intelligent and elegant technologies, so when I had to join a Special Interest Group at our institute, my obvious choice was the face recognition SIG, as it interested me the most. Eventually I built up a huge collection of facial test images and had to experiment with photo management applications. Having used Picasa and then digikam as I shifted to KDE, I felt the lack of face tagging; since it is also the most requested feature on the digikam mailing lists, I was motivated to take it up as my GSoC project.
Face recognition is also my area of research (see my previous discussion with the digikam-devel list: http://old.nabble.com/face-recognition-in-digikam-td26844374.html ). I am keen to pursue this project, as it would benefit both digikam and me.
Implementation Details:
*** I have two ideas regarding the implementation and would like your opinion. ***
Both ideas follow a library-and-plugin architecture.
The plugin for digikam would definitely have to be written in C++.
Idea 1: Implement the library in Python.
The reasons for choosing Python are:
1. Availability of OpenCV bindings and fast NumPy libraries.
2. Easy migration to Kross-python once the Kross project ( http://kross.dipe.org/ ) becomes stable.
3. Python is popular with new developers and would attract them, giving digikam a richer developer base in the future.
4. A fast develop-and-test cycle, which should lead to more stable code.
5. Single-threaded execution is slower than in C, but PyOpenCL (for parallel programming on heterogeneous devices including CPUs, GPUs and other processors) makes it easy to implement parallel algorithms that can scale in the future.

Idea 2: Implement the library in C++.
Reasons:
1. The OpenCV C++ libraries are more mature and up to date (the Python bindings catch up later).
2. It integrates well into the C++ codebase of digikam.
3. Faster code execution (may be noticeable in single-threaded applications).

Implementation details of face detection and recognition library:

Detection algorithm: Haar-cascade based detection, which is tried, tested and implemented in OpenCV with promising results.
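To make the detection idea concrete: a Haar cascade evaluates many rectangle-difference features, each computed in constant time from an integral image. The sketch below is a plain-Python illustration of that building block only. It is not the OpenCV or libface API, and the function names are made up for this example.

```python
def integral_image(gray):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half (h must be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Toy 4x4 "image": a dark band (like an eye region) above a bright band (like cheeks).
gray = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
ii = integral_image(gray)
print(two_rect_feature(ii, 0, 0, 4, 4))  # prints -1520: the top half is darker
```

A real cascade thresholds thousands of such features in stages and slides the window over the image at several scales; in the project that part would come from the existing OpenCV detector rather than being reimplemented.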

Recognition algorithm: Elastic Bunch Graph Matching.
Reason for the choice of algorithm: According to a survey of existing face recognition models done at my institute, Eigenfaces and Fisherfaces based recognition algorithms have two major drawbacks.
1. They cannot overcome pose and expression variations, which is a major problem in personal photo management applications. (Fisherfaces and Eigenfaces are similar; the main difference is that Fisherfaces can overcome illumination variation.)
2. The model needs to be fully retrained even if only one new training image is added to the training set of faces.
So I propose to use Elastic Bunch Graph Matching, which provides much better accuracy (accuracy data obtained from this pdf) and overcomes both problems of the above two approaches.
The idea behind putting this work into a library is to allow extending face recognition to general object recognition in the future,
and to make it easy for other projects to adopt the same library in their code.


About me:
Name: Kunal Ghosh
IRC : gancient
Location: Bangalore , India

I am currently pursuing my Bachelor of Engineering in Computer Science and Engineering and have been deeply motivated and inspired by the philosophy of Free Software. I take an interest in music and arts of all forms. Pattern Recognition and Robotics interest me (though they are quite different fields, in my opinion they complement each other), and I love to use Python for my programming needs whenever possible (a language should be a means of easy and efficient communication, and Python is exactly that).




_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel
Re: Regarding face recognition in digikam GSoC project

kunal ghosh
Hi, this is my revised proposal for the Google Summer of Code KDE-digikam project to implement face recognition in digikam.
I kindly request your suggestions and comments to improve the proposal.
The proposal follows:

Title: Implementation of a face recognition engine in digikam for automatic face recognition and tagging.

Motivation: I have always been a big fan and supporter of intelligent and elegant technologies, so when I had to join a Special Interest Group at our institute, my obvious choice was the face recognition SIG, as it interested me the most.

Having used Picasa 3.5 and then digikam as I shifted to KDE, I felt the lack of face tagging.
Automatic face tagging being the most requested feature on the digikam mailing lists (request-id 46288) motivated me to take it up as my GSoC project.

Since face recognition is also my area of research at my institute, I am keen to pursue this project, as it would benefit both digikam and me. (I had discussed this feature with the digikam-devel list: http://old.nabble.com/face-recognition-in-digikam-td26844374.html )

Implementation Details:
I propose to follow a library-and-plugin architecture in which the front end would be menu additions to digikam in the form of a face recognition plugin, and the back end would be the detection and recognition engine performing the image processing.

Library implementation:

1. Face detection would be done using libface's detector, which is based on multiple Haar cascades and has been tried, tested and implemented in OpenCV with promising results.
2. Since I propose to use Elastic Bunch Graphs for recognizing faces (the reasons and a comparison with other models follow), I propose to use OpenCV for the image processing.
3. Since most modern CPUs have multiple cores, and given the recent advent of general-purpose GPU SDKs, I propose to implement the library routines in either CUDA or OpenCL (discussion with mentors required).
CUDA and OpenCL would be more interesting for the library because the recognition speed would increase over time as CPUs with more cores and faster GPUs become common on the desktop, making digikam future-proof.

The idea behind putting this work into a library is to allow extending face recognition to general object recognition in the future,
and to make it easy for other projects to adopt the same library in their code.


Why Elastic Bunch Graph (EBG) Based Face Recognition:
1. It adds semantics to face recognition. It recognizes faces by taking various facial features (eyes, nose etc.) into consideration and uses matching criteria such as the distance between the features.
In comparison, PCA and LDA based methods such as Eigenfaces and Fisherfaces do not extract facial feature data and thus do not use the patterns in faces, resulting in lower recognition accuracy.

2. Elastic Bunch Graphs can store the feature data in matrices (this may vary depending on the implementation), so adding another training image is just a matter of adding a matrix entry. Eigenfaces and Fisherfaces based methods, on the other hand, create a set of representative images (eigenfaces or fisherfaces) which need to be recalculated every time a new training image is added, leading to a lot of time overhead for large training sets.
Not needing this retraining makes the EBG based method a better contender for the library.

3. Since the EBG based method considers semantic data of the face, it can cope with pose and expression variation in the test image. This is a major concern in a photo management application, where there may be photos of the same person in a different mood and hence with a different facial expression.
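The "no retraining" argument in point 2 can be sketched in a few lines of Python. The example assumes each face has already been reduced to a graph that maps a landmark name to a "jet" (a list of Gabor filter magnitudes), and it compares jets with a normalized dot product. The landmark names, the `Gallery` class and the toy two-value jets are illustrative assumptions, not part of any existing library.

```python
import math


def jet_similarity(jet_a, jet_b):
    """Normalized dot product of two jets (lists of filter magnitudes)."""
    dot = sum(a * b for a, b in zip(jet_a, jet_b))
    norm = math.sqrt(sum(a * a for a in jet_a) * sum(b * b for b in jet_b))
    return dot / norm if norm else 0.0


def graph_similarity(graph_a, graph_b):
    """Mean jet similarity over the landmarks present in both graphs."""
    shared = graph_a.keys() & graph_b.keys()
    if not shared:
        return 0.0
    return sum(jet_similarity(graph_a[k], graph_b[k]) for k in shared) / len(shared)


class Gallery:
    """Known faces kept as (name, graph) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, name, graph):
        # Enrolment is a plain append: nothing already stored is recomputed.
        self.entries.append((name, graph))

    def identify(self, probe):
        """Name of the stored graph most similar to the probe, or None if empty."""
        if not self.entries:
            return None
        return max(self.entries, key=lambda e: graph_similarity(e[1], probe))[0]


gallery = Gallery()
gallery.add("alice", {"left_eye": [1.0, 0.0], "nose": [0.0, 1.0]})
gallery.add("bob", {"left_eye": [0.0, 1.0], "nose": [1.0, 0.0]})
print(gallery.identify({"left_eye": [0.9, 0.1], "nose": [0.1, 0.9]}))  # prints alice
```

Real jets hold responses for many filter orientations and scales, and the matching step also has to locate the landmarks in the probe image; both are left out here to keep the enrolment cost visible.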

Further statistical data comparing the EBG method to various other face recognition methods can be found here: http://tinyurl.com/facerec-compare

The UI and the Digikam/KDE integration:
The face recognition feature would be a plugin/built-in feature of digikam which can be enabled or disabled from the
"Configure Digikam" entry of the "Settings" main menu item already in digikam.

Once enabled, detection would be run over all the digikam albums. This would be done asynchronously so as not to hinder usability.
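As a rough sketch of what "asynchronously" could mean here, the Python example below hands each image to a worker thread and returns at once with a queue that fills up as results arrive. In digikam itself this would be done with its own C++/Qt threading facilities; `scan_in_background` and `fake_detector` are hypothetical names used only for illustration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def scan_in_background(paths, detector, max_workers=2):
    """Start detection on worker threads and return immediately.

    The returned queue receives one (path, boxes) pair per image;
    boxes is None if the detector raised an exception for that image.
    """
    results = queue.Queue()
    pool = ThreadPoolExecutor(max_workers=max_workers)

    def job(path):
        try:
            results.put((path, detector(path)))
        except Exception:
            results.put((path, None))

    for path in paths:
        pool.submit(job, path)
    pool.shutdown(wait=False)  # do not block the caller; queued jobs still run
    return results


def fake_detector(path):
    """Stand-in for a real face detector: one fixed box for 'a.jpg', none otherwise."""
    return [(0, 0, 10, 10)] if path == "a.jpg" else []


pending = scan_in_background(["a.jpg", "b.jpg"], fake_detector)
# The UI thread would poll this queue from its event loop instead of blocking.
for _ in range(2):
    path, boxes = pending.get(timeout=5)
    print(path, boxes)
```

The queue keeps the worker threads decoupled from the UI: the interface stays responsive and simply displays faces as they are found.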

Once detection is done, the user is presented with a simple widget containing the detected faces along with the UI elements necessary to name the faces and to reject non-face images returned by the detection system.
This would reuse code from the current "Edit Menu", in which a box can be drawn on a portion of the image.

The metadata would then be stored in RDF or XMP format inside the image. Region tagging would be based on the following image annotation system: http://www.kanzaki.com/docs/sw/img-annotator.html .

The detected faces would be stored in a database (the choice depends on project constraints, most likely SQLite) along with the
bunch graphs for faster data access and updates.
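A minimal sketch of such a store, using Python's built-in sqlite3 module. The table layout, the column names and the choice to serialize the bunch graph as JSON are assumptions made for this example; they do not describe digikam's actual database schema.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS faces (
    id          INTEGER PRIMARY KEY,
    image_path  TEXT NOT NULL,
    x INTEGER NOT NULL, y INTEGER NOT NULL,
    w INTEGER NOT NULL, h INTEGER NOT NULL,
    person      TEXT,   -- NULL until the user names the face
    bunch_graph TEXT    -- graph serialized as JSON
);
CREATE INDEX IF NOT EXISTS faces_by_person ON faces(person);
"""


def open_store(path=":memory:"):
    """Open (or create) the face store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_face(conn, image_path, box, graph, person=None):
    """Insert one detected face and return its row id."""
    x, y, w, h = box
    cur = conn.execute(
        "INSERT INTO faces (image_path, x, y, w, h, person, bunch_graph) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (image_path, x, y, w, h, person, json.dumps(graph)),
    )
    conn.commit()
    return cur.lastrowid


def name_face(conn, face_id, person):
    """Attach a person's name to a previously detected face."""
    conn.execute("UPDATE faces SET person = ? WHERE id = ?", (person, face_id))
    conn.commit()


def faces_of(conn, person):
    """All (image_path, x, y, w, h) rows tagged with the given person."""
    rows = conn.execute(
        "SELECT image_path, x, y, w, h FROM faces WHERE person = ? ORDER BY id",
        (person,),
    )
    return rows.fetchall()


conn = open_store()
face_id = add_face(conn, "holiday.jpg", (40, 30, 64, 64), {"nose": [0.2, 0.7]})
name_face(conn, face_id, "alice")
print(faces_of(conn, "alice"))  # prints [('holiday.jpg', 40, 30, 64, 64)]
```

Keeping the graph next to the face row means a newly named face becomes usable for recognition with a single UPDATE, in line with the "no retraining" argument made for EBG.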

The generated metadata, i.e. the names of people in a photograph, would be registered with Nepomuk for linking with names in emails etc.

Proposed schedule:
Now - April 9th: Build digikam, tinker with the code and fix a few bugs to increase familiarity.

April 10th - April 30th: Finalize the metadata format (RDF or XMP) and get basic detection working.

May 1st - June 1st: OpenCL/CUDA based recognition engine ready. Documentation and bundling into a library with some test images.

June 1st - July 1st: Finish building the UI plugin and integrate it with the recognition engine. Document the code and write a User's Guide.

July 1st - August 1st: Write a small daemon to collect performance data from willing beta testers. Thoroughly test the system and collect anonymous usage data; fine-tune the plugin and the recognition engine. Document the tests and performance statistics.

August 1st - August 9th: Last-minute modifications, if any.


Why should I be chosen:

1. I have a good understanding of the Qt framework and have used it as well as given demos of it in local LUG meetings.
2. I am pursuing research on face recognition at my institute and would be able to contribute a lot to this project.
3. I have been an active member of the Bangalore Open Solaris User's Group (BOSUG) for over a year and am deeply committed to free software. I am also working on a Qt-based installer for Belenix OS.
4. I have experience using svn and git and am comfortable working with tools like cmake.


About me

Name: Kunal Ghosh
IRC : gancient
Location: Bangalore , India

I am currently pursuing my Bachelor of Engineering in Computer Science and Engineering and have been deeply motivated and inspired by the philosophy of Free Software. I take an interest in music and arts of all forms.

Pattern Recognition and Robotics interest me (though they are quite different fields, in my opinion they complement each other).

I also love to use Python for my programming needs whenever possible (a language should be a means of easy and efficient communication, and Python is exactly that).

Re: Regarding face recognition in digikam GSoC project

Madhusudan C.S
Hi Kunal,
  As I already said, overall the proposal looks really
good and well baked already. There is not much for me to
suggest since it is close to perfect. One
important thing is that you must still pay closer attention to your
punctuation, since it is an integral part of technical writing.
Other than that, my minor suggestions are inlined.

Also, don't wait for anything else: first submit it and then
read the review :P

It would be good, although not necessary, to wrap your proposal
at 80 chars per line. It will make the proposal look more
professional.

On Fri, Apr 2, 2010 at 11:50 PM, kunal ghosh <[hidden email]> wrote:
3.Since EBG based method considers semantic data of face it understands

It would be good to have a comma somewhere in here; as a single
unbroken sentence it is a bit confusing.
 
where there is a pose and expression variation in the test image. This is a major concern in a photo management application wherein there may be photos of the same person in a different mood hence a different facial expression.

Proposed schedule:
Now     - April  9th :Build digikam and tinker around with the code and fix few bugs to increase familiarity.

April 10-April 30th :Finalize on the metadata format i.e RDF / XMP and get the basic detection working.

May 1st-June 1st   :OpenCL/CUDA based Recognition engine ready.Documentation and bundling into library with some test images.
June 1st-July 1st   : Finish UI building plugin and Integration with the recognition engine.Document code and write a User's Guide.

July 1st-August 1st: Write a small demon to collect performance data from willing beta testers.Thoroughly test the system and collect anonymous usage data , fine tune the plugin and the recognition engine. Documents the tests and performance statistics.


Month-long targets are a bit too coarse. It would be better
to have finer-grained deliverables, say fortnightly.
But it depends on whether your mentor cares much about the
schedule in the proposal. If he doesn't, then don't
spend time on this.


Awesome proposal, I must say. I wish you the very best of
luck, and let us share the love of summer!!!

--
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
My Online Identity: madhusudancs

Re: Regarding face recognition in digikam GSoC project

kunal ghosh
Hi Marcel,

Based on our earlier discussion about the kanzaki metadata program, I wanted to confirm the following:
If a user wants to add region location metadata to an image, he/she would have to be presented with a UI.
Once the portion to be tagged is identified, the following metadata would have to be added to the image,
using the XMP and Dublin Core formats.

<foaf:Image rdf:about="path of the image whose sub-region is to be tagged">
 <image:hasPart>
  <image:Rectangle rdf:ID="p1">
   <image:points>top_left_x,top_left_y,bottom_right_x,bottom_right_y</image:points>
   <dc:title>title of the image segment</dc:title>
   <dc:description>description of the tagged region</dc:description>
  </image:Rectangle>
 </image:hasPart>
</foaf:Image>
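For illustration, a fragment of this shape could be generated with Python's standard xml.etree module as sketched below. The rdf, foaf and dc namespace URIs are the standard ones; the `image` namespace URI is a placeholder and should be replaced with the one used by the annotation vocabulary. The sketch also assumes the rectangle is given by its top-left and bottom-right corners, and `region_annotation` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
IMAGE = "http://example.org/image#"  # placeholder, not the real vocabulary URI

for prefix, uri in (("rdf", RDF), ("foaf", FOAF), ("dc", DC), ("image", IMAGE)):
    ET.register_namespace(prefix, uri)


def region_annotation(image_path, region_id, box, title, description):
    """Build the foaf:Image fragment for one tagged rectangle.

    box is (x1, y1, x2, y2): top-left and bottom-right corners (an assumption).
    """
    x1, y1, x2, y2 = box
    image = ET.Element(f"{{{FOAF}}}Image", {f"{{{RDF}}}about": image_path})
    part = ET.SubElement(image, f"{{{IMAGE}}}hasPart")
    rect = ET.SubElement(part, f"{{{IMAGE}}}Rectangle", {f"{{{RDF}}}ID": region_id})
    ET.SubElement(rect, f"{{{IMAGE}}}points").text = f"{x1},{y1},{x2},{y2}"
    ET.SubElement(rect, f"{{{DC}}}title").text = title
    ET.SubElement(rect, f"{{{DC}}}description").text = description
    return ET.tostring(image, encoding="unicode")


print(region_annotation("photos/party.jpg", "p1", (10, 20, 110, 140),
                        "Kunal", "face region"))
```

Building the fragment through an XML library rather than by string concatenation ensures that file paths and titles containing characters such as `<` or `&` are escaped correctly.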

Also, I have built the svn version of digikam successfully. I had a bit of a glitch because I did not know that some environment
variables need to be defined.
I will be working on a patch to add a "draw box around object" option to the existing set of XMP fields under
Image->Metadata->edit XMP.
Would it be worthwhile?

--
regards
-------
Kunal Ghosh
Dept of Computer Sc. & Engineering.
Sir MVIT
Bangalore,India

Quote:
"Ignorance is not a sin, the persistence of ignorance is"
--
"If you find a task difficult today, you'll find it difficult 10yrs later too !"
-----
"Failing to Plan is Planning to Fail"

Blog:kunalghosh.wordpress.com
Website:www.kunalghosh.net46.net
V-card:http://tinyurl.com/86qjyk

