digiKam › digikam-devel

Reusing the "fuzzy search" algorithm

Classic

List

Threaded

2 messages Options

Bugzilla from hdevalence@gmail.com

Reusing the "fuzzy search" algorithm

Note: this is only tangentially related to Digikam, but it was suggested that
I send a mail here, so feel free to stop reading if you're not interested.

Basically I'm working on a program which is mostly inspired by the blog post
here: http://tinyurl.com/5rm8af which basically builds representations of an
image using coloured polygons. But I wanted to make my own that used a real
genetic algorithm.

Anyways, I need a way to determine how close the images that are created by
painting coloured polygons are to the original image, in order to be able to
select and crossover the "best" patterns of polygons. Right now I basically
convert the coordinates to Y-I-Q colour space and then use the Pythagorean
theorem to find the distance and add up the total, but I want a better way,
like how Digikam's fuzzy search works. So, I'm wondering if it would be
possible to reuse Digikam's implementation of the search in my program (which
would be Free software of course).

However I looked at the paper and though I don't fully understand a lot of the
math (I am only in highschool right now, not at a university, so I hope it's
forgivable), I noticed some things that might cause problems: e.g. the
algorithm in the paper is optimised for matching an ill-defined query image to
well-defined target images, whereas I have a well-defined query image [1] and
I'm trying to rank a series of ill-defined targets (like [2]). Also, I don't
really need to store a searchable index of the images since I am only using
the computed value once.

Would it be possible to adapt the Digikam implementation to my needs or would
I be better off to attempt to reimplement a modified version of the algorithm in
the paper (e.g. drop the condition in section 2.3 "we only consider terms in
which the query has a non-zero wavelet coefficient ~Q[i,j]. A potential benefit
(…) however it does not allow a detailed query to match a target that does not
contain that same detail" ) ?

Henry de Valence

[1] Source image: http://imagebin.ca/view/3JL760.html
[2] After 229 generations: http://imagebin.ca/view/3fKx7IT.html
_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel

Marcel Wiesweg

Re: Reusing the "fuzzy search" algorithm

> Note: this is only tangentially related to Digikam, but it was suggested

> that I send a mail here, so feel free to stop reading if you're not

> interested.

> Basically I'm working on a program which is mostly inspired by the blog

> post here: http://tinyurl.com/5rm8af which basically builds representations

> of an image using coloured polygons. But I wanted to make my own that used

> a real genetic algorithm.

> Anyways, I need a way to determine how close the images that are created by

> painting coloured polygons are to the original image, in order to be able

> to select and crossover the "best" patterns of polygons. Right now I

> basically convert the coordinates to Y-I-Q colour space and then use the

> Pythagorean theorem to find the distance and add up the total, but I want a

> better way, like how Digikam's fuzzy search works. So, I'm wondering if it

> would be possible to reuse Digikam's implementation of the search in my

> program (which would be Free software of course).

You are free to use the source, of course ;-)

> However I looked at the paper and though I don't fully understand a lot of

> the math (I am only in highschool right now, not at a university, so I hope

> it's forgivable), I noticed some things that might cause problems: e.g. the

> algorithm in the paper is optimised for matching an ill-defined query image

> to well-defined target images, whereas I have a well-defined query image

> [1] and I'm trying to rank a series of ill-defined targets (like [2]).

> Also, I don't really need to store a searchable index of the images since I

> am only using the computed value once.

The interesting thing about the algorithm is that the Math is very simple. It's just sometimes a bit confusing.

We did not do the initial implementation, but adopted it from the imgseek program. Comparing the old code with the matching part of the paper was the best way to get a good understanding of both.

The paper suggests and imgseek implemented an efficient method for in-memory storage of the index which we had to implement differently, because our signatures are stored in database we ruled out to keep it all in memory. We can now read signatures sequentially. But that's not of your interest. Other differences to Imgseek code is that we use good C++ features and syntax in my taste.

Have a look at our code:

libs/database/haar/haar.cpp is mostly low-level code which you can use as-is

libs/database/haar/haariface.cpp interfaces with our data storage, but the bestMatches() and searchDatabase() methods contain essential code.

> Would it be possible to adapt the Digikam implementation to my needs or

> would I be better off to attempt to reimplement a modified version of the

> algorithm in the paper (e.g. drop the condition in section 2.3 "we only

> consider terms in which the query has a non-zero wavelet coefficient

> ~Q[i,j]. A potential benefit (…) however it does not allow a detailed query

> to match a target that does not contain that same detail" ) ?

I dont know a good answer on that. Either you think it through or you try it out ;-)

> Henry de Valence

> [1] Source image: http://imagebin.ca/view/3JL760.html

> [2] After 229 generations: http://imagebin.ca/view/3fKx7IT.html

> _______________________________________________

> Digikam-devel mailing list

> [hidden email]

> https://mail.kde.org/mailman/listinfo/digikam-devel

_______________________________________________
Digikam-devel mailing list
[hidden email]
https://mail.kde.org/mailman/listinfo/digikam-devel