Diacritics on search: parameter #1665

Open
robsonsobral opened this issue Dec 16, 2021 · 6 comments

Comments

@robsonsobral
Contributor

When submitting "dívida", the search module uses "divida", but the search: parameter doesn't!

Shouldn't the search: parameter behave just like the Search Module?

@robsonsobral
Contributor Author

robsonsobral commented Dec 16, 2021

In the Channel Model, a conditional uses REGEXP instead of LIKE when the term carries the \W flag:

        foreach ($terms as $term) {
            if ($search_sql !== '') {
                $search_sql .= $andor;
            }
            if ($term == 'IS_EMPTY') {
                $empty = true;
                // Empty string
                $search_sql .= ' (' . $col_name . ($not ? '!' : '') . '=""';
                // IS (NOT) NULL
                $search_sql .= $not ? ' AND ' : ' OR ';
                $search_sql .= $col_name . ' IS ' . ($not ?: '') . ' NULL) ';
            } elseif (strpos($term, '\W') !== false) { // full word only, no partial matches
                // Note: MySQL's nutty POSIX regex word boundary is [[:>:]]
                $term = '([[:<:]]|^)' . preg_quote(str_replace('\W', '', $term)) . '([[:>:]]|$)';

                $search_sql .= ' (' . $col_name . ' ' . $not . ' REGEXP "' . ee()->db->escape_str($term) . '") ';
            } else {
                $search_sql .= ' (' . $col_name . ' ' . $not . ' LIKE "%' . ee()->db->escape_like_str($term) . '%") ';
            }
        }

The Search Module uses neither REGEXP nor RLIKE; instead, it pads the search terms with spaces:

LIKE '% ".$terms_like['0']." %') "
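To make the effect of the space padding concrete, here is a minimal PHP sketch that mimics only the quoted clause in isolation. It is an assumption-laden stand-in, not the Search Module's code: the function name `space_padded_like` is hypothetical, the rest of the module's query (which may add further clauses) is not shown in this thread, and only ASCII input is considered.

```php
<?php
// Hypothetical stand-in for `col LIKE '% term %'`; not ExpressionEngine code.
// stripos() gives a case-insensitive substring test for ASCII text.
function space_padded_like(string $value, string $term): bool
{
    return stripos($value, ' ' . $term . ' ') !== false;
}

// A whole word surrounded by spaces is found.
var_dump(space_padded_like('uma maca verde', 'maca'));   // bool(true)
// A partial match inside a longer word is rejected.
var_dump(space_padded_like('um macaco verde', 'maca'));  // bool(false)
// A word followed by punctuation is also rejected by this clause alone.
var_dump(space_padded_like('uma maca, verde', 'maca'));  // bool(false)
```

The padding acts as a crude word boundary: it avoids regular expressions, but anything other than a literal space next to the term defeats it.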

@intoeetive
Contributor

@robsonsobral I think there could be different expectations here.
Some people may expect the :search parameter to work exactly like the search tag, while others may expect the search tag to be fuzzier and the :search= parameter to be stricter (a filter rather than a search).
What would your personal expectations be here?

It would probably be best if both used the same code, to keep things DRY, with the behaviour configurable through a parameter or flag.

@robsonsobral
Contributor Author

Really? I don't know. I would expect both to work the same way. Today, we can't make them give the same results.

I guess a better way to frame this issue is that search: works differently from search:\W when diacritics are involved.

An example!

Field value   search:field="maçã"   search:field="maçã\W"
maçã          matches               matches
macaco        matches               no match
maca          matches               no match

And now without diacritics:

Field value   search:field="maca"   search:field="maca\W"
maçã          matches               no match
macaco        matches               no match
maca          matches               matches
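
One possible explanation for the two tables, offered as an assumption rather than something confirmed in this thread: on MySQL versions before 8.0.4, REGEXP compares strings byte by byte and ignores the collation, whereas LIKE honours it. Under an accent-insensitive collation, LIKE therefore treats "ç" and "c" as equal while REGEXP does not. The PHP sketch below only simulates the two behaviours outside the database; the function names and the tiny accent map are hypothetical, and the word-boundary regex is a rough stand-in for MySQL's `[[:<:]]` and `[[:>:]]`.

```php
<?php
// Hypothetical simulation; not ExpressionEngine code. Save the file as UTF-8.

// Fold a few accented letters to ASCII, standing in for an
// accent-insensitive collation. The map is deliberately tiny.
function fold_accents(string $s): string
{
    return strtr($s, ['ç' => 'c', 'ã' => 'a', 'á' => 'a', 'í' => 'i', 'é' => 'e']);
}

// Stand-in for `col LIKE "%term%"` under an accent- and case-insensitive collation.
function like_contains(string $value, string $term): bool
{
    return strpos(strtolower(fold_accents($value)), strtolower(fold_accents($term))) !== false;
}

// Stand-in for the whole-word REGEXP branch: no /u flag, so raw bytes are compared
// and accented letters never equal their unaccented counterparts.
function regexp_whole_word(string $value, string $term): bool
{
    return preg_match('/(\b|^)' . preg_quote($term, '/') . '(\b|$)/', $value) === 1;
}

// Print one line per term/value pair, mirroring the tables above.
foreach (['maçã', 'maca'] as $term) {
    foreach (['maçã', 'macaco', 'maca'] as $value) {
        printf(
            "%s against %s: LIKE %s, REGEXP %s\n",
            $term,
            $value,
            like_contains($value, $term) ? 'matches' : 'no match',
            regexp_whole_word($value, $term) ? 'matches' : 'no match'
        );
    }
}
```

If this assumption holds, the mismatch comes from the database operators rather than from the PHP that builds the query, which is why switching between `search:` and `search:\W` changes how diacritics are treated.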

The same problem doesn't happen with the Search Module.

Does it make sense?

@intoeetive
Contributor

So the problem is not search on exp:channel:entries vs exp:search, but rather the different results with and without the \W modifier. Did I get it right this time?

@robsonsobral
Contributor Author

robsonsobral commented Jan 18, 2022

I'm sorry, @intoeetive. Yeah, you're right.

My intention was to show that the Search Module takes a different approach to the same issue.

@robsonsobral
Contributor Author

I'm not a database guy, so it took me some time to get my head around this. What I found isn't great.

The current full-word search simply doesn't work properly in the Search form, so the approach used there isn't a solution for the :search parameter.

There are two ways to make full-word search work in both the Search Module and the :search parameter:

  • do some find-and-replace magic to turn every search term into a diacritic-agnostic RegEx, which I think may have an impact on performance;
  • add full-text indexes to all searchable fields.
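
The first option could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed patch: the function name `diacritic_agnostic_pattern` and the variant map are hypothetical, only a handful of lowercase letters are covered, and alternation groups are used instead of bracket expressions on the assumption that a byte-wise regex engine would mishandle multibyte characters inside `[...]`.

```php
<?php
// Hypothetical helper; not part of ExpressionEngine. Save the file as UTF-8.
function diacritic_agnostic_pattern(string $term): string
{
    // Base letter => accepted spellings (illustrative, lowercase only).
    $variants = [
        'a' => ['a', 'á', 'à', 'â', 'ã'],
        'c' => ['c', 'ç'],
        'e' => ['e', 'é', 'ê'],
        'i' => ['i', 'í'],
        'o' => ['o', 'ó', 'ô', 'õ'],
        'u' => ['u', 'ú', 'ü'],
    ];

    // Reverse map: every spelling => its base letter.
    $fold = [];
    foreach ($variants as $base => $forms) {
        foreach ($forms as $form) {
            $fold[$form] = $base;
        }
    }

    // Fold the term to its base letters first, e.g. "maçã" becomes "maca".
    $ascii = strtr($term, $fold);

    // Expand each base letter into a group of alternatives; escape everything else.
    $pattern = '';
    foreach (str_split($ascii) as $char) {
        $pattern .= isset($variants[$char])
            ? '(' . implode('|', $variants[$char]) . ')'
            : preg_quote($char, '/');
    }

    return $pattern;
}

$pattern = diacritic_agnostic_pattern('maçã');
echo $pattern, "\n"; // m(a|á|à|â|ã)(c|ç)(a|á|à|â|ã)

// The same pattern accepts both spellings but not a longer word.
var_dump(preg_match('/^' . $pattern . '$/', 'maca') === 1);   // bool(true)
var_dump(preg_match('/^' . $pattern . '$/', 'maçã') === 1);   // bool(true)
var_dump(preg_match('/^' . $pattern . '$/', 'macaco') === 1); // bool(false)
```

Every vowel in the term turns into a multi-way alternation, so patterns grow quickly with term length, which is consistent with the performance concern raised in the first bullet.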

The way things are isn't good.
