PHP function to convert a Portuguese word from plural to singular
Problem
I know, this sounds really difficult, but it is really easy.
I needed to convert a single Portuguese word in the plural into singular. I know there's a right name for that, but it is escaping me.
The rules are simple. I compiled them from http://www.easyportuguese.com/portuguese-lessons/plural/ (but applied in reverse):
- If the word ends in a vowel, remove the `s` at the end
- Words ending in `ões`, `ães` and `ãos` should end with `ão`
- Words ending in `is`: remove the `is` and add `l` to the end
  Special case: accents should be removed, if needed. The only cases I saw were anéis and pastéis, which have to be anel and pastel.
- Words ending in `ns` get it replaced with `m`
- Words ending with `[rsz]es` should lose the `es`
  Special case: words ending in `eses` need the first `e` replaced with `ê`, like in meses => mês
- Some words are always used in the plural, like `óculos`, `parabéns` and `férias`.
Below, here's the code:
```php
function plural_to_singular($string)
{
    if(preg_match('/^(?:[oó]culos|parab[eé]ns|f[eé]rias)$/iu', $string))
    {
        return $string;
    }
    $regexes = array(
        '[õã]es' => 'ão',
        '[áó].*eis' => 'el',
        '[eé]is' => 'el',
        '([^eé])is' => '$1l',
        'ns' => 'm',
        'eses' => 'ês',
        '([rzs])es' => '$1',
        's' => ''
    );
    foreach($regexes as $fragment => $replace)
    {
        $regex = '/' . $fragment . '$/ui';
        if(preg_match($regex, $string))
        {
            return preg_replace($regex, $replace, $string);
        }
    }
    return $string;
}
```
You can try it on http://sandbox.onlinephpfunctions.com/code/7947a0efd16f361e89491e4a64f71b578d2278df with some test cases.
In your opinion, what can I improve?
Is there any obvious butchering or performance killer?
Solution
Let me start by saying that I have respect for Mike Brant and have been enjoying his posts for quite a while now. However, I disagree with some of his review.
- `$regex_config` cannot store the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the...(yatta-yatta). The key clash would be on `el`.
- Simply throwing `1` at the end of `preg_replace()` is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: `meses => mês = mê`
- Most trivially, `array_values()` doesn't need to be called because `preg_replace()` is "key ignorant" regarding the array inputs.
- For this process to maintain accuracy, there needs to be a `return` as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs.
- You can avoid using capture groups and shorten your replacement strings in a couple of places by implementing the `\K` metacharacter (restart fullstring match). This way you don't need to use `$1` or rewrite literals from the pattern into the replacement.
- If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call `mb_strtoupper()`.
- I don't have a sample string to test against `~[áó].*eis$~iu`, so I wonder whether this pattern is accurate; my Portuguese is not too sharp.
- After my implementation of `\K` you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: `'~(?:[áó].*eis|[eé]is)$~iu' => 'el',` and `'~(?:[rzs]\Kes|s)$~iu' => ''`
- I am using the regex patterns as the keys because they will all logically be unique. The same cannot be said about the replacement values (not without merging anyhow).
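To make the point about the replacement limit concrete, here is a minimal sketch. The function names `singular_all_at_once` and `singular_first_match` are illustrative only, just three of the patterns are used, and the first function is my assumption of what an array-based `preg_replace()` call with a trailing `1` would roughly look like:

```php
<?php
// Hypothetical array-based variant: every pattern is applied in turn.
function singular_all_at_once(string $word): string
{
    $patterns     = ['~eses$~iu', '~([rzs])es$~iu', '~s$~iu'];
    $replacements = ['ês',        '$1',             ''];
    // The limit of 1 applies to each pattern separately, not to the whole call,
    // so a later pattern can still alter the result of an earlier one.
    return preg_replace($patterns, $replacements, $word, 1);
}

// Loop variant: stop at the first pattern that actually replaces something.
function singular_first_match(string $word): string
{
    $map = [
        '~eses$~iu'      => 'ês',
        '~([rzs])es$~iu' => '$1',
        '~s$~iu'         => '',
    ];
    foreach ($map as $pattern => $replacement) {
        $result = preg_replace($pattern, $replacement, $word, 1, $count);
        if ($count) {
            return $result;
        }
    }
    return $word;
}

echo singular_all_at_once('meses'), "\n"; // mê  ("eses" becomes "ês", then the final "s" is stripped)
echo singular_first_match('meses'), "\n"; // mês
```

Words that match only one pattern, such as `vezes`, come out the same either way; the difference shows up only when the output of one replacement still matches a later pattern.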
Code: (Demo)
```php
function is_allcaps($string)
{
    $last_letter = mb_substr($string, -1, 1, 'UTF-8');
    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
    // otherwise use ctype_upper() and setlocale()
}

function plural_to_singular($string)
{
    // quick return of "untouchables"
    if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
    {
        return $string;
    }
    $regex_map = [
        '~[õã]es$~iu' => 'ão',
        '~(?:[áó].*e|[eé])is$~iu' => 'el',
        '~[^eé]\Kis$~iu' => 'l',
        '~ns$~iu' => 'm',
        '~eses$~iu' => 'ês',
        '~(?:[rzs]\Ke)?s$~iu' => ''
    ];
    foreach ($regex_map as $pattern => $replacement)
    {
        $singular = preg_replace($pattern, $replacement, $string, 1, $count);
        if ($count)
        {
            return is_allcaps($string) ? mb_strtoupper($singular) : $singular;
        }
    }
    return $string;
}

$words = array(
    'óculos' => 'óculos',
    'papéis' => 'papel',
    'anéis' => 'anel',
    'PASTEIS' => 'PASTEL',
    'CAMIÕES' => 'CAMIÃO',
    'rodas' => 'roda',
    'cães' => 'cão',
    'meses' => 'mês',
    'vezes' => 'vez',
    'luzes' => 'luz',
    'cristais' => 'cristal',
    'canções' => 'canção',
    'nuvens' => 'nuvem',
    'alemães' => 'alemão'
);
foreach($words as $plural => $singular)
{
    echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
}
```
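As a small illustration of what `\K` changes, the sketch below compares a capture-group pattern with its `\K` equivalent. The helper names (`strip_es_with_group`, `strip_es_with_k`, `matched_text`) are made up for this example and are not part of the code above:

```php
<?php
// Capture-group version: the consonant is matched, so it must be put back via $1.
function strip_es_with_group(string $word): string
{
    return preg_replace('~([rzs])es$~iu', '$1', $word, 1);
}

// \K version: everything matched before \K is dropped from the reported match,
// so only "es" is replaced and the replacement can be an empty string.
function strip_es_with_k(string $word): string
{
    return preg_replace('~[rzs]\Kes$~iu', '', $word, 1);
}

// Returns the full-string match ($m[0]) so the two patterns can be compared.
function matched_text(string $pattern, string $word): ?string
{
    return preg_match($pattern, $word, $m) ? $m[0] : null;
}

echo strip_es_with_group('luzes'), "\n";               // luz
echo strip_es_with_k('luzes'), "\n";                   // luz
echo matched_text('~([rzs])es$~iu', 'luzes'), "\n";    // zes
echo matched_text('~[rzs]\Kes$~iu', 'luzes'), "\n";    // es
```

Both functions give the same result; the `\K` form simply keeps the preceding character out of the match, which is why the replacement strings in the map can stay short.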
Context
StackExchange Code Review Q#149991, answer score: 3