pattern | csharp | Moderate

Removing accents from certain characters

Submitted by: @import:stackexchange-codereview

Problem

I have a method that removes accents from certain characters. The difficulty is the sheer number of characters I am expected to handle: I have to remove accents from every Latin character whose base letter is one of the 26 English letters (A through Z). Performance is a major requirement. It has to be very fast, because I run it on every character of a string and process many large strings at a time.

Currently, I use a gigantic switch statement to detect what character it is, and return the appropriate A through Z "naked" character, while preserving case.

As of now, my switch looks something like the following:

switch (input)
{
    case 'À': // 0192
    case 'Á': // 0193
    case 'Â': // 0194
    case 'Ã': // 0195
    case 'Ä': // 0196
    case 'Å': // 0197
    case 'Ā': // 0256
    case 'Ă': // 0258
    case 'Ą': // 0260
        return 'A';
    case 'Ç': // 0199
    case 'Ć': // 0262
    case 'Ĉ': // 0264
    case 'Ċ': // 0266
    case 'Č': // 0268
        return 'C';
    case 'Ď': // 0270
    case 'Đ': // 0272
        return 'D';
    // Other upper case characters
    case 'à': // 0224
    case 'á': // 0225
    case 'â': // 0226
    case 'ã': // 0227
    case 'ä': // 0228
    case 'å': // 0229
    case 'ā': // 0257
    case 'ă': // 0259
    case 'ą': // 0261
        return 'a';
    case 'ç': // 0231
    case 'ć': // 0263
    case 'ĉ': // 0265
    case 'ċ': // 0267
    case 'č': // 0269
        return 'c';
    case 'ď': // 0271
    case 'đ': // 0273
        return 'd';
    // Other lower case characters
    default:
        return input;
}


As you can probably imagine, this method is over 200 lines, and this is the only thing it does.

private char RemoveAccent(char input)
{
    switch (input)
    {
        // You saw all the case statements
    }
}


Literally, that is it. My questions come down to the following, and this is more of a question of performance/better ways of handling the situati

Solution

This appears to be a duplicate of this question. The link suggests using .NET's String.Normalize. If that is too slow, you could build an associative array (e.g., a Dictionary<char, char>) for constant-time lookup. The table will be large, too, but I would think it's probably easier to maintain.
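Both suggestions can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the class name AccentStripper and the method name RemoveAccentsNormalize are hypothetical, and the dictionary holds only the A, C and D entries shown in the question, so a real table would need the remaining letters. Note that stroke letters such as Đ have no canonical decomposition, so the Normalize approach leaves them unchanged while the lookup table can handle them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class AccentStripper
{
    // Approach 1: decompose each character into base letter + combining marks
    // (FormD), drop the combining marks, then recompose (FormC).
    public static string RemoveAccentsNormalize(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Approach 2: explicit char -> char table, built once.
    // Only the entries listed in the question are included here.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'À', 'A' }, { 'Á', 'A' }, { 'Â', 'A' }, { 'Ã', 'A' }, { 'Ä', 'A' },
        { 'Å', 'A' }, { 'Ā', 'A' }, { 'Ă', 'A' }, { 'Ą', 'A' },
        { 'Ç', 'C' }, { 'Ć', 'C' }, { 'Ĉ', 'C' }, { 'Ċ', 'C' }, { 'Č', 'C' },
        { 'Ď', 'D' }, { 'Đ', 'D' },
        { 'à', 'a' }, { 'á', 'a' }, { 'â', 'a' }, { 'ã', 'a' }, { 'ä', 'a' },
        { 'å', 'a' }, { 'ā', 'a' }, { 'ă', 'a' }, { 'ą', 'a' },
        { 'ç', 'c' }, { 'ć', 'c' }, { 'ĉ', 'c' }, { 'ċ', 'c' }, { 'č', 'c' },
        { 'ď', 'd' }, { 'đ', 'd' }
    };

    // Returns the unaccented letter if the table has one, else the input unchanged.
    public static char RemoveAccent(char input)
    {
        char plain;
        return Map.TryGetValue(input, out plain) ? plain : input;
    }
}
```

Whether the dictionary actually beats the original switch is something to measure rather than assume; the main gain suggested in the answer is maintainability, since the mapping becomes data instead of 200 lines of case labels.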

Context

StackExchange Code Review Q#93438, answer score: 11
