Remove all characters except
Problem
My code takes a string and replaces all characters which are not:
- English letters
- Numbers
- The characters , / -
I have tested it and it seems to generally work well enough. But it may have some catastrophic bug in it and/or can be simplified.
x <- "dog/John is a cutting-edge pilot^¢„þ"
gsub("[^a-zA-Z0-9,-:space:]+", " ", x, perl = TRUE)
"dog/John is a cutting-edge pilot "
Solution
The :space: portion of the regex makes no sense, and probably does not do what you intend.
> x <- "abc:def."
> gsub("[^a-zA-Z0-9,-:space:]", " ", x, perl = TRUE)
[1] "abc:def."
Notice that the colon and period are still present after the substitution.
In fact, inside the character class, ,-: means "all characters with ASCII codes from 44 (the comma) up to 58 (the colon)". That range covers the comma, hyphen, period, slash, the digits, and the colon, which is why the period survives and why the slash happened to be preserved in your test. The remaining space: is just the literal characters s, p, a, c, e, and the colon, all of which are already in the class.
A literal hyphen should be the first or the last character in a character class (or be escaped, which works with perl = TRUE); otherwise, it is treated as a range (like A-Z).
If you want a character class for whitespace, use "\\s" or [:space:], in either case inside the outer brackets.
So, if you wanted to convert all consecutive strings of junk to a single space, preserving only letters, digits, commas, slashes, hyphens, and whitespace, you could write:
gsub("[^-,/a-zA-Z0-9[:space:]]+", " ", x, perl = TRUE)
or
gsub("[^-,/a-zA-Z0-9\\s]+", " ", x, perl = TRUE)
Code Snippets
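As a quick check of the points made in the answer, here is a minimal sketch that lists what the accidental range ,-: covers and then applies the corrected pattern to both test strings. The helper name clean_text is hypothetical (it is not part of the original question or answer), and the question's sample string is written with \u escapes on the assumption that this avoids source-encoding problems.

```r
# Same sample string as in the question; the trailing junk characters
# (cent sign, low double quote, thorn) are written as \u escapes.
x <- "dog/John is a cutting-edge pilot^\u00a2\u201e\u00fe"
y <- "abc:def."

# Characters covered by the accidental range ,-: (code points 44 to 58).
range_chars <- intToUtf8(44:58, multiple = TRUE)
print(range_chars)

# Hypothetical helper: keep letters, digits, comma, slash, hyphen and
# whitespace; collapse every other run of characters into one space.
# The hyphen comes first in the class so it is taken literally.
clean_text <- function(s) {
  gsub("[^-,/a-zA-Z0-9\\s]+", " ", s, perl = TRUE)
}

print(clean_text(x))  # "dog/John is a cutting-edge pilot "
print(clean_text(y))  # "abc def "
```

Note that the replacement can leave a trailing space, as in both results above; if that is unwanted, wrapping the call in trimws() is one way to remove it.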
> x <- "abc:def."
> gsub("[^a-zA-Z0-9,-:space:]", " ", x, perl = TRUE)
[1] "abc:def."

gsub("[^-,/a-zA-Z0-9[:space:]]+", " ", x, perl = TRUE)

gsub("[^-,/a-zA-Z0-9\\s]+", " ", x, perl = TRUE)
Context
StackExchange Code Review Q#157350, answer score: 5