patternMinor
String replacement in OCaml
ocaml string replacement
Problem
This is part of my first OCaml program.

Its job is to replace a set of placeholder characters with umlauts. Taking the German word Ruebe as an example, the program turns it into Rübe. Other examples are Moewe -> Möwe and aendern -> ändern.

Generally, every time the program encounters the characters ae, oe, or ue, it turns them into umlauts. There is one exception: if the preceding character is a vowel, the word is left unchanged. This ensures that words like Treue remain as they are (since there is no word Treü).

I've used the Re2 library for regular expressions. Re2 doesn't implement lookarounds, which was my first thought for checking for preceding vowels. This is the reason for the function replace_if_not_after_vowel.

I am happy for all suggestions, especially if they help make the code more idiomatic or simpler.

To compile the program, I used the command

```
ocamlbuild -use-ocamlfind -package re2 -package core -tag thread myCompiledFile.byte
```

```
open Core.Std
open Re2.Std
open Re2.Infix

(* If the word contains an umlaut placeholder like "ue" we replace that with
   the proper umlaut "ü". Except if there is a vowel directly before the
   placeholder like in "Treue" *)
let placeholders_to_umlauts =
  [("ue", "ü"); ("oe", "ö"); ("ae", "ä"); ("Ue", "Ü"); ("Oe", "Ö"); ("Ae", "Ä")]

(* A regex that matches a vowel *)
let vowel = ~/"[aeiou]"

(* Applies a list of changes to a word *)
let rec apply_changes word changes =
  match changes with
  | [] -> word
  | change :: rest -> apply_changes (change word) rest

(* Replaces replace_this with replacement inside the text if the preceding
   character is not a vowel. Since Re2 doesn't implement lookarounds we can't
   use a negative lookbehind *)
let replace_if_not_after_vowel replace_this replacement text =
  Re2.replace_exn ~/replace_this text ~f:(fun regex_match ->
    (* Returns true if there is a vowel at the given position in the text *)
    let is_vowel text pos =
      if pos >= 0 && pos replace_if_not_aft
```
Solution
Here is a Re version, since it was amusing to do. My version uses a slightly different technique: I build one big regexp with two groups and replace everything in one go, without any additional checking. If Re.replace provided slightly more control (per-group substitution), it would avoid the concatenation.

I used the combinators for building the regexp, instead of the symbolic version, because that's much more readable, really.

```
let map_to_umlauts =
  [ "ue","ü" ; "oe","ö" ; "ae","ä" ; "Ue","Ü" ; "Oe","Ö" ; "Ae","Ä" ]

let regexp =
  let open Re in compile @@
  seq [
    group @@ alt [ bow ; compl [no_case @@ set "aeiouy"] ] ;
    group @@ alt (List.map (fun (s,_) -> str s) map_to_umlauts) ;
  ]

let replace s =
  let f subs =
    Re.get subs 1 ^ List.assoc (Re.get subs 2) map_to_umlauts
  in
  Re.replace ~f regexp s

let () =
  print_endline @@ replace Sys.argv.(1)
```

On your version, apart from the change in algorithm, I have only one comment: you really don't need a regexp to check whether a char is a vowel. ;)
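
To make the remark about the vowel check concrete, here is a minimal sketch that uses only the standard library. The names is_vowel, vowel_at and replace_placeholders are hypothetical and appear in neither version above. The vowel set (a, e, i, o, u in both cases) is an assumption, and the single left-to-right pass is just one possible way to apply the rule from the question. It is not the Re or Re2 implementation.

```ocaml
(* Hypothetical helper: test for a vowel by pattern matching, no regexp.
   The vowel set is an assumption (ASCII a, e, i, o, u in both cases). *)
let is_vowel = function
  | 'a' | 'e' | 'i' | 'o' | 'u'
  | 'A' | 'E' | 'I' | 'O' | 'U' -> true
  | _ -> false

(* True if [text] has a vowel at byte position [pos];
   positions outside the string count as "no vowel". *)
let vowel_at text pos =
  pos >= 0 && pos < String.length text && is_vowel text.[pos]

(* Same placeholder table as in the question. *)
let placeholders =
  [ "ue", "ü"; "oe", "ö"; "ae", "ä"; "Ue", "Ü"; "Oe", "Ö"; "Ae", "Ä" ]

(* Illustrative single pass: replace a two-character placeholder unless the
   character just before it in the original text is a vowel. *)
let replace_placeholders text =
  let n = String.length text in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then Buffer.contents buf
    else
      let pair = if i + 1 < n then String.sub text i 2 else "" in
      match List.assoc_opt pair placeholders with
      | Some umlaut when not (vowel_at text (i - 1)) ->
        Buffer.add_string buf umlaut;
        go (i + 2)
      | _ ->
        Buffer.add_char buf text.[i];
        go (i + 1)
  in
  go 0
```

With these definitions, replace_placeholders "Ruebe" evaluates to "Rübe", while "Treue" is returned unchanged because the e before ue is a vowel.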

Also, do note that all of this is playing fast and, more importantly, very loose with Unicode.
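
As a small illustration of the Unicode caveat, OCaml strings are byte sequences, so an umlaut written in a UTF-8 source file is not a single char. The sketch below assumes the source file is saved as UTF-8; the names umlaut, byte_length and upper are only for illustration.

```ocaml
(* Assumes this source file is saved as UTF-8. *)
let umlaut = "ü"

(* String.length counts bytes, not user-visible characters. *)
let byte_length = String.length umlaut

(* The ASCII-only case functions leave the bytes of "ü" untouched. *)
let upper = String.uppercase_ascii "übel"
```

Here byte_length is 2, and upper is "üBEL": only the ASCII letters change. Any byte-wise check on the character before a placeholder therefore sees bytes, not whole code points.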
Code Snippets
```
let map_to_umlauts =
  [ "ue","ü" ; "oe","ö" ; "ae","ä" ; "Ue","Ü" ; "Oe","Ö" ; "Ae","Ä" ]

let regexp =
  let open Re in compile @@
  seq [
    group @@ alt [ bow ; compl [no_case @@ set "aeiouy"] ] ;
    group @@ alt (List.map (fun (s,_) -> str s) map_to_umlauts) ;
  ]

let replace s =
  let f subs =
    Re.get subs 1 ^ List.assoc (Re.get subs 2) map_to_umlauts
  in
  Re.replace ~f regexp s

let () =
  print_endline @@ replace Sys.argv.(1)
```
Context
StackExchange Code Review Q#106871, answer score: 3