String replacement in OCaml

Submitted by: @import:stackexchange-codereview

Problem

This is part of my first OCaml program.

Its job is to replace a set of placeholder character pairs with umlauts. Taking the German word Ruebe as an example, the program turns it into Rübe. Other examples are Moewe -> Möwe and aendern -> ändern.

Generally, every time the program encounters the characters ae, oe, or ue, it turns them into the corresponding umlaut. There is one exception: if the preceding character is a vowel, the placeholder is left unchanged. This ensures words like Treue remain as they are (since there is no word Treü).
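The rule above can be sketched without any regular expressions. This is only a minimal illustration, not the program under review: it uses the standard library alone, the names is_vowel, umlaut_of and replace_placeholders are made up for this example, and it assumes a UTF-8 encoded source file and ASCII-only vowels.

```ocaml
(* Hypothetical helper: ASCII vowels only. 'y' and umlauts are not covered. *)
let is_vowel = function
  | 'a' | 'e' | 'i' | 'o' | 'u' | 'A' | 'E' | 'I' | 'O' | 'U' -> true
  | _ -> false

(* Hypothetical helper: map a two-character placeholder to its umlaut. *)
let umlaut_of = function
  | "ae" -> Some "ä" | "oe" -> Some "ö" | "ue" -> Some "ü"
  | "Ae" -> Some "Ä" | "Oe" -> Some "Ö" | "Ue" -> Some "Ü"
  | _ -> None

(* Scan the text byte by byte. A placeholder is replaced unless the byte
   directly before it in the input is a vowel. *)
let replace_placeholders text =
  let n = String.length text in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then begin
      let after_vowel = i > 0 && is_vowel text.[i - 1] in
      let candidate =
        if i + 1 < n then umlaut_of (String.sub text i 2) else None
      in
      match candidate with
      | Some umlaut when not after_vowel ->
        Buffer.add_string buf umlaut;
        go (i + 2)
      | _ ->
        Buffer.add_char buf text.[i];
        go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

With this sketch, replace_placeholders "Ruebe" gives "Rübe", while "Treue" is returned unchanged because the u is preceded by an e.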

I used the Re2 library for regular expressions. My first thought for checking the preceding vowel was a negative lookbehind, but Re2 doesn't implement lookarounds. This is the reason for the function replace_if_not_after_vowel.

I welcome any suggestions, especially ones that help make the code more idiomatic or simpler.

To compile the program, I used the command ocamlbuild -use-ocamlfind -package re2 -package core -tag thread myCompiledFile.byte

```
open Core.Std
open Re2.Std
open Re2.Infix

(* If the word contains an umlaut placeholder like "ue" we replace that with the proper umlaut "ü". Except if there is a vowel directly before the placeholder like in "Treue" *)
let placeholders_to_umlauts = [("ue", "ü"); ("oe","ö"); ("ae","ä"); ("Ue", "Ü"); ("Oe","Ö"); ("Ae","Ä")]

(* A regex that matches a vowel *)
let vowel = ~/"[aeiou]"

(* Applies a list of changes to a word *)
let rec apply_changes word changes =
  match changes with
  | [] -> word
  | change :: rest -> apply_changes (change word) rest

(* Replaces replace_this with replacement inside the text if the preceding character is not a vowel. Since Re2 doesn't implement lookarounds we can't use a negative lookbehind *)
let replace_if_not_after_vowel replace_this replacement text =
  Re2.replace_exn ~/replace_this text ~f:(fun regex_match ->
    (* Returns true if there is a vowel at the given position in the text *)
    let is_vowel text pos =
      if pos >= 0 && pos replace_if_not_aft
```

Solution

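Here is a version using the Re library, since it was amusing to do. It uses a slightly different technique: I build one big regexp with two groups and replace everything in one go, without any additional checking. The first group captures what precedes the placeholder (a word boundary or a non-vowel character) and the second captures the placeholder itself, so each match is replaced by the first group concatenated with the umlaut. If Re.replace provided slightly more control (per-group substitution), the concatenation could be avoided.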

I used the combinators to build the regexp, instead of the symbolic syntax, because the result is much more readable.

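```
(* Placeholder -> umlaut pairs. *)
let map_to_umlauts =
  [ "ue","ü" ; "oe","ö" ; "ae","ä" ; "Ue","Ü" ; "Oe","Ö" ; "Ae","Ä" ]

let regexp =
  let open Re in compile @@
  seq [
    (* Group 1: beginning of word, or one character that is not a vowel. *)
    group @@ alt [ bow ; compl [no_case @@ set "aeiouy"] ] ;
    (* Group 2: one of the placeholders. *)
    group @@ alt (List.map (fun (s,_) -> str s) map_to_umlauts) ;
  ]

let replace s =
  (* Keep group 1 as is and substitute the umlaut for group 2. *)
  let f subs =
    Re.get subs 1 ^ List.assoc (Re.get subs 2) map_to_umlauts
  in
  Re.replace ~f regexp s

let () =
  print_endline @@ replace Sys.argv.(1)
```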

On your version, apart from the change in algorithm, I have only one comment: you really don't need a regexp to check whether a char is a vowel. ;)
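As a minimal sketch of what that could look like, assuming only the standard library: the name is_vowel_at is hypothetical, and the check is limited to the ASCII vowels a, e, i, o, u in either case.

```ocaml
(* Hypothetical helper: true if [text] has an ASCII vowel at index [pos].
   Positions outside the string simply yield false. *)
let is_vowel_at text pos =
  pos >= 0
  && pos < String.length text
  && String.contains "aeiou" (Char.lowercase_ascii text.[pos])
```

The bounds check comes first, so asking about the position before the start of a match (index -1) is safe and counts as "not a vowel".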

Also, note that all of this plays fast and, more importantly, very loose with Unicode.
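To illustrate one aspect of that looseness: OCaml strings are byte sequences, and in UTF-8 an umlaut takes two bytes. The small sketch below uses only the standard library; the names u_umlaut and ruebe are made up for the example, and the umlaut is written as explicit bytes so the result does not depend on the encoding of the source file.

```ocaml
(* "ü" (U+00FC) as explicit UTF-8 bytes. *)
let u_umlaut = "\xC3\xBC"

(* "Rübe" has four letters but occupies five bytes. *)
let ruebe = "R" ^ u_umlaut ^ "be"

let () =
  Printf.printf "bytes in u-umlaut: %d\n" (String.length u_umlaut);
  Printf.printf "bytes in word: %d\n" (String.length ruebe);
  (* Indexing yields a byte, not a letter: this is the first half of "ü". *)
  Printf.printf "byte at index 1: 0x%02X\n" (Char.code ruebe.[1])
```

This prints 2, 5 and 0xC3. Any code that looks at "the preceding character" by index is therefore really looking at the preceding byte, which is fine for the ASCII placeholders here but not for text that already contains umlauts.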

Context

StackExchange Code Review Q#106871, answer score: 3
