Serializing tabular data in Ruby -- is map, flatten, hash the correct approach?

Submitted by: @import:stackexchange-codereview

Problem

I wanted a hash that lets me look up Japanese syllables by their romanized names. In hindsight I could have searched for an existing single-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:

```
katakana:
 v_eng:   a i u e o
 v_jap:   ア イ ウ エ オ
 K:   カ キ ク ケ コ
 S:   サ シ ス セ ソ
 T:   タ チ ツ テ ト
 N:   ナ ニ ヌ ネ ノ
 H:   ハ ヒ フ ヘ ホ
 M:   マ ミ ム メ モ
 Y:   ヤ _ ユ _ ヨ
 R:   ラ リ ル レ ロ
 W:   ワ ヰ _ ヱ ヲ
hiragana:
 v_eng:   a i u e o
 v_jap:   あ い う え お
 k:   か き く け こ
 s:   さ し す せ そ
 t:   た ち つ て と
 n:   な に ぬ ね の
 h:   は ひ ふ へ ほ
 m:   ま み む め も
 y:   や _ ゆ _ よ
 r:   ら り る れ ろ
 w:   わ ゐ _ ゑ を
 nn:   ん _ _ _ _
```
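A minimal sketch of how this table looks once parsed, assuming it is loaded with Ruby's standard yaml library. The inline two-row excerpt and the names `excerpt`, `raw`, `row` and `cells` are illustrative stand-ins, not part of japanese.dic or the code below:

```ruby
require 'yaml'

# Illustrative excerpt of the katakana table, standing in for japanese.dic.
excerpt = <<~YAML
  katakana:
   v_eng:   a i u e o
   v_jap:   ア イ ウ エ オ
   K:   カ キ ク ケ コ
YAML

raw = YAML.load(excerpt)

# Each syllabary is a Hash of row label => one space-separated String,
# so every row still has to be split into cells before it can be zipped
# with the vowel row.
row = raw['katakana']['K']
cells = row.split

p row
p cells
```

Each row arrives as a single string, which is why the code that follows calls split on every row before pairing cells with vowels.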


I was able to create the serializing function, syllabarys():

```
#!/usr/bin/env ruby
require 'yaml'

def syllabarys
  @syllabarys ||= lambda{
    raw_data = YAML.load_file 'japanese.dic'

    syllabary_names = ['katakana','hiragana']

    a = syllabary_names.map{|syllabary|

      syllabary_data = raw_data[syllabary]

      veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split

      vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat

      #jp row strings by en consonants:
      jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}

      #jp row arrays by en consonants:
      jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]

      #en vowels with jp row arrays by en consonants:
      evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat

      #jp syllables by en syllables:
      #outer map provides en consonant to inner map
      #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
      #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
      jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]

      #remove forgotten syllables:
      jp_by_en.select{|en_syl,jp_syl|
```

Solution

Feedback:

  • Instead of ||= lambda { ... }.call, you can use ||= begin ... end.
  • Instead of Hash[*arr.flatten] or Hash[arr], you can call arr.to_h on an array of pairs (Array#to_h was added in Ruby 2.1).
  • You don't need to convert everything to a hash if you only use it for .map later; an array of pairs destructures the same way: [[1, 2], [3, 4]].map { |k, v| k + v } #=> [3, 7]
  • Instead of .map { ... }.flatten(1), you can use .flat_map { ... }.
  • If you won't use a block parameter, name it _ instead, like .map { |key, _| key }.
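The points above can be sketched on a tiny made-up sample. The data and the names `pairs`, `rows`, `syllables`, `consonants` and `vowel_table` are assumptions for illustration only, not taken from the question's file:

```ruby
# Array of pairs to hash: to_h replaces Hash[*pairs.flatten].
pairs = [['a', 'ア'], ['i', 'イ']]
vowels = pairs.to_h

# flat_map replaces map { ... }.flatten(1): each row yields a list of
# [romaji, kana] pairs, and the lists are concatenated one level deep.
rows = { 'K' => %w[カ キ], 'S' => %w[サ シ] }
syllables = rows.flat_map do |con, kana|
  %w[a i].zip(kana).map { |vowel, jp| [con + vowel, jp] }
end.to_h

# _ marks a block parameter that is intentionally unused.
consonants = rows.map { |key, _| key }

# Memoize with begin/end instead of lambda { ... }.call.
def vowel_table
  @vowel_table ||= begin
    %w[a i u e o].zip(%w[ア イ ウ エ オ]).to_h
  end
end

p vowels, syllables, consonants, vowel_table
```

The begin/end form evaluates its body only on the first call; later calls return the cached hash.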



I rewrote the code the way I would write it today.

```
require 'yaml'

# Builds a { romanized syllable => kana } hash from one syllabary table.
def do_it(raw)
  # Pair each English vowel with its kana, e.g. ["a", "ア"].
  map = raw["v_eng"].split.zip(raw["v_jap"].split)

  # Keep only single-letter (consonant) rows; this skips v_eng, v_jap and nn.
  raw.select do |k, _|
    k.size == 1
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_' # "_" marks an empty cell
    end.compact
  end.to_h
end

want = {"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}

raw = YAML.load_file 'japanese.dic'

p do_it(raw["katakana"]) == want["katakana"] #=> true
p do_it(raw["hiragana"]) == want["hiragana"] #=> true
```
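To try the same approach without the japanese.dic file, here is a self-contained sketch that parses a three-row excerpt from an inline string. The function name `syllables_by_romaji`, the variable names, and the choice of rows are assumptions for illustration; the logic is meant to mirror do_it, with reject standing in for the unless/compact pair:

```ruby
require 'yaml'

# Illustrative inline excerpt standing in for japanese.dic.
data = <<~YAML
  hiragana:
   v_eng:   a i u e o
   v_jap:   あ い う え お
   k:   か き く け こ
   w:   わ ゐ _ ゑ を
   nn:   ん _ _ _ _
YAML

# Maps romanized syllables to kana for one syllabary table.
def syllables_by_romaji(table)
  vowels = table['v_eng'].split

  # Keep single-letter consonant rows; this drops v_eng, v_jap and nn.
  rows = table.select { |label, _| label.size == 1 }

  rows.flat_map do |consonant, row|
    # '_' marks an empty cell, so those pairs are skipped.
    cells = vowels.zip(row.split).reject { |_, kana| kana == '_' }
    cells.map { |vowel, kana| [consonant + vowel, kana] }
  end.to_h
end

result = syllables_by_romaji(YAML.load(data)['hiragana'])
p result
```

With this excerpt the result has no wu entry, because that cell is blank. The nn row (ん) is left out too, because the single-letter filter only keeps one-character labels.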


Hope that helps. Let me know in a comment below if you want any further clarification.

Context

StackExchange Code Review Q#40389, answer score: 3
