Three-way conversion between Japanese writing systems

Submitted by: @import:stackexchange-codereview

Problem

When my go-to Japanese transcription site went down for a while, I decided to write my own. My application converts between Romaji, Hiragana and Katakana — however, unlike any other converter I've seen, this one does a three-way conversion: there are three text boxes, and typing in one will update the content of the other two.

There's a working version here.

I'd like any feedback to focus on the big picture; that is, how I implemented this conversion. If anything else in my JS could be improved, though, don't hesitate to point that out as well.

How Japanese works

I figured I'd quickly introduce anyone who isn't familiar with Japanese to its writing systems. Keep in mind this is massively simplified, not least because I'm a beginner myself.

  • A Japanese character represents one syllable, which can be either a vowel or a consonant followed by a vowel.
  • There are a few exceptions to how these combinations are transcribed: s + i is shi, t + i is chi, t + u is tsu, h + u is fu.
  • The only consonant that can appear without a vowel is n.
  • There are two different alphabets: Hiragana and Katakana. They both encode the same syllables and are virtually equivalent; they just use different-looking characters: hiragana are mostly round, katakana are blockier.
  • There's also Romaji, which is just a representation of hiragana and katakana in the Latin alphabet.
  • Example: the syllable me is written as め in hiragana and as メ in katakana. amerika in romaji is あめりか in hiragana and アメリカ in katakana.
  • A small tsu (っ or ッ) doubles the consonant that comes after it.
  • A small ya, yu or yo after a syllable ending in i combines the sounds (ki + small ya is kya).
  • A ゙ or ゚ in the top right corner modifies the consonant sound.
  • A ー doubles the vowel sound that comes before it. In Romaji, long vowels can also be written with a dash (a macron) on top: ā is the same as aa.
  • Example: The Japanese word for "presentation" is happyōkai in Romaji, はっぴょうかい in hiragana and ハッピョウカイ in katakana.
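Because hiragana and katakana encode the same syllables, the two sets also line up in Unicode: the basic hiragana range U+3041 to U+3096 sits exactly 0x60 code points below the matching katakana. As a rough sketch (not taken from the converter itself, and with a function name made up for illustration), a hiragana-to-katakana pass can be a plain offset:

```javascript
// Sketch only: shifts basic hiragana (U+3041..U+3096) to katakana by adding 0x60.
// Anything outside that range (Latin letters, ー, punctuation) is left untouched.
function hiraganaToKatakana(text) {
  return Array.from(text, function (ch) {
    var code = ch.codePointAt(0);
    var isHiragana = code >= 0x3041 && code <= 0x3096;
    return isHiragana ? String.fromCodePoint(code + 0x60) : ch;
  }).join('');
}

hiraganaToKatakana('あめりか');       // 'アメリカ'
hiraganaToKatakana('はっぴょうかい'); // 'ハッピョウカイ'
```

Romaji has no such shortcut, which is why a lookup table is still needed for that direction.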

Solution

This is so cool! I may finally realize the dream of learning Japanese :D

Anyways, back to your code.

var romajiInput = document.getElementById('romaji');
var hiraganaInput = document.getElementById('hiragana');
var katakanaInput = document.getElementById('katakana');

var converter = new Converter();

romajiInput.onkeyup = hiraganaInput.onkeyup = katakanaInput.onkeyup = function () {

    var from = this.id;

    converter.convert(this.value, from);
    var conversionResult = converter.getResult();

    if (this !== romajiInput) {
        romajiInput.value = conversionResult.romajiText;
    }
    if (this !== hiraganaInput) {
        hiraganaInput.value = conversionResult.hiraganaText;
    }
    if (this !== katakanaInput) {
        katakanaInput.value = conversionResult.katakanaText;
    }

};


This is fine; there's nothing wrong with it. But if you're open to it, consider using a framework that supports basic two-way binding. That way, you don't have to deal with syncing the DOM with your data. Here's an example using Ractive.js:

var JapaneseConversionWidget = Ractive.extend({
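  // If you have the luxury of ES6, you can use template strings
  template: `
    <textarea value="{{ hiragana }}" on-change="fromHiragana(hiragana)"></textarea>
    <textarea value="{{ katakana }}" on-change="fromKatakana(katakana)"></textarea>
    <textarea value="{{ romaji }}" on-change="fromRomaji(romaji)"></textarea>
  `,
  // This autobinds to the DOM
  data: {
    hiragana: '',
    katakana: '',
    romaji: '',
  },
  // Assuming each convert function returns an object like {hiragana: '', katakana: '', romaji: ''}
  // Now all I'm doing is `set`. The library does everything else for me.
  fromHiragana: function(text){
    this.set(convertFromHiragana(text))
  },
  fromKatakana: function(text){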
    this.set(convertFromKatakana(text))
  },
  fromRomaji: function(text){
    this.set(convertFromRomaji(text))
  }
});

new JapaneseConversionWidget({
  el: document.body,
  append: true
});


Another thing: it's better to split your convert into more distinct operations. In the sample framework code shown above, I explicitly created functions for conversion from Hiragana, Katakana and Romaji. This prevents your convert function from becoming bloated, especially when you add dialect-specific parsing routines.

As for your converter, I don't think you really need to use prototypes for it, although there's nothing wrong with doing so either. It's just that you're not doing inheritance, and the same feat can be done with just a series of transformation functions.

Usually I'd do things in a "functional" way (I'm not really a follower of the paradigm, but I know enough to get the benefits). I suggest you make your functions referentially transparent: given the same input, a function should always give the same output, regardless of what's happening on the outside. In particular, avoid implicit mutations of properties on this, as in your convert:

// convert
while (this.text !== '') {
    var token = this._getToken();
    this.result.romajiText += token.romaji;
    this.result.hiraganaText += token.hiragana;
    this.result.katakanaText += token.katakana;
    this.text = this.text.substr(token.strLength);
}
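As a sketch of what a transparent version of that accumulation could look like: the function below takes an array of tokens (assumed to have the same romaji, hiragana and katakana properties that _getToken returns above) and builds the result without touching this. The name combineTokens is made up for illustration.

```javascript
// Sketch only: folds an array of tokens into a result object.
// No `this`, no outside state: same tokens in, same result out.
function combineTokens(tokens) {
  return tokens.reduce(function (result, token) {
    return {
      romajiText: result.romajiText + token.romaji,
      hiraganaText: result.hiraganaText + token.hiragana,
      katakanaText: result.katakanaText + token.katakana
    };
  }, { romajiText: '', hiraganaText: '', katakanaText: '' });
}

var result = combineTokens([
  { romaji: 'me', hiragana: 'め', katakana: 'メ' },
  { romaji: 'ri', hiragana: 'り', katakana: 'リ' }
]);
// result.romajiText is 'meri', result.hiraganaText is 'めり', result.katakanaText is 'メリ'
```

Calling it twice with the same tokens gives the same object contents, which makes it easy to test in isolation.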


The one problem I see in your convert is the use of a while loop. It scares me: if _getToken ever returns a token whose strLength is 0, this.text never gets shorter and the loop never ends. What I would suggest is a function that accepts a string and returns an array of tokens instead. That way, you have a finite set to operate on, and it works easily with array methods like map, reduce, etc.

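function tokenizeRomaji(text){
  return text.split('').reduce(function(syllables, character){
    // Logic to group individual characters into syllables goes here.
    // For Romaji, you can add Romaji-specific routines.
    // Placeholder: treat every character as its own token.
    syllables.push(character);
    // A reduce callback must return the accumulator.
    return syllables;
  }, []);
}

function mapTokensToCharacters(tokens){
  return tokens.map(function(token){
    // Convert the token into another dialect here.
    // Placeholder: return the token unchanged.
    return token;
  });
}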
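To make the grouping step less abstract, here is one possible way to do it: a greedy longest-match tokenizer. It still uses a loop internally, but the position advances by at least one character on every pass, so it cannot run forever. The syllable list is a tiny made-up excerpt and the function name is hypothetical; a real table would have to cover every syllable.

```javascript
// Hypothetical, deliberately tiny syllable list for illustration only.
var ROMAJI_SYLLABLES = ['shi', 'chi', 'tsu', 'kya', 'ka', 'me', 'ri', 'fu',
                        'a', 'i', 'u', 'e', 'o', 'n'];

function tokenizeRomajiGreedy(text) {
  var tokens = [];
  var position = 0;
  while (position < text.length) {
    var match = null;
    // Try the longest candidate first (3 characters), then shorter ones.
    for (var length = 3; length >= 1 && match === null; length--) {
      var candidate = text.slice(position, position + length);
      if (ROMAJI_SYLLABLES.indexOf(candidate) !== -1) {
        match = candidate;
      }
    }
    // Unknown input passes through one character at a time.
    if (match === null) {
      match = text.charAt(position);
    }
    tokens.push(match);
    position += match.length; // always at least 1, so the loop terminates
  }
  return tokens;
}

tokenizeRomajiGreedy('amerika'); // ['a', 'me', 'ri', 'ka']
```

The returned array can then go straight into map or reduce.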


What I suggest is doing something like this:

function convertFromHiragana(text){

  var lowerCasedText = text.toLowerCase();
  var preprocessedText = preprocess(lowerCasedText);

  // Instead of consuming tokens one at a time inside a loop, build an array
  // of tokens up front and hand it off to the individual translators. This
  // also makes the tokenizer dialect-specific, so even if your table is
  // shared, dialect-specific quirks can be worked around.
  var tokenizedText = tokenizeHiragana(preprocessedText);

  // Explicitly separating translators. Since we come from Hiragana, we don't
  // translate Hiragana.
  var katakanaTranslation = convertToKatakana(tokenizedText);
  var romajiTranslation = convertToRomaji(tokenizedText);

  // Return as object. Note that we explicitly postProcess Romaji instead of
  // blindly calling postProcess and making it an implicit Romaji-only operation.
  return {
    hiragana: text,
    katakana: katakanaTranslation,
    romaji: postProcess(romajiTranslation)
  }
}


Sure, there's more typing here, and the code is more explicit. However, we know that tokenizeHiragana does just one thing: it tokenizes a Hiragana string into an array of tokens. We know we come from Hiragana, so we skip Hiragana conversion. We know that the convert*
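For the translator step, a minimal sketch could look like the following. It assumes the tokenizer hands over an array of single-kana strings, and it uses a tiny made-up table; both the table and the function name are hypothetical and only there to show the map-then-join shape.

```javascript
// Hypothetical excerpt of a hiragana-to-romaji table; a real one needs every syllable.
var HIRAGANA_TO_ROMAJI = { 'あ': 'a', 'め': 'me', 'り': 'ri', 'か': 'ka' };

function convertToRomajiSketch(tokens) {
  return tokens.map(function (token) {
    // Tokens missing from the table are passed through unchanged.
    var known = Object.prototype.hasOwnProperty.call(HIRAGANA_TO_ROMAJI, token);
    return known ? HIRAGANA_TO_ROMAJI[token] : token;
  }).join('');
}

convertToRomajiSketch(['あ', 'め', 'り', 'か']); // 'amerika'
```

Since the function only depends on its argument and a constant table, it stays transparent in the sense described earlier.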

Code Snippets

var romajiInput = document.getElementById('romaji');
var hiraganaInput = document.getElementById('hiragana');
var katakanaInput = document.getElementById('katakana');

var converter = new Converter();

romajiInput.onkeyup = hiraganaInput.onkeyup = katakanaInput.onkeyup = function () {

    var from = this.id;

    converter.convert(this.value, from);
    var conversionResult = converter.getResult();

    if (this !== romajiInput) {
        romajiInput.value = conversionResult.romajiText;
    }
    if (this !== hiraganaInput) {
        hiraganaInput.value = conversionResult.hiraganaText;
    }
    if (this !== katakanaInput) {
        katakanaInput.value = conversionResult.katakanaText;
    }

};

var JapaneseConversionWidget = Ractive.extend({
  // If you have the luxury of ES6, you can use template strings
  template: `
    <textarea value="{{ hiragana }}" on-change="fromHiragana(hiragana)"></textarea>
    <textarea value="{{ katakana }}" on-change="fromKatakana(katakana)"></textarea>
    <textarea value="{{ romaji }}" on-change="fromRomaji(romaji)"></textarea>
  `,
  // This autobinds to the DOM
  data: {
    hiragana: '',
    katakana: '',
    romaji: '',
  },
  // Assuming convert returns an object like {hiragana: '', katakana: '', romaji: ''}
  // Now all I'm doing is `set`. The library does everything else for me.
  fromHiragana: function(text){
    this.set(convertFromHiragana(text))
  },
  fromKatakana: function(text){
    this.set(convertFromKatakana(text))
  },
  fromRomaji: function(text){
    this.set(convertFromRomaji(text))
  }
});

new JapaneseConversionWidget({
  el: document.body,
  append: true
});

// convert
while (this.text !== '') {
    var token = this._getToken();
    this.result.romajiText += token.romaji;
    this.result.hiraganaText += token.hiragana;
    this.result.katakanaText += token.katakana;
    this.text = this.text.substr(token.strLength);
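}

function tokenizeRomaji(text){
  return text.split('').reduce(function(syllables, character){
    // Logic to group individual characters into syllables goes here.
    // For Romaji, you can add Romaji-specific routines.
    // Placeholder: treat every character as its own token.
    syllables.push(character);
    // A reduce callback must return the accumulator.
    return syllables;
  }, []);
}

function mapTokensToCharacters(tokens){
  return tokens.map(function(token){
    // Convert the token into another dialect here.
    // Placeholder: return the token unchanged.
    return token;
  });
}

function convertFromHiragana(text){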

  var lowerCasedText = text.toLowerCase();
  var preprocessedText = preprocess(lowerCasedText);

  // Instead of consuming tokens one at a time inside a loop, build an array
  // of tokens up front and hand it off to the individual translators. This
  // also makes the tokenizer dialect-specific, so even if your table is
  // shared, dialect-specific quirks can be worked around.
  var tokenizedText = tokenizeHiragana(preprocessedText);

  // Explicitly separating translators. Since we come from Hiragana, we don't
  // translate Hiragana.
  var katakanaTranslation = convertToKatakana(tokenizedText);
  var romajiTranslation = convertToRomaji(tokenizedText);

  // Return as object. Note that we explicitly postProcess Romaji instead of
  // blindly calling postProcess and making it an implicit Romaji-only operation.
  return {
    hiragana: text,
    katakana: katakanaTranslation,
    romaji: postProcess(romajiTranslation)
  }
}

Context

StackExchange Code Review Q#111480, answer score: 3
