
Efficient way to copy unordered String into ordered String

Submitted by: @import:stackexchange-codereview

Problem

I'm working in Java on Hadoop, reading a huge String that has been tokenized and put in an array. It holds key-value pairs, but they are not in any particular order. I want the order to be fixed so I can load the result as a table. In SQL, if I select a column after loading this into a table, all the keys of one type should be in colA.

I'm checking each word of the String array and copying it into a fixed position in a new array. The approach I came up with is an if-else ladder like this:

// row is the tokenized, unordered String array

String[] newRow = new String[150];

for (int i = 0; i < row.length; ++i) {

    if (row[i].equals("token1")) {
        newRow[0] = row[i];     // key
        newRow[1] = row[i + 1]; // value
    }

    else if (row[i].equals("token2")) {
        newRow[2] = row[i];
        newRow[3] = row[i + 1];
    } // ...and so on. The else-if ladder is at least 100 branches long.
}


Is there a more efficient way to do this?

PS: I'm not sorting the string. Example: row1 is {apple,good,banana,bad} and row2 is {banana,good,apple,bad}, where apple and banana are keys. In my output, both records should have apple as the first key and banana as the second. So the output will be newRow1: {apple,good,banana,bad} and newRow2: {apple,bad,banana,good}. Essentially, I'm rearranging all input into a fixed output layout.

Solution

I'd put token names and positions in a Map like this:

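import java.util.HashMap;
import java.util.Map;

// key -> position of the key in the output row (its value goes at position + 1)
Map<String, Integer> tokenIndexes = new HashMap<String, Integer>();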
tokenIndexes.put("token1", 0);
tokenIndexes.put("token2", 2);
tokenIndexes.put("token3", 4);
// ...


and then in the "sorting" part:

String[] newRow = new String[150];

for (int i = 0; i + 1 < row.length; i += 2) { // go two by two, as the keys are at even indexes
    if (tokenIndexes.containsKey(row[i])) {
        int index = tokenIndexes.get(row[i]);
        newRow[index] = row[i];
        newRow[index + 1] = row[i + 1];
    } else {
        // handle missing token
    }
}


This gets rid of the whole "if-else" ladder and replaces up to 100 string comparisons per token with a single hash lookup. The cost is that I now have to maintain a Map, which I think is easier than maintaining a list of "if-else" branches.
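To make the idea concrete, here is a minimal self-contained sketch that combines the Map and the loop and runs them on the question's apple/banana example. The class name RowReorderer, the method name reorder, and the choice to silently skip unknown keys are my own assumptions for illustration; the real token list and error handling will differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class RowReorderer {
    // Fixed output position of each key; its value goes at position + 1.
    // The token names are assumptions borrowed from the question's example.
    private static final Map<String, Integer> TOKEN_INDEXES = new HashMap<>();
    static {
        TOKEN_INDEXES.put("apple", 0);
        TOKEN_INDEXES.put("banana", 2);
    }

    static String[] reorder(String[] row) {
        String[] newRow = new String[TOKEN_INDEXES.size() * 2];
        // Step two by two: keys sit at even indexes, values right after them.
        for (int i = 0; i + 1 < row.length; i += 2) {
            Integer index = TOKEN_INDEXES.get(row[i]);
            if (index == null) {
                continue; // unknown key: skipped in this sketch
            }
            newRow[index] = row[i];
            newRow[index + 1] = row[i + 1];
        }
        return newRow;
    }

    public static void main(String[] args) {
        String[] row2 = {"banana", "good", "apple", "bad"};
        System.out.println(Arrays.toString(reorder(row2)));
        // prints [apple, bad, banana, good]
    }
}
```

Using a single get() followed by a null check avoids the second lookup that a containsKey() plus get() pair performs; both variants behave the same as long as the Map holds no null values.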

UPDATE

I'm assuming you just receive that String array and can't change how the information is retrieved. If you can replace the initial array with a Map, that would simplify your code even further.

If that's the case, try this to capture your tokens instead of putting everything in an array (I'm assuming the first token to appear is a key):

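// tokensAvailable() and readToken() stand in for however you read your tokens
Map<String, String> info = new HashMap<String, String>();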
boolean isKey = true;
String lastKey = null;
String token;
while(tokensAvailable()  /* or (token = readToken()) != null */) {
    token = readToken();
    if(isKey) {
        lastKey = token;
    } else {
        info.put(lastKey, token);
    }
    isKey = !isKey;
}
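As a runnable variant of the loop above, here is a sketch that reads the tokens from a java.util.Iterator, which is my stand-in for the tokensAvailable()/readToken() placeholders. The class name TokenPairs and the method name collect are assumptions of mine, and instead of the isKey flag it reads a key and its value in one pass of the loop, which should be equivalent for well-formed input.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class TokenPairs {
    // Consumes tokens as key, value, key, value, ... and stores each pair.
    static Map<String, String> collect(Iterator<String> tokens) {
        Map<String, String> info = new HashMap<>();
        while (tokens.hasNext()) {
            String key = tokens.next();
            if (!tokens.hasNext()) {
                break; // trailing key without a value: ignored in this sketch
            }
            info.put(key, tokens.next());
        }
        return info;
    }

    public static void main(String[] args) {
        Iterator<String> tokens =
                Arrays.asList("banana", "good", "apple", "bad").iterator();
        Map<String, String> info = collect(tokens);
        System.out.println(info.get("apple")); // prints bad
    }
}
```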


And then when you have to print your table, you can do something like this:

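printOut("VAL_1    --   VAL_2   --   VAL_3");
// The stored values are Strings, so parse them before using %d and %f
printOut(String.format("%08d  --  %10.2f  -- %s",
        Integer.parseInt(info.get("numericVal1")),
        Double.parseDouble(info.get("monetaryVal2")),
        info.get("val3")));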


String.format() is useful in these cases because you can control the format (such as the width) of every printed field.
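Here is a small sketch of that formatting step, assuming the Map holds String values that must be parsed before the numeric specifiers can be used. The class name TableLine, the method name format, and the sample values are made up for illustration; Locale.ROOT is passed so the decimal separator does not depend on the machine's default locale.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TableLine {
    // Formats one record: zero-padded integer, right-aligned amount, free text.
    static String format(Map<String, String> info) {
        int numeric = Integer.parseInt(info.get("numericVal1"));
        double money = Double.parseDouble(info.get("monetaryVal2"));
        return String.format(Locale.ROOT, "%08d  --  %10.2f  -- %s",
                numeric, money, info.get("val3"));
    }

    public static void main(String[] args) {
        Map<String, String> info = new HashMap<>();
        info.put("numericVal1", "42");   // sample values, not from the question
        info.put("monetaryVal2", "3.5");
        info.put("val3", "abc");
        System.out.println(format(info));
    }
}
```

Here %08d pads the integer with zeros to eight digits, and %10.2f right-aligns the amount in a ten-character field with two decimals. For real monetary values, java.math.BigDecimal is usually a safer choice than double, and %f accepts it as well.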

Code Snippets

Map<String, Integer> tokenIndexes = new HashMap<String, Integer>();
tokenIndexes.put("token1", 0);
tokenIndexes.put("token2", 2);
tokenIndexes.put("token3", 4);
// ...
String[] newRow = new String[150];

for (int i = 0; i + 1 < row.length; i += 2) { // go two by two, as the keys are at even indexes
    if (tokenIndexes.containsKey(row[i])) {
        int index = tokenIndexes.get(row[i]);
        newRow[index] = row[i];
        newRow[index + 1] = row[i + 1];
    } else {
        // handle missing token
    }
}
Map<String, String> info = new HashMap<String, String>();
boolean isKey = true;
String lastKey = null;
String token;
while(tokensAvailable()  /* or (token = readToken()) != null */) {
    token = readToken();
    if(isKey) {
        lastKey = token;
    } else {
        info.put(lastKey, token);
    }
    isKey = !isKey;
}
printOut("VAL_1    --   VAL_2   --   VAL_3");
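// The stored values are Strings, so parse them before using %d and %f
printOut(String.format("%08d  --  %10.2f  -- %s",
        Integer.parseInt(info.get("numericVal1")),
        Double.parseDouble(info.get("monetaryVal2")),
        info.get("val3")));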

Context

StackExchange Code Review Q#29154, answer score: 3
