HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Baseline tagger

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
taggerbaselinestackoverflow

Problem

This is a baseline tagger coed I have written. How can I optimize this code?

public class Tagger {

private ArrayList copyTagsList = new ArrayList();

public Tagger(WordChecker tagsList){
    for(int i =0; i trainList){

int freq = 0;
ArrayList tagFreqOfWord = new ArrayList();
System.out.println(word);

for(ArrayList wordState: trainList)
{           
    if (word.equals(wordState.get(1)))
    {

        freq+=1;
        if (tagFreqOfWord.size() == 0)
        {
            for (int i = 0 ; i<copyTagsList.size();i++)
            {
                if (((String)wordState.get(4)).equalsIgnoreCase((String)copyTagsList.get(i).get(0)))
                {
                    tagFreqOfWord.add(copyTagsList.get(i));
                    tagFreqOfWord.get(0).set(1, (int)1);
                    break;
                }
            }
        }
        else 
        {
            Boolean exist= false;
            for (List j :tagFreqOfWord)
            {
                if (((String)wordState.get(4)).equalsIgnoreCase((String)j.get(0)))
                {
                    int var = (int)tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).get(1);
                    var+=1;
                    tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).set(1,var);
                    exist = true;
                    break;
                }
            }           
            if (!exist)
            {
                List newTag = new ArrayList();
                newTag.add(wordState.get(4));
                newTag.add(1, 1);;
                tagFreqOfWord.add(newTag);
            }                       
        }
    }
}
System.out.println(tagFreqOfWord);  
}           

}


Confusion matrix (Part of AccuracyChecker class):

```
public ArrayList createConfusionMatrix(WordChecker checkedTagsList, ArrayList listOfWords){
//BeanComparator fieldComparator = new BeanComparator();
//Collections.sort(checkedTagsList, fieldComparator);

int distinctTagsSize = correctTags.size();
for(int i=0; i li

Solution

private ArrayList copyTagsList = new ArrayList();


You should always use generic version (use raw type only if you have a good reason!)

Same inside extractTag using generic version you will not need anymore String cast.

The same will be for other collection.

I usually tend to cache the .size() specially when using it in a for to avoid method call overhead, but i don't remember if JVM could optimizate the call.

Example here:

for (int i = 0 ; i<copyTagsList.size();i++)


could be

int size = copyTagsList.size();
for (int i = 0 ; i<size;i++)


Avoid Wrapper objects if not needed (only in generic!)

tagFreqOfWord.get(0).set(1, (int)1);


No need for cast here.

Boolean exist= false;


Here Boolean refer to the wrapper class, not to the native type.. with this line you cause a boxing-unboxing everytime. Replace it with boolean exist = false;.

Use better names

for (List j :tagFreqOfWord)


while reading for (int i = 0; i < x; i++) i is OK in most cases, reading this line could lead to confusion because someone could think: "well, it's j it could be the second index in a inner-for!"

Or, it could lead to confusion yourself, use one-char-name only if you are working with indexs in for/while/etc. (outside you could use idx!)

int var = (int)tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).get(1);


Do you really need the cast here? I don't think. If it's a Integer it will be read as int, so cast needed.

int var = tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).get(1);
var+=1;
tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).set(1,var);


You do indexOf() two times, which could be a problem (it checks two times the same thing) just do it one time and save the result in a variable.

int indexOf = tagFreqOfWord.indexOf(j);
int var = tagFreqOfWord.get(indexOf).get(1);
var+=1;
tagFreqOfWord.get(indexOf).set(1,var);


Remember, indexOf:

Returns the index of the first occurrence of the specified element in this list, or -1 if this list does not contain the element. More formally, returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i))), or -1 if there is no such index.

You chould check if it returns -1

Seems like wordState.get(4) can be saved and avoid .get() call N times (In a LinkedList, call .get everytime could be a problem)

So it could be

Boolean exist= false;
String specialWord = (String)wordState.get(4);
for (List j :tagFreqOfWord)
{
    if (specialWord.equalsIgnoreCase((String)j.get(0)))
    {
        int var = (int)tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).get(1);
        var+=1;
        tagFreqOfWord.get(tagFreqOfWord.indexOf(j)).set(1,var);
        exist = true;
        break;
    }
}

if (!exist)
{
    List newTag = new ArrayList();
    newTag.add(specialWord);
    newTag.add(1, 1);;
    tagFreqOfWord.add(newTag);
}


So you avoid do a Cast everytime!

int distinctTagsSize = correctTags.size();
for(int i=0; i  list = new ArrayList();
    confusionMatrix.add(i, list);;
    for(int j=0;j<distinctTagsSize; j++ ){
        confusionMatrix.get(i).add(j, 0);
        if (i == j)
            confusionMatrix.get(i).set(j, checkedTagsList.getCheckedTags().get(i).get(1));
    }
}


Here you create a new list everytime, it's what you want? Ok i know it's Code review so you already know the code is correct, i just want to be sure you know what happens here.

For now i can't see anything wrong in this block.

Here

for(int i=0; i <distinctTagsSize;i++){
    for(int j=0;j<listOfWords.size();j++){
            if (((String)correctTags.get(i).get(0)).equalsIgnoreCase((String)listOfWords.get(j).get(4))){
                if(!((String)listOfWords.get(j).get(5)).equalsIgnoreCase((String)listOfWords.get(j).get(4))){
                    for (List m: correctTags){
                        if(((String)listOfWords.get(j).get(5)).equalsIgnoreCase((String)m.get(0))){
                            int freq= (int)confusionMatrix.get(i).get(correctTags.indexOf(m));
                            freq+=1;
                            confusionMatrix.get(i).set(correctTags.indexOf(m),freq);
                        }
                    }
                }

        }
    }
}


(String)correctTags.get(i).get(0) will not change while for(..; j; ..) so you could do

for(int i=0; i <distinctTagsSize;i++){
   String cache1 = (String)correctTags.get(i).get(0);
    for(int j=0;j<listOfWords.size();j++){


and then use cache1. (example name!!!!)

You could do the same thing for (String)listOfWords.get(j).get(4) and (String)listOfWords.get(j).get(5) but inside for (..; j; ..) scope.

Again, as above

int freq= (int)confusionMatrix.get(i).get(correctTags.indexOf(m));


You don't need a cast here, and you can save correctTags.indexOf(m) return value inside a variable and reuse it inside

confusionMatrix.get(i).set(correctTags.indexOf(m),freq);


too

```
confusionMatrix.get(i).set(indexO

Code Snippets

private ArrayList<List> copyTagsList = new ArrayList<List>();
for (int i = 0 ; i<copyTagsList.size();i++)
int size = copyTagsList.size();
for (int i = 0 ; i<size;i++)
tagFreqOfWord.get(0).set(1, (int)1);
Boolean exist= false;

Context

StackExchange Code Review Q#45640, answer score: 5

Revisions (0)

No revisions yet.