patternjavaMinor
Reorganizing inventory items into a data structure
Viewed 0 times
reorganizingintoinventoryitemsstructuredata
Problem
I'm currently refactoring some old code of mine that I had done a few years ago, where it takes in two Lists to compare their contents, and then return the mapped contents that were matched. With 750 items for potential needles, and over 25000 items as the haystack, the original code took about 22-25 seconds to process. After a bit of refactoring however I was able to reduce it to just over a second. Is there a way to improve it further?
Here's the original code
And here's the new code:
```
Map>> map = new TreeMap>>();
String charsToFilter = "[" + Pattern.quote("_-") + "]";
ArrayList productNameArray = new ArrayList();
for (Map baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ").toLowerCase());
}
for (Map inventoryItem : inventoryArray) {
//moving this to the for loop below increase execution time by 100%
String processedInventoryItem = inventoryItem.get("title").replaceAll(charsToFilter, " ").toLowerCase();
for (String productName : productNameArray) {
if (processedInventoryItem.contains(productName)) {
ArrayList> currentListingArray = new ArrayList>();
TreeMap currentListingMap = new TreeMap();
for (Map.Entry keyValue : inventoryItem.entrySet()) {
Here's the original code
Map>> map = new TreeMap>>();
String charsToFilter = "[" + Pattern.quote("_-") + "]";
for (int i = 0 ; i > currentListingArray = new ArrayList>();
TreeMap currentListingMap = new TreeMap();
for (Map.Entry keyValue : inventoryArray.get(j).entrySet()) {
currentListingMap.put(keyValue.getKey(), keyValue.getValue());
}
currentListingArray.add(currentListingMap);
if (map.get(currentProduct) == null) {
map.put(currentProduct, currentListingArray);
} else {
map.get(currentProduct).add(currentListingMap);
}
}
}
}
return map;And here's the new code:
```
Map>> map = new TreeMap>>();
String charsToFilter = "[" + Pattern.quote("_-") + "]";
ArrayList productNameArray = new ArrayList();
for (Map baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ").toLowerCase());
}
for (Map inventoryItem : inventoryArray) {
//moving this to the for loop below increase execution time by 100%
String processedInventoryItem = inventoryItem.get("title").replaceAll(charsToFilter, " ").toLowerCase();
for (String productName : productNameArray) {
if (processedInventoryItem.contains(productName)) {
ArrayList> currentListingArray = new ArrayList>();
TreeMap currentListingMap = new TreeMap();
for (Map.Entry keyValue : inventoryItem.entrySet()) {
Solution
The first thing to note is that Java no longer requires the generic types to be specified in the declaration of the collections you have, so, for example, the code
can be replaced with:
That simplifies your line-lengths in a few places.
This is how I would simplify the code cosmetically:
OK, that fits on a single screen, and is almost narrow-enough for StackExchange code blocks.
Right, now, what are you doing? You're iterating over the inventoryArray, and, for each inventoryItem, you check to see whether there's a baseDataArray member who's filtered name is part of the filtered inventoryItem's name.
You extract the processing of the baseDataArray names to a separate loop to prevent recalculating it often. That's a good idea. It would be better if you reused a single compiled
Then, use that pattern in the loop, replacing the following
with:
Similarly, inside the inventory loop, you can also do:
That should reuse the pattern more efficiently, and improve the replaceall time (though Java probably does a good job of detecting that misuse, and compiling it efficiently anyway.
There is a potential bug in your code, though, and it possibly accounts for the speed-up too.
In the original code, you check every inventory item against every product.
In the second code, you check each product against only the first inventory. You "break" the inner loop.
As a result, your "faster" code does not do a full join between the data sets, but only a partial one. This may or may not impact your results, but it will impact the performance.
As an aside, this code would look great in Java 8 with streams... do you have Java 8 available?
Map>> map = new TreeMap>>();can be replaced with:
Map>> map = new TreeMap<>();That simplifies your line-lengths in a few places.
This is how I would simplify the code cosmetically:
Map>> map = new TreeMap<>();
String charsToFilter = "[" + Pattern.quote("_-") + "]";
ArrayList productNameArray = new ArrayList<>();
for (Map baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ")
.toLowerCase());
}
for (Map inventoryItem : inventoryArray) {
//moving this to the for loop below increase execution time by 100%
String processedInventoryItem = inventoryItem.get("title")
.replaceAll(charsToFilter, " ")
.toLowerCase();
for (String productName : productNameArray) {
if (processedInventoryItem.contains(productName)) {
ArrayList> currentListingArray = new ArrayList<>();
TreeMap currentListingMap = new TreeMap<>();
for (Map.Entry keyValue : inventoryItem.entrySet()) {
currentListingMap.put(keyValue.getKey(), keyValue.getValue());
}
currentListingArray.add(currentListingMap);
if (map.get(productName) == null) {
map.put(productName, currentListingArray);
} else {
map.get(productName).add(currentListingMap);
}
break;
}
}
}
return map;OK, that fits on a single screen, and is almost narrow-enough for StackExchange code blocks.
Right, now, what are you doing? You're iterating over the inventoryArray, and, for each inventoryItem, you check to see whether there's a baseDataArray member who's filtered name is part of the filtered inventoryItem's name.
You extract the processing of the baseDataArray names to a separate loop to prevent recalculating it often. That's a good idea. It would be better if you reused a single compiled
Pattern:Pattern charfilter = Pattern.compile("[" + Pattern.quote("_-") + "]");Then, use that pattern in the loop, replacing the following
for (Map baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ").toLowerCase());
}with:
for (Map baseDataItem : baseDataArray) {
productNameArray.add(charfilter.matcher(baseDataItem.get("product_name"))
.replaceAll(" ").toLowerCase());
}Similarly, inside the inventory loop, you can also do:
String processedInventoryItem = charfilter.matcher(inventoryItem.get("title")).replaceAll(" ").toLowerCase();That should reuse the pattern more efficiently, and improve the replaceall time (though Java probably does a good job of detecting that misuse, and compiling it efficiently anyway.
There is a potential bug in your code, though, and it possibly accounts for the speed-up too.
In the original code, you check every inventory item against every product.
In the second code, you check each product against only the first inventory. You "break" the inner loop.
As a result, your "faster" code does not do a full join between the data sets, but only a partial one. This may or may not impact your results, but it will impact the performance.
As an aside, this code would look great in Java 8 with streams... do you have Java 8 available?
Code Snippets
Map<String, ArrayList<TreeMap<String, String>>> map = new TreeMap<String, ArrayList<TreeMap<String, String>>>();Map<String, ArrayList<TreeMap<String, String>>> map = new TreeMap<>();Map<String, ArrayList<TreeMap<String, String>>> map = new TreeMap<>();
String charsToFilter = "[" + Pattern.quote("_-") + "]";
ArrayList<String> productNameArray = new ArrayList<>();
for (Map<String, String> baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ")
.toLowerCase());
}
for (Map<String, String> inventoryItem : inventoryArray) {
//moving this to the for loop below increase execution time by 100%
String processedInventoryItem = inventoryItem.get("title")
.replaceAll(charsToFilter, " ")
.toLowerCase();
for (String productName : productNameArray) {
if (processedInventoryItem.contains(productName)) {
ArrayList<TreeMap<String, String>> currentListingArray = new ArrayList<>();
TreeMap<String, String> currentListingMap = new TreeMap<>();
for (Map.Entry<String, String> keyValue : inventoryItem.entrySet()) {
currentListingMap.put(keyValue.getKey(), keyValue.getValue());
}
currentListingArray.add(currentListingMap);
if (map.get(productName) == null) {
map.put(productName, currentListingArray);
} else {
map.get(productName).add(currentListingMap);
}
break;
}
}
}
return map;Pattern charfilter = Pattern.compile("[" + Pattern.quote("_-") + "]");for (Map<String, String> baseDataItem : baseDataArray) {
productNameArray.add(baseDataItem.get("product_name")
.replaceAll(charsToFilter, " ").toLowerCase());
}Context
StackExchange Code Review Q#95258, answer score: 5
Revisions (0)
No revisions yet.