patternjavascriptMinor
Matrix of articles and authors from large JSON dataset
Viewed 0 times
authorsjsondatasetarticleslargeandfrommatrix
Problem
Given the following JSON data containing 4 articles by 4 different authors:
I need to answer the following query:
Generate a matrix such that each cell
For above example the output would simply look like:
I have devised the solution below without thinking too much about design and performanc
const data = {
"DataSet": {
"Article": [
{
"Title": 'A',
"AuthorList": {
"Author": [{
"FirstName": "Al",
"LastName": "Ab",
"Initials": "A"
}, {
"FirstName": "Bi",
"LastName": "By",
"Initials": "B"
}] // Author
} // AuthorList
}, // Article
{
"Title": 'B',
"AuthorList": {
"Author": [{
"FirstName": "Bi",
"LastName": "By",
"Initials": "B"
}, {
"FirstName": "Ch",
"LastName": "Ch",
"Initials": "C"
}] // Author
} // AuthorList
}, // Article
{
"Title": 'C',
"AuthorList": {
"Author": [{
"FirstName": "Ch",
"LastName": "Ch",
"Initials": "C"
}, {
"FirstName": "Al",
"LastName": "Ab",
"Initials": "A"
}] // Author
} // AuthorList
}, // Article
{
"Title": 'D',
"AuthorList": {
"Author": [{
"FirstName": "Da",
"LastName": "Do",
"Initials": "D"
}, {
"FirstName": "Bi",
"LastName": "By",
"Initials": "B"
}] // Author
} // AuthorList
} // Article
] // Articles
} // DataSet
};I need to answer the following query:
Generate a matrix such that each cell
(i, j) contains say N the number of articles authored by the author in column and co-authored by author in row. A paper may have more than 2 authors. It may be assumed that JSON is present as a file.For above example the output would simply look like:
[2, 1, 1, 0]
[1, 3, 1, 1]
[1, 1, 2, 0]
[0, 1, 0, 1]I have devised the solution below without thinking too much about design and performanc
Solution
You seem to be taking a very non-optimal approach from a performance standpoint, as you iterate over the data set unnecessarily trying to extra only derive certain pieces from it at a single time. I would think your goal would be to map the
Right now you do:
As you can see that is a lot of unnecessary complexity (and memory consumption).
You should be able to build your matrix in a single pass.
The target data structure may look like this:
This all could be as simple as:
Note: I used for loops for performance optimization rather than array methods. You really could use either. If writing code for real-world system where I did not expect millions of rows and the performance micro-optimization was not necessary I would probably use array methods for cleaner code.
-
Fill 0's in matrix on display (or as secondary process if you really need the data structure). You do this by iterating the author set in nested loop.
-
This would result in complexity that is \$O(n)\$ (n = articles)for article iteration + \$O(m^2)\$ (m = authors) for matrix display. So, if # of articles are assumed to be >> # of authors, this would likely be a good approach.
data into your data structure with a single iteration over the data.dataSet.article array, resulting in \$O(n)\$ operational complexity to read the data in).Right now you do:
- Build an array of authors - \$O(n)\$, where n is number of articles
- Cast authors in to a Set - \$O(n)\$, where n is number of authors from previous step
- Get articles into array - \$O(n)\$, where n is # of articles. Note you gain nothing from mapping articles in place into there own array, but double memory usage and add this \$O(n)\$ performance hit. At a minimum you should kill this step.
- You map authors into an object with index position as value - \$O(n)\$ where n i # of authors. Another unnecessary step it seems, as you could iterate the authors set just as you have done here later if needed.
- You initialize the matrix to hold empty 0's - \$O(n)\$ where n is number of authors. This could have been done at any other point where you already iterated author set.
- You iterate the articles array - \$O(n)\$ where n is # of articles.
As you can see that is a lot of unnecessary complexity (and memory consumption).
You should be able to build your matrix in a single pass.
- Initialize your matrix as an empty object which will later be filled in .
The target data structure may look like this:
{
'ch ch': {
'ch ch': *,
'other author': *,
...
},
'other author': { ... },
...
}- iterate through
data.dataSet.article. On each element populate counters on properties in proposed data structure above.
- populate all authors into a set for use in rendering display.
This all could be as simple as:
const matrix = {};
const authors = new Set();
const articleLen = data.dataSet.article.length;
for (let artIdx = 0; artIdx < articleLen; artIdx++) {
const len = element[artIdx].authorList.length;
for (let i = 0; i < len; i++) {
let author = fullname(element[artIdx].authorList[i]);
authors.add(author);
for (let j = 0; j < len; j++) {
let coauthor = fullname(element[artIdx].authorList[j]);
matrix[coauthor][author] = (matrix[coauthor][author]++ || 0) + 1;
}
}
}Note: I used for loops for performance optimization rather than array methods. You really could use either. If writing code for real-world system where I did not expect millions of rows and the performance micro-optimization was not necessary I would probably use array methods for cleaner code.
-
Fill 0's in matrix on display (or as secondary process if you really need the data structure). You do this by iterating the author set in nested loop.
-
This would result in complexity that is \$O(n)\$ (n = articles)for article iteration + \$O(m^2)\$ (m = authors) for matrix display. So, if # of articles are assumed to be >> # of authors, this would likely be a good approach.
Code Snippets
{
'ch ch': {
'ch ch': *,
'other author': *,
...
},
'other author': { ... },
...
}const matrix = {};
const authors = new Set();
const articleLen = data.dataSet.article.length;
for (let artIdx = 0; artIdx < articleLen; artIdx++) {
const len = element[artIdx].authorList.length;
for (let i = 0; i < len; i++) {
let author = fullname(element[artIdx].authorList[i]);
authors.add(author);
for (let j = 0; j < len; j++) {
let coauthor = fullname(element[artIdx].authorList[j]);
matrix[coauthor][author] = (matrix[coauthor][author]++ || 0) + 1;
}
}
}Context
StackExchange Code Review Q#160616, answer score: 3
Revisions (0)
No revisions yet.