patterncsharpMinor
Linq query performance that use ToList()
Viewed 0 times
linqquerythatperformancetolistuse
Problem
This code written by @Rahul Singh in this post:
This code has a problem when used inside database context and we should use
The problem of this is that
The performance of this code is better but it has many lines of code. Any idea about improving the performance of the latter, or reducing the length of the former?
var result = _dbContext.ExtensionsCategories.ToList().GroupBy(x => x.Category)
.Select(x =>
{
var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).Contains(f.Extension));
return new
{
Category = x.Key,
TotalFileCount = files
};
});This code has a problem when used inside database context and we should use
ToList() like this to fix "Only primitive types or enumeration types are supported in this context" error:var result = _dbContext.ExtensionsCategories.ToList().GroupBy(x => x.Category)The problem of this is that
ToList() fetches all records and reduces performance. Now I wrote my own code:var categoriesByExtensionFileCount =
_dbContext.ExtensionsCategories.Select(
ec =>
new
{
Category = ec.Category,
TotalSize = _dbContext.FileLists.Count(w => w.Extension == ec.Extension)
});
var categoriesTotalFileCount =
categoriesByExtensionFileCount.Select(
se =>
new
{
se.Category,
TotalCount =
categoriesByExtensionFileCount.Where(w => w.Category == se.Category).Sum(su => su.TotalSize)
}).GroupBy(x => x.Category).Select(y => y.FirstOrDefault());The performance of this code is better but it has many lines of code. Any idea about improving the performance of the latter, or reducing the length of the former?
Solution
You can indeed improve the query by changing the strategy.
If you first
Now for the few file extensions it's very easy and quick to get the category names:
This will produce the following query:
I've tested it with
and compared with the
In the last rows of the statistics you can see that short version is about 3 times faster then the original one.
If you first
GroupBy file extension and Count them inside the GroupBy element selector you can remove ToList and the query will be executed on the server and it won't be necessary to download all the data to group it later on your machine. You will then only get a few groups and their number of elements.Now for the few file extensions it's very easy and quick to get the category names:
var result =
Files
.GroupBy(x => x.Extension, (ext, fs) => new
{
Extension = ext,
Category = Categories.Single(c => c.Extension == ext).Name,
FileCount = fs.Count()
})
// the second GroupBy sums categories with multiple file extensions like pdf, doc etc.
.GroupBy(x => x.Category, (cat, counts) => new
{
Category = cat,
FileCount = counts.Sum(y => y.FileCount)
});This will produce the following query:
SELECT SUM([t3].[value2]) AS [FileCount], [t3].[value] AS [Category]
FROM (
SELECT (
SELECT [t2].[Name]
FROM [Category] AS [t2]
WHERE [t2].[Extension] = [t1].[Extension]
) AS [value], [t1].[value] AS [value2]
FROM (
SELECT COUNT(*) AS [value], [t0].[Extension]
FROM [File] AS [t0]
GROUP BY [t0].[Extension]
) AS [t1]
) AS [t3]
GROUP BY [t3].[value]I've tested it with
1.000.000 rows: and compared with the
ToList version:In the last rows of the statistics you can see that short version is about 3 times faster then the original one.
Code Snippets
var result =
Files
.GroupBy(x => x.Extension, (ext, fs) => new
{
Extension = ext,
Category = Categories.Single(c => c.Extension == ext).Name,
FileCount = fs.Count()
})
// the second GroupBy sums categories with multiple file extensions like pdf, doc etc.
.GroupBy(x => x.Category, (cat, counts) => new
{
Category = cat,
FileCount = counts.Sum(y => y.FileCount)
});SELECT SUM([t3].[value2]) AS [FileCount], [t3].[value] AS [Category]
FROM (
SELECT (
SELECT [t2].[Name]
FROM [Category] AS [t2]
WHERE [t2].[Extension] = [t1].[Extension]
) AS [value], [t1].[value] AS [value2]
FROM (
SELECT COUNT(*) AS [value], [t0].[Extension]
FROM [File] AS [t0]
GROUP BY [t0].[Extension]
) AS [t1]
) AS [t3]
GROUP BY [t3].[value]Context
StackExchange Code Review Q#108276, answer score: 3
Revisions (0)
No revisions yet.