patterncsharpMinor
Batch save with progress reporting
Viewed 0 times
reportingwithbatchsaveprogress
Problem
I reviewed this question that in my opinion uses a not so pretty batch-save so I tried improve it and this is what I came up with.
In this more generic solution that allows to reuse it I wanted to implement the
It is an extension that you can use on a collection of entities. It requires the size of the batch and a factory method for the context. The last parameter is for reporting progress.
Unlike the original solution this one works with double so it is able to report progress for collection that are smaller then 100 items.
Internally it uses two lambdas: one for checking if the batch should be saved and the other one for progress reporting. The context is wrapped with a using statement.
```
static class EnumerableExtensions
{
public static void BatchSave(
this IList items,
int batchSize,
Func createContext,
IProgress progress
)
where T : class
where TContext : DbContext
{
var count = 0;
var lastPercentage = 0d;
var nextBatch = new Func(() => ++count % batchSize == 0);
var reportProgress = new Action(() =>
{
var percentage = Math.Round((double)count / (double)items.Count * 100d, 1);
if (percentage > lastPercentage)
{
progress.Report(percentage);
lastPercentage = percentage;
}
});
var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
do
{
//context.Set().Add(enumerator.Current);
Console.WriteLine("Add " + enumerator.Current);
reportProgress();
} whi
In this more generic solution that allows to reuse it I wanted to implement the
using statement instead of manually disposing and recreating the context.It is an extension that you can use on a collection of entities. It requires the size of the batch and a factory method for the context. The last parameter is for reporting progress.
Unlike the original solution this one works with double so it is able to report progress for collection that are smaller then 100 items.
Internally it uses two lambdas: one for checking if the batch should be saved and the other one for progress reporting. The context is wrapped with a using statement.
```
static class EnumerableExtensions
{
public static void BatchSave(
this IList items,
int batchSize,
Func createContext,
IProgress progress
)
where T : class
where TContext : DbContext
{
var count = 0;
var lastPercentage = 0d;
var nextBatch = new Func(() => ++count % batchSize == 0);
var reportProgress = new Action(() =>
{
var percentage = Math.Round((double)count / (double)items.Count * 100d, 1);
if (percentage > lastPercentage)
{
progress.Report(percentage);
lastPercentage = percentage;
}
});
var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
do
{
//context.Set().Add(enumerator.Current);
Console.WriteLine("Add " + enumerator.Current);
reportProgress();
} whi
Solution
var count = 0;
var lastPercentage = 0d;
var nextBatch = new Func(() => ++count % batchSize == 0);It was a clever usage of context capture but I think it ended up playing against you.
Your method has way too much responsibility. You should had at least another method to resolve the batching problem. Because you mixed the two things(batch + db stuff) you ended up by having this:
var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
//...
do{
//...
} while (!nextBatch() && enumerator.MoveNext());
}And the first thing that I thought about was: "Why the hell is he advancing the enumerator twice? Why couldn't he use
foreach instead?" And than I realized, without even parsing the code too much: "Ah it must be because of the batching stuff...". It doesn't really matter if I got that right or not, I had to spend time to think about it, asking this kind questions is very useful actually. I would strongly suggest you next time you see yourself using
enumerator.MoveNextto ask the same thing. Why aren't you using foreach?Anyway, as I said you should separate those two concerns. A batch implementation would become like this:
public IEnumerable> Batch(int total, int batchSize){
for(int i = 0; i < total; i += batchSize){
var currentBatchSize = Math.Min(batchSize, total - i);
yield return Enumerable.Range(i * batchSize, currentBatchSize);
}
}A similar thing could be said about
reportProgress. You ended up by creating abstraction over a simple calculation that is to calculate the report progress(with a simple mathematical formula). It didn't really contribute in any positive way to the readability/maintainability of your code.Putting that all into a new implementation would look like this:
public static IEnumerable> GetBatches(int total, int batchSize){
for(int i = 0; i (
this IList items,
int batchSize,
Func createContext,
IProgress progress
)
where T : class
where TContext : DbContext
{
var batches = GetBatches(items.Count, batchSize).ToList();
int currentBatch = 0;
foreach(var batch in batches){
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
foreach(var i in batch){
Console.WriteLine("Add " + items[i]);
}
//context.SaveChanges();
progress.Report(++currentBatch * 100.0 / batches.Count);
Console.WriteLine("Save batch = " + currentBatch);
}
}
progress.Report(100);
}This alternative might yet lack something. For example, if you are really a neat freak, maybe you would try to change the return result of
GetBatches in someway to be more useful. One thing that could be useful is to know which batch is the current batch. Should the return be IEnumerable>(where tuple can be another classes with two integers if you want...). Would it be more useful to turn it to GetBatches? with IEnumerable> return? Well I will leave that for you to decide, at least I am providing the foundations for it.One another thing that might be worth pointing is that there isn't really a reason to
createContext() multiple times, you can create it just one time, and just call SaveChanges per each batch. Or maybe there is an efficiency reason, that I am not aware of... but you have to prove it by measuring.Code Snippets
var count = 0;
var lastPercentage = 0d;
var nextBatch = new Func<bool>(() => ++count % batchSize == 0);var enumerator = items.GetEnumerator();
while (enumerator.MoveNext())
{
//...
do{
//...
} while (!nextBatch() && enumerator.MoveNext());
}public IEnumerable<IEnumerable<int>> Batch(int total, int batchSize){
for(int i = 0; i < total; i += batchSize){
var currentBatchSize = Math.Min(batchSize, total - i);
yield return Enumerable.Range(i * batchSize, currentBatchSize);
}
}public static IEnumerable<IEnumerable<int>> GetBatches(int total, int batchSize){
for(int i = 0; i < total; i += batchSize){
var currentBatchSize = Math.Min(batchSize, total - i);
yield return Enumerable.Range(i, currentBatchSize);
}
}
public static void BatchSave<T, TContext>(
this IList<T> items,
int batchSize,
Func<TContext> createContext,
IProgress<double> progress
)
where T : class
where TContext : DbContext
{
var batches = GetBatches(items.Count, batchSize).ToList();
int currentBatch = 0;
foreach(var batch in batches){
using (var context = createContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
foreach(var i in batch){
Console.WriteLine("Add " + items[i]);
}
//context.SaveChanges();
progress.Report(++currentBatch * 100.0 / batches.Count);
Console.WriteLine("Save batch = " + currentBatch);
}
}
progress.Report(100);
}Context
StackExchange Code Review Q#150636, answer score: 6
Revisions (0)
No revisions yet.