patterncsharpModerate
Job pipeline application
Viewed 0 times
jobpipelineapplication
Problem
We have a console application which runs to perform an incremental load of our data warehouse staging area, with data from a number of production databases.
The source database don't necessarily have a rowversion/timestamp/etc. column so we are currently using change data tracking in SQL to load changes to the tables. This all works fine and I think most of the code is okay, however, I don't like the way the 'main' pipeline works and was wondering if anyone had any better suggestions.
At the moment I have each 'step' as an instance of a class like this. This is so I can show a status of each step and if errors happen we have a descriptive name of the action it was performing. I was also thinking of adding pre and post conditions to each step so we can more easily tell the reason it fails.
These steps are run by the following method to enable a common type of logging. If an error is detected in the run, and we are not performing a full load, then the process will run again but perform a full load.
```
private bool RunStep(PipeLineStep step, TableContext context, Boolean isFullLoad, Boolean errorRerun)
{
try
{
step.Run();
return true;
}
catch (Exception ex)
{
Logging.Log(false, String.Format("Error during Pipeline Step [{0}]. {1}", step.StepName, ex.Message), context.TaskId, context.TableId, step.StepName, true, CurrentRunNumber);
Logging.Log(false, String.Format("{0}", !isFullLoad && !errorRerun ? "Attempting Full Load of Table." : "Aborting Table Load."), context.TaskId, context.TableId, step.StepName, true, CurrentRunNumber);
// If not currently doing a full load and we have an error - trying doing full load to refresh schema
if (!isFullLoad && !errorRerun)
{
Run(context, CurrentRunNumber, isFullLoad, true)
The source database don't necessarily have a rowversion/timestamp/etc. column so we are currently using change data tracking in SQL to load changes to the tables. This all works fine and I think most of the code is okay, however, I don't like the way the 'main' pipeline works and was wondering if anyone had any better suggestions.
At the moment I have each 'step' as an instance of a class like this. This is so I can show a status of each step and if errors happen we have a descriptive name of the action it was performing. I was also thinking of adding pre and post conditions to each step so we can more easily tell the reason it fails.
public abstract class PipeLineStep
{
public abstract String StepName { get; }
public bool WasSuccessful { get; set; }
internal abstract void Run();
}These steps are run by the following method to enable a common type of logging. If an error is detected in the run, and we are not performing a full load, then the process will run again but perform a full load.
```
private bool RunStep(PipeLineStep step, TableContext context, Boolean isFullLoad, Boolean errorRerun)
{
try
{
step.Run();
return true;
}
catch (Exception ex)
{
Logging.Log(false, String.Format("Error during Pipeline Step [{0}]. {1}", step.StepName, ex.Message), context.TaskId, context.TableId, step.StepName, true, CurrentRunNumber);
Logging.Log(false, String.Format("{0}", !isFullLoad && !errorRerun ? "Attempting Full Load of Table." : "Aborting Table Load."), context.TaskId, context.TableId, step.StepName, true, CurrentRunNumber);
// If not currently doing a full load and we have an error - trying doing full load to refresh schema
if (!isFullLoad && !errorRerun)
{
Run(context, CurrentRunNumber, isFullLoad, true)
Solution
I'd need more context so please consider some suggestions as implicit questions.
In
Property name itself is little bit redundant, you do not need to repeat
In the same class
In the same class
In general visibility issues won't make your code broken but they will help to make it safer (from usage point of view) and they communicate intent.
In
You're mixing
Parameter
You're using a parameter
Expression is
In general you're using too many function arguments and too many of them are
To reduce number of arguments you have various techniques for example (again it depends on surrounding code)
Boolean parameters are easier to remove, you may use
Minor question: shouldn't
Free thinking: if
```
private void RunWithErrorHandling() {
try {
Run();
}
catch (StepExecutionException e) {
// Try again? I don't know your full logic here
Run();
}
}
private void Run() {
foreach (var step in Steps)
RunStep(step);
}
private void RunStep(PipelineStep step) {
try {
In
PipeLineStep class StepName property is a read-only property. If you just return a constant value you may want to make it a constructor parameter instead (where you may also include some sort of validation):protected PipeLineStep(string name) {
if (String.IsNullOrWhiteSpace(name))
throw new ArgumentException("Step name must not be nor empty.", "name");
StepName = name;
}
public string StepName {
get;
private set;
}Property name itself is little bit redundant, you do not need to repeat
Step for a property of a class named Step:public string Name {
get;
private set;
}In the same class
WasSuccessful is a public read/write property. Does anyone (inside or outside its assembly) has right to both read and write this property? Assuming (because Run() is internal) that also WasSuccessful is set from within same assembly then I'd make its setter internal.In the same class
Run() method is internal. Because it's also abstract it seems that it's used only from the same assembly where class is declared and also that all derived classes are located in the same assembly. This arises a question: does it make sense to have a public abstract class that can be used and extended only from the same assembly? It may be - if other assemblies will access Name ans WasSuccessful properties - but if it's not then don't make this class public.In general visibility issues won't make your code broken but they will help to make it safer (from usage point of view) and they communicate intent.
In
RunStep method you're catching a generic Exception. You should formalize which exceptions each step will throw to possibly narrow down to a smaller set of exceptions, catching Exception you get everything, from OutOfMemoryException to ArgumentException and it's bad because they're exceptions that need a special handling. Also, knowing what you're catching will help you to determine corner cases and error scenarios: a very good exercise for you and for your tests. You're mixing
bool and Boolean (for example), they're equivalent then I'd suggest to use the style you prefer and be consistent, reader won't repeatedly ask himself why (variations are eye-catching, they must be justified).Parameter
isFullLoad is not meaningful without more context, I'd rename it to something else but I actually don't know what it means (and your future-you won't know either without re-reading surrounding code...)You're using a parameter
errorRerun to know if you're retrying because of a previous error. While it's not an issue in such simple scenario it may be confusing (see your handling logic) and it adds more parameters to a function that should be as simple (you already have four parameters, two/three more than what's good). I can't suggest anything without much more context and knowledge but it's a point I'd investigate further.Expression is
!isFullLoad && !errorRerun is used twice, it's a good chance to extract a local variable with a meaningful name (again, rename it according to its exact meaning):bool canRunFullPipelineAgain = !fullLoad && !errorRerun;In general you're using too many function arguments and too many of them are
bool. Pick any line where you use them and you can't understand their meaning without jumping back and forward to called function prototype (see RunStep(), Run(), UpdateProgress() and Logging.Log().)To reduce number of arguments you have various techniques for example (again it depends on surrounding code)
errorRerun may be a class member property as context and isFullLoaded (you have classes, containers of methods and state, you don't need to pass everything as parameters). Your logging function is a good example of this: its prototype is waaaaay too complex and convoluted. If you really can't change it - at least - use named parameters.Boolean parameters are easier to remove, you may use
enum instead of them (eventually - when it makes sense - combined as Flags). For example:var options = RunStepOptions.TableIsFullyLoaded | RunStepOptions.RetryOnError);
if (!RunStep(step1, context, options))
return false;Minor question: shouldn't
PipeLine be Pipeline?Free thinking: if
Run() executes all steps in pipeline then why you're putting error handling in RunStep()? Isn't easier to write a method Run() without error handling and an outer method RunWithErrorHandling() which takes care of this? You may also introduce a StepExecutionException, almost pseudo-code:```
private void RunWithErrorHandling() {
try {
Run();
}
catch (StepExecutionException e) {
// Try again? I don't know your full logic here
Run();
}
}
private void Run() {
foreach (var step in Steps)
RunStep(step);
}
private void RunStep(PipelineStep step) {
try {
Code Snippets
protected PipeLineStep(string name) {
if (String.IsNullOrWhiteSpace(name))
throw new ArgumentException("Step name must not be nor empty.", "name");
StepName = name;
}
public string StepName {
get;
private set;
}public string Name {
get;
private set;
}bool canRunFullPipelineAgain = !fullLoad && !errorRerun;var options = RunStepOptions.TableIsFullyLoaded | RunStepOptions.RetryOnError);
if (!RunStep(step1, context, options))
return false;private void RunWithErrorHandling() {
try {
Run();
}
catch (StepExecutionException e) {
// Try again? I don't know your full logic here
Run();
}
}
private void Run() {
foreach (var step in Steps)
RunStep(step);
}
private void RunStep(PipelineStep step) {
try {
}
catch (Exception e) { // !!!
// Logging?
throw new StepExecutionException("...", e);
}
}Context
StackExchange Code Review Q#133267, answer score: 10
Revisions (0)
No revisions yet.