Data Science vs Operations Research
Problem
The general question, as the title suggests, is:
- What is the difference between DS and OR/optimization?
On a conceptual level, I understand that DS tries to extract knowledge from the available data, mostly using statistical and machine learning techniques. OR, on the other hand, uses the data to make decisions, for example by optimizing some objective function (criterion) over the data (input).
I wonder how these two paradigms compare.
- Is one a subset of the other?
- Are they considered complementary fields?
- Are there examples where one field complements the other, or where the two are used in conjunction?
In particular, I am interested in the following:
Is there any example where OR techniques are used to solve a Data Science question/problem?
Solution
While Operations Research and Data Science both cover a large number of topics and areas, I'll try to give my perspective on what I see as the most representative and mainstream parts of each.
As others have pointed out, the bulk of Operations Research is primarily concerned with making decisions. While there are many different ways to determine how to make decisions, the most mainstream parts of OR (in my opinion) focus on modelling decision problems in a mathematical programming framework. In these frameworks, you typically have a set of decision variables, constraints over these variables, and an objective function of the decision variables that you are trying to minimize or maximize. When the decision variables take values in $\mathbb{R}$, the constraints are linear inequalities over the decision variables, and the objective function is a linear function of the decision variables, you have a linear program, the main workhorse of OR for the past sixty years. If your decision variables are restricted to integers, or your objective function or constraints take other forms, you find yourself in the realm of integer programming, quadratic programming, semi-definite programming, etc.
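The ingredients of a linear program described above (real-valued decision variables, linear inequality constraints, a linear objective) can be illustrated with a tiny two-variable example. This is a minimal sketch under stated assumptions: the objective, the constraints, and the helper name `solve_2d_lp` are made up for illustration. Instead of a real LP algorithm such as simplex, it simply enumerates the vertices of the feasible region (each intersection of two constraint boundaries). That shortcut is only valid because the example is assumed to be feasible and bounded, in which case an optimum is attained at a vertex.

```python
from itertools import combinations


def solve_2d_lp(objective, constraints, eps=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to a*x + b*y <= c
    for every (a, b, c) in constraints (hypothetical toy helper).

    Assumes the feasible region is nonempty and bounded, so the optimum is
    attained at a vertex where two constraint boundaries intersect."""
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries do not meet in a single vertex
        # Cramer's rule for the system a1*x + b1*y = c1, a2*x + b2*y = c2
        x = (c1 * b2 - b1 * c2) / det
        y = (a1 * c2 - c1 * a2) / det
        # Keep the intersection only if it satisfies every constraint
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


# Made-up example: decision variables x, y; maximize 3x + 2y
constraints = [
    (1, 1, 4),   # x + y <= 4
    (1, 3, 6),   # x + 3y <= 6
    (1, 0, 3),   # x <= 3
    (-1, 0, 0),  # x >= 0, written as -x <= 0
    (0, -1, 0),  # y >= 0, written as -y <= 0
]
point, value = solve_2d_lp((3, 2), constraints)
print(point, value)  # → (3.0, 1.0) 11.0
```

Enumerating vertices grows combinatorially with the number of constraints, so this is strictly a toy; practical LP solvers use the simplex method or interior-point methods.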
Data Science, on the other hand, is mostly concerned with making inferences. Here, you typically start with a big pile of data and would like to infer something about data that isn't in your pile yet. Typical examples are: 1) the pile represents the past results of two different options, and you'd like to know which option will yield better results; 2) the pile represents a time series, and you'd like to know how that time series will extend into the future; 3) the pile represents a labelled set of observations, and you'd like to infer labels for new, unlabelled observations. The first two examples fall squarely into classical statistics (hypothesis testing and time-series forecasting, respectively), while the third is, I think, more closely associated with modern machine learning (classification). Fun trivia: I believe that for a long time, the job title at Google for people doing what we now call Data Science was (is?) "Statistician".
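The first kind of inference above (two options, which one does better?) is classically handled with a hypothesis test. The sketch below uses a two-proportion z-test, one standard choice among several. It assumes each trial is a binary success/failure and that the samples are large enough for the normal approximation. The function name and the counts are invented for illustration.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test of whether options A and B have different success rates."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    # Under the null hypothesis both options share one rate, estimated by pooling
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_a - rate_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: option A converted 120 of 1000 users, option B 90 of 1000
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the two observed rates is unlikely to be due to chance alone. Deciding what to *do* with that conclusion is where the decision-making side discussed above would take over.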
So, in my opinion, Operations Research and Data Science are mostly orthogonal disciplines, although there is some overlap. In particular, I think time-series forecasting makes up a non-trivial part of OR; it is one of the more significant parts of OR that is not based on mathematical programming. Operations Research is where you turn if you already know the relationship between inputs and outputs; Data Science is where you turn if you're trying to determine that relationship (for some definition of input and output).
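The closing distinction (known relationship versus one you have to determine) can be shown with a deliberately tiny toy. In this sketch, `best_input` plays the OR role: the relationship is given, and we search the allowed inputs for the best output. `fit_line` plays the DS role: the relationship is unknown, so we estimate it from observations by ordinary least squares. Both helper names, the revenue function, and the price/demand figures are hypothetical, and brute-force search and a straight-line fit stand in for the much richer methods each field actually uses.

```python
def best_input(relationship, candidates):
    """OR-style step (hypothetical helper): the input/output relationship is
    known, so search the allowed inputs for the one with the largest output."""
    return max(candidates, key=relationship)


def fit_line(xs, ys):
    """DS-style step (hypothetical helper): the relationship is unknown, so
    estimate y ~ intercept + slope * x from observed pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def revenue(price):
    """Known (made-up) relationship: revenue when demand is 100 - 2 * price."""
    return price * (100 - 2 * price)


print(best_input(revenue, range(0, 51)))  # → 25

# Unknown relationship: made-up (price, demand) observations
prices = [10, 20, 30, 40]
demand = [80, 60, 40, 20]
print(fit_line(prices, demand))  # → (100.0, -2.0)
```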
Context
StackExchange Computer Science Q#71525, answer score: 10