patternMinor
In Data Mining, what does it mean to be greedy?
Viewed 0 times
miningwhatmeandoesgreedydata
Problem
I am looking at a number of algorithms in Data Mining and some are described as being greedy. My issue is that they seem to be using the term greedy in different ways, which seems contradictory.
For example, with Hierarchical Clustering the decision to merge clusters is described as greedy. However, a filter attribute selection method where we select a subset of the attributes which possess the highest Information Gain is also described as greedy because it does not cope well with the interaction of attributes.
What does the term greedy mean in terms of Data Mining?
For example, with Hierarchical Clustering the decision to merge clusters is described as greedy. However, a filter attribute selection method where we select a subset of the attributes which possess the highest Information Gain is also described as greedy because it does not cope well with the interaction of attributes.
What does the term greedy mean in terms of Data Mining?
Solution
I'm not a Data Mining expert, but from what I understand, it means the same thing as it does outside of data mining: to improving the solution in a locally optimal way, rather than one that is guaranteed globally optimal. (And, as Raphael comments, to not backtrack on that decision).
For example, if you are training a graphical model, the process of repeatedly adding the edge that improves your model's fit the most, until no changes improve the model, is a greedy process. at each stage, you've taken the action which has the most incremental improvement over your current model, but the result may not be globally optimal.
Likewise, in the clustering example, to say that the clusters are merged greedily means that they are merged in the way that provides the best incremental improvement over other merges, but may not lead to the optimal clustering at the end.
For example, if you are training a graphical model, the process of repeatedly adding the edge that improves your model's fit the most, until no changes improve the model, is a greedy process. at each stage, you've taken the action which has the most incremental improvement over your current model, but the result may not be globally optimal.
Likewise, in the clustering example, to say that the clusters are merged greedily means that they are merged in the way that provides the best incremental improvement over other merges, but may not lead to the optimal clustering at the end.
Context
StackExchange Computer Science Q#57112, answer score: 3
Revisions (0)
No revisions yet.