HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Is CYK still relevant today?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
cykstillrelevanttoday

Problem

I've come across the CYK algorithm and was wondering, as it's quite old, if it is still relevant today.

Is it or an extension of it still being used in compilers (for example), or have other algorithms proven to be better?

Solution

CYK is still relevant, afaik, as the simplest example of a family of general
parsing algorithm based on dynamic programming, ranging over all
parsing technique (that I know of) and many syntactic formalisms. There is a simpler
general parsing algorithm (below), but where the dynamic programming (DP)
aspect is no longer visible. as things are defined more globally.

What I mean by general parsing algorithm, is a parsing algorithm that
can parse all memeber of a family of languages, according to their
given grammar, and produce all the possible parse structures when the
parsed string is ambiguous, combined in a condensed form called a
(shared-)parse-forest. Though this is often considered only for
context-free (CF) grammars, it does generalize to other formalisms.

Most known such algorithms are for CF grammars: such as CYK, Earley,
GLR, GLL with diverse variants. They are also known as chart parsers,
and extend beyond formal languages to some logical formalisms, often introduced by computational linguists via so-called feature structures. They can all be seen as extensions or refinement of CYK, though that is often a subjective assessment. They can also be seen as extensions or refinement of the more basic algorithm presented below.

My experience (but new research might prove me wrong) is that the DP
structure of CYK is a proper route to understand, in a simple context,
some of the issues raised by the various techniques for improving the
performance of those parsers in the structured context of the DP
calculation, or to introduce various features such as Viterbi selection of the best parses.

Some of the most basic issues, such as the grammatical forms to be
used to keep a low time and space complexity (cubic), or how to
represent parse-forests. can be more readily viewed and understood
from the simplest algorithm, which is the old cross-product
construction (Bar-Hillel, Perles and Shamir, 1961) between a CF grammar and
a FSA, to produce the grammar of the CF intersection of their
languages. This view is developed in a paper by (Lang 1995) along with extensions to other formal languages.

From a practical point of view, i.e. for use in actual parsers, some
of the more recent general parsers may be better tool. But I believe
that CYK remains an important and simple educational step, more so in
my opinion than Earley's algorithm that is much more complex. Which of
the other DP algorithm should be used for actual systems remains the
object of some debate, as performance issues are somewhat subtle.

I discuss some aspects a bit more in depth in another answer.

Context

StackExchange Computer Science Q#32320, answer score: 9

Revisions (0)

No revisions yet.