patternModerate
Compression type that can be searched
Viewed 0 times
cancompressionsearchedtypethat
Problem
Is there ANY compression type that can compress a file, and then that compressed file can be searched without uncompressing the file?
Solution
Compressed self-indexes such as the FM Index allow arbitrary substring searches in near entropy-compressed space. These are essentially compressed suffix arrays or suffix trees, which have a lot of literature.
Basic substring search can be o(k) or o(k log n) in time for length k, depending on what data structures are chosen (different types of rank/select data structures). There are a range of issues that arise depending on whether one wants simple boolean containment predicates, the offset of each occurrence, or more complicated suffix tree operations; the former can be done in less space and time than the latter.
There's also an entire book about searching and selective decompression of strings: "Compressed Data Structures for Strings: On Searching and Extracting Strings" by Rossano Venturini, published 2014 Springer Science & Business Media.
Basic substring search can be o(k) or o(k log n) in time for length k, depending on what data structures are chosen (different types of rank/select data structures). There are a range of issues that arise depending on whether one wants simple boolean containment predicates, the offset of each occurrence, or more complicated suffix tree operations; the former can be done in less space and time than the latter.
There's also an entire book about searching and selective decompression of strings: "Compressed Data Structures for Strings: On Searching and Extracting Strings" by Rossano Venturini, published 2014 Springer Science & Business Media.
Context
StackExchange Computer Science Q#45218, answer score: 12
Revisions (0)
No revisions yet.