HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Compression type that can be searched

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
cancompressionsearchedtypethat

Problem

Is there ANY compression type that can compress a file, and then that compressed file can be searched without uncompressing the file?

Solution

Compressed self-indexes such as the FM Index allow arbitrary substring searches in near entropy-compressed space. These are essentially compressed suffix arrays or suffix trees, which have a lot of literature.

Basic substring search can be o(k) or o(k log n) in time for length k, depending on what data structures are chosen (different types of rank/select data structures). There are a range of issues that arise depending on whether one wants simple boolean containment predicates, the offset of each occurrence, or more complicated suffix tree operations; the former can be done in less space and time than the latter.

There's also an entire book about searching and selective decompression of strings: "Compressed Data Structures for Strings: On Searching and Extracting Strings" by Rossano Venturini, published 2014 Springer Science & Business Media.

Context

StackExchange Computer Science Q#45218, answer score: 12

Revisions (0)

No revisions yet.