HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Word Wrap Algorithm and Datastructure that supports modifications of text and window size

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
sizetextsupportsalgorithmworddatastructurethatmodificationsandwindow

Problem

I'm trying to find or create an algorithm for efficiently displaying text that is word wrapped. All the algorithms (if they an even be called that) for word wrapping presume a given lot of text and window size, and compute word wrapping in $O(n)$ time. That isn't realistic because text can be huge, users can modify text anywhere, and window sizes can change. It is quite a problem to expect $O(n)$ modification time whenever something changes. When a user modifies the text or resizes a window, I'd like to be able to update the display in something like logarithmic time.

Here is how I formalize the a slightly simplified version of the problem:

$A$ is an array of positive integers that of length $n$. It represents the lengths of words (and their trailing space). $W$ is a positive integer representing window size. For simplicity assume $W \ge \max(A)$. Each query of $I_W$ represents fetching the word at the start of a line. A modification of $A$ is like user modifying the text, and a modification of $W$ is like the window being resized.

I want to be able to repeatedly do queries and modifications of $A$. A modification is the insertion, deletion, or modification of an integer in $A$. A query is a request of $I_W[k] \in \mathbb N$, defined as

  • $I_W[0] = 0$



  • $I_W[k+1] = \text{ the biggest } z \text{ s.t. } \sum_{I_{k} \le j



To describe $I_W$ another way, suppose $A$ is $[5, 3, 4, 6, 5, 1, 1, 4, 4, \cdots]$ and $W = 9$. Then split $A$ into initial segments whose sums are $\le W$, so $[[5, 3], [4], [6], [5, 1, 1], [4, 4], \cdots]$, then $I_W(k)$ is the sum of the lengths of the first $k+1$ splits; that is, $I_W = [0, 2, 3, 4, 7, 9, \dots]$.

I'm looking for an algorithm and datastructure that would allow for linearithmic preprocessing on $A$ and something like logarithmic time modifications and queries. I'd also like to be able to handle changes in $W$ quickly but that can somewhat be handled with threads and software design so it isn't as important.

Th

Solution

I think you can handle single-word modifications in $O(d \log n)$ time, where $d$ (density) is the maximal number of words in a line, which is hopefully small for any reasonable $A$ and fixed $W$.

Let's take one of your sentences as an example. We want it to be wrapped with $W = 30$ characters like this:

I'm looking for an algorithm
and datastructure that would
allow for linearithmic
preprocessing on A and
something like logarithmic
time modifications and
queries.


First, let's find all possible lines. This can be done greedily in $O(n)$ using two pointers scanning through $A$. If one of these lines ends where the other begins, we connect them.

The path from node $1$ to the root of the tree represents $I_W$.

Now let's remove "an" from the first line, forcing all other lines to shift:

I'm looking for algorithm and
datastructure that would allow
for linearithmic preprocessing
on A and something like
logarithmic time modifications
and queries.


But here's one nice thing about the tree: any single-word change can only affect $O(d)$ edges. We find these edges by re-running the same pre-processing on a small window around the changed word. In our case, the changes are:

  • Edge 1-6 removed



  • Edge 4-8 removed



  • Edge 1-7 added



So, we need to be able to do these operations efficiently:

  • Add new edge into the tree



  • Remove existing edge from the tree



  • Find $k$-th ancestor of a given node.



All these operations can be done in $O(\log n)$ via Euler tour technique. I won't describe its implementation in detail, but feel free to ask a separate question about it.

Code Snippets

I'm looking for an algorithm
and datastructure that would
allow for linearithmic
preprocessing on A and
something like logarithmic
time modifications and
queries.
I'm looking for algorithm and
datastructure that would allow
for linearithmic preprocessing
on A and something like
logarithmic time modifications
and queries.

Context

StackExchange Computer Science Q#130828, answer score: 3

Revisions (0)

No revisions yet.