Recent Entries 10
- pattern minor 112d agoMerging Tuples of IntervalsSuppose I have a list of tuples. Each tuple contains 2 intervals. The intervals in each tuple have nothing to do with each other. I would like to find a smaller list of tupels that covers all elements in each interval. Therefore it´s necessary to merge the intervals of the first type and the intervals of the second type separately. It is important to know that the intervals of a tuple have a relationship. Thus, a tuple cannot be merged with another tuple unless both intervals can each be merged separately with the interval belonging to them. An example: Let valid values for the first elements of the tuples are number intervals. Let valid values for the second elements of the tuples are character intervals. A valid optimization of `[{[1-3], [a-d]}, {[4-7],[e-f]}]` would be `[{[1-7], [a-f]}`. But it is not possible to simplify tuples like `[{[1-3], [a-d]}, {[7-9], [e-k]}]` because between `1-3` and `7-9` there are still the numbers `{4, 5, 6}`. I am looking for efficient algorithms to simplify this kind of list of tuples that contains intervals. A simple solution would be to simply check all combinations. But how can large datasets be optimized? Checking all combinations would cost to many ressources. So my questions are: - Are there well-known (efficient) algorithms for such tasks, such as approximation algorithms? I'm looking for a kind of merge intervals algorithm, but for tupels. It should optimize both attributes so that one gets fewer elements in the tuples list. - Is this a well-known problem? (Is there a reduction from some NP-complete problem?) I guess my problem is somehow related to the set cover problem? Am I right? Thanks for any help!
- pattern moderate 112d agoWhat is the name of this type of program optimization where two loops operating over common data are combined into a single loop?On an imperative programming language, let us consider the following program: ``` for i in 0..N { // N is the length of the arrays A, B, C. A[i] = A[i] + B[i]; } for i in 0..N { A[i] = A[i] + C[i]; } ``` This program just sums three arrays $A + B + C$ component-wisely and store it to $A$. We can easily transform this program into the following equivalent one: ``` for i in 0..N { let tmp = A[i] + B[i]; A[i] = tmp + C[i]; } ``` I think the latter code is more efficient than the former because we can decrease the number of memory accesses. Now I have a question. What is the name of this type of program transform or program optimization? Can we also call this deforestation?
- pattern minor 112d agoDynamic Program to solve an NP-complete partitioning problemI have this problem for which I am struggling to find an efficient dynamic programming algorithm. Would be thankful for some help!! Let $A = \{ a_1, a_2, ..., a_n \}$ be a set where $a_i \in \mathbb{N}$ for $i=1,...,n$. The goal is to determine whether there exist two disjoint subsets $M,N \subset A$ such that the sum of all elements in $M$ is equal to exactly $\textit{twice}$ the sum of all elements in $N,$ and $M \not = \emptyset$ and $N \not = \emptyset.$
- pattern minor 112d agoAlgorithm for detecting overlapsThis is a real-world application, not a student assignment. Suppose a list of events of that have `startTime` and `endTime`, and some overlap information. In pseudocode: ``` class Event { Time startTime; Time endTime; bool overlapsStart; bool overlapsEnd; } ``` Events are considered overlaping if and only if their times intersect, but it's not considered overlap if some event just "touches" another (starts exactly when some other finishes). There are two types of event: `NormalEvent` and `EmptyEvent`. What I want? For each event in the list, I want to: 1) Remove all `EmptyEvent`s that overlap another event of any type. In special, if some `EmptyEvent` overlaps another `EmptyEvent`, only one needs to be removed, so that the overlap ceases. 2) The `NormalEvent`s are first ordered by `startTime`, and then their booleans are set: - `overlapsStart` is true if the event overlaps some event that comes before it. - `overlapsEnd` is true if the event overlaps some event that comes after it. A trivial example: ``` event1 = 6:00AM ➜ 8:00AM ``` and ``` event2 = 7:00AM ➜ 9:00AM ``` then: ``` event1.overlapsStart == false event1.overlapsEnd == true event2.overlapsStart == true event2.overlapsEnd == false ``` Another example: Now, this is what happens if some event "contains" another: ``` event1 = 6:00AM ➜ 9:00AM ``` and ``` event2 = 7:00AM ➜ 8:00AM ``` then, first event1 is put before event2, since it starts before. Then: ``` event1.overlapsStart == false event1.overlapsEnd == true event2.overlapsStart == true event2.overlapsEnd == false ``` Naive implementation: I could analyze each event in turn, one by one, looking at all other events. That's easy, however this is too slow for a large number of events. My question: What's an efficient way of solving this?
- pattern minor 112d agoCalculating all products of $n-1$ factors when given $n$ factorsLet's assume we have an operator $$ \times: E^2\to E$$ of which we merely know that it is associative. Let's say a multiplication $e\times f$ always takes up a time of $M$ for all $e, f\in E$. We're now given $n$ elements $e_1,...,e_n\in E$, and are tasked to calculate all $n$ products $$\def\bigtimes{\mathop{\vcenter{\huge\times}}} p_j :=\bigtimes_{i=1\\i\neq j}^n e_i$$ Naively multiplying out all $n$ products takes $O(n^2M)$. A more sophisticated approach that runs in $O(n^{3/2}M)$ splits $\bigtimes_{i=1}^n e_i$ at arbitrary positions into $\sqrt n$ factors. For each of the $n$ products, we now only have to recalculate one factor, and multiply the resulting factor, and the other $k-1$ factors together. What is the fastest algorithm for this problem, and how does it look like?
- pattern minor 112d agoEfficient data structure handling insert(number) and find(sum) returning pair a,b such that a + b = sumThere are two operations as follows: `insert(num)`: insert num into the data structure. `find(sum)`: return a pair(a, b) such that `a + b = sum`, if no such pair exists return -1 How such data structure could be designed, possibly with `find` $\in o(N)$ and `insert` $\in o(\log N)$ operations?
- pattern minor 112d agoEfficient algorithm to determine if a lambda calculus term is equivalent to one without a given free variableConsider the following problem: given a lambda calculus term $t$ and free variable $v$ determine whether $\phi(t,v)$, where $\phi(t,v) := \exists t'. t' \equiv t \land v \notin FV(t')$. This problem is undecidable. To prove this, assume the contrary. We define a term $s$ as follows: $$s x \equiv \lambda n.j. j y$$ if $x$ is an encoding of the configuration of a Turing machine and $y$ is the encoding of the configuration the Turing machine transitions into in one step and $$s x \equiv \lambda n.j. n$$ if $x$ is an encoding of the configuration of a Turing machine that has halted. Now define the term $t_M$ as follows: $$t_M := Y (\lambda f.x. s x (\lambda a.b. b) f) i_M$$ where $i_M$ is the encoding of the initial configuration of $M$, and $Y$ is the fixed point combinator. If $M$ halts, $t_mv \equiv (\lambda b. b)$. Otherwise, $t_mv$ does not have a normal form, and every term equivalent to it has $v$ in it. Therefore, $\phi(t, v)$ iff $M$ halts. This algorithm can be used to solve the halting problem, which is a contradiction. On the other hand, it is semidecidable. For any $t$ and $v$, simply enumerate the $t'$ such that $t' \equiv t$. If a $t'$ is enumerated such that $v \notin FV(t')$, output yes and halt. This algorithm outputs yes iff $\phi(t, v)$. That said, this algorithm is not very efficient, since you have to use an enumeration that enumerates all $t'$ such that $t' \equiv t$. Is there an efficient algorithm to semidecide whether $\phi(t,v)$ is true? EDIT: We'll say an algorithm for determining $\phi(t,v)$ is efficient if, when $\phi(t,v)$ is true, it halts in an amount of time that is a polynomial of the minimum number of reduction and conversion steps required to eliminate $v$ from $t$. An approach one might consider is to normalize $t$, and then check if $v$ is free in the result. Although this would be efficient, it would not be correct. For example, this algorithm would not halt on the input $(\Omega ((\lambda a.b. a) \Omega v), v)$ (wher
- pattern minor 112d agoLongest substrings of common length with the same parityGiven two sequences $a$ and $b$, find largest $x$ such that in $a$ there is substring $A$ and substring $B$ in $b$ meeting those conditions: - length of both $A$ and $B$ is equal to $x$; - sum of elements in $A$ has the same parity as sum of elements in $B$. Lengths of $a$ and $b$ are up to $5\times10^5$, so simple $O(n^2)$ solution won't do. Example: $a = [0, 1, 2, 3, 4, 5]$ $b = [3, 1, 3, 6]$ Answer: 3 (one of the possible solutions is $A = [2, 3, 4], B = [3, 1, 3]$). I've thought about it for hours and can't find a solution. How to do this in linear or linearithmic time complexity? I'm quite sure the problem can be simplified to for index $i$, storing only sum of elements up to $i$ modulo $2$. However, it doesn't help me much. (The problem comes from a rather-old Israeli book תכנות תחרותי: סביב אתגרים ('Competetitive programming') by Mordechai Ben-Ari. The book isn't well known and I couldn't find any solution in Hebrew, so I translated the problem into English for a better chance of getting answer.) Any help will be appreciated.
- pattern minor 112d agoProblems that feel exponential but are PI'm trying to build a list of algorithms/problems that are "exceptionally useful", as in, solving problems that 'seem' very exponential in nature, but have some particularly clever algorithm that ultimately solves them. Examples of what I mean: - Linear Programming (The simplex algorithm is exponential time; it took a long time to find a polynomial time solution!) - More generally, Semidefinite Programming - Primality testing - 2-SAT and HORNSAT - Computing determinants (if this doesn't sound difficult, consider the permanent) - Finding perfect matchings - A variety of hard group theoretic problems that can be accomplished using the Classification of Finite Simple Groups - A variety of hard graph problems that can be accomplished using complicated Forbidden Minor characterizations (embeddability on an arbitrary surface; bounding of treewidth and branchwidth; Delta-Wye reducible graphs) - Computing exponentials in a bounded group (i.e. computing $a^b \mod k$ in $\log b$ steps, as accomplished by repeated squaring) - Computations relying on the LLL algorithm. (As a special case: Euclidean algorithm. As a more general case: PSLQ or HJLS algorithms.) - Constraint problems without Taylor terms (?). I admit I don't fully understand this, but it sounds like it probably subsumes the 2-SAT / HORNSAT cases above, and any linear algebra over finite fields. See here for a longer post - Problems computable via holographic reductions. As a honorable mention, I would also mention Graph Isomorphism, because it's still awfully easy ($n^{\log^2 n}$), and equivalent to so many other isomorphism problems: - Digraphs/multigraphs/hypergraphs (a sort of 'harder' problem) - Finite automata/CFGs Obviously there's a range of difficulty in these, but all leave at least some people with some sense of 'surprise' in the sense that the problem can sound difficult but turn out tractable. LP might sound relatively straightforward, but took people quite a while to build an actual sol
- gotcha minor 112d agoWhy does parallelising slow down this simple problem against looping through all the data?I've been using multiprocessing and parallelisation for the first time this week on a very large data set using 32 CPUs. I decided to explore it for a smaller task just to see if I could learn anything, just on the 4 CPUs of my Mac. I created a task to add 100 to every element in a 500,000 element list. To my surprise, I noticed that batching this data and using Python's parallelising tools to implement this actually slowed it down hugely, compared to just looping through the 500,000 elements and adding 1. I'd like to understand why. Consider the two methods for doing this task below: ``` import numpy as np from sqlitedict import SqliteDict from multiprocessing import Pool, cpu_count from gensim.corpora.wikicorpus import init_to_ignore_interrupt from itertools import zip_longest import timeit as t def grouper(iterable, n, fillvalue=None): args = [iter(iterable)] * n return zip_longest(*args, fillvalue=fillvalue) class Add100ToData(): def __init__(self): self.data = [np.random.randint(0, 100) for _ in range(500000)] def add100(self): for i in range(len(self.data)): self.data[i] = self.data[i] + 100 return self.data class Add100ToDataMultiprocess(): def __init__(self): self.data = [np.random.randint(0, 100) for _ in range(500000)] def process_batch(self, batch): new_data = [] for i in batch: new_data.append(i + 100) return new_data def add100(self, batch_size): processes = cpu_count() pool = Pool(processes, init_to_ignore_interrupt) gr = grouper(self.data, batch_size) for batch_result in pool.imap(self.process_batch, gr): count = 0 for i in batch_result: count += 1 self.data[count] = i return self.data if __name__ == "__main__": add1 = Add100ToData() start = t.default_timer() final1 = add1.add100() end = t.default_timer() print("Loop