patterncsharpMinor
compression condition permutation
Viewed 0 times
compressionconditionpermutation
Problem
I have 16 permutations each with 104 variables using the standard alphabet. The 104 variable permutation is selected based off of compression complexity.
This is one example of the pattern I am able to generate it has a low compression ratio of 1.3 with overhead:
HHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFH
This is a goal pattern I would like to generate it has a better compression ratio of 1.5
mlcllltlgvalvcgvpamdipqtkqdlelpklagtwhsmamatnnislmatlkaplrvhitsllptpednleivlhrwennscvekkvlgektenpkkfkinytvaneatlldtdydnflflclqdtttpiqsmmcqylarvlveddeimqgfirafrplprhlwylldlkqmeepcrf
This is one set of permutation code generation:
```
static double GetCompressionRatio(string input)
{
if (string.IsNullOrEmpty(input))
throw new ArgumentNullException();
MemoryStream ms = new MemoryStream();
GZipStream gzip2 = new GZipStream(ms, CompressionMode.Compress, true);
byte[] raw = Encoding.UTF8.GetBytes(input);
gzip2.Write(raw, 0, raw.Length);
gzip2.Close();
byte[] zipped = ms.ToArray(); // as a BLOB
int startsize = raw.Length;
double percent = Convert.ToDouble(zipped.Length) / Convert.ToDouble(startsize);
return percent;
}
var query =
from sp1 in polar
...
//22 lines
...
from vp15 in polar
where GetCompressionRatio(sp1+...+vp15)>1.5
select sp1+...+vp15;
foreach (var element in query)
{
//output
}
string[] polar = new string[6];
polar[0]="H";
polar[1]="Q";
polar[2]="N";
polar[3]="K";
polar[4]="D";
polar[5]="E";
string[] nonpolar = new string[5];
nonpolar[0]="F";
nonpolar[1]="L";
nonpolar[2]="I";
nonpolar[3]="M";
nonpolar[4]="V";
string[] all = new string[24];
all[0]="A";
all[1]="B";
This is one example of the pattern I am able to generate it has a low compression ratio of 1.3 with overhead:
HHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFH
This is a goal pattern I would like to generate it has a better compression ratio of 1.5
mlcllltlgvalvcgvpamdipqtkqdlelpklagtwhsmamatnnislmatlkaplrvhitsllptpednleivlhrwennscvekkvlgektenpkkfkinytvaneatlldtdydnflflclqdtttpiqsmmcqylarvlveddeimqgfirafrplprhlwylldlkqmeepcrf
This is one set of permutation code generation:
```
static double GetCompressionRatio(string input)
{
if (string.IsNullOrEmpty(input))
throw new ArgumentNullException();
MemoryStream ms = new MemoryStream();
GZipStream gzip2 = new GZipStream(ms, CompressionMode.Compress, true);
byte[] raw = Encoding.UTF8.GetBytes(input);
gzip2.Write(raw, 0, raw.Length);
gzip2.Close();
byte[] zipped = ms.ToArray(); // as a BLOB
int startsize = raw.Length;
double percent = Convert.ToDouble(zipped.Length) / Convert.ToDouble(startsize);
return percent;
}
var query =
from sp1 in polar
...
//22 lines
...
from vp15 in polar
where GetCompressionRatio(sp1+...+vp15)>1.5
select sp1+...+vp15;
foreach (var element in query)
{
//output
}
string[] polar = new string[6];
polar[0]="H";
polar[1]="Q";
polar[2]="N";
polar[3]="K";
polar[4]="D";
polar[5]="E";
string[] nonpolar = new string[5];
nonpolar[0]="F";
nonpolar[1]="L";
nonpolar[2]="I";
nonpolar[3]="M";
nonpolar[4]="V";
string[] all = new string[24];
all[0]="A";
all[1]="B";
Solution
Ok.
Lots of stuff going on here and I'm writing this as I work your example so bear with me...
In you code, polar contains six letters. And you don't care about letter repetition in your top examples so what you are calculating here are not permutations but variations. Secondly, you're example shows 22 ommitted lines so you have 24
That's a very large number. Even if you could paralelize this across 1000 servers, they would still need to be calculating 150million compressions ratios per second for this to complete in a year. (((6^24)/1000)/365)/86400) = ~150,253,000
As soon as you call
But again... lets just assume that you have enough power & bandwidth & compute resource (GPUs are shiny) And lets bring it back to more manageable numbers. lets says 6^10 =
Strings are immutable in .NET. Everytime you alter a string, you throw away the old one, and make a new one and the garbage collector has to come along and clean up after you. so this part here will be very expensive where you concat all those strings together, twice.
at the very least you would want to change this to
Using LINQ like that to generate your variations is not optimal. As I suggested in the comments... use a library. The following code snippet.
RESULTS
On my machine this is repeatedly capable of performing 100,000 compressions on a 10 character string in 2200-2300ms. With the following modifications, that comes down to 1800-1900ms. Small improvement that's possibly related to what else my machine was doing but still.
Code
Lots of stuff going on here and I'm writing this as I work your example so bear with me...
In you code, polar contains six letters. And you don't care about letter repetition in your top examples so what you are calculating here are not permutations but variations. Secondly, you're example shows 22 ommitted lines so you have 24
from statements in your LINQ query... that's going to generate 6^24 variations (4738381338321616896)That's a very large number. Even if you could paralelize this across 1000 servers, they would still need to be calculating 150million compressions ratios per second for this to complete in a year. (((6^24)/1000)/365)/86400) = ~150,253,000
As soon as you call
foreach (var element in query) that query is going to execute and it's going to try and create a list of 4.7 Quintillion strings in memory. This is BAD. You're attempting a brute force calculation on an impossibly large problem set so you need to go back and rethink your approach to whatever it is you are trying to solve/accomplishBut again... lets just assume that you have enough power & bandwidth & compute resource (GPUs are shiny) And lets bring it back to more manageable numbers. lets says 6^10 =
- Strings in General
Strings are immutable in .NET. Everytime you alter a string, you throw away the old one, and make a new one and the garbage collector has to come along and clean up after you. so this part here will be very expensive where you concat all those strings together, twice.
where GetCompressionRatio(sp1+...+vp15)>1.5
select sp1+...+vp15;at the very least you would want to change this to
let concatval = sp1+...+vp15
where GetCompressionRation(concatVal) > 1.5
select concatVal- Big Numbers are Big.
Using LINQ like that to generate your variations is not optimal. As I suggested in the comments... use a library. The following code snippet.
var polar = new[] { "H", "Q", "N", "K", "D", "E" };
var sw = new Stopwatch();
sw.Start();
var results1 = from a1 in polar from a2 in polar
from a3 in polar from a4 in polar
from a5 in polar from a6 in polar
from a7 in polar from a8 in polar
from a9 in polar from a10 in polar
select a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10;
var l1 = results1.Count();
sw.Stop();
Console.WriteLine("{0} variations in {1}ms", l1, sw.ElapsedMilliseconds);
sw = new Stopwatch();
sw.Start();
var results2 = new Variations(polar, 10, GenerateOption.WithRepetition);
var l2 = results2.Count();
sw.Stop();
Console.WriteLine("{0} variations in {1}ms", l2, sw.ElapsedMilliseconds);
Console.ReadLine();RESULTS
60466176 variations in 54161ms
60466176 variations in 4624ms- The Compression Method.
On my machine this is repeatedly capable of performing 100,000 compressions on a 10 character string in 2200-2300ms. With the following modifications, that comes down to 1800-1900ms. Small improvement that's possibly related to what else my machine was doing but still.
MemoryStream&GZipStreamareIDisposable& should be wrapped inusing()statements
- removed unnecessary int assignments
- removed unnecessary if(check) at top... you know this to be impossible based on your input set.
Code
static float GetCompressionRatio2(string input)
{
using(var ms = new MemoryStream())
using (var gzip2 = new GZipStream(ms, CompressionMode.Compress, true))
{
var raw = Encoding.UTF8.GetBytes(input);
gzip2.Write(raw, 0, raw.Length);
var percent = ((float)ms.ToArray().Length) / raw.Length;
return percent;
}
}Code Snippets
where GetCompressionRatio(sp1+...+vp15)>1.5
select sp1+...+vp15;let concatval = sp1+...+vp15
where GetCompressionRation(concatVal) > 1.5
select concatValvar polar = new[] { "H", "Q", "N", "K", "D", "E" };
var sw = new Stopwatch();
sw.Start();
var results1 = from a1 in polar from a2 in polar
from a3 in polar from a4 in polar
from a5 in polar from a6 in polar
from a7 in polar from a8 in polar
from a9 in polar from a10 in polar
select a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10;
var l1 = results1.Count();
sw.Stop();
Console.WriteLine("{0} variations in {1}ms", l1, sw.ElapsedMilliseconds);
sw = new Stopwatch();
sw.Start();
var results2 = new Variations<string>(polar, 10, GenerateOption.WithRepetition);
var l2 = results2.Count();
sw.Stop();
Console.WriteLine("{0} variations in {1}ms", l2, sw.ElapsedMilliseconds);
Console.ReadLine();60466176 variations in 54161ms
60466176 variations in 4624msstatic float GetCompressionRatio2(string input)
{
using(var ms = new MemoryStream())
using (var gzip2 = new GZipStream(ms, CompressionMode.Compress, true))
{
var raw = Encoding.UTF8.GetBytes(input);
gzip2.Write(raw, 0, raw.Length);
var percent = ((float)ms.ToArray().Length) / raw.Length;
return percent;
}
}Context
StackExchange Code Review Q#15053, answer score: 4
Revisions (0)
No revisions yet.