HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Making as many unique strings as possible by removing two characters

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
uniqueremovingpossibletwomanycharactersmakingstrings

Problem

I'm attempting a programming challenge type task in C#, where the goal is to determine how many unique strings can be obtained by removing two characters. The prompt for the task implied that I should create the set of all possible strings with 2 chars removed, then return the number of items in the set. I was initially suspicious of this, as I've often found that the only way to complete these kinds of tasks is to abstract away from actually storing results or enumerating possibilities wherever possible - but due to the requirement that only unique strings should be counted, I don't know how to avoid storing information about every result so far. Supposedly I have to be able to handle strings of up to a million characters in length - and I can't think of any way right now to avoid the hideous iteration count and massive result set that a million character string would require.

Here is my code so far. It works, but its way too slow, and I think large inputs might be generating incorrect results, but I'm not actually sure:

private static void Main(String[] args)
{
    var input = Console.ReadLine();
    Console.WriteLine(FindBeautifulStrings(input).Count);
}

// B will always be larger than A because of the way we're iterating so we have to remove it first.
private static string RemoveTwo(string input, int indexA, int indexB)
{
    return input.Remove(indexB, 1).Remove(indexA, 1);
}

private static HashSet FindBeautifulStrings(string input)
{
    // Iterate over every character in the string, then for each character, iterate over every
    // other character, removing the two selected characters; return set of all possible results.
    int inputLength = input.Length;
    HashSet results = new HashSet();
    for (int i = 0; i < inputLength; ++i)
    {
        for (int j = i + 1; j < inputLength; ++j)
        {
            results.Add(RemoveTwo(input, i, j).GetHashCode());
        }
    }
    return results;
}


Storing hashes of strings instead of strings the

Solution


  • Instead of using string.Remove (creating 2 strings for each index combination), you could create an array of chars / ints and work on that array.



  • You don't have to iterate from i == 0 to i == str.Length and j == 0 to j == str.Length because that results in duplicated indexes (e.g. 1,2 and 2,1).



  • The idea with the hashes is good, but as mentioned in a comment, you have to check for all "uncertain duplicates" whether they are actually duplicates.



The following code shows a simple implementation that considers the points above:

public class Variation
{
    public Variation(int hash, int index1, int index2)
    {
        Hash = hash;
        Index1 = index1;
        Index2 = index2;
    }
    public int Index1 { get; }
    public int Index2 { get; }
    public int Hash { get; }
}

public int CountVariations(string input)
{
    int[] inputArray = input.ToCharArray().Select(c => (int)c).ToArray();
    var variations = new List();
    for (int i = 0; i  v.Hash).ToArray();
    
    var uncertainDuplicates = groups.Where(g => g.Skip(1).Any()).ToArray();
    
    var duplicatesRealCount = GetRealCount(inputArray, uncertain Dublicates);
    
    return groups.Length - uncertainDuplicates.Length + duplicatesRealCount;
}

private int GetRealCount(int[] inputArray, IEnumerable> duplicates)
{
    // todo: check if the duplicates are actually identical
    return duplicates.Count();
}

private static Variation GetVariation(int[] inputArray, int index1, int index2)
{
    var hashValue = Enumerable
        .Range(0, inputArray.Length)
        .Where(i => i != index1 && i != index2)
        .Select(i => inputArray[i])
        .Aggregate((hash, val) => hash ^ val);
        
    return new Variation(hashValue, index1, index2);
}

Code Snippets

public class Variation
{
    public Variation(int hash, int index1, int index2)
    {
        Hash = hash;
        Index1 = index1;
        Index2 = index2;
    }
    public int Index1 { get; }
    public int Index2 { get; }
    public int Hash { get; }
}

public int CountVariations(string input)
{
    int[] inputArray = input.ToCharArray().Select(c => (int)c).ToArray();
    var variations = new List<Variation>();
    for (int i = 0; i < input.Length; i++)
        for (int j = i + 1; j < input.Length; j++)
            variations.Add(GetVariation(inputArray, i, j));
    
    var groups = variations.GroupBy(v => v.Hash).ToArray();
    
    var uncertainDuplicates = groups.Where(g => g.Skip(1).Any()).ToArray();
    
    var duplicatesRealCount = GetRealCount(inputArray, uncertain Dublicates);
    
    return groups.Length - uncertainDuplicates.Length + duplicatesRealCount;
}

private int GetRealCount(int[] inputArray, IEnumerable<IGrouping<int, Variation>> duplicates)
{
    // todo: check if the duplicates are actually identical
    return duplicates.Count();
}

private static Variation GetVariation(int[] inputArray, int index1, int index2)
{
    var hashValue = Enumerable
        .Range(0, inputArray.Length)
        .Where(i => i != index1 && i != index2)
        .Select(i => inputArray[i])
        .Aggregate((hash, val) => hash ^ val);
        
    return new Variation(hashValue, index1, index2);
}

Context

StackExchange Code Review Q#131838, answer score: 4

Revisions (0)

No revisions yet.