Hackerrank: Prefix neighbors


Problem

Problem statement


Mark has a dictionary, S, containing n distinct strings. He defines the benefit value of a string
as the sum of the ASCII values of its characters.


Mark calls some string A and some string B prefix neighbors if both of the following conditions are satisfied:



- String A is a prefix of string B.
- No other string C exists in the dictionary that is both smaller in length than string B and has string A as a prefix.



For example, in S = {"AA", "AB", "ABD", "ABDE"}, the strings AB and ABD are prefix neighbors because AB is a prefix of ABD and no other string of length

  • The sum of the benefit value of the selected strings is maximal.


Given S, help Mark by finding and printing the maximum benefit value he can obtain from a subset of non-prefix-neighbors in S.



Input Format


The first line contains an integer denoting n (the number of strings in the dictionary).


The second line contains n space-separated strings.


Constraints


1 < n <= 4 * 10^5


1 <= length of S_i <= 11


Each string contains only uppercase letters.


Output Format


Print the maximum benefit value that Mark can obtain from a subset of non-prefix-neighbors in S.


Sample Input 0


3


A B AE


Sample Output 0


200


Explanation 0


{"A", "B", "AE"}


Strings A and AE are prefix neighbors, so they cannot both be in Mark's subset of S. String B has no prefix neighbor, so we include it in Mark's subset.


To maximize the benefit value, we choose AE and B for our subset. We then calculate the following benefit values for the chosen subset:


Benefit value of AE = 65 + 69 = 134


Benefit value of B = 66


We then calculate and print the total benefit value of our subset, which is 134 + 66 = 200.
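
The arithmetic in the explanation can be checked with a small helper. This is only a minimal sketch: the class name Benefit and the method name Of are hypothetical and do not come from the reviewed code.

```csharp
using System.Linq;

public static class Benefit
{
    // Benefit value as defined in the problem statement:
    // the sum of the ASCII values of the string's characters.
    // (Hypothetical helper, not part of the original submission.)
    public static int Of(string word) => word.Sum(c => (int)c);
}
```

With this helper, Benefit.Of("AE") + Benefit.Of("B") evaluates to 134 + 66 = 200, matching the sample output.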

My introduction of the algorithm

This is the algorithm to apply trie knowledge and also use maximum w

Solution

Indentation

Consistent indentation helps readability. The indentation of this code seems to mix tabstops of 1, 2, 3, and 4 spaces.

Prefer interfaces to implementations

public Dictionary<char, Trie> Children { get; set; }


Is there any good reason why this should not be an IDictionary<char, Trie>?

Try to expose an API rather than the entire implementation

public Dictionary<char, Trie> Children { get; set; }
public bool   IsInDictionary    { get; set; }
public string LastVisitedWord   { get; set; }


Looking past the syntactic sugar, that's six public methods (three getters and three setters). How many of them should really be public? I think that at minimum the setters should all be private, and I see no good reason to expose LastVisitedWord at all or to allow an external class to get Children and then mutate it. I would prefer to publicly expose just the equivalent (see below...) of

public bool IsInDictionary { get; }
public IEnumerable<KeyValuePair<char, Trie>> Children { get; }


public Tuple<string, string> AddWordToTrieByOneCharATime(
    int    scanIndex, 
    char[] charArray, 
    string word, 
    string neighbour)


exposes far more than is necessary. If you make that private then a public method AddWord(string) can call it with the appropriate initial values.
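
As a rough illustration of the narrower API described above, the trie could look something like the sketch below. It is not the original code: the names IsWord and AddWord follow the suggestions in this review, the private overload taking an index is hypothetical, and the neighbour tracking that the original method returns as a Tuple<string, string> is deliberately left out.

```csharp
using System.Collections.Generic;

public class Trie
{
    // Storage is private and typed by interface, so external classes
    // cannot replace the dictionary.
    private readonly IDictionary<char, Trie> children = new Dictionary<char, Trie>();

    // Readable from outside, writable only from inside the class.
    public bool IsWord { get; private set; }

    // Exposes the children as a sequence rather than as a mutable dictionary.
    public IEnumerable<KeyValuePair<char, Trie>> Children => children;

    // Public entry point: supplies the initial scan index itself.
    public void AddWord(string word) => AddWord(word, 0);

    // Hypothetical private helper that consumes one character per call.
    private void AddWord(string word, int index)
    {
        if (index == word.Length)
        {
            IsWord = true;
            return;
        }

        Trie child;
        if (!children.TryGetValue(word[index], out child))
        {
            child = new Trie();
            children[word[index]] = child;
        }
        child.AddWord(word, index + 1);
    }
}
```

Callers then see one method and two read-only properties. A determined caller could still downcast Children back to a dictionary, so wrapping it in an iterator would be the stricter option.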

Names

public bool   IsInDictionary    { get; set; }
public string LastVisitedWord   { get; set; }


The first of these names makes sense in the right context, but when you're reading the code you see it in the context of Dictionary<char, Trie> Children { get; set; }. How about IsWord?

The second makes sense only in the middle of a call to AddWordToTrieByOneCharATime. I think it should be LongestPrefix or something similar.

Separation of responsibilities

If the trie is intended for general use, I think that the way finding the prefix is integrated into it violates the principle of separation of responsibilities. If it's intended to be single-purpose and separation of responsibilities is intentionally sacrificed for better performance, it doesn't go far enough: the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix.

I think that you should refactor to separate trie building from prefix graph building for one simple reason: the documentation states that the complexity is O(N*max_length_of_string), but that's not true because the merging of responsibilities in trie building means that the strings have to be sorted before insertion, giving Theta(N lg N).

KISS

var groupped = dict.GroupBy(x => x[0]);

// maximum 26 groups, 'A','B', ..., 'Z'
foreach (var group in groupped)   
{
    // sort by string's length in ascending order
    var sortedStrings = group.OrderBy(x => x.Length);

    var trie = new Trie();


Why? What's wrong with inserting them all into one trie?
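
To make the question concrete, here is a sketch of a single trie that accepts words in any order, with the prefix lookup done as a separate step after the trie is built. It assumes the relation the solution needs is "the longest proper prefix that is itself a word", which is my reading of LastVisitedWord. The names PrefixTrie, Node and LongestProperPrefix are hypothetical.

```csharp
using System.Collections.Generic;

public class PrefixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    // Insertion order does not matter here, so no grouping by first
    // letter and no sorting by length is needed before building.
    public void AddWord(string word)
    {
        var node = root;
        foreach (var c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns the longest proper prefix of word that is itself a word,
    // or null when there is none. Visits at most word.Length - 1 nodes.
    public string LongestProperPrefix(string word)
    {
        string best = null;
        var node = root;
        for (int i = 0; i < word.Length - 1; i++)
        {
            if (!node.Children.TryGetValue(word[i], out node)) break;
            if (node.IsWord) best = word.Substring(0, i + 1);
        }
        return best;
    }
}
```

Because the lookup runs only after every word has been inserted, building the trie stays linear in the total input length and nothing depends on a prior OrderBy.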

Code Snippets

public Dictionary<char, Trie> Children { get; set; }

public Dictionary<char, Trie> Children { get; set; }
public bool   IsInDictionary    { get; set; }
public string LastVisitedWord   { get; set; }

public bool IsInDictionary { get; }
public IEnumerable<KeyValuePair<char, Trie>> Children { get; }

public Tuple<string, string> AddWordToTrieByOneCharATime(
    int    scanIndex, 
    char[] charArray, 
    string word, 
    string neighbour)

public bool   IsInDictionary    { get; set; }
public string LastVisitedWord   { get; set; }

Context

StackExchange Code Review Q#155307, answer score: 5
