patternMinor
Finding number of times each string in an Array of strings occurs in a large stream of characters (i.e large string)
Viewed 0 times
streamnumbereacharraylargefindingcharacterstimesoccursstrings
Problem
Interview problem:
Given a stream of characters (e.g. acacabcatghhellomvnsdb) and a list of words (e.g. ["aca","cat","hello","world"] ) find and display count of each and every word once the stream ends.(Like : "aca" : 2 , "cat" : 1 , "hello" : 1 , "world" : 0 ).
This solution works functionally. Please let me know if there is a faster more optimized algorithm or if there are some corner scenarios I did not take care of.
Given a stream of characters (e.g. acacabcatghhellomvnsdb) and a list of words (e.g. ["aca","cat","hello","world"] ) find and display count of each and every word once the stream ends.(Like : "aca" : 2 , "cat" : 1 , "hello" : 1 , "world" : 0 ).
This solution works functionally. Please let me know if there is a faster more optimized algorithm or if there are some corner scenarios I did not take care of.
#define VALUE_KEY 1
#define COUNT_KEY 2
NSMutableArray *stringsToSearchFor = [@[[@{@VALUE_KEY: @"cat", @COUNT_KEY: @0} mutableCopy],[@{@VALUE_KEY: @"aca", @COUNT_KEY: @0} mutableCopy], [@{@VALUE_KEY: @"hello", @COUNT_KEY: @0} mutableCopy],[@{@VALUE_KEY: @"world", @COUNT_KEY: @0} mutableCopy]] mutableCopy];
NSString *stringStream = @"acacabcatghhellomvnsdb" ;
for(NSMutableDictionary *nextDict in stringsToSearchFor)
{
NSUInteger currRangeStart = 0 ;
NSUInteger currRangeLength = [nextDict[@VALUE_KEY] length] ;
NSUInteger currentStringNumFound = 0 ;
while(currRangeStart <= [stringStream length]-currRangeLength)
{
NSRange foundRange = [stringStream rangeOfString:nextDict[@VALUE_KEY]
options:0
range:NSMakeRange(currRangeStart, [stringStream length]-currRangeStart)] ;
if(foundRange.location != NSNotFound)
{
currRangeStart = foundRange.location+1 ;
currentStringNumFound++;
}
else
{
break ;
}
}
if(currentStringNumFound)
{
nextDict[@COUNT_KEY] = @(currentStringNumFound) ;
}
}
NSLog(@"Num occurrences dict is %@",stringsToSearchFor) ;Solution
Given that you stated this was for an interview, I'm going to focus not on how to make this code as fast as possible, but instead, the types of things I'd be looking for in an interview candidate.
First, and most concerning, there is not function or method definition here. Presumably, you just typed this code straight into the
Upon hearing the acceptance criteria, I'd immediately expect a function signature to be written out:
You're already getting bonus points from me if you type this out.
I'd also be okay if you wrote this as an extension method on
Second problem (get ready to notice a theme here), you haven't broken your problem down into logical sub-problems. How can we count the occurrences of an array of potential substrings if we can't yet even count the occurrences in a single potential substring?
The body of your
Why should it be this simple? Because the
For the problem of finding the count of a substring in a string, I'll just point you toward this Stack Overflow question which has multiple good answers.
As for actually writing the unit tests for either of these functions, I wouldn't necessarily expect it during an interview, but I'd be delighted to hear the candidate ask about it. Whether or not we did it would depend on whether the candidate asked about it, and completed the first part relatively quickly. But the combination of writing easily testable functions and asking me about testing itself would be enough information about the candidate's attitude toward testing (which I find important).
You notice how I still haven't really addressed any performance issues? As an interviewer, I simply have never cared. When candidates ask me if I want to fastest possible solution to a programming challenge, I always respond with something like this:
I want the solution that works with the least amount of your time invested. After we have a working solution, we can talk about optimizations.
We'll usually have specific criteria on the minimum acceptable performance of our code. That criteria is never phrased as "as fast as possible". It comes with a benchmark. It must perform in less than some measurable amount of time. It's either below that and good enough or above that and not good enough. For almost all things you do, there is never the requirement that a solution work as fast as possible. (There is usually the requirement that it not be slow, and slow is a concrete value.)
So, let's talk about style, and the readability of your solution.
These
I'd prefer
This is an absolute mess on so many levels.
First of all, a little bit of whitespace breaks cleans this up a lot.
```
NSMutableArray *stringsToSearchFor = [@
First, and most concerning, there is not function or method definition here. Presumably, you just typed this code straight into the
main function. That'd be the first worrying thing for me if I'm in this interview.Upon hearing the acceptance criteria, I'd immediately expect a function signature to be written out:
NSDictionary * _Nonnull findStringsInString(
NSArray * _Nonnull,
NSString * _Nonnull
);You're already getting bonus points from me if you type this out.
- You've demonstrated that you have some understanding of what is expected. At a minimum, you're completely clear on what sort of inputs and outputs make sense. And the function name will give some indication that you know what's going on.
- You've demonstrated above knowledge in the form of a function. The next step of the interview could be writing some unit tests around the function. Even if we don't get that far, even if you're not familiar with unit testing, if you can write your code in a testable manner, that's a plus.
- You've taken into account Swift annotations so this method will show up nicely in Swift. Even if it's not used from Swift, these annotations help self-document the function and lessen the need for comments.
I'd also be okay if you wrote this as an extension method on
NSString.- (NSDictionary * _Nonnull)occurencesOfStrings:(NSArray * _Nonnull)strings;Second problem (get ready to notice a theme here), you haven't broken your problem down into logical sub-problems. How can we count the occurrences of an array of potential substrings if we can't yet even count the occurrences in a single potential substring?
The body of your
findStringsInString function needs to really look simply like this:NSDictionary * _Nonnull findStringsInString(
NSArray * _Nonnull substrings,
NSString * _Nonnull string
) {
NSDictionary * results = [NSMutableDictionary dictionary];
for (NSString * substring in substrings) {
results[substring] = countOccurrencesOfSubstringInString(substring, string);
}
return results;
}Why should it be this simple? Because the
countOccurrencesOfSubstringInString also needs to be a very simple function that is easy to write tests around. Finding the count for multiple strings is just a simple expansion over the problem of finding the count for a single string.For the problem of finding the count of a substring in a string, I'll just point you toward this Stack Overflow question which has multiple good answers.
As for actually writing the unit tests for either of these functions, I wouldn't necessarily expect it during an interview, but I'd be delighted to hear the candidate ask about it. Whether or not we did it would depend on whether the candidate asked about it, and completed the first part relatively quickly. But the combination of writing easily testable functions and asking me about testing itself would be enough information about the candidate's attitude toward testing (which I find important).
You notice how I still haven't really addressed any performance issues? As an interviewer, I simply have never cared. When candidates ask me if I want to fastest possible solution to a programming challenge, I always respond with something like this:
I want the solution that works with the least amount of your time invested. After we have a working solution, we can talk about optimizations.
We'll usually have specific criteria on the minimum acceptable performance of our code. That criteria is never phrased as "as fast as possible". It comes with a benchmark. It must perform in less than some measurable amount of time. It's either below that and good enough or above that and not good enough. For almost all things you do, there is never the requirement that a solution work as fast as possible. (There is usually the requirement that it not be slow, and slow is a concrete value.)
So, let's talk about style, and the readability of your solution.
#define VALUE_KEY 1
#define COUNT_KEY 2These
#defines are weird and unnecessary. As you'll read later, a better data structure makes these completely obsolete, but if we must use them, I much prefer solutions that aren't preprocessor find & replace.I'd prefer
NSNumber * const kKeyValue = @1;
NSNumber * const kKeyCount = @2;NSMutableArray *stringsToSearchFor = [@[[@{@VALUE_KEY: @"cat", @COUNT_KEY: @0} mutableCopy],[@{@VALUE_KEY: @"aca", @COUNT_KEY: @0} mutableCopy], [@{@VALUE_KEY: @"hello", @COUNT_KEY: @0} mutableCopy],[@{@VALUE_KEY: @"world", @COUNT_KEY: @0} mutableCopy]] mutableCopy];This is an absolute mess on so many levels.
First of all, a little bit of whitespace breaks cleans this up a lot.
```
NSMutableArray *stringsToSearchFor = [@
Code Snippets
NSDictionary<NSString *, NSNumber *> * _Nonnull findStringsInString(
NSArray<NSString *> * _Nonnull,
NSString * _Nonnull
);- (NSDictionary<NSString *, NSNumber *> * _Nonnull)occurencesOfStrings:(NSArray<NSString *> * _Nonnull)strings;NSDictionary<NSString *, NSNumber *> * _Nonnull findStringsInString(
NSArray<NSString *> * _Nonnull substrings,
NSString * _Nonnull string
) {
NSDictionary * results = [NSMutableDictionary dictionary];
for (NSString * substring in substrings) {
results[substring] = countOccurrencesOfSubstringInString(substring, string);
}
return results;
}#define VALUE_KEY 1
#define COUNT_KEY 2NSNumber * const kKeyValue = @1;
NSNumber * const kKeyCount = @2;Context
StackExchange Code Review Q#125920, answer score: 6
Revisions (0)
No revisions yet.