
Hex String to Bytes (NSData)

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to convert this Objective-C code (originally found in this Stack Overflow question) which turns an NSString into NSData to Swift:

- (NSData *)dataFromHexString {
    const char *chars = [self UTF8String];
    int i = 0, len = self.length;

    NSMutableData *data = [NSMutableData dataWithCapacity:len / 2];
    char byteChars[3] = {'\0','\0','\0'};
    unsigned long wholeByte;

    while (i < len) {
        byteChars[0] = chars[i++];
        byteChars[1] = chars[i++];
        wholeByte = strtoul(byteChars, NULL, 16);
        [data appendBytes:&wholeByte length:1];
    }

    return data;
}


My first pass looked like this:

func hexStringToBytes(hexString: String) -> NSData? {
    guard let chars = hexString.cStringUsingEncoding(NSUTF8StringEncoding) else { return nil}
    var i = 0
    let length = hexString.characters.count

    let data = NSMutableData(capacity: length/2)
    var byteChars: [CChar] = [0, 0, 0]

    var wholeByte = CUnsignedLong()

    while i < length {
        byteChars[0] = chars[i++]
        byteChars[1] = chars[i++]
        wholeByte = strtoul(byteChars, nil, 16)
        data?.appendBytes(&wholeByte, length: 1)
    }

    return data
}


I realized I could optimize this further, since ++ is deprecated and will be removed in Swift 3:

func hexStringToBytes(hexString: String) -> NSData? {
    guard let chars = hexString.cStringUsingEncoding(NSUTF8StringEncoding) else { return nil}
    var i = 0
    let length = hexString.characters.count

    let data = NSMutableData(capacity: length/2)
    var byteChars: [CChar] = [0, 0, 0]

    var wholeByte: CUnsignedLong = 0

    while i < length {
        byteChars[0] = chars[i]
        i+=1
        byteChars[1] = chars[i]
        i+=1
        wholeByte = strtoul(byteChars, nil, 16)
        data?.appendBytes(&wholeByte, length: 1)
    }

    return data
}


And then as an extension on String:

```
extension String {

    func dataFromHexString() -> NSData?
```

Solution

First note that your code does not detect invalid input data.
For example, the string "XX" is just converted to a zero byte.
Detecting invalid input with strtoul() is a bit tricky; an
alternative is suggested below.
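
To illustrate why validation with strtoul() is tricky, here is a minimal sketch in current Swift (5.x) syntax rather than the Swift 2 syntax used in this post. The helper name `parseHexByte` is hypothetical and not part of the original code. It relies on the end pointer that strtoul() reports, and it has to reject a leading sign or whitespace separately because strtoul() accepts both.

```swift
import Foundation

/// Hypothetical helper (not from the original post): parses exactly two
/// hex digits with strtoul() and returns nil for anything else.
func parseHexByte(_ pair: String) -> UInt8? {
    let bytes = Array(pair.utf8)
    guard bytes.count == 2 else { return nil }

    // strtoul() skips leading whitespace and accepts a sign,
    // so those inputs have to be rejected up front.
    let first = bytes[0]
    if first == 0x2B || first == 0x2D || first == 0x20 || (0x09...0x0D).contains(first) {
        return nil
    }

    return pair.withCString { start -> UInt8? in
        var end: UnsafeMutablePointer<CChar>? = nil
        let value = strtoul(start, &end, 16)
        // `end` points at the first character strtoul() could not parse;
        // the pair is valid only if both characters were consumed.
        guard let stop = end, start.distance(to: UnsafePointer(stop)) == 2 else {
            return nil
        }
        return UInt8(value)
    }
}
```

With this check, "XX" and "1G" are rejected instead of being silently converted.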

let data = NSMutableData(capacity: length/2)


creates an optional NSMutableData. If the allocation fails, then the optional chaining

data?.appendBytes(&wholeByte, length: 1)


simply does nothing, i.e. the error is ignored. It is better to check for
success immediately:

guard let data = NSMutableData(capacity: length/2) else { return nil }



How can I optimize the ugly i+=1 lines?

By using stride:

func dataFromHexString() -> NSData? {
    guard let chars = cStringUsingEncoding(NSUTF8StringEncoding) else { return nil}
    let length = characters.count

    guard let data = NSMutableData(capacity: length/2) else { return nil }
    var byteChars: [CChar] = [0, 0, 0]
    var wholeByte: CUnsignedLong = 0

    for i in 0.stride(to: length, by: 2) {
        byteChars[0] = chars[i]
        byteChars[1] = chars[i + 1]
        wholeByte = strtoul(byteChars, nil, 16)
        data.appendBytes(&wholeByte, length: 1)
    }

    return data
}


This does not change the performance. The time to convert a 512,000
character string is 0.0140 sec on my computer (test code at the end).
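
For readers on a newer toolchain: since Swift 3 the method form `0.stride(to:by:)` is written as the free function `stride(from:to:by:)`. The following is a sketch of the same loop in current Swift (5.x), assuming `Data` instead of `NSMutableData`. The method name `dataFromHexStringUsingStride` is made up for this illustration, and like the version above it does not validate its input.

```swift
import Foundation

extension String {
    /// Sketch of the stride-based loop in current Swift syntax.
    /// Hypothetical name; invalid characters are still not detected.
    func dataFromHexStringUsingStride() -> Data {
        let chars = Array(utf8CString)      // NUL-terminated [CChar]
        let length = chars.count - 1        // exclude the terminator
        var data = Data(capacity: length / 2)
        var byteChars: [CChar] = [0, 0, 0]

        for i in stride(from: 0, to: length, by: 2) {
            byteChars[0] = chars[i]
            byteChars[1] = chars[i + 1]     // the NUL terminator for odd lengths
            // Keep only the low byte of the parsed value.
            data.append(UInt8(truncatingIfNeeded: strtoul(byteChars, nil, 16)))
        }
        return data
    }
}
```

Here the length is taken from the C string itself rather than from the character count, so the two cannot disagree for non-ASCII input.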


Is cStringUsingEncoding(NSUTF8StringEncoding) the correct way to get the [CChar] from the string?

That is fine as far as I can see. There is also

self.withCString {
    // $0 is a pointer to the NUL-terminated UTF-8 string
}


which I personally prefer, but I could not detect a difference in
performance.
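
As a rough sketch of how the withCString variant could be filled in, written in current Swift (5.x) syntax: the function name `bytesViaCString` is hypothetical, and it returns a plain `[UInt8]` to keep the example short. It reads the string through the pointer, two characters at a time, and stops at the NUL terminator.

```swift
import Foundation

/// Hypothetical helper: decodes hex pairs by reading the NUL-terminated
/// UTF-8 buffer directly. Invalid characters are not detected.
func bytesViaCString(_ hex: String) -> [UInt8] {
    return hex.withCString { ptr -> [UInt8] in
        var result: [UInt8] = []
        var pair: [CChar] = [0, 0, 0]
        var i = 0
        while ptr[i] != 0 {
            pair[0] = ptr[i]
            pair[1] = ptr[i + 1]            // NUL if the length is odd
            result.append(UInt8(truncatingIfNeeded: strtoul(pair, nil, 16)))
            if ptr[i + 1] == 0 { break }    // never read past the terminator
            i += 2
        }
        return result
    }
}
```

The pointer is only valid inside the closure, which is why the result is built and returned from within it.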


Since this is almost a direct translation from Objective-C, is there any way to make this more "Swifty," say by using map or stride, without sacrificing speed?

stride() is already used for the iteration. I do not see a use case
for map() here.

But the performance can be improved considerably.
As has been observed elsewhere, accessing the UTF-16 view
of a Swift string is very fast. This leads to the following implementation:

func dataFromHexString() -> NSData? {

    let utf16 = self.utf16
    guard let data = NSMutableData(capacity: utf16.count/2) else { return nil }

    var byteChars: [CChar] = [0, 0, 0]
    var wholeByte: CUnsignedLong = 0
    var i = utf16.startIndex
    while i != utf16.endIndex {
        byteChars[0] = CChar(truncatingBitPattern: utf16[i])
        byteChars[1] = CChar(truncatingBitPattern: utf16[i.advancedBy(1, limit: utf16.endIndex)])
        wholeByte = strtoul(byteChars, nil, 16)
        data.appendBytes(&wholeByte, length: 1)
        i = i.advancedBy(2, limit: utf16.endIndex)
    }
    return data
}


which converts the 512,000 character string in 0.00185 sec.
Note that invalid input is still not detected.

We can still make it faster by converting the UTF-16 code units
"manually" instead of using strtoul(). This is more code,
but again faster, and it also detects invalid (non-hex) characters in the input:

func dataFromHexString() -> NSData? {

    // Convert 0 ... 9, a ... f, A ...F to their decimal value,
    // return nil for all other input characters
    func decodeNibble(u: UInt16) -> UInt8? {
        switch(u) {
        case 0x30 ... 0x39:
            return UInt8(u - 0x30)
        case 0x41 ... 0x46:
            return UInt8(u - 0x41 + 10)
        case 0x61 ... 0x66:
            return UInt8(u - 0x61 + 10)
        default:
            return nil
        }
    }

    let utf16 = self.utf16
    guard let data = NSMutableData(capacity: utf16.count/2) else {
        return nil
    }

    var i = utf16.startIndex
    while i != utf16.endIndex {
        guard let
            hi = decodeNibble(utf16[i]),
            lo = decodeNibble(utf16[i.advancedBy(1, limit: utf16.endIndex)])
        else {
                return nil
        }
        var value = hi << 4 + lo
        data.appendBytes(&value, length: 1)
        i = i.advancedBy(2, limit: utf16.endIndex)
    }
    return data
}


The time to convert the 512,000 character string is now
0.0008 seconds. This is more than 17 times faster than the original code.
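
The same nibble-decoding idea can be sketched in current Swift (5.x) syntax. This is an illustration, not the code that was timed above: the method name `hexDecodedData` is hypothetical, it uses `Data` rather than `NSMutableData`, and it adds an explicit odd-length check as an assumption about the desired behavior.

```swift
import Foundation

extension String {
    /// Sketch: manual nibble decoding over the UTF-16 view.
    /// Returns nil for non-hex characters and for odd-length input.
    func hexDecodedData() -> Data? {
        // Map "0"..."9", "A"..."F", "a"..."f" to 0...15, anything else to nil.
        func decodeNibble(_ u: UInt16) -> UInt8? {
            switch u {
            case 0x30...0x39: return UInt8(u - 0x30)
            case 0x41...0x46: return UInt8(u - 0x41 + 10)
            case 0x61...0x66: return UInt8(u - 0x61 + 10)
            default: return nil
            }
        }

        let units = Array(utf16)
        guard units.count % 2 == 0 else { return nil }

        var data = Data(capacity: units.count / 2)
        for i in stride(from: 0, to: units.count, by: 2) {
            guard let hi = decodeNibble(units[i]),
                  let lo = decodeNibble(units[i + 1]) else { return nil }
            // The shift binds tighter than |, so this is (hi << 4) | lo.
            data.append(hi << 4 | lo)
        }
        return data
    }
}
```

For example, `"deadBEEF".hexDecodedData()` yields the four bytes DE AD BE EF, while `"XX"` and `"abc"` both yield nil.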

Test code:

let s1 = (0 ... 255).map { String(format:"%02x", $0) }.joinWithSeparator("")
let str = Repeat(count: 1000, repeatedValue: s1).joinWithSeparator("")
print(str.characters.count) // 512000

let start = NSDate()
if let data = str.dataFromHexString() {
    let duration = NSDate().timeIntervalSinceDate(start)
    print(duration)
} else {
    print("failed")
}


The tests were done on a MacBook, with the program compiled in
Release mode.

Context

StackExchange Code Review Q#135424, answer score: 7
