Implementing HTUnEscape
Problem
I came across the following piece of code in a UI application I need to maintain.
int tool_unhex( char c )
{
    return( c >= '0' && c <= '9' ? c - '0'
        : c >= 'A' && c <= 'F' ? c - 'A' + 10
        : c - 'a' + 10 );
}
void unescape2QString(const char *sOrg, QString & str)
{
    /*
     * Remove URL hex escapes from s... done in place. The basic concept for
     * this routine is borrowed from the WWW library HTUnEscape() routine.
     */
    char* s = (char*)sOrg;
    unsigned short w = 0;
    str = "";
    for ( ; *s != '\0'; ++s ) {
        if ( *s == '%' ) {
            if(*(s+1) == 'u')
            {
                s++;
                if ( *++s != '\0' ) {
                    w = (wchar_t) (tool_unhex( *s ) << 12);
                }
                if ( *++s != '\0' ) {
                    w += (wchar_t) (tool_unhex( *s ) << 8);
                }
                if ( *++s != '\0' ) {
                    w += (wchar_t) (tool_unhex( *s ) << 4);
                }
                if ( *++s != '\0' ) {
                    w += (wchar_t) tool_unhex( *s );
                }
                str += QString::fromUtf16(&w, 1);
            }
        }
        else
        {
            str += QString::fromAscii(s, 1);
        }
    }
}
- Isn't QString immutable? The comment says it's changing in place, but isn't each str += creating a new QString?
- Is this the most optimized way of doing this?
It looks like the function is trying to implement HTUnEscape() from w3.org, with UTF-16 strings.
Solution
Firstly, to answer your first question,
QString is definitely not immutable. Directly from the documentation:
"This operation is typically very fast (constant time), because QString preallocates extra space at the end of the string data so it can grow without reallocating the entire string each time."
So each str += appends to the existing QString in place rather than creating a new one.
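To get a feel for what that preallocation means in practice, here is a minimal sketch that counts buffer reallocations during repeated appends. It uses std::string as a stand-in for QString (an assumption made so the example needs no Qt), and count_reallocations is a hypothetical helper name. The exact growth policy is implementation-specific, so only the general trend should be read from it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts how
// often the string's capacity changes, i.e. how often the buffer was regrown.
// std::string stands in for QString here; this is an analogy, not Qt code.
std::size_t count_reallocations(std::size_t n)
{
    std::string s;
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';  // modifies s in place; no new string object is created
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    return reallocations;
}
```

On common standard library implementations the capacity grows geometrically, so the number of reallocations stays small compared with the number of appends.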
I'm not a fan of the nested ternary statements in tool_unhex. Also, it does no error checking to make sure that whatever it is passed is actually a valid hex character. This is dangerous, as you use an unsigned short to store the shifted value of the character. If it's some character above F or f, then shifting and adding can overflow the 16-bit value and cause hard-to-track-down bugs. For example, if a g got in there somehow:
    'g' - 'a' + 10; // Equals 16
    unsigned short w = (16 << 12) + (16 << 8) + (16 << 4) + 16; // Overflow: 69904 exceeds the 16-bit maximum of 65535
I'd suggest changing it to the following:
#include <cctype>
#include <stdexcept>

// Converts the given (ASCII encoded) hex character to
// an integer value 0 - 15.
// Throws std::out_of_range if the given character is not
// a valid hex character.
int tool_unhex(char c)
{
    // Cast to unsigned char first: passing a negative char value
    // to std::isdigit is undefined behaviour.
    if(std::isdigit(static_cast<unsigned char>(c)))
        return c - '0';
    else if(c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    else if(c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    throw std::out_of_range("Not a hex character");
}
If you're sure that the character passed in will always be in range, this should be documented somewhere. Also, it should be documented that this will ONLY work for ASCII and ASCII-compatible encodings.
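To make the failure mode concrete, here is a small Qt-free sketch that puts the unchecked and the checked mapping side by side. The names unchecked_unhex, checked_unhex and combine are hypothetical and chosen only for this illustration. combine mirrors how the %uXXXX branch assembles four digit values into one 16-bit code unit, and the sketch assumes an ASCII execution character set.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>

// Hypothetical name: mirrors the nested ternary from the question,
// which performs no validation at all.
int unchecked_unhex(char c)
{
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
         : c - 'a' + 10;
}

// Hypothetical name: validating variant that throws on bad input.
int checked_unhex(char c)
{
    if (std::isdigit(static_cast<unsigned char>(c)))
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    throw std::out_of_range("Not a hex character");
}

// Hypothetical helper: assembles four digit values into one 16-bit unit,
// most significant digit first. Anything above bit 15 is silently dropped
// by the conversion to std::uint16_t.
std::uint16_t combine(int d3, int d2, int d1, int d0)
{
    return static_cast<std::uint16_t>((d3 << 12) + (d2 << 8) + (d1 << 4) + d0);
}
```

With the unchecked mapping, 'g' becomes 16, and combine(16, 16, 16, 16) yields 4368 (0x1110) instead of reporting a problem, because the sum 69904 is reduced modulo 65536. The checked mapping throws std::out_of_range for the same input, so the bad escape is caught where it occurs.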
Your unescape method has some confusing parts. The comments and the code don't match.
    char* s = (char*)sOrg;
Why cast away the constness of sOrg? You don't modify s anyway, so just work directly on the (const) sOrg.
    str = "";
If you're just going to clear the QString passed in straight away, why not just create and return it instead of taking it as a reference?
    if ( *s == '%' ) {
        if(*(s+1) == 'u')
        {
            s++;
            if ( *++s != '\0' ) {
                w = (wchar_t) (tool_unhex( *s ) << 12);
            }
            if ( *++s != '\0' ) {
                w += (wchar_t) (tool_unhex( *s ) << 8);
            }
            if ( *++s != '\0' ) {
                w += (wchar_t) (tool_unhex( *s ) << 4);
            }
            if ( *++s != '\0' ) {
                w += (wchar_t) tool_unhex( *s );
            }
            str += QString::fromUtf16(&w, 1);
        }
    }
This can certainly be simplified with a loop.
In the end, I came up with something like this:
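QString unescape2QString(const char *sOrg)
{
    QString str;
    for ( ; *sOrg != '\0'; ++sOrg ) {
        if (*sOrg == '%' && *(sOrg+1) == 'u') {
            unsigned short w = 0;
            ++sOrg;  // now on the 'u'
            // Read up to four hex digits, most significant first.
            // Peek at the next character before advancing, so sOrg ends on
            // the last digit consumed and the outer ++sOrg neither skips a
            // character nor steps past the terminating '\0'.
            for (int shift = 12; shift >= 0 && *(sOrg+1) != '\0'; shift -= 4) {
                ++sOrg;
                w += (wchar_t) (tool_unhex(*sOrg) << shift);
            }
            str += QString::fromUtf16(&w, 1);
        }
        else
        {
            str += QString::fromAscii(sOrg, 1);
        }
    }
    return str;
}
This definitely needs some documentation as to what exactly it's doing, as it is absolutely not clear currently.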
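To exercise the decoding logic without a Qt dependency, here is a minimal sketch that builds a std::u16string instead of a QString. The names hex_value and unescape_to_u16 are hypothetical, and the code is an illustration of the approach rather than a drop-in replacement. It peeks at the next character before advancing, so a truncated escape at the end of the input stops cleanly.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: ASCII hex digit to its value 0 - 15,
// throwing std::out_of_range for anything else.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw std::out_of_range("Not a hex character");
}

// Hypothetical Qt-free counterpart of unescape2QString: decodes %uXXXX
// escapes into UTF-16 code units and copies every other byte unchanged.
std::u16string unescape_to_u16(const char* s)
{
    std::u16string out;
    for (; *s != '\0'; ++s) {
        if (*s == '%' && *(s + 1) == 'u') {
            ++s;  // now on the 'u'
            char16_t w = 0;
            // Up to four hex digits, most significant first. The check on
            // *(s + 1) happens before advancing, so s never passes '\0'.
            for (int shift = 12; shift >= 0 && *(s + 1) != '\0'; shift -= 4) {
                ++s;
                w = static_cast<char16_t>(w | (hex_value(*s) << shift));
            }
            out += w;
        } else {
            // Plain byte widened as-is; only correct for ASCII input.
            out += static_cast<char16_t>(static_cast<unsigned char>(*s));
        }
    }
    return out;
}
```

For example, unescape_to_u16("a%u0041b") produces the three code units for "aAb", and a '%' that is not followed by 'u' is copied through unchanged. An invalid digit such as the g in "%u00g1" surfaces as a std::out_of_range exception instead of a silently wrong character.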
Context
StackExchange Code Review Q#24888, answer score: 7