Boost.Spirit UTF-8 string literal parser with escape support
Problem
I wrote (as part of a greater work) a Boost.Spirit grammar that parses string literals, including support for the various escape sequences known from C/C++ (`\n`, `\x7f`, `\341`, `\u017f`, `\U00010451`).
At some point I encountered some problems, mostly due to my lack of understanding of either Boost.Spirit or Boost.Phoenix at the level of detail required to have full control over what I'm doing, and despairing at the rather non-descriptive error messages Boost.Spirit generates. ;-) User sehe was very helpful over at StackOverflow, and my grammar is now functional.
However, some things are still bothering me:
- The functor `cp2utf8_f` does the conversion of a `UChar32` to a UTF-8 byte sequence. However, as a `struct` inside the grammar, it is not exactly re-usable. I would like to have it as a stand-alone function, but have failed to make it work.
- The `escapes` rule basically does the same thing in five different ways: determine a `UChar32` code point, and pass it to the functor (see above) using semantic actions, which appends it to the result string. This should really be a rule with a `UChar32` result, which is then passed to the functor at the point the rule is called (to avoid the five-fold repetition of the functor call). Again, I had an idea of how it should work, but it didn't.
- The error handlers (straight from the tutorial) currently print to `std::cout`. That's not nice; I'd rather have the error message generated by the handler thrown as an exception (let's say `std::runtime_error` for the sake of this review). Again, my lack of in-depth understanding of what is going on here exactly makes me scratch my head at why the compiler complains about "invalid use of void expression" when I replace the `std::cout` [...]
Any other suggestions (like, how to better learn to fish in Boost.Spirit instead of asking you to hand me the fish...) are likewise welcome. I know the "test driver" `main()` is crude; I didn't want to make this longer than necessary by going through th [...]
Solution
- The error handlers
The problem with throw expressions, as the compiler kindly reminded you, is that they're void-expressions.
Even if it compiled, it would not do what you want: it'd throw during the grammar constructor...
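To make the "throws during the grammar constructor" point concrete without pulling in Spirit, here is a minimal sketch in plain C++. The `Grammar` struct, its `on_error_handler` member and `fail_parse()` are hypothetical stand-ins, not Boost.Spirit types. A `std::function` plays roughly the role a deferred function plays in the real grammar: the throw is wrapped in a callable, so it fires only when the handler is invoked, not when it is registered.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a grammar; not a Boost.Spirit type.
struct Grammar {
    // The stored handler is only *called* when a parse fails.
    std::function<void(std::string const&)> on_error_handler;

    Grammar() {
        // A bare `throw std::runtime_error("...");` written here would run
        // right now, during construction. Wrapping it in a callable defers
        // the throw until the handler is actually invoked.
        on_error_handler = [](std::string const& what) {
            throw std::runtime_error("Illegal string literal. Expecting " + what);
        };
    }

    // Simulates a failed parse that triggers the registered handler.
    void fail_parse() const { on_error_handler("'\"'"); }
};
```

Constructing a `Grammar` does not throw; only the later call to `fail_parse()` does, which is the behaviour one wants from an error handler.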
The repeating story here is that semantic actions (and error handlers in this case) require Phoenix actors (a.k.a. lazy or deferred functions), so that Spirit knows how to evaluate them against the Spirit context when needed. The simple case:
```
qi::on_error< qi::fail >
(
    quoted_string,
    phoenix::throw_(
        phoenix::construct<std::runtime_error>( "Illegal string literal. (Unterminated string?)" )
    )
);
```
The more complex version requires stream concatenation. You could do this with a local/let-expression, but I'd keep it simple and extract a Phoenix function `make_error_message`:
```
qi::on_error< qi::fail >
(
    escapes,
    phoenix::throw_(
        phoenix::construct<std::runtime_error>( make_error_message(qi::_4, qi::_3, qi::_2) )
    )
);
```
Now, you can just code that function any way you like:
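```
struct make_error_message_f {
    template <typename ...> struct result { using type = std::string; };
    template <typename Info, typename F, typename L>
    std::string operator()(Info const& info, F f, L l) const {
        std::ostringstream oss;
        oss << "Illegal escape sequence. Expecting " << info << " here: \"" << std::string(f,l) << "\"";
        return oss.str();
    }
};
phoenix::function<make_error_message_f> make_error_message;
```
See below for ways to make `make_error_message` a function that's adapted for Phoenix use.
- Using a global function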
> However, as a `struct` inside the grammar, it is not exactly re-usable. I would like to have it as a stand-alone function, but have failed to make it work.

You can of course just relay the implementation of `cp2utf8_f::operator()` to a re-usable function of your choice. Of course, that makes the `cp2utf8_f` function object mere red-tape code. If you don't mind putting traits in the Phoenix extension namespaces, you can use the existing adaptation macros:
```
namespace my_helpers {
    void cp2utf8(std::string& a, UChar32 codepoint)
    {
        icu::StringByteSink<std::string> bs(&a);
        icu::UnicodeString::fromUTF32(&codepoint, 1).toUTF8( bs );
    }
    template<typename Iterator>
    std::string make_error_message(boost::spirit::info const& info, Iterator first, Iterator last) {
        std::ostringstream oss;
        oss << "Illegal escape sequence. Expecting " << info << " here: \"" << std::string(first,last) << "\"";
        return oss.str();
    }
}
BOOST_PHOENIX_ADAPT_FUNCTION(void, cp2utf8_, my_helpers::cp2utf8, 2)
BOOST_PHOENIX_ADAPT_FUNCTION(std::string, make_error_message_, my_helpers::make_error_message, 3)
```
```
// (And I don't like *result* and *cp2utf8* lying around here
// when a stand-alone function should do just as well.)
```
They're private inner types. They're inlining fodder. What's the cost you measured?
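For experimenting with a stand-alone conversion function without linking ICU, here is a minimal sketch of a hand-rolled UTF-8 encoder. It is a swapped-in technique, not the ICU-based `cp2utf8` shown in this answer: the namespace `my_helpers_sketch` is a hypothetical name, `char32_t` stands in for `UChar32`, and rejecting surrogates and out-of-range values with an exception is an assumption of this sketch (how ICU treats such input is not covered here).

```cpp
#include <stdexcept>
#include <string>

namespace my_helpers_sketch {  // hypothetical namespace, for illustration only
    // Appends the UTF-8 encoding of `codepoint` to `out`.
    void cp2utf8(std::string& out, char32_t codepoint)
    {
        // Assumption of this sketch: reject anything that is not a Unicode scalar value.
        if (codepoint > 0x10FFFF || (codepoint >= 0xD800 && codepoint <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");

        if (codepoint < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(codepoint);
        } else if (codepoint < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (codepoint >> 6));
            out += static_cast<char>(0x80 | (codepoint & 0x3F));
        } else if (codepoint < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (codepoint >> 12));
            out += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (codepoint & 0x3F));
        } else {                                // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (codepoint >> 18));
            out += static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (codepoint & 0x3F));
        }
    }
}
```

Because it has the same shape as `my_helpers::cp2utf8` (a `std::string&` to append to, then the code point), a function like this should be adaptable in the same way; that part is untested here.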
Personally, I prefer the localized function objects because they give you more control and prevent namespace pollution. Note that with a sufficiently modern compiler you may be able to drop the inner `result_type`/`result<>::type` constructs (see the RESULT_OF docs).
- Reducing WET-ness (repetition)
Is this what you had in mind:
```
escapes = '\\' > (
        escaped_character
      | ("x" > qi::uint_parser<UChar32, 16, 2, 2>())
      | ("u" > qi::uint_parser<UChar32, 16, 4, 4>())
      | ("U" > qi::uint_parser<UChar32, 16, 8, 8>())
      | (      qi::uint_parser<UChar32,  8, 1, 3>())
    ) [ cp2utf8_( qi::_val, qi::_1 ) ]
    ;
```
DEMO
Includes the improvements described, and also addresses some excess scope/namespace pollution issues.
Live on Coliru
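```
#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix.hpp>
#include <unicode/unistr.h>
#include <unicode/bytestream.h>
#include <sstream>
#include <stdexcept>
#include <string>

namespace qi = boost::spirit::qi;
using boost::spirit::unicode::char_;
using boost::spirit::eol;

namespace my_helpers {
    // Appends the UTF-8 encoding of a single code point to `a`.
    void cp2utf8(std::string& a, UChar32 codepoint)
    {
        icu::StringByteSink<std::string> bs(&a);
        icu::UnicodeString::fromUTF32(&codepoint, 1).toUTF8( bs );
    }
    // Builds the message for a failed expectation inside an escape sequence.
    template<typename Iterator>
    std::string make_error_message(boost::spirit::info const& info, Iterator first, Iterator last) {
        std::ostringstream oss;
        oss << "Illegal escape sequence. Expecting " << info << " here: \"" << std::string(first,last) << "\"";
        return oss.str();
    }
}
BOOST_PHOENIX_ADAPT_FUNCTION(void, cp2utf8_, my_helpers::cp2utf8, 2)
BOOST_PHOENIX_ADAPT_FUNCTION(std::string, make_error_message_, my_helpers::make_error_message, 3)

template <typename Iterator>
struct QuotedString : qi::grammar<Iterator, std::string()>
{
    QuotedString() : QuotedString::base_type( quoted_string )
    {
        quoted_string = '"' > *( +( char_ - ( '"' | eol | '\\' ) ) | escapes ) > '"';
        // One semantic action for all five alternatives.
        escapes = '\\' > (
                escaped_character
              | ("x" > qi::uint_parser<UChar32, 16, 2, 2>())
              | ("u" > qi::uint_parser<UChar32, 16, 4, 4>())
              | ("U" > qi::uint_parser<UChar32, 16, 8, 8>())
              | (      qi::uint_parser<UChar32,  8, 1, 3>())
            ) [ cp2utf8_( qi::_val, qi::_1 ) ]
            ;
        escaped_character.add
            ( "a", 0x07 ) // alert
            ( "b", 0x08 ) // backspace
            ( "f", 0x0c ) // form feed
            ( "n", 0x0a ) // new line
            ( "r", 0x0d ) // carriage return
            ( "t", 0x09 ) // horizontal tab
            ( "v", 0x0b ) // vertical tab
            ( "\"", 0x22 ) // literal quotation mark
            ( "\\", 0x5c ) // literal backslash
            ;
        namespace phx = boost::phoenix;
        qi::on_error< qi::fail > (
            quoted_string,
            phx::throw_( phx::construct<std::runtime_error>( "Illegal string literal. (Unterminated string?)" ) )
        );
        qi::on_error< qi::fail > (
            escapes,
            phx::throw_( phx::construct<std::runtime_error>( make_error_message_(qi::_4, qi::_3, qi::_2) ) )
        );
    }
private:
    qi::rule<Iterator, std::string()> quoted_string, escapes;
    qi::symbols<char, UChar32> escaped_character;
};
```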
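The idea behind the de-duplicated `escapes` rule, "work out the code point in one place, then act on it once", can also be tried without Boost or ICU. What follows is a hand-written sketch, not Spirit: `parse_digits` and `parse_escape` are hypothetical names, `char32_t` stands in for `UChar32`, and the digit counts (exactly 2 hex digits after `x`, 4 after `u`, 8 after `U`, and 1 to 3 octal digits otherwise) are assumptions modelled on the escape forms listed in the question.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Reads between min_digits and max_digits digits of the given radix from s,
// starting at pos, advances pos past them and returns their value.
char32_t parse_digits(std::string const& s, std::size_t& pos,
                      unsigned radix, unsigned min_digits, unsigned max_digits)
{
    char32_t value = 0;
    unsigned count = 0;
    while (count < max_digits && pos < s.size()) {
        char const c = s[pos];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else break;
        if (digit >= radix) break;   // e.g. '8' is not an octal digit
        value = value * radix + digit;
        ++pos;
        ++count;
    }
    if (count < min_digits) throw std::runtime_error("Illegal escape sequence");
    return value;
}

// Parses what follows a backslash and returns the code point it denotes.
// Every alternative yields a code point; nothing is encoded in here.
char32_t parse_escape(std::string const& s, std::size_t& pos)
{
    if (pos >= s.size()) throw std::runtime_error("Illegal escape sequence");
    switch (s[pos]) {
        case 'a':  ++pos; return 0x07;
        case 'b':  ++pos; return 0x08;
        case 'f':  ++pos; return 0x0c;
        case 'n':  ++pos; return 0x0a;
        case 'r':  ++pos; return 0x0d;
        case 't':  ++pos; return 0x09;
        case 'v':  ++pos; return 0x0b;
        case '"':  ++pos; return 0x22;
        case '\\': ++pos; return 0x5c;
        case 'x':  ++pos; return parse_digits(s, pos, 16, 2, 2);
        case 'u':  ++pos; return parse_digits(s, pos, 16, 4, 4);
        case 'U':  ++pos; return parse_digits(s, pos, 16, 8, 8);
        default:   return parse_digits(s, pos, 8, 1, 3);  // octal form
    }
}
```

A caller would then append the result with a single conversion call at the point where `parse_escape` is used, which mirrors attaching one semantic action to the whole group of alternatives instead of one per alternative.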
Code Snippets
```
qi::on_error< qi::fail >
(
    quoted_string,
    phoenix::throw_(
        phoenix::construct<std::runtime_error>( "Illegal string literal. (Unterminated string?)" )
    )
);
```
```
qi::on_error< qi::fail >
(
    escapes,
    phoenix::throw_(
        phoenix::construct<std::runtime_error>( make_error_message(qi::_4, qi::_3, qi::_2) )
    )
);
```
```
struct make_error_message_f {
    template <typename ...> struct result { using type = std::string; };
    template <typename Info, typename F, typename L>
    std::string operator()(Info const& info, F f, L l) const {
        std::ostringstream oss;
        oss << "Illegal escape sequence. Expecting " << info << " here: \"" << std::string(f,l) << "\"";
        return oss.str();
    }
};
phoenix::function<make_error_message_f> make_error_message;
```
```
namespace my_helpers {
    void cp2utf8(std::string& a, UChar32 codepoint)
    {
        icu::StringByteSink<std::string> bs(&a);
        icu::UnicodeString::fromUTF32(&codepoint, 1).toUTF8( bs );
    }
    template<typename Iterator>
    std::string make_error_message(boost::spirit::info const& info, Iterator first, Iterator last) {
        std::ostringstream oss;
        oss << "Illegal escape sequence. Expecting " << info << " here: \"" << std::string(first,last) << "\"";
        return oss.str();
    }
}
BOOST_PHOENIX_ADAPT_FUNCTION(void, cp2utf8_, my_helpers::cp2utf8, 2)
BOOST_PHOENIX_ADAPT_FUNCTION(std::string, make_error_message_, my_helpers::make_error_message, 3)
```
```
// (And I don't like *result* and *cp2utf8* lying around here
// when a stand-alone function should do just as well.)
```
Context
StackExchange Code Review Q#102374, answer score: 6