Lexer Implementation in Rust - `PhantomData` Awkwardness
Problem
I've been working on a tokeniser/lexer in Rust. I would like the tokeniser to take input from different sources, such as files or in-memory strings. To abstract over this concern I've created a trait, `Source`. This has left me with a wonky situation where I appear to need a `PhantomData` member in the token iterator. I've created a cut-down example here:
/// Source of Characters
///
/// In this example all that a source can do is be sliced to retrieve
/// a subsection of the overall character buffer.
trait Source<'a> {
    /// Character at Offset
    ///
    /// Gets the character at the given offset in the buffer and
    /// returns it along with the offset of the next character. If no
    /// character is available at that offset None is returned.
    fn at(&self, offset: usize) -> Option<(char, usize)>;

    /// Slice the Source Buffer
    fn slice(&self, start: usize, end: usize) -> &'a str;
}

/// Source of Characters from a str Slice
struct StringSource<'a> {
    pub buff: &'a str
}

/// Implementation of the Source trait.
impl<'a> Source<'a> for StringSource<'a> {
    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

/// Token Iterator Implementation
///
/// This token iterator takes a given source and steps through it,
/// returning string slices for each token.
struct TokenIter<'a, S>
    where S: Source<'a>,
          S: 'a
{
    source: S,
    idx: usize,
    phantom: ::std::marker::PhantomData<&'a S>,
}

impl<'a, S> TokenIter<'a, S>
    where S: Source<'a>
{
    /// Create a Token Iterator from a Source
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
            phantom: ::std::marker::PhantomData,
        }
    }
}
/// Token Iterators implement Iterator
impl<'a, S> Iterator for TokenIter<'a, S>
    where S: Source<'a>
{
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}
Solution
such as files or in-memory strings
I'm going to give the advice that people dread: I don't think your abstraction is going to work here. When I first read "in-memory strings", I expected a `String`, not a `&str`. Since you mentioned a file, I think it's still a valid comparison. I don't believe you can implement this trait for such a type:

struct OwnedStringSource {
    pub buff: String,
}

impl<'a> Source<'a> for OwnedStringSource {
    fn at(&self, offset: usize) -> Option<(char, usize)> { None }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        // Hmm.... what to put here?
    }
}

That is, there's no way to say a reasonable lifetime for `'a`. I also think that it's the same root problem as "Can I write an Iterator that yields a reference into itself?".

Aside from that, your original error was probably something like:
error[E0207]: the lifetime parameter 'a is not constrained by the impl trait, self type, or predicates
   --> src/main.rs:117:6
    |
117 | impl<'a, S> Iterator for TokenIter<S>
    |      ^^ unconstrained lifetime parameter

The best advice I've gotten about that specific error was helpful... after thinking about it for a while. Paraphrased and elided for this situation:
what [the error is] trying to tell you is that it cannot get [the generic type] back from either
the implemented trait [...] or the type implemented on [...]. [The where clause]
is not enough to extract [the generic] from [the type] because one [...] type can have multiple
[...] impls with various arguments
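
To make the quoted point concrete, here is a minimal sketch of a type that implements the lifetime-parameterised trait for every possible `'a` at once. The names `StaticSource`, `TEXT`, and `first_word` are hypothetical and only the shape of `Source<'a>` is taken from the question. Given just `S = StaticSource`, there is no single `'a` to recover, which is why `'a` has to appear in the self type (hence the `PhantomData`) or go away entirely.

```rust
// Same shape as the question's trait, reduced to the one method that matters here.
trait Source<'a> {
    fn slice(&self, start: usize, end: usize) -> &'a str;
}

/// Hypothetical source with no lifetime of its own: it slices a static string.
struct StaticSource;

// One blanket impl: `StaticSource: Source<'a>` holds for *every* lifetime 'a.
// An `impl<'a, S: Source<'a>> Iterator for TokenIter<S>` therefore could not
// decide which 'a to use for `type Item = &'a str` when `S = StaticSource`.
impl<'a> Source<'a> for StaticSource {
    fn slice(&self, start: usize, end: usize) -> &'a str {
        const TEXT: &str = "hello world";
        &TEXT[start..end]
    }
}

// In a function the caller picks 'a, so the ambiguity is harmless here.
fn first_word<'a, S: Source<'a>>(source: &S) -> &'a str {
    source.slice(0, 5)
}

fn main() {
    let source = StaticSource;
    // Here the caller happens to ask for 'static; any shorter lifetime works too.
    let word: &'static str = first_word(&source);
    println!("{}", word);
}
```

A function can leave the choice of `'a` to its caller, but an associated type such as `Iterator::Item` must be determined by the impl alone, which is where the error comes from.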
Instead, I suggest using an associated type rather than the generic lifetime parameter:

trait Source {
    type Slice;
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

This separates the lifetimes from the trait. Specific implementations can still participate in it:

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;
    // ...
}

And you can just bubble up the inner type out of the iterator:

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;
    // ...
}

You potentially might need to add extra bounds on that generic there (`S::Slice: AsRef<str>`) depending on what you need to be able to do with the slice in the iterator implementation.

The complete example:

trait Source {
    type Slice;
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

struct StringSource<'a> {
    pub buff: &'a str
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S> TokenIter<S> {
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
        }
    }
}

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}

fn main() {
    let source = StringSource { buff: "hello world" };
    let iter = TokenIter::new(source);
    println!("{:?}", iter.collect::<Vec<_>>());
}

Code Snippets
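
As a follow-up to the associated-type version of `Source` from the Solution, the sketch below shows one way an owned source might fit the same trait. It assumes that copying each token into a fresh `String` is acceptable, which the original answer does not claim. `OwnedStringSource` mirrors the struct named in the answer; `tokens_of` is a hypothetical helper added only for demonstration.

```rust
trait Source {
    type Slice;
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

/// Owned buffer: there is no borrowed lifetime to name here.
struct OwnedStringSource {
    buff: String,
}

impl Source for OwnedStringSource {
    // Assumption: handing out owned copies is fine for this use case.
    type Slice = String;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        // `get` yields None for out-of-range or non-boundary offsets instead of panicking.
        let ch = self.buff.get(offset..)?.chars().next()?;
        Some((ch, offset + ch.len_utf8()))
    }

    fn slice(&self, start: usize, end: usize) -> String {
        self.buff[start..end].to_owned()
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S: Source> Iterator for TokenIter<S> {
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let start = self.idx;
        let (_ch, next) = self.source.at(start)?;
        self.idx = next;
        Some(self.source.slice(start, next))
    }
}

/// Hypothetical helper: collects one single-character token per char.
fn tokens_of(text: &str) -> Vec<String> {
    let source = OwnedStringSource { buff: text.to_owned() };
    TokenIter { source, idx: 0 }.collect()
}

fn main() {
    println!("{:?}", tokens_of("héllo"));
}
```

The iterator code is unchanged in spirit: it only ever names `S::Slice`, so the borrowed and owned sources can share it.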
struct OwnedStringSource {
    pub buff: String,
}

impl<'a> Source<'a> for OwnedStringSource {
    fn at(&self, offset: usize) -> Option<(char, usize)> { None }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        // Hmm.... what to put here?
    }
}

trait Source {
    type Slice;
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;
    // ...
}

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;
    // ...
}

trait Source {
    type Slice;
    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

struct StringSource<'a> {
    pub buff: &'a str
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S> TokenIter<S> {
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
        }
    }
}

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}

fn main() {
    let source = StringSource { buff: "hello world" };
    let iter = TokenIter::new(source);
    println!("{:?}", iter.collect::<Vec<_>>());
}

Context
StackExchange Code Review Q#161017, answer score: 4