HiveBrain v1.2.0
pattern · rust · Minor

Lexer Implementation in Rust - `PhantomData` Awkwardness

Submitted by: @import:stackexchange-codereview
lexer · awkwardness · phantomdata · rust · implementation

Problem

I've been working on a tokeniser/lexer in Rust. I would like the tokeniser to take input from different sources, such as files or in-memory strings. To abstract over this concern I've created a trait, `Source`. This has left me in a wonky situation where I appear to need a `PhantomData` member in the token iterator. I've created a cut-down example here:

```rust
/// Source of Characters
///
/// In this example all that a source can do is be sliced to retrieve
/// a subsection of the overall character buffer.
trait Source<'a> {
    /// Character at Offset
    ///
    /// Gets the character at the given offset in the buffer and
    /// returns it. If no character is available at that offset
    /// `None` is returned.
    fn at(&self, offset: usize) -> Option<(char, usize)>;

    /// Slice the Source Buffer
    fn slice(&self, start: usize, end: usize) -> &'a str;
}

/// Source of Characters from a `str` Slice
struct StringSource<'a> {
    pub buff: &'a str
}

/// Implementation of the Source trait.
impl<'a> Source<'a> for StringSource<'a> {
    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

/// Token Iterator Implementation
///
/// This token iterator takes a given source and steps through it returning string slices for each token
struct TokenIter<'a, S>
    where S: Source<'a>,
          S: 'a
{
    source: S,
    idx: usize,
    phantom: ::std::marker::PhantomData<&'a S>,
}

impl<'a, S> TokenIter<'a, S>
    where S: Source<'a>
{
    /// Create a Token Iterator from a Source
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
            phantom: ::std::marker::PhantomData,
        }
    }
}

/// Token Iterators implement `Iterator`
impl<'a, S> Iterator for TokenIter<'a, S>
    where S: Source<'a>
{
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}
```

Solution

> such as files or in-memory strings

I'm going to give the advice that people dread: I don't think your abstraction is going to work here. When I first read "in-memory strings", I expected a `String`, not a `&str`. Since you also mentioned files, whose contents a source would presumably have to own, I think it's still a valid comparison. I don't believe you can implement this trait for such a type:

```rust
struct OwnedStringSource {
    pub buff: String,
}

impl<'a> Source<'a> for OwnedStringSource {
    fn at(&self, offset: usize) -> Option<(char, usize)> { None }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        // Hmm.... what to put here?
    }
}
```


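That is, there's no reasonable lifetime to pick for `'a`. The only data to slice is `self.buff`, so any `&str` you return borrows from `self` and can live only as long as that borrow, but the trait signature promises a `&'a str` whose lifetime is unrelated to `&self`. I also think this is the same root problem as "Can I write an Iterator that yields a reference into itself?".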
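To make that concrete, here is a minimal sketch using a hypothetical `BorrowingSource` trait (not part of the question) whose `slice` signature an owned source *can* satisfy. Its return type borrows from `&self`, which is exactly the kind of borrow an `Iterator` owning the source cannot hand out from `next(&mut self)`.

```rust
// Hypothetical trait: with lifetime elision, the returned `&str` is tied
// to the `&self` borrow rather than to a separate lifetime parameter.
trait BorrowingSource {
    fn slice(&self, start: usize, end: usize) -> &str;
}

struct OwnedStringSource {
    pub buff: String,
}

impl BorrowingSource for OwnedStringSource {
    // This compiles: the slice borrows from `self.buff` and lives only as
    // long as the borrow of `self`.
    fn slice(&self, start: usize, end: usize) -> &str {
        &self.buff[start..end]
    }
}

fn main() {
    let source = OwnedStringSource { buff: String::from("hello world") };
    let word = source.slice(0, 5);
    // `word` keeps `source` borrowed. An `Iterator` that owned `source` would
    // have to return such a borrow of itself from `next(&mut self)`, and
    // `Iterator::Item` has no way to name that borrow's lifetime.
    println!("{}", word);
}
```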

Aside from that, your original error was probably something like:

```text
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
   --> src/main.rs:117:6
    |
117 | impl<'a, S> Iterator for TokenIter<S>
    |      ^^ unconstrained lifetime parameter
```


The best advice I've gotten about that specific error was helpful... after thinking about it for a while. Paraphrased and elided for this situation:


> what [the error is] trying to tell you is that it cannot get [the generic type] back from either the implemented trait [...] or the type implemented on [...]. [The where clause] is not enough to extract [the generic] from [the type] because one [...] type can have multiple [...] impls with various arguments
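The last point is easier to see with a type parameter than with a lifetime. In this minimal sketch (the `Produce` trait and `Counter` type are hypothetical), one type implements the same generic trait twice, so knowing the type alone does not tell the compiler which trait argument is meant; the caller has to choose it. An impl like `impl<'a, S> Iterator for TokenIter<S> where S: Source<'a>` faces the same ambiguity, since nothing in `TokenIter<S>` pins down `'a`.

```rust
// Hypothetical generic trait standing in for `Source<'a>` (a type
// parameter is used instead of a lifetime to keep the demo simple).
trait Produce<T> {
    fn produce(&self) -> T;
}

struct Counter(u32);

// The same type implements the trait twice, with different arguments.
impl Produce<u32> for Counter {
    fn produce(&self) -> u32 {
        self.0
    }
}

impl Produce<String> for Counter {
    fn produce(&self) -> String {
        format!("count = {}", self.0)
    }
}

fn main() {
    let c = Counter(7);
    // `Counter` alone does not determine `T`; the caller has to choose it.
    let n: u32 = c.produce();
    let s: String = c.produce();
    println!("{} / {}", n, s);
}
```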

Instead, I'd suggest using an associated type in place of the trait's lifetime parameter:

```rust
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}
```


This removes the lifetime from the trait itself. Specific implementations can still tie their slice type to a lifetime of their own:

```rust
impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;
    // ...
}
```
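With the slice type chosen per implementation, the `OwnedStringSource` from earlier no longer has to invent a lifetime. One option, sketched below under the assumption that an extra allocation per slice is acceptable, is to hand out owned `String`s:

```rust
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

// An owning source, e.g. text that has been read into memory.
struct OwnedStringSource {
    pub buff: String,
}

impl Source for OwnedStringSource {
    // Owned slices: each call copies the requested byte range into a new
    // `String`, so the result does not borrow from the source at all.
    type Slice = String;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        // Same logic as `StringSource::at`: the char at `offset` and the
        // byte offset just past it.
        self.buff[offset..].chars().next().map(|ch| (ch, offset + ch.len_utf8()))
    }

    fn slice(&self, start: usize, end: usize) -> String {
        self.buff[start..end].to_string()
    }
}

fn main() {
    let source = OwnedStringSource { buff: String::from("hello world") };
    println!("{:?}", source.at(6));
    let word = source.slice(6, 11);
    // `word` is owned, so it can outlive the source it came from.
    drop(source);
    println!("{}", word);
}
```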


The iterator can then simply pass the source's slice type through as its `Item`:

```rust
impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;
    // ...
}
```


Depending on what you need to do with the slice in the iterator implementation, you might need to add extra bounds on that associated type, such as `S::Slice: AsRef<str>`.

```rust
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

struct StringSource<'a> {
    pub buff: &'a str
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().nth(0).map(|ch| { (ch, offset + ch.len_utf8()) })
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S> TokenIter<S> {
    fn new(source: S) -> Self {
        TokenIter {
            source: source,
            idx: 0,
        }
    }
}

impl<S> Iterator for TokenIter<S>
    where S: Source,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}

fn main() {
    let source = StringSource{ buff: "hello world" };
    let iter = TokenIter::new(source);
    println!("{:?}", iter.collect::<Vec<_>>());
}
```
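As an illustration of the extra `S::Slice: AsRef<str>` bound mentioned above, here is a minimal sketch of a hypothetical helper, `non_space_tokens`, that needs to read the text of each slice. It repeats the answer's `Source`, `StringSource`, and `TokenIter` definitions so that it compiles on its own.

```rust
trait Source {
    type Slice;

    fn at(&self, offset: usize) -> Option<(char, usize)>;
    fn slice(&self, start: usize, end: usize) -> Self::Slice;
}

struct StringSource<'a> {
    pub buff: &'a str,
}

impl<'a> Source for StringSource<'a> {
    type Slice = &'a str;

    fn at(&self, offset: usize) -> Option<(char, usize)> {
        self.buff[offset..].chars().next().map(|ch| (ch, offset + ch.len_utf8()))
    }

    fn slice(&self, start: usize, end: usize) -> &'a str {
        &self.buff[start..end]
    }
}

struct TokenIter<S> {
    source: S,
    idx: usize,
}

impl<S> Iterator for TokenIter<S>
where
    S: Source,
{
    type Item = S::Slice;

    fn next(&mut self) -> Option<Self::Item> {
        let ts = self.idx;
        self.source.at(ts).map(|(_ch, next)| {
            self.idx = next;
            self.source.slice(ts, next)
        })
    }
}

// Hypothetical helper: the `S::Slice: AsRef<str>` bound is what lets it
// inspect the text of each slice, whatever concrete slice type the source uses.
fn non_space_tokens<S>(iter: TokenIter<S>) -> Vec<String>
where
    S: Source,
    S::Slice: AsRef<str>,
{
    iter.filter_map(|tok| {
        let text: &str = tok.as_ref();
        // Drop tokens that are only whitespace; copy the rest out.
        if text.trim().is_empty() {
            None
        } else {
            Some(text.to_string())
        }
    })
    .collect()
}

fn main() {
    let iter = TokenIter { source: StringSource { buff: "a b c" }, idx: 0 };
    println!("{:?}", non_space_tokens(iter));
}
```

Because the helper only relies on `AsRef<str>`, the same function would also accept a source whose `Slice` is an owned `String`.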

Context

StackExchange Code Review Q#161017, answer score: 4
