What is the idiomatic Rust way to write this naive JavaScript token scanner code fragment?
Problem
I am learning Rust and finding it a difficult but rewarding and fun challenge to write idiomatic, succinct, clean code in it.
I have written the following simple, naive JavaScript scanner as an exercise to learn some basics of Rust. While it works, it feels far from ideal Rust code.
Are there better ways to write these simple functions? A few things that stand out to me are:
- The loops and pattern matching feel more verbose than they could be
- The use of the `Chars` iterator feels almost right but somehow off
- In particular, the `scan_identifier` function is very clumsy since the iterator can't rewind already consumed characters in the string
Note I am a compiler dev and know that this isn't great scanner code. That is not the point of this exercise. I am looking to learn how to do Rust basics through practice.
```
fn scan_identifier(first: char, chars: &mut std::str::Chars) {
    let mut identifier = String::new();
    identifier.push(first);
    // Look ahead on a clone, because `Chars` cannot rewind.
    let mut id_chars = chars.clone();
    while let Some(c) = id_chars.next() {
        match c {
            'a'..='z' | 'A'..='Z' | '0'..='9' =>
                identifier.push(c),
            _ => break,
        }
    }
    println!("{}", identifier);
    // Advance the real iterator past the characters found on the clone.
    chars.nth(identifier.chars().count() - 2);
}
fn scan_string(chars: &mut std::str::Chars) {
    let mut string = String::new();
    while let Some(c) = chars.next() {
        match c {
            '\'' => { println!("'{}'", string); return; },
            _ => string.push(c),
        }
    }
    println!("unterminated string! '{}", string);
}
fn scan(src: &str) {
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        match c {
            '(' => println!("("),
            ')' => println!(")"),
            ';' => println!(";"),
            '\'' => scan_string(&mut chars),
            _ => scan_identifier(c, &mut chars),
        }
    }
}
fn main() {
    let input = "print('Hello, world!');";
    println!("Input:\n\n{}\n\nTokens:\n", input);
    scan(input);
}
```
Solution
- You should become very familiar with all the methods on `Iterator`; in this case `take_while` would be highly relevant.
- Other parts of the ecosystem that deal with iterators are invaluable, like `String::extend`.
- Extract the logic about what is an identifier character, then you can write that code much more simply: `identifier.extend(chars.clone().take_while(is_id_char))`.
- In most cases, you can use a `for` loop instead of `while let`. `scan_string` is a good example of this. `scan` is not, because you wish to pass the iterator into further functions inside the loop body, which is not possible with the `for` syntax.
- In addition to the standard library `Iterator` methods, you should internalize what Itertools provides.
  - For example, `take_while_ref` avoids the need to `clone` the input iterator and then drive the original with `nth`.
- You could choose to accept a generic iterator instead of `Chars` specifically.
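A few of the points above can be sketched with the standard library alone. This is only an illustration under assumptions: the helper names `read_identifier` and `read_string` are hypothetical (they return values instead of printing), and `is_ascii_alphanumeric` is swapped in for the `a-z`, `A-Z`, `0-9` match.

```rust
use std::str::Chars;

// Identifier characters: ASCII letters and digits.
fn is_id_char(c: &char) -> bool {
    c.is_ascii_alphanumeric()
}

// Hypothetical helper: builds the identifier from a clone of the iterator,
// so the original iterator is left untouched (the caller still has to advance it).
fn read_identifier(first: char, chars: &Chars<'_>) -> String {
    let mut identifier = String::new();
    identifier.push(first);
    identifier.extend(chars.clone().take_while(is_id_char));
    identifier
}

// Hypothetical helper: a `for` loop in place of `while let`.
// Returns Ok(text) when the closing quote is found, Err(text) when it is not.
fn read_string(chars: &mut Chars<'_>) -> Result<String, String> {
    let mut string = String::new();
    for c in chars.by_ref() {
        if c == '\'' {
            return Ok(string);
        }
        string.push(c);
    }
    Err(string)
}
```

For the input `"abc(x"` with `'a'` already consumed, `read_identifier` yields `"abc"` and the original iterator still points at `"bc(x"`. Calling `chars.by_ref().take_while(is_id_char)` on the original instead collects `"bc"` but also swallows the `(`, because `take_while` consumes the first element that fails the predicate. That lost character is the gap `take_while_ref` is meant to close.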
```
// Requires the `itertools` crate; `extern crate` is only needed on the 2015 edition.
extern crate itertools;
use itertools::Itertools;
// True for the characters allowed after the first character of an identifier.
fn is_id_char(c: &char) -> bool {
    match *c {
        'a'..='z' | 'A'..='Z' | '0'..='9' => true,
        _ => false,
    }
}
fn scan_identifier<I>(first: char, chars: &mut I)
    where I: Iterator<Item = char> + Clone
{
    let mut identifier = String::new();
    identifier.push(first);
    // `take_while_ref` stops before the first non-identifier character
    // instead of consuming it.
    identifier.extend(chars.take_while_ref(is_id_char));
    println!("{}", identifier);
}
fn scan_string<I>(chars: &mut I)
    where I: Iterator<Item = char> + Clone
{
    let string: String = chars.take_while_ref(|&c| c != '\'').collect();
    // The next character is the closing quote, unless the input ran out.
    if Some('\'') != chars.next() {
        println!("unterminated string! '{}", string);
    } else {
        println!("'{}'", string);
    }
}
fn scan(src: &str) {
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        match c {
            '(' => println!("("),
            ')' => println!(")"),
            ';' => println!(";"),
            '\'' => scan_string(&mut chars),
            _ => scan_identifier(c, &mut chars),
        }
    }
}
fn main() {
    let input = "print('Hello, world!');";
    println!("Input:\n\n{}\n\nTokens:\n", input);
    scan(input);
}
```
However, I wouldn't write any parsing, tokenizing, or scanning code that operates on individual characters, especially since you are converting from a `&str` to a `String` when you could just return string slices and avoid the extra allocation.
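One possible reading of the string-slice suggestion is sketched below. It is not the answerer's code: the `Token` enum and this `scan` signature are assumptions made for illustration. Each token borrows from the input, so no `String` is allocated per token.

```rust
// Hypothetical token type; every variant borrows from the source text.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Punct(char),
    Str(&'a str),
    UnterminatedStr(&'a str),
    Ident(&'a str),
}

fn scan(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        match c {
            '(' | ')' | ';' => {
                tokens.push(Token::Punct(c));
                rest = &rest[1..]; // these are single-byte ASCII characters
            }
            '\'' => {
                let body = &rest[1..];
                match body.find('\'') {
                    Some(end) => {
                        tokens.push(Token::Str(&body[..end]));
                        rest = &body[end + 1..]; // skip the closing quote
                    }
                    None => {
                        tokens.push(Token::UnterminatedStr(body));
                        rest = "";
                    }
                }
            }
            _ => {
                // As in the original, any other character starts an identifier.
                let first_len = c.len_utf8();
                let tail = &rest[first_len..];
                let end = tail
                    .find(|ch: char| !ch.is_ascii_alphanumeric())
                    .unwrap_or(tail.len());
                let (ident, remaining) = rest.split_at(first_len + end);
                tokens.push(Token::Ident(ident));
                rest = remaining;
            }
        }
    }
    tokens
}
```

Calling `scan("print('Hello, world!');")` yields `Ident("print")`, `Punct('(')`, `Str("Hello, world!")`, `Punct(')')`, `Punct(';')`. Because `rest` is always a suffix of `src`, "rewinding" is simply a matter of not advancing the slice, which sidesteps the lookahead problem with `Chars`.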
Context
StackExchange Code Review Q#149075, answer score: 3