How to index a String in Rust
string, index, how, rust
Problem
I am attempting to index a string in Rust, but the compiler throws an error. My code (Project Euler problem 4, playground):
    fn is_palindrome(num: u64) -> bool {
        let num_string = num.to_string();
        let num_length = num_string.len();
        for i in 0 .. num_length / 2 {
            if num_string[i] != num_string[(num_length - 1) - i] {
                return false;
            }
        }
        true
    }
The error:
    error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
     --> :7:12
      |
    7 |         if num_string[i] != num_string[(num_length - 1) - i] {
      |            ^^^^^^^^^^^^^
      |
      = note: the type `std::string::String` cannot be indexed by `usize`
Is there a reason why String cannot be indexed? How can I access the data then?
Solution
Yes, indexing into a string is not available in Rust. Rust strings are stored internally as a contiguous UTF-8 encoded buffer, so the concept of indexing itself would be ambiguous, and people would misuse it. Byte indexing is fast, but almost always incorrect: when your text contains non-ASCII symbols, a byte index may land in the middle of a character (Unicode code point), which is really bad if you need text processing. Code point indexing is not free, because UTF-8 is a variable-length encoding, so you have to traverse the string buffer from the start to find the required code point.
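To make the ambiguity concrete, here is a minimal sketch using only the standard library. The helper name byte_and_char_len and the sample string are made up for illustration; len(), chars(), is_char_boundary() and get() are standard methods on str.

```rust
/// Hypothetical helper: returns (length in bytes, number of code points).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 ('é') is encoded as two bytes in UTF-8.
    let s = "h\u{e9}llo";

    // len() counts bytes, chars().count() counts code points.
    assert_eq!(byte_and_char_len(s), (6, 5));

    // Byte offset 2 falls in the middle of 'é', so it is not a char boundary.
    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2));

    // get() returns None for a range that would split a code point,
    // where the slicing syntax &s[0..2] would panic.
    assert_eq!(s.get(0..2), None);
    assert_eq!(s.get(0..3), Some("h\u{e9}"));
}
```

For this string, "the character at index 2" has no single answer: it is 'l' if you count code points, and the second half of 'é' if you count bytes.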
If you are certain that your strings contain ASCII characters only, you can use the as_bytes() method on &str, which returns a byte slice, and then index into this slice:
    let num_string = num.to_string();
    // ...
    let b: u8 = num_string.as_bytes()[i];
    let c: char = b as char; // if you need to get the character as a Unicode code point
If you do need to index code points, you have to use the chars() iterator:
    num_string.chars().nth(i).unwrap()
As I said above, this requires traversing the iterator up to the i-th code point.
Finally, in many cases of text processing, it is actually necessary to work with grapheme clusters rather than with code points or bytes. For example, many emojis are composed of multiple code points, but are perceived as one "character". With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:
    use unicode_segmentation::UnicodeSegmentation;
    let string: String = ...;
    string.graphemes(true).nth(i).unwrap()
Naturally, grapheme cluster indexing into the contiguous UTF-8 buffer has the same requirement of traversing the entire string as indexing into code points.
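Applied to the function from the question, the byte-slice approach could look like the sketch below. It relies on the assumption that the decimal representation of a u64 consists of ASCII digits only, so each byte is exactly one character. The second function, is_palindrome_str, is a hypothetical helper that is not part of the question; it shows a code-point comparison for arbitrary text, and it does not handle grapheme clusters.

```rust
/// Checks whether the decimal digits of `num` read the same in both directions.
fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    // Assumption: decimal digits are ASCII, so indexing bytes is safe here.
    let bytes = num_string.as_bytes();
    let len = bytes.len();
    (0..len / 2).all(|i| bytes[i] == bytes[len - 1 - i])
}

/// Hypothetical helper for arbitrary text: compares code points from both ends.
/// This works on code points, not grapheme clusters.
fn is_palindrome_str(s: &str) -> bool {
    s.chars().eq(s.chars().rev())
}

fn main() {
    assert!(is_palindrome(9009));
    assert!(!is_palindrome(9010));
    // "été" written with escapes so the code points are unambiguous.
    assert!(is_palindrome_str("\u{e9}t\u{e9}"));
}
```

Comparing the forward and reversed chars() iterators avoids calling nth(i) inside a loop, which would restart the traversal from the beginning of the string on every call.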
Context
Stack Overflow Q#24542115, score: 217