HiveBrain v1.2.0
Get Started
← Back to all entries
patternrustMinor

Extract PNGs embedded in a file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
embeddedfilepngsextract

Problem

My approach was to find each occurrence of the PNG's file signature followed by its end of file (EOF) and write the bytes between to a new file whose name is simply a counter starting at zero.

use std::env;
use std::fs::File;
use std::io::prelude::*;

const SIGNATURE: [u8; 8] = [137, 80, 78, 71, 13, 10, 26, 10];
const EOF: [u8; 8] = [73, 69, 78, 68, 174, 66, 96, 130];

fn main() {
    let path = env::args().nth(1).unwrap();
    let mut f = File::open(path).unwrap();

    let mut buf = Vec::new();
    f.read_to_end(&mut buf).unwrap();

    let mut iter = buf.iter();
    let mut i = 0;

    while let Some(start) = iter.rposition(|&x| x == SIGNATURE[0]) {

        if buf[start..].starts_with(&SIGNATURE) {

            let mut iter = buf[start..].iter();

            let mut end = iter.position(|&x| x == 130).unwrap();

            loop {
                let slice = &buf[start..start + end + 1];
                if slice.ends_with(&EOF) {
                    let mut f = File::create(i.to_string() + ".png").unwrap();
                    f.write_all(slice).unwrap();
                    i += 1;
                    break;
                }

                end += match iter.position(|&x| x == 130) {
                    Some(x) => { x + 1 },
                    None => break,
                };
            }
        }
    }
}


The only reason I used rposition instead of position for start was because the file I was working with had all its PNG's embedded at the end.

I had attempted to replace some of the unwraps with try!s but realized that the macro needs to be called from a function that returns a Result, i.e. can't be used in fn main(), and couldn't figure out how best to break up the code.

Was wondering if there might be a better way to approach the problem. If not, how can I at least clean up this implementation?

Solution

-
Use expect instead of unwrap. This gives you and your users
a better chance at figuring out what went wrong. It's also one step closer
to good error reporting and handling.

-
Extract an inner_main that returns a Result and thus allows
you to use try!. main can call that method, pattern match on the
Result, print the failure message, have an exit code, etc.

-
To start with, return a Result>. Box is an easy
type to build, and try! can convert to it automatically.

-
Once you are returning Results, theexpects can be transformed into a
ResultusingOption::ok_or.

-
One Rust-specific way to think about splitting up functions is
to constrain mutability. Extracting out loading the file allows the
Vec to be mutable in the smallest possible scope.

-
It may be over engineering, but creating an iterator to handle
finding the embedded PNGs helps separate concerns. For example, I
didn't originally see that
i wasn't part of the main finding
logic. It would also allow you to create more focused unit tests on
just this complicated functionality.

-
The search algorithm itself isn't super efficient; it doesn't
skip as much data as it could in certain places, it seems to always
scan through to the end of the file. What if your data was
something like
SIG ... SIG ... SIG ... EOF?

-
The magic number
130 for the end position is duplicated.

-
Use a crate like twoway to do
the heavy lifting of searching for byte substrings. This takes care of a
lot of the annoying details.

-
The data is sliced for the same range multiple times. Any array access is a tiny bit expensive as it has to do range checks. Doing it once is good for both programmer sanity and for the tiny gain in speed.

-
Using
format! to construct the file name may be more obvious than adding a string slice to a String`.

extern crate twoway;

use std::error::Error;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
use std::{env, process};

const SIGNATURE: [u8; 8] = [137, 80, 78, 71, 13, 10, 26, 10];
const EOF: [u8; 8] = [73, 69, 78, 68, 174, 66, 96, 130];

fn load_file(path: P) -> Result, Box>
    where P: AsRef
{
    let mut f = try!(File::open(path));

    let mut buf = Vec::new();
    try!(f.read_to_end(&mut buf));
    Ok(buf)
}

struct EmbeddedPng(&'a [u8]);

impl Iterator for EmbeddedPng {
    type Item = &'a [u8];

    fn next(&mut self) -> Option {
        loop {
            match twoway::rfind_bytes(self.0, &SIGNATURE) {
                None => return None,
                Some(start_idx) => {
                    let (before, after) = self.0.split_at(start_idx);
                    self.0 = before;

                    if let Some(end_idx) = twoway::find_bytes(after, &EOF) {
                        let png_data = &after[..end_idx + EOF.len()];
                        return Some(png_data)
                    }
                }
            }
        }
    }
}

fn inner_main() -> Result> {
    let path = try!(env::args().nth(1).ok_or("No filename provided"));
    let buf = try!(load_file(path));

    for (i, png_data) in EmbeddedPng(&buf).enumerate() {
        let fname = format!("{}.png", i);
        let mut f = try!(File::create(fname));
        try!(f.write_all(png_data));
    }

    Ok(())
}

fn main() {
    if let Err(e) = inner_main() {
        println!("{}", e);
        process::exit(1);
    }
}

Code Snippets

extern crate twoway;

use std::error::Error;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
use std::{env, process};

const SIGNATURE: [u8; 8] = [137, 80, 78, 71, 13, 10, 26, 10];
const EOF: [u8; 8] = [73, 69, 78, 68, 174, 66, 96, 130];

fn load_file<P>(path: P) -> Result<Vec<u8>, Box<Error>>
    where P: AsRef<Path>
{
    let mut f = try!(File::open(path));

    let mut buf = Vec::new();
    try!(f.read_to_end(&mut buf));
    Ok(buf)
}

struct EmbeddedPng<'a>(&'a [u8]);

impl<'a> Iterator for EmbeddedPng<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match twoway::rfind_bytes(self.0, &SIGNATURE) {
                None => return None,
                Some(start_idx) => {
                    let (before, after) = self.0.split_at(start_idx);
                    self.0 = before;

                    if let Some(end_idx) = twoway::find_bytes(after, &EOF) {
                        let png_data = &after[..end_idx + EOF.len()];
                        return Some(png_data)
                    }
                }
            }
        }
    }
}

fn inner_main() -> Result<(), Box<Error>> {
    let path = try!(env::args().nth(1).ok_or("No filename provided"));
    let buf = try!(load_file(path));

    for (i, png_data) in EmbeddedPng(&buf).enumerate() {
        let fname = format!("{}.png", i);
        let mut f = try!(File::create(fname));
        try!(f.write_all(png_data));
    }

    Ok(())
}

fn main() {
    if let Err(e) = inner_main() {
        println!("{}", e);
        process::exit(1);
    }
}

Context

StackExchange Code Review Q#126500, answer score: 4

Revisions (0)

No revisions yet.