HiveBrain v1.2.0
Get Started
← Back to all entries
patternphplaravelMinor

Document Merger using PhpDocX

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mergerusingdocumentphpdocx

Problem

I'm developing a document merger that utilizes an FTP site containing hundreds of documents.

FTP Connection Function

private static function establishFTP() {
    $ftp_conn = ftp_connect(getenv('FTP_HOST'));
    ftp_login($ftp_conn,getenv('FTP_USER'),getenv('FTP_PASS'));
    ftp_pasv($ftp_conn,true);

    return $ftp_conn;
}


I have a function that connects to the FTP site and generates an array of the documents stored inside a designated directory. It will remove the extensions from any files that are of type .docx. Then it will filter out all values containing a .. This removes any extra files that are not a Word document and also removes the FTP root pathings of . and .. from the array.

Form Generation Function

public static function generateForm() {
    $ftp_conn = self::establishFTP();

    $forms = ftp_nlist($ftp_conn,getenv('FTP_DIRECTORY'));
    $forms = str_replace('.docx','',$forms);
    $forms = array_filter($forms,function($value) {
        return strpos($value,'.') === false;
    });

    ftp_close($ftp_conn);

    return view('testDoc')->with(array('forms'=>$forms));
}


My view is very basic right now just to prove functionality. There is lots of design work that needs done to make it more user-friendly and functional.

Blade View


Form

{{ Form::open(array('action'=>'DocumentController@mergeDocument')) }}
@foreach($forms as $doc)
    {{ Form::checkbox('documents[]',$doc) }}
    {{ Form::label(null,$doc) }}
@endforeach
{{ Form::submit('Submit',array('id'=>'submitBtn')) }}
{{ Form::close() }}


Lastly, I have a function that will accept an array of string values from the GUI. It will then establish a FTP connection and grab all documents requested from the FTP site. It will store these in a temp directory until the documents are merged and downloaded to the browser. Then it will unlink all temp files.

Merge and Download Documents Function

```
public static function mergeDocument() {
$ftp_conn = self::establishFT

Solution

Security: Credentials in ENV

It is less than ideal to store your credentials in environment variables, as they may easily leak.

phpinfo for example will print all environment variables.

Of course, you don't want to allow access to phpinfo to just anyone either, but it may still leak, and not everyone who should be allowed to see phpinfo should be allowed to see the FTP credentials.

Security: Directory Traversal which may lead to Information Leak and Limited DOS

An attacker can delete arbitrary .docx files in arbitrary locations because you never check the input for directory traversal. The problematic code is this:

$forms = Input::get('documents');
[...]
foreach($forms as $doc) {
    unlink("temp/files/$doc.docx");
}


The impact of this is rather small, but depending on what you store on your server, there may be some damage from this.

As you actually know what files are acceptable, you can easily use a whitelist approach here. At a minimum, you should check if the normalized path is inside the root dir.

Alternatively, if you do decide to cache filenames locally (see below), you could simply pass ids to the user instead of filenames, avoiding the problem altogether.

The same problem also exists here:

$forms = Input::get('documents');
[...]
foreach($forms as $doc) {
    ftp_get($ftp_conn,"temp/files/$doc.docx",getenv('FTP_DIRECTORY') . "/$doc.docx",FTP_BINARY);
    $mergeDocs[] = "temp/files/$doc.docx";
}


An attacker could read .doxc files outside of the FTP_DIRECTORY (if the FTP server allows this, and if the local server is writable at the given directory, both rather big ifs, but I would still protect against this).

Security: SSL

Your FTP connection will not be encrypted, meaning that a theoretical man in the middle can intercept your FTP credentials or the transfered files.

This may or may not be a problem for your use-case.

An alternative would be ftp_ssl_connect or using sftp (see also here).

Security/Funktionality: DOS & Performance

From what I can tell, each time I visit the website containing the form that lists files I can join, you open an FTP connection and retrieve a list of all the files (which may be quite a lot).

A user (or multiple user) may accidentally or on purpose reload the form a bunch of times, causing performance problems.

I would definitely profile this and see what strain it puts on your server. It may make sense to cache the list of docs locally (if it doesn't change that often).

Usability / Bug / Security: File Listing

Your mechanism will hide all files which contain a dot in their filename, which may lead to hard to trace bugs.

Your current mechanism may also have security implications, as files with no extension (for example secret) are listed as well, even though they shouldn't be.

Instead, I would probably use an approach such as this:

  • Filter out all files that do not have the docx extension



  • Remove the .docx extension



Naming

You use forms in multiple places as variable name where it doesn't really make sense. The variable doesn't contain multiple forms, but actually multiple document names. This naming also leads to the odd to read foreach($forms as $doc) loop. Instead, you should name name these variables $docs (in mergeDocument, generateForm, and in the view).

Code Snippets

$forms = Input::get('documents');
[...]
foreach($forms as $doc) {
    unlink("temp/files/$doc.docx");
}
$forms = Input::get('documents');
[...]
foreach($forms as $doc) {
    ftp_get($ftp_conn,"temp/files/$doc.docx",getenv('FTP_DIRECTORY') . "/$doc.docx",FTP_BINARY);
    $mergeDocs[] = "temp/files/$doc.docx";
}

Context

StackExchange Code Review Q#138868, answer score: 5

Revisions (0)

No revisions yet.