HiveBrain v1.2.0
Get Started
← Back to all entries
patternphpMinor

Allowing users to input a safe subset of HTML

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
allowinginputsafeuserssubsethtml

Problem

I am currently working on a project that requires users to input text that may or may not be formatted (if it helps, just think of it as a Stack Exchange clone, even though that's not quite accurate). Instead of opting for some sort of BB Code or other markup language to allow users to format their text, I figure it'd probably less stressful for the user to just use a subset of HTML. My users are all generally pretty experienced computer users, so I don't expect that to be much of an issue.

My plan is to have a whitelist of tags and attributes. Anything that doesn't match the whitelist will be removed. I figure that as long as I don't accidentally allow a dangerous attribute or element and I implement the code well, there won't be any problems.

I think I generally implemented the code well, but my knowledge of security issues is shaky at best, hence why I'm posting the code here.

```
public function set($value){
/*
First, I want to make sure that we're dealing with a string, so I cast it to a string. Should I throw an error if it's not a string instead?

Next, I transform \n to , so that whitespace appears correctly. I think it might be better to skip this step and use a CSS rule to allow whitespace.
*/
$value = (string) $value;
$value = nl2br(trim($value));

/*
Here is my whitelist. Notice the multiple br elements.
*/
$value = strip_tags($value, "");

/*
This caused some issues in my tests, where it'd unnecessarily escape certain things, so I commented it out. Are shell command injections something I need to worry about?
*/
//$value = escapeshellcmd($value);

$dom = new DOMDocument;
if($value === ""){
return false;
}
$dom->loadHTML($value);

$nodes = $dom->getElementsByTagName('*');
foreach($nodes as $node){
if($node->hasAttributes()){
foreach($node->attributes as $attr){
$name = $attr->name;

Solution

The way I see it, you must protect yourself from the following types of attacks:

  • XSS (Javascript injection to run on the browser)



  • PHP Injection



  • Information Disclosure



Allowing ` violates #3, so malicious users can embed a transparent pixel and see everyone who looks at a given page. I believe the style attribute could accomplish the same attack.

You may be vulnerable to PHP injection if you're ever careless about how you embed this user-generated content into your web pages. This can be very serious, because PHP runs on the server. It would expose any SQL databases and perhaps the whole system and network.

It appears that your code is vulnerable to XSS because Javascript can be embedded in the
style` attribute. You'll have to check what can be embedded in every attribute for HTML4 and HTML5. XSS attacks are serious because the script has access to your website with the current users session. In general XSS attacks can steal cookies, make new posts to propagate the XSS attack to other users, and attempt to change user account settings.

In my opinion, allowing users to embed HTML is dangerous, and you'd better do your homework. You'd be better off using some alternate markup like StackExchange uses.

Context

StackExchange Code Review Q#52303, answer score: 8

Revisions (0)

No revisions yet.