patternphpMinor
Have I missed some obvious XSS vulnerabilities?
Viewed 0 times
missedvulnerabilitiesobviousxsssomehave
Problem
As the question states: have I missed some obvious security vulnerabilities in this code? Particularly interested in XSS...
I'm trying to create something more robust that strip_tags, but simpler than HTML Purifier which just feels like overkill.
```
class StripTags {
private $tags;
private $dom;
function __construct($html, $tags = null) {
// Setup tags
$this->setTags($tags);
//Initialise document using provided HTML
$doc = new DOMDocument();
@$doc->loadHTML($html); //suppress invalid HTML warnings
$doc_elem = $doc->documentElement;
$this->traverseAndRemove($doc_elem);
// Store DOM for processing in __toString()
$this->dom = $doc;
}
public function __toString() {
// Remove unwanted DOCTYPE etc
$str = $this->dom->saveHTML();
$str = str_replace('', "", $str);
$str = preg_replace("/^/", "", trim($str));
$str = preg_replace("/$/", "", trim($str));
return $str;
}
private function setTags($t) {
if (empty($t) or !is_array($t)) {
$this->tags = array("html", "body", "p", "span", "div", "img", "b", "i", "em", "strong", "h1", "h2", "h3", "h4", "a", "pre");
} else {
$this->tags = $t;
}
if (!in_array("html",$this->tags)) {
$this->tags[] = "html";
}
if (!in_array("body",$this->tags)) {
$this->tags[] = "body";
}
}
private function traverseAndRemove(&$elem) {
// Check node type, attributes and child nodes
// Reset child/attribute loops if one is removed
if ($elem->nodeType === XML_ELEMENT_NODE) {
$tagName = $elem->tagName;
if (!in_array($tagName, $this->tags)) {
// Remove any elements in the tags array and stop processing it
$elem->parentNode->removeChild($elem);
return;
}
// Check attributes f
I'm trying to create something more robust that strip_tags, but simpler than HTML Purifier which just feels like overkill.
```
class StripTags {
private $tags;
private $dom;
function __construct($html, $tags = null) {
// Setup tags
$this->setTags($tags);
//Initialise document using provided HTML
$doc = new DOMDocument();
@$doc->loadHTML($html); //suppress invalid HTML warnings
$doc_elem = $doc->documentElement;
$this->traverseAndRemove($doc_elem);
// Store DOM for processing in __toString()
$this->dom = $doc;
}
public function __toString() {
// Remove unwanted DOCTYPE etc
$str = $this->dom->saveHTML();
$str = str_replace('', "", $str);
$str = preg_replace("/^/", "", trim($str));
$str = preg_replace("/$/", "", trim($str));
return $str;
}
private function setTags($t) {
if (empty($t) or !is_array($t)) {
$this->tags = array("html", "body", "p", "span", "div", "img", "b", "i", "em", "strong", "h1", "h2", "h3", "h4", "a", "pre");
} else {
$this->tags = $t;
}
if (!in_array("html",$this->tags)) {
$this->tags[] = "html";
}
if (!in_array("body",$this->tags)) {
$this->tags[] = "body";
}
}
private function traverseAndRemove(&$elem) {
// Check node type, attributes and child nodes
// Reset child/attribute loops if one is removed
if ($elem->nodeType === XML_ELEMENT_NODE) {
$tagName = $elem->tagName;
if (!in_array($tagName, $this->tags)) {
// Remove any elements in the tags array and stop processing it
$elem->parentNode->removeChild($elem);
return;
}
// Check attributes f
Solution
It would help if you explained exactly why you think this approach is more robust than using strip_tags. It will certainly be likely to be a lot slower.
From a quick inspection it doesn't seem to filter CSS behaviours nor data URIs.
If it were me I'd be building on to strip_tags rather than trying to start from scratch.
From a quick inspection it doesn't seem to filter CSS behaviours nor data URIs.
If it were me I'd be building on to strip_tags rather than trying to start from scratch.
Context
StackExchange Code Review Q#25620, answer score: 2
Revisions (0)
No revisions yet.