patternphpMinor
Regex to remove inline javascript from string
Viewed 0 times
javascriptremoveinlineregexfromstring
Problem
I need to remove inline javascript for a given string. Examples:
If user typed: `
Or if found an improvement that in a benchmark demonstrate better performance I should be happy to apply it as long as it does not break use cases already supported.
Thank You!
Update
I finally used htmlpurifier
If user typed: `
I should need to convert into
I created this PHP code and it works(apparently without issues):
http://writecodeonline.com/php/
function test_input($input){
//I have a list with all events but for this example I used two
$html_events = 'onload|onclick';
$pattern = "/(]*)($html_events)([\s]*=[\s]*)('[^>]*'|\"[^>]*\")([^>]*>)/i";
$replacement = '$1$5';
while( preg_match($pattern, $input) ){
$input = preg_replace($pattern, $replacement, $input);
}
return htmlentities($input);
}
echo test_input(''). '';
echo test_input(''). '';
echo test_input('hello buddies'). '';
I'm just looking for improvements or use cases that I did not supporting or that break my regex. I would appreciate if you tell me:
This: test_input('something bad');` breaks your regex.Or if found an improvement that in a benchmark demonstrate better performance I should be happy to apply it as long as it does not break use cases already supported.
Thank You!
Update
I finally used htmlpurifier
Solution
Parsing markup with regex is like building your house using lego... it's not the right tool for the job. HTML is not a regular language, therefore regular expressions don't cut the mustard. More than that: You're actively working to bring the world as we know it to an end, which drives people insane
What you need is a DOM parser, and as luck would have it, PHP has the
Tadaa... both
If you want, you can echo only the tags you've changed, like so:
As Jan said, for maintainability it's best to use an array of "forbidden attributes". That's what the
What you need is a DOM parser, and as luck would have it, PHP has the
DOMDocument object, which is just that:$dom = new DOMDocument;
$dom->loadHTML('');
$nodes = $dom->getElementsByTagName('*');//just get all nodes,
//$dom->getElementsByTagName('img'); would work, too
foreach($nodes as $node)
{
if ($node->hasAttribute('onload'))
{
$node->removeAttribute('onload');
}
if ($node->hasAttribute('onclick'))
{
$node->removeAttribute('onclick');
}
}
echo $dom->saveHTML();//will include html, head, body tags and doctypeTadaa... both
onload and onclick have been removed from the markup, without the pain of writing a reliable and stable regex, that can deal with in-line JS... As an added bonus, this code will be far more maintainable (and expandable) in the future. I'd much prefer maintaining this code, than having to rework a regular expression somebody wrote a couple of months ago...If you want, you can echo only the tags you've changed, like so:
$changed = array();
$attributesOfDeath = array('onload', 'onclick');
foreach($nodes as $node)
{
$current = null;
foreach($attributesOfDeath as $attr)
{
if ($node->hasAttribute($attr))
{
$node->removeAttribute($attr);
$current = $node;
}
}
if ($current)
{
$changed[] = $current;//add to changed array
}
}
$changed = array_map(array($dom, 'saveXML'), $changed);
echo implode(PHP_EOL, $changed);As Jan said, for maintainability it's best to use an array of "forbidden attributes". That's what the
$attributesOfDeath array is for. If you want to, later on, check for a third or fourth attribute, you can simply add that to the array, and nothing else in your code need change. It'll just keep on working as before.Code Snippets
$dom = new DOMDocument;
$dom->loadHTML('<img onload="alert(\'hello world\');" onclick="alert(\'hello world\');" />');
$nodes = $dom->getElementsByTagName('*');//just get all nodes,
//$dom->getElementsByTagName('img'); would work, too
foreach($nodes as $node)
{
if ($node->hasAttribute('onload'))
{
$node->removeAttribute('onload');
}
if ($node->hasAttribute('onclick'))
{
$node->removeAttribute('onclick');
}
}
echo $dom->saveHTML();//will include html, head, body tags and doctype$changed = array();
$attributesOfDeath = array('onload', 'onclick');
foreach($nodes as $node)
{
$current = null;
foreach($attributesOfDeath as $attr)
{
if ($node->hasAttribute($attr))
{
$node->removeAttribute($attr);
$current = $node;
}
}
if ($current)
{
$changed[] = $current;//add to changed array
}
}
$changed = array_map(array($dom, 'saveXML'), $changed);
echo implode(PHP_EOL, $changed);Context
StackExchange Code Review Q#30045, answer score: 5
Revisions (0)
No revisions yet.