patternMinor
Reading XML files from RSS of different websites in PHP
Viewed 0 times
readingrsswebsitesphpxmldifferentfilesfrom
Problem
I've created a working XML RSS reader. I wanted to know if it's a good one or not and if it's efficient for the server or not. I run this code every 12 hours or something give or take the amount of time that a new article will be published.
This is the code itself with all the explanation of every part of the code. Tell me if it's efficient or if it needs some changes to do an even better a job.
``
$stmt->bind_param("sss", $PublishingDate, $Title, $Description); // making sure that all of them are string
foreach ($RSSXMLURL as $URL) { // this for each loop will go to every xml file on the revious array
$XML = simplexml_load_file($URL); // getting the xml file content
foreach ($XML->channel->item as $item) { // this for each loop will get every item(article) which is
//how it's written in every xml file
$Title = $item->title; // getting the title
$String = $item->description; // getting the article body in a temp variable for now so we can remove the ads
// later on the code
$Link = $item->link; // getting the article link
$PublishingDate = $item->pubDate; // getting the publishing date of the article
// this part of code will get all the ads from the articles
// after some reseach i found out that most of them have one thing in commen
// clear='all' which is written in a tag
// so i check if there was any
if (strpos($String,"clear='all'") !== false) { // if there was any
$Pos = strpos($String, "clear='all'"); // i get it's position
$String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
// replacing it with an empty space with that i completly remove the ads
if (strlen($String) execute()){
$NumSa
This is the code itself with all the explanation of every part of the code. Tell me if it's efficient or if it needs some changes to do an even better a job.
``
prepare("insert INTO articles` (PublishDate, ArticleTitle, ArticleBody) VALUES (?, ?, ?)");$stmt->bind_param("sss", $PublishingDate, $Title, $Description); // making sure that all of them are string
foreach ($RSSXMLURL as $URL) { // this for each loop will go to every xml file on the revious array
$XML = simplexml_load_file($URL); // getting the xml file content
foreach ($XML->channel->item as $item) { // this for each loop will get every item(article) which is
//how it's written in every xml file
$Title = $item->title; // getting the title
$String = $item->description; // getting the article body in a temp variable for now so we can remove the ads
// later on the code
$Link = $item->link; // getting the article link
$PublishingDate = $item->pubDate; // getting the publishing date of the article
// this part of code will get all the ads from the articles
// after some reseach i found out that most of them have one thing in commen
// clear='all' which is written in a tag
// so i check if there was any
if (strpos($String,"clear='all'") !== false) { // if there was any
$Pos = strpos($String, "clear='all'"); // i get it's position
$String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
// replacing it with an empty space with that i completly remove the ads
if (strlen($String) execute()){
$NumSa
Solution
I'm not very familiar with PHP so this will only cover some basic observations.
Readability.
On the plus side, your code is full of comments explaining what everything does.
On the minus side. It also suggests you're not separating things out.
Really good code should never need to explain what it is doing. It should only ever need to explain why something was done the way it was.
The rest of it should be procedures and subprocedures which read remarkably close to plain english along the lines of:
Begin Procedure
Create Foo
Determine size of Foo
For one to size of Foo
Get basic info
Remove ABC from Foo
Reformat XYZ To GHI
Check DEF
Output BAR
Next
End Procedure
Which makes your procedure incredibly easy to follow, change and maintain.
To quote people much more experienced than me: "Refactor Mercilessly".
You may find this useful.
Generally, refactoring just means taking parts of your code that do specific things and breaking them down into smaller and smaller procedures / subprocedures / classes / etc.
As an example, this:
needs to be a separate procedure that you call. Probably named something like
And this:
Can be a separate procedure called something like
Nested
This segment:
Should really be something like:
Plus, it should really be its own procedure too.
And suddenly, you've gone from 'wall of text' code to:
Which is much easier to follow and understand and it doesn't need 6 lines of comments every few steps to document what everything is doing, because the procedure names are self-explanatory.
Readability.
On the plus side, your code is full of comments explaining what everything does.
On the minus side. It also suggests you're not separating things out.
Really good code should never need to explain what it is doing. It should only ever need to explain why something was done the way it was.
The rest of it should be procedures and subprocedures which read remarkably close to plain english along the lines of:
Begin Procedure
Create Foo
Determine size of Foo
For one to size of Foo
Get basic info
Remove ABC from Foo
Reformat XYZ To GHI
Check DEF
Output BAR
Next
End Procedure
Which makes your procedure incredibly easy to follow, change and maintain.
To quote people much more experienced than me: "Refactor Mercilessly".
You may find this useful.
Generally, refactoring just means taking parts of your code that do specific things and breaking them down into smaller and smaller procedures / subprocedures / classes / etc.
As an example, this:
$RSSXMLURL = array(
'http://www.buzzfeed.com/tech.xml',
'http://www.buzzfeed.com/category/celebrity.xml',
'http://feeds.bbci.co.uk/news/rss.xml',
....needs to be a separate procedure that you call. Probably named something like
FillArrayWithUrls.And this:
// this part of code will get all the ads from the articles
// after some reseach i found out that most of them have one thing in commen
// clear='all' which is written in a tag
// so i check if there was any
if (strpos($String,"clear='all'") !== false) { // if there was any
$Pos = strpos($String, "clear='all'"); // i get it's position
$String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
// replacing it with an empty space with that i completly remove the ads
if (strlen($String) <= 4) { // now check if the article body has any data in it after removing the ads
// if there wasn't any make it so it would show some meaniful data
$String = "There is no Description"; // <==
}
}
$Description = $String; // now after we finished all that, put it in the description variableCan be a separate procedure called something like
RemoveAdsFromString.Nested
if statements (especially beyond 2 or 3) always smell very fishy.This segment:
if (strpos($PublishingDate, "Jan") !== false) {
$PublishingDate = str_replace("Jan", "01", $PublishingDate);
$Month = "01";
} else if (strpos($PublishingDate, "Feb") !== false) {
$PublishingDate = str_replace("Feb", "02", $PublishingDate);
$Month = "02";
} else if (strpos($PublishingDate, "Mar") !== false) {
$PublishingDate = str_replace("Mar", "03", $PublishingDate);
$Month = "03";
}
.........Should really be something like:
Switch (strpos($PublishingDate)) {
case "Jan":
doFoo('jan');
break;
case "Feb":
doFoo('feb');
break;
...
default:
errorHandler();
}Plus, it should really be its own procedure too.
And suddenly, you've gone from 'wall of text' code to:
channel->item as $item) {
$title = $X;
$link = $y;
$publishingDate = $z;
..
..
RemoveAdsFromString();
ConvertStringMonthToNumerical();
ReformatPublishingDate();
CheckForSuccess();
}
}
}Which is much easier to follow and understand and it doesn't need 6 lines of comments every few steps to document what everything is doing, because the procedure names are self-explanatory.
Code Snippets
$RSSXMLURL = array(
'http://www.buzzfeed.com/tech.xml',
'http://www.buzzfeed.com/category/celebrity.xml',
'http://feeds.bbci.co.uk/news/rss.xml',
....// this part of code will get all the ads from the articles
// after some reseach i found out that most of them have one thing in commen
// clear='all' which is written in a tag
// so i check if there was any
if (strpos($String,"clear='all'") !== false) { // if there was any
$Pos = strpos($String, "clear='all'"); // i get it's position
$String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
// replacing it with an empty space with that i completly remove the ads
if (strlen($String) <= 4) { // now check if the article body has any data in it after removing the ads
// if there wasn't any make it so it would show some meaniful data
$String = "There is no Description"; // <==
}
}
$Description = $String; // now after we finished all that, put it in the description variableif (strpos($PublishingDate, "Jan") !== false) {
$PublishingDate = str_replace("Jan", "01", $PublishingDate);
$Month = "01";
} else if (strpos($PublishingDate, "Feb") !== false) {
$PublishingDate = str_replace("Feb", "02", $PublishingDate);
$Month = "02";
} else if (strpos($PublishingDate, "Mar") !== false) {
$PublishingDate = str_replace("Mar", "03", $PublishingDate);
$Month = "03";
}
.........Switch (strpos($PublishingDate)) {
case "Jan":
doFoo('jan');
break;
case "Feb":
doFoo('feb');
break;
...
default:
errorHandler();
}<?php
function GetRSSData(){
FillArrayWithUrls();
//Declare Key variables
..
..
foreach ($RSSXMLURL as $URL) { // this for each loop will go to every xml file on the revious array
$XML = simplexml_load_file($URL); // getting the xml file content
foreach ($XML->channel->item as $item) {
$title = $X;
$link = $y;
$publishingDate = $z;
..
..
RemoveAdsFromString();
ConvertStringMonthToNumerical();
ReformatPublishingDate();
CheckForSuccess();
}
}
}Context
StackExchange Code Review Q#102265, answer score: 6
Revisions (0)
No revisions yet.