HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Reading XML files from RSS of different websites in PHP

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
readingrsswebsitesphpxmldifferentfilesfrom

Problem

I've created a working XML RSS reader. I wanted to know if it's a good one or not and if it's efficient for the server or not. I run this code every 12 hours or something give or take the amount of time that a new article will be published.

This is the code itself with all the explanation of every part of the code. Tell me if it's efficient or if it needs some changes to do an even better a job.

``
prepare("insert INTO
articles` (PublishDate, ArticleTitle, ArticleBody) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $PublishingDate, $Title, $Description); // making sure that all of them are string

foreach ($RSSXMLURL as $URL) { // this for each loop will go to every xml file on the revious array
$XML = simplexml_load_file($URL); // getting the xml file content

foreach ($XML->channel->item as $item) { // this for each loop will get every item(article) which is
//how it's written in every xml file
$Title = $item->title; // getting the title
$String = $item->description; // getting the article body in a temp variable for now so we can remove the ads
// later on the code
$Link = $item->link; // getting the article link
$PublishingDate = $item->pubDate; // getting the publishing date of the article

// this part of code will get all the ads from the articles
// after some reseach i found out that most of them have one thing in commen
// clear='all' which is written in a tag
// so i check if there was any
if (strpos($String,"clear='all'") !== false) { // if there was any
$Pos = strpos($String, "clear='all'"); // i get it's position
$String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
// replacing it with an empty space with that i completly remove the ads
if (strlen($String) execute()){
$NumSa

Solution

I'm not very familiar with PHP so this will only cover some basic observations.

Readability.

On the plus side, your code is full of comments explaining what everything does.

On the minus side. It also suggests you're not separating things out.
Really good code should never need to explain what it is doing. It should only ever need to explain why something was done the way it was.
The rest of it should be procedures and subprocedures which read remarkably close to plain english along the lines of:


Begin Procedure

Create Foo

Determine size of Foo

For one to size of Foo

Get basic info

Remove ABC from Foo

Reformat XYZ To GHI

Check DEF

Output BAR

Next

End Procedure

Which makes your procedure incredibly easy to follow, change and maintain.

To quote people much more experienced than me: "Refactor Mercilessly".

You may find this useful.

Generally, refactoring just means taking parts of your code that do specific things and breaking them down into smaller and smaller procedures / subprocedures / classes / etc.

As an example, this:

$RSSXMLURL = array(
        'http://www.buzzfeed.com/tech.xml',
        'http://www.buzzfeed.com/category/celebrity.xml',
        'http://feeds.bbci.co.uk/news/rss.xml',  
        ....


needs to be a separate procedure that you call. Probably named something like FillArrayWithUrls.

And this:

// this part of code will get all the ads from the articles
        // after some reseach i found out that most of them have one thing in commen
        // clear='all' which is written in a tag
        // so i check if there was any
        if (strpos($String,"clear='all'") !== false) { // if there was any
            $Pos = strpos($String, "clear='all'"); // i get it's position
            $String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
            // replacing it with an empty space with that i completly remove the ads
            if (strlen($String) <= 4) { // now check if the article body has any data in it after removing the ads
                // if there wasn't any make it so it would show some meaniful data
                $String = "There is no Description"; // <==
            }
        }
        $Description = $String; // now after we finished all that, put it in the description variable


Can be a separate procedure called something like RemoveAdsFromString.

Nested if statements (especially beyond 2 or 3) always smell very fishy.

This segment:

if (strpos($PublishingDate, "Jan") !== false) {
            $PublishingDate = str_replace("Jan", "01", $PublishingDate);
            $Month = "01";
        } else if (strpos($PublishingDate, "Feb") !== false) {
            $PublishingDate = str_replace("Feb", "02", $PublishingDate);
            $Month = "02";
        } else if (strpos($PublishingDate, "Mar") !== false) {
            $PublishingDate = str_replace("Mar", "03", $PublishingDate);
            $Month = "03";
        } 
        .........


Should really be something like:

Switch (strpos($PublishingDate)) {

    case "Jan":
        doFoo('jan');
        break;

    case "Feb":
        doFoo('feb');
        break;
    ...

    default:
        errorHandler();
}


Plus, it should really be its own procedure too.

And suddenly, you've gone from 'wall of text' code to:

channel->item as $item) {
            $title = $X;
            $link = $y;
            $publishingDate = $z;
            ..
            ..

            RemoveAdsFromString();

            ConvertStringMonthToNumerical();

            ReformatPublishingDate();

            CheckForSuccess();
        }

    }
}


Which is much easier to follow and understand and it doesn't need 6 lines of comments every few steps to document what everything is doing, because the procedure names are self-explanatory.

Code Snippets

$RSSXMLURL = array(
        'http://www.buzzfeed.com/tech.xml',
        'http://www.buzzfeed.com/category/celebrity.xml',
        'http://feeds.bbci.co.uk/news/rss.xml',  
        ....
// this part of code will get all the ads from the articles
        // after some reseach i found out that most of them have one thing in commen
        // clear='all' which is written in a tag
        // so i check if there was any
        if (strpos($String,"clear='all'") !== false) { // if there was any
            $Pos = strpos($String, "clear='all'"); // i get it's position
            $String = str_replace(substr($String, $Pos), "", $String);// i remove it from the string through
            // replacing it with an empty space with that i completly remove the ads
            if (strlen($String) <= 4) { // now check if the article body has any data in it after removing the ads
                // if there wasn't any make it so it would show some meaniful data
                $String = "There is no Description"; // <==
            }
        }
        $Description = $String; // now after we finished all that, put it in the description variable
if (strpos($PublishingDate, "Jan") !== false) {
            $PublishingDate = str_replace("Jan", "01", $PublishingDate);
            $Month = "01";
        } else if (strpos($PublishingDate, "Feb") !== false) {
            $PublishingDate = str_replace("Feb", "02", $PublishingDate);
            $Month = "02";
        } else if (strpos($PublishingDate, "Mar") !== false) {
            $PublishingDate = str_replace("Mar", "03", $PublishingDate);
            $Month = "03";
        } 
        .........
Switch (strpos($PublishingDate)) {

    case "Jan":
        doFoo('jan');
        break;

    case "Feb":
        doFoo('feb');
        break;
    ...

    default:
        errorHandler();
}
<?php

function GetRSSData(){

    FillArrayWithUrls();

    //Declare Key variables
    ..
    ..

    foreach ($RSSXMLURL as $URL) { // this for each loop will go to every xml file on the revious array
        $XML = simplexml_load_file($URL); // getting the xml file content
        foreach ($XML->channel->item as $item) {
            $title = $X;
            $link = $y;
            $publishingDate = $z;
            ..
            ..

            RemoveAdsFromString();

            ConvertStringMonthToNumerical();

            ReformatPublishingDate();

            CheckForSuccess();
        }

    }
}

Context

StackExchange Code Review Q#102265, answer score: 6

Revisions (0)

No revisions yet.