Optimize web-scraping of Moscow grocery website
Problem
This code works fine, but I believe it has optimization problems. Please review this.
Also, please keep in mind that it stops after each iteration of the loop foreach($mascow_sub_area as $subway).
```
$categories = array(1=>"1",2=>"2");//array(1=>array('pizza','sushi','shashliki','pirogi','burger'),2=>array('farm','dairy','delicatessen','confectionery','gastronomy'));
foreach($categories as $key => $cati){
    {
        if($key == 2){
            $url = "http://www.delivery-club.ru/entities/groceries/farm/#group=%D0%93%D0%B0%D1%81%D1%82%D1%80%D0%BE%D0%BD%D0%BE%D0%BC%D0%B8%D1%8F&group=%D0%A4%D0%B5%D1%80%D0%BC%D0%B5%D1%80%D1%81%D0%BA%D0%B8%D0%B5+%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B&group=%D0%9C%D0%BE%D0%BB%D0%BE%D1%87%D0%BD%D1%8B%D0%B5+%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B&group=%D0%94%D0%B5%D0%BB%D0%B8%D0%BA%D0%B0%D1%82%D0%B5%D1%81%D1%8B&group=%D0%9A%D0%BE%D0%BD%D0%B4%D0%B8%D1%82%D0%B5%D1%80%D1%81%D0%BA%D0%B8%D0%B5+%D0%B8%D0%B7%D0%B4%D0%B5%D0%BB%D0%B8%D1%8F&group=%D0%92%D0%BE%D0%B4%D0%B0%2C+%D0%A7%D0%B0%D0%B9%2C+%D0%9A%D0%BE%D1%84%D0%B5&show=all";
            get_data("Goods", str_get_html(file_get_contents($url)));
        }
        else{
            foreach($mascow_sub_area as $subway){
                //echo "Key: $key----Category: $sub.";
                $url = "http://www.delivery-club.ru/ajax/entities/?mode=food&cat_id=$cati&mo_mode=null&district=null&okrug=null&cuisine=region&params=null&ajax_changer_subway=$subway";
                $html = str_get_html(file_get_contents($url));
                //echo $html;return;
                //$html = "";
                //echo "test";
                get_data($subway, $html);
            }
        }
    }
}
function get_data($subway, $html){
    $data = array();
    $data['subway'] = $subway;
    $val = ($html->find('.dum'));
    foreach($html->find('.dum') as $one){
        foreach($one->find(".full_link") as
```
Solution
I want to focus principally on this section of your code:
$mascow_sub_area = array(//'Aviamotornaya','Avtozavodskaya','Akademicheskaya','Aleksandrovskiy_Sad',
'Alekseevskaya',
'Alma-Atinskaya','Altufevo','Annino','Arbatskaya','Aeroport','Babushkinskaya','Bagrationovskaya','Barrikadnaya','Baumanskaya','Begovaya','Belorusskaya','Belyaevo','Bibirevo','Biblioteka_imeni_Lenina','Borisovo','Borovitskaya','Botanicheskiy_Sad','Bratislavskaya','Bulvar_Admirala_Ushakova','Bulvar_Dmitriya_Donskogo','Buninskaya_Alleya','Varshavskaya','VDNKh','Vladykino','Vodnyy_Stadion','Voykovskaya','Volgogradskiy_Prospekt','Volzhskaya','Volokolamskaya','Vorobevy_Gory','Vystavochnaya','Vykhino','Delovoy_Tsentr','Dinamo','Dmitrovskaya','Dobryninskaya','Domodedovskaya','Dostoevskaya','Dubrovka','Zhulebino','Zyablikovo','Izmaylovskaya','Kaluzhskaya','Kantemirovskaya','Kakhovskaya','Kashirskaya','Kievskaya','Kitay-gorod','Kozhukhovskaya','Kolomenskaya','Komsomolskaya','Konkovo','Krasnogvardeyskaya','Krasnopresnenskaya','Krasnoselskaya','Krasnye_Vorota','Krestyanskaya_Zastava','Kropotkinskaya','Krylatskoe','Kuznetskiy_Most','Kuzminki','Kuntsevskaya','Kurskaya','Kutuzovskaya','Leninskiy_prospekt','Lermontovskij_prospekt','Lubyanka','Lyublino','Marksistskaya','Marina_roshcha','Marino','Mayakovskaya','Medvedkovo','Mezhdunarodnaya','Mendeleevskaya','Mitino','Molodezhnaya','Myakinino','Nagatinskaya','Nagornaya','Nakhimovskiy_prospekt','Novogireevo','Novokosino','Novokuznetskaya','Novoslobodskaya','Novoyasenevskaya','Novye_Cheremushki','Oktyabrskaya','Oktyabrskoe_Pole','Orekhovo','Otradnoe','Okhotnyy_Ryad','Paveletskaya','Park_Kultury','Park_Pobedy','Partizanskaya','Pervomayskaya','Perovo','Petrovsko-Razumovskaya','Pechatniki','Pionerskaya','Planernaya','Ploshchad_Ilicha','Ploshchad_Revolyutsii','Polezhaevskaya','Polyanka','Prazhskaya','Preobrazhenskaya_Ploshchad','Proletarskaya','Prospekt_Vernadskogo','Prospekt_Mira','Profsoyuznaya','Pushkinskaya','Pyatnickoe_shosse','Rechnoy_Vokzal','Rizhskaya','Rimskaya','Ryazanskiy_Prospekt','Savelovskaya','Sviblovo','Sevastopolskaya','Semenovskaya','Serpukhovskaya','Slavyanskiy_Bulvar','Smolenskaya','Sokol','Sokolniki','Sportivnaya','Sretenskiy_bulvar','Strogino','Studencheskaya','Sukharevskaya','Skhodnenskaya','Taganskaya','Tverskaya','Teatralnaya','Tekstilshchiki','Teletsentr','Teplyy_Stan','Timiryazevskaya','Tretyakovskaya','Trubnaya','Tulskaya','Turgenevskaya','Tushinskaya','Ulitsa_1905_goda','Ulitsa_Akademika_Koroleva','Ulitsa_Akademika_Yangelya','Ulitsa_Gorchakova','Ulitsa_Milashenkova','Ulitsa_Podbelskogo','Ulitsa_Sergeya_Eyzenshteyna','Ulitsa_Skobelevskaya','Ulitsa_Starokachalovskaya','Universitet','Filevskiy_Park','Fili','Frunzenskaya','Tsaritsyno','Tsvetnoy_bulvar','Cherkizovskaya','Chertanovskaya','Chekhovskaya','Chistye_Prudy','Chkalovskaya','Shabolovskaya','Shipilovskaya','Shosse_Entuziastov','Shchelkovskaya','Shchukinskaya','Elektrozavodskaya','Yugo-Zapadnaya','Yuzhnaya','Yasenevo');
$categories = array(1=>"1",2=>"2");//array(1=>array('pizza','sushi','shashliki','pirogi','burger'),2=>array('farm','dairy','delicatessen','confectionery','gastronomy'));
foreach($categories as $key => $cati){
{
if($key == 2){
$url = "http://www.delivery-club.ru/entities/groceries/farm/#group=%D0%93%D0%B0%D1%81%D1%82%D1%80%D0%BE%D0%BD%D0%BE%D0%BC%D0%B8%D1%8F&group=%D0%A4%D0%B5%D1%80%D0%BC%D0%B5%D1%80%D1%81%D0%BA%D0%B8%D0%B5+%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B&group=%D0%9C%D0%BE%D0%BB%D0%BE%D1%87%D0%BD%D1%8B%D0%B5+%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B&group=%D0%94%D0%B5%D0%BB%D0%B8%D0%BA%D0%B0%D1%82%D0%B5%D1%81%D1%8B&group=%D0%9A%D0%BE%D0%BD%D0%B4%D0%B8%D1%82%D0%B5%D1%80%D1%81%D0%BA%D0%B8%D0%B5+%D0%B8%D0%B7%D0%B4%D0%B5%D0%BB%D0%B8%D1%8F&group=%D0%92%D0%BE%D0%B4%D0%B0%2C+%D0%A7%D0%B0%D0%B9%2C+%D0%9A%D0%BE%D1%84%D0%B5&show=all";
get_data("Goods", str_get_html(file_get_contents($url)));
}
Commented out code
There are some values and some code commented out. Why? If they are not needed, just remove them.
Values:
$mascow_sub_area = array(//'Aviamotornaya','Avtozavodskaya','Akademicheskaya','Aleksandrovskiy_Sad',
Code:
$categories = array(1=>"1",2=>"2");//array(1=>array('pizza','sushi','shashliki','pirogi','burger'),2=>array('farm','dairy','delicatessen','confectionery','gastronomy'));
Hard-coded arbitrary values
You hard-coded dozens of arbitrary areas into the massive $mascow_sub_area array, and then you make PHP iterate over each value in multiple arrays to check for conditions. I would suggest making more use of MySQL and storing your values there. This would have certain advantages:
- Easy to add, update and remove values from a table without having to change the PHP script at all.
- Takes advantage of the speed of the SQL query optimizer to fetch and compare data.
- Then just pass the result set back to PHP.
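As a rough illustration of the suggestion above, the sketch below loads the station names from a table instead of a hard-coded array. It rests on several assumptions: an in-memory SQLite database (through PDO) stands in for the MySQL server so that the example runs without any setup, the table and column names are borrowed from the answer's SQL snippet, and fetch_sub_areas() is a hypothetical helper that is not part of the original script.

```php
<?php
// Sketch only: SQLite in memory replaces the MySQL server suggested in the
// answer, purely so the example is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Table and column names follow the answer's SQL snippet (SQLite syntax here).
$pdo->exec('CREATE TABLE mascow_sub_area (
    area_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    area_name VARCHAR(255) NOT NULL
)');

// Seed a few station names taken from the original array.
$insert = $pdo->prepare('INSERT INTO mascow_sub_area (area_name) VALUES (?)');
foreach (array('Alekseevskaya', 'Altufevo', 'Annino') as $name) {
    $insert->execute(array($name));
}

// Hypothetical helper: returns every area name as a flat array of strings.
function fetch_sub_areas(PDO $pdo) {
    $stmt = $pdo->query('SELECT area_name FROM mascow_sub_area ORDER BY area_id');
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// The scraper's loop stays the same; only the source of the list changes.
$mascow_sub_area = fetch_sub_areas($pdo);
foreach ($mascow_sub_area as $subway) {
    echo $subway, PHP_EOL; // the real script would build its request URL here
}
```

With a real MySQL server, only the DSN passed to PDO and the auto-increment syntax in the CREATE TABLE statement should need to change; the loop that consumes $mascow_sub_area can stay as it is.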
And that brings me to...
Wrong tool for the job.
U
Code Snippets
create table mascow_sub_area(
area_id int not null auto_increment primary key,
area_name varchar(255) not null
);
insert into mascow_sub_area(area_name) values
('Atinskaya'),('Altufevo'),('Annino') -- etc.
Context
StackExchange Code Review Q#48531, answer score: 8