patternpythonMinor
Mixed scripting language API to determine file upload location and scrape city government website to find corresponding government official
Viewed 0 times
citywebsitefilegovernmentscriptinguploadscrapecorrespondinglanguagedetermine
Problem
I wrote this script as an NYC-specific-API for file upload for a mobile app. Users upload a video file and also their geographic coordinates.
I then use an external API to get the corresponding borough in NYC (since the iOS reverse geocoding SDK labels everything as New York, NY regardless of user location within NYC) before scraping the NYC Council homepage to find the appropriate council member based on geographic location. I don't have any experience with this sort of thing, so I really welcome all comments.
Here's the overall structure:
fileupload.php
```
// generalized server email functions + email functions specific to this file uploading API
require_once("email.php");
require_once("upload_mail_functions.php");
// MongoDB record constants
$SCRAPY_FAIL = "scrapyFail";
$NYT_FAIL = "nytFail";
$SUCCESS = "success";
// constant location parameters
$TARGET_DIR = "/var/www/html/uploads/";
$SCRAPY_DIR = "~/scrape/nyc_council";
$ADMIN_EMAIL = "someemail@example.com";
// move uploaded file to appropriate location, file name is made unique before upload
$target_dir = $TARGET_DIR;
$target_file = $target_dir . $_FILES["upload"]["name"];
$success = move_uploaded_file($_FILES["upload"]["tmp_name"], $target_file);
if($success) mail($ADMIN_EMAIL, "file uploaded", $_POST['latitude']);
else fail_email($mail, ' ', $email, $address, $short_file." FAILED TO UPlOAD" );
// I am using MongoDB so my understanding is that SQL injection isn't such a scary thing.
// mailing address
$address = $_POST['address'];
$email = $_POST['email'];
$address_components = explode(",", $address);
$street_address = $address_components[0];
// actual locatio
I then use an external API to get the corresponding borough in NYC (since the iOS reverse geocoding SDK labels everything as New York, NY regardless of user location within NYC) before scraping the NYC Council homepage to find the appropriate council member based on geographic location. I don't have any experience with this sort of thing, so I really welcome all comments.
Here's the overall structure:
- PHP file receives the upload, processes the upload, and calls an R script which calls NYT API to match latitude, longitude to NYC borough.
- Once the correct borough is returned from the R script, the original PHP script scalls Scrapy crawler.
- Once the Scrapy crawler returns, MongoDB is updated and appropriate emails are sent out.
fileupload.php
```
// generalized server email functions + email functions specific to this file uploading API
require_once("email.php");
require_once("upload_mail_functions.php");
// MongoDB record constants
$SCRAPY_FAIL = "scrapyFail";
$NYT_FAIL = "nytFail";
$SUCCESS = "success";
// constant location parameters
$TARGET_DIR = "/var/www/html/uploads/";
$SCRAPY_DIR = "~/scrape/nyc_council";
$ADMIN_EMAIL = "someemail@example.com";
// move uploaded file to appropriate location, file name is made unique before upload
$target_dir = $TARGET_DIR;
$target_file = $target_dir . $_FILES["upload"]["name"];
$success = move_uploaded_file($_FILES["upload"]["tmp_name"], $target_file);
if($success) mail($ADMIN_EMAIL, "file uploaded", $_POST['latitude']);
else fail_email($mail, ' ', $email, $address, $short_file." FAILED TO UPlOAD" );
// I am using MongoDB so my understanding is that SQL injection isn't such a scary thing.
// mailing address
$address = $_POST['address'];
$email = $_POST['email'];
$address_components = explode(",", $address);
$street_address = $address_components[0];
// actual locatio
Solution
First; You shouldn't be mixing languages to solve this. PHP can get website/api content using
Reviewing your PHP
-
Your large
-
-
You initialise a bunch of pointless variables, the idea that they're magic keywords is a little extraneous, you don't need to initialise them:
-
-
Injection Threat:
You directly use the
Consider checking your script for special characters, using a regex or direct character match.
Reviewing your Python:
Here, your code violates a few PEP8 points, such as too long lines, extraneous whitespace. Try using pep8online.com to test that.
I don't know whether
Into the following:
I don't know R very much, but here's a little:
Instead;
-
curl or file_get_contents, using Python and R is a little bit beyond extraneous.Reviewing your PHP
- You have double indentation; that is incorrect, use a single level of indentation (four spaces).
- In the following code sample, there's no reason to assign
$target_dir:
$target_dir = $TARGET_DIR;
$target_file = $target_dir . $_FILES["upload"]["name"];-
Your large
if statement is incorrectly formatted, you leave out the opening bracket, and still provide the closing brackets. Don't leave out brackets, it makes your code harder to maintain.-
array("email_sucess": you misspelled success.-
You initialise a bunch of pointless variables, the idea that they're magic keywords is a little extraneous, you don't need to initialise them:
$SCRAPY_FAIL = "scrapyFail";
$NYT_FAIL = "nytFail";
$SUCCESS = "success";-
$address_components: you don't use this more than once, forget initialising it at all. $address_components = explode(",", $address); $street_address = $address_components[0]; => $street_address = explode(",", $address)[0];-
$borough_abbrev: normally I'd suggest against using abbreviations as it makes your code harder to maintain, but in this case, it should be fine. However, you don't need to process it with a switch; use an array instead.$boroughAbbreviations = ["Manh" => 1, "Bron" => 2, "Broo" => 3, "Quee" => 4];
$borough = array_key_exists($abbreviation, $abbreviationList)
? $boroughAbbreviations[$abbreviation]
: 5;Injection Threat:
You directly use the
$_POST variables without any moderation. This could potentially be harmful. For example, let's say, instead of my latitude I put in || rm -rf;, your system would crash. And that's the least of it, pointing out the possibilities for viruses, trojans and all kinds of matter coming in.exec("r get_borough.R '$latitude' '$longitude'", $resultVar);Consider checking your script for special characters, using a regex or direct character match.
Reviewing your Python:
Here, your code violates a few PEP8 points, such as too long lines, extraneous whitespace. Try using pep8online.com to test that.
dmoz: the magical keywordclass DmozSpider(BaseSpider):
name = "dmoz"I don't know whether
dmoz is your username/nickname, but its presence is explained and probably unnecessary. The name variable is not even used.start_urls: is completely unnecessary to initialise as an array, when it contains a single string.
parseis extraneous, consider removing it altogether.
- After
anchoris defined you have extraneous whitespace, and at the end of the line, an extraneous\:
anchors = [td.find('a') for td in soup.findAll('td', {"class":"nav_text"})]\- The following block can be simplified.
link = mailto_remove.sub('', link)
f.write(link)
sys.stdout.write(a['href'])Into the following:
f.write(mailto_remove.sub('', link))
sys.stdout.write(link)I don't know R very much, but here's a little:
length_resultis extraneous, you don't need to initialise it:
length_result = length(df)
for(i in 1:length_result){Instead;
for(i in 1:length(df)){-
rd
-
lat & lon: you don't really need to initialise these, you can just put the argv[0] and argv[1]` definitions in the parameters directly.Code Snippets
$target_dir = $TARGET_DIR;
$target_file = $target_dir . $_FILES["upload"]["name"];$SCRAPY_FAIL = "scrapyFail";
$NYT_FAIL = "nytFail";
$SUCCESS = "success";$boroughAbbreviations = ["Manh" => 1, "Bron" => 2, "Broo" => 3, "Quee" => 4];
$borough = array_key_exists($abbreviation, $abbreviationList)
? $boroughAbbreviations[$abbreviation]
: 5;exec("r get_borough.R '$latitude' '$longitude'", $resultVar);class DmozSpider(BaseSpider):
name = "dmoz"Context
StackExchange Code Review Q#107731, answer score: 2
Revisions (0)
No revisions yet.