patternphpMinor
Concurrent HTTP request loop
Viewed 0 times
looprequestconcurrenthttp
Problem
I'm using the rolling curl library to fire HTTP requests for content-length headers of images (we need to know their size to weed out placeholders and low res images). The image URLs are stored in a database so I need to loop over the data in our products table (approx 1 million rows but will grow bigger, potentially much bigger).
I'm using PHP and the Laravel framework (the artisan CLI component). The operation seems to slow down as time progresses e.g. it starts processing 100 requests in less than a second and later the time to process 100 rows/requests is logged at over 20 seconds. Can anyone explain this and / or offer any performance improvement suggestions? The task is running on an Amazon EC2 micro instance so processing power / memory is limited.
```
public function fire()
{
$dt = new DateTime();
Log::info("started: ".$dt->format('Y-m-d H:i:s'));
$counter = 1;
Item2::where('img_size', '=', NULL)->chunk(1000, function($items) use ( &$counter)
{
$results = array();
$filePath = storage_path().'/imports/new/new_img_sizes_'.$counter.'.csv';
if (!File::exists($filePath)) {
File::put($filePath, '');
}
$start = microtime(true);
$rollingCurl = new \RollingCurl\RollingCurl();
$rollingCurl->setOptions($this->curlOptions);
foreach ($items as $item)
{
if ($item->img !== '') {
$results[$item->id] = array('url' => $item->img, 'size' => null);
$rollingCurl->get($item->img);
}
}
//callback runs on each curl request
$rollingCurl->setCallback(function(\RollingCurl\Request $request, \RollingCurl\RollingCurl $rollingCurl) use (&$results, $filePath) {
$responseInfo = $request->getResponseInfo();
//var_dump($responseInfo);exit;
$length = $responseInfo['download_content_length'];
foreach ($results as $key => $value) {
if (array_se
I'm using PHP and the Laravel framework (the artisan CLI component). The operation seems to slow down as time progresses e.g. it starts processing 100 requests in less than a second and later the time to process 100 rows/requests is logged at over 20 seconds. Can anyone explain this and / or offer any performance improvement suggestions? The task is running on an Amazon EC2 micro instance so processing power / memory is limited.
```
public function fire()
{
$dt = new DateTime();
Log::info("started: ".$dt->format('Y-m-d H:i:s'));
$counter = 1;
Item2::where('img_size', '=', NULL)->chunk(1000, function($items) use ( &$counter)
{
$results = array();
$filePath = storage_path().'/imports/new/new_img_sizes_'.$counter.'.csv';
if (!File::exists($filePath)) {
File::put($filePath, '');
}
$start = microtime(true);
$rollingCurl = new \RollingCurl\RollingCurl();
$rollingCurl->setOptions($this->curlOptions);
foreach ($items as $item)
{
if ($item->img !== '') {
$results[$item->id] = array('url' => $item->img, 'size' => null);
$rollingCurl->get($item->img);
}
}
//callback runs on each curl request
$rollingCurl->setCallback(function(\RollingCurl\Request $request, \RollingCurl\RollingCurl $rollingCurl) use (&$results, $filePath) {
$responseInfo = $request->getResponseInfo();
//var_dump($responseInfo);exit;
$length = $responseInfo['download_content_length'];
foreach ($results as $key => $value) {
if (array_se
Solution
I don't see what could be causing ever-decreasing performance since the only state outside the main loop is a counter, but adding calls to
But you can definitely improve the performance by avoiding the loop over
First, make the URL the key for the results array.
Next, use
Since you're placing each URL into the array before making the GET request, the
Separately, you can delay writing to the file until the end of the inner loop so you write one big chunk rather than one thousand tiny chunks. It may be that I/O is more costly, and perhaps you're getting killed by seeking to the end of an ever-growing file? Seems unlikely but maybe after a million writes it adds up.
General Review
There are a few things that could be cleaned up here as well.
-
Simplify testing if the image URL exists
can be simplified to
to skip
memory_get_usage as @jsanc623 recommends would help with that.But you can definitely improve the performance by avoiding the loop over
$results for every URL returned. Take advantage of PHP's array which is a hash table providing O(1) lookup.First, make the URL the key for the results array.
foreach ($items as $item) {
if ($item->img !== '') {
$url = $item->img;
$results[$url] = array(
'id' => $item->id,
'url' => $url,
'size' => null
);
$rollingCurl->get($url);
}
}Next, use
isset instead of array_search and looping.// foreach ($results as $key => $value) { ... } becomes
$url = $request->getURL();
if (isset($results[$url]) {
$results[$url]['size'] = $length;
File::append($filePath, $results[$url]['id'] . ',' . $length . "\r\n");
}Since you're placing each URL into the array before making the GET request, the
if isn't even necessary.Separately, you can delay writing to the file until the end of the inner loop so you write one big chunk rather than one thousand tiny chunks. It may be that I/O is more costly, and perhaps you're getting killed by seeking to the end of an ever-growing file? Seems unlikely but maybe after a million writes it adds up.
General Review
There are a few things that could be cleaned up here as well.
- Consistent formatting: put your opening brace on the same line or the next, but do it consistently.
- Blank lines before the closing brace seem misplaced to me. Use blank lines to logically group related code, though functions are preferred to whitespace.
-
Simplify testing if the image URL exists
if ($item->img !== '')can be simplified to
if ($item->img)to skip
null values as well. Even if the field cannot be null, it's still easier to read.Code Snippets
foreach ($items as $item) {
if ($item->img !== '') {
$url = $item->img;
$results[$url] = array(
'id' => $item->id,
'url' => $url,
'size' => null
);
$rollingCurl->get($url);
}
}// foreach ($results as $key => $value) { ... } becomes
$url = $request->getURL();
if (isset($results[$url]) {
$results[$url]['size'] = $length;
File::append($filePath, $results[$url]['id'] . ',' . $length . "\r\n");
}if ($item->img !== '')if ($item->img)Context
StackExchange Code Review Q#40351, answer score: 5
Revisions (0)
No revisions yet.