HiveBrain v1.2.0
Get Started
← Back to all entries
patternphpModerate

Checking 160,000 records in a database for changes

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
160000recordscheckingdatabasechangesfor

Problem

I am checking over 160,000 records in my database every 24 hours to see if there were changes in the incoming data coming from API and my existing data in the database. If there were I am updating those, otherwise I'm doing nothing. Checking 189 days takes 10 seconds and 160,000 about 2.5 hours.

What can I improve in my code to speed this up?

```
$listings = Array('23169','23567','23114','5487'); //Rooms to check (189 days in each room)
$count_listings = count(listings); //count the rooms
$continue = TRUE;
$x = 0;

while ($continue == TRUE && $x generate_link($list);
$results_page = $this->http_get_contents($url, $this->generate_proxy()); //JSON output
$calendar_month = json_decode($results_page, true); //Into Array

$data = Array();

foreach ($calendar_month['calendar_months'] as $month) {
foreach ($month['days'] as $day) {
$data[] = $day; //Getting each day
}
}
//Super_unique removes duplicate days
foreach ($this->super_unique($data) as $day_info) {
$data_listing = Array(
'list_id' => $list,
'list_price' => $day_info['price']['local_price'],
'list_currency' => $day_info['price']['local_currency'],
'list_date' => $day_info['date'],
'list_available' => $this->check_list_available((int)$day_info['available'])
);
//First get incoming result and search in the database
$check = $this->db->select('list_price, list_available')
->from('airbnb_lists_price')
->where("(list_id = '{$data_listing['list_id']}' and list_date = '{$data_listing['list_date']}')")
->get();
//If the record exists
if ($check->num_rows() > 0) {
foreach ($check->result() as $row) {
//If the record info was changed update
if

Solution

There is multiple things that can be optimized in this code.

It is pretty obvious that most time is spent in HTTP requests to remote API and in database queries, so these points should be first in list of optimizations.

Your code doesn't tells much about what is coming from remote API and you didn't tell if you really need to run all requests for this API each day. If amount of requests for remote API can be somehow reduced - you should do it. If not - you need to think about how "fresh" information from this API you need to have, maybe part of information can be cached and re-used.

Besides this, if you need information from remote API on daily basis - you may think about refactoring API requests loop into separate service that will run during whole day and store received information in some local data storage (e.g. database or some cache). Then your primary script will not need to spend time waiting for remote API responses and will run much faster.

If you know that significant amount of information that came from remote API updates less often than daily and you need to process only information that really changed - you should think about hashing. For example you can query API (even in this loop), clean it up from any information that may change at every request (e.g. request date) and, calculate hash from it (e.g. sha1) and store pair (url=>hash) somewhere. Then on the next iteration of script run you will be able to compare hash from new API response with already available hash and if they're same - completely skip data comparison with database.

About database queries - you should check that your query uses database indexes, it may significantly improve query performance.

And less intensive, but still useful optimizations is to avoid repetitive creation of things that you need to create once. For example:

  • Do you need to call generate_proxy() at each request?



  • You should use prepared SQL queries if your library supports them



  • You may collect information that should be updated into database into local array and then send it into database as a single transaction at the end of data processing - it may be faster.

Context

StackExchange Code Review Q#113378, answer score: 12

Revisions (0)

No revisions yet.