Downloading the oldest available version of some data files

Problem

I am calling the cURL command-line tool from MATLAB 2013b on Ubuntu to download a whole bunch of files. Each file exists in one of three possible versions: 1.0.0, 2.0.0, or 2.1.0. Over HTTP, I first check the headers with three queries (one per version) to see which version exists, and then I download whichever one does. If none exists (because the date itself does not exist, e.g. February 30), I just move on.

Is there a more efficient way to query and then download the files? For example: check 1.0.0, and if it exists, download it; if it doesn't, check 2.0.0, and so on. The only way I could think of doing that was with some very ugly nested if-elseif statements. As it is, I query three times no matter what, which seems a little wasteful.
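A minimal sketch of that query-until-found idea, assuming curl is on the PATH. Instead of searching the header text for `HTTP/1.1 200`, it relies on curl's `--fail` option (non-zero exit status for HTTP errors such as 404) and `--location` (follow redirects, so a 301 is not mistaken for "file exists"). Looping over a cell array of versions replaces the if/elseif chain; the example date is hypothetical.

```matlab
% Versions to try, oldest first: the first one found is the one downloaded.
versions = {'1.0.0', '2.0.0', '2.1.0'};
thedate  = '20120901';   % hypothetical example date
newname  = ['ephemA' thedate '.h5'];
thefile  = ['http://www.rbsp-ect.lanl.gov/data_pub/rbspa/MagEphem/def/' ...
            'rbspa_def_MagEphem_TS04D_' thedate];

chosen = '';   % stays empty if no version exists for this date
for k = 1:numel(versions)
    url = [thefile '_v' versions{k} '.h5'];
    % --fail:     non-zero exit status for HTTP errors such as 404
    % --location: follow redirects before judging success
    % -o /dev/null: discard the headers instead of printing them
    status = system(['curl --silent --fail --location --head --max-time 20 ' ...
                     '-o /dev/null ' url]);
    if status == 0
        chosen = versions{k};
        dlStatus = system(['curl --silent --fail --location -o ' newname ' ' url]);
        if dlStatus ~= 0
            fprintf('%s: download of v%s failed\n', thedate, chosen);
        end
        break   % oldest available version found; skip the remaining queries
    end
end

if isempty(chosen)
    fprintf('%s: no version available, skipping\n', thedate);
end
```

With this structure, a date whose v1.0.0 file exists costs one HEAD request instead of three, and adding a fourth version only means extending `versions`.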

Any other comments on my coding style are also welcome.

```
clc
close all
clear all

for y = 2012:2013
    for m = 1:12
        for d = 1:31

            % To limit the dates because of the available data
            if y==2012 && m < 9
                break
            end

            % The date string is created with padded zeros
            if (m < 10) && (d < 10)
                thedate = [num2str(y) '0' num2str(m) '0' num2str(d)];
            elseif m < 10
                thedate = [num2str(y) '0' num2str(m) num2str(d)];
            elseif d < 10
                thedate = [num2str(y) num2str(m) '0' num2str(d)];
            else
                thedate = [num2str(y) num2str(m) num2str(d)];
            end

            newname = ['ephemA' thedate '.h5']

            % The entire file name is created
            thefile = ['http://www.rbsp-ect.lanl.gov/data_pub/rbspa/MagEphem/def/rbspa_def_MagEphem_TS04D_' thedate];

            % Use curl with --head flag to check the header to see which version exists
            [status, result1] = system(['curl --head ' thefile '_v1.0.0.h5']);
            [status, result2] = system(['curl --head ' thefile '_v2.0.0.h5']);
            [status, result3] = system(['curl --head ' thefile '_v2.1.0.h5']);

            % 200 means the file exists and is downloaded
            if ~isempty(strfind(result1,'HTTP/1.1 200'))
                [status, result] = system(['curl -o ' newname ' ' thefile '_v1.0.0.h5']);
            elseif ~isempty(strfind(result2,'HTTP/1.1 200'))
                [status, result] = system(['curl -o ' newname ' ' thefile '_v2.0.0.h5']);
            elseif ~isempty(strfind(result3,'HTTP/1.1 200'))
                [status, result] = system(['curl -o ' newname ' ' thefile '_v2.1.0.h5']);
            end

        end
    end
end
```
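On the coding-style side: the nested y/m/d loops generate impossible dates such as 20120231 (the source of the "nonexistent dates"), and the four-way if/elseif exists only to add zero padding. A sketch that walks real calendar days with `datenum` and formats them with `datestr` (both available in MATLAB 2013b), plus the equivalent `sprintf` format for a single date. The start date of 1 September 2012 is an assumption about when the data begins.

```matlab
% Serial day numbers for every real calendar day in the range, so
% impossible dates such as 20120231 are never generated.
% Assumption: the data starts on 1 September 2012.
dn       = datenum(2012, 9, 1):datenum(2013, 12, 31);
thedates = datestr(dn, 'yyyymmdd');   % char matrix, one zero-padded date per row

% Zero padding for a single (y, m, d) triple, without the if/elseif chain:
one = sprintf('%04d%02d%02d', 2012, 9, 5);   % '20120905'

for k = 1:size(thedates, 1)
    thedate = thedates(k, :);
    newname = ['ephemA' thedate '.h5'];
    % ... query and download this date as before ...
end
```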

Solution

Why don't you read the directory index at http://www.rbsp-ect.lanl.gov/data_pub/rbspa/MagEphem/def/ and use it to decide which files you want to download? That costs one request for the whole listing instead of up to three header queries per date.
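A sketch of the index approach in MATLAB. It assumes the index is an HTML page whose links contain file names of the `rbspa_def_MagEphem_TS04D_<date>_v<version>.h5` form used in the question; the `html` string below is a made-up excerpt standing in for the output of `urlread` (available in MATLAB 2013b). For each date it keeps the lowest version number, i.e. the oldest version.

```matlab
% Made-up excerpt of the directory listing. With network access, use:
%   html = urlread('http://www.rbsp-ect.lanl.gov/data_pub/rbspa/MagEphem/def/');
html = ['<a href="rbspa_def_MagEphem_TS04D_20120901_v2.1.0.h5">a</a>' ...
        '<a href="rbspa_def_MagEphem_TS04D_20120901_v1.0.0.h5">b</a>' ...
        '<a href="rbspa_def_MagEphem_TS04D_20120902_v2.0.0.h5">c</a>'];

% Each match yields a 1x4 cell: {date, major, minor, patch}
tok = regexp(html, ...
    'rbspa_def_MagEphem_TS04D_(\d{8})_v(\d+)\.(\d+)\.(\d+)\.h5', 'tokens');

dates = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
% Numeric sort key, so that e.g. 2.10.0 ranks above 2.9.0
% (assumes each version component is below 1000)
vkey = cellfun(@(t) str2double(t{2})*1e6 + str2double(t{3})*1e3 ...
                    + str2double(t{4}), tok);

udates = unique(dates);
oldest = cell(size(udates));
for k = 1:numel(udates)
    idx = find(strcmp(dates, udates{k}));
    [~, j] = min(vkey(idx));   % lowest version number = oldest version
    t = tok{idx(j)};
    oldest{k} = sprintf('rbspa_def_MagEphem_TS04D_%s_v%s.%s.%s.h5', t{:});
end
```

The resulting `oldest` list names exactly one file per available date, so no HEAD requests are needed at all, and dates with no data simply never appear.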

Also, try passing multiple files (e.g. 20) to each curl call: this lets curl reuse the same connection for those files instead of opening a new one per file.
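A sketch of building one curl command per batch of 20 files, assuming a list of file names to fetch (the two names below are placeholders). curl accepts several `-o <file> <URL>` pairs in one invocation and processes them in order. The commands are only built here; the loop that runs them is commented out so the sketch does not touch the network.

```matlab
base  = 'http://www.rbsp-ect.lanl.gov/data_pub/rbspa/MagEphem/def/';
files = {'rbspa_def_MagEphem_TS04D_20120901_v1.0.0.h5', ...
         'rbspa_def_MagEphem_TS04D_20120902_v2.0.0.h5'};   % placeholder list
batchSize = 20;

cmds = {};
for s = 1:batchSize:numel(files)
    batch = files(s:min(s + batchSize - 1, numel(files)));
    % One "-o <local name> <URL>" pair per file in this batch
    pairs = cellfun(@(f) sprintf('-o %s %s%s', f, base, f), batch, ...
                    'UniformOutput', false);
    cmds{end+1} = ['curl --silent --fail --location ' strjoin(pairs, ' ')];
end

% Run the batches:
% for k = 1:numel(cmds)
%     status = system(cmds{k});
% end
```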

Please also note that MATLAB is not the right tool for this job. This would be easier in Python, for example, which has many libraries for making HTTP requests.

Context

StackExchange Code Review Q#38205, answer score: 4
