
Finding travel distance between airports

Submitted by: @import:stackexchange-codereview

Problem

I have two nested for loops that I want to get rid of. Any thoughts?
I am calculating the distance between cities based on their longitude and latitude, using a custom function, earth.dist(), inside the loop.

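# initialise the counter and the result vectors (not shown in the original snippet)
k <- 0
airport1 <- c()
airport2 <- c()
travdist <- c()

for (i in 1:nrow(dat)) {
  # for each other airport
  for (j in 1:nrow(dat)) {
    # if both airports are different
    if (dat[i, 3] != dat[j, 3]) {
      k <- k + 1
      # airport1
      airport1[k] <- dat[i, 3]
      # airport2
      airport2[k] <- dat[j, 3]
      # find travel distance
      travdist[k] <- earth.dist(dat[i, 5], dat[i, 4], dat[j, 5], dat[j, 4])
    }
  }
}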


Function for distance calculation:

earth.dist <- function (lon1, lat1, lon2, lat2){
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}

Solution

First, let's download some data similar to yours (I assume). This CSV, available online, has almost 7,000 airports:

url <- "https://commondatastorage.googleapis.com/ckannet-storage/2012-07-09T214020/global_airports.csv"
library(RCurl)
txt <- getURL(url)
data <- read.csv(textConnection(txt), stringsAsFactors = FALSE)


For illustration purposes, we will use a small sample: the six airports in Jamaica.

dat <- subset(data, country == "Jamaica",
              c("city", "country", "name", "latitude", "longitude"))
dat
#              city country               name latitude longitude
# 1745    Ocho Rios Jamaica           Boscobel 18.40425 -76.96902
# 1746     Kingston Jamaica Norman Manley Intl 17.93567 -76.78750
# 1747  Montego Bay Jamaica      Sangster Intl 18.50372 -77.91336
# 1748 Port Antonio Jamaica          Ken Jones 18.19881 -76.53453
# 1749     Kingston Jamaica         Tinson Pen 17.98856 -76.82376
# 5878       Negril Jamaica   Negril Aerodrome 18.34000 -78.33556


Now let's have a look at your code. I will not review the math in earth.dist; I'll assume it is correct. One beautiful thing about that function is that it is vectorized, i.e., you can give it vectors of length n as inputs and it will compute n distances in a single call. Unfortunately, the rest of your code does not take advantage of this: your double loop calls earth.dist with scalars only, once per pair of airports.
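To make the "vectorized" point concrete, here is a minimal sketch. It repeats earth.dist from the question so that it runs on its own, and it hard-codes the rounded coordinates of three of the Jamaican airports listed above (the variable names lat, lon and d are just for illustration). One call returns the distances from Boscobel to all three airports, itself included:

```r
# Same function as in the question, repeated so this snippet is self-contained.
earth.dist <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat / 2))^2 + cos(a1) * cos(b1) * (sin(dlon / 2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  R * c
}

# Rounded coordinates of Boscobel, Norman Manley Intl and Sangster Intl.
lat <- c(18.40425, 17.93567, 18.50372)
lon <- c(-76.96902, -76.78750, -77.91336)

# The first point is a scalar and gets recycled against the three-element vectors,
# so a single call yields three distances.
d <- earth.dist(lon[1], lat[1], lon, lat)
length(d)  # 3
```

The first element of d is the distance from Boscobel to itself, which is 0.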

Instead of a double loop, you should be using the outer function. Have a look at the documentation (?outer) if you are not familiar with it. The typical usage is outer(X, Y, FUN), where X and Y are vectors and FUN is a vectorized function. The output is a matrix Z where Z[i, j] is the result of FUN(X[i], Y[j]). But what's brilliant about outer is that it does not call FUN once per entry of Z (length(X) * length(Y) times). No, it calls it only once, on two long vectors that together enumerate every (X[i], Y[j]) pair. How? Because FUN is vectorized (a requirement) and outer knows how to take advantage of it.
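A toy check of that claim, using a made-up function (add.counted is a hypothetical name, not part of the original code) that counts its own invocations:

```r
calls <- 0

# A trivially vectorized function that records how often it is invoked.
add.counted <- function(x, y) {
  calls <<- calls + 1
  x + y
}

z <- outer(1:3, 1:4, FUN = add.counted)

dim(z)   # 3 4
z[2, 3]  # 5, i.e. add.counted(2, 3)
calls    # 1: outer invoked the function once for all 12 entries
```

Inside that single call, x and y are both vectors of length 12, which is why FUN has to be vectorized.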

So, here is how we can massage your data a bit so we can use outer. First, remember that outer works on the pairwise combinations of elements from two vectors. In our case, we could use the names of the airports:

airport.names <- dat$name


so we will be calling outer(airport.names, airport.names, FUN = airport.dist). All that is left is to write airport.dist: a vectorized function that will take as inputs two vectors of airport names and return their distances. We could first put the important data in a matrix with airport names as row names for easy access:

dat.mat <- as.matrix(dat[, c("latitude", "longitude")])
rownames(dat.mat) <- airport.names


Then define:

airport.dist <- function(name1, name2, data = dat.mat) {
    lon1 <- data[name1, "longitude"]
    lat1 <- data[name1, "latitude"]
    lon2 <- data[name2, "longitude"]
    lat2 <- data[name2, "latitude"]
    return(earth.dist(lon1, lat1, lon2, lat2))
}


Then run outer:

dist.mat <- outer(airport.names, airport.names, FUN = airport.dist)


and attach names to the columns and rows:

dimnames(dist.mat) <- list(airport.names, airport.names)

#                     Boscobel Norman Manley Intl Sangster Intl Ken Jones Tinson Pen Negril Aerodrome
# Boscobel             0.00000           55.58311     100.33081  51.30027   48.75737        144.54554
# Norman Manley Intl  55.58311            0.00000     134.79830  39.68379    7.02926        169.83797
# Sangster Intl      100.33081          134.79830       0.00000 149.58621  128.67955         48.17112
# Ken Jones           51.30027           39.68379     149.58621   0.00000   38.52863        191.03056
# Tinson Pen          48.75737            7.02926     128.67955  38.52863    0.00000        164.62138
# Negril Aerodrome   144.54554          169.83797      48.17112 191.03056  164.62138          0.00000
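One caveat with looking rows up by name: it relies on the airport names being unique, which holds for the six Jamaican airports but is an assumption for the full data set. A variant that sidesteps this, sketched here as an alternative rather than as part of the approach above, passes row indices to outer instead of names. The snippet repeats earth.dist and hard-codes three rounded coordinate pairs so it runs on its own; idx and dist.by.index are illustrative names.

```r
# Same function as in the question, repeated so this snippet is self-contained.
earth.dist <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat / 2))^2 + cos(a1) * cos(b1) * (sin(dlon / 2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  R * c
}

# Rounded coordinates of Boscobel, Norman Manley Intl and Sangster Intl.
lat <- c(18.40425, 17.93567, 18.50372)
lon <- c(-76.96902, -76.78750, -77.91336)

# outer enumerates index pairs; the anonymous function turns them into coordinates.
idx <- seq_along(lat)
dist.by.index <- outer(idx, idx, FUN = function(i, j) {
  earth.dist(lon[i], lat[i], lon[j], lat[j])
})

dim(dist.by.index)  # 3 3
```

Row and column names can still be attached afterwards with dimnames, exactly as above.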


If you need to convince yourself that earth.dist was called a single time, you could add a cat("HELLO\n") somewhere inside its body (I did!). Because earth.dist is called only once, on whole vectors, the computation avoids one R-level function call per pair of airports and should be much faster than the double loop.

Finally, if you want to store the distances in a three column (airport1, airport2, distance) data.frame rather than a matrix, you can do:

d <- dist.mat
dist.df <- data.frame(airport1 = rownames(d)[row(d)],
                      airport2 = colnames(d)[col(d)],
                      distance = c(d))
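Note that this data.frame also contains each airport paired with itself (distance 0), whereas the double loop in the question skipped those pairs. A small sketch of dropping them, using a made-up 3 x 3 distance matrix with hypothetical airports A, B and C so that it runs on its own:

```r
# Hypothetical symmetric distance matrix, for illustration only.
d <- matrix(c(0, 5, 7,
              5, 0, 3,
              7, 3, 0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

dist.df <- data.frame(airport1 = rownames(d)[row(d)],
                      airport2 = colnames(d)[col(d)],
                      distance = c(d))

# Keep only pairs of different airports, as the original loop did.
dist.df <- dist.df[dist.df$airport1 != dist.df$airport2, ]

nrow(dist.df)  # 6, i.e. 3 * 3 pairs minus the 3 self-pairs
```

The rows come out in column-major order (airport1 varies fastest), which differs from the order produced by the original loop; sort the result if the order matters.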


I hope it helps! Don't hesitate to comment below if you have questions.

Context

StackExchange Code Review Q#135387, answer score: 5
