Finding travel distance between airports
Problem
I have two nested for loops that I want to get rid of. Any thoughts?
I am calculating distances between cities based on their longitude and latitude. There is a custom function earth.dist() that I am using in the loop:
# counter and result vectors (initialization not shown in the original question)
k <- 0
airport1 <- airport2 <- character(0)
travdist <- numeric(0)
# for each airport
for (i in 1:nrow(dat)) {
  # for each other airport
  for (j in 1:nrow(dat)) {
    # if both airports are different
    if (dat[i, 3] != dat[j, 3]) {
      k <- k + 1
      # airport1
      airport1[k] <- dat[i, 3]
      # airport2
      airport2[k] <- dat[j, 3]
      # find travel distance
      travdist[k] <- earth.dist(dat[i, 5], dat[i, 4], dat[j, 5], dat[j, 4])
    }
  }
}
Function for distance calculation:
earth.dist <- function (lon1, lat1, lon2, lat2){
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}
Solution
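The whole answer rests on one property of earth.dist as written in the question: it is vectorized. A quick check, as a minimal sketch: the function is copied unchanged from the question, and the two coordinate pairs are the rounded values printed for the Jamaican airports further down (Boscobel to Norman Manley Intl, and Norman Manley Intl to Tinson Pen). One call returns both distances:

```r
# earth.dist, copied from the question (haversine formula, radius in km)
earth.dist <- function (lon1, lat1, lon2, lat2){
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}

# Two coordinate pairs (rounded values from the printed Jamaican airports):
# Boscobel -> Norman Manley Intl, and Norman Manley Intl -> Tinson Pen
lon1 <- c(-76.96902, -76.78750)
lat1 <- c(18.40425, 17.93567)
lon2 <- c(-76.78750, -76.82376)
lat2 <- c(17.93567, 17.98856)

# A single call with length-2 vectors returns a length-2 vector of distances
dists <- earth.dist(lon1, lat1, lon2, lat2)
print(dists)
```

Calling the function with scalars, once per pair, is what the double loop does; outer is simply a convenient way to build the long vectors for all pairs so that earth.dist runs once.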
First, let's download some data that (I assume) is similar to yours. This CSV, available online, has almost 7,000 airports:
url <- "https://commondatastorage.googleapis.com/ckannet-storage/2012-07-09T214020/global_airports.csv"
library(RCurl)
# download the file as a single character string
txt <- getURL(url)
# parse it as CSV, keeping text columns as character rather than factor
data <- read.csv(textConnection(txt), stringsAsFactors = FALSE)
For illustration purposes, we will use a small sample: the six airports in Jamaica.
dat <- subset(data, country == "Jamaica",
              c("city", "country", "name", "latitude", "longitude"))
dat
# city country name latitude longitude
# 1745 Ocho Rios Jamaica Boscobel 18.40425 -76.96902
# 1746 Kingston Jamaica Norman Manley Intl 17.93567 -76.78750
# 1747 Montego Bay Jamaica Sangster Intl 18.50372 -77.91336
# 1748 Port Antonio Jamaica Ken Jones 18.19881 -76.53453
# 1749 Kingston Jamaica Tinson Pen 17.98856 -76.82376
# 5878 Negril Jamaica Negril Aerodrome 18.34000 -78.33556
Now let's have a look at your code. I will not review the math in earth.dist; I'll assume it is correct. One beautiful thing about that function is that it is vectorized, i.e., you can give it n-long vectors as inputs and it will compute n distances in a single call. Unfortunately, the rest of your code does not take advantage of this: your double loop calls earth.dist with scalars, once per pair of airports.
Instead of a double loop, you should be using the outer function. Have a look at the documentation (?outer) if you are not familiar with it. The typical usage is outer(X, Y, FUN), where X and Y are vectors and FUN is a vectorized function. The output is a matrix Z where Z[i, j] is the result of FUN(X[i], Y[j]). But what's brilliant about outer is that it does not call FUN as many times as there are entries in Z (length(X) * length(Y)). It calls it only once. How? Because FUN is vectorized (a requirement), outer can expand X and Y into two vectors covering all length(X) * length(Y) combinations and pass them to FUN in a single call.
So, here is how we can massage your data a bit so we can use outer. First, remember that outer works on the pairwise combinations of two vectors. In our case, we could use the names of the airports:
airport.names <- dat$name
so we will be calling outer(airport.names, airport.names, FUN = airport.dist). All that is left is to write airport.dist: a vectorized function that takes as inputs two vectors of airport names and returns their distances. We could first put the important data in a matrix with airport names as row names for easy access:
dat.mat <- as.matrix(dat[, c("latitude", "longitude")])
rownames(dat.mat) <- airport.names
Then define:
airport.dist <- function(name1, name2, data = dat.mat) {
  # look up coordinates by row name; this works element-wise on vectors of names
  lon1 <- data[name1, "longitude"]
  lat1 <- data[name1, "latitude"]
  lon2 <- data[name2, "longitude"]
  lat2 <- data[name2, "latitude"]
  return(earth.dist(lon1, lat1, lon2, lat2))
}
Then run outer:
dist.mat <- outer(airport.names, airport.names, FUN = airport.dist)
and attach names to the columns and rows:
dimnames(dist.mat) <- list(airport.names, airport.names)
# Boscobel Norman Manley Intl Sangster Intl Ken Jones Tinson Pen Negril Aerodrome
# Boscobel 0.00000 55.58311 100.33081 51.30027 48.75737 144.54554
# Norman Manley Intl 55.58311 0.00000 134.79830 39.68379 7.02926 169.83797
# Sangster Intl 100.33081 134.79830 0.00000 149.58621 128.67955 48.17112
# Ken Jones 51.30027 39.68379 149.58621 0.00000 38.52863 191.03056
# Tinson Pen 48.75737 7.02926 128.67955 38.52863 0.00000 164.62138
# Negril Aerodrome 144.54554 169.83797 48.17112 191.03056 164.62138 0.00000
If you need to convince yourself that earth.dist was called a single time, you could add a cat("HELLO\n") somewhere inside its body (I did!). Since earth.dist is called only once, all the arithmetic runs on whole vectors instead of inside an R-level loop, which is what makes this approach fast.
Finally, if you want to store the distances in a three-column (airport1, airport2, distance) data.frame rather than a matrix, you can do:
d <- dist.mat
dist.df <- data.frame(airport1 = rownames(d)[row(d)],
                      airport2 = colnames(d)[col(d)],
                      distance = c(d))
I hope this helps! Don't hesitate to comment below if you have questions.
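A small addendum: the data.frame above also contains each airport paired with itself (distance 0), which your loop skipped. Below is a minimal, self-contained sketch that does not depend on the CSV download. It types in the six Jamaican airports from the printed output (coordinates rounded as displayed) and runs outer over row indices rather than names. This is a variant of the approach above that would also tolerate duplicate airport names. It then drops the self-pairs, so the result has the same 30 rows your airport1/airport2/travdist vectors would hold:

```r
# earth.dist as defined in the question (haversine formula, radius in km)
earth.dist <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad
  a2 <- lon1 * rad
  b1 <- lat2 * rad
  b2 <- lon2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat / 2))^2 + cos(a1) * cos(b1) * (sin(dlon / 2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  R * c
}

# The six Jamaican airports, typed in from the printed subset
# (coordinates rounded to 5 decimals, as displayed)
dat <- data.frame(
  name = c("Boscobel", "Norman Manley Intl", "Sangster Intl",
           "Ken Jones", "Tinson Pen", "Negril Aerodrome"),
  latitude  = c(18.40425, 17.93567, 18.50372, 18.19881, 17.98856, 18.34000),
  longitude = c(-76.96902, -76.78750, -77.91336, -76.53453, -76.82376, -78.33556),
  stringsAsFactors = FALSE
)

# outer() over row indices: i and j arrive as long integer vectors,
# so earth.dist is still called only once
idx <- seq_len(nrow(dat))
dist.mat <- outer(idx, idx, function(i, j)
  earth.dist(dat$longitude[i], dat$latitude[i],
             dat$longitude[j], dat$latitude[j]))
dimnames(dist.mat) <- list(dat$name, dat$name)

# Long format, then keep only pairs of different airports
# (the question's loop skipped identical airports in the same way)
dist.df <- data.frame(airport1 = rownames(dist.mat)[row(dist.mat)],
                      airport2 = colnames(dist.mat)[col(dist.mat)],
                      distance = c(dist.mat),
                      stringsAsFactors = FALSE)
dist.df <- dist.df[dist.df$airport1 != dist.df$airport2, ]
print(dist.df)
```

Both directions of each pair are kept, as in the original loop; if you only need each pair once, filtering on row(dist.mat) < col(dist.mat) before building the data.frame would halve it.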
Context
StackExchange Code Review Q#135387, answer score: 5