Azure virtual machine stalls in R due to parallel package
Problem
I am writing an R package with tests using the
`testthat` package. The tests pass locally and on Travis.

I want to plot the benefit of parallelisation on up to 24 cores, so I set up a virtual machine programmatically on Azure:

    az vm create \
      --resource-group <resource group> \
      --name <name> \
      --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:18.12.01 \
      --size Standard_NV24 \
      --admin-username <azure user> \
      --generate-ssh-keys

I call `devtools::test()` and the virtual machine gets stuck at testthat for hours with 0% CPU usage:

    ✔  checking for unstated dependencies in ‘tests’ ...
    ─  checking tests ...
       Running ‘testthat.R’

`devtools::test()` has no specific arguments to print some output, and it calls `testthat::test_dir()`, which also has no arguments to print output.

When I remove the tests, the code stalls in a function call. I stopped and relaunched multiple times, so I think it's a problem with Azure.

Other sizes of machines, `Standard_F16s_v2` and `Standard DS3 v2`, have the same problem.

How can I fix it?
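Since the test run prints nothing while it hangs, one way to at least bound the wait is to wrap the call in a time limit. This is a minimal sketch, assuming GNU `timeout` is available on the VM; the helper name `run_with_timeout` and the 600-second limit are made up for illustration and are not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a wall-clock limit (in seconds).
# GNU timeout exits with status 124 when the limit is reached.
run_with_timeout() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Example call on the VM (assumes R and devtools are installed and that the
# working directory is the package root):
#   run_with_timeout 600 Rscript -e 'devtools::test()'
```

A status of 124 then separates "the run hung" from "the tests failed", which is useful when the machine otherwise sits idle at 0% CPU.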
Update
I stumbled upon the culprit when I was running another piece of code. The VM hangs when calling the R `parallel` package on multiple cores (it works fine on just one core).

I added some printouts around that call:

    [1] "num_cores = 2"
    [1] "Entering parallel"
    [1] "Drawing 1"
    [1] "Drawing 2"

and then it hangs. Following a suggestion to use `strace`, I launched again and see this output around this last printout:

```
[pid 8571] 21:41:48.718625 write(1, "\n", 1[1] "num_cores = 2"
) = 1
[pid 8571] 21:41:48.719486 write(1, "[1]", 3) = 3
[pid 8571] 21:41:48.719550 write(1, " \"num_cores = 2\"", 16) = 16
[pid 8571] 21:41:48.719615 write(1, "\n", 1) = 1
[pid 8703] 21:41:48.730707 futex(0x498fcf4, FUTEX_WAIT_PRIVATE, 4032, NULL
[pid 8705] 21:41:48.730745 futex(0x498fcf4, FUTEX_WAIT_PRIVATE, 4032, NULL
[pid 8701] 21:41:48.730757 futex(0x498fcf4, FUTEX_WAIT_PRIVATE, 4032, NULL
[pid 8704] 21:41:48.730794 futex(0x49
```
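In the trace above, several PIDs enter a `futex(..., FUTEX_WAIT_PRIVATE, ...)` call with no return value printed, meaning those calls had not returned at that point in the trace. The following is a rough sketch for pulling those PIDs out of a saved trace. The helper name `pids_in_futex_wait` and the file name `trace.log` are hypothetical, and the sketch assumes the log was captured from strace's stderr so that lines carry the `[pid N]` prefix shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the PIDs that appear in futex wait calls in a
# strace log whose lines start with a "[pid N]" prefix.
pids_in_futex_wait() {
  grep -E 'futex\(.*FUTEX_WAIT' "$1" \
    | sed -nE 's/^\[pid +([0-9]+)\].*/\1/p' \
    | sort -nu
}

# Capturing the log (assumes strace is installed; <pid> is the R process):
#   sudo strace -f -tt -p <pid> 2> trace.log
#   pids_in_futex_wait trace.log
```

Here `-f` follows child processes and `-tt` adds the microsecond timestamps seen in the output above.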
Solution
The problem is with the image
`microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:18.12.01` ("Linux Data Science VM" on the Azure portal). This is an old image with R 3.4 instead of the release version of 3.5.

I succeeded with the following virtual machine (Ubuntu Server 19.04.19, size D64 v3):

    az vm create \
      --resource-group <resource group> \
      --name <name> \
      --image Canonical:UbuntuServer:19.04:19.04.201906280 \
      --size Standard_D64_v3 \
      --admin-username <azure user> \
      --generate-ssh-keys

and installed R, Rstan, and an SSL library for curl and devtools with:

    sudo apt update
    sudo apt -y install r-base
    sudo apt -y install r-cran-rstan
    # Add LibSSL for installing curl and devtools, see:
    # https://stackoverflow.com/questions/44228055/r-rstudio-install-devtools-fails
    sudo apt-get install libcurl4-openssl-dev libssl-dev

and the machine can use multiple cores:

    Welcome to PosteriorBootstrap, a parallel approach for adaptive non-parametric learning
    [1] "Speedup performance"
    [1] "n_bootstrap = 100"
    [1] "num_cores = 1"
    [1] "Finished sampling"
    [1] "num_cores = 2"
    [1] "Finished sampling"
    ...
    [1] "num_cores = 64"
    [1] "Finished sampling"

I also checked the duration and the speedup is as I expected.
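Because the answer points at the R version shipped with the image, a quick check on a fresh VM can confirm which R it has before a long run starts. This is only a sketch: the helper `version_at_least` is hypothetical, the 3.5.0 threshold is taken from the versions mentioned above, and it assumes GNU `sort` (for `-V`) and `nproc` are present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $2 is at least version $1.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
  local minimum="$1"
  local actual="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Only runs where R is installed; otherwise the check is skipped.
if command -v Rscript >/dev/null 2>&1; then
  r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
  if version_at_least 3.5.0 "$r_version"; then
    echo "R $r_version is at least 3.5.0"
  else
    echo "R $r_version is older than 3.5.0" >&2
  fi
  echo "cores visible to this machine: $(nproc)"
fi
```

The check reports the version and core count only; it does not by itself show that the hang is caused by the older R.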
Code Snippets
    az vm create \
      --resource-group <resource group> \
      --name <name> \
      --image Canonical:UbuntuServer:19.04:19.04.201906280 \
      --size Standard_D64_v3 \
      --admin-username <azure user> \
      --generate-ssh-keys

    sudo apt update
    sudo apt -y install r-base
    sudo apt -y install r-cran-rstan
    # Add LibSSL for installing curl and devtools, see:
    # https://stackoverflow.com/questions/44228055/r-rstudio-install-devtools-fails
    sudo apt-get install libcurl4-openssl-dev libssl-dev

    Welcome to PosteriorBootstrap, a parallel approach for adaptive non-parametric learning
    [1] "Speedup performance"
    [1] "n_bootstrap = 100"
    [1] "num_cores = 1"
    [1] "Finished sampling"
    [1] "num_cores = 2"
    [1] "Finished sampling"
    ...
    [1] "num_cores = 64"
    [1] "Finished sampling"

Context
StackExchange DevOps Q#8064, answer score: 1