How do I run ansible on one host at a time and break on a failure
Problem
I've got an Ansible playbook where I want to update a number of flaky devices in sequence. I can use serial: 1, but I want to stop the playbook altogether if I get a failure, so I can fix it before proceeding instead of accumulating errors.

I'd also like to restart the playbook at the same host I stopped on. I'm currently using Ansible v2.0, but can switch to a newer version if that sort of feature is only available in newer versions.
Solution
Your playbook will stop when a failure occurs and you're using serial: 1, according to the documentation:

"By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed."

That said, there seems to be some confusion in the community over the default behavior, and it seems to have changed (or been buggy) somewhere between 1.8 and 2.1.

So, if serial: 1 doesn't suffice, use this additional setting:

    max_fail_percentage: 0

"In some situations, such as with the rolling updates described above, it may be desirable to abort the play when a certain threshold of failures have been reached. To achieve this, as of version 1.3 you can set a maximum failure percentage..."
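For illustration, here is a minimal sketch of a play that combines both settings. The group name flaky_devices and the update command path are hypothetical placeholders, not taken from the question; only serial and max_fail_percentage are the keywords under discussion.

```yaml
---
# Sketch only: the host group and the update command are hypothetical.
- hosts: flaky_devices
  serial: 1                 # run the play against one host per batch
  max_fail_percentage: 0    # abort the play if any host in the current batch fails
  tasks:
    - name: Update the device
      command: /usr/local/bin/update-device   # placeholder for the real update step
```

With a batch size of one, a single failed host is 100% of the batch, which exceeds the 0% threshold, so the play should abort before moving on to the next host.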
As for retrying your playbook, you should be seeing a failure message like this:

    to retry, use: --limit @/home/user/site.retry

Use that --limit flag on your next execution of ansible-playbook and it will continue from where it failed.

Retry files will be created unless you've set retry_files_enabled = False in your configuration.

Alternatively, --start-at-task may also work.

Sources:
https://github.com/ansible/ansible/issues/1663
https://github.com/ansible/ansible/issues/16241
http://docs.ansible.com/ansible/playbooks_delegation.html#rolling-update-batch-size
http://docs.ansible.com/ansible/playbooks_delegation.html#maximum-failure-percentage
http://docs.ansible.com/ansible/intro_configuration.html#retry-files-enabled
http://docs.ansible.com/ansible/playbooks_startnstep.html#start-at-task
Context
StackExchange DevOps Q#246, answer score: 16