Drive the docker stack
Configure the compose lifecycle, kill processes mid-flight, and read container logs as probes.
Goal: use the docker provider as both the stack lifecycle manager and a
process-fault arsenal.
Configure once
providers:
  docker:
    config:
      composeFiles: [assets/stack.yml]
      project: chaos-run
Compose file paths resolve against the project root, not the process cwd.
The docker provider is the lifecycle provider (it implements up/down).
Exactly one provider may hold that role per scenario, and it powers the
default teardown.
Bring services up, and wait for health
setup:
  - run: docker.up
    with: [postgres, app, worker-a]
docker.up runs compose up -d --wait: it returns when health checks pass,
so give your services healthcheck: blocks and setup stays race-free.
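As a sketch of what such healthcheck: blocks can look like in assets/stack.yml, the fragment below uses standard Compose syntax. The images, the pg_isready test, the /healthz endpoint, and the timing values are assumptions for illustration, not taken from this project.

```yaml
# assets/stack.yml (illustrative fragment; images, commands, and timings are assumptions)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder credential for a throwaway test stack
    healthcheck:
      # pg_isready ships in the official postgres image and exits 0
      # once the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 2s
      retries: 15

  app:
    image: example/app:latest      # hypothetical image
    depends_on:
      postgres:
        condition: service_healthy # start app only after postgres reports healthy
    healthcheck:
      # assumes the app serves a health endpoint on port 8080
      # and that the image includes curl
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 2s
      timeout: 2s
      retries: 15
```

A service without a healthcheck: block only has to reach the running state for --wait to return, which is usually earlier than the point at which it can serve traffic.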
Process faults
- run: docker.kill      # SIGKILL, no goodbye
  with: worker-a
- run: docker.stop      # SIGTERM, graceful shutdown path
  with: worker-a
- run: docker.pause     # SIGSTOP-like freeze: alive but unresponsive
  with: worker-a
- run: docker.unpause
  with: worker-a
- run: docker.start     # resurrection
  with: worker-a
kill versus stop is not cosmetic: SIGKILL tests crash recovery, SIGTERM
tests your shutdown hooks. A system can pass one and fail the other.
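A graceful stop only exercises shutdown hooks for as long as the container is allowed to keep running. Assuming docker.stop maps onto Compose's normal stop behaviour (not confirmed by this guide), Compose sends the stop signal, waits for the grace period (10 seconds by default), and then sends SIGKILL. A sketch of making that window explicit for worker-a; the image name and the 30s value are illustrative:

```yaml
# assets/stack.yml (illustrative fragment)
services:
  worker-a:
    image: example/worker:latest   # hypothetical image
    stop_signal: SIGTERM           # the Compose default, written out for clarity
    stop_grace_period: 30s         # time allowed for shutdown hooks before SIGKILL (default 10s)
```

If the hooks need longer than the grace period, a stop silently degrades into a kill, and the two fault types stop testing different things.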
Network partition
docker.disconnect severs one container from the network while its process
keeps running. This is a distinct failure mode from killing or pausing the
container: peers see connection refusals (or hangs and timeouts) rather than a
dead process, and anything co-located on that service stays up. docker.connect
restores the link:
- run: docker.disconnect   # cut the network path, process stays alive
  with: app
- run: docker.connect      # reconnect; the service-name DNS alias is restored
  with: app
Both steps target a single network, defaulting to the compose default network
(<project>_default). Pass network: to target another. To isolate a
multi-network container fully, disconnect it from each of its networks:
- run: docker.disconnect
  with:
    service: app
    network: chaos-run_backend
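A sketch of fully isolating app and then restoring it. It assumes app is attached to both the default network and a backend network, and that the project name is chaos-run, which gives the full network names chaos-run_default and chaos-run_backend:

```yaml
# Assumption: app sits on exactly two networks, default and backend.
- run: docker.disconnect
  with:
    service: app
    network: chaos-run_default   # the compose default network, named explicitly
- run: docker.disconnect
  with:
    service: app
    network: chaos-run_backend

# ... steps that observe the partition go here ...

- run: docker.connect
  with:
    service: app
    network: chaos-run_default
- run: docker.connect
  with:
    service: app
    network: chaos-run_backend
```

Reconnecting every network that was cut keeps later steps from running against a half-isolated container.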
Inspect runtime state
docker.exec is a probe that runs a command inside a running container and
returns its stdout, so a scenario can baseline a resource count, inject churn,
then assert it has not drifted. Keep the command read-only:
- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"   # thread count
  as: threads_before
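The baseline, churn, re-probe pattern described above might be laid out as follows. This is a sketch: the choice of churn (bouncing worker-a) and the name threads_after are illustrative, and the final comparison is left as a comment because this guide does not show the assertion syntax.

```yaml
# 1. Baseline: thread count of PID 1 inside app
- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"
  as: threads_before

# 2. Churn: bounce a service that app talks to (illustrative choice)
- run: docker.kill
  with: worker-a
- run: docker.start
  with: worker-a

# 3. Re-probe with the same read-only command
- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"
  as: threads_after

# 4. Compare threads_before and threads_after with your scenario's
#    assertion step (syntax not covered in this guide).
```

Running the identical command for both probes keeps the two values directly comparable.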
Logs as an event gate
docker.logs is a probe; combine it with wait_until to gate faults on
what the service says rather than on time:
- run: wait_until
  with:
    probe:
      run: docker.logs
      with: worker-a
    matches: "stream started"
    timeout: 30
- run: docker.kill
  with: worker-a
Teardown semantics
With no teardown: section, Shinari runs the lifecycle provider’s down
(compose down -v --remove-orphans). An explicit teardown: section replaces
that default rather than extending it, so if you add steps (e.g.
toxiproxy.reset), add docker.down yourself:
teardown:
  - run: toxiproxy.reset
  - run: docker.down