SECTOR 02 // HOW-TO GUIDES

Drive the docker stack

Configure the compose lifecycle, kill processes mid-flight, and read container logs as probes.

Goal: use the docker provider both as the stack lifecycle provider and as a process-fault arsenal.

Configure once

providers:
  docker:
    config:
      composeFiles: [assets/stack.yml]
      project: chaos-run

Compose file paths resolve against the project root, not the process cwd. The docker provider is the lifecycle provider (it implements up/down). Exactly one provider may hold that role per scenario, and it powers the default teardown.

Bring services up, and wait for health

setup:
  - run: docker.up
    with: [postgres, app, worker-a]

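docker.up runs compose up -d --wait, which returns once the listed services are running and, where a health check is defined, healthy. A service without a healthcheck: block only has to be running, not ready, so give every service a healthcheck: block and setup stays race-free.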
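
As an illustration of what such healthcheck: blocks could look like in assets/stack.yml, here is a minimal sketch. The image tags, the password, the port, and the /healthz endpoint are assumptions made for the example, not part of the guide's stack. The app check also assumes curl is installed in the image.

```yaml
# Hypothetical excerpt of assets/stack.yml; adjust images and checks to your stack.
services:
  postgres:
    image: postgres:16              # assumed image tag
    environment:
      POSTGRES_PASSWORD: example    # placeholder credential for local runs only
    healthcheck:
      # pg_isready ships with the official postgres image
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
  app:
    image: example/app:latest       # hypothetical image
    depends_on:
      postgres:
        condition: service_healthy  # start app only after postgres reports healthy
    healthcheck:
      # assumed HTTP health endpoint; requires curl inside the container
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 5s
      timeout: 3s
      retries: 10
```

With checks like these in place, compose up --wait blocks until both services report healthy, which is the condition docker.up relies on.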

Process faults

- run: docker.kill      # SIGKILL, no goodbye
  with: worker-a
- run: docker.stop      # SIGTERM, graceful shutdown path
  with: worker-a
- run: docker.pause     # SIGSTOP-like freeze: alive but unresponsive
  with: worker-a
- run: docker.unpause
  with: worker-a
- run: docker.start     # resurrection
  with: worker-a

kill versus stop is not cosmetic: SIGKILL tests crash recovery, SIGTERM tests your shutdown hooks. A system can pass one and fail the other.

Network partition

docker.disconnect severs one container from the network while its process keeps running. This is a distinct failure mode from killing or pausing the container: peers see connection refusals (or hangs and timeouts) instead of a dead process, and anything co-located on that service stays up. docker.connect restores the link:

- run: docker.disconnect   # cut the network path, process stays alive
  with: app
- run: docker.connect      # reconnect; the service-name DNS alias is restored
  with: app

Both steps target a single network, which defaults to the compose default network (<project>_default). Pass network: to target another; to isolate a multi-network container fully, disconnect it from each of its networks:

- run: docker.disconnect
  with:
    service: app
    network: chaos-run_backend
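
A sketch of full isolation for a two-network container, using only the steps shown above. It assumes app is attached to both the compose default network and a second network named backend, and that the project is chaos-run, so the network names are chaos-run_default and chaos-run_backend:

```yaml
# Assumption: app is attached to both chaos-run_default and chaos-run_backend.
- run: docker.disconnect   # no network given, so the default network is used
  with: app
- run: docker.disconnect   # cut the second network as well
  with:
    service: app
    network: chaos-run_backend
# ... steps that observe the fully isolated service go here ...
- run: docker.connect      # restore the default network
  with: app
- run: docker.connect      # restore the second network
  with:
    service: app
    network: chaos-run_backend
```

Disconnecting only one of the two networks would leave app reachable over the other, which models a partial partition rather than full isolation.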

Inspect runtime state

docker.exec is a probe that runs a command inside a running container and returns its stdout, so a scenario can baseline a resource count, inject churn, then assert it has not drifted. Keep the command read-only:

- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"   # thread count
  as: threads_before
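
The baseline-then-churn pattern described above might look like the following sketch. The churn here is a kill and restart of worker-a, chosen only as an example, and threads_after is a hypothetical capture name. The step that compares the two captured values is omitted because this guide does not cover assertion syntax.

```yaml
- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"   # thread count before churn
  as: threads_before
- run: docker.kill                        # example churn: crash a peer
  with: worker-a
- run: docker.start                       # bring the peer back
  with: worker-a
- run: docker.exec
  with:
    service: app
    command: "ls /proc/1/task | wc -l"   # same read-only command after churn
  as: threads_after                       # hypothetical name
```

Running the identical command for both samples keeps the comparison meaningful: any difference reflects the service, not the probe.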

Logs as an event gate

docker.logs is a probe; combine it with wait_until to gate faults on what the service says rather than on time:

- run: wait_until
  with:
    probe:
      run: docker.logs
      with: worker-a
    matches: "stream started"
    timeout: 30
- run: docker.kill
  with: worker-a

Teardown semantics

With no teardown: section, Shinari runs the lifecycle provider’s down (compose down -v --remove-orphans). An explicit teardown: section replaces that default rather than extending it, so if you add steps (e.g. toxiproxy.reset), add docker.down yourself:

teardown:
  - run: toxiproxy.reset
  - run: docker.down