Showing content from https://github.com/moby/moby/issues/30820 below:

Swarm Mode at Scale · Issue #30820 · moby/moby · GitHub

Description

I'm in the process of migrating an LXC setup to a Docker Swarm environment. The setup consists of 50 nodes with about 10k services. The 50-node cluster itself (32GB RAM per node) seems to work as expected, but when I start services, I get stuck at around 500 - 700 services. After that point, docker service ls shows entries like:

rxlga525wqjd <service-name> replicated 0/1 <container>

and docker service ps <service-name> shows:

ixoggohaa7kc  <service-name>      <container>  prd-pro-16  Running        Running 1 second ago                                     
p0gziroccv42   \_ <service-name>  <container>  prd-pro-24  Shutdown       Failed 12 seconds ago  "starting container failed: co…"  
j0fvbve416gz   \_ <service-name>  <container>  prd-pro-35  Shutdown       Failed 2 minutes ago   "starting container failed: co…"  
873vsi6c80vx   \_ <service-name>  <container>  prd-pro-24  Shutdown       Failed 4 minutes ago   "starting container failed: co…"  
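To check whether the failed tasks cluster on particular nodes, rows like the ones above can be tallied by node. This is a minimal sketch that assumes the default table layout, where columns are separated by two or more spaces and the column order is ID, NAME, IMAGE, NODE, DESIRED STATE, CURRENT STATE, ERROR; the sample rows are copied from the report, and the function name is hypothetical.

```python
import re
from collections import Counter

# Rows copied from the `docker service ps <service-name>` output above
# (the header line was not included in the report).
SAMPLE = r"""
ixoggohaa7kc  <service-name>      <container>  prd-pro-16  Running        Running 1 second ago
p0gziroccv42   \_ <service-name>  <container>  prd-pro-24  Shutdown       Failed 12 seconds ago  "starting container failed: co…"
j0fvbve416gz   \_ <service-name>  <container>  prd-pro-35  Shutdown       Failed 2 minutes ago   "starting container failed: co…"
873vsi6c80vx   \_ <service-name>  <container>  prd-pro-24  Shutdown       Failed 4 minutes ago   "starting container failed: co…"
"""


def failures_by_node(table: str) -> Counter:
    """Count tasks whose CURRENT STATE starts with "Failed", grouped by NODE."""
    counts: Counter = Counter()
    for line in table.strip().splitlines():
        # Columns are separated by two or more spaces; single spaces occur
        # inside values such as "Failed 12 seconds ago".
        cols = re.split(r"\s{2,}", line.strip())
        # Assumed order: ID, NAME, IMAGE, NODE, DESIRED STATE, CURRENT STATE[, ERROR]
        if len(cols) < 6:
            continue  # skip blank or malformed lines
        node, current_state = cols[3], cols[5]
        if current_state.startswith("Failed"):
            counts[node] += 1
    return counts


print(failures_by_node(SAMPLE))  # Counter({'prd-pro-24': 2, 'prd-pro-35': 1})
```

A header row, if present, is skipped naturally because its CURRENT STATE cell does not start with "Failed".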

Every container is connected to 2 networks, proxy (10.1.0.0/16) and db (10.2.0.0/16). Running containers are reachable.
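A quick address budget helps rule out IP exhaustion on the two /16 overlay networks. This sketch assumes the default VIP endpoint mode, in which each service gets one virtual IP on every network it is attached to and each task gets one more address; gateway and other reserved addresses are ignored, so the figures are approximate. The function name is hypothetical.

```python
import ipaddress


def overlay_headroom(cidr: str, services: int, replicas: int = 1) -> dict:
    """Rough address budget for one overlay network.

    Assumes one VIP per service plus one address per task on this network;
    gateway and other reserved addresses are ignored.
    """
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    needed = services * (1 + replicas)
    return {"usable": usable, "needed": needed, "headroom": usable - needed}


for name, cidr in (("proxy", "10.1.0.0/16"), ("db", "10.2.0.0/16")):
    print(name, overlay_headroom(cidr, services=10_000))
```

On this rough count, 10k single-replica services need about 20,000 of the 65,534 usable addresses per network, so address space alone seems unlikely to explain a stall at 500 - 700 services.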

Steps to reproduce the issue:

  1. Start Docker Swarm Mode with 50 nodes.
  2. Start more than 500 services and wait until they fail to start.
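The reproduction steps can be scripted so that every attempt creates services the same way. This is a minimal sketch: the image name and the "repro-" prefix are hypothetical placeholders, the two networks are the proxy and db overlays described above, and each service is created with the default single replica. It only prints the commands unless dry_run is set to False.

```python
import subprocess
from typing import List

# Hypothetical values for illustration; substitute the real image and names.
IMAGE = "example/app:latest"
NETWORKS = ("proxy", "db")  # the two overlay networks described above


def create_commands(count: int, image: str = IMAGE, prefix: str = "repro") -> List[List[str]]:
    """Build one `docker service create` command per service (1 replica each)."""
    cmds = []
    for i in range(count):
        cmd = ["docker", "service", "create", "--name", f"{prefix}-{i}"]
        for net in NETWORKS:
            cmd += ["--network", net]
        cmd.append(image)
        cmds.append(cmd)
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them one by one when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(create_commands(3))
```

Calling run(create_commands(600), dry_run=False) would then walk the cluster past the point where creation reportedly stalls.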

Describe the results you received:

After about 500 containers, creating a new service no longer works.

Describe the results you expected:

Service creation works within 1 or 2 seconds.

Additional information you deem important (e.g. issue happens only occasionally):

When I log into the node where the container is being started and run docker ps, the command hangs until the container fails. Creating a new container directly on a node where a task failed before works as expected, and container startup takes less than a second.
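To capture how long docker ps blocks, and whether the hang lines up with a task being scheduled, the command can be sampled with a timeout. This is a sketch with hypothetical helper names; it relies only on subprocess.run, which kills the child and raises TimeoutExpired when the timeout elapses.

```python
import subprocess
import time
from typing import List, Optional


def time_command(cmd: List[str], timeout: float) -> Optional[float]:
    """Return the wall-clock seconds cmd took, or None if it exceeded timeout."""
    start = time.monotonic()
    try:
        # run() kills the child and raises TimeoutExpired once timeout elapses.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


def watch(cmd: List[str], timeout: float = 5.0,
          interval: float = 1.0, samples: int = 10) -> None:
    """Sample cmd repeatedly and log how long each call took."""
    for _ in range(samples):
        elapsed = time_command(cmd, timeout)
        stamp = time.strftime("%H:%M:%S")
        label = " ".join(cmd)
        if elapsed is None:
            print(f"{stamp} {label}: no response within {timeout:.1f}s")
        else:
            print(f"{stamp} {label}: {elapsed:.2f}s")
        time.sleep(interval)
```

Running watch(["docker", "ps"]) on the affected node while a service is being created gives timestamps that can be lined up with the daemon logs.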

All machines appear to be idle, and I see no significant spikes in load or memory. The average load per machine is 10 - 15 containers, each using around 100 - 150 MB of memory.
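The reported numbers can be turned into a back-of-the-envelope estimate for the current and target load. This sketch assumes one replica per service (as the 0/1 replica count suggests) and that per-container memory stays at the observed 100 - 150 MB; the function name is hypothetical.

```python
def load_estimates(nodes: int = 50, target_services: int = 10_000,
                   containers_per_node=(10, 15), mb_per_container=(100, 150)) -> dict:
    """Back-of-the-envelope totals from the numbers reported in this issue.

    Assumes one replica per service, so target containers == target services.
    """
    lo_c, hi_c = containers_per_node
    lo_mb, hi_mb = mb_per_container
    target_per_node = target_services // nodes
    return {
        "containers_now": (nodes * lo_c, nodes * hi_c),
        "mb_per_node_now": (lo_c * lo_mb, hi_c * hi_mb),
        "containers_per_node_target": target_per_node,
        "mb_per_node_target": (target_per_node * lo_mb, target_per_node * hi_mb),
    }


print(load_estimates())
```

The current figures (500 - 750 containers, 1 - 2.25 GB per node) are consistent with the stall point and with idle machines. The full target, however, works out to about 200 containers and 20 - 30 GB of container memory per node, close to the 31.42 GiB that docker info reports, so memory headroom may become a concern even once the current stall is resolved.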

I am aware that debugging this is hard, and I'm happy to do screen sharing or provide any logs needed to get to the bottom of this behaviour. I'm grateful for any input!

Output of docker version:

# docker version
Client:
 Version:      1.13.0
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:50:17 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:50:17 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

# docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.13.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 7
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: ssci1al4vgi7nzirptugjrpsl
 Is Manager: true
 ClusterID: qfi82y6p6rpnnfxkarxxfdca4
 Managers: 3
 Nodes: 50
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.133.6.234
 Manager Addresses:
  10.133.6.234:2377
  10.133.8.162:2377
  10.133.8.89:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.42 GiB
Name: prd-pro-01
ID: 37WG:BKQR:YXQF:LTMK:P3IQ:CDQ5:ZEM5:PI6L:32DR:3KET:QH4Z:GB5N
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: ghostengineering
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
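The docker info output above reports a Task History Retention Limit of 5, which bounds how many old tasks the managers keep per replica slot. A rough upper-end estimate of the task objects held at the target scale is sketched below. It assumes each slot keeps its current task plus at most history_limit older ones; SwarmKit's exact accounting may be lower. The function name is hypothetical, while `docker swarm update --task-history-limit` is the existing CLI flag for changing the limit.

```python
def retained_tasks(services: int, replicas: int = 1, history_limit: int = 5) -> int:
    """Upper-end estimate of task objects kept in the manager's store.

    Assumption: each replica slot keeps its current task plus at most
    `history_limit` older tasks; SwarmKit's exact accounting may be lower.
    """
    return services * replicas * (1 + history_limit)


print(retained_tasks(10_000))                   # with the current limit of 5
print(retained_tasks(10_000, history_limit=1))  # with a smaller limit
```

Repeated start failures create a new task on every retry, so with a limit of 5 the store can approach this upper bound quickly at 10k services.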

Additional environment details (AWS, VirtualBox, physical, etc.):
DigitalOcean
