This is a follow-up issue to #70, since that issue was closed.
We’ve seen apps on the platform (running on AWS) that talk to DBs exposed via public IPs. These connections go through an AWS NAT Gateway, which has an idle timeout of 350 seconds. If an app runs queries (we don’t know which ones :)) that take longer than that to get a response from the server, the connection is “freed” on the NAT GW, and only later, when the app tries to send data over the connection again, does it get a RST.
The connections via the NAT GW can be kept open if one of the sides sends TCP keepalive packets. However, the containers in which the apps run use the kernel defaults (net.ipv4.tcp_keepalive_time = 7200), so the first probe is sent only after 2 hours. On the Diego Cell VM the settings are different (net.ipv4.tcp_keepalive_time = 120, see https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/acc0c1d039be5beeb30be0c9385a1b1c54e89218/stemcell_builder/stages/bosh_sysctl/assets/60-bosh-sysctl.conf#L35), but they are not inherited by the container namespaces, where the defaults apply. So at the moment neither the app developers nor we as operators of the platform can modify the settings for the containers (at least we haven’t figured out how).
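Until such a setting exists, an app that controls its own sockets can request keepalive per connection instead of relying on the namespace-wide sysctl. The following is a minimal sketch of that workaround, not a replacement for the platform mechanism requested here. The helper name `enable_keepalive` is hypothetical, `idle=120` mirrors the Diego cell value above, and the `interval` and `count` values are assumptions. It only helps when the app or its DB driver exposes the underlying socket.

```python
import socket


def enable_keepalive(sock, idle=120, interval=30, count=4):
    """Turn on TCP keepalive for one socket and tune its timers where supported.

    idle:     seconds of silence before the first probe (assumed value)
    interval: seconds between probes (assumed value)
    count:    unanswered probes before the connection is dropped (assumed value)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific; skip any that are missing.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With `idle` below the 350-second NAT Gateway timeout, the first probe is sent before the NAT GW drops the idle mapping, regardless of the container's `net.ipv4.tcp_keepalive_time`.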
There should be a mechanism to set kernel parameters inside a container to overcome problematic defaults. Modifying cflinuxfs3 with e.g. /etc/sysctl.d/20-myconfiguration.conf will not help, because a number of kernel parameters are read-only inside the container and can only be set for privileged containers (you do not want to do this...) or during creation of the container.
Steps to reproduce
Log in to a CF Diego cell:
diego-cell/f287dd7c-87db-42ea-b4e0-c490172fcd5c:~$ grep tcp_keepalive_time /etc/sysctl.d/60-bosh-sysctl.conf
net.ipv4.tcp_keepalive_time=120
diego-cell/f287dd7c-87db-42ea-b4e0-c490172fcd5c:~$ cat /proc/sys/net/ipv4/tcp_keepalive_time
120
diego-cell/f287dd7c-87db-42ea-b4e0-c490172fcd5c:~$ sudo /var/vcap/packages/runc/bin/runc --root /run/containerd/runc/garden/ exec -t 85580519-104b-4a7d-4240-7007 /bin/bash
root@85580519-104b-4a7d-4240-7007:/# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
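The same comparison can be made from inside an app container without access to the cell. The sketch below reads the value the transcript above prints and checks it against the 350-second NAT Gateway idle timeout mentioned earlier. The function name `first_probe_too_late` and the `path` parameter (there so the check can be pointed at another file) are hypothetical.

```python
from pathlib import Path

NAT_IDLE_TIMEOUT_SEC = 350  # AWS NAT Gateway idle timeout quoted in this issue


def first_probe_too_late(path="/proc/sys/net/ipv4/tcp_keepalive_time",
                         nat_timeout=NAT_IDLE_TIMEOUT_SEC):
    """Return (keepalive_time, too_late).

    too_late is True when the first keepalive probe would be sent only
    after the NAT Gateway has already dropped the idle connection.
    """
    keepalive_time = int(Path(path).read_text().strip())
    return keepalive_time, keepalive_time >= nat_timeout


if __name__ == "__main__":
    seconds, too_late = first_probe_too_late()
    print(f"tcp_keepalive_time={seconds}s, too late for NAT GW: {too_late}")
```

With the values from the transcript, the container's 7200 seconds is too late, while the cell's 120 seconds would keep the connection alive.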