My process connects to, say, Postgres. What's going to happen to that connection upon restore?
Does crik guarantee the order of events? Saving a checkpoint must be followed by killing the old process/pod, which must be followed by a restoration; the order of these 3 events is strict. And given that CRIU can checkpoint and restore socket state correctly, how does that work for Kubernetes? The new pod will have a different IP.
TCP connections are identified by the tuple of source IP:port and destination IP:port. When a new pod is created, it gets a new IP, so there is no practical way to restore its TCP connections. crik therefore drops all TCP connections and lets the application handle the reconnection logic. There are some CNIs that can give a static IP to a pod, but that's rather unorthodox in k8s.
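To illustrate what "lets the application handle the reconnection logic" could look like on the client side, here is a minimal sketch of retry with exponential backoff. It is not part of crik; `call_with_reconnect`, `connect`, and `operation` are hypothetical names, and it assumes a dropped connection surfaces as an `OSError` (real database drivers often raise their own exception types, which you would catch instead).

```python
import random
import time


def call_with_reconnect(connect, operation, max_attempts=5,
                        base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Run operation(conn), reconnecting with backoff when the connection drops.

    connect and operation are hypothetical callables supplied by the caller.
    Assumption: a dropped connection shows up as OSError (ConnectionResetError,
    BrokenPipeError, etc. are subclasses of it).
    """
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()  # open a fresh connection after a drop
            return operation(conn)
        except OSError:
            conn = None  # discard the dead connection
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff, capped, with jitter to avoid reconnect storms.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Retrying like this is only safe when the operation is idempotent or runs inside a transaction that was rolled back when the connection died.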
Right, and this shouldn't be a big issue for [competent] cloud-native software: it's a transient fault. If your software can't recover from transient faults then this is the wrong ecosystem to be considering.
The app in the pod is the client (of a DBMS server), and it is the client's IP that changes. A service in k8s is a network node with an address, but it is used for inbound connections. Outbound connections (like from the app to a DBMS server, which may be outside of the k8s cluster) usually do not use services, as that gives no benefit.