Thursday Feb 18
"Failing over without falling over", using chaos engineering and cloud to build resilient systems
Observability is how well you use sensors to see the internal state of your system. Model-ability is how well you make sense of that state to translate sensors into controls. Control-ability is how well those controls work to manipulate your system. Poorly designed and minimally tested failure mitigation processes lead to hazards that disrupt this feedback loop, and this is why attempts to fail over tend to fall over. The combination of API-driven cloud infrastructure and chaos engineering experiments managed by tools such as Gremlin, ChaosIQ, or AWS Fault Injection Simulator can be used to verify that the impact of failures is actually mitigated.
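As a rough illustration of the feedback loop described above, here is a minimal Python sketch of a simulated failover experiment. Everything in it is an assumption made for illustration: the `Region` and `FailoverController` classes, the `run_experiment` function, and the region names are hypothetical, and it does not use the API of Gremlin, ChaosIQ, or AWS Fault Injection Simulator. It only shows the shape of an experiment: check the steady state, inject a failure, let the observe/decide/act loop respond, and always roll the fault back.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A hypothetical deployment target that can be healthy or down."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


@dataclass
class FailoverController:
    """Hypothetical control loop: observe sensors, decide, then act."""
    primary: Region
    standby: Region
    active: Region = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.primary

    def observe(self) -> dict:
        # Observability: read the health sensor of every region.
        return {r.name: r.healthy for r in (self.primary, self.standby)}

    def decide(self, state: dict) -> Region:
        # Model-ability: turn sensor readings into a control decision.
        if state[self.active.name]:
            return self.active
        for region in (self.primary, self.standby):
            if state[region.name]:
                return region
        raise RuntimeError("no healthy region to fail over to")

    def act(self, target: Region) -> None:
        # Control-ability: apply the decision by switching traffic.
        self.active = target

    def step(self) -> None:
        self.act(self.decide(self.observe()))

    def handle(self, request: str) -> str:
        return self.active.handle(request)


def run_experiment(controller: FailoverController, victim: Region) -> bool:
    """Inject a failure into `victim` and report whether service survived."""
    controller.handle("steady-state probe")  # raises if not healthy at start
    victim.healthy = False  # inject the fault
    try:
        controller.step()  # give the control loop one chance to react
        controller.handle("probe during failure")
        return True
    except (ConnectionError, RuntimeError):
        return False
    finally:
        victim.healthy = True  # always roll the fault back


if __name__ == "__main__":
    primary, standby = Region("region-a"), Region("region-b")
    controller = FailoverController(primary, standby)
    print("survived:", run_experiment(controller, primary))
    print("active region:", controller.active.name)
```

In a real system the fault would be injected by one of the tools named above and the sensors would be real health checks; the point of the sketch is only that the failover path gets exercised deliberately, with a rollback, rather than for the first time during an outage.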