Preventing log loss with non-blocking mode in the AWSLogs container log driver

Introduction

For improved observability and troubleshooting, it is recommended to ship container logs from the compute platform to a centralized logging server. In practice, the logging server may occasionally be unreachable or unable to accept logs. Designing for logging server failures involves an architectural tradeoff, and service owners must choose between the following considerations:

Should the application stop responding to traffic (or performing work) and wait for the centralized logging server to be restored? (i.e., is an accurate audit log higher priority than service availability?)
Should the application continue to serve traffic while buffering logs, in the hope that the logging server comes back before the buffer is full? (i.e., should you accept the risk of log loss in the rare case when the log destination is unavailable?)
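
To make the two considerations concrete, here is a minimal sketch of a bounded in-memory log buffer that either makes the writer wait or drops lines once it is full. It is a toy model for illustration only, not the log driver's actual implementation: the class name `LogBuffer`, its `max_lines` and `blocking` parameters, and the `dropped` counter are all hypothetical.

```python
import queue


class LogBuffer:
    """Toy bounded log buffer (hypothetical, not the real log driver)."""

    def __init__(self, max_lines, blocking):
        self._queue = queue.Queue(maxsize=max_lines)
        self._blocking = blocking
        self.dropped = 0  # lines lost because the buffer was full

    def write(self, line, timeout=None):
        """Return True if the line was buffered, False if it was dropped."""
        if self._blocking:
            # First consideration: the writer (the application) waits until
            # the buffer has room. With a timeout, queue.Full is raised if
            # no room appears in time; without one, the call waits forever.
            self._queue.put(line, timeout=timeout)
            return True
        try:
            # Second consideration: never wait. If the buffer is full,
            # count the line as lost and let the application carry on.
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def buffered(self):
        """Number of lines currently waiting to be shipped."""
        return self._queue.qsize()


if __name__ == "__main__":
    # Simulate an unreachable logging server: nothing drains the buffer.
    buf = LogBuffer(max_lines=3, blocking=False)
    for i in range(5):
        buf.write(f"request {i} handled")
    print(buf.buffered(), "buffered,", buf.dropped, "dropped")
```

In this sketch nothing ever drains the buffer, which stands in for an unavailable log destination. The non-blocking buffer keeps accepting calls and counts the lines it could not hold, while a blocking buffer would stall the caller at the first write that finds the buffer full.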

In container logging drivers, this tradeoff is implemented with a configuration parameter blocking for the first consideration and …
