On Tuesday, May 28, 2019, at 9:27, the CS Team notified Engineering Support that some IVR phone numbers were intermittently getting “dead air”. As the day progressed, more clients reported the same issue, but they also reported that service had resumed on its own. Because the symptoms came and went, the problem was difficult to diagnose.
Our application monitoring dashboard showed a clear CPU performance issue on our DBMS server that recurred every 15 minutes. Because the CPU spikes appeared to coincide with our DBMS replication service, we attempted several fixes aimed at replication, but none of them resolved the issue. We then reviewed the dashboard metrics for the previous two weeks and identified a pattern beginning on Thursday, May 23, when CPU utilization started to increase incrementally. This coincided with an update made on that date. With that lead, we were able to locate the specific change that was causing the issue and resolve it.
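The two-week comparison described above can be approximated in code. The following is a minimal sketch of one way to find the day on which average CPU utilization left its earlier baseline and stayed above it. The function name `find_onset`, the `baseline_days` and `margin` parameters, and the sample values are all hypothetical; the numbers are synthetic and are not the metrics from the monitoring dashboard.

```python
from datetime import date, timedelta
from statistics import mean


def find_onset(daily_cpu, baseline_days=5, margin=5.0):
    """Return the first date whose average CPU exceeds the baseline by
    `margin` percentage points and stays above it on every later day.

    `daily_cpu` maps a date to that day's average CPU utilization (%).
    Returns None if no such date exists or there is too little data.
    """
    days = sorted(daily_cpu)
    if len(days) <= baseline_days:
        return None

    # Baseline: mean of the earliest days, assumed to predate the problem.
    baseline = mean(daily_cpu[d] for d in days[:baseline_days])
    threshold = baseline + margin

    for i in range(baseline_days, len(days)):
        # A sustained rise: this day and all following days exceed the threshold.
        if all(daily_cpu[d] > threshold for d in days[i:]):
            return days[i]
    return None


# Synthetic daily averages for illustration only (assumed values).
start = date(2019, 5, 15)
values = [20, 21, 19, 20, 22, 21, 20, 21, 28, 35, 43, 50, 58, 66]
sample = {start + timedelta(days=i): v for i, v in enumerate(values)}

onset = find_onset(sample)
print(onset)
```

Requiring every later day to stay above the threshold filters out one-off spikes, so the result points at a sustained change that can then be compared against the deployment history for that date.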