Messages come in. Messages get processed. Messages get acknowledged. Last month at work, I found a case where a message came in and caused the processor to crash. That message can never get acknowledged. A different processor can try to process it, and it will fail the same way.
I’m helping the team that owns that process. I suggested finding a way to handle it in software. At scale, everything happens, and it happens at the worst possible time.
The team lead said no. When that happens, it’s a Severity 1 bug.
I had to accept that. When the team lead gets a page in the middle of the night, that’s a choice he has explicitly made.
For myself, I don’t ever want to be paged. If I explicitly build pageable incidents into my services, I’m making a mess.
Are there times you would want to be paged? Can you find ways to mitigate that?
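One possible mitigation for the crashing message described above is to cap the number of delivery attempts and park the message somewhere a human can look at it during working hours. Here is a minimal sketch, not what the team actually runs. The names `drain`, `handler`, and `MAX_ATTEMPTS` are made up for illustration, the queue is an in-memory `deque` standing in for a real broker, and it assumes the failure surfaces as a Python exception. A hard process crash would need the attempt count tracked outside the process.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical limit; tune for the real system


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process every message in the queue, parking the ones that keep failing.

    Assumes messages are hashable (plain strings here) and that a failing
    handler raises an exception instead of killing the process.
    """
    dead_letters = []  # messages set aside for a human to inspect later
    attempts = {}      # message -> number of failed attempts so far
    while queue:
        message = queue.popleft()
        try:
            handler(message)  # returning normally stands in for the acknowledgement
        except Exception as error:
            attempts[message] = attempts.get(message, 0) + 1
            if attempts[message] >= max_attempts:
                # Stop retrying: record the message and why it failed.
                dead_letters.append((message, repr(error)))
            else:
                queue.append(message)  # put it back for another attempt
    return dead_letters


# Usage: one message that always fails does not block the others.
processed = []


def example_handler(message):
    if message == "poison":
        raise ValueError("cannot parse")
    processed.append(message)


parked = drain(deque(["a", "poison", "b"]), example_handler)
```

After this runs, `processed` holds the healthy messages and `parked` holds the one that kept failing. The trade-off is that a parked message is a delayed message, which may or may not be acceptable for a given service.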