Monitoring is the foundation of a solid technology team. Monitoring is your eyes: if you don’t monitor, you don’t know whether your processes are running well. In fact, you don’t know whether they are running at all. Plus, you don’t want to find out about a crash from your business client, right?
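To make the "are they running at all" question concrete, here is a minimal sketch of a heartbeat check. It assumes each process periodically records a timestamp somewhere you can read it. The names `is_alive` and `find_silent_processes` and the 60-second threshold are hypothetical choices for illustration, not part of any particular monitoring tool.

```python
import time


def is_alive(last_heartbeat, now=None, max_silence_seconds=60.0):
    """Return True if the last heartbeat is recent enough."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) <= max_silence_seconds


def find_silent_processes(heartbeats, now=None, max_silence_seconds=60.0):
    """Return the sorted names of processes whose heartbeat is too old.

    `heartbeats` maps a process name to the Unix timestamp of its
    last reported heartbeat (an assumed data shape).
    """
    if now is None:
        now = time.time()
    return sorted(
        name
        for name, last_seen in heartbeats.items()
        if not is_alive(last_seen, now, max_silence_seconds)
    )
```

A scheduler could call `find_silent_processes` every minute and page someone whenever the returned list is not empty, so the team hears about a stalled job before the client does.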
Incident response
Now, monitoring opens the door to incident response and allows you to address the problem quickly. The next important step is to review what happened, learn from it, and decide what needs to be done to prevent the same incident from happening again. It could be a bug fix, an architectural review of the solution, or a preventive measure like purging old files from the disk on a periodic basis.
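As an illustration of the last kind of preventive measure, here is a minimal sketch of a periodic purge. The function name `purge_old_files` and the age threshold are assumptions made for this example. It only looks at regular files directly inside the given directory and decides by modification time.

```python
import time
from pathlib import Path


def purge_old_files(directory, max_age_days, now=None):
    """Delete files in `directory` older than `max_age_days`.

    Returns the list of removed paths so the caller can log them.
    Subdirectories are left untouched.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    # Materialize the listing first so deletions do not disturb iteration.
    for path in list(Path(directory).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run from a daily scheduled job, something like `purge_old_files("/var/tmp/exports", max_age_days=7)` (a hypothetical path) would keep a week of files and drop the rest. Logging the returned list gives the next postmortem a record of what was cleaned up.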
Avoiding the culture of fear
The postmortem review needs to be done in a manner that avoids putting blame on anyone and focuses on the technology and the process. This is critical if you want to keep a high level of trust in the team. Otherwise a culture of fear sets in, and people tend to avoid communicating as much as possible.
Value of continuous testing
Most of the time, the postmortem will point to a lack of testing, unreliable release procedures, or both.
That’s why it’s critical to maintain a high level of testing, continuously update unit tests as the code changes, and execute those tests with each build.
Comprehensive test coverage that incorporates many business cases gives you a high level of confidence that each code change has a predictable impact, so you don’t have to worry about whether your response team will get paged in the middle of the night.
Bonus: capacity planning for free
Additionally, the data you collect through monitoring allows you to predict your needs accurately and removes the guesswork from capacity planning. You can plan according to the patterns and trends you observe.
This in turn gives you a better chance to plan your development phases and release new products faster.
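Planning from observed trends can be as simple as fitting a straight line to a metric you already collect. The sketch below assumes daily disk-usage samples and a linear growth pattern. Both are simplifying assumptions, and the function names are hypothetical.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for (day, usage) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("need samples from at least two different days")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_full(samples, capacity):
    """Estimate days from the last sample until usage reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - samples[-1][0])
```

For example, with usage of 100, 110, 120 and 130 GB on days 0 through 3 and a 200 GB disk, `days_until_full` returns 7.0, which tells you roughly when to add storage. Real workloads are rarely this linear, so treat the result as a first estimate rather than a forecast.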