This article describes my experience operating a high-scale service backed by a single-writer RDBMS. The system’s design was good for a quick launch and maybe an initial few years while the traffic was less. However, it resulted in a lot of pain once the scale increased, and the database problems caused many customer outages and heavy operational load for the team. I will recount the details, lessons learned, and advice for avoiding similar mistakes.
What Happened?
The service had many hosts servicing the typical CRUD (Create, Read, Update, and Delete) APIs backed by MySQL running on three very beefy hosts. One host acted as leader, servicing writes and strongly consistent reads (i.e., reading the most updated value), and two followers continuously syncing updates from the leader working as real-only: