You replicate data on multiple machines for:
- Keeping data geographically closer to your users (reduce latency)
- Allow system to continue working even with partitions (increase availability)
- Scale out number of machines that can serve read queries (increase read throughput)
How do we replicate changes between nodes?
- Single-leader
- Multi-leader
- Leaderless