How do you do sharding in MongoDB?
The Scenario
You are a database administrator at a social media company. You are responsible for a MongoDB database that is growing very quickly. The database is starting to experience performance issues, and you have identified that the bottleneck is the single server that is hosting the database.
You need to find a way to scale the database horizontally to handle the increasing load.
The Challenge
Explain how you would do sharding in MongoDB. What is a sharded cluster, and what are the different components of a sharded cluster?
A junior engineer might not be aware of sharding. They might try to solve this problem by adding more CPU, memory, or disk to the single server (vertical scaling), which eventually runs into hardware and cost limits.
A senior engineer would know that sharding is a critical part of database administration. They would be able to explain what a sharded cluster is and would have a clear plan for how to set up sharding for a production database.
Step 1: Understand What a Sharded Cluster Is
A sharded cluster is a group of MongoDB servers that together store and serve a single logical data set. It provides horizontal scalability by partitioning the data across multiple servers, so each server stores and processes only a subset of it.
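To make the idea concrete, here is a toy sketch in plain JavaScript (not MongoDB code) showing that once documents are distributed, each server holds only a fraction of them. The `shardFor` helper and its modulo rule are assumptions made purely for illustration; MongoDB itself assigns documents to shards by ranges of shard key values, not by a simple modulo.

```javascript
// Toy model: spread documents across N servers so each holds only a subset.
// shardFor() is a hypothetical helper, not part of any MongoDB API.
function shardFor(userId, shardCount) {
  return userId % shardCount;
}

// Place every document on the server chosen by shardFor().
function distribute(docs, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc.userId, shardCount)].push(doc);
  }
  return shards;
}

// Twelve sample documents with userId 0..11.
const docs = Array.from({ length: 12 }, (_, i) => ({ userId: i, name: `user${i}` }));

const oneServer = distribute(docs, 1);
const threeShards = distribute(docs, 3);

console.log(oneServer.map((s) => s.length));   // [ 12 ]
console.log(threeShards.map((s) => s.length)); // [ 4, 4, 4 ]
```

With one server, all twelve documents land in the same place; with three, each server stores and serves only four.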
Step 2: The Different Components of a Sharded Cluster
| Component | Description |
|---|---|
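| Shard | Each shard stores a subset of the sharded data. In production, each shard is deployed as a replica set rather than a single server. |
| Config Servers | The config servers store the metadata for the sharded cluster, such as which shard holds each range (chunk) of data. They are deployed as a replica set. |
| Query Routers | The query routers (mongos) are the entry point for client applications and route each query or write to the correct shard or shards. |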
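
How the three components cooperate can be sketched with a small in-memory model. This is plain JavaScript, not MongoDB code: the `chunkMap`, `shards`, `route`, and `find` names, the `userId` shard key, and the split point at 1000 are all assumptions chosen for illustration. The sketch shows a query that includes the shard key going to one shard, and a query without it being broadcast to every shard.

```javascript
// Toy model of the three components; all names here are illustrative.

// Config metadata: which shard owns which range of shard key (userId) values.
const chunkMap = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: Infinity, shard: "shardB" },
];

// Shards: each holds only the documents that fall in its ranges.
const shards = {
  shardA: [{ userId: 7, name: "ana" }, { userId: 512, name: "bo" }],
  shardB: [{ userId: 1000, name: "cy" }, { userId: 4096, name: "di" }],
};

// Router: consults the metadata to decide which shards must see the query.
function route(query) {
  if (query.userId !== undefined) {
    // The query includes the shard key: target only the owning shard.
    const chunk = chunkMap.find((c) => query.userId >= c.min && query.userId < c.max);
    return [chunk.shard];
  }
  // No shard key in the query: broadcast to every shard.
  return Object.keys(shards);
}

// Run an equality query on the targeted shards and merge the results.
function find(query) {
  const targets = route(query);
  const results = targets.flatMap((name) =>
    shards[name].filter((doc) =>
      Object.entries(query).every(([field, value]) => doc[field] === value)
    )
  );
  return { targets, results };
}

console.log(find({ userId: 4096 })); // targets only shardB
console.log(find({ name: "bo" }));   // targets shardA and shardB
```

The second query still returns the right document, but every shard has to do work for it, which is why queries that include the shard key scale better.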
Step 3: Set Up a Sharded Cluster
Here’s how we can set up a simple sharded cluster with two shards:
1. Start the config servers:
Start three config servers on different machines and initiate them as a single replica set, which MongoDB requires for config servers.
2. Start the query routers:
Start one or more query routers (mongos), pointing each one at the config server replica set.
3. Add the shards:
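Connect to one of the query routers and add the shards to the cluster. When a shard is a replica set, which current MongoDB versions require, pass it as replicaSetName/host:port rather than the bare host:port form shown here.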
sh.addShard("shard1.example.com:27017")
sh.addShard("shard2.example.com:27017")
4. Enable sharding for a database:
sh.enableSharding("mydb")
5. Shard a collection:
sh.shardCollection("mydb.mycollection", { my_key: 1 })
Shard Keys
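The shard key is the indexed field (or fields) that MongoDB uses to distribute a collection's documents across the shards. Choosing it well matters: a key with many distinct values that spreads reads and writes evenly keeps the load balanced, while a low-cardinality or monotonically increasing key can concentrate traffic on a single shard.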
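The effect of the shard key on write distribution can be illustrated with a toy simulation in plain JavaScript (not MongoDB code). It assumes three shards, existing split points at 100 and 200, and a stream of new documents whose key only ever increases, such as a counter or timestamp. The `rangedShard` and `hashedShard` functions are hypothetical stand-ins; in particular, the integer mixer below is not the hash MongoDB uses for hashed shard keys (which are declared as `{ my_key: "hashed" }` in `sh.shardCollection`).

```javascript
// Toy comparison of two shard key strategies; not MongoDB code.
const SHARDS = 3;

// Ranged sharding on an ever-increasing key, with split points at 100 and 200.
function rangedShard(id) {
  if (id < 100) return 0;
  if (id < 200) return 1;
  return 2; // every value above the last split point lands here
}

// Hashed sharding: a simple 32-bit integer mixer used as an illustrative hash.
function hashedShard(id) {
  let h = Math.imul(id ^ (id >>> 16), 0x45d9f3b);
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  h = (h ^ (h >>> 16)) >>> 0; // force an unsigned 32-bit result
  return h % SHARDS;
}

// Count how many inserts each shard receives under a given strategy.
function countInserts(pickShard, ids) {
  const counts = new Array(SHARDS).fill(0);
  for (const id of ids) counts[pickShard(id)] += 1;
  return counts;
}

// 3000 new documents with ids 300, 301, 302, ...
const newIds = Array.from({ length: 3000 }, (_, i) => 300 + i);

const rangedCounts = countInserts(rangedShard, newIds);
const hashedCounts = countInserts(hashedShard, newIds);

console.log(rangedCounts); // [ 0, 0, 3000 ]
console.log(hashedCounts); // spread across all three shards
```

In the ranged case every new write hits the last shard, while hashing the same key spreads the writes out. The trade-off is that hashing scatters neighboring key values, so range queries on the key can no longer be served by a single shard.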
Practice Question
You are designing a sharded cluster for a new application. Which of the following would be the most important consideration when choosing a shard key?