Questions
What is GridFS and when would you use it in MongoDB?
The Scenario
You are a backend engineer at a social media company. You are building a new service that needs to store large files, such as images and videos.
You are considering using either the file system or GridFS to store the files.
The Challenge
Explain what GridFS is in MongoDB and why it is a better choice than the file system for storing large files. What are the key benefits of using GridFS?
A junior engineer might try to store each large file in a single BSON document. This does not work for files over 16 MB, because that is the maximum size of a BSON document. They might not be aware of GridFS, which is designed for exactly this case.
A senior engineer would know that GridFS is the perfect tool for this job. They would be able to explain what GridFS is and how to use it to store and retrieve large files. They would also be able to explain the benefits of using GridFS over the file system.
Step 1: Understand What GridFS Is
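GridFS is a specification for storing and retrieving large files in MongoDB. Instead of putting a file in a single document, it divides the file into smaller chunks (255 KB each by default) and stores each chunk as a separate document. When the file is read, the driver reassembles the chunks in order.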
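The chunking idea can be illustrated without a database. The following is a minimal pure-Python sketch, not the driver's implementation; `split_into_chunks` and `DEFAULT_CHUNK_SIZE` are hypothetical names chosen for this example, and the 255 KB value mirrors the GridFS default chunk size.

```python
from __future__ import annotations

# GridFS default chunk size in bytes (255 KB); a name made up for this sketch.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Every chunk is full-sized except possibly the last one.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    payload = b"x" * (600 * 1024)  # a 600 KiB stand-in for a real file
    chunks = split_into_chunks(payload)
    print(len(chunks), [len(c) for c in chunks])
```

Only the last chunk is smaller than the chunk size, so joining the chunks in order restores the original bytes.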
Step 2: The fs.files and fs.chunks Collections
GridFS uses two collections to store the files (named with the default bucket prefix `fs`):
| Collection | Description |
|---|---|
| fs.files | Stores the metadata for the files, such as the file name, the content type, and the length. |
| fs.chunks | Stores the chunks of the files. |
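To make the relationship between the two collections concrete, here is a simplified sketch of the document shapes. The field names (`length`, `chunkSize`, `uploadDate`, `filename`, `metadata`, `files_id`, `n`, `data`) follow the GridFS layout, but `to_gridfs_documents` and `reassemble` are hypothetical helpers, and real chunk documents also carry their own `_id`.

```python
from datetime import datetime, timezone


def to_gridfs_documents(file_id, filename, data, chunk_size=255 * 1024, metadata=None):
    """Return (files_doc, chunk_docs) shaped like GridFS documents (simplified)."""
    # One document per file: this is what would live in fs.files.
    files_doc = {
        "_id": file_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
        "metadata": metadata or {},
    }
    # One document per chunk: files_id points back to fs.files, n is the order.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the file content by concatenating chunks in order of n."""
    return b"".join(doc["data"] for doc in sorted(chunk_docs, key=lambda d: d["n"]))


if __name__ == "__main__":
    files_doc, chunk_docs = to_gridfs_documents(
        "demo-id", "clip.mp4", b"abcdefghij", chunk_size=4,
        metadata={"contentType": "video/mp4"},
    )
    print(files_doc["length"], [doc["n"] for doc in chunk_docs])
    print(reassemble(chunk_docs))
```

The `files_id` and `n` fields are what let a reader find every chunk of a file and put the chunks back in the right order.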
The Benefits of Using GridFS
| Benefit | Description |
|---|---|
| Scalability | GridFS can be used to store files that are larger than the 16MB BSON document size limit. |
| Replication | GridFS files are automatically replicated across the nodes in a replica set. |
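| Sharding | GridFS collections can be distributed across the shards in a sharded cluster, but this is not automatic: you shard `fs.chunks` explicitly, typically on `{ files_id: 1, n: 1 }` or `{ files_id: 1 }`. |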
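GridFS collections are sharded like any other collection, so the chunks collection has to be sharded explicitly before its documents are spread across shards. A minimal sketch follows, assuming a PyMongo `MongoClient` connected to a `mongos` router; `shard_gridfs_chunks` is a hypothetical helper, the database name `media` is made up, and depending on the server version sharding may first need to be enabled for the database.

```python
def shard_gridfs_chunks(client, db_name, bucket_name="fs"):
    """Shard <db_name>.<bucket_name>.chunks on {files_id: 1, n: 1}.

    `client` is assumed to be a pymongo.MongoClient connected to a mongos.
    """
    namespace = f"{db_name}.{bucket_name}.chunks"
    # shardCollection is an admin command; the key keeps the chunks of one
    # file ordered by files_id and then by chunk number.
    return client.admin.command(
        "shardCollection", namespace, key={"files_id": 1, "n": 1}
    )


# Usage (assumes a reachable sharded cluster; not run here):
#   from pymongo import MongoClient
#   shard_gridfs_chunks(MongoClient("mongodb://localhost:27017"), "media")
```

The `fs.files` collection is usually small and is often left unsharded.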
When to Use GridFS
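You should use GridFS when you need to store files that are larger than the 16 MB BSON document size limit, or when you want to read part of a large file without loading the whole file into memory.
You should not use GridFS if you need to update the content of an entire file atomically. A file is spread across many chunk documents, so replacing it is not a single-document operation.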
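For the scenario above, storing and retrieving a file could look roughly like the sketch below. It assumes a `bucket` object with the interface of PyMongo's `gridfs.GridFSBucket` (`upload_from_stream` and `download_to_stream`); `store_file` and `load_file` are hypothetical wrappers, and the database name `media` is made up.

```python
import io


def store_file(bucket, filename, data, content_type):
    """Upload bytes through a GridFSBucket-like object and return the file id."""
    return bucket.upload_from_stream(
        filename, io.BytesIO(data), metadata={"contentType": content_type}
    )


def load_file(bucket, file_id):
    """Download a stored file fully into memory and return its bytes."""
    destination = io.BytesIO()
    bucket.download_to_stream(file_id, destination)
    return destination.getvalue()


# Usage (assumes a running MongoDB server and PyMongo installed; not run here):
#   import gridfs
#   from pymongo import MongoClient
#   bucket = gridfs.GridFSBucket(MongoClient()["media"])
#   file_id = store_file(bucket, "clip.mp4", b"...", "video/mp4")
#   content = load_file(bucket, file_id)
```

For very large videos it is usually preferable to pass an open file object to `upload_from_stream` and to stream downloads to disk, rather than holding the whole file in memory as this sketch does.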
Practice Question
You are building a service that needs to store and retrieve large video files. Which of the following would be the most appropriate?