Exploring NoSQL: To Mongo(DB) or Not?
While building enterprise systems, choosing between SQL and NoSQL databases is a pivotal decision for architects and product owners.
It affects the overall application architecture and data flow, and also how we conceptually view and process various entities in our business processes.
Today, we’ll delve into MongoDB, a prominent NoSQL database, and discuss what it is and when it can be a good choice for your data storage needs.
What is MongoDB?
At its core, MongoDB represents a shift from the conventional relational databases.
Unlike SQL databases, which rely on structured tables and predefined schemas, MongoDB operates as a document-oriented database. Instead of writing SQL to access data, you use MongoDB’s own JSON-like query language (hence the label NoSQL).
In MongoDB, data is stored as BSON (Binary JSON) documents, offering a lot of flexibility in data representation. Each document can have its own structure. This flexibility is particularly useful for unstructured or semi-structured data, where the shape of a record varies from one item to the next.
Consider a simple example of employee records.
In a traditional SQL database, you would define a fixed schema with predefined columns for employee name, ID, department, and so on. Changing this structure later is not trivial, especially on tables with high data volume, heavy traffic, and many indexes.
However, in MongoDB, each employee record can be a unique document with only the attributes that are relevant to that employee. This dynamic schema allows you to adapt to changing data requirements without extensive schema modifications.
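As a minimal sketch of this flexibility, the plain JavaScript below builds two employee documents with different shapes that could live side by side. The second employee and the fields skills and remote are hypothetical, added only for illustration.

```javascript
// Two hypothetical employee documents destined for the same collection.
// The second one has no salary field but carries extra attributes.
const employees = [
  { firstName: "John", lastName: "Doe", department: "HR", salary: 75000 },
  {
    firstName: "Priya",
    lastName: "Nair",
    department: "Engineering",
    skills: ["Go", "Kubernetes"], // illustrative field, not in the original example
    remote: true,                 // illustrative field, not in the original example
  },
];

// In the MongoDB shell, both could be stored in one call:
//   db.employees.insertMany(employees)

// List the field names each document carries to show that the shapes differ.
const fieldSets = employees.map((doc) => Object.keys(doc).sort());
console.log(fieldSets);
```

No schema change is needed before storing the second document, although in practice teams often still agree on a core set of fields so that queries stay predictable.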
How is Data Stored?
MongoDB’s storage model is centered on BSON documents, each made up of key-value pairs (called fields). This design simplifies data retrieval, as each piece of information is accessible through its key.
Let’s take the example of an employee record stored as a BSON document:
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "firstName": "John",
  "lastName": "Doe",
  "department": "HR",
  "salary": 75000,
  "address": {
    "street": "123 Liberty Street",
    "city": "Freedom Town",
    "state": "TX",
    "zipCode": "12345"
  }
}
In this example, “_id” is the unique identifier for the document. MongoDB automatically indexes this field, so looking up a document by its “_id” is fast.
Accessing any attribute is also straightforward. For instance, to retrieve the employee’s last name, you use the key “lastName.” MongoDB’s ability to store complex data structures, such as embedded documents (like the address in our example), contributes to its flexibility.
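To make key-based access concrete, here is a small sketch in plain JavaScript using the employee document from above. The helper getByPath is hypothetical and written only for this illustration; it mimics the dot notation that MongoDB uses to address fields inside embedded documents.

```javascript
const employee = {
  firstName: "John",
  lastName: "Doe",
  department: "HR",
  address: {
    street: "123 Liberty Street",
    city: "Freedom Town",
    state: "TX",
    zipCode: "12345",
  },
};

// A top-level field is read directly through its key.
const lastName = employee.lastName;

// Hypothetical helper: resolve a dotted path such as "address.city"
// by walking one key at a time; returns undefined if a step is missing.
function getByPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const city = getByPath(employee, "address.city");

// The same dotted path works as a filter key in the MongoDB shell:
//   db.employees.find({ "address.city": "Freedom Town" })
console.log(lastName, city); // prints: Doe Freedom Town
```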
MongoDB further enhances data organization by allowing documents to be grouped into collections. Collections serve as containers for related documents, even if those documents have different structures.
For example, you can have collections for employees, departments, and projects, each containing documents with attributes specific to their domain.
Query Language
In any database, querying data efficiently is essential for maintaining performance, especially as the data volume grows.
MongoDB provides a powerful query language that enables developers to search and retrieve data with precision.
Queries are built from filter documents and operators such as $gt (greater than), which makes it easy to express conditions on any field.
Here’s a simple example of querying a MongoDB database to find all employees in the HR department earning more than $60,000:
db.employees.aggregate([
  {
    $match: {
      department: "HR",
      salary: { $gt: 60000 }
    }
  }
])
The “$match” stage filters employees in the HR department with a salary greater than $60,000.
MongoDB’s query language provides the flexibility to construct sophisticated queries to meet specific data retrieval needs. One way to do that is to use aggregation pipelines. These enable you to do complex data transformations and analysis within the database itself.
A pipeline consists of a sequence of stages, each of which processes and transforms the documents as they pass through.
We saw the $match stage in the example above. Other stages, such as $group, combine documents that share a value and compute a result for each group.
For example, to calculate the average salary per department, counting only employees who earn more than $60,000, we can use a pipeline like this:
db.employees.aggregate([
  {
    $match: {
      salary: { $gt: 60000 } // Filter employees with a salary greater than $60,000
    }
  },
  {
    $group: {
      _id: "$department", // Group by the "department" field
      avgSalary: { $avg: "$salary" } // Calculate the average salary within each group
    }
  }
])
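To see what this pipeline computes, the sketch below reproduces the two stages in plain JavaScript over a small in-memory array. The sample employees are made up for illustration, and this is not how MongoDB executes a pipeline internally; it only mirrors the filter-then-group logic.

```javascript
// Hypothetical sample data standing in for the employees collection.
const employees = [
  { name: "John", department: "HR", salary: 75000 },
  { name: "Ana", department: "HR", salary: 65000 },
  { name: "Raj", department: "HR", salary: 50000 },
  { name: "Mei", department: "Engineering", salary: 90000 },
  { name: "Tom", department: "Engineering", salary: 110000 },
  { name: "Lee", department: "Sales", salary: 55000 },
];

// Stage 1, like $match: keep only salaries above 60,000.
const matched = employees.filter((e) => e.salary > 60000);

// Stage 2, like $group with $avg: accumulate sum and count per department.
const totals = new Map();
for (const { department, salary } of matched) {
  const entry = totals.get(department) ?? { sum: 0, count: 0 };
  entry.sum += salary;
  entry.count += 1;
  totals.set(department, entry);
}

// Shape the output the way $group would: one document per department.
const result = [...totals].map(([department, { sum, count }]) => ({
  _id: department,
  avgSalary: sum / count,
}));

console.log(result);
// [ { _id: 'HR', avgSalary: 70000 }, { _id: 'Engineering', avgSalary: 100000 } ]
```

Sales does not appear because its only employee was filtered out before grouping. The real $group stage does not guarantee the order of its output documents, so add a $sort stage if order matters.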
Finally, consider how MongoDB finds documents efficiently. Apart from the automatic index on the “_id” field, a collection has no indexes until you define them.
MongoDB allows you to create indexes on specific fields within a collection to improve query performance. These indexes act as guides for MongoDB to quickly locate documents that match query criteria.
In our example, to optimize the query for HR employees earning more than $60,000, you can create a compound index on the “department” and “salary” fields. This index can significantly speed up queries that filter by department and salary.
With the appropriate indexes in place, MongoDB efficiently retrieves the matching documents. Without an index, MongoDB would perform a full collection scan, which can be slow and resource-intensive for large datasets.
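An index on department and salary might be created as sketched below. The function name ensureDepartmentSalaryIndex is hypothetical, and the code assumes a MongoDB shell style db handle where the collection is reachable as db.employees.

```javascript
// Compound index specification: 1 means ascending order.
// The equality field (department) comes before the range field (salary).
const departmentSalaryIndex = { department: 1, salary: 1 };

// Hypothetical helper: expects a mongosh-style `db` object.
// In the MongoDB shell, createIndex returns the name of the index.
function ensureDepartmentSalaryIndex(db) {
  return db.employees.createIndex(departmentSalaryIndex);
}

// Usage inside the MongoDB shell:
//   ensureDepartmentSalaryIndex(db)
//
// To check whether a query uses the index rather than a collection scan:
//   db.employees.find({ department: "HR", salary: { $gt: 60000 } })
//     .explain("executionStats")
```

In the explain output, a winning plan with an IXSCAN stage indicates the index was used, while COLLSCAN indicates a full collection scan.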
It’s important to note that indexes have trade-offs. While they enhance query performance, they also require storage space and can slow down write operations (inserts, updates, deletes), because MongoDB must update the index whenever data changes. Therefore, during database design, it is important to look at the application’s needs and strike a balance between query performance and index maintenance overhead.
Performance and Scalability
MongoDB’s approach to scalability also sets it apart from traditional SQL databases.
Because each document is largely self-contained, rather than spread across related rows in several tables, the data is easier to partition across servers. MongoDB therefore supports both vertical and horizontal scaling, allowing you to adapt to changing workloads and data volumes.
Vertical scaling involves adding more resources (CPU, RAM, storage) to a single server, effectively increasing its capacity. This approach suits scenarios where performance can be improved by upgrading hardware, and it is the typical way to scale a traditional relational database.
In contrast, horizontal scaling involves distributing data across multiple servers or nodes, which is particularly useful for handling vast amounts of data and high traffic loads. MongoDB supports this natively through sharding.
Consider an e-commerce platform as an example.
As the number of users and products grows, MongoDB can scale horizontally by sharding the database across multiple servers. Each shard holds a portion of the data, which spreads storage and query load, and each shard is typically deployed as a replica set to provide high availability. This often lets a MongoDB deployment scale out with less application-level complexity than manually partitioning a SQL database.
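The idea that each shard holds a portion of the data can be illustrated with a toy routing function in plain JavaScript. This is only a sketch under stated assumptions: the hash function, the shard count, and the orders data are all made up, and MongoDB’s real sharding works on ranges of shard key values (chunks) managed by the cluster, not on a simple modulo.

```javascript
const SHARD_COUNT = 3; // assumed number of shards for this illustration

// Toy 32-bit FNV-1a style hash over the characters of the key.
// This is NOT the hash MongoDB uses for hashed shard keys.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (const ch of String(key)) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Route a shard key value to a shard number in [0, shardCount).
function shardFor(key, shardCount = SHARD_COUNT) {
  return hashKey(key) % shardCount;
}

// Hypothetical orders for an e-commerce platform, keyed by customerId.
const orders = Array.from({ length: 12 }, (_, i) => ({
  orderId: i + 1,
  customerId: `customer-${(i % 5) + 1}`,
}));

// Distribute the orders: every order lands on exactly one shard,
// and all orders of one customer land on the same shard.
const shards = Array.from({ length: SHARD_COUNT }, () => []);
for (const order of orders) {
  shards[shardFor(order.customerId)].push(order);
}

console.log(shards.map((shard) => shard.length));

// In a real sharded cluster, the MongoDB shell equivalent would be along
// the lines of (database and collection names are hypothetical):
//   sh.shardCollection("shop.orders", { customerId: "hashed" })
```

The choice of shard key matters: a key with few distinct values, or one that most queries do not filter on, tends to produce uneven shards or queries that must contact every shard.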
Choosing Between MongoDB and SQL
The choice between MongoDB and SQL databases depends on the specific requirements of your project.
MongoDB excels in scenarios where flexibility and scalability are paramount. If you have unpredictable data structures, need to store vast amounts of unstructured or semi-structured data, or require high availability and horizontal scaling, MongoDB is a strong contender.
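However, when complex transactions and relationships between data are central to your application (e.g., joining multiple tables for complex queries), traditional SQL databases may be the better choice. SQL databases excel in maintaining data integrity and enforcing rigid schemas, making them suitable for applications that rely on ACID (Atomicity, Consistency, Isolation, Durability) transactions. MongoDB has supported multi-document ACID transactions since version 4.0, but its data model is designed around keeping related data together in a single document rather than around frequent cross-document transactions. In addition, modern infrastructure, such as managed cloud database services and read replicas, now lets SQL databases meet many scaling and performance needs as well.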
Conclusion
Both MongoDB and SQL databases are valuable tools in the architect’s toolkit.
MongoDB is a great choice when dealing with dynamic data, scalability demands, and flexible querying.
On the other hand, SQL databases excel in scenarios where data integrity and complex transactions are paramount.
So, ultimately, the choice between the two, or even a combination of the two, should be guided by your project’s specific needs and the skills of your team. In addition, given that data analytics and AI are incredibly important today, consider the data pipelines you may need to make your data programs successful.
Ignitho provides architecture advisory services of which database selection is a component. Get in touch with us to discuss your project and recommended architecture choices.