MongoDB’s aggregation pipeline is like a data processing assembly line, where we can pass data through different “processes” (aggregation operators) in sequence for filtering, statistics, and transformation, ultimately obtaining the desired results. Today, we’ll learn about the two most commonly used operators in the aggregation pipeline: $match and $group, to help you quickly get started with MongoDB’s aggregation analysis.
一、MongoDB Aggregation Pipeline Basics¶
An aggregation pipeline consists of multiple stages, each corresponding to an operator (e.g., $match, $group). Data flows through each stage from left to right for processing. Let’s use an example collection to understand this. Suppose we have a students collection with the following document structure:
{ "_id": 1, "name": "张三", "subject": "数学", "score": 90, "class": "一班" }
{ "_id": 2, "name": "李四", "subject": "语文", "score": 85, "class": "一班" }
{ "_id": 3, "name": "王五", "subject": "数学", "score": 78, "class": "二班" }
{ "_id": 4, "name": "赵六", "subject": "语文", "score": 92, "class": "二班" }
{ "_id": 5, "name": "钱七", "subject": "数学", "score": 88, "class": "一班" }
二、$match: The “Filter” for Data Screening¶
$match is similar to the WHERE clause in SQL, used to filter documents that meet the conditions. Only documents satisfying the conditions are passed to the next stage, avoiding processing invalid data later.
Syntax¶
{ $match: { <query conditions> } }
<query conditions> is consistent with the conditions in the find() method, supporting operators like equality, greater than, less than, and inclusion:
- Equality: { class: "一班" } (filter documents where the class is “一班”)
- Greater than: { score: { $gt: 80 } } (filter documents with scores greater than 80)
- Inclusion: { subject: { $in: ["数学", "语文"] } } (filter documents where the subject is “数学” or “语文”)
Example: Filtering Student Scores for “一班”¶
db.students.aggregate([
{ $match: { class: "一班" } } // Only keep documents with class "一班"
])
The result returns 3 documents where class is “一班”:
{ "_id": 1, "name": "张三", "subject": "数学", "score": 90, "class": "一班" }
{ "_id": 2, "name": "李四", "subject": "语文", "score": 85, "class": "一班" }
{ "_id": 5, "name": "钱七", "subject": "数学", "score": 88, "class": "一班" }
三、$group: The “Calculator” for Grouped Statistics¶
After filtering the data, $group groups documents by a specified field and performs statistics (e.g., counting, summing, averaging) on the data within each group.
Syntax¶
{ $group: { _id: <group key>, <custom field>: { <accumulator operator>: <field name> }, ... } }
_id: The grouping key, specifying the basis for grouping (e.g., byclassorsubject). If set tonull, the entire collection is treated as a single group.- Custom fields: Such as
student_countoravg_score, used to store statistical results. - Accumulator operators: Perform calculations on fields within each group. Common ones include:
$sum: Sum (e.g.,$sum: "$score"calculates total scores)$avg: Average (e.g.,$avg: "$score"calculates average scores)$count: Count (equivalent to$sum: 1, counts the number of documents)$max/$min: Find maximum/minimum values
Example 1: Group by Class and Count Students¶
db.students.aggregate([
{ $group: { _id: "$class", student_count: { $sum: 1 } } }
// Group by class and count students in each class
])
Result:
{ "_id": "一班", "student_count": 3 },
{ "_id": "二班", "student_count": 2 }
Example 2: Group by Subject and Calculate Total Scores¶
db.students.aggregate([
{ $group: { _id: "$subject", total_score: { $sum: "$score" } } }
// Group by subject and sum scores for each subject
])
Result:
{ "_id": "数学", "total_score": 90 + 78 + 88 = 256 },
{ "_id": "语文", "total_score": 85 + 92 = 177 }
Example 3: Group by Class and Calculate Average Scores¶
db.students.aggregate([
{ $group: { _id: "$class", avg_score: { $avg: "$score" } } }
// Group by class and calculate average score for each class
])
Result:
{ "_id": "一班", "avg_score": (90 + 85 + 88)/3 ≈ 87.67 },
{ "_id": "二班", "avg_score": (78 + 92)/2 = 85 }
四、Combining $match and $group¶
In practice, we often filter first and then group. For example: “Calculate the average score for each class in the ‘数学’ (Math) subject”.
Steps:¶
- First stage: Use
$matchto filter documents wheresubjectis “数学”. - Second stage: Use
$groupto group byclassand calculate the average score.
db.students.aggregate([
{ $match: { subject: "数学" } }, // Filter students with Math scores
{ $group: { _id: "$class", avg_math_score: { $avg: "$score" } } } // Group by class and calculate average
])
Result:
{ "_id": "一班", "avg_math_score": (90 + 88)/2 = 89 },
{ "_id": "二班", "avg_math_score": 78 }
Summary¶
$match: Acts as a “filter” to reduce the amount of data processed later by filtering early.$group: Acts as a “calculator” to group data by a key and use accumulator operators to summarize statistics for each group.- Combination: Filtering first and then grouping is the core pattern for aggregation analysis.
With these two operators, you can perform basic statistical analysis. Later, explore other operators like $project (projection, retaining only needed fields) and $sort (sorting) to enhance your data analysis capabilities!