Quick Start with MongoDB Aggregation: Detailed Explanation of $match and $group Operators

MongoDB’s aggregation pipeline is like a data processing assembly line, where we can pass data through different “processes” (aggregation operators) in sequence for filtering, statistics, and transformation, ultimately obtaining the desired results. Today, we’ll learn about the two most commonly used operators in the aggregation pipeline: $match and $group, to help you quickly get started with MongoDB’s aggregation analysis.

一、MongoDB Aggregation Pipeline Basics

An aggregation pipeline consists of multiple stages, each corresponding to an operator (e.g., $match, $group). Data flows through each stage from left to right for processing. Let’s use an example collection to understand this. Suppose we have a students collection with the following document structure:

{ "_id": 1, "name": "张三", "subject": "数学", "score": 90, "class": "一班" }
{ "_id": 2, "name": "李四", "subject": "语文", "score": 85, "class": "一班" }
{ "_id": 3, "name": "王五", "subject": "数学", "score": 78, "class": "二班" }
{ "_id": 4, "name": "赵六", "subject": "语文", "score": 92, "class": "二班" }
{ "_id": 5, "name": "钱七", "subject": "数学", "score": 88, "class": "一班" }

二、$match: The “Filter” for Data Screening

$match is similar to the WHERE clause in SQL, used to filter documents that meet the conditions. Only documents satisfying the conditions are passed to the next stage, avoiding processing invalid data later.

Syntax

{ $match: { <query conditions> } }

<query conditions> is consistent with the conditions in the find() method, supporting operators like equality, greater than, less than, and inclusion:
- Equality: { class: "一班" } (filter documents where the class is “一班”)
- Greater than: { score: { $gt: 80 } } (filter documents with scores greater than 80)
- Inclusion: { subject: { $in: ["数学", "语文"] } } (filter documents where the subject is “数学” or “语文”)

Example: Filtering Student Scores for “一班”

db.students.aggregate([
  { $match: { class: "一班" } } // Only keep documents with class "一班"
])

The result returns 3 documents where class is “一班”:

{ "_id": 1, "name": "张三", "subject": "数学", "score": 90, "class": "一班" }
{ "_id": 2, "name": "李四", "subject": "语文", "score": 85, "class": "一班" }
{ "_id": 5, "name": "钱七", "subject": "数学", "score": 88, "class": "一班" }

三、$group: The “Calculator” for Grouped Statistics

After filtering the data, $group groups documents by a specified field and performs statistics (e.g., counting, summing, averaging) on the data within each group.

Syntax

{ $group: { _id: <group key>, <custom field>: { <accumulator operator>: <field name> }, ... } }
  • _id: The grouping key, specifying the basis for grouping (e.g., by class or subject). If set to null, the entire collection is treated as a single group.
  • Custom fields: Such as student_count or avg_score, used to store statistical results.
  • Accumulator operators: Perform calculations on fields within each group. Common ones include:
  • $sum: Sum (e.g., $sum: "$score" calculates total scores)
  • $avg: Average (e.g., $avg: "$score" calculates average scores)
  • $count: Count (equivalent to $sum: 1, counts the number of documents)
  • $max/$min: Find maximum/minimum values

Example 1: Group by Class and Count Students

db.students.aggregate([
  { $group: { _id: "$class", student_count: { $sum: 1 } } } 
  // Group by class and count students in each class
])

Result:

{ "_id": "一班", "student_count": 3 },
{ "_id": "二班", "student_count": 2 }

Example 2: Group by Subject and Calculate Total Scores

db.students.aggregate([
  { $group: { _id: "$subject", total_score: { $sum: "$score" } } } 
  // Group by subject and sum scores for each subject
])

Result:

{ "_id": "数学", "total_score": 90 + 78 + 88 = 256 },
{ "_id": "语文", "total_score": 85 + 92 = 177 }

Example 3: Group by Class and Calculate Average Scores

db.students.aggregate([
  { $group: { _id: "$class", avg_score: { $avg: "$score" } } } 
  // Group by class and calculate average score for each class
])

Result:

{ "_id": "一班", "avg_score": (90 + 85 + 88)/3  87.67 },
{ "_id": "二班", "avg_score": (78 + 92)/2 = 85 }

四、Combining $match and $group

In practice, we often filter first and then group. For example: “Calculate the average score for each class in the ‘数学’ (Math) subject”.

Steps:

  1. First stage: Use $match to filter documents where subject is “数学”.
  2. Second stage: Use $group to group by class and calculate the average score.
db.students.aggregate([
  { $match: { subject: "数学" } }, // Filter students with Math scores
  { $group: { _id: "$class", avg_math_score: { $avg: "$score" } } } // Group by class and calculate average
])

Result:

{ "_id": "一班", "avg_math_score": (90 + 88)/2 = 89 },
{ "_id": "二班", "avg_math_score": 78 }

Summary

  • $match: Acts as a “filter” to reduce the amount of data processed later by filtering early.
  • $group: Acts as a “calculator” to group data by a key and use accumulator operators to summarize statistics for each group.
  • Combination: Filtering first and then grouping is the core pattern for aggregation analysis.

With these two operators, you can perform basic statistical analysis. Later, explore other operators like $project (projection, retaining only needed fields) and $sort (sorting) to enhance your data analysis capabilities!

Xiaoye