Embed vs reference
- Embed when data is always accessed together, has a 1:1 or 1:few relationship, and does not grow unboundedly.
- Reference when data is large, accessed independently, or shared across many documents.
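The two shapes can be sketched with plain objects. This is a minimal illustration, not a prescribed schema: the user, post, and comment documents and the `commentsFor` helper are all hypothetical, and the helper is an in-memory stand-in for a query on a separate collection.

```javascript
// Hypothetical data; every field and collection name here is illustrative.

// Embedded: the address is 1:1 with the user and always read with it.
const user = {
  _id: "u1",
  name: "Ada",
  address: { city: "London", zip: "N1" },
};

// Referenced: comments are unbounded and queried on their own,
// so each one is its own document and points back at its post.
const post = { _id: "p1", title: "Schema design" };
const comments = [
  { _id: "c1", post_id: "p1", text: "Nice" },
  { _id: "c2", post_id: "p1", text: "Thanks" },
  { _id: "c3", post_id: "p2", text: "Other post" },
];

// In-memory stand-in for db.comments.find({ post_id: postId }).
function commentsFor(postId) {
  return comments.filter((c) => c.post_id === postId);
}
```

Reading the user returns the address in the same fetch, while the comments for a post need a second lookup by `post_id`.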
The subset pattern
For a product with thousands of reviews, embed only the 10 most recent reviews in the product document. Store all reviews in a separate collection. The product page loads fast; the full review history is available on demand.
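One way to keep the embedded subset capped is `$push` with `$each` and a negative `$slice`. The sketch below only builds the update document and models its effect in memory; `recent_reviews`, `recentReviewsUpdate`, and `applyRecent` are assumed names, not part of the original design.

```javascript
// Builds the update document that keeps only the newest `limit` reviews
// embedded in the product. Assumes limit >= 1.
function recentReviewsUpdate(review, limit = 10) {
  return {
    $push: {
      recent_reviews: {
        $each: [review], // $slice is only valid together with $each
        $slice: -limit,  // a negative slice keeps the last `limit` elements
      },
    },
  };
}

// In-memory model of what that update does to the embedded array.
function applyRecent(existing, review, limit = 10) {
  return [...existing, review].slice(-limit);
}
```

In practice the new review would first be inserted into the separate reviews collection, and the product would then be updated with something like `db.products.updateOne({ _id: productId }, recentReviewsUpdate(review))`, where the `products` collection name is an assumption.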
Bucket pattern for time-series
Instead of one document per reading, bucket multiple readings per document:
    {
      "sensor_id": "s001",
      "hour": ISODate("2024-03-01T14:00:00Z"),
      "readings": [
        { "ts": ISODate("2024-03-01T14:00:05Z"), "val": 22.4 },
        ...
      ],
      "count": 720
    }
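Bucketing dramatically reduces document count and index size for high-frequency sensor data. Keep each bucket bounded (here, one hour per sensor) so the readings array cannot grow without limit.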
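Writing into hourly buckets is commonly done with an upsert that appends the reading and increments the counter. The following is a sketch under that assumption; `hourBucket` and `bucketUpsert` are hypothetical helpers that only build the arguments for the update.

```javascript
// Truncates a timestamp to the start of its UTC hour, used as the bucket key.
function hourBucket(ts) {
  const d = new Date(ts.getTime());
  d.setUTCMinutes(0, 0, 0); // zero out minutes, seconds, milliseconds
  return d;
}

// Returns the filter, update, and options for an upsert that appends one reading.
function bucketUpsert(sensorId, ts, val) {
  return {
    filter: { sensor_id: sensorId, hour: hourBucket(ts) },
    update: {
      $push: { readings: { ts, val } },
      $inc: { count: 1 },
    },
    options: { upsert: true }, // creates the bucket on its first reading
  };
}
```

Given `const op = bucketUpsert("s001", new Date("2024-03-01T14:00:05Z"), 22.4)`, the write would look like `db.readings.updateOne(op.filter, op.update, op.options)`, with `readings` as an assumed collection name.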
Index strategy
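Index the fields your frequent queries filter and sort on; every index adds write and storage cost, so not every field needs one. In a compound index, put equality fields first, then sort fields (in the query's sort order), then range fields. A covered query (every field in the filter and the projection is in the index, with _id excluded unless it is indexed) is answered from the index alone, without fetching documents.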
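As a rough illustration of compound key order and coverage, consider a hypothetical orders query. The field names and the `isCovered` helper are assumptions, and the check is deliberately simplified: it ignores multikey indexes, embedded fields, and other server-side conditions.

```javascript
// Hypothetical query: orders for one customer, newest first, above a minimum total.
const filter = { customer_id: "c42", total: { $gt: 100 } };
const sort = { created_at: -1 };
const projection = { _id: 0, customer_id: 1, created_at: 1, total: 1 };

// Compound index keys: equality field, then sort field, then range field.
const indexKeys = { customer_id: 1, created_at: -1, total: 1 };

// Simplified coverage check: every filter field and every projected field
// must be an index key, and _id must be excluded unless it is indexed.
function isCovered(indexKeys, filter, projection) {
  const keys = new Set(Object.keys(indexKeys));
  const projected = Object.keys(projection).filter((f) => projection[f]);
  const idOk =
    keys.has("_id") || projection._id === 0 || projection._id === false;
  return (
    idOk &&
    projected.length > 0 && // no projection means whole documents are returned
    Object.keys(filter).every((f) => keys.has(f)) &&
    projected.every((f) => keys.has(f))
  );
}
```

The index itself would be created with `db.orders.createIndex(indexKeys)`, where `orders` is an assumed collection name. Running the query with `explain("executionStats")` is the reliable way to confirm coverage: a covered query reports `totalDocsExamined` of 0.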