Debugging MongoDB’s 2dsphere geoIndex

Andrew Kouri
Sep 29, 2015

When I first learned about MongoDB’s geospatial capabilities (similar to PostGIS), I was beyond excited: it meant I would not have to write helper functions to find data along geohash borders. Currently my data is indexed by geohash, which has great ordering properties but is a little unwieldy for “find me x near y”-type queries.
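
For context, this is the shape of “find me x near y” query a 2dsphere index is meant to serve directly (a sketch: the point and $maxDistance are made-up values, and the field name matches the one indexed below):

// Illustrative proximity query against a 2dsphere-indexed field.
// The coordinates and 500-meter radius are placeholder values.
db.my_collection.find({
    "value.avg_location": {
        $near: {
            $geometry: { type: "Point", coordinates: [ -122.08, 37.66 ] },
            $maxDistance: 500
        }
    }
})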

Unfortunately, it seemed as though the 2dsphere index wasn’t covering my geoWithin queries. As my collection of GeoJSON documents grew from gigabytes to terabytes, I noticed my tools taking extraordinary amounts of time to perform find queries, which prompted me to investigate what was happening under the hood.

First, I double-checked that my geospatial index was present:

mongos> db.my_collection.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "telemetry.********"
    },
    {
        "v" : 1,
        "key" : {
            "value.avg_location" : "2dsphere"
        },
        "name" : "geoindex",
        "ns" : "telemetry.*******",
        "2dsphereIndexVersion" : 2
    }
]

So, the index was clearly there. I also verified that its size was reasonable: around 50 MB, which seemed fine for this amount of test data.
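
If you want to check index sizes yourself, collection stats include a per-index breakdown; something like this works (the scale argument converts the byte counts to megabytes):

// Per-index sizes; stats() reports bytes by default, so pass a scale
// of 1024*1024 to get the numbers back in MB.
db.my_collection.stats(1024 * 1024).indexSizes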

Next, I ran .explain() on a query to see if it was hitting the index:

var query = {
    "value.avg_location": {
        "$geoWithin": {
            "$polygon": [
                [ -122.07496047019957, 37.65882781135178 ],
                [ -122.07496047019957, 37.663906822384874 ],
                [ -122.0871913433075, 37.663906822384874 ],
                [ -122.0871913433075, 37.65882781135178 ],
                [ -122.07496047019957, 37.65882781135178 ]
            ]
        }
    }
};
db.my_collection.find(query).explain("executionStats");

Nope:

"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"value.avg_location" : {
"$geoWithin" : {
"$polygon" : [
[
-122.07496047019957,
37.65882781135178
],
[
-122.07496047019957,
37.663906822384874
],
[
-122.0871913433075,
37.663906822384874
],
[
-122.0871913433075,
37.65882781135178
],
[
-122.07496047019957,
37.65882781135178
]
]
}
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
"executionStats" : {
"nReturned" : 164,
"executionTimeMillis" : 341340,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10760184,
"executionStages" : {
"stage" : "SINGLE_SHARD",
"nReturned" : 164,
"executionTimeMillis" : 341340,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10760184,
"totalChildMillis" : NumberLong(341334),
"shards" : [
{
"shardName" : "rs0",
"executionSuccess" : true,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
....
},
"nReturned" : 164,
"executionTimeMillisEstimate" : 334786,
"works" : 10760186,
"advanced" : 164,
"needTime" : 10760021,
"needFetch" : 0,
"saveState" : 84076,
"restoreState" : 84076,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 10760184
}
}
]
}
},
"ok" : 1

So, from the “stage” value of COLLSCAN, we can clearly see it’s doing a full collection scan to find the documents: over 10.7 million documents examined to return just 164. We want the plan to show something like “IXSCAN” instead, to indicate that it’s using the index.
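
A quick way to spot this without wading through the whole explain document is to pull out just the winning plan and the examined/returned counts (a sketch; on a sharded cluster the winning plan wraps per-shard plans, so the stage you care about may be nested a level down):

// Grab the winning plan and compare documents examined vs. returned;
// a lopsided ratio like 10.7M examined for 164 returned means the
// index is not being used.
var exp = db.my_collection.find(query).explain("executionStats");
printjson(exp.queryPlanner.winningPlan);
print("examined: " + exp.executionStats.totalDocsExamined +
      ", returned: " + exp.executionStats.nReturned);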

I tried deleting and rebuilding the index:

mongos> db.my_collection.dropIndex("geoindex")
true
mongos> db.my_collection.createIndex({ "value.avg_location" : "2dsphere" }, { name: "geoindex" });
mongos>

Still, no dice!

Okay, well Mongo’s docs say that the most efficient queries are “covered” queries: ones that both filter on and return only indexed fields. So, I tried limiting (“projecting,” in Mongo parlance) the results to just the “value.avg_location” field, but it still did a “COLLSCAN.”
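
The projection attempt looked roughly like this (my reconstruction, reusing the query variable from earlier):

// Return only the indexed field (and suppress _id) in the hope of a
// covered query; the winning plan was still a COLLSCAN.
db.my_collection.find(query, { "value.avg_location": 1, "_id": 0 }).explain("executionStats");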

UPDATE: I ended up fixing this by restructuring the query. It turns out the $polygon form I used above is the older, legacy-coordinate-pair syntax, which doesn’t hit the 2dsphere index. The newer, GeoJSON-based $geometry syntax looks like this:

{
    "location": {
        $geoWithin: {
            $geometry: {
                type: "Polygon",
                coordinates: [ [ <the coords> ] ]
            }
        }
    }
}
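
Putting that together with the polygon from earlier (my reconstruction; note that a GeoJSON Polygon takes an array of linear rings, and the ring must be closed, i.e. the first and last points are identical):

// The same bounding polygon as before, rewritten with the GeoJSON
// $geometry syntax so the 2dsphere index can be used.
var query = {
    "value.avg_location": {
        $geoWithin: {
            $geometry: {
                type: "Polygon",
                coordinates: [ [
                    [ -122.07496047019957, 37.65882781135178 ],
                    [ -122.07496047019957, 37.663906822384874 ],
                    [ -122.0871913433075, 37.663906822384874 ],
                    [ -122.0871913433075, 37.65882781135178 ],
                    [ -122.07496047019957, 37.65882781135178 ]
                ] ]
            }
        }
    }
};
db.my_collection.find(query).explain("executionStats");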
