Featured Posts

Top 5 Debugging and Diagnostic Rules for Programming I remember in 1991 an ad that appeared in the weekly news-magazine, popular at the time, Information Technology (IT) Weekly.  The ad showed some poor schmuck laboring away...

Read more

The 10 Worst Things about World of Warcraft - Mists... I've been playing WoW since vanilla version starting in 2006.  Except for a six-month hiatus in late 2011, I've been a daily player.  I've seen multiple patches come...

Read more

Best Breakfast Burritos, ever! I like eating a good breakfast, usually around lunchtime once I've had my fill of coffee and am awake enough to appreciate a good breakfast. This is my recipe for my ultimate...

Read more

Testing Arrays in PHP - Back to Basics... Sometimes, when you're wallowing through your abstraction class layers, you find yourself using code for simple functions that are normally the focus of an Intro to Programming...

Read more

PHP: Comparing Object Structures I'm working on a project where I am converting an established REST API over to a rabbitMQ service.  Because, you know, dinosaur, I'm continuing to use PHP as my language...

Read more

Subscribe

Why is my mongo query so slow?

Category : Technical
No Gravatar

Why’s my mongodb query so slow?

I got my geospatial collection set-up — I am running some really great queries making sure that the locations I am pulling aren’t in any sort of cache, and I am just blown-away by how fast data is being returned.

The problem is:  when I query the collection to pull up the requisite lon/lat data by name:  city & state, or city & country, the query seems to take seconds to complete!

I set-up the table correctly…I indexed the crap out of all my columns…a week or two ago, I was at the mongoSV 2011 in Santa Clara and learned some really cool stuff about queries, indexing, and performance management, so let’s dig-out the notes and see where I went wrong.  Because I strongly doubt that the problem is in mongo but, rather as we used to say in technical support: this is a PBCK issue…

The first thing I want to do is run an explain against my query so I can see mongo’s query plan for my query.  This should provide me with a starting point for trying to figure out what went wrong.

> db.geodata_geo.find({ cityName : “Anniston”, stateName : “Alabama” }).explain();

By adding the trailing function: .explain(), I’m requesting that mongoDB return the query-plan to me instead of executing the query.  I hit enter to launch the explain() and get back the following output:

> db.geodata_geo.find({ cityName : “Anniston”, stateName : “Alabama”}).explain();
{
“cursor” : “BasicCursor“,
“nscanned” : 3691723,
“nscannedObjects” : 3691723,
“n” : 1,
“millis” : 2269,
“nYields” : 0,
“nChunkSkips” : 0,
“isMultiKey” : false,
indexOnly” : false,
“indexBounds” : {

}
}

The important information, I bold-faced in the query output (above).   What this output is telling me is that I’ve using a “BasicCursor” for my search cursor — which is indicates that, yes, I am doing a table-scan on the collection.  So, already I know my query is not optimal.  But, wait!  More good news…

The value for nscanned and nscannedObjects is the same: 3,691,723 — which coincidently is the same as the cardinality of the collection.  This number is the number of documents scanned to satisfy the query which, given it’s value, confirms that I am doing a full table scan.

millis tells me the number of milliseconds that the query would take:  2.269 seconds:  way too slow for my back-end methods() serving a REST API — unacceptable.

And then we get to the tell:  IndexOnly tells me that if the query could have been resolved by an (existing) covering index.  Seeing the value false here tells me that the collection has no index on the columns I am scanning against.

What?!?  I know I indexed this collection…

So, I run db.geodata_geo.getIndexes() to dump my indexes and … I … don’t see my name columns indexed.  Oh, I remembered to index the the ID and Code columns…but somehow, indexing the Name columns completely slipped past my lower brain-pan.

I add these indexes to my collection:

> db.geodata_geo.ensureIndex({ cityName : 1 });
> db.geodata_geo.ensureIndex({ stateName : 1 });

And then I rerun the query plan and see the following output:

> db.geodata_geo.find({ cityName : “Anniston”, stateName : “Alabama”}).explain();
{
“cursor” : “BtreeCursor cityName_1″,
“nscanned” : 2,
“nscannedObjects” : 2,
“n” : 1,
“millis” : 101,
“nYields” : 0,
“nChunkSkips” : 0,
“isMultiKey” : false,
“indexOnly” : false,
“indexBounds” : {
“cityName” : [
[
"Anniston",
"Anniston"
]
]
}
}

Instead of BasicCursor, I see BtreeCursor which gives me a happy.  I also see that the nscanned and nscannedObjects values are now more realistic…seriously:  2 is a LOT better than 3.6 million something, right?  Another happy for me!

I score the third happy when I see that the millis has dropped down to 101:  0.101 seconds to execute this search/query!  Not jaw-dropping, I agree — but acceptable considering that everything is running off my laptop…I know production times will be much, much lower.

 

In the end, I learned that a simple tool like .explain() can tell me where my attention is needed when it comes to optimization and fixing even simple, seemingly innocent queries.  Knowing what you’re looking at, and what you’re looking for, is pretty much thick-end of the baseball bat when it comes to crushing one out of the park.

I hope this helps!

 

Reference Link:  Explain

Page optimized by WP Minify WordPress Plugin

The forecast for 92143 by Wordpress Weather