Can’t Connect to mongo 28017 from remote host…

Category : Technical

Admittedly, it’s been a long, long time since I’ve had to do a fresh install of mongodb…I am in the process of setting up a couple of mongo servers behind my firewall to use for cluster testing.  Ancient PCs.  AMD Athlon class.  One even has a floppy drive installed.

Anyway, once I had the OS installed (Ubuntu 12.10 server) and all the various packages, including mongodb, added to the system, I wanted to access the mongodb from another machine on my network but for the life of me I couldn’t connect to port 28017.  (That’s the port for mongod’s built-in HTTP status page; the database itself listens on 27017 by default.)

Oh, I could connect from localhost using wget without problem.

netstat -a | grep -i listen

Showed port *:28017 in listen mode so no problem there…

I even added the port via iptables to the firewall rules:

iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 28017 -j ACCEPT

But I still couldn’t connect.

I started to browse the /etc/mongodb.conf file looking for a configuration setting that might prevent me from accessing mongo remotely and there it was:

bind_ip = 127.0.0.1

Since I don’t have concerns about security on my private network, I commented out this line and restarted mongo services.

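(side note:  you don’t want to do this on a production server.  Instead, set bind_ip to a comma-separated list of the server’s own interface addresses that mongod should listen on.  Note that bind_ip doesn’t whitelist remote hosts; use firewall rules to control which hosts you will permit to connect to your mongo server.)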

Worked!  Full access from within my network to mongod!

Hope this helps!

Quick Update…

Category : Announcements, Site

Sorry this blog has been inactive for so long but I’ve been really, really busy with work, and my move to Puerto Nuevo, Mexico in northern BC.

I am thinking about putting together a series of posts that detail how to set-up a data-processing stack, in PHP, for mongodb that allows you to dynamically generate all CRUD queries via the class stack.

The front-end interface, to this stack, is through RabbitMQ — also written in PHP — which eliminates Apache from the  LAMP stack, and no longer requires a REST interface for transferring data requests to-and-from store.

The stack includes services such as auditing, registration for public-facing requests, memcached and membase support, error-logging, and internal checks on requests that prevent things like query generation that result in full-table scans or any searches on un-indexed columns within either mongodb or mysql. (I think I still remember how to code for mysql… :) )

Anyway, this project has been all-consuming for me for the past year and the concept of generalizing the stack for instructional purposes has been rattling around in my can now, looking for a way out, for quite some time.  It’s not like there’s a plethora of PHP-based RabbitMQ tutorials out there either.

So, that’s the happs.  Now that things are settling down a bit, I’ll try to get more information out.

Thank you for checking-in!

Renaming mongodb Columns

Category : Technical

Today I was putzing around in the geo-spatial collection when I noticed that I had an unhappy over one of the column names within the collection.

In the mySQL world, changing a column name is pretty straight-forward courtesy of the alter table command.

Mongo…not so much…

<BEGIN_UNRELATED_SIDE_RANT>

The Mongo documentation is normally the first place most of us go when we’re looking for help in using our favorite noSQL database.

Why?

Well…because that’s usually where Google directs us to go and also because there just isn’t a whole lot of documentation out there on the subject to begin with.

The mongo (10gen) documentation is pretty good.  It’s not, however, excellent.  And I can articulate the reason why.

It’s pretty easy to identify documentation written by engineers as opposed to documentation written by everyone else (on the planet).  And not because of technical content or the (ab)use of really big and impressive-sounding jargon.

No – it’s because most engineering-authored documents are written using a solution-based voice instead of a problem-based voice.

Think about it:  when I have to go to the man-page for help, it’s because I have a problem.  If I had a solution, I would be writing a blog post.    But since I have a problem, I need the man-pages, online docs, whatever, to help me figure-out a solution.

Engineering documents are written from a solution perspective:  the document assumes you possess some bit of arcane lore (which is probably just exactly that little bit of lore that you’re missing which has caused your trip to the documentation vault) and everything that is explained within the document all hinges on this piece of knowledge which the author, albeit with the finest of intentions, assumes is already firmly in your mental possession.

And that’s why I usually don’t like 10gen’s documentation.  But, like I said earlier, it’s the only game in (Google)town.

<END_UNRELATED_SIDE_RANT>

In mongo, to change the name of a column within a collection, you first have to be on a release of mongodb 1.7.2 or later.  Since most of us bleeding-edge, early-adopter types are all 2.x versioned, this shouldn’t be an issue.

This page from 10gen is the update page and, within, it talks about the $rename modifier to the update command.  What the section doesn’t say, because it’s assuming you’re wanting to update records and not schema, is how to apply a change to all of the records in your collection.

In my case, I have a column name which I fat-fingered right out of its camelCase:  CountryID instead of countryID.  (And, yes, OCD-peeps, I know that it’s not strictly camelCase, thank-you!)  I want to spin through all 3.7 million rows in my collection and rename this column…

> db.geodata_geo.update( {} , { $rename : { 'CountryID' : 'countryID' }}, false, true );

So what we have here is the update command to the collection (geodata_geo) and four parameters:

  1. {}: the empty query document (this is what’s missing from the 10gen doc), which matches every record in the collection
  2. $rename: the modifier to the update command which, in this case, replaces the name ‘CountryID’ with ‘countryID’
  3. false: the upsert flag; false means do not insert a new record if no record matches the query
  4. true: the multi option, which applies the command to all matching records since, by default, update() quits after updating the first record
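
To make the effect of the multi-record $rename concrete, here is a minimal sketch in plain JavaScript that models it on an in-memory array, so it runs without a mongo server. The renameField helper and the sample documents are made up for illustration; this is a model of the behaviour, not how mongod implements it.

```javascript
// Plain-JavaScript model of a multi-document $rename; no MongoDB required.
// renameField and sampleDocs are hypothetical names used for illustration only.
function renameField(docs, oldName, newName) {
  let modified = 0;
  for (const doc of docs) {
    // As with $rename, documents that lack the old field are left untouched.
    if (Object.prototype.hasOwnProperty.call(doc, oldName)) {
      doc[newName] = doc[oldName];
      delete doc[oldName];
      modified += 1;
    }
  }
  return modified;
}

const sampleDocs = [
  { _id: 1, cityName: "Anniston", CountryID: 1 },
  { _id: 2, cityName: "Elliston", CountryID: 2 },
  { _id: 3, cityName: "Catalina" } // no CountryID field, so it is skipped
];

const renamedCount = renameField(sampleDocs, "CountryID", "countryID");
console.log(renamedCount, sampleDocs);
```

The empty query document plays the role of the loop over every element here, and the multi flag is what keeps the loop from stopping after the first match.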

And I run this command and mongo goes off (whirr…whirr … I have two-node replication…) and renames the column in my collection!

What it didn’t do was update my index. 

So, after my column-renaming completed, I needed to drop the index(es) that had ‘CountryID’ as members and re-index the collection to reflect the new column name.

Executing getIndexes() confirmed that my mongo world was back in its correct orbit and life, once again, was good.

Why is my mongo query so slow?

Category : Technical

Why’s my mongodb query so slow?

I got my geospatial collection set-up — I am running some really great queries making sure that the locations I am pulling aren’t in any sort of cache, and I am just blown-away by how fast data is being returned.

The problem is:  when I query the collection to pull up the requisite lon/lat data by name:  city & state, or city & country, the query seems to take seconds to complete!

I set-up the table correctly…I indexed the crap out of all my columns…a week or two ago, I was at the mongoSV 2011 in Santa Clara and learned some really cool stuff about queries, indexing, and performance management, so let’s dig-out the notes and see where I went wrong.  Because I strongly doubt that the problem is in mongo but, rather as we used to say in technical support: this is a PBCK issue…

The first thing I want to do is run an explain against my query so I can see mongo’s query plan for my query.  This should provide me with a starting point for trying to figure out what went wrong.

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();

By adding the trailing function: .explain(), I’m requesting that mongoDB return the query plan and its execution statistics to me instead of the matching documents.  I hit enter to launch the explain() and get back the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BasicCursor",
    "nscanned" : 3691723,
    "nscannedObjects" : 3691723,
    "n" : 1,
    "millis" : 2269,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

The important fields in the query output (above) are the ones I walk through below.   What this output is telling me is that I’m using a “BasicCursor” for my search cursor, which indicates that, yes, I am doing a table-scan on the collection.  So, already I know my query is not optimal.  But, wait!  More good news…

The value for nscanned and nscannedObjects is the same: 3,691,723, which coincidentally is the same as the cardinality of the collection.  This is the number of documents scanned to satisfy the query which, given its value, confirms that I am doing a full table scan.

millis tells me the number of milliseconds that the query took:  2,269 ms, or 2.269 seconds.  That’s way too slow for my back-end methods() serving a REST API.  Unacceptable.

And then we get to the tell:  indexOnly tells me whether the query could have been resolved by an (existing) covering index alone.  On its own, a value of false only means the query wasn’t covered; combined with the BasicCursor above, it tells me that the collection has no usable index on the columns I am scanning against.
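
The checks described above can be sketched as a small plain-JavaScript helper that reads an explain() document in the old 2.x format shown in this post. The diagnoseExplain name, the note wording, and the 100-to-1 scan ratio threshold are all assumptions made for illustration; none of them come from MongoDB.

```javascript
// Reads a legacy (MongoDB 2.x style) explain() document and lists what stands out.
// diagnoseExplain and the scan-ratio threshold of 100 are hypothetical choices.
function diagnoseExplain(plan) {
  const notes = [];
  if (plan.cursor === "BasicCursor") {
    notes.push("no index used: full collection scan");
  }
  if (plan.n > 0 && plan.nscanned / plan.n > 100) {
    notes.push("scanned " + plan.nscanned + " entries to return " + plan.n);
  }
  if (plan.indexOnly === false) {
    // Informational only: an indexed query can still have indexOnly false.
    notes.push("not a covered query: documents were fetched");
  }
  return notes;
}

const slowPlan = {
  cursor: "BasicCursor", nscanned: 3691723, nscannedObjects: 3691723,
  n: 1, millis: 2269, indexOnly: false
};
const fastPlan = {
  cursor: "BtreeCursor cityName_1", nscanned: 2, nscannedObjects: 2,
  n: 1, millis: 101, indexOnly: false
};

console.log(diagnoseExplain(slowPlan));
console.log(diagnoseExplain(fastPlan));
```

The cursor type and the ratio of nscanned to n are the two signals doing the real work; indexOnly is reported only as a note because it stays false even once the query is using an index.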

What?!?  I know I indexed this collection…

So, I run db.geodata_geo.getIndexes() to dump my indexes and … I … don’t see my name columns indexed.  Oh, I remembered to index the ID and Code columns…but somehow, indexing the Name columns completely slipped past my lower brain-pan.

I add these indexes to my collection:

> db.geodata_geo.ensureIndex({ cityName : 1 });
> db.geodata_geo.ensureIndex({ stateName : 1 });

And then I rerun the query plan and see the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BtreeCursor cityName_1",
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "n" : 1,
    "millis" : 101,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "cityName" : [
            [
                "Anniston",
                "Anniston"
            ]
        ]
    }
}

Instead of BasicCursor, I see BtreeCursor which gives me a happy.  I also see that the nscanned and nscannedObjects values are now more realistic…seriously:  2 is a LOT better than 3.6 million something, right?  Another happy for me!

I score the third happy when I see that the millis has dropped down to 101:  0.101 seconds to execute this search/query!  Not jaw-dropping, I agree — but acceptable considering that everything is running off my laptop…I know production times will be much, much lower.

 

In the end, I learned that a simple tool like .explain() can tell me where my attention is needed when it comes to optimization and fixing even simple, seemingly innocent queries.  Knowing what you’re looking at, and what you’re looking for, is pretty much the thick end of the baseball bat when it comes to crushing one out of the park.

I hope this helps!

 

Reference Link:  Explain

mongodb, geospatial indexing, and advanced queries….

Category : Technical

I’ve been working to build, and re-build, a geospatial table for work.  There have been a lot of challenges in this project for me as this is the first time that I’ve had to architect db designs incorporating mongodb with mySQL.

The mongo geospatial repository will be replacing several tables in the legacy mySQL system.  As you may know, mongodb comes with full geospatial support, so executing queries against a collection (table) built in this manner is shocking in terms of its response speeds, especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates.  The tl;dr for this paragraph is:  no more hideous trigonometric mySQL queries!
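
For a sense of the trigonometry a hand-rolled distance query has to carry, here is a minimal sketch of the haversine great-circle formula in plain JavaScript. This assumes haversine is representative; the post doesn’t say which formula the legacy mySQL queries actually used, and the function names and the Earth-radius constant are choices made for this example.

```javascript
// Great-circle distance between two lat/lon points using the haversine formula.
// EARTH_RADIUS_KM is the commonly quoted mean radius; treat it as an approximation.
const EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
    Math.pow(Math.sin(dLon / 2), 2);
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// One degree of latitude along a meridian, in kilometres.
console.log(haversineKm(0, 0, 1, 0));
```

Computing this for every candidate row, and then filtering on the result, is the kind of work a geospatial index lets you skip.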

What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored.  Mongo stores data as a collection of documents.  The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.

Let’s use a standard customer object as an example.  Every customer has at least one phone number.  Most, if not all, customers have more than one phone number.  We could define several columns in the customer table for the phone numbers: workphone, homephone, cellphone, otherphone and store the data that way.  Problem is that we will eventually hit the wall where we have the need to store numbers for which we don’t have columns pre-defined:  faxphone, skypephone, tddphone, vrsphone, etc.

RDBMS design demands a normalization of this 1:M data design by requiring a second table to store just phone numbers for each customer.  The phone table would have a primary key (id), the customer id, the customer phone number and perhaps a short, descriptive, field explaining the purpose of this number.  To get the phone data for a customer, then, you’d simply query (or join) the phone table based on the customer ID to get all the phone tuples for that customer.

Mongo, on the other hand, sees every customer as a document.  Think of each customer in your collection as a piece of paper.  You want to go into your collection and retrieve one piece of paper which has all the customer data on it.  So, for example, you retrieve the document for “John Smith” and this document lists several key-value pairs underneath an array (in mongo terms, an embedded document) called phone:

phone : {
    home : "(408) 123-4567",
    work : "(415) 123-4567",
    cell : "(312) 765-4321"
}

…and so on…

Mongo stores the document for this, or any user, by de-normalizing the data relationships within the customer object.  These relationships can be maintained as sub-arrays within the document.  Because mongo is schema-less, every customer object isn’t required to have all the possible combinations of phone numbers.  So, if you were to do a search where you pull-up all customers with fax numbers, our Mr. Smith would not appear in this list since he has no fax number listed in his phone array.
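
The fax-number search can be sketched in plain JavaScript against an in-memory list of customer documents. The second customer and the customersWith helper are invented for this example; against a real collection, my understanding is that the same filter would be expressed with the $exists operator, as noted in the comment.

```javascript
// In-memory stand-in for a customer collection. "Jane Doe" is a made-up record.
const customers = [
  {
    name: "John Smith",
    phone: { home: "(408) 123-4567", work: "(415) 123-4567", cell: "(312) 765-4321" }
  },
  {
    name: "Jane Doe",
    phone: { work: "(212) 555-0100", fax: "(212) 555-0101" }
  },
  { name: "No Phones" } // schema-less: the phone sub-document is optional
];

// Hypothetical helper. Roughly what a query such as
//   db.customers.find({ "phone.fax" : { $exists : true } })
// would select in the mongo shell.
function customersWith(docs, phoneType) {
  return docs.filter(function (doc) {
    return typeof doc.phone === "object" && doc.phone !== null && phoneType in doc.phone;
  });
}

const withFax = customersWith(customers, "fax").map(function (c) { return c.name; });
console.log(withFax);
```

Mr. Smith drops out of the result simply because his document has no fax key, with no NULL column or join needed.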

See?

This first step towards clarity in mongo architecture, then, is to think of all the data when you design a class object and include that data within a single document.  Data that was stored, in traditional RDBMS relation-based tables, is incorporated into the document as sub-arrays to the document.

But, you’re asking, what if you want to later add a fax number to John Smith’s phone collection?  Can you do that?

Sure!

Again, this is the inherent strength in mongodb — it’s schema-less!  Adding another number to the existing collection of phone numbers, or adding a new “column” to the document itself, requires only that you update that document.  That’s it!

So, returning back to the geospatial build, I used mySQL to pull the legacy data and collect the updated catalog tables into a single database.  Then I built new tables that (a) eliminated columns I no longer needed and (b) de-normalized the data so that every tuple in every table reflected all of the data.

I then combined the five tables into a single table under a new primary-key value and then imported this data directly into a mongo collection.  This took several hours as my collection has over 3.6 million rows.

Once I had the collection in mongo, I made a mongo-dump of the collection so that I could recover back to this point in case anything went south.  (Which it did…)

I executed a PHP script I wrote to scan the mySQL table, get the tuple by the newly-created primary key, and then create the sub-array in the mongo collection for the geospatial data.  See, in order to impose a geospatial index, your lat/lon data has to be a sub-array within the primary collection.  There’s no way I’ve yet discovered to import data over from either a flat (csv) file, or directly from mySQL, so that it creates your sub-array automagically.  Hence, the home-brew PHP script to parse through the mySQL records and build (insert) the sub-array in the newly-created mongodb collection.

(Side note:  I was careful to maintain the maximum mantissa values for the lat/lon fields by importing initially into mySQL as varchar(255) fields; this kept my 13-digit mantissas.  When I imported the data into mongodb, mongo converted these values into doubles and kept the precision.  However, my PHP program, casting these values to either (float) or (double), rounded the mantissa to 7-digit precision.  Suitable for task?  Yes.  Annoying to lose this data?  Yes.  If you have a solution to this, please leave me a comment at the end of this article.  Thanks!   :-P )

The next step was to add the geo-spatial index to the collection:

> db.geodata_geo.ensureIndex( { loc : "2d" } );
point not in interval of [ -180, 180 )

What?

This error message was telling me that my data was out of range of the acceptable lat/lon values!

I tried searching for the data culprits:

> db.geodata_geo.find( { "loc" : { $exists : true }}).count();
3685667
> db.geodata_geo.find({"loc.lon" : {$lt : -180}}).count();
0
> db.geodata_geo.find({"loc.lon" : {$gt : 180}}).count();
0
> db.geodata_geo.find({"loc.lat" : {$gt : 180}}).count();
0
> db.geodata_geo.find({"loc.lat" : {$lt : -180}}).count();
0

These queries were telling me that while I have over 3.6 million records with a loc field, none are outside of the -180,180 boundaries.

> db.geodata_geo.find({"loc.lat" : {$gt : -180}, "loc.lon" : {$lt : 180}}).count();
3685663
> db.geodata_geo.find( { "loc" : { $exists : true }}).count();
3685667

These queries tell me that 4 records fail the strict $gt/$lt test even though none of them fall outside of the -180, 180 boundary.

Wait...wut?

I focus on the $gt/$lt and wonder if I have an "edge" case.  Given that I've lost 6-digits of my mantissa, I wonder if mongo has rounded my data to my edge cases of 180:

> db.geodata_geo.find({"loc.lon" : 180 });

And I get back exactly four records that have a lon-value of exactly 180:

"loc" : { "lon" : 180, "lat" : -16.1499996 }

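This, to me, seems to be an error in how mongodb indexes geospatial data.  If 180 is an allowed value for lat/lon, then why throw the error when you ensure the index?  (Reading the error message again, the interval is written as [ -180, 180 ), which is half-open:  -180 is allowed but 180 itself is not.)  I decide to handle this rounding problem by expanding the allowable limits of my index: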
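
The boundary behaviour can be sketched in plain JavaScript by testing points against a half-open interval, which is how I read the [ -180, 180 ) in the error message. The helper names and the sample points are made up for this example, and the check is a model of the rule, not mongod’s actual code.

```javascript
// Half-open interval test: min is allowed, max is not.
function inHalfOpenRange(value, min, max) {
  return value >= min && value < max;
}

// Hypothetical helper: returns the points a 2d index with these bounds would reject,
// assuming both coordinates must fall inside [min, max).
function findOutOfRange(points, min, max) {
  return points.filter(function (p) {
    return !inHalfOpenRange(p.lon, min, max) || !inHalfOpenRange(p.lat, min, max);
  });
}

const samplePoints = [
  { lon: 180, lat: -16.1499996 }, // sits exactly on the upper edge
  { lon: -50, lat: 50 },
  { lon: -180, lat: 0 }           // lower edge is inclusive
];

const rejectedByDefault = findOutOfRange(samplePoints, -180, 180);
const rejectedWithWiderMax = findOutOfRange(samplePoints, -180, 181);
console.log(rejectedByDefault, rejectedWithWiderMax);
```

This is also why the $gt : 180 and $lt : -180 counts came back as zero: a value of exactly 180 is not greater than 180, yet it still fails a strict less-than test.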

> db.geodata_geo.ensureIndex({ "loc" : "2d" }, { min : -180, max : 181 });
> db.geodata_geo.getIndexes();
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "dev_honeybadger.geodata_geo",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"loc" : "2d"
		},
		"ns" : "dev_honeybadger.geodata_geo",
		"name" : "loc_",
		"min" : -180,
		"max" : 181
	}
]

And I see that my geospatial index has been created.  Now, to test:

> db.geodata_geo.find( { loc : {$near : [-50,50] } } ).limit(5);

And it immediately returns five records (Elliston, Bonavista, Elliston Station, Catalina and Port Union, Division #7,  in Canada) that I asked for.

My geospatial index is complete!  Now, all I need to do is add my regular indexes for keyed searching and export the table off my development environment.

 

 

mongodb.findOne() — calling with PHP variables (not literals)

Category : Technical

So I’ve been doing a lot of work, for work, in MongoDB lately and I’ve learned an awful lot.  Or, depending on your point of view, a lot that’s just awful.

See, there’s not what you could even charitably call a lot of MongoDB documentation to begin with.   If you filter what is available on, oh, say, PHP implementation, well the results just dwindle to something roughly the same size as a tax-collector’s heart.

Here’s the scenario — I’ve been working on adding a mongo abstraction class on top of my base-data abstraction class — whereas said classes are extended by the table-level class instantiation.  This allows me to keep all of my query logic in the middle tier of the class design, generic and administrative functions in the base class, and table-specific stuff in the table class.  So far, so good, right?

Well, I get the mongo constructor running and, like its mySQL counterpart, I have a rule in every table constructor that states “if I pass an indexed field and its value to the constructor, then instantiate the class pre-populated with that record.”

And that’s where things start to head south.

In my constructor logic, I’m only allowing single-value key->value pairs as constructor parameters with the design intention of getting a record from the db using the pkey of the table/collection.  In other words, you get one column and one column value.  So, if you’re going to instantiate a new user object, you’d probably want to pass-in the primary-key field of a user and that field’s value:

$objUser = new UserProfile('email', 'mshallop@gmail.com');   // instantiate a new user object with this email address

Still pretty easy.  I bang out the mySQL equivalent in nothing flat.  I hit a huge pothole when I get to the mongo side.

The method is defined as a protected abstract method in the base class – so this method has to appear in both child classes as defined in the parent:

protected abstract function loadClassById($_key, $_value);

So I have my methods defined in both the mySQL and mongoDB middle layer.  My strategy for the mongo fetch-and-return is pretty simple — once the class has been instantiated, do the following:

  1. make sure the $_key value exists in the allowed field list
  2. make sure the $_value has a value
  3. query mongodb using .findOne()
  4. store the return key->value pairs in the member array
  5. return status

That’s pretty much it.  But I run into huge problems when I get to step 3 — use the mongoDB findOne command.

The findOne method takes an array input of the key->value pair.  From the mongo command line, you’d execute something like this:

> db.session_ses.findOne({'idpro_ses' : 1})
{
 "_id" : ObjectId("4ea1af93ddc69802376b56d1"),
 "id_ses" : 1,
 "idpro_ses" : 1
}

( Just to show you that the data exists in the mongo collection…)

But, the PHP-ized version of the method is a wee bit different:

$this->collection->findOne(array('idpro_ses' => 1));

All of the examples that I’ve been able to locate show using the method by invoking it using literals.  My problem is that I have the two input parameters sent to the method ($_key and $_value) and I’ve got to find a way to get the PHP version of the method call to work using variables instead of constants.  This is what didn’t work:

$this->collection->findOne(array($_key => $_value));

$this->collection->findOne(array("'" . $_key . "'" => $_value));

$this->collection->findOne(array("{$_key}" => $_value));

$aryData = array();
$aryData[$_key] = $_value;
$this->collection->findOne($aryData);
or
$this->collection->findOne(var_dump($aryData)); 

I thought this worked but I was wrong:

$this->collection->findOne(array(array_keys($aryData) => array_values($aryData)));

This format returned a mongo record — the problem was that it returned the first mongo record independently of any key-search criteria.

What finally worked for me was this:

            $qs = array(); // QueryStructure
            switch($this->fieldTypes[$_k]) {
                case 'int' :
                    $_v = intval($_v);
                    break;
                case 'str' :
                    $_v = strval($_v);
                    break;
                case 'float' :
                    $_v = floatval($_v);
                    break;
            }
            $qs[$_k] = $_v;
            $aryData = $this->collection->findOne($qs);

[Update]

I encountered a similar problem when trying to update records in a mongo collection — while I could update the record from the mongo command line, I did not experience the same success in trying to execute the command from within my PHP program…

$foo = $collection->find(array('id_geo' => $row['id_geo']));

Consistently failed.  No exceptions were caught, and mongo’s getLastError reported no errors in the transaction.

After several iterations of debugging and attempting various work-arounds, I stumbled upon the solution as being one of casting.  While the variable was being evaluated in the PHP array as type int, somehow this wasn’t being interpreted that way by Mongo.  Casting the variable to an integer:

$foo = $collection->find(array('id_geo' => intval($row['id_geo'])));

 generated a successful query for both the find() and my update() functions.
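
Why the cast matters can be sketched in plain JavaScript: a mongo equality match is type-sensitive, so the string "42" does not match a stored number 42. Strict equality on an in-memory array stands in for the query here. The fieldTypes map, the sample rows, and both helper names are hypothetical, and the guess that the value arrived as a string (as values read from a CSV file or a mySQL row often do) is my assumption, not something the post confirms.

```javascript
// Hypothetical per-field type map, mirroring the idea of $this->fieldTypes.
const fieldTypes = { id_geo: "int", cityName: "str", lat: "float" };

// Cast a raw value to the type the collection stores for that field.
function castForQuery(field, rawValue) {
  switch (fieldTypes[field]) {
    case "int":
      return parseInt(rawValue, 10);
    case "float":
      return parseFloat(rawValue);
    case "str":
      return String(rawValue);
    default:
      return rawValue;
  }
}

// Strict equality as a stand-in for mongo's type-sensitive equality match.
function findOneByField(docs, field, value) {
  const hit = docs.find(function (doc) { return doc[field] === value; });
  return hit === undefined ? null : hit;
}

const geoDocs = [
  { id_geo: 41, cityName: "Anniston" },
  { id_geo: 42, cityName: "Elliston" }
];

const rawId = "42"; // e.g. a value that arrived as a string
const missed = findOneByField(geoDocs, "id_geo", rawId);
const found = findOneByField(geoDocs, "id_geo", castForQuery("id_geo", rawId));
console.log(missed, found);
```

A mismatch like this produces no error at all, only an empty result, which fits the silent failures described above.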

As I gain experience with Mongo, I expect to discover more of these little mannerisms…

Part 4: Installing Apache Thrift: Linux Development Environment

Category : Technical

Previously, we dealt with getting a working LAMP development environment up and running on a fresh CentOS 6 install.  We next dealt with the installation of PHPStorm and our JDK issues.

In this, and the next issue, I’m going to talk about the Thrift framework and getting it installed and running.

Thrift was originally developed by Facebook, was entered into open source in 2007, and became part of the Apache incubator the next year.

Thrift, according to Apache, is “a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.”

What it is in plainspeak is an API framework for your LAMP application.

Why I want it:  I want to use Thrift for our project because of the nature of the project.  (A social-networking concept.)  Because the application will rely heavily on data-storage calls, I’ve decided to implement the data access layer as an API instead of a more-traditional OOP model.  Thrift, as the API framework, allows me complete freedom on the back-end of the API.  I can implement the API in a variety of languages, although I’ll probably use PHP.

Thrift also provides me with a strongly-typed interface to the API.  Like XML-RPC, calls to the API are well-defined beforehand and must comply with the typed definition of both the methods used, and the data exchanged to/from said methods.

My personal experience with Thrift is limited — I used it as an API for a product concept at a former employer.  The calling application would invoke the API and make requests to the API which, in turn, would do a “bunch of stuff” and return a well-defined “data ball” (a json object) back to the calling stub for processing and display.

The other concept that makes me embrace Thrift as the controller for my LAMP application is that I can completely encapsulate the data layer from the front-end developers.  They do not need to know if the data is stored within mongodb, mysql, or a flat file.  All they need is the data.  The query language is hidden; front-end developers should not need to write data-access code.

I’ll talk more about the glories of Thrift later.  For now, let’s just get it installed and running…

On our Linux system, we have to do some preliminary installation of packages first.  Luckily, if you hit the Thrift Wiki, you’ll find pretty much everything you need to do a successful install.  Be warned, however.  Sparseness of documentation could easily be one of the hallmarks of Thrift.  Read carefully, and then read again before punching the enter key on your keyboard.  Make sure you understand what it is you’re about to do.

Ok.  Let’s get some non-LAMP development tools installed.  Our first command will be to install most of the pre-requisite packages needed by Thrift:

#  sudo yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel

This will install the base development packages you’re going to need.  Once this has completed, you should also install the open-SSL development libs as the build will fail without them.  (At least, it failed on my install.)

#  sudo yum install openssl-devel.x86_64

Installing this package will also pick-up all the dependent packages you’ll need to complete the install.

Next, download the Thrift tarball from the site and move the package somewhere within what will become your DocumentRoot path for Apache2.

#  tar xvzf thrift-0.7.0.tar.gz

Once you’ve expanded the tarball, cd into the thrift directory and follow the instructions to make the Thrift packages and libraries.  I did this pretty much exactly as told and my installation went without a problem.

At this point, we’ve only built and installed the Thrift libraries (installed in /usr/lib, I believe…).  In the next installation, we’re going to install the PHP src directory and make it visible to our application’s docRoot.

Searching embedded array structures in Mongodb documents…

Category : Technical

I am working with a fairly complex data structure, which I am storing in mongodb.  The document, as these are known in the parlance of Mongo, contains six embedded documents, one of which is an array of documents.

It is this document, the array, which I am interested in searching.

The mongo home page is a great resource for us mongo-nubs — the documentation available there is well-written, clear, and concise.  Sometimes, I can even find what I need, before I even realize that I need it!

However, like most documentation, and this is the former support-engineer in me talking, developer documentation is written in a tone of a “solution” and not as a “problem”.  In other words, when I’m looking to solve a problem I am having, I am searching in the terms of my problem.  Once I’ve solved the problem, I will usually document what I’ve learned — the challenge is to not write the document using the vocabulary of the solution, but to use the vocabulary of the problem (seeking) instead.

Anyway, the problem statement for this entry is to be able to search a field within an array of fields within an embedded document, within a mongodb document.  Here we go….

Fatal Error: Class ‘Mongo’ Not Found…

Category : Technical

Working on this Mongo script in PHP:  the script is meant to be run from the command line as there are about seven different parameters I need to pass into it.  The script itself takes a defined CSV file and builds a multi-collection structure for storing as a Mongo table.

I get the basic stuff working – the data is porting over and storing very nicely into Mongo — and I go to work on the command-line options (argc/argv stuff) for the script.  First thing I do, is specify the php executable to the shell:

#! /usr/bin/php
<?php

And I go on to code the handlers for the command-line arguments on a Friday.  When I come back to work on Monday, I go back to run the script and I get the error:

Fatal Error: Class Mongo not found in {source code} on Line such-and-such… which is a huge WTF to me because we’re talking about  nothing changing in the source code over the weekend…

I run mac ports and get a PHP update but I get the same error.

I check that the mongo extensions are loaded with php -i:

mongo
MongoDB Support => enabled
Version => 1.0.9
Directive => Local Value => Master Value
mongo.allow_persistent => 1 => 1
mongo.auto_reconnect => 1 => 1
mongo.chunk_size => 262144 => 262144
mongo.cmd => $ => $
mongo.default_host => localhost => localhost
mongo.default_port => 27017 => 27017
mongo.long_as_object => 0 => 0
mongo.native_long => 0 => 0
mongo.utf8 => 1 => 1

I google the error and read that (derp-derp) the mongo extension is probably not loaded in the php-cli.ini file.  Which I search for but don’t seem to have on my system.

I do have the Zend CE install still hanging around and since I’m using PHPStorm now (it rocks!), I nuke the Zend installation.  Nope – still getting the error.  I run it past a couple of the Mongo/PHP wonks here at work and get lots of hmmm….but it’s a head-scratcher for all of us.

Then, right before I fall asleep last night, I get the answer…it’s in the shell directive in the script…the /usr/bin/php is not the same as the mac-ports version of php.  I ls -l and file the two php files and I get:


$ ls -l /usr/bin/php

-rwxr-xr-x  1 root  wheel  30168960 Aug 22 19:44 /usr/bin/php

$ file /usr/bin/php

/usr/bin/php: Mach-O universal binary with 3 architectures

/usr/bin/php (for architecture x86_64): Mach-O 64-bit executable x86_64

/usr/bin/php (for architecture i386): Mach-O executable i386

$ ls -l /opt/local/bin/php

-rwxr-xr-x  2 root  admin  6054056 Jan 10 14:49 /opt/local/bin/php

$ file /opt/local/bin/php

/opt/local/bin/php: Mach-O 64-bit executable x86_64

Once I change the directive in the script source to point the shell to the correct executable, the mac-ports version of PHP, my Mongo-Fatal error goes away and life, once again, is good.

MongoCursorException: E11000 duplicate key error index

Category : Technical

So I’m working on a project where I’m taking a csv file that contains a little more than 100 columns of data by 10,000 rows.  (It’s a sample db file — the final file will be about 200,000,000 rows…) and writing a PHP script to process the csv file into structures that can be inserted as mongodb collections.

I’m rocking along and all is working well for initial tests of the algorithm (header + first row of actual data) but when I turn-on processing for the other 9,999 rows, all I get stored into mongo is the first row of data.

I add an echo statement after the insert and I see 10,000 names scroll across my terminal.  So the problem isn’t that I’m not getting the data, it’s that the data isn’t being stored into Mongo.  I try turning on safe writes on my $mongo->insert() function and *bam*, error message:
