
Quick Update…

Category : Announcements, Site

Sorry this blog has been inactive for so long, but I’ve been really, really busy with work and with my move to Puerto Nuevo, Mexico, in northern Baja California.

I am thinking about putting together a series of posts that detail how to set up a data-processing stack, in PHP, for mongodb that dynamically generates all CRUD queries via the class stack.

The front-end interface to this stack runs through RabbitMQ, with that layer also written in PHP. This eliminates Apache from the LAMP stack and removes the need for a REST interface to carry data requests to and from the data store.

The stack includes services such as auditing, registration for public-facing requests, memcached and membase support, error-logging, and internal checks on requests that prevent things like query generation that result in full-table scans or any searches on un-indexed columns within either mongodb or mysql. (I think I still remember how to code for mysql… :) )
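The "internal checks" above can take many forms. Here is a minimal sketch in JavaScript, the mongo shell's language. It is not the author's PHP code, and `indexedFields` and `guardQuery` are hypothetical names. The guard refuses any query whose filter fields aren't covered by a known index, which is one way to stop full scans before they reach the database:

```javascript
// Hypothetical index catalog: collection name -> indexed field names.
const indexedFields = {
  sessionManager: ["_id", "foo"],
  customers: ["_id", "email", "phone.fax"],
};

// Return the filter unchanged if every top-level field it uses is indexed;
// otherwise throw, so the caller never issues a full scan.
// (Operators such as $or are not inspected in this sketch.)
function guardQuery(collection, filter) {
  const indexed = indexedFields[collection];
  if (!indexed) {
    throw new Error(`no index catalog for collection "${collection}"`);
  }
  const fields = Object.keys(filter);
  if (fields.length === 0) {
    throw new Error("empty filter would scan the whole collection");
  }
  const unindexed = fields.filter((f) => !indexed.includes(f));
  if (unindexed.length > 0) {
    throw new Error(`query on un-indexed field(s): ${unindexed.join(", ")}`);
  }
  return filter;
}
```

A request like `guardQuery("customers", { email: "x@example.com" })` passes through. `guardQuery("customers", { name: "John" })` is rejected before any database work happens.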

Anyway, this project has been all-consuming for me for the past year and the concept of generalizing the stack for instructional purposes has been rattling around in my can now, looking for a way out, for quite some time.  It’s not like there’s a plethora of PHP-based RabbitMQ tutorials out there either.

So, that’s the happs.  Now that things are settling down a bit, I’ll try to get more information out.

Thank you for checking-in!

Searching MongoDB Sub-Documents…

Category : Technical

I’ve recently finished a mongo collection that stores all auditing data from my application. Specifically, it records every database transaction, whether conducted in mySQL or mongo, assigns the event an event identifier, and stores the data under that event ID within a single sessionManager object.

Sounds good?

Well, I like it.   This design eliminated the need to maintain meta-data in my data tables since I can pull transaction history for any record that I’ve accessed.

The problem is that, because I’m new to mongodb, getting back out what I’ve put in isn’t (yet) as intuitive for me as it is in mySQL.

Sub-documents within a mongo document are analogous to the results of a mySQL join. One of the key motivations for storing this information in mongodb to begin with was that I could de-normalize the data by storing the sub-document with its parent instead of incurring the expense of a search-join-fetch later.

Traditionally, any data objects defined as a one-to-many type of a relationship (1:m) were stored in multiple mySQL tables and were accessed via some sort of join mechanism.

Mongodb breaks that traditional mold by allowing you to store a sub-document (the “m” part of the 1:m relationship) within the same document in which you’re currently working.

My sessionManager document looks something like this:

{
_id : somevalue,
foo : bar,
event : {},
argle : bargle,
}

My desire is to, for every database event that is recorded, enter information about that event within the sub-document that I’ve wittily named “event”.

In my PHP code, I’ve written a sequence manager for mongo that maintains a document containing sequence values for various tables. Think of this as the functional version of mySQL’s auto-increment feature. For the sessionManager events, I decided to use this sequence to obtain unique values and use them as my sub-document keys. I’d then store whatever data I needed under the sequence value as a sub-document key, or index:

{
  _id : somevalue,
  foo : bar,
  event : {
    n : {
      created : dateval,
      table : tableName,
      schema : dbSchema,
      query : lastQuery
    }
  },
  argle : bargle
}

So, when I need to add another event, I just create a new sub-document under the event key, then add the data I need to store under the sub-document index key.

Worked like a champ!
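Here is a sketch of the sequence-plus-sub-document scheme in JavaScript, the mongo shell's language. It is not the author's PHP sequence manager. The counter is a plain object standing in for a sequences document, and each event is written with `$set` on a dotted path such as `event.12`, which adds one key to the embedded document without rewriting the rest. On a real server the increment would need to be atomic, for example `findAndModify` with `$inc`.

```javascript
// Stand-in for a "sequences" document such as { _id: "sessionManager", seq: 0 }.
// Against a real server this increment must be atomic (e.g. findAndModify + $inc).
const sequences = {};

// Return the next value in the named sequence, starting at 1.
function nextSequence(name) {
  sequences[name] = (sequences[name] || 0) + 1;
  return sequences[name];
}

// Build the update that stores one event under event.<n> without touching other keys.
function buildEventUpdate(n, table, schema, query, created = new Date()) {
  return { $set: { [`event.${n}`]: { created, table, schema, query } } };
}

// Apply a dotted-path $set to an in-memory document (illustration only).
function applySet(doc, update) {
  for (const [path, value] of Object.entries(update.$set)) {
    const parts = path.split(".");
    let target = doc;
    for (const part of parts.slice(0, -1)) {
      if (typeof target[part] !== "object" || target[part] === null) {
        target[part] = {};
      }
      target = target[part];
    }
    target[parts[parts.length - 1]] = value;
  }
  return doc;
}
```

In the shell, the object returned by `buildEventUpdate()` would be the second argument to `update()`. `applySet()` only models what that update does to the document.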

And then I asked myself: “So, Brainiac, how would you go about extracting event -n- from your collection?”

I went through a lot of failed query attempts, bugged a lot of people, googled and saw stuff  that led me down many plush ratholes until I finally, through some serious trial-and-error, got the answer…

> db.mytable.find( { foo : bar }, { 'event.n' : 1 } );

where n = the number of the event I want to find.
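In the shell, the `n` in `'event.n'` is literal text. When the event number lives in a variable, the projection key has to be built. Here is a small JavaScript sketch; `eventProjection` is a hypothetical helper, and its result is meant as the second argument to `find()`:

```javascript
// Build a projection that returns only event <n>. Inclusion projections return
// _id by default; pass { excludeId: true } to drop it.
function eventProjection(n, { excludeId = false } = {}) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("event number must be a non-negative integer");
  }
  const projection = { [`event.${n}`]: 1 };
  if (excludeId) projection._id = 0;
  return projection;
}

// In the shell: db.mytable.find({ foo: "bar" }, eventProjection(42));
```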

If I want to get all of the events for a particular document (sessionManager object), then I would write something like:

> db.mytable.find( {foo : bar}, { event : 1});

If I wanted to return all of the events for all of the objects, then I would write this:

> db.mytable.find( {}, {event : 1});

What I’ve not been able to figure out, so far, is how I can use $slice to grab a range of events within a document.  Everything I try returns the full sub-set of documents back to me.  The doc tells me that $slice is used to return a subrange of array elements, which is what I thought “event.n” was but, apparently, it’s not.  (I think it’s an object (sub-document) which is why $slice fails for me.)

It’s not a big deal because, programmatically, I can grab the entire sub-document from its parent and parse it in memory to get the desired record. And, if I know the value of -n-, then I can fetch just that one sub-document. So, I’m ok for now. However, please feel free to enlighten me with your expertise and experience should you see where I am failing here, ok?
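As far as I know, the hunch above is right. `$slice` is a projection operator for arrays, so it has nothing to work with when `event` is an embedded document keyed by sequence number. There are two ways around this. The first is to keep the object and filter a key range in memory. The second is to store events as an array, in which case a projection such as `{ event: { $slice: [10, 5] } }` skips ten elements and returns the next five. The JavaScript helper below takes the in-memory route; `eventRange` is a hypothetical name:

```javascript
// Return the events whose numeric keys fall within [from, to], in key order.
// `eventDoc` is the embedded `event` object fetched from the parent document.
function eventRange(eventDoc, from, to) {
  return Object.keys(eventDoc)
    .map(Number)
    .filter((k) => Number.isInteger(k) && k >= from && k <= to)
    .sort((a, b) => a - b)
    .map((k) => ({ n: k, ...eventDoc[k] }));
}
```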

 

mongodb, geospatial indexing, and advanced queries….

Category : Technical

I’ve been working to build, and re-build, a geospatial table for work. There have been a lot of challenges in this project for me, as this is the first time I’ve had to architect database designs that incorporate mongodb alongside mySQL.

The mongo geospatial repository will be replacing several tables in the legacy mySQL system. As you may know, mongodb comes with built-in geospatial support, so queries against a collection built this way return shockingly fast, especially when you compare them to the traditional mySQL approach of extracting geo-points within a distance range of a lat/lon coordinate. The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!
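For context, the "hideous trigonometric" queries usually evaluate the haversine great-circle formula for each candidate row. Here is a JavaScript sketch of that formula. The function name and the 6371 km mean Earth radius are assumptions, not values taken from the legacy mySQL system:

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius; an assumption, not from the legacy schema

const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle distance in kilometres between two lat/lon points (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}
```

The expression depends on the query point, so an ordinary column index can't narrow the rows for it. That is one reason a purpose-built geospatial index wins.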

What I learned in this exercise was that architecting a mongo collection requires you to rethink how data is stored. Mongo stores data as a collection of documents. The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.

Let’s use a standard customer object as an example.  Every customer has at least one phone number.  Most, if not all, customers have more than one phone number.  We could define several columns in the customer table for the phone numbers: workphone, homephone, cellphone, otherphone and store the data that way.  Problem is that we will eventually hit the wall where we have the need to store numbers for which we don’t have columns pre-defined:  faxphone, skypephone, tddphone, vrsphone, etc.

RDBMS design demands a normalization of this 1:M data design by requiring a second table to store just phone numbers for each customer.  The phone table would have a primary key (id), the customer id, the customer phone number and perhaps a short, descriptive, field explaining the purpose of this number.  To get the phone data for a customer, then, you’d simply query (or join) the phone table based on the customer ID to get all the phone tuples for that customer.

Mongo, on the other hand, sees every customer as a document. Think of each customer in your collection as a piece of paper. You want to go into your collection and retrieve the one piece of paper that has all of the customer data on it. So, for example, you retrieve the document for “John Smith” and it lists several key-value pairs underneath an array called phone:

phone : {
  home : "(408) 123-4567",
  work : "(415) 123-4567",
  cell : "(312) 765-4321"
}

…and so on…

Mongo stores the document for this, or any user, by de-normalizing the data relationships within the customer object.  These relationships can be maintained as sub-arrays within the document.  Because mongo is schema-less, every customer object isn’t required to have all the possible combinations of phone numbers.  So, if you were to do a search where you pull-up all customers with fax numbers, our Mr. Smith would not appear in this list since he has no fax number listed in his phone array.

See?

This first step towards clarity in mongo architecture, then, is to think of all the data when you design a class object and include that data within a single document.  Data that was stored, in traditional RDBMS relation-based tables, is incorporated into the document as sub-arrays to the document.
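To make the normalized-versus-denormalized contrast concrete, here is a JavaScript sketch. It folds rows from a hypothetical `phone` table (customer id, type, number) into one document per customer. It then runs the "who has a fax number?" check in memory, which in the mongo shell would be the filter `{ "phone.fax": { $exists: true } }`. The row and field names are illustrative assumptions:

```javascript
// Hypothetical rows from a normalized phone table: one row per number.
const phoneRows = [
  { customerId: 1, type: "home", number: "(408) 123-4567" },
  { customerId: 1, type: "work", number: "(415) 123-4567" },
  { customerId: 1, type: "cell", number: "(312) 765-4321" },
  { customerId: 2, type: "fax", number: "(408) 555-0100" },
];

// Fold the 1:M rows into one document per customer with a `phone` sub-document.
function denormalize(customers, rows) {
  const byId = new Map(customers.map((c) => [c.id, { ...c, phone: {} }]));
  for (const row of rows) {
    const doc = byId.get(row.customerId);
    if (doc) doc.phone[row.type] = row.number;
  }
  return [...byId.values()];
}

// In-memory equivalent of find({ "phone.fax": { $exists: true } }).
function withFax(docs) {
  return docs.filter((doc) => doc.phone !== undefined && "fax" in doc.phone);
}
```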

But, you’re asking, what if you want to later add a fax number to John Smith’s phone collection?  Can you do that?

Sure!

Again, this is the inherent strength in mongodb — it’s schema-less!  Adding another number to the existing collection of phone numbers, or adding a new “column” to the document itself, requires only that you update that document.  That’s it!
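Adding the fax number is a single `$set` on a dotted path. In the shell that is roughly `db.customers.update({ name: "John Smith" }, { $set: { "phone.fax": "..." } })`, where the collection and field names are assumed for illustration. The JavaScript sketch below builds that update document and models its effect on an in-memory copy, to show that the existing numbers are left alone:

```javascript
// Build a $set update that adds (or replaces) one phone type without touching the others.
function setPhone(type, number) {
  if (!/^[A-Za-z]+$/.test(type)) {
    throw new Error(`unexpected phone type "${type}"`); // keeps the dotted path well-formed
  }
  return { $set: { [`phone.${type}`]: number } };
}

// What that update does to a document, modelled in memory (the original is not mutated).
function applyPhoneUpdate(doc, update) {
  const result = { ...doc, phone: { ...(doc.phone || {}) } };
  for (const [path, value] of Object.entries(update.$set)) {
    result.phone[path.slice("phone.".length)] = value;
  }
  return result;
}
```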

So, returning to the geospatial build: I used mySQL to pull the legacy data and collect the updated catalog tables into a single database. Then I built new tables that (a) eliminated columns I no longer needed and (b) de-normalized the data so that every tuple in every table reflected all of the data.

I then combined the five tables into a single table under a new primary-key value and then imported this data directly into a mongo collection.  This took several hours as my collection has over 3.6 million rows.

Once I had the collection in mongo, I made a mongodump of the collection so that I could recover back to this point in case anything went south. (Which it did…)

I executed a PHP script I wrote to scan the mySQL table, get the tuple by the newly-created primary key, and then create the sub-array in the mongo collection for the geospatial data.  See, in order to impose a geospatial index, your lat/lon data has to be a sub-array within the primary collection.  There’s no way I’ve yet discovered to import data over from either a flat (csv) file, or directly from mySQL, so that it creates your sub-array automagically.  Hence, the home-brew PHP script to parse through the mySQL records and build (insert) the sub-array in the newly-created mongodb collection.
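The per-row transformation the PHP script performs can be sketched in JavaScript. The column names `id_geo`, `lat`, and `lon` are assumptions, as is the shape of the returned filter/update pair. Longitude is written first because, for an embedded document, the 2d index reads the first field as x (longitude) and the second as y (latitude). This matches the `{ "lon" : ..., "lat" : ... }` documents shown later in this post.

```javascript
// Parse one coordinate column (stored as a varchar) into a double, rejecting blanks.
function parseCoord(text, column) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error(`empty ${column} value`);
  }
  const value = Number(text);
  if (!Number.isFinite(value)) {
    throw new Error(`non-numeric ${column} value: ${text}`);
  }
  return value;
}

// Build the filter/update pair that adds a loc sub-document to one document.
// lon comes first: the 2d index treats the first field as x (longitude).
function buildLocUpdate(row) {
  return {
    filter: { id_geo: Number.parseInt(row.id_geo, 10) },
    update: {
      $set: { loc: { lon: parseCoord(row.lon, "lon"), lat: parseCoord(row.lat, "lat") } },
    },
  };
}
```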

(Side note: I was careful to keep the full precision of the lat/lon fields by initially importing them into mySQL as varchar(255) fields. This kept my 13-digit mantissas. When I imported the data into mongodb, mongo converted these values into doubles and kept the precision. However, my PHP program, casting these values to either (float) or (double), rounded the mantissa to 7-digit precision. Suitable for the task? Yes. Annoying to lose this data? Yes. If you have a solution to this, please leave me a comment at the end of this article. Thanks! :-P )
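One place to start looking: PHP's float is a 64-bit double on common platforms, so a cast by itself shouldn't cut a 13-digit value down to 7 digits. About 7 significant digits is what a 32-bit float keeps. The JavaScript sketch below compares a double parse with the same value forced through a 32-bit float (`Math.fround`). It is a diagnostic under that assumption, not a confirmed explanation. Notably, -16.15 pushed through a 32-bit float prints as -16.1499996, the same digits as the point that turns up later in this post. That is suggestive, not proof.

```javascript
// A coordinate parsed as a 64-bit double (the type of a PHP float and a mongo double).
const asDouble = (text) => Number(text);

// The same value rounded to the nearest 32-bit float, which keeps only ~7 significant digits.
const asSingle = (text) => Math.fround(Number(text));

// Compare the two representations of one coordinate string.
function precisionReport(text) {
  const double64 = asDouble(text);
  const float32 = asSingle(text);
  return { double64, float32, error: Math.abs(float32 - double64) };
}
```

Running `precisionReport()` over a few raw varchar values, and over the values your script actually sends, should show which step drops the digits.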

The next step was to add the geo-spatial index to the collection:

> db.geodata_geo.ensureIndex( { loc : "2d" } );
point not in interval of [ -180, 180 )

What?

This error message was telling me that my data was out of range of the acceptable lat/lon values!

I tried searching for the data culprits:

> db.geodata_geo.find( { "loc" : { $exists : true }}).count();
3685667
> db.geodata_geo.find({"loc.lon" : {$lt : -180}}).count();
0
> db.geodata_geo.find({"loc.lon" : {$gt : 180}}).count();
0
> db.geodata_geo.find({"loc.lat" : {$gt : 180}}).count();
0
> db.geodata_geo.find({"loc.lat" : {$lt : -180}}).count();
0

These queries were telling me that, although over 3.6 million records have a loc field, none fall outside of the -180, 180 boundaries.

> db.geodata_geo.find({"loc.lat" : {$gt : -180}, "loc.lon" : {$lt : 180}}).count();
3685663
> db.geodata_geo.find( { "loc" : { $exists : true }}).count();
3685667

These queries tell me that I have a delta of four records that fall outside of the strict -180, 180 bounds.

Wait...wut?

I focus on the $gt/$lt operators and wonder if I have an "edge" case. Given that I've lost six digits of my mantissa, I wonder if mongo has rounded some of my data to the edge value of 180:

> db.geodata_geo.find({"loc.lon" : 180 });

And I get back exactly four records that have a lon-value of exactly 180:

"loc" : { "lon" : 180, "lat" : -16.1499996 }

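My first reaction was that this is an error in how mongodb indexes geospatial data: if 180 is a legitimate longitude, why throw the error when ensuring the index? Looking at the message again, though, the interval is written as [ -180, 180 ), which includes -180 but excludes 180, so these four points really are outside the default index bounds. I decide to handle this rounding problem by expanding the allowable limits of the index: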

> db.geodata_geo.ensureIndex({ "loc" : "2d" }, { min : -180, max : 181 });
> db.geodata_geo.getIndexes();
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "dev_honeybadger.geodata_geo",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"loc" : "2d"
		},
		"ns" : "dev_honeybadger.geodata_geo",
		"name" : "loc_",
		"min" : -180,
		"max" : 181
	}
]

And I see that my geospatial index has been created.  Now, to test:

> db.geodata_geo.find( { loc : {$near : [-50,50] } } ).limit(5);

And it immediately returns the five records I asked for: Elliston, Bonavista, Elliston Station, Catalina and Port Union, Division #7, in Canada.

My geospatial index is complete!  Now, all I need to do is add my regular indexes for keyed searching and export the table off my development environment.
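To find edge-case points like these before calling `ensureIndex()`, test against the half-open range directly: [min, max) allows min but not max. The JavaScript sketch below uses hypothetical helper names. In the shell, `$gte` in place of `$gt` would have caught the four records, e.g. `db.geodata_geo.find({ $or: [ { "loc.lon": { $gte: 180 } }, { "loc.lat": { $gte: 180 } } ] })`.

```javascript
// True when value lies in the half-open interval [min, max): min allowed, max not.
const inHalfOpen = (value, min, max) => value >= min && value < max;

// Return the points whose lon or lat falls outside [min, max);
// the default 2d-index bounds are [-180, 180).
function outOfBounds(points, min = -180, max = 180) {
  return points.filter(
    (p) => !inHalfOpen(p.loc.lon, min, max) || !inHalfOpen(p.loc.lat, min, max)
  );
}
```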

 

 

Converting a mySQL Column to AutoIncrement

Category : Technical

We’re updating a large dataset at work: the new dataset has about 18% more tuples, spread across eight or so highly normalized tables.

I have to port the new data into a (more) efficient table structure — so I’m de-normalizing the heck out of the data reducing the schema from eight tables to a single table.

In the old architecture, four of the tables have unique key values that were imposed on the data during the original port. So, to maintain application compatibility in the data catalog, these key values have to be preserved. Additionally, new tuples of data have to be added to the data set and assigned new key values that continue the legacy numbering.

In porting over one of the updated tables, which uses a string code as the primary key, I first export the old table columns (pkey, str_code) into a temp table and then add a nullable int column (default NULL) to the new table. I then do a simple update-join to bring over the old pkey values based on the native str_code.

This leaves me with a new numeric column that has a variable number of NULL values (representing the delta of the new-data import) interspersed with the legacy data pkey -> str_code values.

The problem is: how do I convert the NULL pkey fields to a meaningful value that maintains the auto-increment without causing mysql to totally freak?

The first thing I do is get the max-value of the pkey:

select max(pkey_field_name) from table_name;
+---------------+
| max(pkey_...) |
+---------------+
|          4162 |
+---------------+
1 row in set (0.00 sec)

Next, I need to reset the table’s auto_increment value because, since the column is just a numeric column, the counter currently defaults to zero. Attempting to convert the column to auto_increment on the fly at this point will cause mysql to spit out an error about duplicate primary key values…

mysql> alter table table_name auto_increment=4163;
Query OK, 3965 rows affected (0.05 sec)
Records: 3965 Duplicates: 0 Warnings: 0

Now that I have reset the auto_increment value, I can convert the column to auto_increment and, in the process of converting, mySQL will renumber the NULL key values starting from the new auto_increment value, so my numbering scheme is seamless.

When I query the data back, I can see that my new column is completely filled in, with the legacy keys maintained and the new data correctly keyed.
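The outcome described above can be modelled in JavaScript: legacy keys stay put, and NULL keys are filled with consecutive values from max + 1 upward. The row shape and function name are hypothetical. This sketch reproduces the result, not mySQL's internals, and the order in which mySQL hands out the new values may differ.

```javascript
// rows: [{ pkey: number | null, strCode: string }, ...] in table order.
// Legacy keys are kept; NULL keys get consecutive values starting at max(pkey) + 1.
function fillNullKeys(rows) {
  const maxKey = rows
    .map((r) => r.pkey)
    .filter((k) => k !== null)
    .reduce((m, k) => Math.max(m, k), 0);
  let next = maxKey + 1; // plays the role of the reset auto_increment value
  const seen = new Set();
  return rows.map((row) => {
    if (row.pkey !== null) {
      if (seen.has(row.pkey)) throw new Error(`duplicate legacy key ${row.pkey}`);
      seen.add(row.pkey);
      return { ...row };
    }
    return { ...row, pkey: next++ };
  });
}
```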

Resetting the auto-increment key is a handy little trick to know — I also use it when building test datasets and I need a fast way to reset my table values.

Hope this helps!

 

 

Note: Here are the complete steps to successfully perform this operation.

(DDL for admin1 table)

CREATE TABLE `admin1` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `field1` varchar(255) NOT NULL,
  `field2` varchar(255) NOT NULL,
  `field3` varchar(255) NOT NULL,
  `field4` varchar(255) NOT NULL,
  `field5` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4163 DEFAULT CHARSET=utf8

(DDL for admin1ll table)

CREATE TABLE `admin1ll` (
  `field1` int(11) NOT NULL,
  `field2` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

 

Step 1:  Create a new integer column in the table

ALTER TABLE `meridian`.`admin1`
  DROP COLUMN `id`,
  ADD COLUMN `id` int UNSIGNED FIRST,
  CHANGE COLUMN `field1` `field1` varchar(255) NOT NULL AFTER `id`,
  CHANGE COLUMN `field2` `field2` varchar(255) NOT NULL AFTER `field1`,
  CHANGE COLUMN `field3` `field3` varchar(255) NOT NULL AFTER `field2`,
  CHANGE COLUMN `field4` `field4` varchar(255) NOT NULL AFTER `field3`,
  CHANGE COLUMN `field5` `field5` varchar(255) NOT NULL AFTER `field4`;

Step 2: Update the new column by inserting the previous table’s pkey values

update admin1, admin1ll
set admin1.id = admin1ll.field1
where admin1.field1 = admin1ll.field2;

Step 3: Update the Auto_Increment value:

> select max(id) from admin1;

4030

> alter table admin1 auto_increment=4031;

Note:  the row count does not imply or set the auto_increment value!

Step 4: Reformat Column

Set the column to auto_increment, unsigned, not null, primary key

Step 5:  Validate!

select *
from admin1, admin1ll
where admin1.id <> admin1ll.field1
and admin1.field1 = admin1ll.field2;

> Empty set (1.70 sec)
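Steps 2 and 5 amount to a lookup by string code followed by a consistency check. The JavaScript sketch below models both on in-memory arrays shaped like the `admin1` and `admin1ll` tables above, with `field1` and `field2` as in the DDL; the function names are hypothetical. It mirrors the logic of the SQL, not how mySQL executes it.

```javascript
// admin1ll rows: { field1: legacyId, field2: strCode }
// admin1 rows:   { id: null | number, field1: strCode, ... }

// Step 2: copy legacy ids onto admin1 rows whose code appears in admin1ll.
function applyLegacyIds(admin1, admin1ll) {
  const idByCode = new Map(admin1ll.map((r) => [r.field2, r.field1]));
  return admin1.map((row) =>
    idByCode.has(row.field1) ? { ...row, id: idByCode.get(row.field1) } : { ...row }
  );
}

// Step 5: rows whose code matches but whose id differs (should be empty).
function mismatches(admin1, admin1ll) {
  const idByCode = new Map(admin1ll.map((r) => [r.field2, r.field1]));
  return admin1.filter(
    (row) => idByCode.has(row.field1) && idByCode.get(row.field1) !== row.id
  );
}
```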

mongodb.findOne() — calling with PHP variables (not literals)

Category : Technical

So I’ve been doing a lot of work, for work, in MongoDB lately and I’ve learned an awful lot.  Or, depending on your point of view, a lot that’s just awful.

See, there’s not what you could even charitably call a lot of MongoDB documentation to begin with.   If you filter what is available on, oh, say, PHP implementation, well the results just dwindle to something roughly the same size as a tax-collector’s heart.

Here’s the scenario: I’ve been working on adding a mongo abstraction class on top of my base data-abstraction class, and these classes are in turn extended by the table-level classes. This allows me to keep all of my query logic in the middle tier of the class design, generic and administrative functions in the base class, and table-specific stuff in the table class. So far, so good, right?

Well, I get the mongo constructor running and, like its mySQL counterpart, I have a rule in every table constructor that states: “if I pass an indexed field and its value to the constructor, then instantiate the class pre-populated with that record.”

And that’s where things start to head south.

In my constructor logic, I only allow a single key->value pair as constructor parameters, with the design intention of getting a record from the db using the pkey of the table/collection. In other words, you get one column and one column value. So, if you’re going to instantiate a new user object, you’d probably want to pass in the primary-key field of a user and that field’s value:

$objUser = new UserProfile('email', 'mshallop@gmail.com');   // instantiate a new user object with this email address

Still pretty easy.  I bang out the mySQL equivalent in nothing flat.  I hit a huge pothole when I get to the mongo side.

The method is defined as a protected abstract method in the base class, so it has to appear in both child classes as defined in the parent:

protected abstract function loadClassById($_key, $_value);

So I have my methods defined in both the mySQL and mongoDB middle layer.  My strategy for the mongo fetch-and-return is pretty simple — once the class has been instantiated, do the following:

  1. make sure the $_key value exists in the allowed field list
  2. make sure the $_value has a value
  3. query mongodb using .findOne()
  4. store the return key->value pairs in the member array
  5. return status
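The five steps can be sketched in JavaScript against a stand-in collection (an array of documents). `sessionDocs`, `allowedFields`, `loadById`, and the status strings are hypothetical, not the author's PHP classes. The `{ [key]: value }` computed key plays the role of `array($_key => $_value)`. Strict `===` comparison approximates mongo's type-sensitive matching, which likewise won't match the string "7" against the number 7.

```javascript
// Hypothetical stand-in for a collection: an array of plain documents.
const sessionDocs = [
  { id_ses: 1, idpro_ses: 1 },
  { id_ses: 2, idpro_ses: 7 },
];

const allowedFields = ["id_ses", "idpro_ses"];

// Mimics findOne(): first document whose fields strictly equal every filter value.
function findOne(docs, filter) {
  const match = docs.find((doc) =>
    Object.entries(filter).every(([field, wanted]) => doc[field] === wanted)
  );
  return match === undefined ? null : match;
}

// Steps 1-5 from the list above; `target` stands in for the class member array.
function loadById(target, key, value) {
  if (!allowedFields.includes(key)) return "bad-key";                           // step 1
  if (value === undefined || value === null || value === "") return "no-value"; // step 2
  const doc = findOne(sessionDocs, { [key]: value });                            // step 3
  if (doc === null) return "not-found";
  Object.assign(target, doc);                                                    // step 4
  return "ok";                                                                   // step 5
}
```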

That’s pretty much it.  But I run into huge problems when I get to step 3 — use the mongoDB findOne command.

The findOne method takes an array input of the key->value pair.  From the mongo command line, you’d execute something like this:

> db.session_ses.findOne({'idpro_ses' : 1})
{
 "_id" : ObjectId("4ea1af93ddc69802376b56d1"),
 "id_ses" : 1,
 "idpro_ses" : 1
}

( Just to show you that the data exists in the mongo collection…)

But, the PHP-ized version of the method is a wee bit different:

$this->collection->findOne(array('idpro_ses' => 1));

All of the examples that I’ve been able to locate show using the method by invoking it using literals.  My problem is that I have the two input parameters sent to the method ($_key and $_value) and I’ve got to find a way to get the PHP version of the method call to work using variables instead of constants.  This is what didn’t work:

$this->collection->findOne(array($_key => $_value));

$this->collection->findOne(array("'" . $_key . "'" => $_value));

 

$this->collection->findOne(array("{$_key}" => $_value));

$aryData = array();
$aryData[$_key] = $_value;
$this->collection->findOne($aryData);
or
$this->collection->findOne(var_dump($aryData)); 

I thought this worked but I was wrong:

$this->collection->findOne(array(array_keys($aryData) => array_values($aryData)));

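This format returned a mongo record; the problem was that it returned the first mongo record regardless of any key-search criteria. (My best guess at why: PHP can’t use an array as an array key, so it discards that element, findOne() receives an empty query, and an empty query matches every document.)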

What finally worked for me was casting the value to the field’s declared type before building the query array:

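// $_key and $_value are the loadClassById() parameters;
// $this->fieldTypes maps each allowed field to 'int', 'str' or 'float'
$qs = array(); // QueryStructure
switch ($this->fieldTypes[$_key]) {
    case 'int':
        $_value = intval($_value);
        break;
    case 'str':
        $_value = strval($_value);
        break;
    case 'float':
        $_value = floatval($_value);
        break;
}
$qs[$_key] = $_value;   // e.g. array('idpro_ses' => 1), with 1 as a real integer
$aryData = $this->collection->findOne($qs);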
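The same casting step can be sketched in JavaScript. `fieldTypes` here is a hypothetical stand-in for `$this->fieldTypes`. There are two differences from the PHP switch. PHP's intval() quietly returns 0 for non-numeric input, while this sketch throws. The PHP switch passes an unknown field's value through unchanged, while this sketch rejects the field.

```javascript
// Hypothetical declared types for a collection's fields.
const fieldTypes = { id_ses: "int", idpro_ses: "int", name_ses: "str", score_ses: "float" };

// Cast value to the field's declared type, then build { key: value }.
function buildQuery(key, value) {
  let cast;
  switch (fieldTypes[key]) {
    case "int":
      cast = Number.parseInt(value, 10);
      break;
    case "float":
      cast = Number.parseFloat(value);
      break;
    case "str":
      cast = String(value);
      break;
    default:
      throw new Error(`unknown field "${key}"`);
  }
  if (typeof cast === "number" && Number.isNaN(cast)) {
    throw new Error(`cannot convert ${JSON.stringify(value)} for ${key}`);
  }
  return { [key]: cast };
}
```

With this in place, a value that arrives as the string "1" is sent as the number 1, which is what the stored documents contain.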

[Update]

I encountered a similar problem when trying to update records in a mongo collection — while I could update the record from the mongo command line, I did not experience the same success in trying to execute the command from within my PHP program…

$foo = $collection->find(array('id_geo' => $row['id_geo']));

Consistently failed.  No exceptions were caught, and mongo’s findLastError() reported no errors in the transaction.

After several iterations of debugging and attempting various work-arounds, I stumbled upon the solution as being one of casting. The value looked like an int in PHP, but it most likely arrived as a string (values fetched from mySQL through PHP typically do), and Mongo does not match the string "42" against the integer 42. Casting the variable to an integer:

$foo = $collection->find(array('id_geo' => intval($row['id_geo'])));

 generated a successful query for both the find() and my update() functions.

As I gain experience with Mongo, I expect to discover more of these little mannerisms…

Part 5: Setting-up a Linux Development Machine: Virtual Hosts in Apache2

Category : Technical

When I am working on a code project, I isolate that project within its own directory/repository. It doesn’t matter whether I’m starting a completely new project or branching off the trunk of an existing one. As a means of imposing order over chaos, I isolate the project within its own sandbox, both on the filesystem and via Apache2.

This requires some understanding of the mechanics of Apache2, DNS, and your localhost. A minimal understanding, trust me.

In return, it gives you an isolated view of your code project from the web-server perspective. Cookies are isolated by domain, your document root is isolated to a single directory/repository, and you can put the log files for just that domain wherever you want and name them anything you want.

What I’ll provide you with in this installment is a rudimentary understanding of the mechanics behind virtual hosting using Apache2, a template configuration file to get you going, and the basic steps necessary to get the whole mess working.  Let’s get started…

When you start a new project, if you’re checking it out from a source-code repository, you’ll typically assign it to a directory somewhere common.  For example, within your home directory, you may have a folder named “code” and beneath that folder, other folders that describe either the project or the programming language you’re working in.  Doesn’t really matter as the point is this:  you’ve isolated your code repository from everything else on your filesystem, right?

It really doesn’t matter, to Apache2, where you create your filesystem repository. As long as the webserver pseudo-user has access permissions to the directory, you can access the files within that directory via a web browser. The webserver just has to be told, for a given domain name, where the DocumentRoot for that domain is.

Some of you, at this point, may be asking: what’s a domain name and why is it important? Well, a domain name is simply a name you’ve assigned to the project to keep it separate, at least in your own head, from the other projects you may, or may not, have running on your development machine. For example, say I create a new project called newWidget and it’s currently in the 1.4 revision. I’m ready to branch and write some new features for the product so, using whatever source-control tool, I branch off the trunk and create the 1.5 branch.

I check that branch out to a directory in /lampdev/php/newWidget115.  I now need to do two basic things:

  1. invent some domain name that will be used exclusively for this project and resolve the domain to my localhost
  2. create a virtual host in apache so that apache knows that http://newW115 points to my localhost

The reason, apart from what we’ve already discussed, is to keep your local DNS resolution on your local machine. If, before entering any configuration information, you entered http://newW115 into a browser URL bar, chances are very good you’d end up on a search page (I’m using Chrome) or get some sort of browser error.

So the first step is to define the new domain name to the local system (again, given that we’ve already checked the code out into the aforementioned directory) so that all requests to that domain are resolved locally through our name services. To do this, we’re going to sudo edit the /etc/hosts file.

This file, /etc/hosts, is normally the first thing checked whenever your local name service is trying to resolve a host name (on Linux, that lookup order is set in /etc/nsswitch.conf). If it finds a host-to-IP alias in this file, resolution stops there, since the host name has been successfully resolved. Edit /etc/hosts to resolve your new domain. It should look something like this:

127.0.0.1    localhost codemonkey codemonkey.shallop.com codeMonkey.shallop.com newW115

The way /etc/hosts works is that you first list an IP address for the domain to resolve to - in this case, we’re using 127.0.0.1 which is TCP/IP speak for your local host.  Next we list all of the domain names that are going to resolve to this IP address.  In the example above, we’re resolving localhost, codemonkey, codemonkey.shallop.com, codeMonkey.shallop.com, and the new domain: newW115 all to 127.0.0.1.

Whenever I type one of these domains, for example, in to a web browser URL bar, my local host domain services won’t go out to my network name servers to resolve the domain name — it’s telling the requesting service that it’s 127.0.0.1.  Note, too, that you can alias multiple domain names to the same machine.

Side Note: this is how you can blacklist certain domains from your browsing experience. Simply resolve that domain to 127.0.0.1…but that’s an article for another day…

You can also have multiple entries resolving to the same IP address. It would have been just as correct for me to have listed my /etc/hosts file as:

127.0.0.1     localhost
127.0.0.1     codemonkey
127.0.0.1     codeMonkey
127.0.0.1     codemonkey.shallop.com
127.0.0.1     codeMonkey.shallop.com
127.0.0.1     newW115

Finally, also note that a domain extension isn’t really required.  We can name our domain pretty much anything we want and as long as you universally use that spelling (and case), then it will resolve locally.
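Both /etc/hosts layouts shown above map the same names to 127.0.0.1. As a quick illustration, here is a small JavaScript sketch that parses either layout into a name-to-address map. `parseHosts` is a hypothetical helper, not part of any resolver. It ignores blank lines and `#` comments, keeps names exactly as written, and lets the first entry for a name win.

```javascript
// Parse /etc/hosts text into a Map of hostname -> IP address.
// Each line: an IP address, then one or more names; '#' starts a comment.
function parseHosts(text) {
  const names = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.replace(/#.*/, "").trim();
    if (line === "") continue;
    const [ip, ...hosts] = line.split(/\s+/);
    for (const host of hosts) {
      if (!names.has(host)) names.set(host, ip); // first entry for a name wins
    }
  }
  return names;
}
```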

Now that the domain is resolving locally, the next step is to tell Apache2 how to handle the request. When you type http://newW115 at the browser, the browser will query local services and receive a response that the domain is handled locally. Apache2 will then say: “Oh, if it’s local, then where do I go to get the files and stuff?”

The configuration for Apache2 is done with virtual hosting.  Technically, you can do this without virtual hosting — but you can only do it for one domain.  If you want to locally-host multiple domains, you have to use virtual hosting.

On many installations, the Apache2 configuration file lives in /etc/httpd/conf and is named httpd.conf. This is the main configuration file for Apache2. Some installations use a sub-directory, usually called something like vhostsd.conf, and store the vhosts.conf file within that directory. That’s ok, too. Apache2 is versatile that way but, for our purposes, we’re going to maintain the virtual host configuration(s) within the main conf file.

However, if you wanted to use a separate file for Virtual Hosting, all you need in your httpd.conf file is the directive:

# Virtual hosts
Include conf/extra/httpd-vhosts.conf

At the very end of httpd.conf, there’s a section called: Name-Based Virtual hosting.  We’re going to append this virtual host configuration to the end of this file.

Allow me to side-step for a quick second.  Consider if we were to install phpMyAdmin locally on our server because this is how we want to administer our mySQL database.  We can install the program files anywhere as phpMyAdmin is just another LAMP application, right?  Were we to do that, then we would need a <Directory> directive to Apache2 telling Apache2 where to look for phpMyAdmin.  The domain for phpMyAdmin would still be localhost, or 127.0.0.1 or whatever else you’d defined in /etc/hosts.  The location of the application can live anywhere and we’re using the conf file to tell Apache2 how to find and serve it to us when requested.

Alias /phpMyAdmin "/opt/local/www/phpmyadmin"
<Directory "/opt/local/www/phpmyadmin">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>

What this Alias and <Directory> pair simply does is tell Apache2 where to look for phpMyAdmin if I enter something like http://localhost/phpMyAdmin in the URL bar of my browser. It’s not the same thing as giving phpMyAdmin its own domain at all.

I do this with a lot of my web applications (phpMyAdmin, mcmon, ajaxmytop, nagios, etc.) simply because I don’t want to remember the full path names of the applications.  It’s easier to type http://localhost/phpMyAdmin than it is to type http://localhost/webapps/database/phpMyAdmin.

Ok, so back to domains.  Here’s the template for the virtual host we’ve created in /etc/hosts: newW115:

<VirtualHost *:80>
# Change ServerName, ServerAdmin, DocumentRoot, the matching <Directory>
# path and the log file paths to match your environment.
ServerName  newW115
ServerAdmin mshallop@gmail.com
DocumentRoot /code/webapps/LAMP/newWidget/1-15

DirectoryIndex  index.php

<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory /code/webapps/LAMP/newWidget/1-15>
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
</Directory>

# Only one ErrorLog per host takes effect, so it is declared once.
ErrorLog /var/logs/115_error.log

LogFormat       "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat       "%h %l %u %t \"%r\" %>s %b" common
LogFormat       "%{Referer}i -> %U" referer
LogFormat       "%{User-agent}i" agent
CustomLog       /var/logs/115_log common

# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn

CustomLog /var/logs/115_access.log combined
ServerSignature On

</VirtualHost>

This is a pretty minimal configuration, but it’s the boilerplate template I use for all new domains and it works.  The lines you should change to match your environment are ServerName, ServerAdmin, DocumentRoot, the matching <Directory> path, and the log file paths.  Note that you can put files, such as the log files, pretty much wherever you wish.  I changed the names from my normal location but, as a rule, I maintain the entire environment outside of the root filesystem.
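Because only a handful of lines change from one domain to the next, the template can be stamped out by a small shell function. This is a minimal sketch: make_vhost is a hypothetical helper, it leaves out the cgi-bin and LogFormat lines, and it assumes the main httpd.conf already defines the combined LogFormat (the stock file normally does). The log directory defaults to the /var/logs path used in the template above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a <VirtualHost> stanza modelled on the
# template above. Only the host name, DocumentRoot and log directory vary.
# Assumes the main httpd.conf already defines the "combined" LogFormat.
make_vhost() {
  local name="$1" docroot="$2" logdir="${3:-/var/logs}"
  cat <<EOF
<VirtualHost *:80>
ServerName  ${name}
DocumentRoot ${docroot}
DirectoryIndex  index.php

<Directory ${docroot}>
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>

ErrorLog ${logdir}/${name}_error.log
CustomLog ${logdir}/${name}_access.log combined
LogLevel warn
</VirtualHost>
EOF
}
```

From a root shell, make_vhost newW115 /code/webapps/LAMP/newWidget/1-15 >> /etc/httpd/conf/httpd.conf would append the stanza; look over the result before restarting Apache2.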

Once you’ve made your changes and saved the file, you’ll need to restart Apache2 so that it will read the new configuration.  If there are errors in your configuration file, Apache2 will let you know and will refuse to start.  Make sure you’ve corrected all errors and, once the server successfully restarts, you should be able to type: http://newW115 into your browser URL bar and have that domain resolve locally, and serve files from the directory you specified in the httpd.conf file.
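One way to avoid a surprise refusal to start is to run apachectl configtest first and only restart if it passes. apachectl restart already performs the same syntax check, so this mostly makes the check explicit and shows the error before you touch the server. A sketch, assuming restart_apache as a hypothetical helper and /usr/sbin/apachectl (the path used earlier in this series) as the default; apachectl -S is also handy afterwards, since it lists the virtual hosts Apache2 actually parsed.

```bash
#!/usr/bin/env bash
# Sketch: check the configuration first and restart only if it parses.
# APACHECTL defaults to the path used earlier in this series; override it
# if apachectl lives elsewhere. Run from a root shell.
restart_apache() {
  local ctl="${APACHECTL:-/usr/sbin/apachectl}"
  if ! "$ctl" configtest; then
    echo "configuration has errors; leaving the running server alone" >&2
    return 1
  fi
  "$ctl" restart
}
```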

Over time, as you add projects and create new code-domains, you can simply append new <VirtualHost> directives to the httpd.conf file as needed.  When you retire hosts and remove their files, don’t forget to remove them from the Apache configuration (and from /etc/hosts) as well.
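Each new code-domain also needs its own /etc/hosts entry, the way newW115 got one. A minimal sketch, with add_local_host as a hypothetical helper: it appends a 127.0.0.1 line only when no non-comment line already lists that exact name, so running it twice is harmless. HOSTS_FILE defaults to /etc/hosts, which needs root to edit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a code-domain to 127.0.0.1 unless some
# non-comment line already lists that exact name. HOSTS_FILE defaults to
# /etc/hosts, which needs root to edit.
add_local_host() {
  local name="$1" file="${HOSTS_FILE:-/etc/hosts}"
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    return 0   # already mapped; nothing to do
  fi
  printf '127.0.0.1\t%s\n' "$name" >> "$file"
}
```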

And that’s pretty much it.  This was simple to set up because we didn’t delve into anything beyond plain vanilla: SSL configurations, .htaccess, and the rewrite engine are for another day, another article.

Hope this helps…

Part 4: Installing Apache Thrift: Linux Development Environment

Category : Technical

Previously, we dealt with getting a working LAMP development environment up and running on a fresh CentOS 6 install.  We next dealt with the installation of PHPStorm and our JDK issues.

In this installment and the next, I’m going to talk about the Thrift framework and getting it installed and running.

Thrift was originally developed by Facebook, was released as open source in 2007, and entered the Apache Incubator the following year.

Thrift, according to Apache, is “a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.”

In plain speak, it’s an API framework for your LAMP application.

Why I want it:  I want to use Thrift for our project because of the nature of the project.  (A social-networking concept.)  Because the application will rely heavily on data-storage calls, I’ve decided to implement the data access layer as an API instead of a more-traditional OOP model.  Thrift, as the API framework, allows me complete freedom on the back-end of the API.  I can implement the API in a variety of languages, although I’ll probably use PHP.

Thrift also provides me with a strongly-typed interface to the API.  Like XML-RPC, calls to the API are well-defined beforehand and must comply with the typed definition of both the methods used, and the data exchanged to/from said methods.

My personal experience with Thrift is limited — I used it as an API for a product concept at a former employer.  The calling application would invoke the API and make requests to the API which, in turn, would do a “bunch of stuff” and return a well-defined “data ball” (a json object) back to the calling stub for processing and display.

The other concept that makes me embrace Thrift as the controller for my LAMP application is that I can completely encapsulate the data layer from the front-end developers.  They do not need to know if the data is stored within mongodb, mysql, or a flat file.  All they need is the data.  The query language is hidden; front-end developers should not need to write data-access code.

I’ll talk more about the glories of Thrift later.  For now, let’s just get it installed and running…

On our Linux system, we have to do some preliminary installation of packages first.  Luckily, if you hit the Thrift Wiki, you’ll find pretty much everything you need to do a successful install.  Be warned, however.  Sparseness of documentation could easily be one of the hallmarks of Thrift.  Read carefully, and then read again before punching the enter key on your keyboard.  Make sure you understand what it is you’re about to do.

Ok.  Let’s get some non-LAMP development tools installed.  Our first command will be to install most of the pre-requisite packages needed by Thrift:

#  sudo yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel

This will install the base development packages you’re going to need.  Once this has completed, you should also install the OpenSSL development libraries, as the build will fail without them.  (At least, it failed on my install.)

#  sudo yum install openssl-devel.x86_64

Installing this package will also pick-up all the dependent packages you’ll need to complete the install.
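Before unpacking Thrift, it can save a failed build to confirm the tools actually landed on your PATH. This is a sketch: missing_tools is a hypothetical helper, and the command names in the example (pkg-config from pkgconfig, g++ from gcc-c++, and so on) are my assumptions about what each yum package provides. Library packages such as boost-devel or openssl-devel install headers rather than commands; rpm -q boost-devel openssl-devel reports any that are not installed.

```bash
#!/usr/bin/env bash
# Sketch: print each command from the argument list that is not on PATH.
# Returns 0 when everything is present, 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example (command names assumed to come from the yum packages above):
# missing_tools automake libtool flex bison pkg-config g++
```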

Next, download the Thrift tarball from the site and move the package somewhere within what will become your DocumentRoot path for Apache2.

#  tar xvzf thrift-0.7.0.tar.gz

Once you’ve expanded the tarball, cd into the thrift directory and follow the instructions to make the Thrift packages and libraries.  I did this pretty much exactly as told and my installation went without a problem.

At this point, we’ve only built and installed the Thrift libraries (in /usr/lib, I believe; with the default ./configure prefix they usually land in /usr/local/lib).  In the next installment, we’re going to install the PHP src directory and make it visible to our application’s docRoot.

Part 3: Creating Linux Development Environment (PHPStorm and the JDK)

Category : Technical

I stopped working yesterday on the installation because I hit a pothole installing PHPStorm by JetBrains.

As I mentioned in the previous article, and in case you’re just tuning in, I am first working towards a LAMP development environment on an older PC running 64-bit Linux.  We’ve decided on CentOS 6 as the base distribution and installed the LAMP stack yesterday.  I installed the PHPStorm package but hit a snag when I received an error message telling me that it required the JDK runtime … thingys.  (Whatever – I assiduously avoid Java.)

I installed the openjdk packages with yum and got PHPStorm to start-up, albeit with many dire warnings and threats to the graphics system.  Apparently PHPStorm is comfortable running only with the jdk from SUN/Oracle.

I then downloaded and RPMd the SUN/Oracle version of the jdk and restarted.  What happened next were error messages telling me that I need to set-up the java (dk) environment correctly as, now, the two were conflicting with each other.

ERROR: Cannot start WebIde.
No JDK found to run WebIde.  Please validate either WEBIDE_JDK, JDK_HOME or JAVA_HOME environment variable points to valid JDK installation.

See, in the Linux world, PHPStorm is launched from a shell script.  The script checks your environment for the JDK through these variables and, if they are correctly defined, launches the IDE.

There’s a java-sdk configuration file located at /etc/java/java.conf.  Don’t make the same mistake I made and edit this file to redirect or create environment variables pointing to the SUN/Oracle version of the JDK.

The SUN/Oracle version of the Java SDK installed in: /usr/java/jdk1.7.0/ which will change for your system depending, I’d assume, on your distribution and version of the SDK.

To reconcile the conflicts, I used the yum installer to remove any traces of the openjdk — all packages were removed and then I did a yum clean all to reset the environment.

Since I’m the only user on this system, I next cd’d into my home directory and pulled up the .bashrc file – this will modify the bash shell environment for every terminal session I start.  I added the following two lines to the .bashrc:

JDK_HOME=/usr/java/jdk1.7.0
export JDK_HOME

I exited the editor and reloaded my bash environment:

# . ./.bashrc

From there, all I need to do is start the PHPStorm shell script which launches the application and I’m good to go!
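If the launcher still complains, it helps to check the three variables named in the error message yourself. This is a sketch: find_ide_jdk is a hypothetical helper, and checking the variables in the order the message lists them, and treating a directory with an executable bin/java as valid, are my assumptions about what the launcher looks for.

```bash
#!/usr/bin/env bash
# Sketch: look at WEBIDE_JDK, JDK_HOME and JAVA_HOME (in the order the
# PHPStorm error message lists them) and print the first one that points
# at a directory containing an executable bin/java.
find_ide_jdk() {
  local var dir
  for var in WEBIDE_JDK JDK_HOME JAVA_HOME; do
    dir="${!var}"   # indirect expansion: value of the variable named $var
    if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
      echo "$var=$dir"
      return 0
    fi
  done
  echo "no usable JDK in WEBIDE_JDK, JDK_HOME or JAVA_HOME" >&2
  return 1
}
```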

You can install the PHPStorm folder anywhere.  Using your bashrc file, you can make an alias to the start-up shell script so that you can launch the IDE anywhere from the CLI environment.

alias phpstorm='nohup /home/user/folder/PhpStorm/bin/PhpStorm.sh &'

The nohup allows the program to ignore SIGHUP — in other words, if you close the terminal from where you launched PHPStorm, you will not close PHPStorm as well.  The ampersand (&) at the end of the command tells the shell interpreter to launch the application as a “background” task which frees up your terminal session so that you can continue to use the shell while PHPStorm is running.

At this stage, I’m pretty much good to go for basic LAMP development.  I’ve got a running mySQL server, Apache2 is good to go, and PHP5 is installed.  I will enhance my environment by adding a few packages such as:

I also want to give some thought to virtual hosts (I’ll cover this topic in a future post).  Within my local Apache2 environment, I’m going to want to establish several virtual hosts, each pointing to a different DocumentRoot (or code repository) depending on which application or environment I’m currently working on.
I’ll also have to plan my filesystem repositories carefully: for the most part, this machine will act as a Subversion server for home projects while also acting as a Subversion client for work projects.
Which reminds me – in the first article in this series, I reported on the filesystem utilization following a clean Fedora 15 install.  Here’s the state of the current filesystem (CentOS 6) following the LAMP stack install, and the install of the PHPStorm IDE and the SUN/Oracle JDK:
/ (root):   50 GB,  6% used,  47 GB available
/boot:     485 MB, 11% used, 409 MB available
/home:     864 GB,  1% used, 820 GB available
I’m looking pretty good for user filesystems, but I’ll also want to check my mySQL configuration and make sure databases are being created in the /home filesystem and not in the root (/) filesystem.
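To see where the databases will land, look at the datadir setting in the [mysqld] section of /etc/my.cnf (on a stock install it is typically /var/lib/mysql, which sits on the root filesystem), then ask df which filesystem holds it. A sketch, with mysql_datadir as a hypothetical helper; if the server is running, mysql -uroot -p -e 'SELECT @@datadir' reports the live value instead, and if no datadir line is present the server falls back to its compiled-in default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the datadir set in the [mysqld] section of a
# my.cnf-style file (default /etc/my.cnf). Exits non-zero if none is set.
mysql_datadir() {
  local cnf="${1:-/etc/my.cnf}"
  awk '
    /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/); next }
    in_mysqld && /^[ \t]*datadir[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")   # drop the "datadir =" prefix
      sub(/[ \t]+$/, "")         # drop trailing blanks
      d = $0                     # a later datadir line overrides an earlier one
    }
    END { if (d == "") exit 1; print d }
  ' "$cnf"
}

# Usage: show which filesystem holds the data directory.
# df -h "$(mysql_datadir)"
```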
Ok – done for this weekend and off to play some Rift!  Hope this helps someone!
PS: If you want some detailed tutorials on installing any of the supplemental packages I listed at the end of this article, please leave a comment!

Setting-up a Linux Development Client – Part 2 – CentOS 6 Install

Category : Uncategorized

In the last installment, I wrote about how I decided to try Fedora Linux after a nearly 10-year hiatus from the product.  Unfortunately, as it turned out, my fears were not groundless and I am going to scrap the install in favor of CentOS before I get in so deep that making the switch becomes prohibitive.

I am going to continue to try Gnome as my desktop, however, as I did like what I saw for Gnome under Fedora.  While I have always used KDE in the past, it was always accompanied by a wistful bit of “I’ll bet the grass is greener over there…” kind of thinking.  Anyone who’s ever spent any time looking over the Gnome application offerings vs. the KDE application offerings will agree.

Time to stop wondering and start trying.  I’ve downloaded the CentOS 6 x86_64 CD ISO and have booted into the desktop.  It’s not nearly as polished or pretty as the Fedora desktop; it looks more like a traditional Windows set-up, with the desktop icons falling down the left side of the screen and menu bars at the top and bottom.  While simpler in appearance, it’s also intuitive and easier to use.  Less eye-candy also means less CPU/GPU crunching, resulting in improved responsiveness.  (Dragging a window around the Fedora desktop on my hardware was a bit like being on a strong hallucinogenic.  Or so I’ve been told.)

Anyway, I locate the “Install to Hard Drive” icon and click it…

The CentOS 6 installer opens a window in the middle of the desktop (as opposed to Fedora taking over the entire desktop) and presents you with the same two start-up options: installation language and installation destination.  (As I mentioned in the previous article, CentOS is a child of Fedora.  I expect things to be similar.  Stuff working is one such expectation.)

CentOS gives me the same options as the Fedora installer – except with less eye-candy.  For example: when asking to input the root password, I’m not shown a bar indicating password strength.  I just type in my password and that’s pretty much it.  Also, like the previous install, I’m not going to choose the encrypted filesystem, and I’m going to go with the defaults for filesystem partitioning.

While this is installing, I’ll yak about why I’ve chosen these two distributions as my first two choices.  Ubuntu offers a great installation and configuration experience.  However, after messing around with Linux distributions for 30 years, I can’t quite shake the feeling that Ubuntu is the Garanimals of Linux installs.

Don’t get me wrong: it’s a great install in that everything works, is highly automated, and requires little, if any, user intervention from the machine’s administrator.  And that’s probably what bugs me the most about Ubuntu.  As a Linux guy, I want (need) more interaction with my OS.  If I were content to let my OS run off and make all the most important decisions without asking me, I’d use Windows.  Ubuntu fills a great niche: it introduces Windows users to Linux.  I’d install Ubuntu on my Dad’s PC.

I’ve also bypassed SuSE Linux, which is surprising considering that, for nearly a decade and a half, all I would consider running and installing was SuSE.  This flavor of Linux, like most things German, is precise, exacting and mechanically sound.  Correct, even.  It’s also overbearing, heavy-handed and leaves deep footprints.  The other problem I have with SuSE is that it can be difficult to find packages tailored for its installation base.  While SuSE enjoys a wide variety of software, there always seem to be those few dozen packages you want to install but can’t find ported to the SuSE distribution.  In that, it’s like the Dewey (Malcolm in the Middle reference) of Linux installs: unprepossessing and brilliant but relatively scarce when it comes to applicable resourcing.

I’ve never been a big fan of Debian simply because they move in geological timeframes when it comes to engineering releases.  Oh, look, kernel 2.26.9999 is out!  (Debian: happy with 2.123, thank you.)  Geh.  What it lacks in contemporary packaging, it more than makes up for in stability.  I, on the other hand, tend to blow through distributions like the end is near, so Debian isn’t really for me.

I tried Mandriva once and, as a result, got sucked into this weird mail hell back when I was running my own DNS and MX servers.  I really tried to make it work but it just got too … weird for me.  It may have improved in recent years, but nothing about it has caught my eye enough to make me revisit it.

Rebooting the CentOS 6 Live CD was better than the Fedora Live CD as CentOS actually gave me a ‘reboot’ option whereas Fedora would only let me ‘suspend’…whatever that means…

I configured the user and the network time and then was presented with an alert: “Insufficient Memory to Start kdump” … which made me think I had crashed the install…turns out, it was just telling me I couldn’t start the monitor itself.

On to the login…

Well, CentOS 6 is definitely a derivative of Fedora 15.  Although the desktop is radically different, the first thing I try is FireFox — and am immediately told that I can’t access any off-site web page.  Although I can ping and resolve hosts from terminal, FireFox cannot do so from the browser.  So the same crappy DNS issue which plagues Fedora was inherited by CentOS.  Great.  Starting to get an idea of where all this is eventually going to end up…

The network configuration applet in CentOS allows me to edit and add google’s nameserver and things start to work in the browser immediately thereafter. For some reason, I wasn’t able to get this to work in Fedora so, bonus.  Also, my screen resolution is at the highest at 1280 x 1024 and that gives me a happy, too.

I start the software update and am informed that all my software is up-to-date and I do not need any additional software.  That strikes me more as a software fail, so I run yum update from the command line as root (side note: either I didn’t see the option to make my new user an admin, or it didn’t exist; regardless, I can’t sudo) and I’m suddenly off and installing 237 packages.  Clearly something in the GUI version of the software update failed, and now I suspect it’s because the command ran under my account, which didn’t have sudo privileges.

CentOS 6 will allow you to log in graphically as root, and thereafter puts so many scare-ware pop-ups on the screen that you eventually, submissively, quietly and quickly edit the sudoers file and log out.  Now that my main account has sudo access, I never need to hit root again.

Quick download and now Chrome is my default browser…time to try to install some development tools…

The first package I’m going to install, from the Add/Remove Software package manager, is the MySQL server and related files package which is an 8.1mb download…I have to also install dependent packages for perl support and client programs and shared libs, which is ok…PHP 5.3.2 is the next item to be installed and I install all packages except for postgres.

At this point, I have a LAMP stack installed, but it’s not running…  starting off with mysql:

# sudo chkconfig --level 2345 mysqld on

# sudo /etc/init.d/mysqld start

# mysql -uroot

mysql> use mysql;

mysql> update user set password=PASSWORD('yourPasswordHere') where user='root';

mysql> flush privileges;

mysql> exit;

This set of commands sets-up mysql to run at start time (run levels 2, 3, 4, and 5) and then starts the mysql server.  Next, you invoke mysql as root and reset the root password to something other than the default, which is nothing.

–> mySQL is now running.

For Apache, we’re going to leave virtual hosts alone for a future article, and just make sure that the webserver will execute at boot, and that we can serve system information…

# sudo chkconfig --level 2345 httpd on

# sudo /usr/sbin/apachectl start

If you ps -ef | grep httpd you’ll see a list of the running apache servers…you can also open up http://localhost in a browser window and you should see the CentOS Apache 2 Test Page.  Now we have to confirm that we have PHP installed and running, along with a few other modules.  By default, your web server DocumentRoot is in /var/www/html.  Using the terminal, cd into this directory and type the following:

# sudo vi snitch.php

i

<?php

phpinfo();

<esc>:wq

This creates a little snitch file in your DocumentRoot which you can load in a browser — it then dumps your LAMP configuration to your browser window.  At the very top of the display, it should tell you what version of PHP you’re running.  (Mine reports version 5.3.2.)  Important to me, at this stage, is that I have memcache, soap, mysql, and ODBC drivers installed.
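The same module check can also be done from the command line: php -m lists the extensions PHP has loaded, one per line. A sketch, with missing_php_modules as a hypothetical helper; PHP defaults to php on the PATH, and on some distributions the CLI reads a different php.ini than Apache2 does, so the snitch page remains the authoritative view of what the web server loads. Remember to delete snitch.php once you’re done, since it exposes your configuration to anyone who can reach the server.

```bash
#!/usr/bin/env bash
# Sketch: print each requested extension that "php -m" does not list.
# Matching is case-insensitive and on whole lines, so "mysql" does not
# match "mysqli". PHP can be overridden, e.g. PHP=/usr/bin/php.
missing_php_modules() {
  local mod missing=0 loaded
  loaded=$("${PHP:-php}" -m) || return 2
  for mod in "$@"; do
    if ! printf '%s\n' "$loaded" | grep -qix -- "$mod"; then
      echo "$mod"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: missing_php_modules memcache soap mysql odbc
```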

The last stage for me is to install my IDE.  I own a license for JetBrains PHPStorm which I personally prefer.  It’s not freeware but if you can afford the license costs, it’s probably the best IDE you can get for the price.  I use it on all environments (Mac, Windows and Linux).  I also noticed that you can install the Eclipse IDE using the software installer — this is very similar to PHPStorm.

To get PHPStorm up and running, I need the SUN/Oracle version of the JDK — not the openJDK.  I did get it running, but not without DIRE and URGENT messages prophesying  the END OF THE WORLD, or at least my video display, should I continue.  Point is, I did get it installed, configured, licensed.  Then I de-installed the openJDK and went hunting for the SUN/Oracle JDK.

Which will be covered in the next installment…

Please Stop: Securing User Accounts with mySQL’s Password() Function…

Category : Technical

I tweeted a week or so ago:  Please stop using mySQL’s password() function.  I had good reason for doing so – the legacy software I’ve been assigned to maintain until I can help with the new rev has a password schema that is based entirely on mySQL’s password function as the hash strategy for storing user-account passwords in the database.

As developers, we should be getting that uncomfortable, squishy, feeling whenever we read about another corporate hack event, one that exposed gazillions of user accounts to the ether, and zomg, we even got their passwords!!!
