Wednesday, December 9, 2009

The Benefits of Unstructured Data

In the book The Pragmatic Programmer, an entire section is dedicated to "The Power of Plain Text".  The section lists integration (the UNIX tool chain being the canonical example), insurance against obsolescence, and easier testing as motivations for using plain text.  There is another significant advantage to plain text that is often overlooked: it lacks rigorous structure.

MongoDB

There has been a recent trend away from relational databases that has been dubbed NoSQL.  That is not to say that relational databases are always a bad idea, just that they are not the best tool for every job.  Personally, I have all too often encountered the "I need to persist data so I must use a relational database" mentality.  Avoiding an RDBMS means avoiding schemas.  Database schemas are a pain because they have to stay synchronized with all of the applications that access the database.  Typically this also involves going through a DBA group, which adds a hop that impedes rapid and frequent deployment.  Ruby on Rails has a facility called migrations to help deal with such pain points, but avoiding schemas removes the problem altogether.  Furthermore, Rails typically maintains referential integrity at the application layer, making an RDBMS largely overkill.  Although my experience with non-relational data stores is far from comprehensive, my favorite so far is MongoDB.

The first thing to notice about MongoDB is the API.  The API is very simple and has little if any accidental complexity.  There are named collections, and you can store dictionaries in those collections.  Each dictionary within a collection can have a different number and type of key-value pairs, and dictionaries can be nested.  Simple.
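
To make that data model concrete, here is a minimal sketch in plain Python.  It is not the MongoDB driver API: the Store class and its insert and find methods are hypothetical stand-ins that only mimic the shape described above, namely named collections holding dictionaries that need not look alike.

```python
from collections import defaultdict


class Store:
    """Hypothetical in-memory stand-in for a document store (not MongoDB's API)."""

    def __init__(self):
        # Collections are created on first use; no schema is declared anywhere.
        self._collections = defaultdict(list)

    def insert(self, collection, document):
        # Any dictionary is accepted, whatever keys or nesting it has.
        self._collections[collection].append(dict(document))

    def find(self, collection, **criteria):
        # Return documents whose top-level keys match every criterion.
        return [
            doc for doc in self._collections[collection]
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = Store()
# Two documents in the same collection with entirely different keys.
store.insert("people", {"name": "Ada", "languages": ["Ruby", "Python"]})
store.insert("people", {"name": "Bob", "address": {"city": "Chicago", "zip": "60601"}})

ada = store.find("people", name="Ada")
chicagoans = [
    person for person in store.find("people")
    if person.get("address", {}).get("city") == "Chicago"
]
```

Nothing had to be declared before the second insert, even though it shares only one key with the first.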

The benefit of not having schemas is that the application can be upgraded before the data store is ready.  Just deploy the application upgrade into production and start storing the new data.  Separate applications can in most cases be upgraded independently.  If previously stored data needs to be normalized, this can happen separately, before or after the application upgrade occurs.
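
As a rough illustration of that upgrade path, the sketch below uses plain Python dictionaries in place of stored documents.  The nickname field and the display_name and normalize functions are invented for the example; the point is only that new code can tolerate old documents, and that any backfill is a separate step.

```python
# Documents written by the old application version have no "nickname" key;
# the upgraded application starts writing it without any schema change.
users = [
    {"name": "Ada Lovelace"},                               # stored before the upgrade
    {"name": "Grace Hopper", "nickname": "Amazing Grace"},  # stored after the upgrade
]


def display_name(user):
    # New code reads the new field when present and falls back otherwise.
    return user.get("nickname", user["name"])


def normalize(user):
    # Optional backfill that can run before or after the deploy.
    return {**user, "nickname": user.get("nickname", user["name"])}


before = [display_name(user) for user in users]
normalized = [normalize(user) for user in users]
```

The application works against both shapes of document, so the normalization pass can be scheduled whenever it is convenient, or skipped entirely.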

In addition to avoiding schemas, an unstructured approach yields two other benefits.  Want access to the data independent of MongoDB?  The data is all stored in BSON (binary JSON), and is therefore accessible from other languages and applications that have no knowledge of MongoDB.  Another benefit of MongoDB, made possible by its non-relational nature, is support for auto-sharding and MapReduce.  These features let MongoDB scale out across machines more easily than a typical RDBMS.  MongoDB nicely fills niches on both sides of the RDBMS in the size-of-data spectrum.
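
The MapReduce idea can be illustrated in a few lines of plain Python.  This is a sketch of the general technique, not MongoDB's own MapReduce interface; the sample documents and the map_tags, reduce_counts, and map_reduce functions are made up for the example.

```python
from collections import defaultdict

posts = [
    {"title": "Plain text", "tags": ["text", "unix"]},
    {"title": "Wikis", "tags": ["text"]},
    {"title": "Untagged"},  # documents need not share the same keys
]


def map_tags(doc):
    # Map step: emit a (key, value) pair for every tag on one document.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    # Reduce step: combine all values emitted for one key.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    # Group the emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


tag_counts = map_reduce(posts, map_tags, reduce_counts)
```

Because the map step looks at one document at a time and this reduce step is a simple sum, the work could in principle be split across shards and the partial results combined afterwards.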

Wikis

Wikis are an excellent way to share content and communicate.  Wikis are superior to other solutions, such as SharePoint, because they are akin to a version-controlled notebook that everyone can view and edit from their browser.  Notebooks impose no structure and require no separate application (such as Word) to view or edit.  Many wikis have macros for tables and other formatting, but these are all optional.  The wiki will not complain if you leave something out.  Wikis can make an excellent replacement for documentation, forums, task lists, release notes, etc.  One of the more recent items I have replaced with a wiki is Mingle, a software project management tool.  Having used several project management tools, I find a wiki to be a better choice due to its lack of structure.  The model imposed by project management software never quite fits or is too heavy-handed, forcing me either to make my process match the tool or to spend more time than it is worth coercing the tool to fit my process.  Lastly, searching is significantly easier with a wiki because the data is all text.  Searching in SharePoint is frustrating in comparison.

The "Power of Plain Text" is understated.  Rigorous structure is not always a good thing when it comes to data, and less stringent solutions should be considered first, as they are often more flexible and require less developer overhead.

1 comment:

  1. You replaced Mingle with wiki?
    Mingle is a wiki...
