A recurring theme in enterprise application performance is the inherent conflict between achieving fast writes AND fast complex query responses at the same time in SQL-based relational (RDBMS) databases*.
This write/read conflict has been around since the very first Hello World two-tier applications back in the ’60s. Recently this self-same subject poked its head above the parapet and started pulling faces at a client of ours.
What surprises me is that this is not a hugely difficult problem to solve – once you know how. Equally surprising is how divided the development community is on this topic.
When asked about this particular issue, most app developers reach for some sort of in-memory caching system, with all of the cache-miss and eviction-policy management nonsense that these things entail. Conversely, DB developers swear by a complicated mechanism of stored procedures that push data into staging tables and then shuffle it into a read table for querying. Similar issues to caching appear here, along with the extra fun of dirty reads and row deadlocks.
So you need a miracle akin to passing a camel through the eye of a needle to meet this simple requirement, right?
Er, no.
As with most things in life, the answer is elegantly simple. In fact our DB hackers are on the right track. THE proven solution is to have TWO databases in the same engine with identical table structures and data. Let's call the first DB the Online Transaction Processing (OLTP) DB and the second the Management Information Services (MIS) DB.
The OLTP DB has one purpose in life – to store data as fast as it can. The target here should be 10ms or less round trip between the application and the DB during a write operation. This DB exposes only stored procedures to store the data – procedures which have been painstakingly optimised for writing (and that includes updates and deletes, both of which are generally considered to be write operations). But, and here is the secret, this DB has NO INDEXES other than primary keys (which one could argue are in fact constraints). Fast writes are guaranteed no matter what the table size is, because there is no index to maintain.
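To make the write side concrete, here is a minimal sketch using SQLite from Python purely for illustration – the `orders` table, its columns and the `store_order` procedure are all invented stand-ins, not anything from a real system. The point is simply that the write path touches nothing but the primary key:

```python
import sqlite3

# Hypothetical OLTP store: one table, primary key only, no secondary indexes.
oltp = sqlite3.connect(":memory:")
oltp.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,  -- the only "index" is the PK constraint
        customer  TEXT NOT NULL,
        amount    REAL NOT NULL
    )
""")

def store_order(conn, order_id, customer, amount):
    """Write path: one parameterised insert, nothing for an index to maintain."""
    conn.execute(
        "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
        (order_id, customer, amount),
    )
    conn.commit()

store_order(oltp, 1, "ACME", 99.50)
```

In a real engine this would be a stored procedure rather than a Python function, but the principle is the same: the insert cost stays flat because there is no secondary index to rebalance as the table grows.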
Flipping the coin over, the MIS DB has lots of indexes, tuned to serve up query results, like, NOW. These are called through stored procedures, which lets the DB engine's query optimiser/planner prepare and reuse its execution plans. And here's the thing about the MIS DB – there are no public stored procedures that allow writing to any of its tables. Not one.
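The read side is the mirror image. Continuing the same invented SQLite sketch (the table, index and function names are hypothetical), the MIS copy has an identical table shape plus whatever indexes the queries need – indexes that the OLTP side never pays for:

```python
import sqlite3

# Hypothetical MIS store: identical table, plus indexes tuned for the queries.
mis = sqlite3.connect(":memory:")
mis.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        customer  TEXT NOT NULL,
        amount    REAL NOT NULL
    )
""")
# Indexes live here, and only here.
mis.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
mis.execute("CREATE INDEX idx_orders_amount   ON orders (amount)")

def orders_for_customer(conn, customer):
    """Read path: the planner can satisfy this from idx_orders_customer."""
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
```

Asking the engine for its query plan (`EXPLAIN QUERY PLAN` in SQLite) confirms that a lookup by customer is an index search rather than a full table scan.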
OK, you say, so how does the data get from the OLTP DB to the MIS DB, huh? By definition they have been designed to be isolated from one another!
Yes, kimosabe, that is the public appearance of these two DBs. However there is a little-known trick which in the Oracle world is delivered by materialised view logs feeding fast-refresh-on-commit materialised views. All the other enterprise DBs have similar features.
RDBMSs all have a thing called the transaction log. Useful if you want to roll back a transaction on error, useful for logging row deletions in a table, and generally handy for all DML operations. You may also have got to know these little beasties when they blow up due to lack of space during a massive deletion. Yes, I am referring to those puppies.
Another baked-in feature of RDBMSs is their ability to handle materialised views – views which cache the result set of a query, ready to serve the same result on a second or later call. You may not have realised that the engines behind clients such as TOAD, SQL Server Management Studio or Embarcadero’s Studio Enterprise routinely cache query results in the background in much the same way.
Bolting the transaction log onto preconstructed, non-volatile materialised views over the tables in the OLTP DB, and writing the result set into the same tables in the MIS DB on OLTP commit, keeps the two DBs in sync. Permanently. Oh, and this is how DBs replicate in a cluster.
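SQLite has no materialised view logs or replication, so the following sketch hand-rolls the same shape: a trigger captures every committed OLTP write into a change-log table (standing in for the materialised view log), and a sync step ships those rows to the indexed MIS copy. All names are invented, and only inserts are handled, for brevity:

```python
import sqlite3

oltp = sqlite3.connect(":memory:")
mis = sqlite3.connect(":memory:")

oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
# Stand-in for the materialised view log: a trigger records every write.
oltp.execute("CREATE TABLE orders_log (order_id INTEGER, customer TEXT, amount REAL)")
oltp.execute("""
    CREATE TRIGGER log_orders AFTER INSERT ON orders
    BEGIN
        INSERT INTO orders_log VALUES (NEW.order_id, NEW.customer, NEW.amount);
    END
""")

# The MIS copy: same table, plus the indexes the OLTP side refuses to carry.
mis.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
mis.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

def sync(oltp_conn, mis_conn):
    """Ship logged changes to the MIS copy, then clear the log."""
    rows = oltp_conn.execute(
        "SELECT order_id, customer, amount FROM orders_log"
    ).fetchall()
    mis_conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    mis_conn.commit()
    oltp_conn.execute("DELETE FROM orders_log")
    oltp_conn.commit()

oltp.execute("INSERT INTO orders VALUES (1, 'ACME', 99.5)")
oltp.commit()
sync(oltp, mis)
```

In a real engine the "sync" step is done for you by the refresh-on-commit machinery; the sketch only shows why it works – the log already contains exactly the delta the read side needs.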
It also helps if the app only stores the data it actually needs to store. For example, a lot of systems generate and consume heavy XML messages by way of conforming to a recognised data interface specification. Better not to store all of that useless XML element structure; instead, extract the values you need and persist those.
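A quick sketch of that extraction step, with a made-up message format and namespace standing in for whatever interface spec applies – three typed values go into the table, and the XML wrapper is thrown away:

```python
import sqlite3
import xml.etree.ElementTree as ET

# A hypothetical inbound message conforming to some interface spec.
message = """
<Order xmlns="urn:example:orders">
    <OrderId>42</OrderId>
    <Customer>ACME</Customer>
    <Amount currency="GBP">99.50</Amount>
</Order>
"""

ns = {"o": "urn:example:orders"}
root = ET.fromstring(message)

# Persist only the values the application actually queries - not the whole blob.
order = (
    int(root.findtext("o:OrderId", namespaces=ns)),
    root.findtext("o:Customer", namespaces=ns),
    float(root.findtext("o:Amount", namespaces=ns)),
)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
db.execute("INSERT INTO orders VALUES (?, ?, ?)", order)
db.commit()
```

A few dozen bytes per row instead of the full document – and every stored value is directly indexable on the MIS side.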
Neat, eh?
*Yes, there are NoSQL and Big Data solutions which leverage distributed computing paradigms, but assume that your client doesn’t have the money, time or will to bring in a brand-new technology stack just to solve a relatively simple performance problem.