Category Archives: Performance

Performance-related know-how

Miracles we can do immediately, the Impossible may take a bit longer

A recurring theme in enterprise application performance is the natural conflict between achieving fast writes AND fast complex query responses at the same time in SQL-based RDBMS databases*.

This write/read conflict has been around since the very first Hello World two-tier applications back in the ’60s.  Recently this self-same subject poked its head above the parapet and started pulling faces at a client of ours.

What surprises me is that this is not a hugely difficult problem to solve – once you know how. Equally surprising is how divided the development community is on this topic.

When asked about this particular issue, most app developers reach for some sort of in-memory caching system with all of the cache-miss and eviction policy management nonsense that these things entail. Conversely, DB developers swear by a complicated mechanism of stored procedures putting data into staging tables and the shuffling of data to a read table for querying.  Similar issues to caching appear here, along with the extra fun of dirty reads and row deadlocks.

So you need a miracle akin to passing a camel through the eye of a needle to meet this simple requirement, right?

Er, no.

As with most things in life, the answer is elegantly simple. In fact our DB hackers are on the right track.  THE proven solution is to have TWO databases in the same engine with identical table structures and data. Let's call the first DB the Online Transaction Processing (OLTP) DB and the second the Management Information Services (MIS) DB.

The OLTP DB has one purpose in life – to store data as fast as it can.  The target here should be 10ms or less round trip between the application and the DB during a write operation. This DB has only stored procedures to store the data – procedures which have been painstakingly optimised for writing (and that includes updates and deletes, both of which are generally considered to be write operations).  But, and here is the secret, this DB has NO INDEXES other than primary keys (which one could argue are in fact constraints). Fast writes are guaranteed no matter what the table size is, because there is no index to care about such things.
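To make that concrete, here is a minimal sketch of the write path from the application side, assuming a hypothetical save_trade stored procedure and an Oracle JDBC driver on the classpath (the proc name, columns and connection details are all invented for illustration):

```java
import java.math.BigDecimal;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class OltpWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- point at your OLTP schema.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")) {
            con.setAutoCommit(false);
            // The only write route into the OLTP DB: a stored procedure tuned
            // for inserts. The table behind it carries a primary key and nothing
            // else, so the write cost stays flat as the table grows.
            try (CallableStatement cs = con.prepareCall("{call save_trade(?, ?, ?)}")) {
                cs.setLong(1, 42L);                                // trade id (primary key)
                cs.setString(2, "GBPUSD");                         // instrument
                cs.setBigDecimal(3, new BigDecimal("1000000.00")); // notional
                cs.execute();
            }
            con.commit();
        }
    }
}
```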

Flipping the coin over, the MIS DB has lots of indexes, tuned to serve up query results, like, NOW.  These are called through stored procedures so that the DB engine's query optimiser/planner can properly prepare its volatile materialised views. And here's the thing about the MIS DB – there are no public stored procedures to allow writing to any of its tables. Not one.

OK, you say, so how does the data get from the OLTP DB to the MIS DB, huh? By definition they have been designed to be isolated from one another!

Yes, kimosabe, that is the public appearance of these two DBs. However, there is a little-known trick which in the Oracle world is known as transaction-log-triggered materialised views.  All other enterprise DBs have similar features.

RDBMS engines all have a thing called the transaction log.  Useful if you want to roll back a transaction on error, useful for logging row deletions in a table and generally handy for all DML operations.  You may also have got to know these little beasties when they blow up due to lack of space when you run a massive deletion.  Yes, I am referring to those puppies.

Another baked-in feature of RDBMS engines is their ability to handle materialised views – views which cache data from a previous call, ready to serve the same result on a second or later call.  You may not have realised that DB calls made through a client such as TOAD, SQL Server Management Studio or Embarcadero's Studio Enterprise cache query results in a volatile materialised view formed in the background.

Bolting the transaction log to preconstructed non-volatile materialised views built from tables in the OLTP DB, and writing the result set to the same tables in the MIS DB on OLTP commit, allows the two DBs to remain in sync. Permanently. Oh, and this is how DBs replicate in a cluster.
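For the Oracle flavour mentioned above, the wiring looks something like this sketch – a materialised view log on the OLTP table feeding a fast-refresh-on-commit materialised view on the MIS side. The schema and table names (oltp.trades, mis.trades) and connection details are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MisSyncSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- run as a user with rights on both schemas.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "admin", "secret");
             Statement st = con.createStatement()) {

            // 1. Record row-level changes to the OLTP table in a materialised view log.
            st.execute("CREATE MATERIALIZED VIEW LOG ON oltp.trades WITH PRIMARY KEY");

            // 2. Mirror the table into the MIS schema; REFRESH FAST ON COMMIT applies
            //    the logged deltas whenever an OLTP transaction commits.
            //    (In Oracle the two "DBs" live as schemas in the same instance.)
            st.execute("CREATE MATERIALIZED VIEW mis.trades "
                     + "REFRESH FAST ON COMMIT "
                     + "AS SELECT * FROM oltp.trades");

            // 3. Index the MIS copy as heavily as the queries demand; the OLTP side
            //    keeps nothing but its primary keys.
            st.execute("CREATE INDEX mis.ix_trades_book ON mis.trades (book, trade_date)");
        }
    }
}
```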

Also, it helps if the app only stores the data it needs to store. For example, a lot of systems generate and consume heavy XML messages by way of conforming to a recognised data interface specification.  Better not to store all that useless XML element structure; rather, extract the values you need and persist those instead.
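As an illustration, here is a sketch of the extract-then-persist idea using the JDK's built-in XPath support – the message shape and field names are made up for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlStripper {
    public static void main(String[] args) throws Exception {
        // A cut-down message in the style of an FpML-ish trade confirmation.
        String xml = "<trade><id>42</id><instrument>GBPUSD</instrument>"
                   + "<notional>1000000.00</notional></trade>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();

        // Pull out only the values the business actually queries on...
        long id = Long.parseLong(xp.evaluate("/trade/id", doc));
        String instrument = xp.evaluate("/trade/instrument", doc);
        String notional = xp.evaluate("/trade/notional", doc);

        // ...and persist these three columns instead of kilobytes of angle brackets.
        System.out.printf("id=%d instrument=%s notional=%s%n", id, instrument, notional);
    }
}
```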

Neat, eh?

*Yes, there are NoSQL and Big Data solutions which leverage distributed computing paradigms, but assume that your client doesn’t have the money, time or will to bring in a brand-new technology stack just to solve a relatively simple performance problem.

Overtaken by Events

Many years have passed since the dawn of modern computing broke … and in that time many batch processes have been spawned and continue to survive.

Like a plague of locusts.

It is time for software engineers the world over – yes, you over there in the corner, pretending not to be noticed with your geek sensibility and painful shyness – to stand up, take a deep breath and open your eyes to the modern world of event-driven architecture, or EDA.

As life becomes increasingly recorded by Big Data miners and suchlike, the sheer mass of data being collected has gone from a light breeze of disparate facts to a veritable snowstorm of related information.

Classic data-handling ecosystems accumulate data at one point in the application constellation and greedily hog that data from downstream systems, until the clock strikes the magic hour When The Batch Load Begins.

Given that batch loading – moving a large chunk of data, usually relating to a period or epoch – was invented in the 1970s, it is no surprise that modern information demands are straining this paradigm beyond its limits.

Read this: some clients experience batch-loading scenarios where their batch load for the previous day takes 25 hours to process.

Hmm. Not sustainable. Some would say “Suboptimal.”

But make the batch run faster, I hear you cry.  I can assure you that many have tried and partially succeeded in doing so, but really have only kicked the can of catastrophe down the road six months or so.  This remediation is akin to pushing runny stuff up a hill with a garden rake. Eventually you are back to where you started, only sweatier.

The solution is before us and, some would be surprised to learn, always has been. Simply take a look at Nature. All bio-systems appear to exhibit near real-time interactions.  The dog sees the cat, the cat the dog.  The cat runs. The dog runs and, well, Nature takes its course.

Boffins would call this “cause and effect”.  I call it event-based behaviour.  A Thing happens which triggers another Thing to happen, and so on. Much like what our application subsystems need to do with one another.

Our humble application constellation needs subsystem A to tell subsystem B (eventually) about that Thing That Happened.  Why not pass on the Thing straight away? Why wait, holding all the cards close to your chest, subsystem A?

There is no logical reason.  However, there are plenty of other reasons, such as legacy methods, sunk investment, the complexity of the batch process and the like.  But under close scrutiny these arguments don’t hold water and are actually just excuses born of Fear and Loathing.

Migrating from batch to an EDA has a number of advantages. There are more, but for the sake of brevity I have listed the main ones here:

  •     Liveness appears within the application constellation for free.  This is an excellent thing to market to your information partners, as you can provide real-time services – reports become heatmaps; changes to the nature of the data can be detected as they happen, great for limit, fraud, regulatory or breach detection.
  •     Gone is the need for heavy, complicated data-loaders, staging areas and massive files which corrupt easily.  Data comes over the wire in a nice steady stream of bite-sized (sic!) chunks which flow river-like downstream to all relevant subsystems.
  •     The architecture is simpler to manage: you don’t suffer an angry business partner when your EOD batch fails 9 hours into the process and you have to take the whole day to recover.  If there is an outage, systems only need to recover from the point in time when the outage occurred, giving the opportunity to fall back on manual systems while recovery is underway.  The missing dataset is by definition smaller.
  •     File corruption becomes a thing of the past.
  •     All sub systems involved in the event flow can be made aware of each other’s health through judicious use of heart-beating or request-response paradigms.
  •     Monitoring UIs can be put in place to show what is happening where and when. This also serves as an alerting platform.
  •     Optionally a smart routing system can be deployed to filter, wire-tap or simply forward messages to whoever is interested.

It is remarkably simple to establish a common interface language between systems, and there are many existing standards already in place – XML formats such as FpML, fixed-length formats such as SWIFT/BACS, and protocols such as FIX.  Doubtless one of these will suit your budget and complexity.

The main questions that remain, then, are:

  1. do we need durable messaging?  If a subsystem goes offline, should it be able to publish/consume waiting messages when it returns? If so, use a point-to-point messaging technology. If not, use the broadcast radio-style publish/subscribe methodology, where another event just like the last one will be along soon enough, so don’t worry. (See the sketch after this list for both styles.)
  2. how fast do we need to push events about the place?  Most systems accept latencies of around one second.  Do bear in mind that when you start asking for more speed, the cost of messaging goes up dramatically.
  3.  should we be able to handle receiving duplicated messages?  Generally the answer is a resounding Yes!
  4. should we be able to get source subsystems to resend the same message on demand? This relates to point 3 above and is also usually Yes! This feature is invaluable should the unfortunate happen: you lose connectivity and need to recover a missing message or two.
  5. how much will this cost?  My experience is that the maintenance costs of batch loaders are similar to the development costs of an EDA, but once running the EDA is far more flexible and has a much lower cost of ownership.
  6. would we benefit from a simple EDA routing engine, where we can pass the same message to many places at once, or pass on only certain messages to certain subsystems?  In most cases the answer is again Yes!  Doing so brings an intelligence into your system never seen before.
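To illustrate question 1, here is a minimal JMS sketch of the two delivery styles, assuming an ActiveMQ broker on localhost (any JMS provider would do; the destination names and payload are invented):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class DeliveryStyles {
    public static void main(String[] args) throws Exception {
        // Assumes a broker at the default ActiveMQ port; swap in your provider's factory.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection con = factory.createConnection();
        con.start();
        Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Point-to-point: the queue holds the message until the consumer -- even one
        // that is currently offline -- comes back to collect it (the durable option).
        Queue queue = session.createQueue("trades.booked");
        session.createProducer(queue)
               .send(session.createTextMessage("{\"tradeId\":42}"));

        // Publish/subscribe: the topic broadcasts, radio-style, to whoever is
        // listening right now; miss an event and you simply wait for the next one.
        Topic topic = session.createTopic("prices.GBPUSD");
        session.createConsumer(topic).setMessageListener(
                msg -> System.out.println("tick received: " + msg));

        // (Real code would keep the connection open for the listener, then close it.)
    }
}
```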

The trick is to start small: pick a key data flow and migrate that first, preferably with the inclusion of a routing broker.  Once you have achieved the first EDA flow, you will doubtless be eager for more.

Lastly a word on brokers.  I have used Complex Event Processors very successfully as routing brokers to support an EDA in small and large Enterprise contexts.  They are invaluable. Based on your budget I would recommend that you look at Spring Integration, Esper and Oracle Event Processing.  Other solutions are available 😉
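For flavour, here is a hand-rolled sketch of such a routing broker in plain JMS – not how Spring Integration or Esper would spell it, but the same filter/wire-tap/forward idea. The destination names and the notional message property are invented:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SimpleRouter {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection con = factory.createConnection();
        con.start();
        Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Everything arrives on one inbound topic...
        MessageConsumer inbound = session.createConsumer(session.createTopic("events.all"));
        // ...and gets filtered/forwarded to the subsystems that care.
        MessageProducer toMis   = session.createProducer(session.createQueue("mis.feed"));
        MessageProducer toFraud = session.createProducer(session.createQueue("fraud.checks"));

        inbound.setMessageListener(msg -> {
            try {
                // Wire-tap: every event goes to MIS for reporting.
                toMis.send(msg);
                // Filter: only big-ticket events interest the fraud engine.
                if (msg.getDoubleProperty("notional") > 1_000_000d) {
                    toFraud.send(msg);
                }
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        });
    }
}
```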

Happy eventing!

Benching the Mark

X marks the spot. In the case of IT this loosely translates into the moment when an exclamation of “WTF!?!” is emitted by a seasoned developer.  It’s that same moment when they stumble across the one problem that is robbing your system of decent performance.

Once the blueness has cleared from the air, it’s down to business to resolve the problem at hand.  Which IS a lot simpler than it sounds.

Ah, grasshopper, how did we get to this Eureka moment?  Sheer graft, I am afraid.  There is no one silver bullet that you can fire into a system that will reveal the ugly truth.  It takes patience.  It takes planning. And it takes a long, long time … no wait!  It doesn’t need to.

So the more cynical of you out there in internetland will be scoffing at this point.  Pah, you can spend YEARS and never find the real cause.  Better to throw the whole thing away and start again, I hear you say.

This view does have merit. Sometimes. If you have a tolerant and rich sponsor who is willing to wait a year or so for you to come up with exactly the same software that’s slightly faster than the current version.  IMHO this choice should only be seriously considered when the rewritten application is going to add appreciable functionality to boot. Otherwise the cost-benefit ratio doesn’t look too good.

Our experience has been that to achieve that Moment of Clarity (© uptonconsulting 2010, heh), you need to benchmark the entire system.

What?

Benchmark.  Not just in terms of one operational parameter, such as response time, but for ALL aspects of the application: throughput, response time, latency, volume, parallelism and so on.  There are many texts on what to measure, but few suggest how to measure them usefully.

OK, so now what?  Simple.  Don’t think like a geek.  Really.  Don’t get lost in the bits and bytes of individual call stacks. Don’t spend days grovelling around like a mole tracking message flows between log files. Take a systemic view of the application from a business perspective and measure that scientifically.

  • keep the system in the same configuration as Production
  • ensure that the test environment is as close to production as possible
  • have realistic input load injectors (see below)
  • make sure that you can measure the things you want to measure, easily.  Get the right tools. A profiler is a very handy thing to have in the toolbox.
  • keep the benchmarking exercise as simple as possible.  Resist changing data loads on the fly.
  • make sure you can reproduce your measurements on demand.  No reproducibility == No benchmark

If you cannot set up a consistent and predictable load on the app on demand, you will be wasting your time.  Have a good look at your load test harness.  Get the load right and reproducible, and all other factors will tend to fall in line.
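A reproducible injector need not be elaborate. This sketch pumps a deterministic synthetic load at a fixed rate using nothing but the JDK; the hook into the system under test is hypothetical:

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LoadInjector {
    public static void main(String[] args) {
        // A fixed seed means the "random" input stream is identical on every run --
        // no reproducibility == no benchmark.
        Random random = new Random(42);
        ScheduledExecutorService pump = Executors.newSingleThreadScheduledExecutor();

        // Steady state: one synthetic input every 100 ms (10 msg/s). Change the
        // period to provoke the idle and high-load states on demand.
        pump.scheduleAtFixedRate(() -> {
            long tradeId = random.nextInt(1_000_000);
            long sentAt = System.nanoTime();
            // injectIntoSystem(tradeId, sentAt); // hypothetical hook into the app under test
            System.out.printf("injected trade %d at %d%n", tradeId, sentAt);
        }, 0, 100, TimeUnit.MILLISECONDS);

        // Run for a fixed window so successive benchmarks measure the same thing.
        pump.schedule(pump::shutdownNow, 60, TimeUnit.SECONDS);
    }
}
```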

What do we measure? Load up the system with the same stress as is seen in production.  Better yet, replay prod data through your prod-like test environment.  This means that you have captured your inputs in such a way that you can replay them.

As an aside, most systems experience three operational states:

  1. idle – the world is asleep and therefore so is the app
  2. steady state – the application is ticking over with a fairly predictable input load
  3. high load – the application is being stressed by an external event that causes a flood of input.  Typically market opening times, approx 5–10% of the app’s uptime.

Make sure that you can provoke your system into each one of these three states on demand. Reproducibly.

One thing here.  If you have a system that spends more than 5–10% of its time under stress, ask yourself: what has changed?  New inputs?  Is someone doing a DoS on you upstream? Has someone in your team mucked up the inter-component message flow, causing a message storm?  The world may have grown faster than your original capacity plan allowed for. You might be heading back to the drawing board.

Once you can load up as you like, my preference is to look at the blindingly obvious.

  • CPU thrashing on all machines acting as hosts – that includes the DB
  • Memory footprints – again, everywhere

CPU thrashing indicates poor algorithmic execution, i.e. Muppet programming.  An execution pathway is doing something far more often than it should.  Deep memory footprints point to bad memory management such as crap caching, poor object destruction and badly managed collections.  This can manifest in many ways.  More on that later.

It’s not so much the fact that you witness these symptoms, but rather WHERE you see them.  If you think about it, DB operations are THE MOST EXPENSIVE operations an app can make. They involve IO, sockets, net, more sockets and yet more IO.  So make sure you have a good hard look at what your DB is doing.  If your app is making more calls to the DB than it should, as sure as a Cox’s Orange is an apple, you are burning loads of time for no reason.  Too many DB calls show up as high CPU on the DB and can also contribute to bulging memory allocations in the app.  Notice I didn’t say tune the DB calls? Only do that when you know you are making the RIGHT ones every time.

If you have just CPU problems in the app, profile the execution pathways.  Someone will have put in some dumb recursive call somewhere that is having a hard time breaking out. Or you could be making too many DB calls…

If you have just memory problems in the app, hunt down the cause using a profiler.  If you have them on the DB host, grab a DBA by the throat and don’t let go until they have given you stats on the calls creating the biggest dent.  Likewise, if you see memory issues in a 3rd party resource you rely upon – queue managers, application servers – get evil on the vendor’s after-sales support until you see what is causing the problem.

If you get to this point and you haven’t seen CPU twanging or memory bashing, then it’s fairly obvious what to do next.  Check your code for deliberate waits.  Profile for deadlocks, long waits and slow IO. Something is gunging up the system and it is most likely YOU.  Lastly, check if your environment sags under load.  Net IO is a favourite. MOM resources can also get fragile if they are abused by a high volume of heavy messages.

Other things to look at, but to avoid getting hung up over:

  1. execution pathways.  You can only look at one of these statically when you read/debug code.  Let the profiler take the strain and work out where the CPU/memory is/isn’t going
  2. MOM queue depths.  They signal a problem somewhere else in your system.  They are not the cause.
  3.  MOM queue residency times.  This is symptomatic of a sluggish message consumer, not the cause of a slow system
  4.  service times in each execution pathway.  A problematic pathway will show up in the profiler. Don’t waste your time digging timestamps out of logs.  All this will do is a) confirm the profiler’s view of the app and b) waste loads of time.  The business has a perception of “fast” or “slow”.  Your system is “slow”, otherwise you wouldn’t be doing this work, would you?  Don’t make a fool of yourself by confirming what is already noted by your sponsors.

Guaranteed you will find something by the time you work your way to here.  Remember, think about what you are seeing.  Resist the urge to tweak something if you are not certain beyond reasonable doubt that the thing to be tweaked is in fact the cause. Lots of systems get buggered exactly in this way.

Dolly the clone had it right. Replicants Rule!

Dolly was a sheep in the Scottish Highlands, minding her own business, wondering each day what that glowing warm yellow thing in the sky was and marvelling at how good that green stuff on the ground tasted.  The fact that she was cloned rather than conceived was neither here nor there for her or her sister clones. They ate, grew and, well, you know the rest.

The same is true in IT, particularly with DBs in high performance systems.  Not the marvelling-at-the-sun or the chewing-green-stuff, but the concept of cloning for performance or to put it in database terms, DB Replication.

Some DBA pundits would have you believe that it’s a non-starter, this replication nonsense.  The School of Hard Knocks has shown us otherwise.  It boils down to the fact that a relational database engine is a very powerful thing in the hands of the skilled, because it can be tuned to one of two business operational modes: namely Online Transaction Processing (OLTP) or Management Information System (MIS) mode.  One DB instance cannot be tuned to do both of these things brilliantly.  Passably maybe, but brilliantly? No, it simply can’t.

So what to do?  Replicate, people, replicate.  The key here is to have two instances of the same DB schema running at the same time.  Instance 1 handles OLTP operations from apps that need high throughput – mainly insert and update operations – so this instance has naff-all indices.  Instance 2 handles MIS requests from spreadsheets and nosy-parkers who want to cut, slice, dice and otherwise extract data via select statements – historical data, usually.

How does this work? Instance 2 follows instance 1’s transaction log, replaying all write operations (the C, U and D of CRUD) in its own DB.  It’s reliable, elegant and FAST.  The key lies in copying over the delta between the two DBs’ transaction logs, so keeping in sync is a cinch.
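The exact incantation is vendor-specific. As one concrete flavour, here is a sketch using PostgreSQL’s logical replication, which does exactly this transaction-log following; the publication/subscription names, table names and connection details are invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReplicationSetup {
    public static void main(String[] args) throws Exception {
        // On instance 1 (OLTP): publish the tables whose transaction-log entries
        // should flow downstream. Connection details are placeholders.
        try (Connection oltp = DriverManager.getConnection(
                "jdbc:postgresql://oltp-host:5432/trades", "admin", "secret");
             Statement st = oltp.createStatement()) {
            st.execute("CREATE PUBLICATION oltp_pub FOR TABLE trades, positions");
        }

        // On instance 2 (MIS): subscribe. From here on, every committed write on
        // instance 1 is replayed against the identical schema on instance 2.
        try (Connection mis = DriverManager.getConnection(
                "jdbc:postgresql://mis-host:5432/trades", "admin", "secret");
             Statement st = mis.createStatement()) {
            st.execute("CREATE SUBSCRIPTION mis_sub "
                     + "CONNECTION 'host=oltp-host dbname=trades user=repl password=secret' "
                     + "PUBLICATION oltp_pub");
        }
    }
}
```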

Oh, and you can load up instance 2  – the MIS instance, remember – with as many indexes as you like.  Just as long as you remember to update your table and index stats daily (see my earlier blog on this), you will have a system that absolutely flies.  Instance 2 will usually keep its indices up to snuff when a transaction log change is detected.  Do check with your DBA/DB vendor.

One client was able to achieve in excess of 20 full object-graph inserts per second into the underlying relational DB tables after moving to this model. In Dev. Uphill. With a headwind.

You know it makes sense, so Just Do It (thanks, Nike).

Ensuring consistent DB Select response times

Now we all know that a DB with a large number of records (over 1 million rows should do it) can become sluggish over time if the content of the key tables changes often, especially with insertions.  This is despite your best efforts to tune the SQL you have carefully crafted using the Explain Plan/Execution Plan tools, eliminating unnecessary table scans.

Well, children, you are missing a simple trick – regularly recalculating stats.  This forces the DB to re-estimate the size of the tables and associated indices.  Without going into the nitty-gritty of why, the basic fact is that any modern relational database engine relies on an optimiser (via the execution plan) to work out how best to grab your data.  In fact it relies heavily on the DB table stats and index stats to achieve this.

By recalculating stats on a frequent basis – most enterprise systems do it overnight – the DB response times remain consistent for the same queries irrespective of the data load.
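In Oracle, for instance, the overnight job can be as small as one call to the DBMS_STATS package; the schema and table names and connection details here are placeholders:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class NightlyStats {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- run this from your overnight scheduler.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "admin", "secret");
             CallableStatement cs = con.prepareCall(
                 // Re-estimate table sizes and cascade to the associated indexes,
                 // so the optimiser's execution plans stay honest as data grows.
                 "begin dbms_stats.gather_table_stats("
               + "ownname => 'TRADES_OWNER', tabname => 'TRADES', cascade => TRUE); end;")) {
            cs.execute();
        }
    }
}
```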

A note to the observant: with stale stats you will see DB performance roll off a lot more rapidly the more indexes you add.  This appears as voodoo to people who spent a large amount of time tuning queries and indexes with smaller data loads, and who then find that their query response times suddenly fall off a cliff after a few thousand more rows are added to the DB.

The less often you do this necessary bit of housework, the slower the recalculation process becomes.  Remember our old friend the Law of Diminishing Returns.

So look up in your DBA handbook how to housekeep your key table statistics. Regularly.