Saturday, April 13, 2013

Time Sensitive Priority Escalation Algorithm (Heuristic)


Imagine a scenario where records maintained in a database need to be transitioned from one state to the next higher state based on how long they have been in the current state. This is common in trade settlement, customer service or ticketing systems, where incidents are governed by strict SLAs and the priority of a case keeps increasing the longer it spends on a wait queue lying unattended.

Suppose you were to write a program that fetches these records from the database and checks, for each one, how much time is left before it must be escalated to the next higher level. At what interval would you fetch potentially hundreds or thousands of records and check whether each one should be escalated or kept in the same state? The program would have to keep polling the database at regular intervals. But what should that interval be? Depending on the number of records in the database, the poll could be needed as often as every second, which would be a huge overhead on the system. Moreover, polling is by its very nature inefficient. If we fix the polling interval at some arbitrary value, say once every 15 seconds, we run the risk of delaying a priority escalation by up to 15 seconds, and a system built on this design cannot be called a real-time system.
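
For contrast, here is a minimal sketch of the fixed-interval polling approach described above. The DAO method findAllOpenTickets() is an assumption made for illustration; the other names mirror the code shown later in this post.

// Naive fixed-interval polling: every open ticket is re-checked every 15 seconds,
// whether or not any escalation is actually due.
while (true) {
    List<Ticket> openTickets = ticketDAO.findAllOpenTickets();   // assumed DAO method
    long now = System.currentTimeMillis();
    for (Ticket ticket : openTickets) {
        if (ticket.getNextPriorityChangeDue() <= now) {
            escalatePriority(Collections.singletonList(ticket));
        }
    }
    try {
        Thread.sleep(15000L);   // hard-coded interval: an escalation can be up to 15 seconds late
    } catch (InterruptedException e) {
        break;
    }
}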

Is there a better way around this that avoids polling? Well, of course there is; that's why this blog post.

What I am proposing here is an algorithm that heuristically finds out when the next record in the database is due for a state change. I will explain this with an example. Let's take a simple business case, for example a ticketing or incident management system, where once a ticket is raised by an analyst its priority keeps increasing the longer it stays in the queue unattended, as below:

Ticket origination time + 8 hours - Low priority (4)
Ticket origination time + 12 hours - Medium priority (3)
Ticket origination time + 16 hours - High priority (2)
Ticket origination time + 20 hours - Urgent priority (1)
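
As a minimal sketch, the escalation schedule above could be modelled in Java roughly as below; the enum, its constants and the dueAt() method are illustrative assumptions rather than part of any existing system.

// Illustrative model of the escalation schedule: each priority level is reached
// a fixed number of hours after the ticket origination time.
public enum TicketPriority {

    LOW(4, 8), MEDIUM(3, 12), HIGH(2, 16), URGENT(1, 20);

    private final int level;
    private final int hoursAfterOrigination;

    TicketPriority(int level, int hoursAfterOrigination) {
        this.level = level;
        this.hoursAfterOrigination = hoursAfterOrigination;
    }

    public int getLevel() {
        return level;
    }

    // Absolute time (epoch milliseconds) at which a ticket created at
    // 'originationTime' reaches this priority.
    public long dueAt(long originationTime) {
        return originationTime + hoursAfterOrigination * 60L * 60L * 1000L;
    }
}

A ticket's next escalation time could then be computed from its current priority with dueAt(), which is the value the algorithm below works from.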

The program implementing this would use the following algorithm.

1. Start.
2. From all the ticket records in the database, find the ticket(s) with the nearest interval of time ('lt') from now at which a priority escalation / state / level change is due.
3. Sleep / put the program on hold for 'lt' time units.
4. When the program wakes up after the sleep interval 'lt', change the priority of the record(s) for which the change is now due.
5. Go to step 2.

The equivalent Java code looks as below:


while (true) {
    // Fetch the ticket(s) whose priority escalation is due the soonest
    // (assumes at least one ticket is pending escalation).
    List<Ticket> ticketList = ticketDAO.findNextPriorityChangeDue();
    long nextEarliest = ticketList.get(0).getNextPriorityChangeDue();

    long now = System.currentTimeMillis();

    if (nextEarliest <= now) {
        // The escalation time has already passed, so escalate right away.
        escalatePriority(ticketList);
    } else {
        // Otherwise sleep exactly until the earliest escalation becomes due.
        try {
            Thread.sleep(nextEarliest - now);
        } catch (InterruptedException e) {
            escalatePriority(ticketList);
        }
    }
}
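
The loop above relies on a DAO method findNextPriorityChangeDue() that returns the ticket(s) whose escalation is due the soonest. Its implementation is not shown in this post, but a minimal sketch using plain JDBC could look as below; the table and column names, the dataSource field and the Ticket setters are all assumptions made for illustration.

// Hypothetical sketch of the DAO call used above: fetch the ticket(s) whose
// next priority change is due the earliest. Table and column names are assumed.
public List<Ticket> findNextPriorityChangeDue() throws SQLException {
    String sql = "SELECT ID, PRIORITY, NEXT_PRIORITY_CHANGE_DUE FROM TICKET "
               + "WHERE PRIORITY > 1 AND NEXT_PRIORITY_CHANGE_DUE = "
               + "(SELECT MIN(NEXT_PRIORITY_CHANGE_DUE) FROM TICKET WHERE PRIORITY > 1)";
    List<Ticket> tickets = new ArrayList<Ticket>();
    try (Connection con = dataSource.getConnection();
         PreparedStatement ps = con.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            Ticket ticket = new Ticket();
            ticket.setId(rs.getLong("ID"));
            ticket.setPriority(rs.getInt("PRIORITY"));
            ticket.setNextPriorityChangeDue(rs.getTimestamp("NEXT_PRIORITY_CHANGE_DUE").getTime());
            tickets.add(ticket);
        }
    }
    return tickets;
}

Because the query returns only the earliest-due ticket(s), the main loop never has to scan or re-check the rest of the table on each trip.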

With this algorithm there is no fixed polling interval, and no blind polling at all: whatever database trips happen, happen exactly when the next state change is due. There is no need to supply a hard-coded or even a configurable polling interval. The solution is simple, and the program 'learns' when its next database trip should be on its own, without any user input, which is what makes it heuristic in nature.

What this demonstrates is that smart solutions need not be complex; they are often very simple to implement and do not require anything as advanced as AI.

Wednesday, January 30, 2013

Enterprise Applications - Architecture Assessment


This is one more type of project that I have handled as an Architect. The need for it arises when a client has to decide whether to invest more in an existing system and upgrade it to meet additional strategic functional requirements, or to develop a new application because the existing system is not fit to accommodate new requirements that could be critical to the business. Additional investment in a system means not only application code but also more hardware / infrastructure resources such as memory, CPU or storage, and additional support staff. This can be a critical decision, and the client is often not sure whether its current system is fit enough. The opinions of the existing application team are usually sought, but these can be biased, since teams & org structures are also driven by personal ambitions, goals, a sense of security and internal politics. Line-of-business management then decides to seek a professional opinion and an independent report on the health of the system, and the journey into application assessment starts.
Architecture assessments are also conducted when an application / system is facing systemic issues such as frequent outages, poor response times and so on.

The recommendations would be made by conducting the analysis in a structured fashion as below:
·         Analyze the As-Is State of the System
·         State Limitations of the As Is State of the System
·         Provide Recommendations to Remediate the Limitations
·         Provide Additional Recommendations for overall improvement
·         Prepare To Be State based on recommendations
·         Provide Roadmap to attain the To Be State
Each of the above not only becomes a task in the assessment project plan but also an item in the table of contents of the assessment report. The minor / sub-items are also useful from this perspective, especially those under the roadmap: when the implementation / remediation plan is prepared from this report, the sub-items under the roadmap can become tasks in the implementation project plan.
OK enough of planning & management stuff; let’s now dig straight into the details of the assessment process.

The information that I look for falls broadly into two categories:
1)      Static Analysis - things that do not require the application to be running
2)      Dynamic Analysis – the application needs to run to get this kind of data
Under static analysis come things such as use cases, the system architecture document, design documents and source code. None of these require the application to be running.

Dynamic Analysis

Under dynamic analysis fall things such as performance (response times), availability and stability. I begin by analyzing the runtime behavior of the system and collect the data listed below in order to judge whether the system is stable enough to be carried forward into the future.

Performance Test

It is important to conduct a performance test (using a tool like LoadRunner) of the application against the latest available code base, rather than relying on performance test results that may have been produced some years ago when the system was first deployed into production. A fresh performance test is therefore conducted and the results are recorded.

Once the performance test results are available, I go through the report of bottlenecks and generally ask for profiling of those use cases that reported poor response times. From the profiling report, the causes behind the bottlenecks and their location within the code are identified. This is recorded in the report for further analysis and for drawing conclusions.

At the end of this stage the following deliverable is provided to the client.
Deliverable: Performance hotspots & bottlenecks report, along with the names of the application areas where these were found.

Availability Report

I also refer to the production ticket logs to identify how often the application faces outages. This is a good data point for understanding how stable the application is. Obviously, frequent outages indicate that it is not a good idea to continue building new functionality on the system.
Deliverable: Report describing the failure frequency, failure categories and the areas of the application that caused the downtime.

Stability Report

Not only is it important to check on outages, it is also important to understand how often functional issues are reported against the system. The best place to look for this type of information is the ticketing system: note how often functional failures were reported, in terms of ticket frequency, and also the area / module of the application against which tickets are most commonly raised. All these findings are recorded for further analysis.
Deliverable: Report describing the ticket frequency and categorization, along with the list of application areas that the categories point to.

Static Analysis

Under static analysis, I take a top-down approach and first try to understand the architecture vision, technology roadmap and overall EA before starting with the application architecture. With this it can be understood whether the application architecture is in alignment with the EA vision & technology roadmap. It is important to understand the technologies that have been standardized in the enterprise, and the technologies where investment will be made in future, so as to identify whether the application uses any exotic or obsolete technologies for which there could be challenges ahead in terms of availability of skills & support. Here are the things that I study under the architecture analysis.

Architecture Analysis

·         Request an Overview by SME
·         Refer existing (arch & design) documentation
·         Architecture analysis
·         Identify Architecture Pattern (e.g. SOA, Distributed, Layered etc.) that the current application follows
·         Requirements / Use cases analysis
·         System Context analysis (external & internal systems – protocols, technologies, data format, mode – synch / asynch)
·         Technology & Product analysis
·         Analysis of subsystems / layers (e.g. in case layered architecture - analysis of Presentation, Service, Data Access & Integration layers)

Deliverables at the end of Architecture Analysis task:
1) Limitations & improvement areas, if any, & strengths of the existing architecture
2) Architectural Views - System Context, Logical, Implementation and Deployment View block diagrams

After the architecture, I start moving towards the source code by way of the design.

High Level Design Analysis

As part of the analysis of the design, I go through the items below and check if each has been designed properly.
·         Screen / Page flow analysis
·         External system calls protocols & data exchange analysis
·         Analysis of the position / placement of Components within application layers / subsystems
·         Component collaborations & dependency analysis
·         Design pattern usage analysis
·         Product & Technology  usage (best practice) analysis
For example, an application may use products and frameworks such as those in the list below. Under usage analysis I check whether all the best practices prescribed for each product are being followed correctly; a sketch of a typical finding follows the list.
Spring MVC
EHCache
Spring JDBC
Drools
Oracle Database
AJAX & JavaScript libraries
JMS & WebSphere MQ
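
As promised above, here is a hypothetical sketch of the kind of finding a product usage analysis can produce, taking Spring JDBC from the list as an example: hand-managed JDBC connections and statements in legacy DAOs can be replaced with JdbcTemplate, which handles resource cleanup and exception translation. The class, query and column names below are illustrative assumptions, not taken from any particular system.

import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

// Illustrative remediation for a Spring JDBC usage finding: the DAO delegates
// connection, statement and result set handling to JdbcTemplate instead of
// managing (and forgetting to close) them by hand.
public class TicketStatsDao {

    private final JdbcTemplate jdbcTemplate;

    public TicketStatsDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Query and parameter are illustrative; note that no explicit Connection,
    // Statement or ResultSet handling is needed.
    public int countOpenTickets(String module) {
        return jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM TICKET WHERE MODULE = ? AND STATUS = 'OPEN'",
                new Object[] { module },
                Integer.class);
    }
}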

Deliverables at the end of Design Analysis phase:
1) Limitations & improvement areas, if any, & strengths of the high-level design
2) UML Package & Component Diagrams
3) List of technologies & products used along with versions

Detailed Design Analysis

Under this phase I delve deeper into the design looking at the granular components & details as per the list below:
·         HTTP Request / Response & Payload analysis (e.g. size of the page)
·         HTTP Session Management
·         External System calls & Data Exchange Format analysis
·         Distributed calls & data handling
·         Transaction Management
·         Concurrency  & data integrity
·         Parsing Operations
·         SQL Queries (joins / large result sets)
·         Connection Pooling
·         Object level Coupling & dependency analysis
·         Caching
·         Exception Handling
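
To make one of these items concrete, here is a hypothetical example of the kind of issue flagged under exception handling; the method and DAO names are assumptions, not taken from any particular system.

// Hypothetical finding under "Exception Handling": the catch block swallows the
// exception, so the root cause never reaches the logs and the caller cannot tell
// a missing ticket apart from a database failure.
public Ticket loadTicket(long id) {
    try {
        return ticketDAO.findById(id);   // assumed DAO method
    } catch (Exception e) {
        return null;                     // finding: exception swallowed, no logging, no rethrow
    }
}

A typical recommendation here would be to log the exception with context, or to wrap and rethrow it as an application-specific exception, rather than returning null.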

Deliverables at the end of detailed design analysis phase:
1) Limitations & improvement areas & strengths found at the detailed level of design
2) UML Component & Class Diagrams

Source Code Quality Analysis

For source code quality, there was a time when there was no Sonar, and no PMD, Checkstyle or FindBugs. These tools are available today and we are extremely lucky to have them. With them, source code quality assessment is a matter of importing the code into an IDE like Eclipse and running reports with the plugins installed, or of making a build with Sonar. Whether you use an IDE with plugins, use Sonar, or run tools like PMD, Checkstyle, FindBugs, JDepend and JavaNCSS from the command line, static source code quality metrics reports should be included in the source code quality assessment. A more formal approach for quantifying these metrics is also available with SQUALE. Of all these metrics, the most important is the unit test coverage report, which tells how reliably the code meets the behavior expected from it.

Deliverable: Reports like Sonar Dashboard.

Remediation Recommendations

After having studied the system in detail in all possible aspects and having pointed out its limitations, it is time to remediate as many of those limitations as possible at each level: architecture, design, source code and so on. Recommendations are therefore made at each of the levels described above to remediate specific limitations of the system.

Deliverable: Recommendation Report

Proposed State Architecture & Design

In case any serious flaw is discovered at either the architecture or the design level, a new architecture & design is proposed for the system. When preparing the new architecture, the recommendations made at the architecture / design levels, as well as general improvements, are taken into consideration. As such, the proposed state is not just (Current State - Its Limitations) but (New Architecture + Recommendations + Improvements - Current State Limitations).

Deliverable: High level architecture & design for the Proposed state along with the list of improvements that would be observed as a consequence of moving to the “to be state”.

Roadmap

A roadmap is also prepared, detailing how to reach the proposed state by laying out milestones in the right order so that they can be scheduled as small or medium-sized releases.