Thursday, July 10, 2014

Designing System Architecture

I have been a practicing software solutions architect for more than 8 years, and my total IT industry experience is close to 15 years. During these years I have designed the architecture of many systems, from small to large. In my work I come across people from varying backgrounds, many of whom, even though they are industry veterans, do not quite understand the term Architecture. For many of them Architecture means the block diagrams you prepare after requirements and before design; others think Architecture is a simplified view of the design for people from a non-technical background; and yet others think it is a warm-up exercise before design.

All these myths are understandable and I don't blame the myth-holders. The main reason behind them is that people do not understand what Architecture is, how it is designed and what exactly Architects do. So here is an attempt to clear the myths around Architecture by showing how it is prepared, drawing on my professional experience, point of view, beliefs and understanding. The focus of this post is on the design of architecture rather than on defining architecture; it is more about how than what. It is important to note that the scope of this post is Architecture only; it does not delve into design, either high level or low level. We design the architecture here and then hand it over to the system designer as a reference for when she prepares the design.

Before we start designing architecture, let's first understand what we can expect from an Architect. What is his/her role? An Architect's main responsibility is to act as a visionary for the solution of the system under conception while making sound judgments on the availability, use and future of the technology resources. An Architect provides an optimal solution to meet the system requirements, not necessarily the best solution. An Architect could provide the best solution, but that may increase the time and cost required to develop the system to such an extent that it becomes detrimental to Time To Market and Profitability. So an Architect will always propose the most optimal solution rather than the best one, and we should appreciate that. An Architect uses his vision, performs Trade-off analysis and Cost Benefit analysis, and makes informed judgments to reach an optimal solution for the system.
An Architect makes decisions regarding investment in technology based on his experience, vision and judgment. These decisions can go wrong, as nobody knows the future and so nobody can make perfectly accurate judgments. For example, he may decide to invest in apps for Blackberry mobile or in a UI built with Adobe Flex, which could be the best choice at that time, but later these technologies or products fail to gain market share. Not only do we lose the investment made in developing solutions using these technologies, but we also find the solutions hard to maintain, as it becomes very difficult to find programmers with these skills in the market. However, the Architect is expected to provide the best possible judgment given his/her experience and skills in the technology domain, which is why we hired him and not any other professional for the purpose.

With these expectations set, we now start designing the architecture. Broadly it's a two-step process: analysis and design of architecture.

Architectural Analysis




Business Context and Industry Overview

Understand the Business Model and Business Context, and get an overview of the industry and its standards. For example, if you are developing a system for handling Debit Card Payments, then it is important to know what cards and electronic payments are, how they make money, and what POS, PCI, ACH etc. are. If you are developing the solution architecture for a Loan Origination System, then you need to know what a Credit Bureau is, what Fannie Mae is, etc. This will build up qualitative information about the systems in your architecture. Without this context the systems in your architecture are just a bunch of computer boxes and racks that exchange bits of data. When you understand the need for and practical implications of the systems in the ecosystem, you realize their importance, you can appreciate what will happen when a particular system faces downtime, and you begin to think about ways of mitigating such risks.
Understand the Business Model from a Business Analyst and / or by simply Googling or hitting Wikipedia. Once you appreciate the business and the actors involved in it, you see that you are not just programming a bunch of computers; you see the implications of what you are doing and the value it brings. At this stage you prepare the Big Picture - call it the Business Context view - that shows all the systems and actors involved in carrying out the business.


System Requirements Analysis


Once the big picture of the industry is clear, try to get a grasp of what your client wants to achieve with the system under conception. Where does it fit in the Big Picture? You can highlight or mark the position of the system under conception in the Business Context view. You can understand the requirements from the business analysts and the documents they prepare. As the functional requirement specification is generally still in the works at this stage, a good source is the Business Requirement Document (BRD), and even better is a one-on-one, face-to-face talk with the Business Analysts. Once you understand the requirements, capture them in a Logical or Functional View using either block diagrams or UML Use Case diagrams. I prefer Use Case diagrams as they are a standardized notation. Please note that use cases cannot capture the flow of the system, so if you want to capture the flow, use System Sequence Diagrams; you may also use UML Activity Diagrams.


Prepare Domain Object Model

As part of documenting the Logical or Functional view, you should also prepare a domain model using only the high level business entities / objects. Preparing the domain model is a two-step process - first you identify the domain objects and next you connect them to other related objects. For example, for a retail banking system you would first identify the domain objects such as Customer, Account, Branch, Transaction, Statement, Loan, CD etc. and then connect these objects based on how they are related to one another. You can also show multiplicity on these connectors. For example, a Branch has many Customers, a Customer has many Accounts, an Account has many Transactions etc.
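To make the two steps concrete, here is a minimal sketch of the retail banking example in Python. The entity names come from the paragraph above; the attribute names (code, name, number, amount) and the sample values are my own assumptions for illustration, and a real domain model at this stage would normally be a diagram rather than code.

```python
from dataclasses import dataclass, field
from typing import List


# Step 1: identify the domain objects (attributes are illustrative assumptions).
@dataclass
class Transaction:
    amount: float


@dataclass
class Account:
    number: str
    # Step 2: connect related objects; one Account has many Transactions.
    transactions: List[Transaction] = field(default_factory=list)


@dataclass
class Customer:
    name: str
    # One Customer has many Accounts.
    accounts: List[Account] = field(default_factory=list)


@dataclass
class Branch:
    code: str
    # One Branch has many Customers.
    customers: List[Customer] = field(default_factory=list)


# Hypothetical sample data showing the one-to-many connectors in use.
branch = Branch(code="BR-001")
customer = Customer(name="A. Customer")
account = Account(number="ACC-1")
account.transactions.append(Transaction(amount=100.0))
customer.accounts.append(account)
branch.customers.append(customer)
```

The list-typed fields play the role of the "many" end of each connector; Statement, Loan and CD are left out only to keep the sketch short.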


Understand Business Volumes & SLA 


It is very important to take the Quality of Service expectations into consideration when you study the Business Requirements. A formal SLA spec may not be available at this time, but if one is available for a similar existing system then it is the preferred starting point. At this stage, understand the expectations around business volumes, as this is the factor that impacts the SLA the most. Business volumes are the total number of transactions expected to be processed by the system in a given time (as in business hours) and also per unit time (as in 200 requests every minute). They also state how long it should take to execute one transaction through the system. Define the SLAs with the help of stakeholders and business analysts.
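A quick back-of-the-envelope calculation helps turn business volumes into numbers you can design against. The sketch below converts a volume into a per-second rate and then applies Little's law (average requests in flight = arrival rate x time spent in the system). Only the 200 requests per minute figure comes from the text; the 1.5 second response time and the 96,000 transactions per 8-hour day are hypothetical values chosen for illustration.

```python
def rate_per_second(requests: float, window_seconds: float) -> float:
    """Convert a volume over a time window into an average rate per second."""
    return requests / window_seconds


def concurrent_requests(rate_per_sec: float, response_time_sec: float) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return rate_per_sec * response_time_sec


# 200 requests every minute, the figure used in the text.
rate = rate_per_second(200, 60)

# Hypothetical SLA: each transaction completes within 1.5 seconds.
in_flight = concurrent_requests(rate, 1.5)

# Hypothetical daily volume: 96,000 transactions over an 8-hour business day.
daily_average = rate_per_second(96_000, 8 * 3600)

print(f"{rate:.2f} req/s, {in_flight:.1f} in flight, {daily_average:.2f} req/s daily average")
```

These are averages; peak-hour volumes are usually higher than the daily average, so ask the stakeholders for peak figures as well.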


Document Non-Functional Requirements

Once the various SLAs are known, they should be documented as part of the requirements document. Using the SLAs as a reference, non-functional requirements (NFRs) such as Performance (Response Time), Security, Availability and Reliability should be derived, documented, presented and signed off by key stakeholders. This is a very important, must-have step in defining System Architecture and there are no exceptions to it. It should be completed exactly as stated above, i.e. derived, documented, presented and signed off by key stakeholders, before proceeding further. Signed-off SLAs and Non-Functional requirements go a long way: they play a key role in making architecture decisions and in avoiding conflicts of expectations later. It must be noted that not all non-functional requirements can be derived from the SLAs. There are many NFRs such as Extensibility, Flexibility, Usability, Maintainability, Supportability, Resilience, Fault Tolerance etc. which I refer to as non-tangible NFRs.
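A measurable NFR such as Availability translates directly into a downtime budget, which is often easier for stakeholders to sign off than a percentage. Here is a minimal sketch using the 99.5% figure this post uses as an example; the period lengths (a non-leap year and a 30-day month of round-the-clock service) are my assumptions.

```python
def downtime_budget_hours(availability_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at the given availability target."""
    return (1.0 - availability_percent / 100.0) * period_hours


HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year, 24x7 service
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

yearly = downtime_budget_hours(99.5, HOURS_PER_YEAR)    # about 43.8 hours
monthly = downtime_budget_hours(99.5, HOURS_PER_MONTH)  # about 3.6 hours

print(f"99.5% allows about {yearly:.1f} h/year or {monthly:.1f} h/month of downtime")
```

If the system only has to be available during business hours, use those hours as the period instead, which shrinks the budget accordingly.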


Non-tangible NFRs

The non-tangible NFRs cannot be measured like other NFRs such as Availability at 99.5%. The stakeholders or business owners generally do not state any expectations for these, and they may not even understand the role that non-tangible NFRs play for a system. It is in this area that the knowledge and wisdom acquired by an architect over the years, and his vision, come into play. By taking these into consideration an architect adds value to the system and makes it future-proof or extensible, easy to operate, support and change, user friendly, faster and easier to restore after a failure, and able to fail gracefully. The stakeholders, business analysts or system users may not think about these, as their focus is on gathering, providing and communicating business requirements, but they have assumed all these systemic properties at the back of their minds and have implicit expectations. An architect's job is to make these expectations explicit and reach an agreement on the quality of these services.


Innovation & Emerging Technologies

An architect must take into consideration all contemporary and emerging technologies, and estimate the probability that the new system will need to use one of these by the time it is ready. For example, although the mobile channel is not one that the stakeholders have requested, an architect must foresee that it may become a requirement in the near future given the way the world is changing. Accordingly, prepare a list of evolving technologies like Mobile, Cloud, Big Data, Social Media etc., see what is relevant for the system to be, and make provisions accordingly. Check if there is scope for creating something new as demanded by the requirements of the system to be, and keep innovation on the radar throughout the architecture analysis process.

These factors distinguish a poorly "architected" (thought-out) system from a system prepared with high quality standards and vision. Needless to say, the latter outlives poorly thought-out systems. Once the NFRs are analyzed and ready, the next step is to check the feasibility of implementing them for the system under conception. The best way to start this analysis is by understanding the environment in which the new system will exist and operate.

System Context Analysis

Once you get the Business Context, Requirements & Volumes (NFR), you need to understand how the business is mapped / implemented at your client by studying the existing systems in its ecosystem and the place of the system to be in it. As part of this, understand the surrounding systems with which the system to be will interact and co-exist. Towards this, understand the data center, technology platforms (mainframe, Unix, Java, .Net, Web Services), communication protocols, data exchange formats, hours of operation, availability and all the NFRs supported by the surrounding systems. This is a very important step in the analysis, and with it we will be able to identify the Limitations and Constraints under which the system to be will operate. We should also note down the strengths and any advantages that the new system may gain because of the ecosystem. Yes, this is a SWOT (Strengths, Weaknesses, Opportunities and Threats) analysis of what the ecosystem imposes. While we have to understand the strengths and limitations of the surrounding ecosystem, it does not mean we have to simply accept these as given. Where feasible, an Architect must make recommendations on improving the surrounding systems if that greatly benefits the system to be and the enterprise as a whole. To conclude this step in the architecture analysis, prepare a System Context View and document the strengths and limitations imposed by the ecosystem.

Identify Reusable Services, Systems & Infrastructure

As part of system context analysis we must identify any reusable services or components available in the enterprise. If architecture governance is in place, we need to consult it and check whether any existing reusable services or components are available. This will greatly save on development effort, cut costs, and promote re-usability and standards. While making use of any re-usable service it is important to take into consideration any SLA provided by the existing service. If the service to be re-used does not come with an SLA, then we either have to insist on working one out or take the hard decision of not using the service. Otherwise it will end up compromising the quality of service of the system-to-be.
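One way to see why the SLA of a reused service matters: when every request has to pass through both your system and the reused service, the end-to-end availability is roughly the product of the individual availabilities. This is a rough sketch that assumes failures are independent and that a request fails if either part is down; the 99.9% and 99.5% figures are hypothetical.

```python
from math import prod  # available in Python 3.8+


def serial_availability(availabilities):
    """Availability of a chain in which every element must be up (independent failures assumed)."""
    return prod(availabilities)


own_system = 0.999      # hypothetical target for the system-to-be
reused_service = 0.995  # hypothetical SLA of the reused service

combined = serial_availability([own_system, reused_service])
print(f"End-to-end availability: {combined:.4%}")
```

The combined figure is lower than either input, which is why a reused service without an SLA leaves the quality of service of the new system undefined.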

EA Analysis

Most mature organizations have an Enterprise Architecture (EA) that outlines the vision, business & technology alignment, technology standards & preferences and governance model. This should be consulted before we start preparing the system architecture so as to ensure alignment of the system to be with the EA. As part of the EA analysis, check the Reference Architecture, Technology & Product Standards, Governance Model and Processes, and study them in the context of the analysis done so far, i.e. Business Context, System Requirements, NFRs and System Context.

As part of this step we will also need to decide the development process to be used for this project, whether waterfall, agile or one tailored for the organization, the LoB or this project in particular.

With the above analysis complete it is time to draw conclusions about the system under conception. The conclusions made should be captured by defining the mission statement, goals, objectives and principles.

Design of Architecture


Design Solution Architecture 
The first step in preparing the Solution Architecture is to refer to the standard Architecture Patterns used in the industry. After all, we do not want to re-invent the wheel here. We should refer to Architecture patterns such as SOA, Layered (distributed components), Big Data (Map-Reduce), Client Server, Event Driven Architecture, Pipes & Filters etc. and select the one (or more) most relevant to the needs of the system to be. While we should refer to the standard reference architectures used in the industry, if the EA provides a reference architecture (RA) matching the needs of your application then it should be given preference over the others, as the EA-supplied RA is tailored specifically to the needs of your enterprise. Once the RA is selected, prepare a block diagram for the selected architecture style / pattern using groups of system components specific to your system's needs. This is the Solution Architecture or Solution View, and it consists of layers or groups of related components. I generally use the Layered View for preparing the Solution Architecture as I find it the most suitable for the purpose at this stage. Also, the majority of the applications I have come across have been web based distributed applications, and for such systems I find this the best View for describing the Solution Architecture. The layered view, if used, will consist of layers (or groups or blocks of related components) such as Presentation layer, Service layer, Persistence layer and Integration layer, without identifying the components inside each layer. For example, we show the “Integration Layer” but not what lies inside it; we do not show “ESB” or “Adapters” inside it. For systems that do not have a UI, you can still use the layered view and show layers for Message Transformation, Routing, Persistence etc.

Identify Subsystems / System Components

Next we start identifying the subsystems or system components that will fill out the layers identified above. For example, a requirement for a “highly responsive & interactive, web based rich UI” will translate into “RIA” (Rich Internet Application). A need for a long running business process will manifest as a “Workflow Engine”. Frequently changing business logic that governs the business process will require a “Rules Engine”. Consuming external events to change the state of processes or objects, and in turn generating more events and communicating those across various external systems in a reliable, scalable & technology agnostic fashion, will require the use of a “Messaging Middleware” or ESB.

Note that in this stage we are identifying the system components in a technology & vendor agnostic manner. Our focus is to identify subsystems that will meet the system needs at a technology solution level without bothering about further details such as technology platform or vendor products for implementing it. 
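The mapping from needs to technology-agnostic component types can be written down as a simple lookup, which also makes gaps visible. The sketch below encodes only the examples given above; the short need descriptions used as keys and the function name are my own hypothetical shorthand, and in practice this judgment comes from experience rather than from a table.

```python
# Need -> component type, taken from the examples in this section.
# The wording of the keys is an illustrative assumption.
COMPONENT_FOR_NEED = {
    "rich interactive web UI": "RIA",
    "long-running business process": "Workflow Engine",
    "frequently changing business logic": "Rules Engine",
    "reliable event exchange with external systems": "Messaging Middleware / ESB",
}


def identify_components(needs):
    """Return (component types in first-seen order, needs with no known component)."""
    components, unmapped = [], []
    for need in needs:
        component = COMPONENT_FOR_NEED.get(need)
        if component is None:
            unmapped.append(need)          # needs the architect still has to analyze
        elif component not in components:
            components.append(component)   # avoid listing a component twice
    return components, unmapped


components, unmapped = identify_components([
    "long-running business process",
    "frequently changing business logic",
    "content authoring and publishing",   # not in the table above
])
print(components, unmapped)
```

Note that no product or vendor name appears anywhere in the table, which is the point of this stage.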

This step requires in-depth knowledge of the various technology solutions used in the industry for handling specific business problem scenarios. You cannot have a novice or a business analyst making these choices; they have to come from an industry veteran who knows where to use what. You have to know that there are ready-made solutions in the industry that fulfill specific business scenarios or needs. For example, a business scenario that needs to address authoring, managing & publishing content can be fulfilled by using Content Management Systems or Portals. Here you have to know that a technology solution like a Content Management System exists and is commonly used for such needs. It is very important to know that such systems have been conceptualized in the industry as common solutions and that in many cases commercial products are available for them. More examples of such system components are Transaction Monitors, Rule Engines, Workflow Engines, Reporting Software, Accounting Software, ERP, Middleware, ETL, Document Management Systems, Content Management Systems, Directory Servers, Map Reduce Systems, UI software etc. Senior professionals who have been in the industry for a considerable length of time know these well and understand how to apply them effectively. Novices may not know these and therefore make the mistake of re-inventing the wheel. That is why Architects are necessary when developing technical solutions for serious projects.

Once you know that such system components are available and are used routinely in the industry for fulfilling specific business scenarios, you can select one. Then you can go to the next level of decision, which is whether to buy a commercial product, build it in-house, or use (rent) an existing solution deployed elsewhere in the enterprise or in the cloud. The architecture view that you prepare for this step is the System Component view; alternatively you can enrich the Solution View by adding the system components to it.

This is the most important step in designing the system architecture and this is where the skill of the architect is best utilized. Since this is such an important step, an architect should give it due importance and spend considerable time on it, vetting every possibility and choosing the most optimal solution considering all the trade-offs.

System Integration

We then move on to select how these system components or subsystems will talk to one another. Here, there are many choices, as listed below:

Synchronous or Asynchronous
Events or RPC
TCP/IP Sockets
Transaction oriented or otherwise
Secure or non-secure
One way or duplex
Point to Point or Publish Subscribe
FTP
ETL

You will be overwhelmed with choices, and believe me it does not stop here. It continues with the selection of the data format to be used for this communication. The choices for data exchange formats are listed below:

Delimited – CSV, special char delimiter, tab or new line
Fixed length – fixed length columns in rows as in COBOL copybooks, Files
Markup – XML, SWIFT
Key Value pairs
Database Tables over ETL
Strings that can be treated as Regular Expressions

Once you have made the selection, you are in a position to decide the communication protocol such as HTTP(S), FTP, SOAP, REST, Messaging, RPC, CICS, Batch jobs, ETL etc. as well as the data exchange format suitable for that protocol. These decisions should be based on virtues and aspects such as loose coupling, location independence, interoperability, fault tolerance, high availability [what if one of the communicating parties goes down], security, real time communication, transactions, volumes, size of payload etc. and, most importantly, on integration patterns rather than technology hype or fashionable buzzwords. With these decisions made, you can update the System Context and System Component views by drawing connector lines and labeling them with the protocol names. I follow the “<data exchange format>/<protocol name>” format as it helps capture the relevant info about the system interaction precisely & concisely. Examples – SOAP/HTTP, XML/JMS, File/Socket, JSON/REST, SOAP/JMS etc.
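If you keep the list of connectors as data next to your diagrams, the labeling convention is easy to apply consistently. Here is a minimal sketch; the Connector class and the system names (Web UI, Order Service, Billing Service) are hypothetical, and only the label format and the XML/JMS and SOAP/HTTP examples come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """One connector line between two blocks in a System Context or Component view."""
    source: str
    target: str
    data_format: str
    protocol: str

    @property
    def label(self) -> str:
        # The "<data exchange format>/<protocol name>" convention.
        return f"{self.data_format}/{self.protocol}"


# Hypothetical systems, used only to illustrate the labeling.
connectors = [
    Connector("Order Service", "Billing Service", "XML", "JMS"),
    Connector("Web UI", "Order Service", "SOAP", "HTTP"),
]

for c in connectors:
    print(f"{c.source} -> {c.target} [{c.label}]")
```

A list like this is also a handy checklist when reviewing whether every interaction has an agreed format and protocol.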

Until now the various blocks in the diagrams were just occupying their respective positions without being aware of the adjoining boxes. With this step complete, the blocks start talking to one another and participate in the ecosystem through mutual interaction and communication.


Select Technology Platform & Product Stack

At this stage we fill out the empty boxes identified above with technologies and products. The most important decision in this stage is the selection of the technology platform on which the system to be will be developed, because it will make most of the other technology & product choices for us. It is important to consider the needs of the system to be, the technology standards laid down by the EA, the technologies used in the ecosystem and the trends in the technology industry when making this choice. We have to understand that once this decision is made it is practically irreversible: it will stay with the system to be through its life, affect many other technology choices and impose certain limits. Accordingly this decision should be made with careful consideration and thorough analysis. Here are some choices for technology platform selection – Mainframe, C/C++ on Unix/Linux, Java, .Net, Win32, iOS etc.

With the technology platform selected we move on to making the rest of the choices, such as the database – Oracle, DB2, MS-SQL, MySQL, MongoDB etc.; middleware – TIBCO EMS, Websphere MQ, Mule, Apache Camel; big data platform – Apache Hadoop; security & access manager; directory server; cache manager – JBoss Cache, EHCache, Oracle Coherence etc.; Workflow Engine – Activiti, Oracle BPM, Websphere Process Server etc. Most of these choices would be no-brainers as the EA governs them through its standards. That said, an architect should always weigh the pros and cons of each given choice with careful scrutiny and, where appropriate, challenge established norms or standards, which may have come in because of conventional wisdom, industry hype or even a vendor's sales promotion. This task may be challenging when the EA does not specify any standards or if there is no EA in the organization. Record the decisions made here in the “Implementation View”.


Deployment and Hosting 

Gone are the days when every company had its own data center with a huge inventory of software & hardware to manage. These are the days when IT resides in the cloud, and so careful consideration should be given to the deployment options. The options that we should consider are:

On premise / internal data center
Third party hosted (only hosting, not cloud services like IaaS, PaaS or SaaS)
Public cloud [fully virtualized, auto-provisioned, multi-tenant architecture]
Private cloud (on premise cloud)
Hybrid cloud (mix of public & private clouds or internal data center)

Many factors should be considered in this decision. For example, regulatory data privacy requirements for the system to be may rule out public cloud, whereas budgetary constraints on capital expense make public cloud an attractive option.
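The two factors mentioned above can be expressed as a small shortlisting rule over the deployment options listed earlier. This is only a sketch of the reasoning, not a decision tool: the function name and flags are hypothetical, the rules are simplified to the two examples in this post, and a real decision weighs many more factors.

```python
# Deployment options as listed in this section.
OPTIONS = [
    "on-premise data center",
    "third-party hosted",
    "public cloud",
    "private cloud",
    "hybrid cloud",
]


def shortlist(regulated_data: bool, capex_constrained: bool):
    """Apply the two example rules and return the remaining options, most attractive first."""
    candidates = list(OPTIONS)
    if regulated_data:
        # Simplification: treat data privacy regulation as excluding public cloud.
        candidates.remove("public cloud")
    if capex_constrained and "public cloud" in candidates:
        # Limited capital budget makes public cloud attractive, so move it to the front.
        candidates.remove("public cloud")
        candidates.insert(0, "public cloud")
    return candidates


print(shortlist(regulated_data=True, capex_constrained=True))
print(shortlist(regulated_data=False, capex_constrained=True))
```

When both flags are set the two rules pull in opposite directions, and that is exactly the kind of trade-off the architect has to resolve with the stakeholders.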

Once this decision is made we move on to selecting the infrastructure and preparing the deployment view, where we answer questions on how to implement NFRs like High Availability, Performance and Security by taking care of hardware aspects such as clusters (active-active, active-passive), load balancers, network bandwidth, disaster recovery etc. If it is an internal data center, we will need to make these decisions ourselves, working closely with the IT infrastructure group. Otherwise we will need to understand the infrastructure already available in the cloud by working closely with the cloud or hosting services provider and confirming that each of our non-functional requirements can be met by the provider. Where we see a risk, we need to bring it to the attention of the provider and agree on an SLA for all quality of service attributes.
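To get a feel for what a cluster buys you in terms of High Availability, the following sketch estimates the availability of redundant nodes. It rests on idealized assumptions that I am adding here: node failures are independent, any single node can carry the full load, and the load balancer and failover are themselves perfect. The 99% per-node figure is hypothetical.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of the redundant nodes is up (independent failures assumed)."""
    return 1.0 - (1.0 - node_availability) ** nodes


single = cluster_availability(0.99, 1)   # one node, no redundancy
pair = cluster_availability(0.99, 2)     # two redundant nodes

print(f"1 node: {single:.4%}, 2 nodes: {pair:.4%}")
```

Real clusters fall short of this figure because failures are often correlated (shared network, power or software defects), which is one reason disaster recovery is planned separately.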


Prepare Architecture Views

While we have been preparing views (models - block diagrams) at every stage above, this is the stage where we focus exclusively on the views. This is a culmination of all the analysis & design done so far. However, this is not full-fledged documentation; it is the completion and consolidation of the views prepared earlier, adding what remains. The idea is to present the Architecture to stakeholders, designers & developers even before it is formally documented. The list of views prepared so far includes:

- Business Context View
- Functional / Logical View
- System Sequence Diagram
- Domain Object Model
- System Context View

- Solution View
- Layered View
- Implementation View

With each of these views we add relevant notes as bullet points wherever we want to make explicit things that are not obvious from the block diagrams.

Then we need to prepare additional views justifying how the proposed architecture meets the functional & non-functional requirements. We prepare so many block diagrams / views because it is not possible to represent all aspects of the solution in a single block diagram; it would be cluttered and too complex to be comprehensible.

We prepare additional views of the solution focusing on one aspect at a time and addressing all the concerns. So we will need to prepare block diagrams for addressing how each requirement (functional & non-functional) is fulfilled. For example:

Security View: Explains how aspects of security such as Authentication, Authorization, Accountability, Non-Repudiation, Data Integrity, Confidentiality etc are addressed by the architecture.

Deployment View: Explains how aspects such as High Availability (clusters, load balancers), Scalability, Security (firewalls), Disaster Recovery etc are addressed by the architecture.

Operations View, Usability View, and any other views you want to invent to describe a specific aspect of the technical solution are welcome. You can have one block diagram per NFR to ensure that you have not missed anything in your architecture solution. The point is that we want to communicate everything that was thought through while designing the architecture, and to share it with all the stakeholders before freezing the architecture and presenting it for sign-off.

Advise on Improvements

Once the architecture that fulfills all the functional and non-functional requirements is ready, an Architect should advise on how certain improvements such as Mobile Channel Integration, Big Data Analytics, integration with Social Networks etc., as relevant, can be built into the system at lower cost by taking this opportunity now, while the system is under development. Doing this after the system is in production would be a costly undertaking. Not only should the Architect advise on improvements that leverage emerging technologies, but due consideration should also be given to innovation and improvements that can be added to the existing systems, applications, processes, governance, operations etc.

Documentation

The final step in the Architecture is to document it and send it for sign-off. There is a certain structure that the Architecture Document follows; I guess another post is required for it.

What now remains? - Complete Design

In this post we have covered only the design of Architecture, not the application design, so the question is - what now remains in design? Well, at this stage the Architect will hand over the various architecture views & high level class diagrams of the business domain objects identified during requirement analysis. The designer will then prepare the Logical Data Model & Physical Data Model from the domain object model. From the various architecture views the designer will work out the High & Low Level Design (HLD & LLD). In HLD, the designer will start with the component diagram, where she will identify the main components of the application based on standard design patterns such as MVC, Front Controller, Factory, DAO, Adapter etc. and any framework specific patterns or best practices, like those for frameworks such as Spring, Hibernate or Jersey. Then the designer will proceed with use cases as part of LLD, and for each use case she will prepare class & sequence diagrams taking into consideration the patterns laid out in the component diagram made in HLD. She then proceeds with the common services or cross cutting concerns like exception & error handling, transaction management, caching, pooling, logging and security. As you have correctly guessed by now, the architecture views addressing NFRs will be a useful guide for the designer in incorporating these into the design. Once the design is complete, the Architect will review it thoroughly before construction begins. You may refer to my post on design reviews on how to conduct design reviews objectively.