SDDC, SDN, NFV, SFV, ACI, Service Governor, Super Recursive Algorithms and All That Jazz:

“It’s very likely that on the basis of the philosophy that every error has to be caught, explained, and corrected, a system of the complexity of the living organism would not run for a millisecond.”
– von Neumann, Papers of John von Neumann on Computing and Computer Theory, Hixon Symposium, September 20, 1948, Pasadena, CA, The MIT Press, 1987.

Communication, Collaboration and Commerce at the Speed of Light:

With the advent of many-core servers, high-bandwidth network technologies connecting them, and a new class of high-performance storage devices that can be optimized for specific workload needs (IOPs-intensive, throughput-sensitive or capacity-hungry workloads), the Information Technology (IT) industry is transitioning from its server-centric, low-bandwidth, client-server origins to geographically distributed, highly scalable and resilient environments for composed service creation, delivery and assurance that keep pace with rapidly changing business priorities, latency constraints, workload fluctuations and the availability of required resources. Distributed service composition and delivery bring new challenges as both demand and resource availability scale and fluctuate. New approaches are emerging to improve the resiliency and efficiency of distributed system design, deployment, management and control.

The Jazz Metaphor:

The quest for this transition is best described by the Jazz metaphor, aptly summarized by Holbrook [1]: “Specifically, creativity in all areas seems to follow a sort of dialectic in which some structure (a thesis or configuration) gives way to a departure (an antithesis or deviation) that is followed, in turn, by a reconciliation (a synthesis or integration that becomes the basis for further development of the dialectic). In the case of jazz, the structure would include the melodic contour of a piece, its harmonic pattern, or its meter…. The departure would consist of melodic variations, harmonic substitutions, or rhythmic liberties…. The reconciliation depends on the way that the musical departures or violations of expectations are integrated into an emergent structure that resolves deviation into a new regularity, chaos into a new order, and surprise into a new pattern as the performance progresses.”

The Thesis:

The thesis in the IT evolution is the automation of business processes and service delivery using client-server architectures. It served well as long as the scale of services and the fluctuations of service-delivery infrastructure resources stayed within bounds that allowed operators to increase or decrease available resources and meet fluctuating demand. In addition, the resiliency of a service has always been adjusted by improving the resiliency (availability, performance and security) of the infrastructure through various appliances, processes and tools. This introduced a timescale for meeting the resiliency required by various applications, expressed as recovery time objectives and recovery point objectives. The resulting management “time constant” (defined as the time to recover a service to meet customer satisfaction) has been continuously decreasing with the use of newer technologies, tools and process automation.
However, with the introduction of the high-speed Internet, access to mobile technology and the globalization of e-commerce, the scale and fluctuations of service demand have changed radically, putting challenging demands on provisioning resources within shorter and shorter periods of time. Figure 1 summarizes the key drivers forcing the drastic reduction of the management time constant.

[Image: Business drivers for the anti-thesis]

Figure 1: Global communication, collaboration and commerce at the speed of light is forcing the drastic reduction in IT resource management time constant

 The Anti-Thesis:

The result is the anti-thesis (the word is not used pejoratively; in the Jazz metaphor it denotes innovation, creativity and a touch of anti-establishment rebellion): virtualize infrastructure management (compute, storage and network resources) and provide intelligent resource-management services on top of commodity infrastructure connected by fat pipes. The software-defined data center (SDDC) represents the dynamic provisioning of server clusters, connected by a network and attached to the required storage, all meeting the service levels required by the applications that are composed to create a service transaction. The idea is to monitor the resource utilization of these service components and adjust the resources as required to meet the Quality of Service (QoS) needs of the service transaction (in terms of CPU, memory, network bandwidth, latency, storage throughput, IOPs and capacity). Network function virtualization (NFV) denotes the dynamic provisioning and management of network services such as routing and switching, and the control of commodity hardware devoted solely to connecting various devices and assuring the desired network bandwidth and latency. Storage function virtualization (SFV) similarly denotes the dynamic provisioning and management of commodity storage hardware with the required IOPs, throughput and capacity. ACI denotes application-centric infrastructure, which is sensitive to the needs of a particular application and dynamically adjusts the resources to provide the right CPU, memory, bandwidth, latency, storage IOPs, throughput and capacity. The drive away from proprietary network and storage equipment toward commodity high-performance hardware made ubiquitous with open interface architectures is intended to foster competition and innovation in both hardware and software. The open software is supposed to match the needs of the application by tuning the resources dynamically, using the compute, network and storage management functions made available as open-source software.

Unfortunately, the anti-thesis brings its own issues in transforming the current infrastructure, which has evolved over a few decades, to the new paradigm.

  1. The new approach has to accommodate the current infrastructure and applications and allow seamless migration to the new paradigm without vendor lock-in on new infrastructure. A fork-lift strategy that involves time, money and service interruption will not work.
  2. The current infrastructure is designed to deliver low-latency, high-performance application quality of service with various levels of security. For mission-critical applications to migrate to the new paradigm, these requirements have to be met without compromise.
  3. The new paradigm should not require a new way of developing applications; it must support current development languages and processes without lock-in to a new methodology. An application is defined both by functional requirements that dictate the specific domain functions and logic, and by non-functional requirements that define operational constraints related to service availability, reliability, performance, security and cost, dictated by business priorities, workload fluctuations and resource latency constraints. A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. The plan for implementing functional requirements is detailed in the system design; the plan for implementing non-functional requirements is detailed in the system architecture. The architecture for non-functional requirements plays a key role in whether the open-systems approach will succeed or fail. An architecture that defines a plug-and-play approach requires a composition scheme, which leads to the next issue.
  4. There must be a way to compose applications developed by different vendors without having to look inside their implementations. In essence, there must be a composition architecture that allows applications to be developed independently yet composed into new applications without modifying the original components (a minimal sketch follows this list). Even with open-sourced applications, integrating them and creating new workflows and services is a labor-intensive and knowledge-sensitive task, and the efficiency gains are thwarted by the need for service engagements, training and maintenance of integrated workflows.
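
To make items 3 and 4 concrete, here is a minimal sketch (hypothetical names, not a specific product's API) in which the business logic is a plain function while its composition and non-functional requirements live in a separate declarative descriptor; a composer can then wire components together from the descriptors alone, without modifying or inspecting the code they contain.

```python
# Minimal sketch (hypothetical names, not a specific product's API): business
# logic is a plain callable; composition and non-functional requirements live
# in a separate descriptor that a composer can use without reading the code.

def check_credit(order: dict) -> dict:
    """Functional requirement: pure business logic, no infrastructure knowledge."""
    order["approved"] = order.get("amount", 0) <= order.get("credit_limit", 0)
    return order

CHECK_CREDIT = {
    "callable": check_credit,
    "provides": "credit-decision",                 # capability this component offers
    "requires": ["order-intake"],                  # capabilities it consumes
    "non_functional": {"availability": "99.95%",   # operational constraints only
                       "max_latency_ms": 50},
}

def compose(descriptors):
    """Order components by declared capabilities, never looking inside their code."""
    ordered, satisfied = [], set()
    pending = list(descriptors)
    while pending:
        ready = [d for d in pending if all(r in satisfied for r in d["requires"])]
        if not ready:
            raise ValueError("unresolvable composition: missing capabilities")
        for d in ready:
            ordered.append(d)
            satisfied.add(d["provides"])
            pending.remove(d)
    return ordered
```

A second component that declares it provides "order-intake" would then slot in ahead of check_credit automatically, which is the essence of composing independently developed components without touching their implementations.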

Current approaches suggested in the anti-thesis movement, embracing virtual machines (VMs), open-sourced applications and cloud computing, fail on all these counts by increasing complexity or requiring vendor, API and architecture dependency. The result is increased operating cost and a dependency on ad-hoc integration software and services.

The increase in complexity with scale and distribution is more an issue of architecture, and it is not addressed by throwing more ad-hoc software at the problem in the form of managers of managers, point solutions and tools. It has more to do with the limitations of the current computing architecture than with a lack of good ad-hoc software.

Server virtualization creates a virtual machine (VM) image that can be replicated easily on different physical servers with shared resources. The introduction of the hypervisor to virtualize hardware resources (CPU and memory) allows multiple virtual machine images to share the resources of a physical server. NFV and SFV provide management functions to control the underlying commodity hardware. OpenStack and other infrastructure provisioning mechanisms have evolved through the anti-thesis movement to combine VM provisioning with NFV and SFV provisioning and create clusters of VMs on which applications can deliver service transactions. Figure 2 shows an OpenStack implementation of such a service provisioning process. A cluster of VMs required for a service delivery can be provisioned with the service level agreements needed to assure the right CPU, memory, bandwidth, latency, storage IOPs, throughput and capacity. It is also important to note that OpenStack can provision not only a VM cluster but also a physical server cluster or a mixture, and it allows adding, deleting or tuning a VM on demand. In addition, OpenStack allows the applications themselves to be part of the image, with snapshots that can be reused to replicate the VM on any server. Clusters with the appropriate applications and dependencies, along with connectivity and firewall rules, can be provisioned and replicated. This allows orchestration of VM images to provide auto-failover, auto-scaling, live migration and auto-protection for service delivery.
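
As a rough, hedged illustration of the provisioning flow in Figure 2, the sketch below uses the openstacksdk Python client to request a small cluster of VMs; the cloud, image, flavor and network names are placeholders for whatever the target OpenStack deployment defines, and the flavor chosen encodes the per-VM CPU/memory service level.

```python
# Rough sketch of cluster provisioning with the openstacksdk client.
# The cloud, image, flavor and network names below are placeholders.
import openstack

conn = openstack.connect(cloud="my-cloud")            # reads clouds.yaml credentials

image = conn.compute.find_image("ubuntu-22.04")       # hypothetical image name
flavor = conn.compute.find_flavor("m1.large")         # encodes the vCPU/RAM SLA
network = conn.network.find_network("service-net")    # tenant network for the cluster

cluster = []
for i in range(3):                                    # three-node service cluster
    server = conn.compute.create_server(
        name=f"svc-node-{i}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    cluster.append(conn.compute.wait_for_server(server))

for node in cluster:
    print(node.name, node.status)
```

Bandwidth, latency and storage guarantees would come from the corresponding network and storage services rather than the compute flavor alone.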

[Image: OpenStack-based infrastructure control plane]

Figure 2: OpenStack is used to provision infrastructure with the service level agreements required to assure the CPU, memory, bandwidth, storage IOPs, throughput and storage capacity of individual virtual machines (VMs) and the network latency of the VM cluster

Unfortunately, the anti-thesis movement depends solely on infrastructure mobility and management through VMs and their associated plumbing, which requires either a lock-in on the availability of the same OpenStack across a distributed environment or complex image-orchestration add-ons. More recently, instead of moving a whole virtual image containing the OS, run-time environments and applications along with their configurations, a mini-OS image (using a subset of operating system services) is created with the applications and their configurations; LXC containers and Docker containers are examples. Using the mobility of VMs or containers to move applications from one infrastructure to another, in order to manage infrastructure SLAs and meet the QoS needs of an application, has created a plethora of ad-hoc solutions that add to the complexity. Figure 3 shows the current state of the art.
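
For comparison, the sketch below uses the Docker SDK for Python to start an application as a container with explicit CPU and memory limits instead of shipping a full VM image; the image name, port and limits are illustrative placeholders, not values from any particular deployment.

```python
# Sketch: launching an application as a container with resource limits,
# instead of shipping a full VM image. Image name and limits are placeholders.
import docker

client = docker.from_env()

container = client.containers.run(
    "registry.example.com/billing-service:1.4",  # hypothetical application image
    detach=True,
    name="billing-service",
    mem_limit="512m",          # memory ceiling for the container
    nano_cpus=1_000_000_000,   # 1 CPU, expressed in units of 1e-9 CPUs
    ports={"8080/tcp": 8080},  # expose the service port on the host
)

print(container.id, container.status)
```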

[Image: Anti-thesis, current state of the art]

Figure 3: The current state of the art, which provides application QoS through virtual machine mobility or container mobility, where the container is also an image

While this approach meets application scaling and fluctuation needs as long as the infrastructure satisfies certain requirements, there are shortcomings in distributed heterogeneous infrastructures provided by different vendors:

  1. Multiple Orchestrators are required when different architectures and infrastructure management systems are involved
  2. Too many infrastructure management tools, point solutions and integration services increase cost and complexity
  3. Manager of Managers create complexity
  4. Cannot scale across distributed infrastructures belonging to different service providers to leverage commodity infrastructure, resulting in vendor lock-in
  5. VM image mobility creates additional VM image management overhead, and runaway bandwidth and storage consumption with the proliferation of VM instances
  6. Lack of end-to-end service security visibility and control when services span across multiple service infrastructures.
  7. Managing low-latency transactions in distributed environments increases cost and complexity

Figure 4 shows the complexity involved in scaling services across distributed heterogeneous infrastructures with different owners using different infrastructure management systems. Integrating multiple distributed infrastructures with disparate management systems does not scale without increasing complexity and cost.

Obviously, if scale, distribution and fluctuations (both in demand and in resources) are not requirements, then the thesis will do well. Today there are still many mainframe systems providing high transaction rates, albeit at a higher cost. The anti-thesis is born out of the need to handle a high degree of scale, distribution and fluctuation with higher efficiency; big data analysis and large-scale collaboration systems are examples. However, there is a large class of services that would like to leverage commodity infrastructure and gain resiliency, security and application QoS management without vendor lock-in or the high cost of complexity.

There are three stakeholders in an enterprise who want different things from the infrastructure that provides QoS assurance:

  1. The Line of business owners and the CIO want:
    1. Service Level Quality (availability, performance, security and cost) Assurance
    2. End-to-end service visibility and control
    3. Precise resource accounting
    4. Regulatory Compliance
  2. The IT infrastructure providers want:
    1. Provide “Cloud-like Services” in private datacenters
    2. The advantage of commodity infrastructure without vendor lock-in
    3. Ability to “migrate a service” or “tune infrastructure SLAs” based on policies and application demand
    4. Ability to burst into the cloud without vendor lock-in
  3. The developers want:
    1. Focus on business logic coding and specification of run-time requirements for resources (application intent, context, communications, control and constraints) without worrying about run-time infrastructure configurations
    2. Converged DevOps to develop, test and deploy with agility
    3. Service deployment architecture decoupling non-functional and functional requirements
    4. Service composition tools for reuse
    5. End-to-end visibility and profiling at run-time across the stack for Debugging

In essence, service developers want to focus on fulfilling functional requirements without having to worry about resource availability in a fluctuating environment. Monitoring resource utilization and acting on the non-deterministic impact of scaling and fluctuations should be supported by a common architecture that decouples application execution from the underlying resource management, distributed or not.

[Image: Complexity and cost]

Figure 4: Complexity in a distributed infrastructure where scaling and fluctuations are increasing

The Synthesis:

The synthesis depends on addressing the scaling and fluctuation issues without vendor lock-in or architecture lock-in: developers must be able to keep using their current environments, and the current infrastructure must be accommodated while new infrastructure with NFV and SFV is allowed to integrate seamlessly. The anti-thesis solutions, for example, require certain features in their operating systems and new middleware that must run in distributed environments, which leaves a host of legacy systems out.

A call for the synthesis is emerging from two quarters:

  1. Industry analysts such as Gartner who predict that a service governor will emerge in due time. “A service governor [2] is a runtime execution engine that has several inputs: business priorities, IT service descriptions (and dependency model), service quality and cost policies. In addition, it takes real-time data feeds that assess the performance of user transactions and the end-to-end infrastructure, and uses them to dynamically optimize the consumption of real and virtual IT infrastructure resources to meet the business requirements and service-level agreements (SLAs). It performs optimization through dynamic capacity management (that is, scaling resources up and down) and dynamically tuning the environment for optimum throughput given the demand. The service governor is the culmination of all technologies required to build the real-time infrastructure (RTI), and it’s the runtime execution management tool that pulls everything together.”
  2. The academic community, which recognizes the limitations of Turing’s formulation of computation in terms of functions that process information using simple read, compute (change state) and write instructions, combined with the program/data duality introduced by von Neumann, which has allowed information technology (IT) to model, monitor, reason about and control any physical system. Prof. Mark Burgin [3], in his 2005 book on super-recursive algorithms, states: “it is important to see how different is functioning of a real computer or network from what any mathematical model in general and a Turing machine, (as an abstract, logical device), in particular, reputedly does when it follows instructions. In comparison with instructions of a Turing machine, programming languages provide a diversity of operations for a programmer. Operations involve various devices of computer and demand their interaction. In addition, there are several types of data. As a result, computer programs have to give more instructions to computer and specify more details than instructions of a Turing machine. The same is true for other models of computation. For example, when a finite automaton represents a computer program, only certain aspects of the program are reflected. That is why computer programs give more specified description of computer functioning, and this description is adapted to the needs of the computer. Consequently, programs demand a specific theory of programs, which is different from the theory of algorithms and automata.”

In short, the programs (or functions) that developers write to encode business logic do not contain knowledge about how compute, storage and network devices interact with each other (structure), or about how to deal with changing business priorities, workload variations and latency constraints (fluctuations that force changes to structure). This knowledge has to be incorporated in the architecture of the new computing, management and programming model.
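
To make this separation concrete, the sketch below is a toy version of the service-governor control loop described above: the business logic stays untouched, while a separate loop holds the knowledge about service levels and asks an infrastructure-facing scaler to adjust capacity when measurements drift from policy. The metric reader and scaler passed in are abstract placeholders, not a particular product's API.

```python
# Toy service-governor loop: knowledge about service levels and scaling lives
# outside the business logic. The metric reader and scaler passed in are
# abstract placeholders, not a particular product's API.
import time
from dataclasses import dataclass

@dataclass
class ServicePolicy:
    max_latency_ms: float   # service-level objective set by business priorities
    min_nodes: int
    max_nodes: int

def govern(policy, read_p99_latency_ms, scale_to, nodes, interval_s=30):
    """Watch a real-time latency feed and adjust capacity to honor the policy."""
    while True:
        latency = read_p99_latency_ms()                      # real-time data feed
        if latency > policy.max_latency_ms and nodes < policy.max_nodes:
            nodes += 1                                       # scale up to protect the SLA
            scale_to(nodes)
        elif latency < 0.5 * policy.max_latency_ms and nodes > policy.min_nodes:
            nodes -= 1                                       # scale down to reduce cost
            scale_to(nodes)
        time.sleep(interval_s)
```

In the architecture of Figure 5 below, such a loop would live in the service control plane and call into whichever infrastructure control plane currently hosts the component.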

These non-functional requirements specify criteria that can be used to judge the operation of a system, rather than specific behaviors, in contrast with functional requirements, which define specific behaviors or functions that deal with algorithms or business logic. The plan for implementing functional requirements is detailed in the system design; the plan for implementing non-functional requirements is detailed in the system architecture. These requirements include availability, reliability, performance, security, scalability and efficiency at run time. The new architecture must encapsulate the intent of the program and its operational requirements, such as the context, connectivity to other components, constraints and the control abstractions required to manage the non-functional requirements. Figure 5 shows an architecture in which the service management architecture is decoupled from the infrastructure management systems that monitor and manage distributed resources, which may belong to different providers with different incentives.
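
As a purely illustrative example (the field names are mine, not a standard schema), such an encapsulation could take the form of a declarative blueprint carried alongside each component:

```python
# Illustrative blueprint for one service component; the field names simply
# mirror the intent/context/connectivity/constraints/control abstractions
# discussed in the text and are not a standard schema.
ORDER_SERVICE_BLUEPRINT = {
    "intent": "accept and validate customer orders",
    "context": {"business_priority": "revenue-critical", "data_locality": "EU"},
    "connectivity": {"consumes": ["inventory-service", "payment-service"],
                     "exposes": "https://orders.example.internal/api"},
    "constraints": {"availability": "99.99%", "max_latency_ms": 100,
                    "max_cost_per_hour_usd": 2.50},
    "control": {"on_overload": "scale-out", "on_node_failure": "restart-elsewhere"},
}
```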

[Image: Infusing cognition into the service control plane]

Figure 5: A cognition infused service composition architecture that decouples distributed heterogeneous multi-vendor infrastructure management

The infrastructure control plane provides automation, monitoring and management of the infrastructure that applications require to execute their intent. Its output is a cluster of physical or virtual servers, with an operating system in each server, providing well-defined computing resources in terms of total CPU, memory, network bandwidth, latency, storage IOPs, throughput and capacity. The infrastructure control plane can provide the required clusters on demand and elastically scale the nodes, or the resources of an individual node, on demand. The elastic on-demand resources use automation processes, or NFV and SFV resources connected to virtual or physical servers.
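
As one hedged illustration of tuning an individual node's resources on demand, the openstacksdk calls below resize a running VM to a larger flavor; the server and flavor names are placeholders for whatever SLA tier the control plane selects.

```python
# Sketch: vertically scaling one node by resizing it to a larger flavor.
# Server and flavor names are placeholders chosen by the control plane.
import openstack

conn = openstack.connect(cloud="my-cloud")

server = conn.compute.find_server("svc-node-1")       # node picked for scale-up
bigger = conn.compute.find_flavor("m1.xlarge")        # next SLA tier

conn.compute.resize_server(server, bigger.id)         # request the resize
server = conn.compute.wait_for_server(server, status="VERIFY_RESIZE")
conn.compute.confirm_server_resize(server)            # commit the new resources
```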

As Professor Mark Burgin points out, the intent, and the application monitoring needed to process information, apply knowledge and change the circumstance, must be part of the service management knowledge, independent of distributed infrastructure management systems, in order to provide true scalability, distribution and resiliency and to avoid vendor, infrastructure, architecture or API lock-in. In addition, the service control plane must support recursive service composition in order to have end-to-end service visibility and control, and to avail itself of the best resources wherever they are available to meet the quality of service dictated by business priorities, latency constraints and workload fluctuations. Application quality of service must not be dictated or limited by infrastructure limitations. Only then can we predictably deploy highly reliable services even on not-so-reliable distributed infrastructure, and increase efficiency to meet demand that is not as predictable.

Borrowing from biological and intelligent systems, which specialize in exploiting architectures that provide predictability, we can argue that infusing cognition into service management will provide such an architecture. Cognition [4] is associated with intent and its accomplishment through various processes that monitor and control a system and its environment. Cognition is associated with a sense of “self” (the observer) and the systems with which it interacts (the environment, or the “observed”). Cognition [4] extensively uses time, history and reasoning in executing and regulating the tasks that constitute a cognitive process. There is a fundamental reason why the current Turing/von Neumann stored-program computing model cannot address large-scale distributed computing, with fluctuations both in resources and in computation workloads, without increasing complexity and cost. As von Neumann [5] put it, “It is a theorem of Gödel that the description of an object is one class type higher than the object.” An important implication of Gödel’s incompleteness theorem is that it is not possible to have a finite description with the description itself as a proper part; in other words, it is not possible to read yourself or process yourself as a process. In short, Gödel’s theorems prohibit “self-reflection” in Turing machines. Turing’s O-machine was designed to provide information that is not available in the computing algorithm executed by the Turing machine. More recently, the super-recursive algorithms proposed by Mark Burgin [3] point a way to model the knowledge about the hardware and software needed to reason and act in order to self-manage. He proves that super-recursive algorithms are more efficient than plain Turing computations, which assume unbounded resources.

Perhaps we should look for “synthesis” solutions not in the familiar places where we feel comfortable, with more ad-hoc software and services that are labor- and knowledge-intensive. We should look for clues in biology, human organizational networks and even telecommunication networks to transform current datacenters from infrastructure management systems into the services switching centers of the future [6]. This requires a search for new computing, management and programming models that do not disturb current applications, operating systems or infrastructure, while facilitating a smooth migration to a more harmonious melody of orchestrated services on a global scale, with high efficiency and resiliency.

References:

[1] Holbrook, M. B. (2003). “Adventures in Complexity: An Essay on Dynamic Open Complex Adaptive Systems, Butterfly Effects, Self-Organizing Order, Coevolution, the Ecological Perspective, Fitness Landscapes, Market Spaces, Emergent Beauty at the Edge of Chaos, and All That Jazz.” Academy of Marketing Science Review [Online] 2003 (6). Available: http://issuu.com/gfbertini/docs/adventures_in_complexity_-_an_essay_on_dynamic_ope/search

[2] Gartner, https://www.gartner.com/doc/2075838/infrastructure-service

[3] Burgin, M. (2005). Super-recursive Algorithms. Springer, New York.

[4] Mikkilineni, R. (2012). Applied Mathematics, 3, 1826-1835. doi:10.4236/am.2012.331248. Published online November 2012 (http://www.SciRP.org/journal/am).

[5] Aspray, W. and Burks, A. (Eds.) (1987). Papers of John von Neumann on Computing and Computer Theory. Charles Babbage Institute Reprint Series for the History of Computing, MIT Press, Cambridge, MA, pp. 409, 474.

[6] Mikkilineni, R. (2011). Designing a New Class of Distributed Systems. Springer, New York.
