From Being to Becoming: Function, Structure and Fluctuations – Incremental versus Leap-frog Innovation in Datacenters

“We grow in direct proportion to the amount of chaos we can sustain and dissipate” ― Ilya Prigogine, Order out of Chaos: Man’s New Dialogue with Nature


According to Gartner “Alpha organizations aggressively focus on disruptive innovation to achieve competitive advantage. Characterized by unknowns, disruptive innovation requires business and IT leaders to go beyond traditional management techniques and implement new ground rules to enable success.”

While there is a lot of buzz about “game changing” technologies and “disruptive innovation”, real “game changers” and “disruptive innovators” are few and far between. Leap-frog innovation is more like a “phase transition” in physics. A system is composed of individual elements, each with a well-defined function, which interact with each other and the external world through a well-defined structure. The system usually exhibits predictable equilibrium behavior, and when fluctuations are small, incremental innovation allows it to adjust itself and maintain the equilibrium with predictability. Only when external forces inflict large or wild unexpected fluctuations on the system is the equilibrium threatened; the system then exhibits emergent behavior in which an unstable equilibrium introduces unpredictability into its evolution dynamics. A phase transition occurs when the structure of the system is reconfigured through an architectural transformation, resulting in order from chaos.

The difference between “Kaizen” (incremental improvement) and “disruptive innovation” lies in dealing with a stable equilibrium with small fluctuations versus a meta-stable equilibrium with large-scale fluctuations. The current datacenter is in a similar transition from “being” to “becoming”, driven both by its hyper-scale structure and by the fluctuations that the hardware and software systems delivering business processes are experiencing. These fluctuations are caused by rapidly changing business priorities on a global scale, workload fluctuations and latency constraints. Is the current von Neumann stored program control implementation of the Turing machine reaching its limit? Is the datacenter poised for a phase transition from current ad-hoc distributed computing practices to a new theory-driven self-* architecture? In this blog we discuss a non-von Neumann managed Turing oracle machine network with a control architecture as an alternative.

“From Being to Becoming” – What Does It Mean?

The representation of the dynamics of physical systems as a linear, reversible (hence deterministic), temporal order of states requires that, in a deep sense, physical systems never change their identities through time; hence they can never become anything radically new (e.g., they must at most merely rearrange their parts, parts whose being is fixed). However, as elements interact with each other and their environment, the system dynamics can change dramatically when large fluctuations in the interactions induce a structural transformation leading to chaos and the eventual emergence of a new order out of chaos. This is denoted as “becoming”. In short, the dynamics of near-equilibrium states with small-scale fluctuations in a system represent the “being”, while large deviations from the equilibrium, the emergence of an unstable equilibrium and the final restoration of order in a new equilibrium state represent the “becoming”.

According to Plato, “being” is absolute, independent, and transcendent. It never changes and yet causes the essential nature of things we perceive in the world of “becoming”. The world of becoming is the physical world we perceive through our senses. This world is always in movement, always changing. The two aspects, the static structures and the dynamics of their evolution, are two sides of a coin. Dynamics (becoming) represents time, and the static configuration at any particular instant represents the “being”. Prigogine applied this concept to understand the chemistry of matter, phase transitions and the like. Individual elements represent function, and the groups (constituting a system) represent structure with dynamics. Fluctuations caused by the interactions within the system, and between the system and its environment, cause the dynamics of the system to induce transitions from being to becoming. Thus, function, structure and fluctuations determine the system and its dynamics, defining the complexity, chaos and order.

Why is it Relevant to Datacenters?

Datacenters are dynamic systems where software working with hardware delivers information processing services that allow modeling, interaction, reasoning, analysis and control of the environment external to them. Figure 1 shows the hardware, software and their interaction among themselves and the external world. There are two distinct systems interacting with each other to deliver the intent of the datacenter which is to execute specific computational workflows that model, monitor and control the external world processes using the computing resources:

  1. Service workflows modeling the process dynamics of the system depicting the external world and its interactions. Usually this consists of the functional requirements of the system under consideration, such as business logic and sensor and actuator monitoring and control (the computed). The model consists of various functions captured in a structure (e.g., a directed acyclic graph, DAG) and its evolution in time. This model does not include the computing resources required to execute the process dynamics. It is assumed that the resources (CPU, memory, time etc.) will be available for the computation.
  2. The non-functional requirements that address the resources required to execute the functions as a function of time and of fluctuations, both in the interactions in the external world and in the computing resources available to accomplish the intent defined in the functional requirements. The computation as implemented in the von Neumann stored program control model of the Turing machine requires time (impacted by CPU speed, network latency, bandwidth, storage IOPs, throughput and capacity) and memory. The computing model assumes unbounded resources, including time, for completing the computation. Today, these resources are provided by a cluster of servers and other devices containing multi-core CPUs and memory, networked with different types of storage. The computations are executed in the server or device by allocating the resources using an operating system, which is itself software that mediates the resources among the various computations.
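The separation between the two systems described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step names, the `ResourceSpec` fields and all the numbers are hypothetical and do not come from any particular datacenter. The functional model is a DAG of steps; the non-functional model is a separate table of resource needs keyed by the same step names.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

# Functional model: each step maps to the set of steps it depends on.
# The step names are hypothetical.
WORKFLOW = {
    "read_sensor": set(),
    "apply_business_logic": {"read_sensor"},
    "drive_actuator": {"apply_business_logic"},
    "write_audit_log": {"apply_business_logic"},
}


@dataclass(frozen=True)
class ResourceSpec:
    """Non-functional requirements for one step (illustrative units)."""
    cpu_cores: int
    memory_mb: int
    max_latency_ms: int


# Non-functional model, kept separate from the DAG above (assumed values).
RESOURCES = {
    "read_sensor": ResourceSpec(1, 128, 10),
    "apply_business_logic": ResourceSpec(2, 512, 50),
    "drive_actuator": ResourceSpec(1, 128, 10),
    "write_audit_log": ResourceSpec(1, 256, 200),
}


def execution_order(workflow):
    """Return one valid order of steps that respects the dependencies."""
    return list(TopologicalSorter(workflow).static_order())


def total_demand(resources):
    """Sum the CPU cores and memory needed if all steps ran at once."""
    cores = sum(spec.cpu_cores for spec in resources.values())
    memory = sum(spec.memory_mb for spec in resources.values())
    return cores, memory


order = execution_order(WORKFLOW)
demand = total_demand(RESOURCES)
```

Because the DAG never mentions resources, the `RESOURCES` table can be changed (or re-evaluated as conditions fluctuate) without touching the workflow definition.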

On the right-hand side of Figure 1, we depict the computing resources required to execute the functions in a given structure, whether it is distributed or not. In the middle, we represent the application workflows composed of various components constituting an application area network (AAN) that is executed in a distributed computing cluster (DCC) made up of the hardware resources with specified service levels (CPU, memory, network bandwidth, cluster latency, storage IOPs, throughput and capacity). The left-hand side shows a desired end-to-end process configuration and an evolution monitoring and control mechanism. When all is said and done, the process workflows need to execute various functions using the computing resources made available in the form of a distributed cluster providing the required CPU, memory, network bandwidth, latency, storage IOPs, throughput and capacity. The structure is determined by the non-functional requirements such as resource availability, performance, security and cost. Fluctuations evolve the process dynamics and require adjusting the resources so that applications can cope with them.

Figure 1: Decoupling service orchestration and infrastructure orchestration to deliver function, structure and dynamic process flow to address the fluctuations both in resource availability and service demand

There are two ways to match the available resources to the computing nodes, connected by links, that execute the business process dynamics. The first approach is the current state of the art, and the second is an alternative approach based on extensions to the current von Neumann stored program implementation of the Turing machine.

Current State of the Art

The infrastructure is infused with intelligence about various applications and their evolving needs, and it adjusts the resources accordingly (the time of computation, which is affected by CPU, network bandwidth, latency, storage capacity, throughput and IOPs, and the memory required for the computation). Current IT has evolved from a model where the resources are provisioned in anticipation of peak workloads and the structure of the application network is optimized for coping with deviations from equilibrium. Conventional computing models using physical servers (often referred to as bare metal) cannot cope with wild fluctuations if the time to provision a new server is much larger than the time it takes for the fluctuations to set in, and if their magnitude cannot be predicted early enough to pre-plan the provisioning of additional resources. Virtualization of the servers and on-demand provisioning of virtual machines (VMs) reduce the provisioning times substantially, making it possible to institute auto-scaling, auto-failover and live migration across distributed resources using VM image mobility. However, this comes at a price:

    1. The virtual image is still tied to the infrastructure (the network, storage and computing resources supporting the VM), and moving a VM involves manipulating a multitude of distributed resources, often owned or operated by different parties, and touching many infrastructure management systems, thus increasing the complexity and cost of management.
    2. If the distributed infrastructure is homogeneous and supports VM mobility, the task is simpler, but the solution forces vendor lock-in and does not allow users to take advantage of commodity infrastructure offered by multiple suppliers.
    3. If the distributed infrastructure is heterogeneous, VM mobility must depend on myriad management systems, and most often these management systems themselves need other management systems to manage their resources.
    4. VM mobility and management also increase bandwidth and storage requirements and lead to a proliferation of point solutions and tools for moving across heterogeneous distributed infrastructure, which increases operational complexity and cost.
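The provisioning-time argument made before the list can be written as a simple inequality: new capacity helps only if it can be ready before the fluctuation peaks. The sketch below is a minimal illustration of that condition; the function name and all the durations are assumptions chosen for the example, not measurements.

```python
def can_provision_in_time(provisioning_s: float,
                          onset_s: float,
                          forecast_lead_s: float = 0.0) -> bool:
    """Return True if capacity can be ready before a fluctuation peaks.

    provisioning_s:  time needed to bring a new server or VM into service
    onset_s:         time from the first sign of a fluctuation to its peak
    forecast_lead_s: how far ahead of the first sign it can be predicted
    """
    return provisioning_s <= onset_s + forecast_lead_s


# Illustrative, assumed durations only (in seconds).
BARE_METAL_PROVISIONING_S = 3 * 24 * 3600   # assume days for a physical server
VM_PROVISIONING_S = 5 * 60                  # assume minutes for a VM
SPIKE_ONSET_S = 15 * 60                     # assume a spike builds in 15 minutes

bare_metal_ok = can_provision_in_time(BARE_METAL_PROVISIONING_S, SPIKE_ONSET_S)
vm_ok = can_provision_in_time(VM_PROVISIONING_S, SPIKE_ONSET_S)
```

With these assumed numbers the physical server arrives too late and the VM arrives in time, which is the motivation for virtualization given above; the list then describes what that gain costs.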

The current state of the art, based on the mobility of VMs and infrastructure orchestration, is summarized in Figure 2.

Anti-Thesis: Current State of the Art

Figure 2: The infrastructure orchestration based on second guessing the application quality of service requirements and its dynamic behavior

Figure 2 shows the futility of orchestrating service availability, performance, compliance, cost and security in a highly distributed and heterogeneous environment where scale and fluctuations dominate. The cost and complexity of navigating multiple infrastructure service offerings often outweigh the benefits of commodity computing. It is one reason why enterprises complain that 70% of their budget is often spent on keeping the service lights on.

Alternative Approach: A Clean Separation of Business Logic Implementation and the Operational Realization of Non-functional Requirements

Another approach is to decouple application and business process workflow management from the distributed infrastructure mobility by placing the applications in the right infrastructure with the right resources, monitoring the evolution of the applications, and proactively managing the infrastructure to add or delete resources with predictability based on history. Based on the recovery point objective (RPO) and recovery time objective (RTO), the application structure is adjusted to create active/passive or active/active nodes to manage application QoS and workflow/business process QoS. This approach requires a top-down method of business process implementation, starting with the specification of the business process intent, followed by a hierarchical and temporal specification of process dynamics with context, constraints, communication, control of the group and its constituents, and the initial conditions for the equilibrium quality of service (QoS). The details include:

  1. Non-functional requirements that specify availability, performance, security, compliance and cost constraints, and the policies specified with hierarchical and temporal process flows. The intent at the higher level is translated into the downstream intent of the computing nodes contributing to the workflow.
  2. A structure, distributed or otherwise, of a network of networks providing the computing nodes with specified SLAs for resources (CPU, memory, network bandwidth, latency, storage IOPs, throughput and capacity).
  3. A method to implement autonomic behavior with visibility and control of application components so that they can be managed with policies defined. When scale and fluctuations demand a change in the structure to transition to a new equilibrium state, the policy implementation processes proactively add or subtract computing nodes or find existing nodes to replicate, repair, recombine or reconfigure the application components. The structural change implements the transition from being to becoming.
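The third item, policy-driven addition and removal of computing nodes, can be illustrated with a small control loop. This is a hedged sketch, not the mechanism used in any implementation discussed here: the `Policy` fields, the utilization thresholds and the per-node capacity are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical scaling policy derived from non-functional requirements."""
    min_nodes: int
    max_nodes: int
    scale_up_above: float    # add a node when utilization exceeds this
    scale_down_below: float  # remove a node when utilization drops below this


def desired_nodes(current_nodes: int, utilization: float, policy: Policy) -> int:
    """Return the node count the policy asks for, changing by at most one."""
    if utilization > policy.scale_up_above:
        target = current_nodes + 1
    elif utilization < policy.scale_down_below:
        target = current_nodes - 1
    else:
        target = current_nodes
    # Never leave the bounds set by the policy.
    return max(policy.min_nodes, min(policy.max_nodes, target))


def run(loads, policy, capacity_per_node=100.0, start_nodes=2):
    """Replay a sequence of observed loads and record the node count after each."""
    nodes = start_nodes
    history = []
    for load in loads:
        utilization = load / (nodes * capacity_per_node)
        nodes = desired_nodes(nodes, utilization, policy)
        history.append(nodes)
    return history


policy = Policy(min_nodes=1, max_nodes=4, scale_up_above=0.8, scale_down_below=0.3)
history = run([100, 180, 290, 290, 50, 50], policy)
```

Small changes in load leave the structure untouched (the near-equilibrium case), while a sustained rise or fall changes the number of nodes, which is the structural change the list describes.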

A New Architecture to Accommodate Scale and Fluctuations: Toward the Oneness of the Computer and the Computed

There is a fundamental reason why the current Turing, von Neumann stored program computing model cannot address large-scale distributed computing with fluctuations both in resources and in computation workloads without increasing complexity and cost (Mikkilineni et al. 2012). As von Neumann put it, “It is a theorem of Gödel that the description of an object is one class type higher than the object.” An important implication of Gödel’s incompleteness theorem is that it is not possible to have a finite description with the description itself as the proper part. In other words, it is not possible to read yourself or process yourself as a process. In short, Gödel’s theorems prohibit “self-reflection” in Turing machines. According to Alan Turing, Gödel’s theorems show that “every system of logic is in a certain sense incomplete, but at the same time it indicates means whereby from a system L of logic a more complete system L′ may be obtained. By repeating the process we get a sequence L, L1 = L′, L2 = L1′, … each more complete than the preceding. A logic Lω may then be constructed in which the provable theorems are the totality of theorems provable with the help of the logics L, L1, L2, … Proceeding in this way we can associate a system of logic with any constructive ordinal. It may be asked whether such a sequence of logics of this kind is complete in the sense that to any problem A, there corresponds an ordinal α such that A is solvable by means of the logic Lα.”
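The sequence of logics in Turing's passage can be written compactly. The notation below is an assumption introduced here for readability (it is not Turing's own): $\mathrm{Thm}(L)$ stands for the set of theorems provable in $L$, the prime denotes the step from a logic to its more complete extension, and $L \vdash A$ means that $A$ is solvable by means of $L$.

```latex
\begin{align*}
  L_0 &= L, \\
  L_{n+1} &= L_n' \qquad (n = 0, 1, 2, \dots), \\
  \mathrm{Thm}(L_\omega) &= \bigcup_{n < \omega} \mathrm{Thm}(L_n).
\end{align*}
% Turing's completeness question, for constructive ordinals \alpha:
\[
  \text{for every problem } A, \text{ is there an ordinal } \alpha
  \text{ such that } L_\alpha \vdash A \, ?
\]
```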

This observation along with his introduction of the oracle-machine influenced many theoretical advances including the development of generalized recursion theory that extended the concept of an algorithm. “An o-machine is like a Turing machine (TM) except that the machine is endowed with an additional basic operation of a type that no Turing machine can simulate.” Turing called the new operation the ‘oracle’ and said that it works by ‘some unspecified means’. When the Turing machine is in a certain internal state, it can query the oracle for an answer to a specific question and act accordingly depending on the answer. The o-machine provides a generalization of the Turing machines to explore means to address the impact of Gödel’s incompleteness theorems and problems that are not explicitly computable but are limit computable using relative reducibility and relative computability.
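The query-and-branch behavior of an o-machine can be sketched as an interface. One caveat matters here: a genuine oracle answers questions that no Turing machine can compute, so it cannot be written as ordinary code. The stand-in oracle below is an ordinary computable function, used only to show how a machine in a query state hands a question to the oracle and acts on the answer; the function names are hypothetical.

```python
from typing import Callable, Iterable, List

# An oracle is modeled as a black box that answers yes/no questions.
# Assumption: any callable with this shape can be plugged in.
Oracle = Callable[[int], bool]


def run_with_oracle(inputs: Iterable[int], oracle: Oracle) -> List[str]:
    """For each input, enter a query state, ask the oracle, and branch."""
    outputs = []
    for value in inputs:
        answer = oracle(value)          # the one step the machine cannot do itself
        if answer:
            outputs.append(f"{value}:accept")
        else:
            outputs.append(f"{value}:reject")
    return outputs


def is_even(n: int) -> bool:
    """Computable stand-in for the oracle, used only for demonstration."""
    return n % 2 == 0


result = run_with_oracle([1, 2, 3], is_even)
```

The machine never inspects how the answer was produced, which is the property the later sections rely on when an agent outside the computation supplies guidance to it.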

According to Mark Burgin, an Information processing system (IPS) “has two structures—static and dynamic. The static structure reflects the mechanisms and devices that realize information processing, while the dynamic structure shows how this processing goes on and how these mechanisms and devices function and interact.”

The software contains the algorithms (à la the Turing machine) that specify information processing tasks, while the hardware provides the resources required to execute the algorithms. The static structure is defined by the association of software and hardware devices, and the dynamic structure is defined by the execution of the algorithms. The meta-knowledge of the intent of the algorithm, the association of a specific algorithm execution with a specific device, and the temporal evolution of information processing and exception handling when the computation deviates from the intent (be it because of software behavior, hardware behavior or their interaction with the environment) lies outside the software and hardware design and is expressed in non-functional requirements. Mark Burgin calls this Infware, which contains the description and specification of the meta-knowledge and can also be implemented using the hardware and software to enforce the intent with appropriate actions.

The implementation of Infware using Turing machines introduces the same dichotomy mentioned by Turing with respect to the manager of managers conundrum. This is consistent with the observation of Cockshott et al. (2012): “The key property of general-purpose computer is that they are general purpose. We can use them to deterministically model any physical system, of which they are not themselves a part, to an arbitrary degree of accuracy. Their logical limits arise when we try to get them to model a part of the world that includes themselves.”

The goals of the distributed system determine the resource requirements and computational process definition of individual service components based on their priorities, workload characteristics and latency constraints. The overall system resiliency, efficiency and scalability depend upon the workload of the individual service components and the latency characteristics of their interconnections, which in turn depend on the placement of these components (configuration) and the available resources. Resiliency is measured with respect to a service’s tolerance to faults, fluctuations in contention for resources, performance fluctuations, security threats and changing system-wide priorities; it is managed through fault, configuration, accounting, performance and security management (often denoted by FCAPS). Efficiency depicts optimal resource utilization. Scaling addresses end-to-end resource provisioning and management with respect to increasing the number of computing elements required to meet service needs.

A possible solution for addressing resiliency with respect to scale and fluctuations is an application network architecture based on increasing the intelligence of computing nodes, which was presented at the Turing centenary conference (2012) as a way to improve the resiliency, efficiency and scaling of information processing systems. In essence, the distributed intelligent managed element (DIME) network architecture (DNA) extends the conventional computational model of information processing networks, allowing improvement of the efficiency and resiliency of computational processes. This approach is based on organizing the process dynamics under the supervision of intelligent agents. The DIME network architecture utilizes the DIME computing model with a non-von Neumann parallel implementation of a managed Turing machine with a signaling network overlay, and adds cognitive elements to evolve super-recursive information processing. The DIME network architecture introduces three key functional constructs to enable process design, execution, and management to improve both the resiliency and the efficiency of application area networks delivering distributed service transactions using both software and hardware (Burgin and Mikkilineni):

  1. Machines with an Oracle: Executing an algorithm, the DIME basic processor P performs the {read -> compute -> write} instruction cycle or its modified version the {interact with a network agent -> read -> compute -> interact with a network agent -> write} instruction cycle. This allows the different network agents to influence the further evolution of computation, while the computation is still in progress. We consider three types of network agents: (a) A DIME agent. (b) A human agent. (c) An external computing agent. It is assumed that a DIME agent knows the goal and intent of the algorithm (along with the context, constraints, communications and control of the algorithm) the DIME basic processor is executing and has the visibility of available resources and the needs of the basic processor as it executes its tasks. In addition, the DIME agent also has the knowledge about alternate courses of action available to facilitate the evolution of the computation to achieve its goal and realize its intent. Thus, every algorithm is associated with a blueprint (analogous to a genetic specification in biology), which provides the knowledge required by the DIME agent to manage the process evolution. An external computing agent is any computing node in the network with which the DIME unit interacts.
  2. Blueprint or policy managed fault, configuration, accounting, performance and security monitoring and control (FCAPS): The DIME agent uses the blueprint to configure, instantiate, and manage the DIME basic processor executing the algorithm. It uses concurrent DIME basic processors, each with its own blueprint specifying its evolution, to monitor the vital signs of the managed DIME basic processor, and it implements various policies to assure non-functional requirements such as availability, performance, security and cost management while the managed DIME basic processor is executing its intent. This approach integrates the evolution of the execution of an algorithm with concurrent management of available resources to assure the progress of the computation.
  3. DIME network management control overlay over the managed Turing oracle machines: In addition to read/write communication of the DIME basic processor (the data channel), other DIME basic processors communicate with each other using a parallel signaling channel. This allows the external DIME agents to influence the computation of any managed DIME basic processor in progress based on the context and constraints. The external DIME agents are DIMEs themselves. As a result, changes in one computing element could influence the evolution of another computing element at run time without halting its Turing machine executing the algorithm. The signaling channel and the network of DIME agents can be programmed to execute a process, the intent of which can be specified in a blueprint. Each DIME basic processor can have its own oracle managing its intent, and groups of managed DIME basic processors can have their own domain managers implementing the domain’s intent to execute a process. The management DIME agents specify, configure, and manage the sub-network of DIME units by monitoring and executing policies to optimize the resources while delivering the intent.
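The modified instruction cycle and the separate signaling channel described in the list above can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the DIME implementation: the `Blueprint` fields, the `"stop"` command and the two policy checks are hypothetical, and in a real network the signals would arrive concurrently from other DIME agents.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical blueprint: the intent plus one simple policy."""
    intent: str
    max_items: int  # stop once this many results have been written


class DimeAgent:
    """Stand-in for the agent (oracle) that manages one basic processor."""

    def __init__(self, blueprint: Blueprint):
        self.blueprint = blueprint
        self.signals = queue.Queue()  # signaling channel, separate from data

    def signal(self, command: str) -> None:
        """Called by other agents to influence the running computation."""
        self.signals.put(command)

    def allow_next_cycle(self, items_written: int) -> bool:
        """Consulted before each read; False stops the computation."""
        while not self.signals.empty():
            if self.signals.get() == "stop":
                return False
        return items_written < self.blueprint.max_items

    def approve_write(self, result: int) -> bool:
        """Consulted before each write; an assumed policy check."""
        return result >= 0


def run_basic_processor(data_in, compute, agent: DimeAgent):
    """{interact -> read -> compute -> interact -> write} over a data channel."""
    data_out = []
    source = iter(data_in)
    while agent.allow_next_cycle(len(data_out)):   # interact with agent
        try:
            item = next(source)                    # read
        except StopIteration:
            break
        result = compute(item)                     # compute
        if agent.approve_write(result):            # interact with agent
            data_out.append(result)                # write
    return data_out


agent = DimeAgent(Blueprint(intent="square the inputs", max_items=3))
squares = run_basic_processor([1, 2, 3, 4, 5], lambda x: x * x, agent)

stopped_agent = DimeAgent(Blueprint(intent="square the inputs", max_items=10))
stopped_agent.signal("stop")   # a signal sent by some other agent
nothing = run_basic_processor([1, 2, 3], lambda x: x * x, stopped_agent)
```

The compute function never sees the blueprint or the signals; only the agent does, which mirrors the separation between the data channel and the signaling channel.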

The result is a new computing model, a management model and a programming model which infuse self-awareness, using an intelligent Infware, into a group of software components deployed on a distributed cluster of hardware devices, while enabling the monitoring and control of the dynamics of computation to conform to the intent of the computational process. The DNA-based control architecture appropriately configures the software and hardware components to execute the intent. As the computation evolves, the control agents monitor the evolution and make appropriate adjustments to maintain an equilibrium conforming to the intent. When the fluctuations create conditions for unstable equilibrium, the control agents reconfigure the structure in order to create a new equilibrium state that conforms to the intent based on policies.

Figure 3 shows the Infware, hardware and software executing a web service using DNA.


Figure 3: Hardware and software networks with a process control Infware orchestrating the life-cycle evolution of a web service deployed on a Distributed Computing Cluster

The hardware components are managed dynamically to configure an elastic distributed computing cluster (DCC) to provide the required resources to execute the computations. The software components are organized as managed Turing oracle machines with a control architecture to create AANs that can be monitored and controlled to execute the intent using the network management abstractions of replication, repair, recombination and reconfiguration. With DNA, the datacenters are able to evolve from being to becoming.

It is important to note that DNA has been implemented (Mikkilineni et al. 2012, 2014) to demonstrate two functions that cannot be accomplished today with the current state of the art:

  1. Migrating a workflow being executed in a physical server (a web service transaction including a web server, application server and a database) to another physical server without a reboot or losing transactions to maintain recovery time and recovery point objectives. No virtual machines are required although they can be used just as if they were bare-metal servers.
  2. Providing workflow auto-scaling, auto-failover and live migration with retention of application state using distributed computing clusters with heterogeneous infrastructure (bare metal servers, private and public clouds etc.) without infrastructure orchestration to accomplish them (e.g., without moving virtual machine images or LXC container based images).

The approach using DNA allows the implementation of the above functions without requiring changes to existing applications, OSs or current infrastructure, because the architecture non-intrusively extends the current Turing computing model to a managed Turing oracle machine network with a control network overlay. It is not a coincidence that similar abstractions are present in how cellular organisms, human organizations and telecommunication networks self-govern and deliver the intent of the system (Mikkilineni 2012).

Only time will tell if the DNA implementation of Infware is an incremental or leap-frog innovation.


This work originated from discussions started at IEEE WETICE 2009 to address the complexity, security and compliance issues in Cloud Computing. The work of Dr. Giovanni Morana and the C3DNA Team, and the theoretical insights from Professor Eugene Eberbach, Professor Mark Burgin and Pankaj Goyal, are behind the current implementation of DNA.

