Raphael M. Bahati | Research

This research blends key concepts from Policy-based Management (PBM), Autonomic Computing (AC), and Artificial Intelligence (AI) into a framework (see Figure 1) for adaptive policy-driven autonomic management. In particular, the framework uses Reinforcement Learning methodologies to determine how best to use a set of active policies to meet different performance objectives. It draws on past experience in the use of policies to determine which policy actions should be enforced to resolve Quality of Service (QoS) requirement violations. We believe that "learning" could offer significant benefits here, since it allows autonomic systems to dynamically adapt the use of policies, learn new policies, or even ignore some policies when past experience shows it is prudent to do so. The work also proposes several strategies for adapting what has been learned from the use of one set of policies to another set of "similar" policies when run-time changes are made to the policies driving autonomic management.
Policy-based Management

The use of policies makes it more straightforward to define and modify system behavior at run-time through policy manipulation rather than through re-engineering.

Previous work on policy-based management has mainly focused on the specification of policies and their use "as is", where changes to how policies are used within a system are only possible through manual intervention. In contrast, our approach allows policies to express high-level objectives, leaving it to the system to figure out (through learning) how best to achieve those objectives.

Autonomic Computing

It is increasingly evident that the growing complexity of today's systems, both in terms of the heterogeneous infrastructure on which they operate and what their users expect, is surpassing the ability of human administrators to manage them effectively. Our interest is in developing mechanisms for automating the management of such systems to enable the efficient operation of systems and the utilization of services.

Central to the functionality of autonomic systems is self-optimization; i.e., the ability of systems to evaluate their own behavior and adapt it accordingly to improve performance. Where policies are used to drive autonomic management, this may often require having the system monitor its own use of policies to learn which of the policy actions are most effective in each encountered situation.
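As a minimal sketch of what "learning which policy actions are most effective in each encountered situation" could look like, the following keeps a running average of the reward observed for each (situation, action) pair and usually picks the best-known action. The class name, the situation label, the action names, and the reward values are hypothetical and are not taken from the framework itself.

```python
import random
from collections import defaultdict


class ActionValueTracker:
    """Running average of observed reward per (situation, action) pair."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon              # probability of trying a random action
        self.values = defaultdict(float)    # (situation, action) -> mean reward
        self.counts = defaultdict(int)      # (situation, action) -> times tried
        self.rng = random.Random(seed)

    def choose(self, situation, actions):
        """Usually pick the best-known action; occasionally explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.values[(situation, a)])

    def record(self, situation, action, reward):
        """Fold one observed reward into the running mean."""
        key = (situation, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical situation and action names, for illustration only.
tracker = ActionValueTracker(epsilon=0.0)
actions = ["increase_max_clients", "decrease_keep_alive"]
tracker.record("cpu_high", "increase_max_clients", -1.0)
tracker.record("cpu_high", "decrease_keep_alive", 1.0)
best = tracker.choose("cpu_high", actions)
```

With exploration switched off (epsilon set to 0), the tracker simply returns the action with the highest average reward seen so far for that situation.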

In many systems, it is common to find multiple components cooperating to deliver a set of services. Since such systems could be deployed to operate in environments with diverse characteristics and under varying configurations, it is important for autonomic systems to be flexible. Policies could be used to provide directives on how the different components are configured. The ability to dynamically reconfigure components and applications in response to changes in the configuration of the managed environment helps address this need.

Reinforcement Learning

The complex and stochastic nature of today's computing environments means that it is often impractical to obtain an environment model that is both accurate and representative of all possible situations the learning agent may encounter while interacting with the environment.

Reinforcement Learning provides the ability to learn a model of the environment's dynamics as the agent encounters new situations. In this approach, the model is updated continually throughout the agent's lifetime: at each time-step, the currently learned model is used to improve the policy guiding the agent's interaction with the environment.
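The idea of updating a learned model at every time-step and using it to improve the agent's behavior can be sketched as follows. This is a generic tabular, model-based scheme (transition counts plus value-iteration sweeps) written under our own assumptions; the class name, states, actions, rewards, and discount factor are hypothetical, and it is not presented as the specific algorithm used in this work.

```python
from collections import defaultdict


class LearnedModelAgent:
    """Tabular agent that learns a model from experience and plans with it."""

    def __init__(self, actions, gamma=0.9):
        self.actions = list(actions)
        self.gamma = gamma                                   # discount factor
        self.trans = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.visits = defaultdict(int)                       # (s, a) -> visit count
        self.q = defaultdict(float)                          # (s, a) -> value estimate

    def observe(self, s, a, r, s_next):
        """Update the learned model with one real transition, then re-plan."""
        self.trans[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1
        self.plan()

    def plan(self, sweeps=20):
        """Value-iteration sweeps over every (state, action) pair seen so far."""
        for _ in range(sweeps):
            for (s, a), n in list(self.visits.items()):
                mean_r = self.reward_sum[(s, a)] / n
                expected_next = sum(
                    (count / n) * max(self.q[(s2, b)] for b in self.actions)
                    for s2, count in self.trans[(s, a)].items()
                )
                self.q[(s, a)] = mean_r + self.gamma * expected_next

    def best_action(self, s):
        """Greedy action under the current value estimates."""
        return max(self.actions, key=lambda a: self.q[(s, a)])


# Hypothetical two-action example: "a2" resolves the violation, "a1" does not.
agent = LearnedModelAgent(actions=["a1", "a2"])
agent.observe("violated", "a1", -1.0, "violated")
agent.observe("violated", "a2", 1.0, "ok")
choice = agent.best_action("violated")
```

States that have never been acted in (here, "ok") keep a value of zero, so the estimates only reflect what the agent has actually experienced.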

The heterogeneous nature of today's systems requires autonomic management systems to be flexible enough to adapt their behavior to support services under diverse configurations. Our proposed strategies for autonomic adaptation depend only on the structure of the policies and, as such, should be applicable in other domains.

Figure 1: Adaptive policy-driven autonomic management framework.
Knowledge Base
The Knowledge Base is a shared repository for system policies and other relevant information. This may include information for configuring systems and applications as well as determining corrective actions for resolving QoS requirement violations. The information about policies is eventually distributed to other management components, and then realized as actions driving the autonomic management.
Monitor (M)
Monitors gather performance metric information of interest to the management system, such as resource utilization, response time, throughput, and other relevant information. This information is then used to determine whether the QoS requirements are being met or violated.
Monitor Manager
The Monitor Manager deals with the management of Monitors, including instantiating (i.e., loading and starting) a Monitor for a certain resource type to be monitored as well as providing the context of monitoring (i.e., monitoring frequency or time interval for periodic monitoring, or monitoring times for scheduled monitoring). In addition, it allows Monitors to be re-configured (i.e., adding a new Monitor, adjusting the context of monitoring, or disabling a Monitor) dynamically in response to run-time changes to policies. At the core of its responsibility is the collection and processing of Monitor events, whose details are then reported to the Event Handler. In essence, the Monitor Manager acts as an event producer by gathering information from multiple Monitors, as illustrated in Figure 1. It provides customized services to event consumers (such as the Event Handler) in terms of how often they should receive event notifications.
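The responsibilities described above (instantiating Monitors, setting their monitoring interval, re-configuring them at run time, and reporting events to a consumer) can be sketched roughly as follows. The class names, method names, and metric values are hypothetical, and time is passed in explicitly rather than read from a clock so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MonitorEntry:
    read: Callable[[], float]   # function that samples one metric
    interval: float             # seconds between samples
    next_due: float = 0.0       # time of the next scheduled sample
    enabled: bool = True


class MonitorManager:
    """Polls registered monitors on their own intervals and emits events."""

    def __init__(self, report: Callable[[str, float, float], None]):
        self.report = report    # event consumer, e.g. the Event Handler
        self.monitors: Dict[str, MonitorEntry] = {}

    def add(self, name, read, interval):
        self.monitors[name] = MonitorEntry(read, interval)

    def reconfigure(self, name, interval=None, enabled=None):
        """Adjust the monitoring context at run time."""
        entry = self.monitors[name]
        if interval is not None:
            entry.interval = interval
        if enabled is not None:
            entry.enabled = enabled

    def tick(self, now: float):
        """Sample every enabled monitor that is due at time `now`."""
        for name, entry in self.monitors.items():
            if entry.enabled and now >= entry.next_due:
                self.report(name, entry.read(), now)
                entry.next_due = now + entry.interval


# Hypothetical metrics with fixed readings, for illustration only.
events: List[Tuple[str, float, float]] = []
manager = MonitorManager(report=lambda n, v, t: events.append((n, v, t)))
manager.add("cpu_utilization", read=lambda: 0.93, interval=5.0)
manager.add("response_time_ms", read=lambda: 420.0, interval=10.0)
for now in (0.0, 5.0, 10.0):
    manager.tick(now)
manager.reconfigure("cpu_utilization", enabled=False)
manager.tick(15.0)
```

Disabling the CPU monitor before the last tick means no further events are produced for it, which mirrors re-configuration in response to a run-time policy change.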
Event Handler
The Event Handler processes events from the Monitor Manager to determine whether there are any QoS requirement violations (based on the enabled policy conditions) and, if so, forwards appropriate notifications to the interested components. This includes notifying the PDP of condition violations as well as forwarding information to the Event Log for archiving. A key feature of this component is its ability to provide customized services to event consumers (i.e., PDP, Event Log, etc.) through subscriptions, allowing components to specify, for example, how often and/or when they should receive notifications.
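A rough sketch of subscription-based notification, assuming each consumer states a minimum interval between notifications, might look like this. The `EventHandler` interface, the 0.85 threshold, and the metric name are hypothetical illustrations rather than the framework's actual API.

```python
from typing import Callable, Dict, List


class EventHandler:
    """Checks metric events against policy conditions and notifies subscribers."""

    def __init__(self, conditions: Dict[str, Callable[[float], bool]]):
        # metric name -> predicate returning True when the condition is violated
        self.conditions = conditions
        self.subscribers: List[dict] = []

    def subscribe(self, callback, min_interval=0.0):
        """Register a consumer that wants at most one notice per `min_interval`."""
        self.subscribers.append(
            {"callback": callback, "min_interval": min_interval, "last": None}
        )

    def handle(self, metric: str, value: float, now: float) -> bool:
        """Return True if the event violated an enabled condition."""
        violated = self.conditions.get(metric)
        if violated is None or not violated(value):
            return False
        for sub in self.subscribers:
            if sub["last"] is None or now - sub["last"] >= sub["min_interval"]:
                sub["callback"](metric, value, now)
                sub["last"] = now
        return True


# Hypothetical consumers: the PDP wants at most one notice every 10 seconds,
# while the Event Log wants every violation.
pdp_inbox, log_inbox = [], []
handler = EventHandler({"cpu_utilization": lambda v: v > 0.85})
handler.subscribe(lambda m, v, t: pdp_inbox.append((m, v, t)), min_interval=10.0)
handler.subscribe(lambda m, v, t: log_inbox.append((m, v, t)))
for now, value in [(0.0, 0.90), (5.0, 0.95), (10.0, 0.50), (15.0, 0.91)]:
    handler.handle("cpu_utilization", value, now)
```

Keeping the throttling state per subscriber lets each consumer choose its own notification rate without affecting the others.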
Policy Decision Point (PDP)
This component is responsible for deciding what actions to take given one or more violation messages from the Event Handler. If any expectation policy has been violated, the PDP must decide which policy is the "most important" and then what action(s) to take. It uses information not only about the violations, but also about the expectation policies and management policies, as expressed both within the expectation policies and via management policy rules.
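One simple way to picture this decision step is shown below, under the assumption that "most important" is captured by a numeric priority and that a management rule caps how many actions may be taken at once. Both assumptions, along with the policy and action names, are hypothetical; the text does not specify how importance is actually determined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ExpectationPolicy:
    name: str
    priority: int           # hypothetical: larger means more important
    actions: Sequence[str]  # candidate corrective actions, in preference order


def decide(violated: List[ExpectationPolicy], max_actions: int = 1) -> List[str]:
    """Pick the most important violated policy and return its leading actions."""
    if not violated:
        return []
    most_important = max(violated, key=lambda p: p.priority)
    return list(most_important.actions[:max_actions])


# Hypothetical violation set reported by the Event Handler.
violations = [
    ExpectationPolicy("memory_threshold", priority=1,
                      actions=["adjust_max_clients"]),
    ExpectationPolicy("response_time_threshold", priority=3,
                      actions=["adjust_keep_alive", "adjust_max_clients"]),
]
chosen = decide(violations)
```

In a learning-enabled PDP, the fixed preference order of actions would be replaced by whatever ordering past experience suggests.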
Policy Enforcement Point (PEP)
This component defines an Application Programming Interface (API) which maps the actions subscribed by the PDP to the executable elements corresponding to the various Effectors.
Effector (E)
Effectors translate the policy decisions (i.e., corrective actions) into adjustments of configuration parameters. Note that there will be multiple instances of the Effectors for different types of resources (e.g., logical partitioning of CPUs, allocation of streaming buffers) or for the tuning parameters to be adjusted.
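The relationship between the PEP and the Effectors can be sketched as a registry that maps action names to callables which adjust configuration parameters. The class, the action names, and the `max_clients` parameter are hypothetical and serve only to illustrate the mapping.

```python
from typing import Callable, Dict


class PolicyEnforcementPoint:
    """Maps action names chosen by the PDP to registered effector callables."""

    def __init__(self):
        self.effectors: Dict[str, Callable[..., None]] = {}

    def register(self, action: str, effector: Callable[..., None]):
        self.effectors[action] = effector

    def enforce(self, action: str, **params) -> bool:
        """Run the effector for `action`; return False if none is registered."""
        effector = self.effectors.get(action)
        if effector is None:
            return False
        effector(**params)
        return True


# Hypothetical managed configuration and effector.
config = {"max_clients": 150}


def adjust_max_clients(delta: int):
    """Effector: change one tuning parameter by the requested amount."""
    config["max_clients"] += delta


pep = PolicyEnforcementPoint()
pep.register("adjust_max_clients", adjust_max_clients)
applied = pep.enforce("adjust_max_clients", delta=-25)
unknown = pep.enforce("resize_buffer", size=64)   # no effector registered
```

Returning a flag for unknown actions, rather than raising, leaves it to the caller to decide whether a missing Effector is an error.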
Event Log
This component archives traces of the management system's events in (1) an in-memory event log for capturing recent short-term events, and (2) a persistent event log on disk for capturing long-term history for later examination. Such events may include QoS requirement violations from the Event Handler, records of decisions made by the PDP in response to the violations, the actions enforced by the PEP, as well as other relevant management events.
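The two-tier arrangement described above can be illustrated with a bounded in-memory buffer for recent events and an append-only file for long-term history. This is a sketch under our own assumptions: the JSON-lines file format, the class name, and the event fields are hypothetical choices, not details given in the text.

```python
import json
import os
import tempfile
from collections import deque


class EventLog:
    """Bounded in-memory log for recent events plus an append-only file."""

    def __init__(self, path, recent_size=100):
        self.path = path
        self.recent = deque(maxlen=recent_size)  # oldest entries fall off

    def append(self, event: dict):
        """Record the event in memory and persist it as one JSON line."""
        self.recent.append(event)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def history(self):
        """Read the full persistent history back from disk."""
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


# Hypothetical events written to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
log = EventLog(path, recent_size=2)
for i in range(3):
    log.append({"seq": i, "kind": "violation"})
```

After three appends with a window of two, the in-memory log holds only the two most recent events while the file on disk retains all three.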
Event Analyzer
This component correlates events with respect to their contexts, performs trend analysis based on statistical information, and models complex situations for causality analysis and for predicting the outcomes of corrective actions. This enables the PDP to learn from the past, predict the future, make appropriate trade-offs, and choose optimal corrective actions. This component is an objective of future work.
Policy Tool
The Policy Tool provides an interface to the managed systems through which policies can be specified. These policies express the desired behavior of the managed system (in terms of CPU utilization, memory utilization, and response time thresholds) as well as the possible management actions to be taken whenever those objectives are violated.
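To make the notion of a policy with thresholds and management actions concrete, the sketch below builds a policy object from a plain dictionary such as a tool might produce. The dictionary layout, class names, metric names, thresholds, and action name are all hypothetical; the actual policy notation used by the Policy Tool is not described here.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Comparison operators a condition may use.
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    metric: str
    op: str
    threshold: float

    def holds(self, metrics: Dict[str, float]) -> bool:
        """True when the measured value crosses the threshold."""
        return OPERATORS[self.op](metrics[self.metric], self.threshold)


@dataclass(frozen=True)
class Policy:
    name: str
    conditions: List[Condition]   # all must hold for the policy to fire
    actions: List[str]            # possible management actions

    def is_violated(self, metrics: Dict[str, float]) -> bool:
        return all(c.holds(metrics) for c in self.conditions)


def policy_from_spec(spec: dict) -> Policy:
    """Build a Policy from a plain dictionary, rejecting unknown operators."""
    conditions = []
    for metric, op, threshold in spec["conditions"]:
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        conditions.append(Condition(metric, op, float(threshold)))
    return Policy(spec["name"], conditions, list(spec["actions"]))


# Hypothetical policy: fire when CPU utilization and response time are both high.
policy = policy_from_spec({
    "name": "cpu_and_latency",
    "conditions": [("cpu_utilization", ">", 85), ("response_time_ms", ">", 2000)],
    "actions": ["adjust_max_clients"],
})
fires = policy.is_violated({"cpu_utilization": 91.0, "response_time_ms": 2500.0})
quiet = policy.is_violated({"cpu_utilization": 91.0, "response_time_ms": 300.0})
```

Validating operators when the policy is built means a malformed specification is rejected at specification time rather than when a violation is being evaluated.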