Data Center Automation


Data Center Automation (DCA) consists of the processes and procedures implemented in a data center environment to automate day-to-day activities. These are the activities normally performed by system administrators, operators, programmers, and users.

The fundamental purpose of using computers is automation. However, computers rarely simplify or expedite the initial gathering of data; in fact, the opposite is usually true. What a computing environment provides is the power to manipulate data easily once it has been gathered, and to automate processes based on that data.

When analyzing processes for automation in a data center environment, the first question to ask is: will automating this process save or generate revenue? If not, it is not a candidate for automation. Implementing technology in a business scenario only for the sake of the technology itself is a waste of time, energy, and resources. If no monetary benefit is realized, the automation is of no benefit to the business. In a non-business scenario, the requirement of "monetary benefit" may be overridden by research or scientific curiosity, but the question should still be asked.

It is the duty and responsibility of everyone in an organization who uses computing resources to identify and document the procedures and processes that are candidates for automation. As previously stated, in a business scenario a candidate for automation is any activity that is performed more than once and whose automation will generate or increase monetary benefits. Priority for assigning resources to automate an activity is determined by how often it is performed and by the expected benefits.
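
The prioritization rule above can be sketched in Python. The record fields, the example activities, and the scoring formula (runs per year multiplied by benefit per run) are illustrative assumptions, not a prescribed method; an organization would substitute its own cost model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """An activity being considered for automation (hypothetical record)."""
    name: str
    runs_per_year: int        # how often the activity is performed
    benefit_per_run: float    # assumed money saved or earned per run


def prioritize(candidates):
    """Keep activities that repeat and pay off, highest annual benefit first."""
    eligible = [
        c for c in candidates
        if c.runs_per_year > 1 and c.benefit_per_run > 0
    ]
    return sorted(
        eligible,
        key=lambda c: c.runs_per_year * c.benefit_per_run,
        reverse=True,
    )


if __name__ == "__main__":
    backlog = [
        Candidate("rotate application logs", 365, 5.0),
        Candidate("provision test server", 40, 120.0),
        Candidate("one-off data migration", 1, 900.0),
    ]
    for c in prioritize(backlog):
        print(c.name, c.runs_per_year * c.benefit_per_run)
```

The one-off migration is filtered out because it is not repeated, even though its single-run benefit is the largest.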

System deployment, configuration, and implementation

Numerous software packages exist to automate system deployment, configuration, and implementation. These packages are typically large and work across heterogeneous platforms and environments.

High Availability deployment, configuration, and implementation

High availability is typically an automated process for minimizing the business function downtime associated with planned or unplanned outages. It typically uses replicated hardware platforms to eliminate single points of failure. The business function fail-over normally occurs between two or more physical frames within the same data center using a single shared storage system.

Disaster Recovery deployment, configuration, and implementation

Disaster Recovery (DR) is the implementation of a project plan which describes the tasks necessary to recover critical business functions after a disaster. The recovery occurs between geographically separated data centers using one or more methods of storage replication between the data centers. The initial implementation of a DR plan is normally a manual process that requires management declaration of a disaster. However, subsequent DR processes may be automated.

In the context of data center automation, the generation of a disaster recovery plan should be an automated process, or mostly automated. Unfortunately, this is typically not the case. In most instances, the DR plan is written and manually maintained by system administrators, application administrators, and other technical personnel.

Business Continuity compliance, configuration, and implementation

Business continuity consists of the activities performed on a daily basis to ensure the business operates normally today, tomorrow, and beyond. The business continuity plan may be thought of as a methodology, or as an enterprise-wide mentality of conducting day-to-day business.

Network resources allocation and deallocation

For the purpose of data center automation, allocation of network resources, such as IP addresses, must be programmable. This requires that available network addresses be stored in dynamic locations or databases that are accessible to other automated processes, such as system deployment and configuration. It also requires that node names, host names, and aliases be generated automatically on an as-needed basis and that name resolution services be updated automatically with this information.
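
A minimal in-memory sketch of such a programmable address store follows, using Python's standard ipaddress module. The class name, the "node" naming scheme, and the dictionary standing in for a name resolution service are all assumptions made for illustration; a production system would persist this state in a database and update real DNS records.

```python
import ipaddress


class AddressPool:
    """Tracks free and assigned addresses for one subnet (in-memory sketch)."""

    def __init__(self, cidr, host_prefix="node"):
        self.free = list(ipaddress.ip_network(cidr).hosts())
        self.host_prefix = host_prefix
        self.next_id = 1
        self.records = {}   # host name -> address; stands in for a DNS zone

    def allocate(self):
        """Assign the next free address and register a generated host name."""
        if not self.free:
            raise RuntimeError("address pool exhausted")
        address = self.free.pop(0)
        name = f"{self.host_prefix}{self.next_id:03d}"
        self.next_id += 1
        self.records[name] = address
        return name, address

    def release(self, name):
        """Remove the name record and return its address to the pool."""
        self.free.append(self.records.pop(name))


if __name__ == "__main__":
    pool = AddressPool("192.0.2.0/29")   # documentation-only address range
    print(pool.allocate())
```

Because allocation and release are ordinary function calls, a deployment process can request an address and host name without human involvement.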

Storage resources allocation and deallocation

Automatically deploying new systems as part of data center automation requires that storage for operating systems and applications be available for allocation. Automating these storage allocations and deallocations requires a Storage Area Network (SAN) with a programmable interface, such as scripts or APIs.

Dynamic CPU and Memory allocation and deallocation

Most modern systems can dynamically allocate and deallocate hardware resources such as CPU and memory. Automating these changes requires a programmable interface, such as scripts or APIs, to the hardware management system. Furthermore, in modern data center environments change control must be applied before modifications are implemented. Change requests must therefore be submitted automatically and approvals transmitted back to the hardware management system. Once approvals are received, the automated hardware change can proceed.
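
The approval gate described above might look like the following sketch. The ChangeRequest record, the Status values, and the dictionary standing in for the hardware management system are hypothetical; no real vendor interface is being modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ChangeRequest:
    """A request to resize one system (hypothetical record layout)."""
    system: str
    cpus: int
    memory_gb: int
    status: Status = Status.PENDING


def apply_if_approved(request, inventory):
    """Apply the resize only when the change request has been approved.

    `inventory` is a plain dict standing in for the hardware management
    system; a real one would be reached through its own scripts or API.
    """
    if request.status is not Status.APPROVED:
        return False
    inventory[request.system] = {
        "cpus": request.cpus,
        "memory_gb": request.memory_gb,
    }
    return True
```

The point of the design is that the hardware change and the approval check live in the same code path, so an unapproved change cannot be applied by accident.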

Process Scheduling

Assuming heterogeneous environments, data center automation requires a cross-platform process scheduling system that can start processes on a node, detect when each process is complete, and make decisions regarding the next step (and process) in a sequence of data processing procedures. These platforms may include mainframe, Unix, MS-Windows, and a wide variety of others.
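
The start, detect-completion, and decide cycle can be illustrated on a single node with Python's standard subprocess module. This is only a local sketch: the step names are invented, the "stop at the first failure" rule is an assumed policy, and a real cross-platform scheduler would also need agents or remote execution on each node.

```python
import subprocess
import sys


def run_step(args):
    """Start a process, wait for it to finish, and return its exit code."""
    return subprocess.run(args).returncode


def run_chain(steps):
    """Run steps in order; stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, args in steps:
        if run_step(args) != 0:
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # sys.executable keeps the demo portable across operating systems.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    fail = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_chain([("extract", ok), ("transform", fail), ("load", ok)]))
```

In the demo, the "load" step never starts because "transform" exits with a non-zero code.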

Performance Monitoring

Performance monitoring and notification is a fundamental piece of data center automation. Many problem avoidance procedures can be automatically implemented based on performance information. This information is also critical to determining service level agreement compliance. Notification of performance issues that cannot be automatically resolved can be forwarded to the appropriate personnel for further review.
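
As a rough sketch of this split between automatic remediation and human notification, the function below compares one sample of metrics against thresholds. The metric names, threshold values, and remedy names in the usage example are assumptions chosen for illustration.

```python
def evaluate(sample, thresholds, remedies):
    """Compare metrics to thresholds and sort breaches into two lists.

    Returns (actions, notifications): remedies to run automatically, and
    messages for staff about breaches that have no automated remedy.
    """
    actions, notifications = [], []
    for metric, value in sample.items():
        limit = thresholds.get(metric)
        if limit is None or value <= limit:
            continue
        if metric in remedies:
            actions.append(remedies[metric])
        else:
            notifications.append(f"{metric} at {value} exceeds {limit}")
    return actions, notifications


if __name__ == "__main__":
    thresholds = {"disk_used_pct": 90, "cpu_pct": 95}
    remedies = {"disk_used_pct": "purge_temp_files"}
    sample = {"disk_used_pct": 97, "cpu_pct": 99}
    print(evaluate(sample, thresholds, remedies))
```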

Error detection and problem resolution

Monitoring of system and application error logs provides a foundation for automated problem resolution.
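
One simple way to build on log monitoring is to map known error patterns to remedies, as in the sketch below. The two rules and their remedy names are hypothetical examples; real rules would come from the error messages an organization actually sees.

```python
import re

# Hypothetical rules: a pattern to look for and the remedy to trigger.
RULES = [
    (re.compile(r"No space left on device"), "extend_filesystem"),
    (re.compile(r"Connection refused"), "restart_service"),
]


def scan(lines, rules=RULES):
    """Return (line_number, remedy) for every log line matching a rule."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for pattern, remedy in rules:
            if pattern.search(line):
                findings.append((number, remedy))
                break   # first matching rule wins for this line
    return findings
```

The function only reports which remedy applies; whether to run it automatically or route it through change control is a separate decision.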

Security Management

User authentication and authorization are among the most time-consuming activities in any large computing environment. Many tools exist to automate these processes in heterogeneous environments; however, it is still a very difficult task. Once an organization implements a solution, an enterprise-wide policy must be adopted that requires all subsequent systems and applications to integrate with the selected solution. Non-compliance must require executive management approval.

Document Management

In a data center automation environment, technical personnel should not spend their time writing system documentation and procedures. The primary reason is that manually written documentation will always be obsolete or incomplete. Instead, system documentation and procedures should be generated automatically on a periodic basis to ensure they are current and complete.
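
A small sketch of generated documentation follows. It reads a few facts from the running system with Python's standard platform module and renders them as plain text. The choice of facts and the layout are assumptions; a real generator would cover far more (storage, network, installed software) and would be run by a scheduler.

```python
import platform
from datetime import date


def collect_facts():
    """Gather a few facts from the running system via the standard library."""
    return {
        "Hostname": platform.node(),
        "Operating system": platform.system(),
        "OS release": platform.release(),
        "Architecture": platform.machine(),
    }


def render(facts, generated_on):
    """Format the facts as a plain-text system summary."""
    lines = [f"System summary (generated {generated_on.isoformat()})", ""]
    width = max(len(key) for key in facts)
    for key, value in facts.items():
        lines.append(f"{key.ljust(width)} : {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(collect_facts(), date.today()))
```

Because the summary is rebuilt from the system itself on every run, it cannot drift out of date the way a hand-maintained document does.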

Change Management

Any large data center environment will require change requests to be submitted and approved before any planned work is performed. To support data center automation principles, the change request system must accept input programmatically through scripts, APIs, email, or some other mechanism that can be generated by the systems themselves, rather than by a human.
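
For instance, a system could assemble its own change request as a JSON document, as sketched below. Every field name, the "automation" submitter value, and the example host are assumptions; the actual format would be dictated by whichever change request system is in use.

```python
import json
from datetime import datetime, timezone


def build_change_request(host, summary, start, end):
    """Serialize a change request as JSON (all field names are assumed)."""
    request = {
        "host": host,
        "summary": summary,
        "window_start": start.isoformat(),
        "window_end": end.isoformat(),
        "submitted_by": "automation",
    }
    return json.dumps(request, sort_keys=True)


if __name__ == "__main__":
    start = datetime(2024, 1, 6, 2, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 6, 4, 0, tzinfo=timezone.utc)
    print(build_change_request("db01", "apply OS patches", start, end))
```

The resulting text could then be delivered by whatever channel the change system accepts, such as a script, an API call, or email.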

Audit Management

In support of data center automation concepts, audit compliance must be achievable through automated mechanisms, at least from the technical support personnel's perspective. Toolsets should be implemented with the ability to respond to audit requests via a "self-service" interface which grants access to audit information only to authorized auditors. This relieves the technical support staff from having to gather, format, and provide this information to numerous different auditors, several times a year.

Service Level Agreements

The Service Level Agreement (SLA) is an agreement between a business function owner and the service provider which designates the amount of time, on an annualized basis, that the business function will be available. Conversely, the SLA also designates the amount of time, on an annualized basis, for which the business function will NOT be available. SLAs are used in data center automation to schedule maintenance, updates, upgrades, and similar work, and every system in the data center must have an associated SLA in order to participate in the automated scheduling of services. This includes test, development, and sandbox systems.
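
The arithmetic behind an annualized SLA can be sketched as follows. Expressing the SLA as an availability percentage, using a 365-day year, and the fits_budget helper are assumptions for illustration; an actual agreement may define availability and exclusions differently.

```python
HOURS_PER_YEAR = 365 * 24   # 8760; leap years are ignored in this sketch


def allowed_downtime_hours(availability_pct):
    """Annual downtime budget implied by an availability percentage."""
    return HOURS_PER_YEAR * (100 - availability_pct) / 100


def fits_budget(availability_pct, downtime_used_hours, window_hours):
    """True if a maintenance window fits in the remaining downtime budget."""
    remaining = allowed_downtime_hours(availability_pct) - downtime_used_hours
    return window_hours <= remaining


if __name__ == "__main__":
    # A 99.9% SLA leaves roughly 8.76 hours of downtime per year.
    print(round(allowed_downtime_hours(99.9), 2))
    print(fits_budget(99.9, downtime_used_hours=5.0, window_hours=3.0))
```

An automated scheduler could call a check like fits_budget before booking maintenance, which is why a system without an SLA cannot take part in automated scheduling.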