Health Check Report Template - KeyInfo

VMware Assessment

for

Acme Corporation

Prepared by Key Information Systems Robert Pryor rpryor@

Key Info and ACME Confidential

Version History

Date       Ver.  Author        Description
6/2/2015   1     Robert Pryor  VMware Assessment
6/4/2015   2     Robert Pryor  Review Session with Scott
6/5/2015   3     Robert Pryor  Review Session with Bob
6/6/2015   4     Robert Pryor  Updated Findings and Links
6/20/2015  5     Robert Pryor  Couple minor fixes
6/30/2015  6     Robert Pryor  Final Report

Reviewers: Scott Youngs, Bob Sorace, Scott Youngs, Charles Hibbits

Contents

1. Executive Summary
2. VMware Assessment Background
   2.1 Scope
   2.2 Assumptions
   2.3 Constraints
   2.4 Methodology
3. Major Findings and Recommendations
   3.1 Operational
   3.2 Technical
4. vCenter Operations - Optimization Check
   4.1 Efficiency of vSphere Environment
   4.2 Risk in vSphere Environment
   4.3 Health of vSphere Environment
5. Health Check Assessment and Recommendations
   5.1 Compute
   5.2 Network
   5.3 Storage
   5.4 Virtual Datacenter
   5.5 Virtual Machine
Appendix A: Health Check Participants
Appendix B: Audited Inventory
Appendix C: Current Reference Architecture
Appendix D: Health Check Assessment Best Practices
Appendix E: References

1. Executive Summary

ACME engaged Key Information Systems to conduct a VMware Assessment. This engagement included an assessment of ACME's current vSphere deployment in terms of configuration, operations, and usage at both the Phoenix datacenter and Agoura Hills. A key part of this service was working with ACME IT staff to review the current environment and any relevant pain points. The purpose of this report is to document the discovery, analysis, and recommendations of the VMware Assessment.

The vSphere environment is a critical component of ACME IT operations, and this infrastructure is constantly changing. Overall the environment is stable with few issues, but ACME has asked Key Information to identify any areas for improvement. ACME would also like to investigate a few areas further.

Some of the areas ACME mentioned specifically were:

Capacity Planning

Performance Tweaks / Tuning

In general, the recommendations take into account best practices and industry experience from two different perspectives.

Operational Recommendations - Research and evaluate virtual infrastructure monitoring tools, such as vCenter Operations Manager, which was installed as part of this assessment. Ensure there is focus on process definition and improvement, specifically in the areas of systems monitoring, provisioning, and problem management.

Technical Recommendations - Implement consistent configurations across similar systems where possible; perform minor network adjustments; use redundant network configurations; and configure virtual machines to exploit the benefits of virtualization.

Some of these recommendations can be applied quickly, while other recommendations will involve additional planning and executive support. Key Information can provide assistance in each of the areas of recommendation.

2. VMware Assessment Background

2.1 Scope

This engagement applies to the ESX hosts at Phoenix and Agoura Hills, as well as a single vCenter server located in Phoenix. The Production and Development environments are included. Refer to Appendix A for the Health Check participant list.

2.2 Assumptions

This document is based on a number of assumptions as explained in Table 1.

Table 1: Assessment Assumptions

#     Description
A101  Current virtual infrastructure is based upon existing architecture and design assumptions that Key Information was not involved in initially.
A102  The network and storage components beyond those directly connected to the ESX hosts are out of scope.
A103  Sufficient consultant access and rights are available to perform the Health Check.

2.3 Constraints

In addition to the assumptions, there are a number of constraints, as listed in Table 2.

Table 2: Constraints of the ACME Assessment

#     Description
C102  Not all virtualization stakeholders were represented in this engagement (for example, end users and application teams).

2.4 Methodology

The VMware Assessment for ACME has the following primary objectives:

Assess and summarize the VMware vSphere environment in terms of its current health and architecture, with a focus on technical and organizational aspects.

Provide clear recommendations to improve the performance, manageability, and scalability of this environment.

Serve as a reference for ACME to review best practices and communicate current infrastructure issues among stakeholders.

Assess the VMware Operations Management data for any possible optimization or capacity gains.

To accomplish these objectives, Key approached this engagement in three ways. The first set of activities was a technical audit of the infrastructure. This was performed using VMware HealthAnalyzer, as well as through observations and measurements taken from the infrastructure components. The configurations captured with HealthAnalyzer included detailed, point-in-time settings for virtual machines, VMware ESX host servers, and VMware vCenter.

The second set of activities involved meeting with ACME to discuss any known issues, concerns, and configurations of the current virtual infrastructure.

The third set of activities was to install vCenter Operations Manager within the environment to collect and analyze data on the current VMware infrastructure for at least 30 days. This gives a better picture of health, risk, and efficiency.

Following these discovery activities, detailed analysis of the data proceeded. The data included raw configuration settings, performance metrics, screenshots, observational notes, and client-provided documentation. The analysis was driven by comparison of ACME data to industry best practices for vSphere infrastructure in the technical areas of:

Compute - ESX hypervisor and host hardware configuration

Network - Virtual and physical network infrastructure settings

Storage - Shared storage architecture and configuration

Virtual Datacenter - vCenter, monitoring, backup and other technology to support operations

Virtual Machine - Virtual workloads, application requirements

Based on the analysis, an assessment of the infrastructure follows. The summary findings and recommendations are presented in the following section of this document. The detailed assessment results are presented in a prioritized format.

Table 3 summarizes the different priority categories of the assessment.

Table 3: Report Card Priority Categories

Grade           Definition
P1              Specific items of concern that require immediate attention, with corresponding actions to address each concern.
P2              Items of potential concern noted. The items are either non-critical, or require further investigation.
P3              Deviation from best practices noted, but addressing these may not be an immediate priority.
OK              Items conform to best practices guidelines. No items of concern were noted.
No Data         We were unable to gather data to evaluate.
Not applicable  This item is not applicable.

For each assessment area, specific infrastructure checkpoints were measured and compared against VMware best practice guidelines.

3. Major Findings and Recommendations

3.1 Operational

As vSphere environments grow and evolve, managing and controlling the growth of ESX hosts and virtual machines can become a challenge. Without effective control over the infrastructure, virtual machine or ESX host sprawl can quickly diminish the return on investment from virtualization. A tool like vCenter Operations Manager provides a single-pane-of-glass dashboard that gives a good view of the overall health, capacity, and efficiency of the vSphere environment.

As discovered in discussions with ACME, templates were not always used to create virtual machines originally, but they are the standard today. Using templates helps provide consistency and speeds up the provisioning process. No formal provisioning documentation exists for the virtual machines or hosts; it should be developed to ensure consistency and to give the department baseline documentation of its processes.

Some basic operational recommendations are listed below.

3.1.1 Operational Recommendations

Research and evaluate a virtual infrastructure monitoring or management tool to supplement the features of vCenter, such as vCenter Operations Manager.

Ensure you are using templates for virtual machine creation when possible and update them regularly.

Develop a process for regular checks and audits of the environment, including but not limited to snapshots, unused VMs, and hardware settings.

Develop an ESX host provisioning document / process to ensure your hosts are set up consistently.

Standardize datastore sizes and VMFS versions; possibly conduct a brief storage workshop to determine the optimal size and RAID level for the datastores.
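
As a starting point for the datastore standardization item, the following minimal sketch groups datastores by size and VMFS version so that the number of distinct profiles is visible at a glance. The inventory rows, names, and field names are hypothetical; in practice they would come from an export of the datastore list.

```python
from collections import defaultdict

# Hypothetical inventory rows (names, sizes, and VMFS versions are made up).
DATASTORES = [
    {"name": "ds-prod-01", "capacity_gb": 2048, "vmfs": "5"},
    {"name": "ds-prod-02", "capacity_gb": 2048, "vmfs": "5"},
    {"name": "ds-prod-03", "capacity_gb": 1024, "vmfs": "5"},
    {"name": "ds-test-01", "capacity_gb": 2048, "vmfs": "3"},
]


def group_by_profile(datastores):
    """Map each (capacity_gb, vmfs) profile to the datastores that use it."""
    groups = defaultdict(list)
    for ds in datastores:
        groups[(ds["capacity_gb"], ds["vmfs"])].append(ds["name"])
    return dict(groups)


profiles = group_by_profile(DATASTORES)
for (capacity_gb, vmfs), names in profiles.items():
    print(f"{capacity_gb} GB, VMFS {vmfs}: {', '.join(names)}")
```

More than one profile in the output suggests the datastores are not yet standardized; which profile becomes the standard is a decision for the storage workshop.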

3.2 Technical

Most of the health check items are technical details from the vSphere environment. Technical details are heavily dependent on the processes and architecture that have been defined for the environment. At a minimum, the technical configurations should reflect the architectural decisions and the processes used in operations.

First, as requested, we reviewed the resource pool configuration. Resource pools are very useful when there is contention for resources, but they can quickly become troublesome if not managed and implemented correctly. In ACME's Phoenix environment, resource pools exist for Production, Test, and Retired Systems. First, we recommend removing the Retired Systems resource pool, as it is used only as a placeholder for virtual machines that are no longer in use. A better approach is to ensure those VMs are powered off, annotate them, and use folders to organize them if desired. Next, we recommend removing any limits on individual VMs. In most cases where a limit was set, it was set to the VM's total amount of RAM, so the limit serves no function at this time. Lastly, we recommend removing all reservations from both resource pools and allowing Shares to control the allocation of resources. ACME does not appear to have a resource contention problem at this point, so another option is to remove the resource pools (and the complexity that can come with them) altogether.

In terms of technical assessments, ACME's environment seems to be working without any major challenges, but there is always room for improvement. The data coming from the VMware Health Analyzer highlights some of the areas that can be focused on.
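
The limit and reservation review described above can be sketched as a simple check over exported settings. This is an illustration only: the records, field names, and values below are hypothetical and are not taken from ACME's environment.

```python
# Hypothetical VM and resource pool records; None means "no limit set".
VMS = [
    {"name": "app01", "memory_mb": 8192, "mem_limit_mb": 8192},
    {"name": "web01", "memory_mb": 4096, "mem_limit_mb": 2048},
    {"name": "file01", "memory_mb": 4096, "mem_limit_mb": None},
]
POOLS = [
    {"name": "Production", "mem_reservation_mb": 32768},
    {"name": "Test", "mem_reservation_mb": 0},
]


def vm_limit_findings(vms):
    """Flag VMs that have a memory limit and say whether it has any effect."""
    findings = []
    for vm in vms:
        limit = vm["mem_limit_mb"]
        if limit is None:
            continue  # no limit configured, nothing to remove
        if limit >= vm["memory_mb"]:
            findings.append((vm["name"], "limit equals or exceeds configured memory"))
        else:
            findings.append((vm["name"], "limit below configured memory"))
    return findings


def pools_with_reservations(pools):
    """Return names of resource pools that carry a memory reservation."""
    return [p["name"] for p in pools if p["mem_reservation_mb"] > 0]


print(vm_limit_findings(VMS))
print(pools_with_reservations(POOLS))
```

A limit that equals configured memory changes nothing and can simply be cleared, while a limit below configured memory deserves a closer look before removal because the guest may currently be constrained by it.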

One example is the lack of redundant 10Gb connections on the hosts in Phoenix. This is typically seen as a High Availability risk, and it is considered a best practice to have at least two uplinks for each vSwitch. Another area that could be addressed is enforcing some separation between management and vMotion traffic. A relatively simple fix is to force Management to use the first NIC and vMotion to use the second NIC; each could then use the other's primary link as its backup.
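
The crossed active/standby arrangement suggested above can be written down as a small data structure and checked for the two properties that matter: every traffic type has two uplinks, and the traffic types do not share an active uplink. The NIC names `vmnic0` and `vmnic1` and the helper functions are assumptions for illustration, not ACME's actual configuration.

```python
def crossed_failover(first_nic, second_nic):
    """Build a failover plan in which each traffic type is active on one NIC
    and uses the other NIC as standby."""
    return {
        "Management": {"active": [first_nic], "standby": [second_nic]},
        "vMotion": {"active": [second_nic], "standby": [first_nic]},
    }


def validate(plan):
    """Return a list of problems found in a failover plan (empty if none)."""
    problems = []
    for traffic, order in plan.items():
        uplinks = set(order["active"]) | set(order["standby"])
        if len(uplinks) < 2:
            problems.append(f"{traffic}: fewer than two uplinks")
    actives = [tuple(order["active"]) for order in plan.values()]
    if len(set(actives)) < len(actives):
        problems.append("traffic types share the same active uplink")
    return problems


plan = crossed_failover("vmnic0", "vmnic1")
print(validate(plan))
```

With this layout the two traffic types are separated during normal operation, and either one can fail over to the other NIC if its own link goes down.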

Another quick fix: a single host is not using NTP today. This highlights our operational recommendation to have a hardened provisioning document and/or process for building hosts. For this host, NTP should be configured to match the other hosts within the environment.
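
A check of this kind is easy to automate once the host settings are exported. The sketch below treats the most common NTP server list as the baseline and reports hosts that differ from it; the host names and NTP server names are hypothetical.

```python
from collections import Counter

# Hypothetical host-to-NTP-server mapping; an empty tuple means NTP is not configured.
HOSTS = {
    "esx01": ("ntp1.example.com", "ntp2.example.com"),
    "esx02": ("ntp1.example.com", "ntp2.example.com"),
    "esx03": (),
}


def ntp_outliers(hosts):
    """Return the most common NTP server list and the hosts that deviate from it."""
    baseline, _ = Counter(hosts.values()).most_common(1)[0]
    outliers = sorted(name for name, servers in hosts.items() if servers != baseline)
    return baseline, outliers


baseline, outliers = ntp_outliers(HOSTS)
print("Baseline:", baseline)
print("Hosts to fix:", outliers)
```

Using the majority setting as the baseline mirrors the recommendation to match the other hosts; a documented standard from the provisioning process would be a better baseline once one exists.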

The check also found snapshots that appear to have been in place since as far back as 2011. Snapshots should be used only for short-term purposes and removed as soon as possible. Not only can they use up valuable space, but they also become more difficult to clean up the longer they are in place. This scenario would also benefit from the routine checks and audits of the environment mentioned above.

Table 4 outlines only the most immediate technical recommendations (some of which are mentioned above). Address these issues as soon as possible. A more comprehensive list of recommendations follows, with remediation details and items that may not require immediate attention.
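
A routine snapshot audit can be as simple as comparing each snapshot's creation date against an age threshold. In the sketch below the snapshot records are hypothetical, and the three-day threshold is an assumption to be replaced by whatever retention policy ACME adopts.

```python
from datetime import date

# Hypothetical snapshot records (VM names, snapshot names, and dates are made up).
SNAPSHOTS = [
    {"vm": "app01", "name": "pre-patch", "created": date(2011, 3, 14)},
    {"vm": "db01", "name": "before-upgrade", "created": date(2015, 6, 1)},
]


def stale_snapshots(snapshots, as_of, max_age_days=3):
    """Return (vm, snapshot name, age in days) for snapshots older than the
    threshold, oldest first."""
    stale = []
    for snap in snapshots:
        age_days = (as_of - snap["created"]).days
        if age_days > max_age_days:
            stale.append((snap["vm"], snap["name"], age_days))
    return sorted(stale, key=lambda item: item[2], reverse=True)


report = stale_snapshots(SNAPSHOTS, as_of=date(2015, 6, 2))
for vm, name, age_days in report:
    print(f"{vm}: snapshot '{name}' is {age_days} days old")
```

Listing the oldest snapshots first puts the ones that are likely hardest to consolidate at the top of the cleanup list.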

3.2.1 Technical Recommendations

Table 4: Technical Recommendations

Finding  Priority  Component         Recommended Action Item
5.2.1    1         Network           Configure networking consistently across all hosts in a cluster.
5.2.2    1         Network           Configure management/service console, VMkernel, and virtual machine networks so that there is separation of traffic (physical or logical using VLANs).
5.2.3    1         Network           Verify that there is redundancy in networking paths and components to avoid single points of failure. For example, provide at least two paths to each network.
5.5.1    1         Virtual Machines  Use NTP, Windows Time Service, or another timekeeping utility suitable for the operating system.
5.5.3    1         Virtual Machines  Verify that VMware Tools is installed, running, and up to date for running virtual machines.
5.5.4    1         Virtual Machines  Limit use of snapshots, and when using snapshots limit them to short-term use.
5.5.5    1         Virtual Machines  Configure Windows virtual machines using 10Gb NICs with a minimum of 1GB of memory.
5.3.2    2         Storage           Allocate space on shared datastores for templates and media/ISOs separately from datastores for virtual machines.
5.3.3    2         Storage           Size datastores appropriately.
5.4.1    2         Datacenter        Size with VMware HA host failure considerations.
5.5.6    2         Virtual Machines  Use the latest version of VMXNET that is supported by the guest OS.
