Wednesday, March 22, 2017

Did You Know #6 - Using custom metrics groups in vROps for troubleshooting

Welcome back to the Did You Know Series on vRealize Operations Manager. As I mentioned in the first part of this series, the goal here is to unearth the Best Kept Secrets of vRealize Operations Manager.

This could be across features, functionalities, use cases, integrations, APIs or any tips or tricks which can help make day to day operations of SDDC easier and fun with vROps!

Today I will talk about how you can make troubleshooting an easier process with vROps. Troubleshooting, as we all know, is not a skill, it is a methodology. It would not be incorrect to say that each one of us has a different troubleshooting style. With vRealize Operations you can troubleshoot issues the way you like. Some prefer OOTB dashboards, some like to create their own personalized views, and some prefer to jump into what Iwan and I call God Mode, aka the All Metrics view in the product.

The All Metrics view can easily become complex, as it shows you all the metrics associated with an object type. vRealize Operations Manager 6.4 gave you some OOTB custom metric groups which list the common metrics around CPU, Memory and Disk. These were the OOTB options and might not fit all needs. If you are on vROps 6.4, click on a VM object, then click on All Metrics, and you will see this:

Apart from all metrics and all properties for the virtual machine, you can see 5 custom categories which list specific metrics that can be used for troubleshooting. As soon as I double click on CPU, I see all the key metrics pertaining to the CPU metric group in one shot on the right pane:

While this is a cool feature and makes troubleshooting really simple, there was one use case which could not be solved here. If, as an admin, I wanted to create my own metric group focused on key metrics of my own choice, I could not. This was not possible with vROps 6.4, but with the arrival of vROps 6.5 you can now create your own custom metric groups with the metrics you like to use for troubleshooting.

I will show you how to create one custom group at virtual machine level:

1- Click on any virtual machine in your environment.

2- Click on All Metrics.

3- Click on the blue wheel and click on the Add Group option.

4- Provide a name. I will call it "VM KPIs"

5- Now you can drag and drop any metric from the All Metrics group into this newly created metric group:

Here are a few KPIs which I added in my vROps as they help me troubleshoot in God Mode with a single click.....


CPU | Demand %
CPU | Usage %
CPU | CPU Contention %
CPU | CO-Stop %
CPU | Ready %

Memory | Usage %
Memory | Contention %
Memory | Balloon %
Memory | Swap In (KB)
Memory | Compressed (KB)

Virtual Disk | Aggregate of all instances | Commands Per Second
Virtual Disk | Aggregate of all instances | Total Latency
Disk Space | Snapshot | Virtual Machine Used (GB)
Guest File System Stats | Total Guest File System Free (GB)

Network I/O | Aggregate of all instances | Packets Dropped %

Host System KPIs

CPU|CPU Contention (%)
CPU|Demand (%)
Memory|Contention (%)
Memory|Total Capacity (KB)
Memory|Consumed (KB)
Memory|Usage (%)
Network I/O|Aggregate of all instances|Packets Dropped (%)

Cluster KPIs

CPU|CPU Contention (%)
CPU|Demand (%)
CPU|Max VM CPU Contention (%)
Memory|Balloon (KB)
Memory|Contention (%)
Memory|Max VM memory Contention (%)
Memory|Usage (%)
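If you want to keep lists like these in a text file or a repo, so you can recreate the groups by hand on another vROps instance, a small script can organize them. This is a minimal sketch, assuming only the "Category|Metric" naming shown above. The helper names `parse_metric` and `by_category` are made up for illustration, and nothing here talks to vROps itself.

```python
def parse_metric(path):
    """Split a 'Category | [Instance |] Metric' string into trimmed parts."""
    return tuple(part.strip() for part in path.split("|"))


def by_category(paths):
    """Group metric paths by their first element (CPU, Memory, ...)."""
    grouped = {}
    for path in paths:
        category, *rest = parse_metric(path)
        grouped.setdefault(category, []).append(" | ".join(rest))
    return grouped


# The Cluster KPIs list from this post, copied as plain strings.
CLUSTER_KPIS = [
    "CPU|CPU Contention (%)",
    "CPU|Demand (%)",
    "CPU|Max VM CPU Contention (%)",
    "Memory|Balloon (KB)",
    "Memory|Contention (%)",
    "Memory|Max VM memory Contention (%)",
    "Memory|Usage (%)",
]

cluster_by_category = by_category(CLUSTER_KPIS)

for category, metrics in cluster_by_category.items():
    print(category)
    for metric in metrics:
        print("  -", metric)
```

The same two helpers work for the VM and Host System lists, since stripping whitespace around the separator makes "CPU | Demand %" and "CPU|Demand (%)" style entries parse the same way.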

 Go configure your vROps with the metrics you like and make troubleshooting an easy and fun process...

And yeah. Keep sharing!!

Sunday, March 19, 2017

vROps Webinar 2017 - Announcing Part 1 : What's New with vROps 6.5

Welcome to the vRealize Operations Manager Webinar Series 2017. After the huge success of the series in 2016, we wanted to take a break, enjoy the success and come back with full vigor for this series in 2017. We are here and we are charged up to give you some more dope on vRealize Operations Manager in 2017.

The delivery mechanism will be the same as last year. We will start by talking about a topic and then jump into a live environment to see what happens when the rubber hits the road...

To begin the series, we will start with the latest release of vROps to see the enhancements VMware has made in the product and how customers can operationalize these features to make their operations simple and effective.

Here are the details: 👇👇👇

Session Title
Tuesday, 28th March 2017
1:00 PM to 2:00 PM Pacific Time
Sunny Dua, Simon Eady
Webinar Link
Save Invite

See you at the Webinar!!  👋👋👋👋

Looking back at vRealize Operations Webinar Series 2016

The series started back in 2016 when Simon & I decided that we needed to share the work which we are doing with our customers in the field. This would help individuals like us and customers to gain insight into how they can improve their operations by using the features which vROps has to offer. The key purpose was to show how the product would solve specific customer use cases, along with deep dives into architectures, commonly used features, tips and tricks and more.

We recorded a total of 12 episodes with a total of 15 hours' worth of material. With more than 10,000 views of our episodes on YouTube, the work was highly appreciated by audiences, and they have asked for more this year....

Before we start with the 2017 series... I quickly wanted to share all the material with you so that you can continue to learn....

Special thanks to Simon and the other guest speakers for making this happen!!

Hope you enjoyed the series..... Looking forward to producing more content in 2017.

Friday, March 17, 2017

Performance over Power : Make the right choice.

Power management is not a new topic when it comes to a hypervisor. We all know that one of the by-products of virtualization is "POWER SAVINGS". Even before you start realizing the other benefits of virtualization, the power bill is the first Opex saving which makes the return on investment on virtualization speak for itself.

The reason behind writing this article is to make customers aware that, since you have already saved a lot by virtualizing, you might not want to cut corners by scaling down the CPU frequency of an ESXi server to save even more power. For that matter, this applies to all the hypervisors in the industry. Once we consolidate tens of physical servers onto a single hypervisor, we already save a lot on power, and hence you should not worry about throttling down the CPU to save power on a hypervisor.

One can argue: if I can save more power by using the BIOS features and the hypervisor features to throttle down the CPU frequency, then why not? The answer lies in the trade-off, which in this case is CPU performance. While we all know that this throttle is dynamic and will automatically change on demand, the delay between when the demand is made and when the resource becomes available leads to contention. While basic applications might not be impacted by this contention, there will always be applications, and the underlying VMs, which are not happy with the latency introduced by this throttle. In layman's terms, this results in performance issues which are absolutely uncalled for.

I know I am not talking about something unique, and every vSphere Admin / Architect is aware of why "High Performance" for power management is critical. Still, I can assure you that there are a number of myths around how power management settings for a hypervisor such as ESXi should be done. Another reason behind highlighting this issue is that vRealize Operations Manager does a great job of tracking the latency which I described earlier. This latency is termed CPU Contention %. This is the percent of time the virtual machine is unable to run because it is contending for access to the physical CPUs.

If you dissect the statement I made above, there could be a number of reasons behind the inability of the VM to get what it wants. One of them is the efficiency lost due to processor frequency scaling, a.k.a. power management savings.

The scope of this post is power management, hence I will not delve into the other conditions for now. Before I go further and give you the exact power management settings, I would like to give you a real world example. The CPU contention faced by an application was extremely high due to incorrect power management settings. Once we changed them to a mode where the throttle was disabled, so that the CPU was available all the time and never snoozed, the contention dropped drastically and the application was humming along without any performance bottlenecks. Thanks to vROps, we could identify the issue and solve it within a matter of minutes 💪😁

In the metric chart below, I have a virtual machine which is facing CPU Contention % in the range of 10% to 27% in the months of November and December. This was when the application was reported to be sluggish and performing badly. In fact, if you observe, the application which was facing the issue is an in-memory database with an analytics engine (it is actually the vROps node itself).

The application actually went into a state where it stopped collecting data as well, and hence you see a gap from December 27th to January 8th. This is when things got out of hand and we decided to take action to reduce this contention.
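If you export a metric like this from a chart, the two symptoms described above (sustained high contention and a gap in collection) are easy to spot programmatically. This is a minimal sketch, assuming the data is a list of (timestamp, value) pairs. The 10% threshold, the one-day gap limit, the year, and the sample values are all made up for illustration and are not the real data behind the chart.

```python
from datetime import datetime, timedelta


def find_high_contention(samples, threshold_pct=10.0):
    """Return the (timestamp, value) samples at or above threshold_pct."""
    return [(ts, value) for ts, value in samples if value >= threshold_pct]


def find_collection_gaps(samples, max_interval=timedelta(days=1)):
    """Return (start, end) pairs where consecutive samples are further
    apart than max_interval, i.e. where data collection stopped."""
    ordered = sorted(ts for ts, _ in samples)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_interval]


# Illustrative samples only: daily CPU Contention % readings.
samples = [
    (datetime(2016, 12, 25), 12.0),
    (datetime(2016, 12, 26), 27.0),
    (datetime(2016, 12, 27), 18.5),
    (datetime(2017, 1, 8), 1.2),
    (datetime(2017, 1, 9), 0.8),
]

high = find_high_contention(samples)
gaps = find_collection_gaps(samples)

print("Samples above threshold:", len(high))
for start, end in gaps:
    print("No data between", start.date(), "and", end.date())
```

Pick the threshold and the expected collection interval to match your own environment. The point is only that a missing stretch of data is itself a signal worth alerting on.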

As I explained before, CPU contention could be due to a number of factors. Some of them include high overcommitment, overpopulation of VMs on a host, large virtual machines (crossing NUMA boundaries) and CPU throttling due to power management. Since we knew that this vROps node was the only VM running on that ESXi host, we immediately jumped to check the power management settings on vSphere (hypervisor) and the BIOS (hardware).

Yes, you need to check both and ensure they are set correctly for you to have continuous CPU availability. The correct settings would be:

➦ On vSphere set Power Management to "High Performance"

➦ In BIOS set Power Management to "OS Controlled" (requires restart of ESXi)

You can see from the metric chart below that we are plotting the power management setting of the ESXi host (where this VM was running) at both the hardware level (Power Management Technology) and the vSphere level (Power Management Policy) before and after the change.

Once the above change was made, the CPU Contention % experienced by that virtual machine dropped drastically and we had a well performing application and happy users. You can see the metric chart below, which shows the effect on the latency experienced by the VM after the change.

This is a simple yet very powerful example of how power management settings play a big role in giving you the best performance in your virtual environment. I would recommend that you act immediately to ensure that your environment is not suffering from this issue and that the virtual machines are getting what they are supposed to get from a CPU standpoint. Remember, poor CPU performance has a cascading effect on memory and I/O buses, and hence it is important that this is fixed as soon as possible.

If you have vROps, then it would be very simple for you to visualize the current settings across your environment and track this as a compliance metric going forward to ensure that any new ESXi host added to your environment provides the best in class CPU resources to serve your virtual machines.

If you are on vROps 6.4 or later, you can simply look at the ESXi host properties by listing them in a view to see what the power management settings are. If they are not correct, you now know that you have a task in hand 😁
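If you export such a view, the compliance check itself is only a few lines. This is a minimal sketch, assuming each host comes out as a simple record keyed by the two property names mentioned above. The host names and the non-compliant values are hypothetical, and the exact property strings in your export may differ.

```python
# Expected values, taken from the settings recommended in this post.
EXPECTED = {
    "Power Management Policy": "High Performance",   # vSphere level
    "Power Management Technology": "OS Controlled",  # BIOS / hardware level
}


def non_compliant_hosts(hosts, expected=EXPECTED):
    """Map host name -> {property: actual value} for every property
    whose value differs from the expected setting."""
    findings = {}
    for host in hosts:
        wrong = {
            prop: host.get(prop)
            for prop, want in expected.items()
            if host.get(prop) != want
        }
        if wrong:
            findings[host["name"]] = wrong
    return findings


# Hypothetical export of host properties (names and values are examples).
hosts = [
    {"name": "esxi-01.example.local",
     "Power Management Policy": "High Performance",
     "Power Management Technology": "OS Controlled"},
    {"name": "esxi-02.example.local",
     "Power Management Policy": "Balanced",
     "Power Management Technology": "OS Controlled"},
    {"name": "esxi-03.example.local",
     "Power Management Policy": "High Performance",
     "Power Management Technology": "Dynamic"},
]

report = non_compliant_hosts(hosts)

for name, wrong in report.items():
    print(name, "needs attention:", wrong)
```

A host that is missing a property altogether is reported with a value of None, which is usually what you want for a compliance list.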

Hope this helps..... 👊👊👊

Thursday, March 9, 2017

What is Fast... A recent interview with David Davis!

Working from the VMware headquarters definitely has more benefits than getting great Indian food and rubbing shoulders with the Who's Who of the Virtualization and Cloud Industry :-)

On a busy Monday afternoon, I was told that David Davis and his team from ActualTech Media were at the HQ, speaking with the Cloud Management Business folks about VMware's vRealize product line. David, as we all know, is a great inspiration and a mentor for many who want to learn about VMware and the technologies in the surrounding ecosystem. Just like others, I have followed him very closely and have always appreciated the work he has done for the vCommunity. We spoke at length about the work I am doing related to vRealize Operations Manager, blogging, the webinar series etc. Later we decided to do a recorded interview which might help the viewers of ActualTech Media and vXpress have a quick look at where we are going!!

While I have interviewed with him before, along with Iwan Rahabok during VMworld 2016, that interview never got published due to technical difficulties. Iwan was missed during this interview, but I had him covered I guess :-)

Here is the recording of the interview. It's just 15 minutes and I am sure it will be worth it!!