Channel: Kevin Holman's System Center Blog

OpsMgr R2 Release Candidate is available


The download is available on Connect

 

Operations Manager 2007 R2 introduces key new and enhanced functionality, including:

Enhanced application performance and availability across heterogeneous platforms

  • Delivers monitoring across Windows, Linux and Unix servers – all through a single console
  • Extends end to end monitoring of distributed applications to any workload running on Windows, Unix and Linux platforms
  • Maximize availability of virtual workloads with integration with System Center Virtual Machine Manager 2008

Improved management of applications in the data center

  • Delivers on the scale requirements of URL monitoring of your business
  • Meet agreed service levels with enhanced reporting showing application performance and availability
  • More efficient problem identification and action to resolve issues

Increased speed of access to information and functionality to drive management

  • Faster load times for views and results
  • Improved and simplified management pack authoring experience

For those who are evaluating the Beta release, this Release Candidate offers a number of enhancements over the Operations Manager R2 Beta, including:

-New Power Management MP template (Monitored system must be Windows Server 2008 R2 or Win7)
-Updated branding across all User Interfaces
-Improved trace configuration tools on the CD to help support issues escalated to Customer Support (where applicable)
-Improved Run As Account Distribution Configuration
-Ability to run inline tasks for non-Microsoft servers
-Support for upgrade from Beta deployments to the Release Candidate
-New and updated documentation, including the Usage Guide, Design Guide, Deployment Guide, Upgrade Guide, Security Guide and Operations Guide

Apply - to participate in the "Operations Manager Public Beta" - https://connect.microsoft.com/SelfNomination.aspx?ProgramID=2249&pageType=1&SiteID=446

Get it - System Center Operations Manager 2007 R2 (RC) - http://connect.microsoft.com/Downloads/DownloadDetails.aspx?SiteID=446&DownloadID=17271

Tell us what you think! - Give us your feedback - https://connect.microsoft.com/feedback/default.aspx?SiteID=446


R2 – Improved Agent Proxy Alerts


Here is a nice addition in R2:  When we give you the old “agent proxy alert”, we now tell you the name of the Agent that needs agent proxy enabled, and resolve the name of the object type that it was bringing in:

 

Nice improvement.  I enable agent proxy for SQL1CLN2 and get on with my day.

 

[image]

R2 – Improved override screens


Here are some changes in R2 that make Overrides a bit more intuitive, and help a LOT in understanding what has been done:

 

First up:  We used to have a lot of names for “Class”.  We used “Target”, “Instance Type”, “Object Type” and “Type” before… among others.  In the Override screens previously – we called it “All Objects of Type”.  Now – it is named “For all objects of class” (see image below).  This makes a lot more sense.

 

[image]

Unfortunately – in other screens – we still refer to the “Class” as a “Target”… so it still isn't 100% consistent.  :-(

 

 

Next up – we have a Default Value added!  So now – in any override – we can see the MP default, the override value, and (once applied) the effective value.

 

[image]

 

And now – in the same console – we immediately see all our overrides:

 

[image]

 

What's REALLY cool about this?  You can scope this new view based on just about any criteria you would want:

 

[image]

 

 

This will really help, and reduce some of the dependency on external tools like Override Explorer.

SCOM R2 hits RTM!


New Functionality - Operations Manager 2007 R2 introduces key new and enhanced functionality, including:

 

Enhanced application performance and availability across heterogeneous platforms

  • Delivers monitoring across Windows, Linux and UNIX servers–all through a single console.
  • Extends end to end monitoring of distributed applications to any workload running on Windows, UNIX and Linux platforms.
  • Maximize availability of virtual workloads with integration with System Center Virtual Machine Manager 2008.

Improved management of applications in the data center

  • Delivers on the scale requirements of URL monitoring of your business.
  • Meet agreed service levels with enhanced reporting showing application performance and availability.
  • More efficient problem identification and action to resolve issues.

Increased speed of access to information and functionality to drive management

  • Faster load times for views and results.
  • Improved and simplified management pack authoring experience

Where and when can I obtain the bits?

  • The RTM release is build 7221.
  • The bits will be available at General Availability (July 1).

Where can I find collateral, training, and more on Operations Manager 2007 R2?

Released on TechNet: Webcast Series on Operations Manager 2007 R2.

How else can I extend Operations Manager 2007 R2?

  • Service Level Dashboard v2 from the Solution Accelerators team lets you measure and report application or system performance & availability in near real time across your organization.  Using the Dashboard, you can easily spot trends and head off problems—before they occur.  The Dashboard also lets you create role-specific dashboards to support different departments, like HR, Finance, or Operations. Download it today from Microsoft Connect.
  • Operations Manager 2007 R2 Interoperability Connectors provide the ability to synchronize alerts and status between Operations Manager 2007 R2 and other management systems.  Beta connectors for Tivoli Enterprise Console, HP OpenView Operations, and the new Universal Connector can be obtained from the Operations Manager R2 download on Connect. Download the Interop Connectors from the System Center Catalog.
  • Operations Manager 2007 R2 Visio Add-in delivers the ability to link status and health information gathered by Operations Manager 2007 R2 into normally-static Visio diagrams, adding life and interaction to those diagrams. Download it today from Microsoft Connect.
  • New Exchange Server 2007 Management Pack (MP) Beta, which provides enhancements over the current Exchange MP such as reducing alert noise and enhanced performance. Download it today from Microsoft Connect.
  • BridgeWays MP Beta Program, providing beta MPs for MySQL, Apache, and Oracle running on Windows, Linux or Solaris.  For more information, and to register into the BridgeWays MP Beta Program, visit http://www.bridgeways.ca/bw_management-pack-beta-program-signup_form.php

New web-based forums for OpsMgr – no more NNTP newsgroups

My experience upgrading to OpsMgr R2 RTM


I upgraded my test lab from SP1 to R2-RTM this weekend.

 

My current test lab consists of the following servers:

OMRMS – Server 2003 - RMS role

OMMS3 – Server 2008 - MS role, Web Console

OMMS – Server 2003 - MS role, ACS collector

OMDB – Server 2003/SQL 2005 - OperationsManager Database

OMDW – Server 2003/SQL 2005 - OperationsManagerDW database, Reporting, SRS, ACSDB roles

 

There are 18 agents reporting to this management group.

 

So – I start – with a little light reading.

I begin with the release notes.  These are available from the R2 CD, and on the web at Operations Manager 2007 R2 Release Notes.  I don't see anything in there that is terribly applicable to me… but these are good to commit to short-term memory – in case we hit a snag during/after the upgrade.

Next – I move on to the Upgrade guide.  This is available in the TechNet Library – at Operations Manager 2007 Upgrade Guide.  I need to spend a little time on this one, mapping out the pre-upgrade steps, and then planning the order of my upgrade based on how my management group is deployed.

 

So – I start by running down the pre-upgrade checklist at: Preparing to Upgrade Operations Manager 2007

I record my service accounts, make sure my DB’s have plenty of free space, and my t-logs are sized big enough.  I make sure the volume with TempDB has plenty of free disk space in case TempDB needs to auto-grow. 

Next – I map out my plan – and order of operations, for my management group, and share the plan with my team:

  1. Get most recent backup of Database, Encryption key and Export unsealed MP’s for safekeeping.
  2. Go to pending actions – and reject/remove anything in there.
  3. Verify free space on SQL database and validate log size is appropriate.
  4. I need to uninstall the agent from OMTERM – my terminal server which has a console and an agent only.  I decide to go ahead and uninstall the agent, the console, and the SP1 authoring console as well, since I will be replacing it with the R2 auth console.  I will replace the agent and consoles when the upgrade is complete for the management group.
  5. I need to disable all my notification subscriptions, and disable my product connectors.  I am running a custom internal product connector – which runs as a service and updates alert properties – so I will stop and disable that service for the duration of the upgrade.
  6. I see a section on Improving Upgrade Performance so I will add that step here – right before I upgrade the first component.
  7. I am now ready to establish the upgrade order for my management group – this is available at: Planning your Operations Manager 2007 Upgrade
  8. RMS (OMRMS)
  9. Reporting Server (OMDW)
  10. Stand Alone Consoles (None – I uninstalled this already in my case)
  11. Management Servers (OMMS3, OMMS)
  12. Gateway Servers (None)
  13. Agents
  14. Web Console (on OMMS3 and OMMS)
  15. Post-Upgrade validation steps

Ok – that's my plan.  Time to get rolling.

The SP1 to R2 steps are outlined here:  Upgrading from Operations Manager 2007 SP1 to R2

I know from experience with customers – the success of your upgrade HINGES on how well you read AND follow the upgrade steps – VERBATIM.  The majority of issues we see (especially on clustered RMS) are when a customer does not follow the steps exactly as written, in the correct order.

 

I complete steps 1-7 in the plan above, and then start the RMS upgrade at step 8.  I run “SetupOM.exe” and kick off the pre-req checker before starting the install, where I hit my first snag.  I need to install WS-Management v1.1, because I do plan on monitoring Unix/Linux machines in the future with this management group.  (This was documented in the release notes, and in the upgrade guide – so I was expecting this… I should have added this to my plan)  So I install WS-man from the link provided in the pre-req, which just takes a few minutes.  Now – it looks much better in the pre-req checker:

[image]

 

The install instructions provided on TechNet are very straightforward.  The install took about 20 minutes for my small environment.  It waited the longest on “Loading Management Packs” on the screen in my environment.  It finally ended with an error:

 

[image]

 

The guide has a note on this – about the fact you might get a warning that a service failed to start – and to hit OK.  However – this is a different error – this is a service failing to stop…   I click OK, and then a few minutes later – setup completes.  I uncheck the box to start the console and to backup the encryption key.

 

I then ran the RMS upgrade validation steps – checking the registry and the services.  Registry setup version shows me all is good. 

***Note:  We have changed the service display names for R2.  See below:

[image]

 

I moved on to Reporting.  My SRS, Reporting, and DataWarehouse are all shared on a single server – OMDW.

 

As I read the guide at Upgrading from Operations Manager 2007 SP1 to R2 I notice this little tidbit – which needs to be given STRONG attention before I kick off the upgrade:

Prior to running the upgrade on the Reporting server, you must remove the Operations Manager 2007 agent; the upgrade will fail if this is not done.

So – I kick off the uninstall of the agent on the Reporting/SRS server (OMDW in my case) from Add/Remove programs – before I start the upgrade.  Missing little steps like this will drive you nuts if you aren't methodical.

After the agent uninstall – I pick back up on the guide – and kick off “SetupOM.exe”.  Since I am a freak – I go ahead and run a pre-req check just to make sure all is good:

[image]

 

Moving on…. I start the install according to the guide.  The install goes without a hitch, and took about 10 minutes to complete.

 

Next up – Management servers.  I start with OMMS3.  I hit the pre-req check – and I notice I already have WS-Man installed – so away I go.  The installer immediately failed with a pre-req failure.  I realized – I have the web console installed on this management server, and I forgot to add that when running the pre-req check manually.  When I do – I see: 

[image]

 

So – I need to grab the ASP.NET Ajax extensions…. this is to support the new cool health explorer in the Web Console.  I click “More” on the pre-req check – which gives me a link to the download.

After this little hurdle – the management servers upgraded very quickly.  Once again – I got an expected error about a failure to stop a service.

 

[image]

 

Click OK and setup completes.  I repeat this upgrade on the other management server (OMMS) and these are done.  A quick check of the registry – and the setup version is indeed 6.1.7221.0.

 

I don't have any gateways in this lab – so next up is agents.

 

Lucky me – all 18 agents show up in pending actions for an update.  I will approve them all – and let the management server push the update down and upgrade them. 

***Note – do not upgrade more than 299 agents in this manner at a time.  This is documented in the Upgrade Guide.

All my agents upgraded successfully except for two.  Both of the failed agents happened to be the two servers that I manually removed the SP1 agent from – OMTERM and OMDW.  (I forgot to delete their “agent managed” objects from the management group.)  Each fails with a different error.  OMTERM is failing to install with a push failure for MOMAgentInstaller.  I have had trouble with this agent before – possibly because of the TS role – so I just do a manual agent install here.  OMDW is different – the console push said it was a success – however, the System Center Management service (HealthService) will not start – it gives an error:

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7024
Date:        5/23/2009
Time:        1:09:37 AM
User:        N/A
Computer:    OMDW
Description:
The System Center Management service terminated with service-specific error 2147500037 (0x80004005).

I ran a repair action from the console – but got the same error here.  So – I manually uninstalled the broken agent – deleted the agent from the Agent Managed section of the console – and re-pushed the agent.  I had a little trouble getting these two to come into the management group… but eventually, after a couple of delete/reinstalls, they finally appear to be working ok.  I'd recommend uninstalling them from the console next time, since that removes both the agent and the computer object from the console.

 

Next on the list:  Web Console

From the upgrade guide I see this note….

If your Web console server is on the same computer as a management server, the Web console server is upgraded when the management server is upgraded, rendering this upgrade procedure unnecessary. You can still run the verification procedure to ensure that the Web console server upgrade was successful.

Good – my web console is not a stand-alone – it was running on a management server (OMMS3) so that is already taken care of.

Aha – I found something we forgot on the plan… the ACS Collector.  This role is missing from the table at Planning your Operations Manager 2007 Upgrade so I completely missed this as a planning step.  However, the process is documented at Upgrading from Operations Manager 2007 SP1 to R2.  So – we need to do this – I will do it last, since it is last in the detailed upgrade steps.  Following the guide… I walked through the steps – no issues.

 

Looks like we are done!  I will now start the post-upgrade validation steps to make sure my management group is actually working as it should without any major issues.

There is a list of post-upgrade checks at Completing the Post-Upgrade Tasks

 

I am going to walk through those here:

1.  I open up discovered inventory – and change target to “Health Service Watcher” and compare this to the list I had before the upgrade.  These are agents that have a problem from the management server perspective – which causes them to appear “grey” in all other views.  My list is the same as before I started – I have 6 in this list as critical – 5 of them are agents that are VM’s that are currently down – so this is good.  1 of them is an old management server… for some reason we don't groom these out of the view/database – and these seem to stick around forever in this view.

2.  I review the event logs on the RMS and all MS roles.  I am seeing some errors like below:

Event Type:    Warning
Event Source:    HealthService
Event Category:    Health Service
Event ID:    2120
Date:        5/23/2009
Time:        10:02:15 AM
User:        N/A
Computer:    OMRMS
Description:
The Health Service has deleted one or more items for management group "OPS" which could not be sent in 1440 minutes.

This is normal – it happens when you have agents that are down in your environment.

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    Data Warehouse
Event ID:    31552
Date:        5/23/2009
Time:        10:03:38 AM
User:        N/A
Computer:    OMRMS
Description:
Failed to store data in the Data Warehouse.
Exception 'SqlException': Sql execution failed. Error 777971002, Level 16, State 1, Procedure StandardDatasetGroom, Line 303, Message: Sql execution failed. Error 2812, Level 16, State 62, Procedure StandardDatasetGroom, Line 145, Message: Could not find stored procedure 'KMS_EventGroom'.

One or more workflows were affected by this. 

Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
Instance name: KMS Activation Event Data Set
Instance ID: {800D8126-6F72-CA84-A76B-A94F7E3C93CF}
Management group: OPS

This is not normal – this looks like an issue with the KMS MP – and R2’s advanced logging is picking up on an error that's been there all along, I just didn't know it.

That is all from the RMS – pretty clean.  On the Management servers…. I found a bit more – but they were all due to the problems I was having with a handful of agents.  Once I removed and fixed those agents – the MS logs are clean.

3.  No cluster in this lab – so nothing to test there.

4.  Review alerts in the console.  I sort by Repeat Count and LastModified (I add these to all my alert views) and look for anything that stands out as repeating a LOT, or something new that looks like a problem.  I don't see anything here – so that is good!

5.  DB server in perfmon looks good.  I examine % Processor Time, and Logical Disk Avg disk sec/read and Avg disk sec/write.  Those are both avg under 15ms (.015) on the DB and log volumes - so that looks good.   CPU is avg under 25%.

6.  Check all the console views.  Much snappier than in SP1.  Nice.

7.  I opened up reporting – and ran the “Microsoft ODR Report Library > Most Common Alerts” report – to test out reporting.  It ran with no issues.  I test a few of my saved custom and favorite reports – no errors – all good.

8.  Authoring pane looks good – I can see my groups, monitors, rules – and wow – they open a LOT faster than before.  Very nice.

9.  I check out my MP versions.  The install upgraded all my core MP’s to 6.1.7221.0.   I was already pretty current on my MP’s – so not much to do here now that needs my immediate attention.

10.  Re-enable notification subscriptions and product connectors.  I turn my subscriptions back on – and fire off a test event that I use to generate an alert and email me a notification.  Works great.  Next – I go to my custom product connector – and enable the service and start it back up again.  I run some test alerts – to make sure my product connector is taking all the necessary actions on the alerts – and forwarding them appropriately.  All good.

11.  Review My Workspace.  Yep – all my old custom views are there.

12.  Re-deploy agents.  I already did this.  Perhaps I should have waited on this step…. because I spent so much time troubleshooting those last few pesky agents that seem to have trouble.

13.  Oh – the BIG ONE.  This step is a bit odd – we tell you to go run this SQL query.  LET ME WARN YOU – this is not a “quick job”.  This is the script that is documented and discussed at my blog post:  Does your OpsDB keep growing?  Is your localizedtext table using all the space?  Don't take this step lightly – running this script could take several hours – so plan accordingly.  Read the link above for the details – and consider skipping this step for now… until you are sure you are ready to execute it.  Do some calculations based on the blog post above – how long it will take, and how severely you are impacted (row count of your localizedtext table) – and make sure you have a LOT of free space for TempDB and the TempDB log to grow if needed.  My LT table was already really small – so no issues for me running this – it completed in less than a minute.

Done!  (with the “official” steps)

 

Now – I just have a couple cleanup steps I need to do – like go back and install the Ops Console and the Auth Console back on my terminal server.  Did that without issue.  All looks good.

 

And then I realized – we are missing another step in our plan – under the post-upgrade tasks – make sure the web console is working!  I saw lots of items in the release notes about how this might break… and I imagine someone will complain rather quickly if it isn't working – so we better go check that out.

Sweet! I hit up the web console and it is all good.  I check out several of the new views – and run health explorer from the web console.  I have tasks, maintenance mode, and health explorer.  Very cool.  I even execute some of my favorite reports under “My Workspace” just to make sure those are good – ouch – not working.  I will have to look into that one.

 

Ok – that’s enough for today.  All in all – a successful upgrade.  A good plan written out at the beginning, based on the upgrade guide - makes all the difference.

Web Application recorder R2 – the recorder bar missing in IE?


Sometimes getting the web application recorder to capture a web session can be a little tricky.  I have blogged about some typical issues you might run into HERE

 

Something I noticed today, with R2:

When running the R2 console on an x64 machine – the web recorder bar is not coming up. 

On my x86 machine – it was working just fine, however.  I notice – when I go into IE settings, Tools, Manage Add-ons….. I see this on a working machine:

 

[image]

 

However – these add-ons are missing on my x64 based consoles.

 

The problem turned out to be that when you install the console on an x64 machine, it registers the x64 version of these add-ons.  However – the IE browser launched by default when you hit “Start Capture” is the x86 version.  You have to manually launch the x64 IE shortcut in order to use the web recorder browser.

 

So here are some steps to make this work:

 

1.  Open the web application editor

2.  Hit “Start Capture”

3.  This will launch Internet Explorer in 32 bit mode.  Close this browser.

4.  From the start menu, run Internet Explorer (64-bit)

5.  The web recorder will appear.  (If not – choose View > Explorer Bars > Web Recorder)

 

From there you can record your session, hit stop, and it will populate the web application tool as normal.  Just a minor inconvenience of closing one browser, and opening another.

The R2 Connectors are HERE!



 

 

Announcing System Center Operations Manager 2007 R2 Connectors

 

The Connectors are available for download at the following location:

http://www.microsoft.com/downloads/details.aspx?FamilyID=592e4143-c5c8-4270-9a7a-cd0a31ab3189

 

 

The following connectors are available in this initial RTM:

- Operations Manager 2007 R2 Connector for IBM Tivoli Enterprise Management Console

- Operations Manager 2007 R2 Connector for HP Operations Manager (formerly HP OpenView Operations)

- Operations Manager 2007 R2 Connector for BMC Remedy Action Request System (ARS)

- Operations Manager 2007 R2 Universal Connector

 

 

The Operations Manager 2007 R2 Connectors provide Operations Manager 2007 R2 alert forwarding to remote systems, such as an Enterprise Management System (EMS) or a help desk system. After Operations Manager 2007 R2 forwards an alert to a remote system, the alert data is synchronized throughout the lifetime of the alert.  The result of that data synchronization is a robust and seamless systems management environment. Such an environment enables cross-organization support processes to take advantage of the resources and strengths of formerly independent support groups. The ultimate effect is improved enterprise systems health through improved organizational communication.

Sharing data between Operations Manager 2007 R2 and remote systems enables enterprise correlation of events from Windows-based systems, hardware, network, and UNIX systems. Correlating these events allows IT staff to determine the causes of issues and reduce the time to resolution of IT outages.

Synchronization of data between Operations Manager 2007 R2 and remote systems also enables operational groups to use familiar management interfaces. Users update an alert by using their management tool, and the data is updated in tools that are used by other operational groups.

 

 

For more information regarding the Connectors, you can review the following resources:

Download Details:

- http://www.microsoft.com/downloads/details.aspx?FamilyID=592e4143-c5c8-4270-9a7a-cd0a31ab3189

TechNet Documentation:

- http://technet.microsoft.com/en-us/library/dd795265.aspx

TechNet Forums:

- General:  http://social.technet.microsoft.com/Forums/en-US/interopgeneral/threads

- Connector for IBM Tivoli Enterprise Console:  http://social.technet.microsoft.com/Forums/en-US/interoptivoli/threads

- Connector for HP Operations Manager:  http://social.technet.microsoft.com/Forums/en-US/interophpoperationsmanager/threads

- Connector for BMC Remedy ARS:  http://social.technet.microsoft.com/Forums/en-US/interopremedy/threads/

- Universal Connector:  http://social.technet.microsoft.com/Forums/en-US/interopuniversalconnector/threads


Nice clean Alert descriptions have been added to R2. Ahem.


I didn't realize this feature got added – very nice.

 

In OpsMgr SP1 – we had to use some hacks to get the Alert Description formatted to be nicely readable.  I wrote about this HERE.  The problem was – we could add a <BR/> to the alert description and get this to work in SP1 – but in the email subscriptions – it was not formatted the same way and showed the <BR/> as text.  You could have good readable emails or good readable alerts – but not both.

 

Now – in R2 – this is a much better story.

 

When authoring a rule against a test event, I can now hit “Enter” and start a new line, just like it should be:

 

[image]

 

In the console, this now formats exceptionally well – as expected:

 

[image]

 

 

However – the email is close – but not perfect.  This works most of the time as designed – but occasionally the email subscription does not pick up on the carriage returns.

 

 

[image]

 

 

Note the blobs in yellow above – this is where the Carriage Return did not get picked up.

 

All is not lost!

 

One trick: this might be caused by ending the line with a variable – as in my example.  What I did was simply end each variable statement with a real character – I added a “period” after each variable as shown:

 

[image]

 

 

Which now shows the email formatting as desired:

[image]

 

 

 

The XML:

What this does – behind the scenes – is add the following to the Alert Message ID Display String.  (Basically – you can just add carriage returns in the XML and they will be picked up correctly in R2):

 

        <DisplayString ElementID="MomUIGeneratedRule3407012ebced48c38440ed666eb0ae09.AlertMessage">
          <Name>Custom - Test alert on event 100 from rule</Name>
          <Description>A test event occurred.
The Event ID is: {0}.
The Logging Computer is: {1}.
The Event Source is: {2}.
The Event Level is: {3}.

Event Description = {4}</Description>
        </DisplayString>

 

What do the {numbers} mean?  Those are alert parameters.  It gets those from the Alert params section in the Rule XML:

 

<WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
  <Priority>2</Priority>
  <Severity>2</Severity>
  <AlertOwner />
  <AlertMessageId>$MPElement[Name="MomUIGeneratedRule3407012ebced48c38440ed666eb0ae09.AlertMessage"]$</AlertMessageId>
  <AlertParameters>
    <AlertParameter1>$Data/EventNumber$</AlertParameter1>
    <AlertParameter2>$Data/LoggingComputer$</AlertParameter2>
    <AlertParameter3>$Data/EventSourceName$</AlertParameter3>
    <AlertParameter4>$Data/EventLevel$</AlertParameter4>
    <AlertParameter5>$Data/EventDescription$</AlertParameter5>
  </AlertParameters>
  <Suppression />
  <Custom1 />
  <Custom2 />
  <Custom3 />
  <Custom4 />
  <Custom5 />
  <Custom6 />
  <Custom7 />
  <Custom8 />
  <Custom9 />
  <Custom10 />
</WriteAction>

How to monitor a process on a multi-CPU agent using ScaleBy


The business need:

It is a very common request to monitor a process on a given set of servers, and collect that data for reporting, or monitor it for a given threshold.

One thing you might notice when trying to monitor some performance counters, is that not all perf counters in perfmon behave the way you might assume.

For instance, I want to monitor “how much CPU a process is using”.  Perhaps we wish to monitor our SQLServer.exe process on our SQL servers?

This is easy – because Perfmon already has a Performance Object, Counter, and Instance for that.  In perfmon, we would use:

Process > % Processor Time > Sqlserver.exe

[image]

 

Easy enough!

So, we can quite easily create a performance threshold monitor, and a performance collection rule using this.  Let’s say we set the monitor to alert anytime the SQLserver.exe process is consuming more than 80% of the CPU sustained for 5 minutes.

 

The issue:

 

However, quite quickly we might notice erratic behavior from our monitor and rule.  The monitor is generating TONS of alerts from almost all our SQL servers, and then quickly closing them… essentially flip-flopping.  When we check the performance data we have collected, we see the process is using up to 800% CPU!!!  So – thinking something is wrong with OpsMgr – we inspect a busy SQL server in perfmon directly… but observe the exact same behavior:

[image]

 

As you can see – this process is using almost 400% CPU.  Why?  How is this possible?

 

This is because the Process monitoring counters in Windows are not multi-CPU aware.  When a server has 4 CPUs (like this one above does) a process can use more than one CPU at a time, provided it is spawning multiple threads.  This way, it can be using up to 100% of each CPU or core (logical processor).  A process on a 4-processor server can consume up to 400% on that process counter.  So if a process is really only consuming 20% of the total CPU, that will show up as 80% on a 4-core machine.  Think about today’s hardware… many boxes have up to 16 cores these days, which would register as 320% processor utilization for something really only using 20% of the total CPU.

As you can see – this causes a BIG problem for monitoring processes.  As an IT Pro, you need to know when a process is consuming more than (x) percent of the *total system resources*, and every server will likely have a different number of processors.
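To see the math involved (just an illustrative sketch, not SCOM code), normalizing the raw counter by the discovered logical processor count recovers the true percentage of total CPU:

```python
def normalize_cpu_percent(raw_percent: float, logical_processors: int) -> float:
    """Convert a raw Process\\% Processor Time value (which can exceed 100%
    on multi-processor machines) into a percentage of total CPU."""
    return raw_percent / logical_processors

# A process showing 320% raw on a 16-core server is really using 20% of total CPU
print(normalize_cpu_percent(320, 16))   # 20.0
```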

 

The solution:

 

In OpsMgr R2, a new XML-based function was created to help resolve this challenge: <ScaleBy>

The <ScaleBy> function essentially gives you the ability to take the integer value collected by a workflow and divide it by another integer.

I can input a fixed integer value here, or I can input a variable.  For the variable, I can pull data from discovered properties of monitoring classes.  This is GREAT in this instance, because we already discover the number of logical processors a Windows Computer has.  We can use this discovered data, along with the <ScaleBy> function, to fix monitors and collection rules that need a little massaging of the data we get from perfmon.

Here are the Windows Computer class properties:

image

 

Let’s walk through an example using the authoring console.

  • Open the Authoring console.
  • Create a new empty management pack.
  • Go to Health Model, Monitors, right click and create a new monitor. 
  • Windows Performance > Static Thresholds > Consecutive Samples.
  • Give your workflow an ID, Display Name, and choose a good target class which will contain your process.  I will use Windows Server Operating System for example purposes, but you want to always try and choose a target class that will have your process counter in perfmon.
  • Select System.Health.PerformanceState as the parent Monitor:

 

image

 

  • Browse a SQL server for the process object you will need – or type in the relevant data.  I will set the monitor to sample every minute.  For a monitor, this data is not collected and inserted into the database – the samples are kept on the agent for threshold evaluation – so we can monitor the process at a MUCH higher sample rate than we would ever use for a performance collection rule.

 

image 

 

  • I set my monitor to change state when 5 consecutive samples have all been over 80% CPU:

 

image

 

  • Click finish – then open the properties of the monitor you just created.  Go to the configuration tab.  Here are all the typical configurable items in a performance monitor workflow. 

 

image

 

  • However – we need to add one more – the <ScaleBy> function.

We have to do this in XML, as there is no UI for this capability.  Click “Edit” on the configuration tab, which will pop up the XML of this configuration.

We are going to add a single line after <Frequency> which will be this line:

<ScaleBy>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/LogicalProcessors$</ScaleBy>

What this does – is tell the workflow to take the numeric value received from perfmon, and then divide by the numeric value that is a property of the Windows Computer class for number of logical processors.  Then take THIS calculated output and use that for collection or threshold evaluation.

Here is my finished XML snippet:

 

<Configuration>
  <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <CounterName>% Processor Time</CounterName>
  <ObjectName>Process</ObjectName>
  <InstanceName>sqlservr</InstanceName>
  <AllInstances>false</AllInstances>
  <Frequency>60</Frequency>
  <ScaleBy>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/LogicalProcessors$</ScaleBy>
  <Threshold>80</Threshold>
  <Direction>greater</Direction>
  <NumSamples>5</NumSamples>
</Configuration>
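To make the configuration concrete, here is a hypothetical Python sketch (not how the agent actually implements it) of the evaluation this XML describes: each sample is divided by the ScaleBy value, and the monitor changes state only when 5 consecutive scaled samples exceed the threshold:

```python
from collections import deque

def make_monitor(threshold=80, num_samples=5, scale_by=4):
    """Simulates a Consecutive Samples over Threshold monitor with <ScaleBy>.
    scale_by stands in for the discovered LogicalProcessors property."""
    samples = deque(maxlen=num_samples)
    def on_sample(raw_value):
        samples.append(raw_value / scale_by)      # the <ScaleBy> division
        # unhealthy only when the window is full and every scaled sample breaches
        return len(samples) == num_samples and all(s > threshold for s in samples)
    return on_sample

monitor = make_monitor()
# Five consecutive raw samples of 340% on a 4-CPU box = 85% scaled each time
states = [monitor(340) for _ in range(5)]
print(states[-1])   # True
```

As the post notes, these samples stay on the agent; only state changes and alerts flow to the management group.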

 

Now – the authoring console was not updated to fully understand this new function, so you might see an error for this.  Simply hit ignore.

Your new monitor configuration now looks like this:

image

You can do the exact same operation on a performance collection rule as well to “normalize” this counter into something that makes more sense for reporting.

 

Some other uses of this might be for situations where a counter is reported in bytes and you want it reported in megabytes.  You could hard-code a <ScaleBy> value of 1000000 (one million).  That way, if you wanted to report on how many megabytes a process was consuming over time, instead of representing this as 349,000,000 (bytes) on a chart, you can represent it as a simple 349 megabytes.  That XML would simply be:

<ScaleBy>1000000</ScaleBy>
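As a quick illustration of that math (a hypothetical sketch, not SCOM code):

```python
def scale_bytes_to_megabytes(raw_bytes: int, scale_by: int = 1000000) -> float:
    """Mirrors <ScaleBy>1000000</ScaleBy>: divide the collected value so
    charts show megabytes instead of raw bytes.  (The post uses the decimal
    convention of 1,000,000 bytes per megabyte.)"""
    return raw_bytes / scale_by

print(scale_bytes_to_megabytes(349000000))   # 349.0
```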

Ok… I hope this made some sense.  This is a valuable method to normalize perfmon data that might not be in what I call “human format”.  Keep in mind: you can ONLY use this XML functionality in an R2 management group, and it will only be understood by an R2 agent.

You can quickly go back to your previously written process monitors, and add this single line of XML really easily, using your XML editor of choice.

 

One last thing I want to point out: some of the previously delivered MPs that Microsoft shipped might be impacted by this issue.  For instance, in the current ADMP version 6.0.7065.0 there is a monitor “AD_CPU_Overload.Monitor” (AD Processor Overload (lsass) Monitor) which does not take into account the number of logical processors.  This is often one of the MOST noisy monitors in my customer environments, especially on a busy domain controller, simply because MOST DCs have more than one CPU, which skews this monitor’s ability to work.  The issue is that adding the <ScaleBy> functionality to this MP would make the ADMP R2-only, which we don't want to do.

You have two workarounds for SP1 management groups: monitor processes using a script that queries WMI for the number of CPUs and handles the math (ugly), OR create groups of all Windows Computers based on their number of logical processors (easy) and then override these monitor thresholds with numeric values relevant to each group’s processor count.
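The override math for that group-based workaround is simply the inverse of ScaleBy: multiply the intended percent-of-total-CPU threshold by each group’s processor count. A hypothetical sketch:

```python
def override_threshold(total_cpu_percent: float, logical_processors: int) -> float:
    """Raw-counter threshold to set as an override for a processor-count group,
    so the monitor fires at the intended percentage of *total* CPU."""
    return total_cpu_percent * logical_processors

# For an 80%-of-total-CPU alert, the per-group override thresholds would be:
for cpus in (1, 2, 4, 8, 16):
    print(cpus, override_threshold(80, cpus))
```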

For R2 customers – I recommend disabling this monitor in the ADMP – and replacing it with a custom one that utilizes the <ScaleBy> functionality.

 

OpsMgr 2007 R2 Connectors - Cumulative Update 2 shipped


This is an update, if you use the Microsoft branded OpsMgr R2 connectors.

 

KB Article:  http://support.microsoft.com/?kbid=2274165

Download location:  http://www.microsoft.com/downloads/details.aspx?FamilyID=87c27d91-4549-4169-a87a-ca88e4136e4f&displaylang=en

 

This is a cumulative update – which also includes the fixes in the previously shipped update, 975774.

 

Changes in this update:

 

Support for HP Operations Manager for UNIX v9

This update enables support for HP Operations Manager for UNIX v9 with the Operations Manager 2007 R2 Connector for HP Operations Manager.  The Connector supports HP Operations Manager for UNIX v9 on the following platforms:

  • HPUX 11i v3 Itanium
  • Solaris 10 SPARC
  • Red Hat Enterprise Linux 5.2 x64

Support for BMC Remedy AR System 7.5

This update enables support for BMC Remedy ARS 7.5 with the Operations Manager 2007 R2 Connector for Remedy.  This includes Remedy ARS 7.5 installed on Red Hat and SLES platforms. 

Addressed incorrect dates passed to the remote system by the connector

The Connector performed an incorrect date conversion and forwarded the incorrect date to the remote system.  

Addressed Product Knowledge not forwarded when the locale is Canadian English

If the locale on the Operations Manager server is set to Canadian English, Product Knowledge was not forwarded from Operations Manager to the remote system.

 

This CU2 update includes all fixes from the previous update, 975774:

 

Issue 1: Potential high CPU usage of HP OVO Event Consumer for Windows (scinteropd.exe)

The Scinteropd.exe process of HP OVO Event Consumer for Windows sometimes increases CPU usage to a very high level. This problem occurs only in System Center Operations Manager 2007 R2 Connectors for HP Operations Manager on Windows. It occurs in all language versions of the connectors.

Issue 2: Synchronization issue in HP OVO Connector

HP Operations Manager and System Center Operations Manager 2007 R2 may be in an asynchronous state when alerts are forwarded that have already been resolved. Open events are still present in HP Operations Manager even though the corresponding alerts in System Center Operations Manager 2007 R2 are already resolved.

Issue 3: Remedy connectors do not populate the Computer Name and Domain fields

When an alert is forwarded from System Center Operations Manager 2007 R2 to Remedy, the Computer Name field and the Domain field are left blank in the Remedy notes field. This issue occurs in the release version of System Center Operations Manager 2007 R2 Connector for Remedy 6.3 and Remedy 7.1.

Issue 4: Unresponsiveness of the Interop provider when an Enterprise Management Service (EMS) API call returns an error

The Interop provider may stop responding and not recover when an API call to the EMS returns an error.
Note: Usually, the API call is to Remedy.
This issue typically occurs when large localized text strings in the Custom Field alert properties are forwarded to Remedy. However, it can also occur with other properties.
Note: This issue is associated with the Remedy connector. However, it might also occur with other connectors. Therefore, the update provided in this article includes the updated UNIX Interop providers. Do not apply the updated UNIX Interop providers unless you are instructed to do so by a support professional. To apply them, uninstall the existing Interop provider and install the new UNIX Interop provider packages that are included in this update package.

 

So – if you feel you might be impacted by any of these issues, or desire the added functionality of the new system versions to connect to – grab and apply this update.

SCOM Console crashes after October Windows cumulative updates – Resolved


 

There is an issue where, after patching your Windows server or workstation machine with the monthly cumulative updates, you might see your SCOM console crash with an exception.

 

This affects SCOM 2012 and SCOM 2016

 

image

 
 
Log Name:      Application
Event ID:      1000
Description:
Faulting application name: Microsoft.EnterpriseManagement.Monitoring.Console.exe, version: 7.2.11719.0, time stamp: 0x5798acae
Faulting module name: ntdll.dll, version: 10.0.14393.206, time stamp: 0x57dac931
 


 

 

 

The KB article explains the issue:   

System Center Operations Manager Management Console crashes after you install MS16-118 and MS16-126  https://support.microsoft.com/en-us/kb/3200006

 

 

We have released updated patches for each OS now, including the latest branch of Windows 10 and Windows Server 2016.

 

Smaller individual hotfixes are available for:

  • Windows Vista
  • Windows 7
  • Windows 8.1
  • Windows Server 2008
  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2

At the following location:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=3200006

(The Microsoft catalog requires Internet Explorer, FYI)

 

The fix was applied to the latest cumulative update for Windows 10 and Windows Server 2016:

For Windows 10 RTM:  https://support.microsoft.com/en-us/kb/3199125

For Windows 10 version 1511:  https://support.microsoft.com/en-us/kb/3200068

For the latest Windows 10 version 1607 and Windows Server 2016:  https://support.microsoft.com/en-us/kb/3197954

 

The Windows 10 and Server 2016 updates are available right now via Windows Update.

Deploying SCOM 2016 Agents to Domain controllers – some assembly required


 

image

 

Something that a fellow PFE (Brian Barrington) called to my attention, with SCOM 2016 agents, when installed on a Domain Controller:  the agent just sits there and does not communicate.

 

The reason?  Local System is denied by HSLOCKDOWN.

HSLockdown is a tool that grants or denies a particular RunAs account access to the SCOM agent Healthservice.  It is documented here.

 

When we deploy a SCOM 2016 agent to a domain controller – you might see it goes into a heartbeat failed state immediately, and on the agent – you might see the following events in the OperationsManager log:

 

Log Name:      Operations Manager
Source:        HealthService
Event ID:      7017
Task Category: Health Service
Level:         Error
Computer:      DC1.opsmgr.net
Description:
The health service blocked access to the windows credential NT AUTHORITY\SYSTEM because it is not authorized on management group SCOM.  You can run the HSLockdown tool to change which credentials are authorized.

 
Followed eventually by a BUNCH of this:

 

Log Name:      Operations Manager
Source:        HealthService
Event ID:      1102
Task Category: Health Service
Level:         Error
Computer:      DC1.opsmgr.net
Description:
Rule/Monitor “Microsoft.SystemCenter.WMIService.ServiceMonitor” running for instance “DC1.opsmgr.net” with id:”{00A920EF-0147-3FCC-A5DC-CEC1CA93AFED}” cannot be initialized and will not be loaded. Management group “SCOM”

If you open an Elevated command prompt, and browse to the SCOM agent folder – you can run HSLOCKDOWN /L to list the configuration:

 

image

 

There it is.  NT Authority\SYSTEM is denied.

 

I’ll be researching why this change was made – this did not happen by default in SCOM 2012 R2. 

In the meantime – the resolution is simple.

 

On domain controllers – simply run the following command in the agent path where HSLOCKDOWN.EXE exists:

 

HSLockdown.exe <YourManagementGroupName> /R “NT AUTHORITY\SYSTEM”

This will remove the explicit deny for Local System.  Then restart the SCOM agent service (Microsoft Monitoring Agent / HealthService).

Here is an example (my management group name is “SCOM”)

image

Does SCOM 2012 R2 support monitoring Windows Server 2016?


 

This has been coming up quite a bit lately –

The answer is YES, and we have updated the SCOM 2012 R2 documentation:

https://technet.microsoft.com/en-us/library/dn281931(v=sc.12).aspx

 

There is no minimum UR level required to support this.  However, we always recommend applying the most current cumulative update rollup to your SCOM agents.

 

 

Operations Manager Windows Agent

  • Windows Server 2003 SP2

  • Windows 2008 Server SP2

  • Windows 2008 Server R2

  • Windows 2008 Server R2 SP1

  • Windows Server® 2012

  • Windows Server® 2012 R2

  • Microsoft Hyper-V Server ® 2012 R2

  • Windows Server 2016

  • Windows XP Pro x64 SP2

  • Windows XP Pro SP3

  • Windows Vista SP2

  • Windows XP Embedded Standard

  • Windows XP Embedded Enterprise

  • Windows XP Embedded POSReady

  • Windows 7 Professional for Embedded Systems

  • Windows 7 Ultimate for Embedded Systems

  • Windows 7

  • Windows® 8

  • Windows® 8.1

  • Windows ® 10

  • Windows Server® 2016 Technical Preview

Monitoring UNIX/Linux with OpsMgr 2016


 

imageimage

 

Microsoft started including UNIX and Linux monitoring directly in OpsMgr 2007 R2, which shipped in 2009.  Significant updates were made to this for OpsMgr 2012, primarily around:

  • Highly available Monitoring via Resource Pools
  • Sudo elevation support for using a low priv account with elevation rights for specific workflows.
  • ssh key authentication
  • New wizards for discovery, agent upgrade, and agent uninstallation
  • Additional PowerShell cmdlets
  • Performance and scalability improvements
  • New monitoring templates for common monitoring tasks

Now – with SCOM 2016 – we have added:

  • Support for additional releases of operating systems:  (Link)
  • Increased scalability (2x) with asynchronous monitoring workflows
  • Easier agent deployment using existing RunAs account credentials
  • New Management Packs and Providers for LAMP stack
  • New UNIX/Linux Script templates to ease authoring  (Link)
  • Discovery filters for file systems  (Link)

 

I am going to do a step by step guide for getting this deployed with SCOM 2016.  As always – a big thanks to Tim Helton of Microsoft for assisting me with all things Unix and Linux.

 

 

High Level Overview:

 

  • Import Management Packs
  • Create a resource pool for monitoring Unix/Linux servers
  • Configure the Xplat certificates (export/import) for each management server in the pool.
  • Create and Configure Run As accounts for Unix/Linux.
  • Discover and deploy the agents

    Import Management Packs:

     

    The core UNIX/Linux libraries are already imported when you install OpsMgr 2016, but not the detailed MPs for each OS version.  These are on the installation media, in the \ManagementPacks directory.  Import the specific ones for the UNIX or Linux operating systems that you plan to monitor.

     

     

    Create a resource pool for monitoring Unix/Linux servers

     

    The next step is to create a UNIX/Linux monitoring resource pool.  This pool will be associated with management servers that are dedicated to monitoring UNIX/Linux systems in larger environments, or may include existing management servers that also manage Windows agents or gateways in smaller environments.  Regardless, it is a best practice to create a new resource pool for this purpose; it will ease administration and future scalability expansion.

    Under Administration, find Resource Pools in the console:

     

    image

     

    OpsMgr ships 3 resource pools by default:

     

    image

     

    Let’s create a new one by selecting “Create Resource Pool” from the task pane on the right, and call it “UNIX/Linux Monitoring Resource Pool”

     

    image

     

    Click Add and then click Search to display all management servers.  Select the Management servers that you want to perform Unix and Linux Monitoring.  If you only have 1 MS, this will be easy.  For high availability – you need at least two management servers in the pool.

    Add your management servers and create the pool.  In the actions pane – select “View Resource Pool Members” to verify membership.

     

    image

     

     

    Configure the Xplat certificates (export/import) for each management server in the pool

     

    Operations Manager uses certificates to authenticate access to the computers it is managing. When the Discovery Wizard deploys an agent, it retrieves the certificate from the agent, signs the certificate, deploys the certificate back to the agent, and then restarts the agent.

    To configure for high availability, each management server in the resource pool must have all the root certificates that are used to sign the certificates that are deployed to the agents on the UNIX and Linux computers. Otherwise, if a management server becomes unavailable, the other management servers would not be able to trust the certificates that were signed by the server that failed.

    We provide a tool to handle the certificates, named scxcertconfig.exe.  Essentially, you must log on to EACH management server that will be part of a UNIX/Linux monitoring resource pool and export its SCX (cross-platform) certificate to a file share, then import each other's certificates so they are all trusted.

    If you only have a SINGLE management server, or a single management server in your pool, you can skip this step, then perform it later if you ever add Management Servers to the Unix/Linux Monitoring resource pool.

    In this example – I have two management servers in my Unix/Linux resource pool, MS1 and MS2.  Open a command prompt on each MS, and export the cert:

    On MS1:

    C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS1.cer

    On MS2:

    C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS2.cer

    Once all certs are exported, you must IMPORT the other management server’s certificate:

    On MS1:

    C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS2.cer

    On MS2:

    C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS1.cer

    If you fail to perform the above steps – you will get errors when running the Linux agent deployment wizard later.
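Because every management server must import every other server’s certificate, the number of import operations grows as n × (n-1). Here is a hypothetical Python sketch (the server names and share path are placeholders) that generates the required command lines for a pool:

```python
def certconfig_commands(servers, share=r"\\fileserver\certs"):
    """Generate scxcertconfig.exe export/import command lines for a
    UNIX/Linux resource pool (run each command on the named server)."""
    cmds = []
    # each management server exports its own SCX certificate once
    for ms in servers:
        cmds.append((ms, f"scxcertconfig.exe -export {share}\\{ms}.cer"))
    # then imports the certificate of every OTHER management server
    for ms in servers:
        for other in servers:
            if other != ms:
                cmds.append((ms, f"scxcertconfig.exe -import {share}\\{other}.cer"))
    return cmds

for server, cmd in certconfig_commands(["MS1", "MS2"]):
    print(f"{server}: {cmd}")
```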

     

     

    Create and Configure Run As accounts for Unix/Linux

     

    Next up we need to create our run-as accounts for Linux monitoring.   This is documented here:  (Link) 

    We need to select “UNIX/Linux Accounts” under administration, then “Create Run As Account” from the task pane.  This kicks off a special wizard for creating these accounts.

     

    image

     

    Let’s create the Monitoring account first.  Give the monitoring account a display name, and click Next.

     

    image

     

    On the next screen, type in the credentials that you want to use for monitoring the UNIX/Linux system(s).  These accounts must exist on each UNIX/Linux system and have the required permissions granted:

     

    image

     

    On the above screen you have two choices: use a privileged account for monitoring, or use an unprivileged account that is elevated via sudo.  I will configure the most typical customer scenario, which is to leverage sudo elevation specifically granted in the sudoers file (more on that later).

     

    On the next screen, always choose “More Secure” and click “Create”:

    image

     

     

    Now – since we chose More Secure – we must choose the distribution of the Run As account.  Find your “UNIX/Linux Monitoring Account” under the UNIX/Linux Accounts screen, and open the properties.  On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

     

    image

     

    Next up – we will create the Agent Maintenance Account.

    This account is used for SSH, to be able to deploy, install, uninstall, upgrade, sign certificates, all dealing with the agent on the UNIX/Linux system.

     

    image

     

    image

     

    Give the account a name:

     

    image

     

    From here you can choose to use a SSH key, or a username and password credential only.  You also can choose to leverage a privileged account, or a regular account that uses sudo.  I will be choosing the most typical – which is an account that will leverage sudo:

     

    image

     

    Next – depending on your OS and elevation standards – choose to use SUDO or SU:

     

    image

     

    On the next screen, always choose “More Secure” and click “Create”:

    image

     

    Now – since we chose More Secure – we must choose the distribution of the Run As account.  Find your “UNIX/Linux Agent Maintenance Account” under the UNIX/Linux Accounts screen, and open the properties.  On the Distribution Security screen, click Add, then select “Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

     

    image

     

     

    Next up – we must configure the Run As profiles. 

    There are three profiles for Unix/Linux accounts:

    image

     

    The agent maintenance account is strictly for agent updates, uninstalls, anything that requires SSH.  This will always be associated with a privileged (or sudo elevated) account that has access via SSH, and was created using the Run As account wizard above.

    The other two Profiles are used for Monitoring workflows.  These are:

    Unix/Linux Privileged account

    Unix/Linux Action Account

    The Privileged Account profile will always be associated with a Run As account like we created above: either a privileged account, OR an unprivileged account that has been configured with elevation via sudo.  This is what any workflows that typically require elevated rights will execute as.

    The Action Account is what all your basic monitoring workflows will run as.  This will generally be associated with a Run As account, like we created above, that uses a non-privileged user account on the Linux systems and won’t request sudo elevation.

    ***A note on sudo elevated accounts:

    • sudo elevation must be passwordless (NOPASSWD).
    • requiretty must be disabled for the user (e.g., Defaults:scxmaint !requiretty).

     

    For my example, I am keeping it very simple.  I created two Run As accounts, one for monitoring and one for agent maintenance.  I will associate these Run As accounts with the appropriate Run As profiles.  

     

    I will start with the Unix/Linux Action Account profile.  Right click it – choose properties, and on the Run As Accounts screen, click Add, then select our “UNIX/Linux Monitoring Account”.  Leave the default of “All Targeted Objects” and click OK, then save.

    Repeat this same process for the Unix/Linux Privileged Account profile, and associate it with your “UNIX/Linux Monitoring Account”.

    Repeat this same process for the Unix/Linux Agent Maintenance Account profile, but use the “Unix/Linux Agent Maintenance Account”.

     

     

    Discover and deploy the agents

    Run the discovery wizard.

    image

    Click “Add”:

    image

     

    Here you will type in the FQDN of the Linux/UNIX agent and its SSH port, then choose “All Computers” as the discovery type.  (There is another discovery type option for when you manually install the UNIX/Linux agent – which is really just a simple provider – and then use a signed certificate to authenticate.)

    Check the box next to “Use Run As Credentials”.  This will leverage our existing Agent Maintenance account for the discovery and deployment. 

     

    image

     

    Click “Save”.  On the next screen – select a resource pool.  We will choose the resource pool that we already created.

     

    image

     

    Click Discover, and the results will be displayed:

    image

     

    Check the box next to your discovered system – and click “Manage” to deploy the agent.

     

    image

     

    DOH!

     

    There are many reasons this could fail.  The most common is rights on the UNIX/Linux systems you are trying to manage.  In this case, I didn’t configure sudo on the Linux box.  Let’s discuss that now.

    I need to modify the /etc/sudoers file on each UNIX/Linux server, to grant the granular permissions.

    NOTE:  The sudoers configuration has changed from SCOM 2012 R2 to SCOM 2016.  This is because we no longer install each package directly (such as .rpm packages).  Now, each agent is included in a .sh file that has logic to determine which packages are applicable, and install only those.  Because of this – even if you configured sudoers for SCOM 2012 R2 and previous support, you will need to make some modifications. 

    Here is a sample sudoers file for all operating systems, in SCOM 2016:

    #-----------------------------------------------------------------------------------
    #Example user configuration for Operations Manager 2016 agent
    #Example assumes users named: scxmaint & scxmon
    #Replace usernames & corresponding /tmp/scx-<username> specification for your environment

    #General requirements
    Defaults:scxmaint !requiretty

    #Agent maintenance
    ##Certificate signing
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem

    ##Install or upgrade
    #AIX
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --upgrade
    #HPUX
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --upgrade
    #RHEL
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --upgrade
    #SLES
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --upgrade
    #SOLARIS
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --upgrade
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --upgrade
    #Linux
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade

    ##Uninstall
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall

    ##Log file monitoring
    scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p

    ###Examples
    #Custom shell command monitoring example – replace <shell command> with the correct command string
    # scxmon ALL=(root) NOPASSWD: /bin/bash -c <shell command>
    #Daemon diagnostic and restart recovery tasks example (using cron)
    #scxmon ALL=(root) NOPASSWD: /bin/sh -c ps -ef | grep cron | grep -v grep
    #scxmon ALL=(root) NOPASSWD: /usr/sbin/cron &
    #End user configuration for Operations Manager agent
    #-----------------------------------------------------------------------------------

    Since the above file covers ALL operating systems plus examples, I am going to trim it down to just what I need for this Ubuntu Linux system:

     

    #-----------------------------------------------------------------------------------
    #Ubuntu Linux configuration for Operations Manager 2016 agent

    #General requirements
    Defaults:scxmaint !requiretty

    #Agent maintenance
    ##Certificate signing
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem

    ##Install or upgrade
    #Linux
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade

    ##Uninstall
    scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall

    ##Log file monitoring
    scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
    #-----------------------------------------------------------------------------------

     

    I will edit my sudoers file and insert this configuration.  You can use vi, visudo, or – my personal favorite, since I am a Windows guy – WinSCP, which gives you a GUI editor and helps any time you need to transfer files between Windows and UNIX/Linux over SSH.  Generally we want to place this configuration in the appropriate section of the sudoers file – not at the end, because there are items at the end of the file that need to stay there.  I put this right after the existing “Defaults” section in the existing sudoers configuration, and save it.
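Before merging the block into /etc/sudoers, it is worth syntax-checking it first – a typo in sudoers can lock you out of sudo entirely. A minimal sketch, assuming `visudo` is available (the scratch file path and the trimmed contents are illustrative; use the full block from this article):

```shell
#!/bin/sh
# Stage the new entries in a scratch file and syntax-check them
# before touching the real sudoers file.
tmp=/tmp/scom-sudoers.test
cat > "$tmp" <<'EOF'
#Ubuntu Linux configuration for Operations Manager 2016 agent
Defaults:scxmaint !requiretty
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
EOF

# 'visudo -c -f <file>' parses a candidate file without installing it.
if command -v visudo >/dev/null 2>&1; then
  visudo -c -f "$tmp" || echo "visudo reported a problem (see output above)"
else
  echo "visudo not available on this system"
fi
echo "sudoers staging complete"
```

Only once the check passes would you merge the entries into the real sudoers file (or drop them into /etc/sudoers.d/ on distributions that include that directory).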

    Now – back in SCOM – I retry the deployment of the agent:

    image

     

    image

     

     

    This will take some time to complete: the agent is checked for the correct FQDN and certificate, the management servers are inspected to ensure they all have trusted SCX certificates (the ones we exported/imported above), the connection is made over SSH, the package is copied down and installed, and the final certificate signing occurs.  If all of these checks pass, we get a success!

    There are several things that can fail at this point.  See the troubleshooting section at the end of this article.

     

     

    Monitoring Linux servers:

     

    Assuming we got all the way to this point with a successful discovery and agent installation, we need to verify that monitoring is working.  After an agent is deployed, the Run As accounts are used to run discoveries and begin monitoring.  Once enough time has passed, check the Administration pane, under Unix/Linux Computers, and verify that the systems are no longer listed as “Unknown” but are discovered as a specific version of the OS:

    Here it is immediately – before the discoveries complete:

     

    image

     

    Here is what we expect after a few minutes:

     

    image

     

     

    Next – go to the Monitoring pane – and select the “Unix/Linux Computers” view at the top.  Verify that your systems are present and have a green healthy check mark next to them:

     

    image

     

    Next – expand the Unix/Linux Computers folder in the left tree (near the bottom) and make sure we have discovered the individual objects, like Linux Server State, Logical Disk State, and Network Adapter state:

    image

     

    Run Health explorer on one of the discovered Linux Server State objects.  Remove the filter at the top to see all the monitors for the system:

     

    image

     

    Close health explorer. 

    Select the Operating System Performance view.   Review the performance counters we collect out of the box for each monitored OS.

    image

     

    Out of the box – we discover and apply a default monitoring template to the following objects:

    • Operating System
    • Logical disk
    • Network Adapters

    Optionally, you can enable discoveries for:

    • Individual Logical Processors
    • Physical Disks

    I don’t recommend enabling additional discoveries unless you are sure that your monitoring requirements cannot be met without discovering these additional objects, as they will reduce the scalability of your environment.

    Out of the box – for an OS like RedHat Enterprise Linux 5 – here is a list of the monitors in place, and the object they target:

    image

    There are also 50 or more rules enabled out of the box.  46 are performance collection rules for reporting, and 4 are event-based rules dealing with security.  Two are informational, letting you know whenever a direct login is made using root credentials via SSH, and whenever su elevation occurs within a user session.  The other two deal with failed SSH or su attempts.

    To get more out of your monitoring – you might have other services, processes, or log files that you need to monitor.  For that, we provide Authoring Templates with wizards to help you add additional monitoring, in the Authoring pane of the console under Management Pack templates:

     

    image

    image

    image

     

    In the reporting pane – we also offer a large number of reports you can leverage, or you can always create your own using our generic report templates, or custom ones designed in Visual Studio for SQL reporting services.

    image

     

     

    As you can see, it is a fairly well-rounded solution, bringing Unix and Linux monitoring into the same single pane of glass as your other systems – from the hardware, to the operating system, to the network layer, to the applications.

    Partners and 3rd party vendors also supply additional management packs which extend our Unix and Linux monitoring, to discover and provide detailed monitoring on non-Microsoft applications that run on these Unix and Linux systems.

     

     

    Troubleshooting:

    The majority of troubleshooting comes in the form of failed discovery/agent deployments.

    Microsoft has written a wiki on this topic, which covers the majority of these failures and how to resolve them:

    http://social.technet.microsoft.com/wiki/contents/articles/4966.aspx

    • For instance – if the DNS name that you provided does not match the DNS hostname on the Linux server, or does not match its SSL certificate, or if you failed to export/import the SCX certificates for multiple management servers in the pool, you might see:

    image

    Agent verification failed. Error detail: The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
    The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

    The SSL certificate is signed by an unknown certificate authority.
    It is possible that:
    1. The destination certificate is signed by another certificate authority not trusted by the management server.
    2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
    3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.


    The solution to these common issues is covered in the Wiki with links to the product documentation.
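The most common of these failures is the CN/FQDN mismatch in item 2, and you can inspect the CN yourself with openssl. A hedged sketch – on a real agent the certificate lives at /etc/opt/microsoft/scx/ssl/scx.pem; here a throwaway self-signed certificate is generated so the inspection commands can be tried anywhere, using the article's example hostname:

```shell
#!/bin/sh
# Generate a throwaway self-signed cert purely for demonstration;
# on the agent you would inspect /etc/opt/microsoft/scx/ssl/scx.pem.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/scx-demo.key -out /tmp/scx-demo.pem \
  -days 1 -subj "/CN=rh5501.opsmgr.net" 2>/dev/null

# The CN printed here must match the FQDN you typed into the
# discovery wizard, or agent verification fails as shown above.
openssl x509 -noout -subject -in /tmp/scx-demo.pem

# Against a live agent you can also read the certificate off the wire:
#   echo | openssl s_client -connect rh5501.opsmgr.net:1270 2>/dev/null | openssl x509 -noout -subject
```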

    • Perhaps – you failed to properly configure your Run As accounts and profiles.  You might see the following show as “Unknown” under administration:

    image

    Or you might see alerts in the console:

    Alert:  UNIX/Linux Run As profile association error event detected

    The account for the UNIX/Linux Action Run As profile associated with the workflow “Microsoft.Unix.AgentVersion.Discovery”, running for instance “rh5501.opsmgr.net” with ID {9ADCED3D-B44B-3A82-769D-B0653BFE54F9} is not defined. The workflow has been unloaded. Please associate an account with the profile.

    This condition may have occurred because no UNIX/Linux Accounts have been configured for the Run As profile. The UNIX/Linux Run As profile used by this workflow must be configured to associate a Run As account with the target.

    Either you failed to configure the Run As accounts, failed to distribute them, or you chose a low-privilege account that is not properly configured for sudo on the Linux system.  Go back and double-check your work there.
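On the agent side, you can sanity-check the low-privilege accounts directly. A sketch, assuming the scxmaint/scxmon account names used in this article (`sudo -l -U` needs root to inspect another user's grants):

```shell
#!/bin/sh
# Verify the monitoring accounts exist and see what sudoers grants them.
for u in scxmaint scxmon; do
  if id "$u" >/dev/null 2>&1; then
    echo "account $u exists"
    # Compare this output against the sudoers entries you added earlier.
    sudo -l -U "$u" 2>/dev/null || echo "could not list sudo grants for $u (run as root)"
  else
    echo "account $u does not exist"
  fi
done
status="complete"
echo "Run As account check $status"
```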

    If you want to check whether the agent was deployed to a RedHat system, you can run the following command in a shell session:

    image
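As a sketch of what such a check can look like (assuming the standard `scx` package name and install path – the exact command in the screenshot may differ, and the dpkg branch covers Debian/Ubuntu systems):

```shell
#!/bin/sh
# Confirm the SCX agent package landed on the box; only the branch
# matching the local package manager runs.
if command -v rpm >/dev/null 2>&1 && rpm -q scx >/dev/null 2>&1; then
  echo "scx agent package is installed (rpm)"
elif command -v dpkg >/dev/null 2>&1 && dpkg -s scx >/dev/null 2>&1; then
  echo "scx agent package is installed (dpkg)"
else
  echo "scx agent package not found"
fi

# Once installed, the agent can also report its own version:
#   /opt/microsoft/scx/bin/tools/scxadmin -version
checked=yes
```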


    SCOM SQL queries


     

    These queries work for SCOM 2012 and SCOM 2016.  Updated 12/11/2016
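All of the queries below can also be run non-interactively from a command prompt. A sketch using `sqlcmd` (the server and instance names are illustrative; `-E` uses integrated Windows authentication):

```shell
#!/bin/sh
# Run a query against the OperationsManager database with sqlcmd.
# Replace SQL2A\I01 with your own SQL server and instance.
if command -v sqlcmd >/dev/null 2>&1; then
  sqlcmd -S "SQL2A\\I01" -d OperationsManager -E \
    -Q "SELECT COUNT(*) FROM BaseManagedEntity"
else
  echo "sqlcmd is not installed on this machine"
fi
ran=yes
```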

     

     

    Large Table query.  (I am putting this at the top, because I use it so much – to find out what is taking up so much space in the OpsDB or DW)

    --Large Table query. I am putting this at the top, because I use it so much to find out what is taking up so much space in the OpsDB or DW
    SELECT TOP 1000
      a2.name AS [tablename],
      (a1.reserved + ISNULL(a4.reserved,0)) * 8 AS reserved,
      a1.rows AS row_count,
      a1.data * 8 AS data,
      (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data THEN (a1.used + ISNULL(a4.used,0)) - a1.data ELSE 0 END) * 8 AS index_size,
      (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used ELSE 0 END) * 8 AS unused,
      (row_number() OVER(ORDER BY (a1.reserved + ISNULL(a4.reserved,0)) DESC)) % 2 AS l1,
      a3.name AS [schemaname]
    FROM
      (SELECT ps.object_id,
              SUM(CASE WHEN (ps.index_id < 2) THEN row_count ELSE 0 END) AS [rows],
              SUM(ps.reserved_page_count) AS reserved,
              SUM(CASE WHEN (ps.index_id < 2) THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count) ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count) END) AS data,
              SUM(ps.used_page_count) AS used
       FROM sys.dm_db_partition_stats ps
       GROUP BY ps.object_id) AS a1
    LEFT OUTER JOIN
      (SELECT it.parent_id,
              SUM(ps.reserved_page_count) AS reserved,
              SUM(ps.used_page_count) AS used
       FROM sys.dm_db_partition_stats ps
       INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)
       WHERE it.internal_type IN (202,204)
       GROUP BY it.parent_id) AS a4 ON (a4.parent_id = a1.object_id)
    INNER JOIN sys.all_objects a2 ON (a1.object_id = a2.object_id)
    INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id)
    WHERE a2.type <> N'S' AND a2.type <> N'IT'

     

    Database Size and used space.  (People have a lot of confusion here – this will show the DB and log file size, plus the used/free space in each)

    --Database Size and used space.
    --this will show the DB and log file size plus the used/free space in each
    SELECT
      a.FILEID,
      [FILE_SIZE_MB] = CONVERT(decimal(12,2), ROUND(a.size/128.000, 2)),
      [SPACE_USED_MB] = CONVERT(decimal(12,2), ROUND(FILEPROPERTY(a.name,'SpaceUsed')/128.000, 2)),
      [FREE_SPACE_MB] = CONVERT(decimal(12,2), ROUND((a.size - FILEPROPERTY(a.name,'SpaceUsed'))/128.000, 2)),
      [GROWTH_MB] = CONVERT(decimal(12,2), ROUND(a.growth/128.000, 2)),
      NAME = LEFT(a.NAME, 15),
      FILENAME = LEFT(a.FILENAME, 60)
    FROM dbo.sysfiles a

     

    Operational Database Queries:

     

    Alerts Section (OperationsManager DB):

     

    Number of console Alerts per Day:

    --Number of console Alerts per Day:
    SELECT CONVERT(VARCHAR(20), TimeAdded, 102) AS DayAdded, COUNT(*) AS NumAlertsPerDay
    FROM Alert WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY CONVERT(VARCHAR(20), TimeAdded, 102)
    ORDER BY DayAdded DESC

     

    Top 20 Alerts in an Operational Database, by Alert Count

    --Top 20 Alerts in an Operational Database, by Alert Count
    SELECT TOP 20 SUM(1) AS AlertCount, AlertStringName AS 'AlertName', AlertStringDescription AS 'Description', Name, MonitoringRuleId
    FROM Alertview WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY AlertStringName, AlertStringDescription, Name, MonitoringRuleId
    ORDER BY AlertCount DESC

     

    Top 20 Alerts in an Operational Database, by Repeat Count

    --Top 20 Alerts in an Operational Database, by Repeat Count
    SELECT TOP 20 SUM(RepeatCount+1) AS RepeatCount, AlertStringName AS 'AlertName', AlertStringDescription AS 'Description', Name, MonitoringRuleId
    FROM Alertview WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY AlertStringName, AlertStringDescription, Name, MonitoringRuleId
    ORDER BY RepeatCount DESC

     

    Top 20 Objects generating the most Alerts in an Operational Database, by Repeat Count

    --Top 20 Objects generating the most Alerts in an Operational Database, by Repeat Count
    SELECT TOP 20 SUM(RepeatCount+1) AS RepeatCount, MonitoringObjectPath AS 'Path'
    FROM Alertview WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY MonitoringObjectPath
    ORDER BY RepeatCount DESC

     

     

    Top 20 Objects generating the most Alerts in an Operational Database, by Alert Count

    --Top 20 Objects generating the most Alerts in an Operational Database, by Alert Count
    SELECT TOP 20 SUM(1) AS AlertCount, MonitoringObjectPath AS 'Path'
    FROM Alertview WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY MonitoringObjectPath
    ORDER BY AlertCount DESC

     

    Number of console Alerts per Day by Resolution State:

    --Number of console Alerts per Day by Resolution State:
    SELECT
      CASE WHEN (GROUPING(CONVERT(VARCHAR(20), TimeAdded, 102)) = 1) THEN 'All Days' ELSE CONVERT(VARCHAR(20), TimeAdded, 102) END AS [Date],
      CASE WHEN (GROUPING(ResolutionState) = 1) THEN 'All Resolution States' ELSE CAST(ResolutionState AS VARCHAR(5)) END AS [ResolutionState],
      COUNT(*) AS NumAlerts
    FROM Alert WITH (NOLOCK)
    WHERE TimeRaised IS NOT NULL
    GROUP BY CONVERT(VARCHAR(20), TimeAdded, 102), ResolutionState WITH ROLLUP
    ORDER BY DATE DESC

     

     

    Events Section (OperationsManager DB):

     

    All Events by count by day, with a total for the entire database:  (this tells us how many events per day we are inserting, and helps us spot event storms, over-collection, and the effect of tuning rules that generate too many events)

    --All Events by count by day, with total for entire database
    SELECT CASE WHEN (GROUPING(CONVERT(VARCHAR(20), TimeAdded, 102)) = 1) THEN 'All Days' ELSE CONVERT(VARCHAR(20), TimeAdded, 102) END AS DayAdded,
           COUNT(*) AS EventsPerDay
    FROM EventAllView
    GROUP BY CONVERT(VARCHAR(20), TimeAdded, 102) WITH ROLLUP
    ORDER BY DayAdded DESC

     

    Most common events by event number and event source: (This gives us the event source name to help see what is raising these events)

    --Most common events by event number and event source
    SELECT TOP 20 Number AS EventID, COUNT(*) AS TotalEvents, Publishername AS EventSource
    FROM EventAllView eav WITH (NOLOCK)
    GROUP BY Number, Publishername
    ORDER BY TotalEvents DESC

     

    Computers generating the most events:

    --Computers generating the most events
    SELECT TOP 20 LoggingComputer AS ComputerName, COUNT(*) AS TotalEvents
    FROM EventallView WITH (NOLOCK)
    GROUP BY LoggingComputer
    ORDER BY TotalEvents DESC

     

     

    Performance Section (OperationsManager DB):

     

    Performance insertions per day:

    --Performance insertions per day:
    SELECT CASE WHEN (GROUPING(CONVERT(VARCHAR(20), TimeSampled, 102)) = 1) THEN 'All Days' ELSE CONVERT(VARCHAR(20), TimeSampled, 102) END AS DaySampled,
           COUNT(*) AS PerfInsertPerDay
    FROM PerformanceDataAllView WITH (NOLOCK)
    GROUP BY CONVERT(VARCHAR(20), TimeSampled, 102) WITH ROLLUP
    ORDER BY DaySampled DESC

     

    Top 20 performance insertions by perf object and counter name:  (This shows us which counters are likely overcollected or have duplicate collection rules, and filling the databases)

    --Top 20 performance insertions by perf object and counter name:
    SELECT TOP 20 pcv.ObjectName, pcv.CounterName, COUNT(pcv.countername) AS Total
    FROM performancedataallview AS pdv, performancecounterview AS pcv
    WHERE (pdv.performancesourceinternalid = pcv.performancesourceinternalid)
    GROUP BY pcv.objectname, pcv.countername
    ORDER BY COUNT(pcv.countername) DESC

     

    To view all performance data collected for a given computer:

    --To view all performance insertions for a given computer:
    SELECT DISTINCT Path, ObjectName, CounterName, InstanceName
    FROM PerformanceDataAllView pdv WITH (NOLOCK)
    INNER JOIN PerformanceCounterView pcv ON pdv.performancesourceinternalid = pcv.performancesourceinternalid
    INNER JOIN BaseManagedEntity bme ON pcv.ManagedEntityId = bme.BaseManagedEntityId
    WHERE path = 'sql2a.opsmgr.net'
    ORDER BY objectname, countername, InstanceName

     

    To pull all perf data for a given computer, object, counter, and instance:

    --To pull all perf data for a given computer, object, counter, and instance:
    SELECT Path, ObjectName, CounterName, InstanceName, SampleValue, TimeSampled
    FROM PerformanceDataAllView pdv WITH (NOLOCK)
    INNER JOIN PerformanceCounterView pcv ON pdv.performancesourceinternalid = pcv.performancesourceinternalid
    INNER JOIN BaseManagedEntity bme ON pcv.ManagedEntityId = bme.BaseManagedEntityId
    WHERE path = 'sql2a.opsmgr.net'
      AND objectname = 'LogicalDisk'
      AND countername = 'Free Megabytes'
    ORDER BY timesampled DESC

     

     

     

    State Section:

     

    To find out how old your StateChange data is:

    --To find out how old your StateChange data is:
    DECLARE @statedaystokeep INT
    SELECT @statedaystokeep = DaysToKeep FROM PartitionAndGroomingSettings WHERE ObjectName = 'StateChangeEvent'
    SELECT COUNT(*) AS 'Total StateChanges',
           COUNT(CASE WHEN sce.TimeGenerated > DATEADD(dd,-@statedaystokeep,GETUTCDATE()) THEN sce.TimeGenerated ELSE NULL END) AS 'within grooming retention',
           COUNT(CASE WHEN sce.TimeGenerated < DATEADD(dd,-@statedaystokeep,GETUTCDATE()) THEN sce.TimeGenerated ELSE NULL END) AS '> grooming retention',
           COUNT(CASE WHEN sce.TimeGenerated < DATEADD(dd,-30,GETUTCDATE()) THEN sce.TimeGenerated ELSE NULL END) AS '> 30 days',
           COUNT(CASE WHEN sce.TimeGenerated < DATEADD(dd,-90,GETUTCDATE()) THEN sce.TimeGenerated ELSE NULL END) AS '> 90 days',
           COUNT(CASE WHEN sce.TimeGenerated < DATEADD(dd,-365,GETUTCDATE()) THEN sce.TimeGenerated ELSE NULL END) AS '> 365 days'
    FROM StateChangeEvent sce

     

    Clean up old state changes for disabled monitors:  http://blogs.technet.com/kevinholman/archive/2009/12/21/tuning-tip-do-you-have-monitors-constantly-flip-flopping.aspx

    USE [OperationsManager]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    BEGIN
      SET NOCOUNT ON
      DECLARE @Err int
      DECLARE @Ret int
      DECLARE @DaysToKeep tinyint
      DECLARE @GroomingThresholdLocal datetime
      DECLARE @GroomingThresholdUTC datetime
      DECLARE @TimeGroomingRan datetime
      DECLARE @MaxTimeGroomed datetime
      DECLARE @RowCount int
      SET @TimeGroomingRan = getutcdate()
      SELECT @GroomingThresholdLocal = dbo.fn_GroomingThreshold(DaysToKeep, getdate())
      FROM dbo.PartitionAndGroomingSettings
      WHERE ObjectName = 'StateChangeEvent'
      EXEC dbo.p_ConvertLocalTimeToUTC @GroomingThresholdLocal, @GroomingThresholdUTC OUT
      SET @Err = @@ERROR
      IF (@Err <> 0)
      BEGIN
        GOTO Error_Exit
      END
      SET @RowCount = 1
      -- This is to update the settings table with the max groomed data
      SELECT @MaxTimeGroomed = MAX(TimeGenerated)
      FROM dbo.StateChangeEvent
      WHERE TimeGenerated < @GroomingThresholdUTC
      IF @MaxTimeGroomed IS NULL
        GOTO Success_Exit
      -- Instead of the FK DELETE CASCADE handling the deletion of the rows from
      -- the MJS table, do it explicitly. Performance is much better this way.
      DELETE MJS
      FROM dbo.MonitoringJobStatus MJS
      JOIN dbo.StateChangeEvent SCE ON SCE.StateChangeEventId = MJS.StateChangeEventId
      JOIN dbo.State S WITH (NOLOCK) ON SCE.[StateId] = S.[StateId]
      WHERE SCE.TimeGenerated < @GroomingThresholdUTC
        AND S.[HealthState] IN (0,1,2,3)
      SELECT @Err = @@ERROR
      IF (@Err <> 0)
      BEGIN
        GOTO Error_Exit
      END
      WHILE (@RowCount > 0)
      BEGIN
        -- Delete StateChangeEvents that are older than @GroomingThresholdUTC.
        -- We are doing this in chunks in separate transactions on purpose:
        -- to avoid the transaction log growing too large.
        DELETE TOP (10000) SCE
        FROM dbo.StateChangeEvent SCE
        JOIN dbo.State S WITH (NOLOCK) ON SCE.[StateId] = S.[StateId]
        WHERE TimeGenerated < @GroomingThresholdUTC
          AND S.[HealthState] IN (0,1,2,3)
        SELECT @Err = @@ERROR, @RowCount = @@ROWCOUNT
        IF (@Err <> 0)
        BEGIN
          GOTO Error_Exit
        END
      END
      UPDATE dbo.PartitionAndGroomingSettings
      SET GroomingRunTime = @TimeGroomingRan, DataGroomedMaxTime = @MaxTimeGroomed
      WHERE ObjectName = 'StateChangeEvent'
      SELECT @Err = @@ERROR, @RowCount = @@ROWCOUNT
      IF (@Err <> 0)
      BEGIN
        GOTO Error_Exit
      END
    Success_Exit:
    Error_Exit:
    END

     

    State changes per day:

    --State changes per day:
    SELECT CASE WHEN (GROUPING(CONVERT(VARCHAR(20), TimeGenerated, 102)) = 1) THEN 'All Days' ELSE CONVERT(VARCHAR(20), TimeGenerated, 102) END AS DayGenerated,
           COUNT(*) AS StateChangesPerDay
    FROM StateChangeEvent WITH (NOLOCK)
    GROUP BY CONVERT(VARCHAR(20), TimeGenerated, 102) WITH ROLLUP
    ORDER BY DayGenerated DESC

     

    Noisiest monitors changing state in the database in the last 7 days:

    --Noisiest monitors changing state in the database in the last 7 days:
    SELECT DISTINCT TOP 50 count(sce.StateId) AS StateChanges,
           m.DisplayName AS MonitorName,
           m.Name AS MonitorId,
           mt.typename AS TargetClass
    FROM StateChangeEvent sce WITH (NOLOCK)
    JOIN state s WITH (NOLOCK) ON sce.StateId = s.StateId
    JOIN monitorview m WITH (NOLOCK) ON s.MonitorId = m.Id
    JOIN managedtype mt WITH (NOLOCK) ON m.TargetMonitoringClassId = mt.ManagedTypeId
    WHERE m.IsUnitMonitor = 1
      -- Scoped to within last 7 days
      AND sce.TimeGenerated > dateadd(dd,-7,getutcdate())
    GROUP BY m.DisplayName, m.Name, mt.typename
    ORDER BY StateChanges DESC

     

    Noisiest Monitor in the database – PER Object/Computer in the last 7 days:

    --Noisiest Monitor in the database – PER Object/Computer in the last 7 days:
    SELECT DISTINCT TOP 50 count(sce.StateId) AS NumStateChanges,
           bme.DisplayName AS ObjectName,
           bme.Path,
           m.DisplayName AS MonitorDisplayName,
           m.Name AS MonitorIdName,
           mt.typename AS TargetClass
    FROM StateChangeEvent sce WITH (NOLOCK)
    JOIN state s WITH (NOLOCK) ON sce.StateId = s.StateId
    JOIN BaseManagedEntity bme WITH (NOLOCK) ON s.BasemanagedEntityId = bme.BasemanagedEntityId
    JOIN MonitorView m WITH (NOLOCK) ON s.MonitorId = m.Id
    JOIN managedtype mt WITH (NOLOCK) ON m.TargetMonitoringClassId = mt.ManagedTypeId
    WHERE m.IsUnitMonitor = 1
      -- Scoped to specific Monitor (remove the "--" below):
      -- AND m.MonitorName like ('%HealthService%')
      -- Scoped to specific Computer (remove the "--" below):
      -- AND bme.Path like ('%sql%')
      -- Scoped to within last 7 days
      AND sce.TimeGenerated > dateadd(dd,-7,getutcdate())
    GROUP BY s.BasemanagedEntityId, bme.DisplayName, bme.Path, m.DisplayName, m.Name, mt.typename
    ORDER BY NumStateChanges DESC

     

     

     

    Management Pack info:

     

    Rules section:

    --To find a common rule name given a Rule ID name:
    SELECT DisplayName FROM RuleView WHERE name = 'Microsoft.SystemCenter.GenericNTPerfMapperModule.FailedExecution.Alert'

    --Rules per MP:
    SELECT mp.MPName, COUNT(*) AS RulesPerMP
    FROM Rules r
    INNER JOIN ManagementPack mp ON mp.ManagementPackID = r.ManagementPackID
    GROUP BY mp.MPName
    ORDER BY RulesPerMP DESC

    --Rules per MP by category:
    SELECT mp.MPName, r.RuleCategory, COUNT(*) AS RulesPerMPPerCategory
    FROM Rules r
    INNER JOIN ManagementPack mp ON mp.ManagementPackID = r.ManagementPackID
    GROUP BY mp.MPName, r.RuleCategory
    ORDER BY RulesPerMPPerCategory DESC

    --To find all rules per MP with a given alert severity:
    DECLARE @mpid AS varchar(50)
    SELECT @mpid = managementpackid FROM managementpack WHERE mpName = 'Microsoft.SystemCenter.2007'
    SELECT rl.rulename, rl.ruleid, md.modulename
    FROM rules rl, module md
    WHERE md.managementpackid = @mpid
      AND rl.ruleid = md.parentid
      AND moduleconfiguration LIKE '%<Severity>2%'

    --Rules are stored in a table named Rules. This table has columns linking rules to classes and Management Packs.
    --To find all rules in a Management Pack use the following query and substitute in the required Management Pack name:
    SELECT * FROM Rules
    WHERE ManagementPackID = (SELECT ManagementPackID FROM ManagementPack WHERE MPName = 'Microsoft.SystemCenter.2007')

    --To find all rules targeted at a given class use the following query and substitute in the required class name:
    SELECT * FROM Rules
    WHERE TargetManagedEntityType = (SELECT ManagedTypeId FROM ManagedType WHERE TypeName = 'Microsoft.Windows.Computer')

     

    Monitors Section:

    --Monitors Per MP:
    SELECT mp.MPName, COUNT(*) AS MonitorsPerMP
    FROM Monitor m
    INNER JOIN ManagementPack mp ON mp.ManagementPackID = m.ManagementPackID
    GROUP BY mp.MPName
    ORDER BY COUNT(*) DESC

    --To find your Monitor by common name:
    SELECT * FROM Monitor m
    INNER JOIN LocalizedText LT ON LT.ElementName = m.MonitorName
    WHERE LTValue = 'Monitor Common Name'

    --To find your Monitor by ID name:
    SELECT * FROM Monitor m
    INNER JOIN LocalizedText LT ON LT.ElementName = m.MonitorName
    WHERE m.monitorname = 'your Monitor ID name'

    --To find all monitors targeted at a specific class:
    SELECT * FROM monitor
    WHERE TargetManagedEntityType = (SELECT ManagedTypeId FROM ManagedType WHERE TypeName = 'Microsoft.Windows.Computer')

     

    Groups Section:

    --To find all members of a given group (change the group name below):
    SELECT TargetObjectDisplayName AS 'Group Members'
    FROM RelationshipGenericView
    WHERE isDeleted = 0
      AND SourceObjectDisplayName = 'All Windows Computers'
    ORDER BY TargetObjectDisplayName

    --To find the entity data on all members of a given group (change the group name below):
    SELECT bme.*
    FROM BaseManagedEntity bme
    INNER JOIN RelationshipGenericView rgv WITH (NOLOCK) ON bme.basemanagedentityid = rgv.TargetObjectId
    WHERE bme.IsDeleted = '0'
      AND rgv.SourceObjectDisplayName = 'All Windows Computers'
    ORDER BY bme.displayname

    --To find all groups for a given computer/object (change the computer name in the query below):
    SELECT SourceObjectDisplayName AS 'Group'
    FROM RelationshipGenericView
    WHERE TargetObjectDisplayName LIKE ('%sql2a.opsmgr.net%')
      AND (SourceObjectDisplayName IN
        (SELECT ManagedEntityGenericView.DisplayName
         FROM ManagedEntityGenericView
         INNER JOIN (SELECT BaseManagedEntityId
                     FROM BaseManagedEntity WITH (NOLOCK)
                     WHERE (BaseManagedEntityId = TopLevelHostEntityId)
                       AND (BaseManagedEntityId NOT IN
                         (SELECT R.TargetEntityId
                          FROM Relationship AS R WITH (NOLOCK)
                          INNER JOIN dbo.fn_ContainmentRelationshipTypes() AS CRT ON R.RelationshipTypeId = CRT.RelationshipTypeId
                          WHERE (R.IsDeleted = 0)))) AS GetTopLevelEntities
           ON GetTopLevelEntities.BaseManagedEntityId = ManagedEntityGenericView.Id
         INNER JOIN (SELECT DISTINCT BaseManagedEntityId
                     FROM TypedManagedEntity WITH (NOLOCK)
                     WHERE (ManagedTypeId IN
                       (SELECT DerivedManagedTypeId
                        FROM dbo.fn_DerivedManagedTypes(dbo.fn_ManagedTypeId_Group()) AS fn_DerivedManagedTypes_1))) AS GetOnlyGroups
           ON GetOnlyGroups.BaseManagedEntityId = ManagedEntityGenericView.Id))
    ORDER BY 'Group'

     

    Management Pack and Instance Space misc queries:

    --To find all installed Management Packs and their version:
    SELECT Name AS 'ManagementPackID', FriendlyName, DisplayName, Version, Sealed, LastModified, TimeCreated
    FROM ManagementPackView
    WHERE LanguageCode = 'ENU' OR LanguageCode IS NULL
    ORDER BY DisplayName

    --Number of Views per Management Pack:
    SELECT mp.MPName, v.ViewVisible, COUNT(*) AS ViewsPerMP
    FROM [Views] v
    INNER JOIN ManagementPack mp ON mp.ManagementPackID = v.ManagementPackID
    GROUP BY mp.MPName, v.ViewVisible
    ORDER BY v.ViewVisible DESC, COUNT(*) DESC

    --How to gather all the views in the database, their ID, MP location, and view type:
    SELECT vv.id AS 'View Id',
           vv.displayname AS 'View DisplayName',
           vv.name AS 'View Name',
           vtv.DisplayName AS 'ViewType',
           mpv.FriendlyName AS 'MP Name'
    FROM ViewsView vv
    INNER JOIN managementpackview mpv ON mpv.id = vv.managementpackid
    INNER JOIN viewtypeview vtv ON vtv.id = vv.monitoringviewtypeid
    -- WHERE mpv.FriendlyName LIKE '%default%'
    -- WHERE vv.displayname LIKE '%operating%'
    ORDER BY mpv.FriendlyName, vv.displayname

    --Classes available in the DB:
    SELECT count(*) FROM ManagedType

    --Total BaseManagedEntities
    SELECT count(*) FROM BaseManagedEntity

    --To get the state of every instance of a particular monitor the following query can be run (replace 'Health Service Heartbeat Failure' with the name of the monitor):
    SELECT bme.FullName, bme.DisplayName, s.HealthState
    FROM state AS s, BaseManagedEntity AS bme
    WHERE s.basemanagedentityid = bme.basemanagedentityid
      AND s.monitorid IN (SELECT Id FROM MonitorView WHERE DisplayName = 'Health Service Heartbeat Failure')

    --For example, this gets the state of the Microsoft.SQLServer.2012.DBEngine.ServiceMonitor for each instance of the SQL 2012 Database Engine class:
    SELECT bme.FullName, bme.DisplayName, s.HealthState
    FROM state AS s, BaseManagedEntity AS bme
    WHERE s.basemanagedentityid = bme.basemanagedentityid
      AND s.monitorid IN (SELECT MonitorId FROM Monitor WHERE MonitorName = 'Microsoft.SQLServer.2012.DBEngine.ServiceMonitor')

    --To find the overall state of any object in OpsMgr the following query should be used to return the state of the System.EntityState monitor:
    SELECT bme.FullName, bme.DisplayName, s.HealthState
    FROM state AS s, BaseManagedEntity AS bme
    WHERE s.basemanagedentityid = bme.basemanagedentityid
      AND s.monitorid IN (SELECT MonitorId FROM Monitor WHERE MonitorName = 'System.Health.EntityState')

    --The Alert table contains all alerts currently open in OpsMgr. This includes resolved alerts until they are groomed out of the database.
    --To get all alerts across all instances of a given monitor use the following query and substitute in the required monitor name:
    SELECT * FROM Alert
    WHERE ProblemID IN (SELECT MonitorId FROM Monitor WHERE MonitorName = 'Microsoft.SQLServer.2012.DBEngine.ServiceMonitor')

    --To retrieve all alerts for all instances of a specific class use the following query and substitute in the required table name; in this example MT_Microsoft$SQLServer$2012$DBEngine is used to look for SQL alerts:
    SELECT * FROM Alert
    WHERE BaseManagedEntityID IN (SELECT BaseManagedEntityID FROM MT_Microsoft$SQLServer$2012$DBEngine)

    --To determine which table is currently being written to for event and performance data use the following query:
    SELECT * FROM PartitionTables WHERE IsCurrent = 1

    --Number of instances of a type: (number of disks, computers, databases, etc. that OpsMgr has discovered)
    SELECT mt.TypeName, COUNT(*) AS NumEntitiesByType
    FROM BaseManagedEntity bme WITH (NOLOCK)
    LEFT JOIN ManagedType mt WITH (NOLOCK) ON mt.ManagedTypeID = bme.BaseManagedTypeID
    WHERE bme.IsDeleted = 0
    GROUP BY mt.TypeName
    ORDER BY COUNT(*) DESC

    --To retrieve all performance data for a given rule in a readable format use the following query: (change the r.RuleName value – get list from the Rules table)
    SELECT bme.Path, pc.ObjectName, pc.CounterName, ps.PerfmonInstanceName, pdav.SampleValue, pdav.TimeSampled
    FROM PerformanceDataAllView AS pdav WITH (NOLOCK)
    INNER JOIN PerformanceSource ps ON pdav.PerformanceSourceInternalId = ps.PerformanceSourceInternalId
    INNER JOIN PerformanceCounter pc ON ps.PerformanceCounterId = pc.PerformanceCounterId
    INNER JOIN Rules r ON ps.RuleId = r.RuleId
    INNER JOIN BaseManagedEntity bme ON ps.BaseManagedEntityID = bme.BaseManagedEntityID
    WHERE r.RuleName = 'Microsoft.Windows.Server.6.2.LogicalDisk.FreeSpace.Collection'
    GROUP BY PerfmonInstanceName, ObjectName, CounterName, SampleValue, TimeSampled, bme.path
    ORDER BY bme.path, PerfmonInstanceName, TimeSampled

    --To determine what discoveries are still associated with a computer – helpful in finding old stale computer objects in the console that are no longer agent managed, or desired:
    SELECT BME.FullName, DS.DiscoveryRuleID, D.DiscoveryName
    FROM typedmanagedentity TME
    JOIN BaseManagedEntity BME ON TME.BaseManagedEntityId = BME.BaseManagedEntityId
    JOIN DiscoverySourceToTypedManagedEntity DSTME ON TME.TypedManagedEntityID = DSTME.TypedManagedEntityID
    JOIN DiscoverySource DS ON DS.DiscoverySourceID = DSTME.DiscoverySourceID
    JOIN Discovery D ON DS.DiscoveryRuleID = D.DiscoveryID
    WHERE BME.Fullname LIKE '%SQL2A%'

    --To dump out all the rules and monitors that have overrides, and display the context and instance of the override:
    SELECT rv.DisplayName AS WorkFlowName,
           OverrideName,
           mo.Value AS OverrideValue,
           mt.TypeName AS OverrideScope,
           bme.DisplayName AS InstanceName,
           bme.Path AS InstancePath,
           mpv.DisplayName AS ORMPName,
           mo.LastModified AS LastModified
    FROM ModuleOverride mo
    INNER JOIN managementpackview mpv ON mpv.Id = mo.ManagementPackId
    INNER JOIN ruleview rv ON rv.Id = mo.ParentId
    INNER JOIN ManagedType mt ON mt.managedtypeid = mo.TypeContext
    LEFT JOIN BaseManagedEntity bme ON bme.BaseManagedEntityId = mo.InstanceContext
    WHERE mpv.Sealed = 0
    UNION ALL
    SELECT mv.DisplayName AS WorkFlowName,
           OverrideName,
           mto.Value AS OverrideValue,
           mt.TypeName AS OverrideScope,
           bme.DisplayName AS InstanceName,
           bme.Path AS InstancePath,
           mpv.DisplayName AS ORMPName,
           mto.LastModified AS LastModified
    FROM MonitorOverride mto
    INNER JOIN managementpackview mpv ON mpv.Id = mto.ManagementPackId
    INNER JOIN monitorview mv ON mv.Id = mto.MonitorId
    INNER JOIN ManagedType mt ON mt.managedtypeid = mto.TypeContext
    LEFT JOIN BaseManagedEntity bme ON bme.BaseManagedEntityId = mto.InstanceContext
    WHERE mpv.Sealed = 0
    ORDER BY mpv.DisplayName

     

    Agent Info:

    --To find all managed computers that are currently down and not pingable:
    SELECT bme.DisplayName, s.LastModified as LastModifiedUTC, dateadd(hh,-5,s.LastModified) as 'LastModifiedCST (GMT-5)' FROM state AS s, BaseManagedEntity AS bme WHERE s.basemanagedentityid = bme.basemanagedentityid AND s.monitorid IN (SELECT MonitorId FROM Monitor WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown') AND s.Healthstate = '3' AND bme.IsDeleted = '0' ORDER BY s.Lastmodified DESC

    --To find a computer name from a HealthServiceID (guid from the Agent proxy alerts):
    select DisplayName, Path, basemanagedentityid from basemanagedentity where basemanagedentityid = '<guid>'

    --To view the agent patch list (all hotfixes applied to all agents):
    select bme.path AS 'Agent Name', hs.patchlist AS 'Patch List' from MT_HealthService hs inner join BaseManagedEntity bme on hs.BaseManagedEntityId = bme.BaseManagedEntityId order by path

    --Here is a query to see all Agents which are manually installed:
    select bme.DisplayName from MT_HealthService mths INNER JOIN BaseManagedEntity bme on bme.BaseManagedEntityId = mths.BaseManagedEntityId where IsManuallyInstalled = 1

    --Here is a query that will set all agents back to Remotely Manageable:
    UPDATE MT_HealthService SET IsManuallyInstalled=0 WHERE IsManuallyInstalled=1

    --Now – the above query will set ALL agents back to “Remotely Manageable = Yes” in the console.  If you want to control it agent by agent – you need to specify it by name here:
    UPDATE MT_HealthService SET IsManuallyInstalled=0 WHERE IsManuallyInstalled=1 AND BaseManagedEntityId IN (select BaseManagedEntityID from BaseManagedEntity where BaseManagedTypeId = 'AB4C891F-3359-3FB6-0704-075FBFE36710' AND DisplayName = 'servername.domain.com')

    --Get the discovered instance count of the top 50 agents:
    DECLARE @RelationshipTypeId_Manages UNIQUEIDENTIFIER SELECT @RelationshipTypeId_Manages = dbo.fn_RelationshipTypeId_Manages() SELECT TOP 50 bme.DisplayName, SUM(1) AS HostedInstances FROM BaseManagedEntity bme RIGHT JOIN ( SELECT HBME.BaseManagedEntityId AS HS_BMEID, TBME.FullName AS TopLevelEntityName, BME.FullName AS BaseEntityName, TYPE.TypeName AS TypedEntityName FROM BaseManagedEntity BME WITH(NOLOCK) INNER JOIN TypedManagedEntity TME WITH(NOLOCK) ON BME.BaseManagedEntityId = TME.BaseManagedEntityId AND BME.IsDeleted = 0 AND TME.IsDeleted = 0 INNER JOIN BaseManagedEntity TBME WITH(NOLOCK) ON BME.TopLevelHostEntityId = TBME.BaseManagedEntityId AND TBME.IsDeleted = 0 INNER JOIN ManagedType TYPE WITH(NOLOCK) ON TME.ManagedTypeID = TYPE.ManagedTypeID LEFT JOIN Relationship R WITH(NOLOCK) ON R.TargetEntityId = TBME.BaseManagedEntityId AND R.RelationshipTypeId = @RelationshipTypeId_Manages AND R.IsDeleted = 0 LEFT JOIN BaseManagedEntity HBME WITH(NOLOCK) ON R.SourceEntityId = HBME.BaseManagedEntityId ) AS dt ON dt.HS_BMEID = bme.BaseManagedEntityId GROUP by BME.displayname order by HostedInstances DESC

     

    Misc OpsDB:

    --To get all the OperationsManager configuration settings from the database:
    SELECT ManagedTypePropertyName, SettingValue, mtv.DisplayName, gs.LastModified FROM GlobalSettings gs INNER JOIN ManagedTypeProperty mtp on gs.ManagedTypePropertyId = mtp.ManagedTypePropertyId INNER JOIN ManagedTypeView mtv on mtp.ManagedTypeId = mtv.Id ORDER BY mtv.DisplayName

    --To view grooming info:
    SELECT * FROM PartitionAndGroomingSettings WITH (NOLOCK)

    --Grooming history:
    select * from InternalJobHistory order by InternalJobHistoryId DESC

    --Information on existing User Roles:
    SELECT UserRoleName, IsSystem from userrole

    --Operational DB version:
    select DBVersion from __MOMManagementGroupInfo__

    --To view all Run-As Profiles, their associated Run-As account, and associated agent name:
    select srv.displayname as 'RunAs Profile Name', srv.description as 'RunAs Profile Description', cmss.name as 'RunAs Account Name', cmss.description as 'RunAs Account Description', cmss.username as 'RunAs Account Username', cmss.domain as 'RunAs Account Domain', mp.FriendlyName as 'RunAs Profile MP', bme.displayname as 'HealthService' from dbo.SecureStorageSecureReference sssr inner join SecureReferenceView srv on srv.id = sssr.securereferenceID inner join CredentialManagerSecureStorage cmss on cmss.securestorageelementID = sssr.securestorageelementID inner join managementpackview mp on srv.ManagementPackId = mp.Id inner join BaseManagedEntity bme on bme.basemanagedentityID = sssr.healthserviceid order by srv.displayname

    --Config Service logs:
    SELECT * FROM cs.workitem ORDER BY WorkItemRowId DESC

    --Config Service snapshot history:
    SELECT * FROM cs.workitem WHERE WorkItemName like '%snap%' ORDER BY WorkItemRowId DESC

     

    My Workspace Views:

    SELECT MyWSViews.UserSid, MyWSViews.SavedSearchName, VT.ViewTypeName, VT.ManagementPackId, MyWSViews.ConfigurationXML FROM [OperationsManager].[dbo].[SavedSearch] AS MyWSViews INNER JOIN [OperationsManager].[dbo].[ViewType] AS VT ON MyWSViews.ViewTypeId=VT.ViewTypeId WHERE MyWSViews.TargetManagedTypeId is not NULL

     

    Data Warehouse Database Queries:

     

    Alerts Section (Warehouse):

    --To get all raw alert data from the data warehouse to build reports from:
    select * from Alert.vAlertResolutionState ars inner join Alert.vAlertDetail adt on ars.alertguid = adt.alertguid inner join Alert.vAlert alt on ars.alertguid = alt.alertguid

    --To view data on all alerts modified by a specific user:
    select ars.alertguid, alertname, alertdescription, statesetbyuserid, resolutionstate, statesetdatetime, severity, priority, managedentityrowID, repeatcount from Alert.vAlertResolutionState ars inner join Alert.vAlert alt on ars.alertguid = alt.alertguid where statesetbyuserid like '%username%' order by statesetdatetime

    --To view a count of all alerts closed by all users:
    select statesetbyuserid, count(*) as 'Number of Alerts' from Alert.vAlertResolutionState ars where resolutionstate = '255' group by statesetbyuserid order by 'Number of Alerts' DESC

     

    Events Section (Warehouse):

    --To inspect total events in the DW, and then break it down per day: (this helps us know what we will be grooming out, and look for particular-day event storms)
    SELECT CASE WHEN(GROUPING(CONVERT(VARCHAR(20), DateTime, 101)) = 1) THEN 'All Days' ELSE CONVERT(VARCHAR(20), DateTime, 101) END AS DayAdded, COUNT(*) AS NumEventsPerDay FROM Event.vEvent GROUP BY CONVERT(VARCHAR(20), DateTime, 101) WITH ROLLUP ORDER BY DayAdded DESC

    --Most common events by event number: (this helps us know which event IDs are the most common in the database)
    SELECT top 50 EventDisplayNumber, COUNT(*) AS 'TotalEvents' FROM Event.vEvent GROUP BY EventDisplayNumber ORDER BY TotalEvents DESC

    --Most common events by event number and raw event description: (this will take a very long time to run, but it shows us not only the event ID, but also a description of the event, to help understand which MP is generating the noise)
    SELECT top 50 EventDisplayNumber, Rawdescription, COUNT(*) AS TotalEvents FROM Event.vEvent evt inner join Event.vEventDetail evtd on evt.eventoriginid = evtd.eventoriginid GROUP BY EventDisplayNumber, Rawdescription ORDER BY TotalEvents DESC

    --To view all event data in the DW for a given Event ID:
    select * from Event.vEvent ev inner join Event.vEventDetail evd on ev.eventoriginid = evd.eventoriginid inner join Event.vEventParameter evp on ev.eventoriginid = evp.eventoriginid where eventdisplaynumber = '6022'

     

    Performance Section (Warehouse):

    --Raw data – core query:
    select top 10 * from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId

    --Raw data – more selective of “interesting” output data:
    select top 10 Path, FullName, ObjectName, CounterName, InstanceName, SampleValue, DateTime from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId

    --Raw data – scoped to a ComputerName (FQDN):
    select top 10 Path, FullName, ObjectName, CounterName, InstanceName, SampleValue, DateTime from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId WHERE Path = 'sql2a.opsmgr.net'

    --Raw data – scoped to a Counter:
    select top 10 Path, FullName, ObjectName, CounterName, InstanceName, SampleValue, DateTime from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId WHERE CounterName = 'Private Bytes'

    --Raw data – scoped to a Computer and Counter:
    select top 10 Path, FullName, ObjectName, CounterName, InstanceName, SampleValue, DateTime from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId WHERE CounterName = 'Private Bytes' AND Path like '%op%'

    --Raw data – how to get all the possible optional data to modify the queries above, in a list:
    Select Distinct Path from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId
    Select Distinct Fullname from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId
    Select Distinct ObjectName from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId
    Select Distinct CounterName from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId
    Select Distinct InstanceName from Perf.vPerfRaw pvpr inner join vManagedEntity vme on pvpr.ManagedEntityRowId = vme.ManagedEntityRowId inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId

     

    Grooming in the DataWarehouse:

    --Here is a view of the current data retention in your data warehouse:
    select ds.datasetDefaultName AS 'Dataset Name', sda.AggregationTypeId AS 'Agg Type 0=raw, 20=Hourly, 30=Daily', sda.MaxDataAgeDays AS 'Retention Time in Days' from dataset ds, StandardDatasetAggregation sda WHERE ds.datasetid = sda.datasetid ORDER by ds.datasetDefaultName

    --To view the number of days of total data of each type in the DW:
    SELECT DATEDIFF(d, MIN(DWCreatedDateTime), GETDATE()) AS [Current] FROM Alert.vAlert
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Event.vEvent
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfRaw
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfHourly
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM Perf.vPerfDaily
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateRaw
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateHourly
    SELECT DATEDIFF(d, MIN(DateTime), GETDATE()) AS [Current] FROM State.vStateDaily

    --To view the oldest and newest recorded timestamps of each data type in the DW:
    select min(DateTime) from Event.vEvent
    select max(DateTime) from Event.vEvent
    select min(DateTime) from Perf.vPerfRaw
    select max(DateTime) from Perf.vPerfRaw
    select min(DWCreatedDateTime) from Alert.vAlert
    select max(DWCreatedDateTime) from Alert.vAlert

     

    AEM Queries (Data Warehouse):

    --Default query to return all RAW AEM data:
    select * from [CM].[vCMAemRaw] Rw inner join dbo.AemComputer Computer on Computer.AemComputerRowID = Rw.AemComputerRowID inner join dbo.AemUser Usr on Usr.AemUserRowId = Rw.AemUserRowId inner join dbo.AemErrorGroup EGrp on Egrp.ErrorGroupRowId = Rw.ErrorGroupRowId Inner join dbo.AemApplication App on App.ApplicationRowId = Egrp.ApplicationRowId

    --Count the raw crashes per day:
    SELECT CONVERT(char(10), DateTime, 101) AS "Crash Date (by Day)", COUNT(*) AS "Number of Crashes" FROM [CM].[vCMAemRaw] GROUP BY CONVERT(char(10), DateTime, 101) ORDER BY "Crash Date (by Day)" DESC

    --Count the total number of raw crashes in the DW database:
    select count(*) from CM.vCMAemRaw

    --Default grooming for the DW for the AEM dataset: (aggregated data kept for 400 days, RAW 30 days by default)
    SELECT AggregationTypeID, BuildAggregationStoredProcedureName, GroomStoredProcedureName, MaxDataAgeDays, GroomingIntervalMinutes FROM StandardDatasetAggregation WHERE BuildAggregationStoredProcedureName = 'AemAggregate'

     

    Aggregations and Config churn queries for the Warehouse:

    --Top noisy rules (config churn) in the last 24 hours:
    select ManagedEntityTypeSystemName, DiscoverySystemName, count(*) As 'Changes' from (select distinct MP.ManagementPackSystemName, MET.ManagedEntityTypeSystemName, PropertySystemName, D.DiscoverySystemName, D.DiscoveryDefaultName, MET1.ManagedEntityTypeSystemName As 'TargetTypeSystemName', MET1.ManagedEntityTypeDefaultName 'TargetTypeDefaultName', ME.Path, ME.Name, C.OldValue, C.NewValue, C.ChangeDateTime from dbo.vManagedEntityPropertyChange C inner join dbo.vManagedEntity ME on ME.ManagedEntityRowId=C.ManagedEntityRowId inner join dbo.vManagedEntityTypeProperty METP on METP.PropertyGuid=C.PropertyGuid inner join dbo.vManagedEntityType MET on MET.ManagedEntityTypeRowId=ME.ManagedEntityTypeRowId inner join dbo.vManagementPack MP on MP.ManagementPackRowId=MET.ManagementPackRowId inner join dbo.vManagementPackVersion MPV on MPV.ManagementPackRowId=MP.ManagementPackRowId left join dbo.vDiscoveryManagementPackVersion DMP on DMP.ManagementPackVersionRowId=MPV.ManagementPackVersionRowId AND CAST(DefinitionXml.query('data(/Discovery/DiscoveryTypes/DiscoveryClass/@TypeID)') AS nvarchar(max)) like '%'+MET.ManagedEntityTypeSystemName+'%' left join dbo.vManagedEntityType MET1 on MET1.ManagedEntityTypeRowId=DMP.TargetManagedEntityTypeRowId left join dbo.vDiscovery D on D.DiscoveryRowId=DMP.DiscoveryRowId where ChangeDateTime > dateadd(hh,-24,getutcdate()) ) As #T group by ManagedEntityTypeSystemName, DiscoverySystemName order by count(*) DESC

    --Modified properties in the last 24 hours:
    select distinct MP.ManagementPackSystemName, MET.ManagedEntityTypeSystemName, PropertySystemName, D.DiscoverySystemName, D.DiscoveryDefaultName, MET1.ManagedEntityTypeSystemName As 'TargetTypeSystemName', MET1.ManagedEntityTypeDefaultName 'TargetTypeDefaultName', ME.Path, ME.Name, C.OldValue, C.NewValue, C.ChangeDateTime from dbo.vManagedEntityPropertyChange C inner join dbo.vManagedEntity ME on ME.ManagedEntityRowId=C.ManagedEntityRowId inner join dbo.vManagedEntityTypeProperty METP on METP.PropertyGuid=C.PropertyGuid inner join dbo.vManagedEntityType MET on MET.ManagedEntityTypeRowId=ME.ManagedEntityTypeRowId inner join dbo.vManagementPack MP on MP.ManagementPackRowId=MET.ManagementPackRowId inner join dbo.vManagementPackVersion MPV on MPV.ManagementPackRowId=MP.ManagementPackRowId left join dbo.vDiscoveryManagementPackVersion DMP on DMP.ManagementPackVersionRowId=MPV.ManagementPackVersionRowId AND CAST(DefinitionXml.query('data(/Discovery/DiscoveryTypes/DiscoveryClass/@TypeID)') AS nvarchar(max)) like '%'+MET.ManagedEntityTypeSystemName+'%' left join dbo.vManagedEntityType MET1 on MET1.ManagedEntityTypeRowId=DMP.TargetManagedEntityTypeRowId left join dbo.vDiscovery D on D.DiscoveryRowId=DMP.DiscoveryRowId where ChangeDateTime > dateadd(hh,-24,getutcdate()) ORDER BY MP.ManagementPackSystemName, MET.ManagedEntityTypeSystemName

    --Aggregation history:
    USE OperationsManagerDW;
    WITH AggregationInfo AS ( SELECT AggregationType = CASE WHEN AggregationTypeId = 0 THEN 'Raw' WHEN AggregationTypeId = 20 THEN 'Hourly' WHEN AggregationTypeId = 30 THEN 'Daily' ELSE NULL END ,AggregationTypeId ,MIN(AggregationDateTime) as 'TimeUTC_NextToAggregate' ,COUNT(AggregationDateTime) as 'Count_OutstandingAggregations' ,DatasetId FROM StandardDatasetAggregationHistory WHERE LastAggregationDurationSeconds IS NULL GROUP BY DatasetId, AggregationTypeId ) SELECT SDS.SchemaName ,AI.AggregationType ,AI.TimeUTC_NextToAggregate ,Count_OutstandingAggregations ,SDA.MaxDataAgeDays ,SDA.LastGroomingDateTime ,SDS.DebugLevel ,AI.DataSetId FROM StandardDataSet AS SDS WITH(NOLOCK) JOIN AggregationInfo AS AI WITH(NOLOCK) ON SDS.DatasetId = AI.DatasetId JOIN dbo.StandardDatasetAggregation AS SDA WITH(NOLOCK) ON SDA.DatasetId = SDS.DatasetId AND SDA.AggregationTypeID = AI.AggregationTypeID ORDER BY SchemaName DESC

     

    Analyzing Dataset data in the DW:

    --Rules creating the most inserts: (the PerfHourly table name GUID is unique per environment – see the partition table query below)
    select TOP(30) vr.RuleSystemName, count (*) AS 'count' from [Perf].[PerfHourly_99D5C26784F74BA0B17D726400D58097] ph INNER JOIN PerformanceRuleInstance pri on ph.PerformanceRuleInstanceRowId = pri.PerformanceRuleInstanceRowId INNER JOIN vRule vr on pri.RuleRowId = vr.RuleRowId GROUP BY vr.RuleSystemName Order by count DESC

    --Instances with the most perf inserts:
    select TOP(30) vme.FullName, count (*) AS 'count' from [Perf].[PerfHourly_99D5C26784F74BA0B17D726400D58097] ph INNER JOIN vManagedEntity vme on ph.ManagedEntityRowId = vme.ManagedEntityRowId GROUP BY vme.FullName Order by count DESC

    --Instance types with the most perf inserts:
    select TOP(30) vmet.ManagedEntityTypeSystemName, count (*) AS 'count' from [Perf].[PerfHourly_99D5C26784F74BA0B17D726400D58097] ph INNER JOIN vManagedEntity vme on ph.ManagedEntityRowId = vme.ManagedEntityRowId INNER JOIN vManagedEntityType vmet on vmet.ManagedEntityTypeRowId = vme.ManagedEntityTypeRowId GROUP BY vmet.ManagedEntityTypeSystemName Order by count DESC

    --Find the current Perf partition table:
    SELECT TOP(1) TableGuid, StartDateTime, EndDateTime FROM StandardDatasetTableMap sdtm INNER JOIN StandardDataset sd on sd.DatasetId = sdtm.DatasetId WHERE AggregationTypeId = '20' AND sd.SchemaName = 'Perf' ORDER BY sdtm.EndDateTime DESC

     

    Misc Section:

    --To get better performance manually:
    --Update statistics (will help speed up reports, and takes less time than a full reindex):
    EXEC sp_updatestats

    --Show index fragmentation (to determine how badly you need a reindex – logical scan fragmentation > 10% = bad; scan density below 80 = bad):
    DBCC SHOWCONTIG
    DBCC SHOWCONTIG WITH FAST --(less data than above – in case you don't have time)

    --Reindex the database:
    USE OperationsManager
    go
    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET QUOTED_IDENTIFIER ON
    SET NUMERIC_ROUNDABORT OFF
    EXEC SP_MSForEachTable "Print 'Reindexing '+'?' DBCC DBREINDEX ('?')"

    --Table by table:
    DBCC DBREINDEX ('TableName')

    --Query to view the index job history on domain tables in the databases:
    select * from DomainTable dt inner join DomainTableIndexOptimizationHistory dti on dt.domaintablerowID = dti.domaintableindexrowID ORDER BY optimizationdurationseconds DESC

    --Query to view the update statistics job history on domain tables in the databases:
    select * from DomainTable dt inner join DomainTableStatisticsUpdateHistory dti on dt.domaintablerowID = dti.domaintablerowID ORDER BY UpdateDurationSeconds DESC

    Installing the Exchange 2010 Correlation Engine on a Non-Management Server and without a console


     

    There is an issue with the current Exchange 2010 Correlation Engine – it fails when installed on SCOM 2012 or 2016 Management Servers.  Jimmy wrote about this here:

    https://blogs.technet.microsoft.com/jimmyharper/2015/04/15/exchange-2010-correlation-engine-not-generating-alerts/

     

    So one remedy is to install the Correlation Engine (CE) on a server that does not hold a management server role – either a dedicated reporting server, or a stand-alone server in the environment.  This is advisable because the CE uses a LOT of memory, and we don’t want it consuming memory on a SCOM Management server.   One problem with this is that the CE checks that the SCOM 2007 (or later) console is installed when you kick off the MSI.  If it is missing – you get:

     

    [screenshot of the setup error]

     

    The problem with installing the SCOM 2012 console is that you end up with the wrong version of the SDK binaries that the CE is expecting.  To work around this – we can do a simple “hack”.  The alternative would be to install the SCOM 2007 R2 console, but many customers will not want to install that old console for this one purpose.

     

    The Exchange2010ManagementPackForOpsMgr2007-x64.msi is looking in the registry for:

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup]
    "UIVersion"="6.0.6278.0"

    We can simply create that “Setup” registry key, then a REG_SZ string value named “UIVersion” with “6.0.6278.0” as the data value.

    This will allow us to install the CE.
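    For convenience, the same key and value can be saved as a .reg file and imported with regedit.  This is just the registry edit above expressed as a file – nothing more:

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup]
"UIVersion"="6.0.6278.0"
```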

     

    Once installed – browse to the \Program Files\Microsoft\Exchange Server\v14\Bin directory.

    Edit the Microsoft.Exchange.Monitoring.CorrelationEngine.exe.config file.

    Here is the default file config:

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <runtime>
        <generatePublisherEvidence enabled="false"/>
      </runtime>
      <appSettings>
        <add key="OpsMgrRootManagementServer" value="localhost" />
        <add key="OpsMgrLogonDomain" />
        <add key="OpsMgrLogonUser" />
        <add key="ManagementPackId" value="Microsoft.Exchange.2010" />
        <add key="CorrelationIntervalInSeconds" value="300" />
        <add key="CorrelationTimeWindowInSeconds" value="300" />
        <add key="AutoResolveAlerts" value="true" />
        <add key="EnableLogging" value="true" />
        <add key="MaxLogDays" value="30" />
        <add key="LogVerbose" value="false" />
        <add key="MaxLogDirectorySizeInMegabytes" value="1024" />
      </appSettings>
    </configuration>

     

    Modify the value for OpsMgrRootManagementServer to a management server (might as well use your RMSe server), then save the file.  UAC might block you from editing this file – if so, open Notepad elevated.

    Next – open the Services.msc control applet, and configure the service “Microsoft Exchange Monitoring Correlation”.

    Set this service to run as your SDK account, or a dedicated service account that has rights to the SCOM SDK as a SCOM Administrator.

    [screenshot]

     

    Your CE service will be stuck in a restart loop.  It is crashing with an exception because it is missing the SDK binaries.

    Now – following the blog post referenced above – unzip the three SCOM 2007 files from the blog attachment into the \Program Files\Microsoft\Exchange Server\v14\Bin\ directory:

     

    [screenshot]

     

    The errors should go away – and in the Application event log – you should see the following sequence:

     

    Log Name:      Application
    Source:        MSExchangeMonitoringCorrelation
    Event ID:      700
    Description:
    MSExchangeMonitoringCorrelation service starting.

    Log Name:      Application
    Source:        MSExchangeMonitoringCorrelation
    Event ID:      722
    Description:
    MSExchangeMonitoringCorrelation successfully connected to Operations Manager Root Management Server.

    Log Name:      Application
    Source:        MSExchangeMonitoringCorrelation
    Event ID:      701
    Description:
    MSExchangeMonitoringCorrelation service started successfully.

    Understanding SCOM Resource Pools


     


    Resource pools are nothing new – they were introduced in SCOM 2012 RTM, for two reasons:

    1.  To remove the single-point-of-failure that was the RMS role in SCOM 2007.

    2.  To provide a mechanism for high availability of agentless/remote workflows, such as Unix/Linux, Network, and URL monitoring, among others.

     

    That said – they are often not fully understood.

     

    Let’s talk about the primary components of a Resource Pool.  I am going to “dumb this down” a lot, because it is actually quite complex behind the scenes.  So I will break this down into “roles” with regard to Resource Pools.  The primary “role” components we will discuss are:

    1.  Members

    2.  Observers

    3.  Default Observer

     

    Members of a pool are either a Management Server or a Gateway Server. 

    Observers are “observer-only” roles.  An observer is a Management Server or a Gateway server that does NOT participate in loading workflows for the pool, but does participate in quorum decisions.  Healthservice-based observer-only roles are actually pretty rare under normal circumstances… but you would use one if you wanted high availability for your pool while limiting the number of Healthservices actually running pool workflows.

    The Default Observer is the SCOM Operations Database.  This is set to “Enabled” or “Disabled” for every pool.  It is enabled by default for all pools created in the UI, and disabled by default for all pools created via PowerShell with the New-SCOMResourcePool cmdlet.  The “reason” this exists is for the following:

    To allow for a pool to have high availability when you have two management servers in a pool

     

    Let’s talk about that.

    A pool requires ONE or more members.

    A pool requires THREE (quorum voting) members to establish high availability.

    High availability is the ability to have a member be unavailable, with no loss of monitoring.

     

    The reason we need THREE (quorum voting) members (not two) for high availability is because of the quorum algorithm.  We require that MORE than 50% of the quorum voting members in a pool be available.  If you have only two members of a pool, and one is down, you have lost quorum, because of the “greater than 50%” rule.

    Therefore – the “Default Observer” was dreamed up, so customers would not HAVE to deploy a minimum of THREE management servers just to get high availability for their Resource Pools.  It is a special quorum voting “observer” role, to allow for high availability of pools when you have two management servers deployed.  This reduced cost and complexity for a basic SCOM deployment.
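    The quorum math is simple enough to sketch in a few lines.  This is NOT SCOM code – just an illustration of the “more than 50% of voting members” rule, where the Default Observer simply counts as one extra voter:

```python
def pool_has_quorum(available_voters: int, total_voters: int) -> bool:
    # A pool keeps running only while MORE than 50% of its quorum voting
    # members (MS/GW members, plus the Default Observer if enabled) are
    # still available.
    return available_voters > total_voters / 2

# Two management servers, no Default Observer: one down = pool dies.
print(pool_has_quorum(1, 2))   # False

# Two management servers + Default Observer = three voters: one MS down
# still leaves 2 of 3 voters available, so the pool survives.
print(pool_has_quorum(2, 3))   # True
```

    This is also why a pool of exactly two Gateways (two voters, no valid Default Observer) is fragile: losing either member drops the pool below quorum.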

     

    Let’s break this into “scenarios”

     

    Single Management server in pool

    The default observer is enabled by default.

    There is no high availability, because the management server is a single point of failure.

    The default observer provides no benefit (nor harm) in this case.

     

    Two management servers in pool

    The default observer is enabled by default.

    There is high availability for the pool, because there are three voting members (2 MS + Default Observer)

    If you disable the default observer, you will lose high availability for the pool.

     

    Three management servers in pool

    The default observer is enabled by default.

    There is high availability for the pool, because there are four voting members (3 MS + Default Observer)

    By default – you can only have ONE management server down and still maintain the pool (greater than 50% rule), because if two MS are down, only 50% of the voting members remain, so the pool suicides.

    The default observer in this case provides NO value.  It does not increase the number of management servers that can be down, therefore it does not increase pool stability.

    You can consider removing the DO (Default Observer) in this scenario.

     

    Four management servers in pool

    The default observer is enabled by default.

    There is high availability for the pool, because there are five voting members (4 MS + Default Observer)

    By default – you can have TWO management servers down and still maintain the pool (greater than 50% rule), because if three MS are down, more than 50% of the voting members are gone, so the pool suicides.

    The default observer in this case provides significant value, because it increases the number of management servers that can be down.  Without the DO in this case, you’d only have 4 quorum members, which only allows for ONE to be unavailable.

     

    Five or more management servers in pool

    The default observer is enabled by default.

    There is high availability for the pool, because there are 6 voting members (5 MS + Default Observer)

    By default – you can only have TWO management servers down and still maintain the pool (greater than 50% rule), because if three MS are down, only 50% of the voting members remain, so the pool suicides.

    The default observer in this case provides NO value.  It does not increase the number of management servers that can be down, therefore it does not increase pool stability.

    You can consider removing the DO (Default Observer) in this scenario.

     

    One could argue that once you have 3 or more management servers in a pool, any “odd” number of management servers makes removing the DO from the pool worth considering.  I’d also argue that once you hit 5 management servers, you are probably big enough that the database is under significant load (you wouldn’t typically have 5 management servers in a small environment).  When the database is under heavy load, the default observer might not perform well, and might experience latency in resource pool calculations/voting.

    The way the default observer plays its role is that each MANAGEMENT SERVER in the pool queries its own local SDK service, which allows it to get data from the database.  There is a table in the SCOM Operations database for the default observer.  So if the SDK service or the database is under load, we could experience latency that otherwise would not exist.
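    The “odd versus even” argument above can be checked with another quick sketch (again, not SCOM code).  Adding the Default Observer’s vote only increases the number of members that can be down when the management server count is even:

```python
def tolerated_failures(voters: int) -> int:
    # With v voters, "more than 50%" must remain available,
    # i.e. at least (v // 2) + 1 members up.
    return voters - (voters // 2 + 1)

for ms_count in range(2, 7):
    without_do = tolerated_failures(ms_count)
    with_do = tolerated_failures(ms_count + 1)  # DO adds one vote
    helps = "DO helps" if with_do > without_do else "no benefit"
    print(ms_count, without_do, with_do, helps)

# Prints:
# 2 0 1 DO helps
# 3 1 1 no benefit
# 4 1 2 DO helps
# 5 2 2 no benefit
# 6 2 3 DO helps
```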

     

    Gateways as resource pool members

     

    Next – we should discuss the Gateway role as it pertains to Resource Pools.  Microsoft supports resource pool membership for Management Servers AND for Gateway servers. 

    For instance, a customer might monitor Unix/Linux servers in a firewalled off DMZ, or across a small WAN circuit where you want the agentless communication localized.  In this scenario, a customer might create dedicated resource pools for Gateways in those locations, to perform monitoring.

     

    Single Gateway server in pool

    The default observer is enabled by default.

    There is no high availability, because the Gateway server is a single point of failure.

    The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.

     

    Two Gateway servers in pool

    The default observer is enabled by default.

    One would THINK there is high availability for the pool, because there are two GW’s in the pool, right?  HOWEVER – that is NOT the case.  As we discussed above – we need three voting members to establish high availability for a pool.  Since the Default Observer is NEVER valid for a pool consisting of Gateways, there are only TWO members of this pool.  The pool will run, and will load balance workflows, but if either pool member goes down, the pool suicides.  In this case – you actually have WORSE availability than if you placed a single member in the pool!

    In order to maintain high availability for a pool made of Gateways, you need to have THREE GW’s in the pool.

    The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.

     

    Three Gateway servers in pool

    The default observer is enabled by default.

    There is high availability for the pool, because there are three voting members (3 GW)

    By default – you can only have ONE Gateway server down and still maintain the pool (greater-than-50% rule), because if two GW are down, that is >50% of the voting members, so the pool suicides.

    The default observer should NOT be used here, because Gateways do not have a local SDK service, therefore they cannot query the database.

     

     

    Let’s take a minute and process this.

     

    What we have learned is that you should remove the DO from any pool comprised of Gateways.

    You should consider removing the DO from pools when 5 or more Management Servers are present.

    If your pools are stable… and you aren’t having any problems with high availability… then this really doesn’t make much difference… which is why the defaults are set the way they are.

     

    So we have talked about pool members, and the default observer… but what about the “observer” role?

    This role is really unique, and will not be used very often.  I cannot think of a single enterprise deployment where I have seen it used.  Generally speaking – if we are adding a dedicated observer for a pool (which is a management server or a GW server), then why not just make that server a full-blown pool member?

    There is only one scenario I can think of where this might be useful: a company with a datacenter with SCOM deployed, and in the SAME DATACENTER, a DMZ with two gateways deployed because of firewall rules.  In this case, you could potentially make their parent management server a dedicated observer only, and this would work because tcp_5723 is already open for Healthservice communication.  This is incredibly rare, and the best practice would be to just go ahead and plan for three Gateway servers in the DMZ.

     

    Remember – for resource pool members – Microsoft supports Management Servers and Gateways.

    For resource pool observers – the same, Management Servers and Gateways.

     

    That said – I have done some testing making an *agent* a dedicated observer, such as in the DMZ scenario above, and it does work.  The agent becomes a voting member for quorum, and this creates high availability.  Microsoft didn’t plan or test this scenario – so it is technically unsupported.

    Which got me to thinking – “what if I create a resource pool, and make its membership strictly agents”???

    Well, that works too.  You cannot do this using the UI, but you can in PowerShell.  I created a resource pool of only agents, then set up URL monitoring against that pool, and high availability and load distribution worked great.  Again, not technically supported by Microsoft, but a unique capability nonetheless.

     

    Lastly – I will demonstrate some PowerShell commands to work with this stuff.

     

    To disable the default observer for a pool:

    $pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
    $pool.UseDefaultObserver = $false
    $pool.ApplyChanges()

     

    To add or remove Management Servers or Gateways from a manual pool:

    $pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
    $MS = Get-SCOMManagementServer -Name "YourMSorGW.domain.com"
    $pool | Set-SCOMResourcePool -Member $MS -Action "Add"
    $pool | Set-SCOMResourcePool -Member $MS -Action "Remove"

     

    To add or remove Management Servers or Gateways as Observers only to a pool:

    $pool = Get-SCOMResourcePool -DisplayName "Your Pool Name"
    $Observer = Get-SCOMManagementServer -Name "YourMSorGW.domain.com"
    $pool | Set-SCOMResourcePool -Observer $Observer -Action "Add"
    $pool | Set-SCOMResourcePool -Observer $Observer -Action "Remove"

     

    If you want to play with adding AGENTS as a resource pool member or observer (not supported), then simply change “Get-SCOMManagementServer” above to “Get-SCOMAgent”.

     

     

    Credits:

    A debt of gratitude to Mihai Sarbulescu at Microsoft for his guidance on this topic – he has forgotten more about Resource Pools than most people at Microsoft ever knew.

    How to move views in My Workspace into a Management Pack


     

    Customers often use “My Workspace” to create customized views that they use on a regular basis.  However, one of the drawbacks of My Workspace is that these views are not available to any other users.

    Generally, we recommend customers use My Workspace to test and develop views, then simply re-create them in a Management Pack when they are happy with the results.  But what if you wanted to just forklift them from My Workspace, into a MP programmatically?

    This isn’t simple – because of how these views are stored.  You can access these views in the database with the following query:

     

    SELECT ss.UserSid,
           ss.SavedSearchName AS 'ViewDisplayName',
           vt.ViewTypeName,
           mpv.Name AS 'MPName',
           ss.ConfigurationXML
    FROM SavedSearch ss
    INNER JOIN ViewType vt ON ss.ViewTypeId = vt.ViewTypeId
    INNER JOIN ManagementPackView mpv ON vt.ManagementPackId = mpv.Id
    WHERE ss.TargetManagedTypeId IS NOT NULL

     

    This will yield output like so:

     

    [Screenshot: query results showing the UserSid, ViewDisplayName, ViewTypeName, MPName, and ConfigurationXML columns]

     

    From this, you can either manually copy and paste this data into a management pack, or you could even build an MP snippet to take this data as input from a CSV, based on the query output.  https://blogs.technet.microsoft.com/kevinholman/2014/01/21/how-to-use-snippets-in-vsae-to-write-lots-of-workflows-quickly/
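As a rough sketch of that CSV idea (the helper name and CSV layout are my own assumptions, not from any SCOM tooling): a small script could stamp each exported row into a `<View>` element ready to paste into a management pack.  You would still need to make sure the MP reference alias (SC! in this sketch) matches each row's MPName.

```python
# Hypothetical helper: turn rows exported from the SQL query above
# (saved as CSV with columns ViewDisplayName, ViewTypeName, MPName,
# ConfigurationXML) into <View> fragments for a management pack.
import csv
import io

def rows_to_view_xml(csv_text: str) -> str:
    fragments = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        fragments.append(
            f'<View ID="View.{i}" Accessibility="Public" Enabled="true" '
            f'Target="System!System.Entity" '
            # ASSUMPTION: the "SC!" alias must match the reference MP
            # seen in this row's MPName column - adjust per row.
            f'TypeID="SC!{row["ViewTypeName"]}" Visible="true">\n'
            f'  <Category>Custom</Category>\n'
            f'  {row["ConfigurationXML"]}\n'
            f'</View>'
        )
    return "\n\n".join(fragments)

sample = (
    "ViewDisplayName,ViewTypeName,MPName,ConfigurationXML\n"
    "All Alerts,Microsoft.SystemCenter.AlertViewType,Microsoft.SystemCenter.Library,<Criteria />"
)
print(rows_to_view_xml(sample))
```

Each generated fragment would still need a matching FolderItem and DisplayString entry, as the MP example below this section shows.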

     

    Here is a simple MP example where you could manually copy and paste the My Workspace data into:

     

    <?xml version="1.0" encoding="utf-8"?>
    <ManagementPack ContentReadable="true" SchemaVersion="2.0" OriginalSchemaVersion="1.1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <Manifest>
        <Identity>
          <ID>Demo.Views</ID>
          <Version>1.0.0.0</Version>
        </Identity>
        <Name>Demo - Views</Name>
        <References>
          <Reference Alias="SC">
            <ID>Microsoft.SystemCenter.Library</ID>
            <Version>7.0.8433.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
          <Reference Alias="System">
            <ID>System.Library</ID>
            <Version>7.5.8501.0</Version>
            <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
          </Reference>
        </References>
      </Manifest>
      <Presentation>
        <Views>
          <View ID="View.1" Accessibility="Public" Enabled="true" Target="System!System.Entity" TypeID="SC!Microsoft.SystemCenter.AlertViewType" Visible="true">
            <!-- In the line above change the following:
                 Change the View ID to something unique in this MP for each view you create
                 Change the MP reference alias (SC!) to match a reference MP seen in MPName
                 Change the ViewType example (Microsoft.SystemCenter.AlertViewType) from ViewTypeName -->
            <Category>Custom</Category>
            <!-- Insert your query data ConfigurationXML below this line -->
            <!-- Insert your query data ConfigurationXML above this line -->
          </View>
        </Views>
        <Folders>
          <Folder ID="Demo.Views.Root.Folder" Accessibility="Public" ParentFolder="SC!Microsoft.SystemCenter.Monitoring.ViewFolder.Root" />
        </Folders>
        <FolderItems>
          <FolderItem ElementID="View.1" ID="ib42bd9704bf54df0a6c18c9f5c1614ca" Folder="Demo.Views.Root.Folder" />
        </FolderItems>
      </Presentation>
      <LanguagePacks>
        <LanguagePack ID="ENU" IsDefault="false">
          <DisplayStrings>
            <DisplayString ElementID="Demo.Views">
              <Name>Demo - Views</Name>
            </DisplayString>
            <DisplayString ElementID="Demo.Views.Root.Folder">
              <Name>Demo - Views</Name>
            </DisplayString>
            <DisplayString ElementID="View.1">
              <Name>All Alerts</Name>
            </DisplayString>
          </DisplayStrings>
        </LanguagePack>
      </LanguagePacks>
    </ManagementPack>

    As you can see, it is a lot of work, and not something I’d want to do on a regular basis… but if you had this as a requirement, it’s possible!

    Windows Server 2016 Management Packs are available


     

    These are now available:

    https://www.microsoft.com/en-us/download/details.aspx?id=54303

     

    This is version 10.0.8.0.  It covers the Windows Server 2016 MP release ONLY, and does not include the management packs for previous operating systems.  It will be interesting to see how the product group combines these in future updates, since they share the same base libraries.

     


     

     

    Check out the guide – there are MANY updates in this release from the previous technical preview MPs, so you can tell the product team has been hard at work on these.

     

    All the management packs are supported on System Center 2012, System Center 2012 R2, and System Center 2016 Operations Manager.
    Please note that Nano Server monitoring is supported by SCOM 2016 only.

     

     

     

    Changes in Version 10.0.8.0
    •    Added two new object types (Windows Server 2016 Computer (Nano) and Windows Server 2016 Operating System (Nano)) and a new group type (Windows Server 2016 Computer Group (Nano)). This improvement helps users differentiate the groups and object types and manage them more accurately.
    •    Added a new monitor: Windows Server 2016 Storport Miniport Driver Timed Out Monitor; the monitor alerts when the Storport miniport driver times out a request.
    •    Fixed bug with duplicating Nano Server Cluster Disk and Nano Server Cluster Shared Volumes health discoveries upon MP upgrade. See Troubleshooting and Known Issues section for details.
    •    Fixed bug with Windows Server 2016 Operating System BPA Monitor: it did not work.
    •    Fixed bug with incorrect discovery of Windows Server Operating System on Windows Server 2016 agentless cluster computers occurring upon management pack upgrade. See Troubleshooting and Known Issues section for details.
    •    Fixed bug: Free Space monitors did not work on Nano Server.
    •    Changed the logic of setting the override threshold values for Free Space (MB and %) monitors: a user can set threshold values for the Error state even within the default Warning state thresholds. In that case, the Error state supersedes the Warning state according to the set values.
    •    Fixed localization issue with root report folder in the Report Library.
    •    Fixed bug: Windows Server 2016 Computer discovery was causing repeated log events (EventID: 10000) due to improper discovery of non-2016 Windows Server computers.
    •    Fixed bug: [Nano Server] Cluster Seed Name discovery was causing repeated log events (EventID: 10000) due to improper discovery of non-Nano objects.
    •    Due to incompatibility issues in monitoring logic, several Cluster Shared Volumes MP bugs remained in version 10.0.3.0. These are now fixed in the current version (see the complete list of bugs below). To provide compatibility with previous MP versions, all monitoring logic (the structure of class discoveries) was reverted to that of version 10.0.1.0.
    o    Fixed bug: disk free space monitoring issue on Quorum disks in failover clusters; the monitor was displayed as healthy, but actually it did not work and no performance data was collected.
    o    Fixed bug: logical disk discovery did not discover logical disk on non-clustered server with Failover Cluster Feature enabled.
    o    Fixed bug: Cluster Shared Volumes were being discovered twice – as a Cluster Shared Volume and as a logical disk; now they are discovered as Cluster Shared Volumes only.
    o    Fixed bug (partially): mount points were being discovered twice for cluster disks mounted to a folder – as a cluster disk and as a logical disk. See Troubleshooting and Known Issues section for details.
    o    Fixed bug: Cluster Shared Volume objects were being discovered incorrectly when they had more than one partition (applied to discovery and monitoring): only one partition was discovered, while the monitoring data was discovered for all partitions available. The key field is changed, and now partitions are discovered correctly; see Troubleshooting and Known Issues section for details.
    o    Fixed bug: Windows Server 2008 Max Concurrent API Monitor did not work on Windows Server 2008 platform. Now, it is supported on Windows Server platforms starting from Windows Server 2008 R2.
    o    Fixed bug: when network resource name was changed in Failover Cluster Management, the previously discovered virtual computer and its objects were displayed for a short time, while new virtual computer and its objects were already discovered.
    o    Fixed bug: performance counters for physical CPU (sockets) were collected incorrectly (for separate cores, but not for the physical CPU as a whole).
    o    Fixed bug: Windows Server 2016 Operating System BPA monitor was failing with “Command Not Found” exception. Also, see Troubleshooting and Known Issues section for details on the corresponding task.
    o    Fixed bug: View Best Practices Analyzer compliance task was failing with exception: “There has been a Best Practice Analyzer error for Model Id”.
    o    Fixed bug: in the Operations Console, “Volume Name” fields for logical disks, mount points, or Cluster Shared Volumes were empty in “Detail View”, while the corresponding data was entered correctly.
    o    Fixed bug: Logical Disk Fragmentation Level monitor was not working; it never changed its state from “Healthy”.
    o    Fixed bug: Logical Disk Defragmentation task was not working on Nano Server.
    o    Fixed bug: if a network resource name contained more than 15 symbols, the last symbols of the name were cut off, resulting in cluster disk and Cluster Shared Volume discovery issues.
    o    Fixed bug: Logical Disk Free Space monitor did not change its state. Now it is fixed and considered as deprecated.
    •    The Management Pack was checked for compatibility with the latest versions of Windows Server 2016 and updated to support the latest version of Nano Server.
    •    Added new design for CPU monitoring: physical and logical CPUs are now monitored in different ways.
    •    Updated Knowledge Base articles and display strings.
    •    Improved discovery of multiple (10+) physical disks.
    •    Added compatibility with Nano installation.
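The Free Space threshold change in the list above can be illustrated with a small sketch.  This is my own reading of the documented precedence, not the MP's actual code: because the Error check is evaluated first, an Error threshold takes effect even when it overlaps (or exceeds) the Warning range.

```python
# Sketch of the documented Free Space monitor precedence: the Error
# threshold is checked before the Warning threshold, so Error "wins"
# even when the two ranges overlap.  Thresholds are free-space values;
# the state trips when free space falls to or below a threshold.
def disk_state(free_pct: float, warn_threshold: float, error_threshold: float) -> str:
    if free_pct <= error_threshold:
        return "Error"
    if free_pct <= warn_threshold:
        return "Warning"
    return "Healthy"

print(disk_state(50, 10, 5))  # Healthy - plenty of free space
print(disk_state(8, 10, 5))   # Warning - inside warning range only
print(disk_state(4, 3, 5))    # Error - error threshold overlapping the warning range still supersedes
```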
