14/07/2007

OpsMgr by Example: Monitoring Web Applications

Operations Manager 2007 includes built-in website monitoring functionality (similar to that provided by MOM 2005’s Web Sites and Web Services MP), using the "Web Application" Management Pack Template. This functionality is quite useful for monitoring web sites. The template can record where you go with your browser. To use the recorder you need to configure Internet Explorer (both IE 6 and IE 7): under Tools -> Internet Options -> Advanced, select Enable third party browser extensions (requires restart). Web Applications are created in the Operations console under Authoring -> Management Pack Templates -> Web Application.

We decided to start simple, and then move into more complex monitoring configurations.

Starting simple meant developing a web application that monitors a single web page (such as www.google.com or www.microsoft.com) without requiring authentication. There is a great write-up available at http://www.technotesblog.com/?p=432 which provides a step-by-step process to create monitoring for a single web page. We used www.google.com in our particular example, and as in the TechNotesBlog example (which uses the Microsoft website), we disabled link tracking.

Getting more complex: once our application was in a working state we went to the next step that we wanted to test, monitoring the OpsMgr Operations Web console. Since the Operations Web console requires authentication, the monitoring setup is more difficult. We created a new Web Application called Operations Web console (and stored it in a new, non-default management pack), and had the application browse to http://(servername):51908/default.aspx for the Operations Web console. We created the Web Application using the default configurations and ran it on Windows 2003, Windows XP and Windows Vista workstation systems (one of each for testing purposes). Each of these systems went to a critical status due to an "access is denied" message.
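As a rough illustration of why an unauthenticated watcher goes critical, the sketch below maps the result of one request to a health state. It is not how OpsMgr implements the Web Application monitor; the names ProbeResult and classify_probe, the state names, and the 10-second default are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one hypothetical watcher request against a URL."""
    status_code: int
    elapsed_seconds: float


def classify_probe(result, max_seconds=10.0):
    """Map a probe result to a (state, reason) pair.

    The state names and the 10-second default are illustrative assumptions,
    not values taken from OpsMgr.
    """
    # 401/403 is what a site returns when it requires credentials that the
    # watcher did not supply (the "access is denied" case).
    if result.status_code in (401, 403):
        return "critical", "access is denied"
    if result.status_code >= 400:
        return "critical", f"HTTP {result.status_code}"
    if result.elapsed_seconds > max_seconds:
        return "warning", "slow response"
    return "healthy", "ok"
```

For example, classify_probe(ProbeResult(401, 0.2)) reports the critical state with the access denied reason, which matches what we saw before a Run As Account was configured.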

You can check the status of the monitored web sites by navigating in the Operations console to Monitoring -> Web Application -> Web Application State. You can also right-click any of these and open the performance view to see the large amount of performance information that is collected; check the graphic for an example.

Web Site Performance Counters

Resolving the security issue required creating a Run As Account of type Windows (under Administration -> Security -> Run As Accounts), using an account with permissions to access the Operations Web console. We then configured this account to be used by the Web Application in the Authoring section under Authoring -> Management Pack Templates -> Web Application:

  • Edit the web application settings for the Operations Web console just created
  • Select the General tab to configure its settings, select the authentication method of NTLM and specify the account created to monitor the web site

After going back to the Monitoring section (Operations Console -> Monitoring -> Web Application -> Web Application State) and waiting a little bit, the Operations Web console monitor went to green.

To get even more complex, we created a web test that used the recorder. The Reporting Server was a good test for this. The URL for the Reporting Server is located under Administration -> Settings -> Reporting. In our testing environment this has a value of http://QUICKSILVER:80/ReportServer. To record, we started with http://QUICKSILVER/Reports and worked from that point. We opened a graphic, a folder, and a report during the capture process. Running a report would also be an option, but as this would run on a regular basis (every few minutes) we did not want to create that level of overhead with our monitoring. We configured the authentication method (NTLM and the account we previously created) and the watcher node. We then checked its status in the Health Explorer (see graphic); all monitors were green.

Web Site Health Explorer

Lessons Learned:

The systems performing the watcher function did not have any customizations made to their browsers, such as adding the browser location to the trusted sites. Some servers would work well as watchers and others would not (in our case the Root Management Server). We were unable to identify a specific reason for this.

Don’t test authentication items within the Web Application creator. It brings up a pop-up that warns “Running a test of this web application may fail. While running the test, credentials that have been configured for this web application will not be used. If the site you are testing does not explicitly require authentication, the test may still succeed.” Test these by actually checking their alerts and status on the monitoring tab.

If the site requires authentication to get to it, you need to configure authentication for the web tests. Check IIS to see what type it allows and provide a match (NTLM = Integrated Authentication in this particular case).
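Besides checking IIS, the schemes a site offers are also visible in the WWW-Authenticate headers of its 401 response. The sketch below is a minimal, hypothetical helper (the function names are ours) that extracts the scheme names from those header values and checks whether the method configured for the web application is among them. It assumes one challenge per header value, which is how IIS typically sends NTLM and Negotiate.

```python
def offered_auth_schemes(www_authenticate_values):
    """Return the scheme names found in a list of WWW-Authenticate values.

    Assumes each value carries a single challenge, e.g. 'NTLM' or
    'Basic realm="intranet"'. Comma-separated multi-challenge values
    are not handled by this sketch.
    """
    schemes = []
    for value in www_authenticate_values:
        parts = value.strip().split(None, 1)
        if not parts:
            continue  # skip empty header values
        scheme = parts[0]
        if scheme not in schemes:
            schemes.append(scheme)
    return schemes


def scheme_is_offered(configured, www_authenticate_values):
    """True if the configured scheme matches one the site offers."""
    offered = [s.lower() for s in offered_auth_schemes(www_authenticate_values)]
    return configured.lower() in offered
```

With header values such as "Negotiate" and "NTLM" (Integrated Authentication), a web application configured for NTLM would match, while one configured for Basic would not.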

13/07/2007

OpsMgr by Example: The AD Management Pack

This blog entry is the next in a series of Operations Manager-related items which review the steps performed to install, configure and tune management packs in real-world environments.
Installation:
  1. Download the Active Directory Management Pack (http://www.microsoft.com/downloads/details.aspx?FamilyId=008F58A6-DC67-4E59-95C6-D7C7C34A1447&displaylang=en), and the Active Directory Management Pack Guide (http://www.microsoft.com/downloads/details.aspx?FamilyID=4b945737-e77f-4851-a11c-c4f79c36c360&DisplayLang=en).
  2. Read the Management Pack guide – cover to cover. There are important pieces you need to know that this document spells out in detail.
  3. Import the AD Management Pack (either using the Operations console or PowerShell).
  4. Deploy the OpsMgr agent to all Domain Controllers. The agent must be deployed to all Domain Controllers. Agentless configurations will NOT work for the AD Management Pack.
  5. Get a list of all domain controllers from the Operations console. In the Authoring node, navigate to Authoring -> Groups -> Domain Controllers. Right-click on the group(s) and select View Group Members.
  6. Enable Agent Proxy configuration on all Domain Controllers identified from the groups. This is in the Administration node under Administration -> Device Management -> Agent Managed. Right-click on each domain controller, select Properties, then the Security tab, and check the box to “Allow this agent to act as a proxy and discover managed objects on other computers.” This has to be done for EVERY DOMAIN CONTROLLER (DC), even if the DC is added after your initial configuration of OpsMgr.
  7. Configure the Replication Account under Administration -> Security (full details for this are in the AD MP Guide). This also has to be done for every domain controller, even if a DC is added after your initial OpsMgr configuration.
  8. Validate the existence of the “MOMLatencyMonitors” container. Within this container there should be a sub-folder for each DC, named after that domain controller. If the container does not exist, it is often due to insufficient permissions. (See configuring the replication account within the AD MP Guide for details.)
  9. Open the Operations Console. Go to the Monitoring node and navigate to Monitoring -> Microsoft Windows Active Directory -> Topology Views. You may have to set the scope to the AD Domain Controllers Group to get these views to populate.
  10. Check to make sure that Active Directory shows up under Monitoring -> Distributed Applications as a distributed application which is in the Healthy, Warning or Critical state. If it is in the “Not Monitored” state, check for domain controllers on which the agent is not installed or which are in a “gray” state.
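Because Agent Proxy must be enabled on every domain controller, including ones added later, it helps to compare the group membership from step 5 against the agents that already have the setting from step 6. The sketch below assumes you have exported both lists by hand; the function name and the server names in the usage note are hypothetical.

```python
def agents_missing_proxy(domain_controllers, proxy_enabled):
    """Return the domain controllers that do not yet have Agent Proxy enabled.

    Both arguments are plain lists of computer names. Names are compared
    case-insensitively because exports often differ in casing.
    """
    enabled = {name.lower() for name in proxy_enabled}
    return sorted(dc for dc in domain_controllers if dc.lower() not in enabled)
```

For example, with group members DC01, DC02 and DC03 and proxy enabled only on dc01 and DC03, the function returns DC02 as the one still to configure.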

Tuning/Alerts to Look for: The following are alerts we encountered and resolved while tuning the Active Directory Management Pack.

Alert: AD Replication Monitoring – Access denied

Issue: This occurred on one domain controller and there was also an alert stating that it failed to create the MOMLatencyMonitors container. Validated the container by logging into the domain controller, opening up AD Users and Computers, View/Advanced Features, and seeing that the container (and the two existing domain controllers as sub-containers) did exist, per the following screenshot.

 aduc

Resolution: Already resolved as the MSAA had the permissions required to create this container. Validated the MOMLatencyMonitors container existed and that container included sub-folders matching the name of each domain controller. (If the container does not exist, it is often due to insufficient permissions; see configuring the replication account within the AD MP Guide for configuration information.)

Alert: Script or executable failed to run

Issue: On the domain controllers, failure on ADLocalDiscoveryDC.vbs on each domain controller prior to SP1 in OpsMgr.

Resolution: Looking at this thread on the Microsoft TechNet website, http://forums.microsoft.com/technet/showpost.aspx?postid=1628491&siteid=17&sb=0&d=1&at=7&ft=11&tf=0&pageid=1 this appears to be a pre-SP1 issue, so we disabled the rule until SP1 releases. To disable, navigate to Authoring -> Management Pack Objects -> Object Discoveries and perform a Find on “AD DC Local Discovery.” You may have two of these (Windows 2000 Server, Windows Server 2003), depending on what versions of the management pack were imported into your management group. Create an override to disable both rules for all objects of “Windows Domain Controller.” Remove these overrides when you implement Service Pack 1 for OpsMgr 2007.

Alert: The Op Master PDC Last Bind latency is above the configured threshold

Issue: Bind from the domain controller identified in the alert to the PDC emulator is slower than 5 seconds for a warning and slower than 15 seconds for an error. This occurred in a remote site connecting to a central site with the PDC emulator role.

Resolution: The alert appears to be due to slowness in the link between the two locations, or a condition where one of the two servers identified may have been overloaded. In this particular case it was caused by a domain controller which was overloaded due to insufficient hardware, and which had to be decommissioned.
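The two thresholds in this alert (slower than 5 seconds for a warning, slower than 15 seconds for an error) amount to a simple classification. The sketch below only restates those numbers; the function name and the state labels are ours, not the management pack's.

```python
def classify_bind_latency(seconds, warning_after=5.0, error_after=15.0):
    """Classify a bind time to the PDC emulator using the alert's thresholds."""
    if seconds > error_after:
        return "error"
    if seconds > warning_after:
        return "warning"
    return "healthy"
```

A bind that takes 7 seconds would therefore raise a warning, and one that takes 20 seconds an error.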

Alert: Session setup failed because no trust account exists : Script – AD Validate Server Trust Event

Issue: Specific computer accounts were identified multiple times as not containing a trust account

Resolution: This is caused by either systems which believe that they are part of the domain but no longer are, or often by systems that are being imaged. Resolution of this is either to drop and rejoin the system to the domain or to close the alert if the system is no longer online.

Alert: KCC cannot compute a replication path

Issue: KCC detected problems on multiple domain controllers

Resolution: Connectivity was lost from the central site to a remote site for a period of several hours. The remote site was down due to a power outage. Errors were logged every 15 minutes from when it was down until when the site was back online. This also occurred when a domain controller had been shut off but still existed from the perspective of Active Directory. This can also occur in environments where the site topology is set to automatically generate the site links but the network is configured so that some sites cannot see other sites. (As an example, in a configuration with a hub in Dallas and sites in Frisco and Plano, where both sites can see Dallas but cannot see each other.)
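The hub-and-spoke case can be checked on paper before it shows up as KCC errors. The sketch below lists the site pairs that have no direct network path, given the links you know are routable. The function name is hypothetical and the input is something you would compile by hand from your network documentation.

```python
from itertools import combinations


def unreachable_site_pairs(sites, direct_links):
    """Return site pairs with no direct network path between them.

    direct_links is a list of (site_a, site_b) tuples; direction and
    order within a tuple do not matter.
    """
    linked = {frozenset(link) for link in direct_links}
    return [
        (a, b)
        for a, b in combinations(sorted(sites), 2)
        if frozenset((a, b)) not in linked
    ]
```

With Dallas as the hub and links only from Dallas to Frisco and Dallas to Plano, the result is the single pair Frisco and Plano, which is the pair the KCC cannot connect if it assumes every site can see every other site.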

Alert: A problem was detected with the trust relationship between two domains

Issue: The domain controllers could not connect to the domain controller in the other domain. This was due to a routing issue between the specific domain controllers and the domain controller in the remote domain. Remote sites were connected via VPN and could not route to that subnet.

Resolution: Provided routing from the domain controllers to the domain controller in the other domain.

Alert: AD Replication is slower than the configured threshold

Intersite Expected Max Latency (min) default 15

Intrasite Expected Max Latency (min) default 5.

Issue: This alert will also occur if connectivity is lost between sites for a long enough period of time.

Resolution: If the alert is not current and not repeating, replication is occurring, and the Repadmin Replsum task comes up clean, this alert can be noted (to see if there is a consistent day of week or time at which it occurs) and closed. We added a diagnostic to the AD Replication Monitoring monitor, for the critical state, taking its configuration from the REPADMIN Replsum task, which provided the following (you must have the admin utilities installed on the DC for this to work):

<Configuration>
  <ApplicationName>REPADMIN.EXE</ApplicationName>
  <SupportToolsInstallDir>%ProgramFiles%\Support Tools\</SupportToolsInstallDir>
  <CommandLine>/replsum</CommandLine>
  <TimeoutSeconds>1200</TimeoutSeconds>
</Configuration>

We created the diagnostic to run automatically using:

Program: REPADMIN.EXE

Working Directory: %ProgramFiles%\Support Tools

Parameters: /replsum
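To try the same command outside of OpsMgr, something like the following sketch could be used. It builds the command line from the Program, Working Directory and Parameters values above and applies the 1200-second timeout from the task configuration. The function names are ours, and run_replsum will only work on a domain controller where the Support Tools are actually installed in the directory you pass in.

```python
import ntpath
import subprocess


def build_replsum_command(support_tools_dir):
    """Build the REPADMIN /replsum command line as an argument list."""
    # ntpath joins with Windows separators regardless of where this runs.
    return [ntpath.join(support_tools_dir, "REPADMIN.EXE"), "/replsum"]


def run_replsum(support_tools_dir, timeout_seconds=1200):
    """Run REPADMIN /replsum and return (exit code, captured output).

    Raises subprocess.TimeoutExpired if the command exceeds the timeout.
    """
    completed = subprocess.run(
        build_replsum_command(support_tools_dir),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        cwd=support_tools_dir,
    )
    return completed.returncode, completed.stdout
```

Keeping the command construction separate from the execution makes it easy to check the path before running anything on a production DC.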

Options available included changing the replication topology to replicate every 15 minutes, or configuring overrides. To resolve, we tried creating a custom group for the servers in the location (see the “Creating Computer Groups based on AD Site in OpsMgr” blog entry on http://Cameronfuller.spaces.live.com for additional information) and created an override for the new group changing the Intersite Expected Max Latency to 120 (so it would be double the configuration in AD Sites and Services). We performed this configuration for each remote location which did not have a 15 minute replication interval. This could also be done for all domain controllers using the domain controller computer group(s). This did not function as expected but is being used as an example for how overrides can be creatively configured, in this case based upon sites!
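The arithmetic behind the override is sketched below: the expected maximum latency is set to double the site's replication interval, and an observed latency is then compared against it. This only illustrates the reasoning; as noted above, the override itself did not function as expected in our environment, and the function names are hypothetical.

```python
def suggested_intersite_override(replication_interval_minutes, factor=2):
    """Expected max latency as a multiple of the AD site replication interval."""
    return replication_interval_minutes * factor


def replication_is_slow(latency_minutes, expected_max_minutes):
    """True when observed latency exceeds the expected maximum."""
    return latency_minutes > expected_max_minutes
```

For a site that replicates every 60 minutes, the suggested override is 120. A 45-minute latency would be flagged against the default of 15 but not against 120.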

Alert: AD Replication is slower than the configured threshold

Intersite Expected Max Latency (min) default 15

Intrasite Expected Max Latency (min) default 5.

Issue: The remote location replication topology was defined to be 60 minutes, not the standard of 15.

Resolution: At this point in time there is no good workaround to change these configurations and maintain a Microsoft-supported configuration after the change is made. There are discussions in the newsgroups about changing these through exporting the MP, changing the XML and re-importing it as unsealed, but Microsoft will not support the AD MP if it is changed in this way. The recommendation right now, if your environment does not use the 15 minute latency, is to disable both this alert and the “AD Replication is occurring slowly” alert.

Alert: AD Replication is occurring slowly

Issue: Same as identified in alert “AD Replication is slower than the configured threshold”. This rule does not provide the ability to override the default configuration of 15 minutes. The AD environment is not configured with the default of 15 minutes so these rules do not apply as they are still replicating within a successful timeframe.

Resolution: Disabled this rule (AD Replication is occurring slowly) for group “AD Domain Controller Group (Windows 2003 Server)”. This could also be done for individual servers if there were a limited number of these where the AD replication was not configured with default replication times of 15 minutes. Closed the alerts.

Alert: Script Based Test Failed to Complete

Issue: AD Database and Log : The script ‘AD Database and Log’ failed to create object ‘McActiveDir.ActiveDirectory’. The error returned was: ‘ActiveX component can’t create object’ (0x1AD)

Resolution: Using Add/Remove Programs, uninstalled OOMADS (the Active Directory Management Pack Helper Object; the original version was .05 in size) and re-installed the 64-bit equivalent, which was AMD64 in this case. To do this we had to copy the MSI locally to the system to install it. After installation it was .07 in size within Add/Remove Programs.

Tuning: Other Issues

Issue: Domain controllers in the DMZ would not install even though they are in a domain within the forest.

Resolution: Copied over the files and manually installed the agents. Opened up port 5723 on the firewall between these systems and the OpsMgr server. Removed the port 1270 which had been used for MOM 2005. (This issue should only occur if you previously used MOM 2005.)
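Before installing an agent manually in the DMZ, it is worth confirming that port 5723 is actually reachable through the firewall. The sketch below is a generic TCP connection test using only the Python standard library; the function name is ours and the server name in the usage note is a placeholder.

```python
import socket


def port_is_open(host, port, timeout_seconds=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from the DMZ domain controller, a call such as port_is_open("opsmgr-server", 5723) should return True once the firewall rule is in place.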

Issue: One DC showing extremely high CPU usage/cscript errors.

Resolution: The server was running with 256 MB of memory, and was using significantly more than that even before the OpsMgr agent was deployed to it. Once the agent was deployed, memory usage went significantly higher and resulted in cscript errors which timed out due to the slowness of alerts.

Alert: One or more domain controllers may not be replicating.

Issue: The AD MP will report replication issues across all DCs even if only one is down (and thus not able to replicate its monitor objects).

Resolution: Get all domain controllers monitored by OpsMgr. Validate replication in the environment.

Tuning concept: Weekly close out any alerts greater than 5 days which have not been resolved if they represent issues which may have self-resolved.
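The weekly close-out can be thought of as a simple filter on alert age. The sketch below assumes a hypothetical export of alerts as dictionaries with name, raised and resolved fields; it only selects the candidates, and you would still review whether each one represents an issue that may have self-resolved.

```python
from datetime import datetime, timedelta


def alerts_to_close(alerts, now, max_age_days=5):
    """Return the names of unresolved alerts raised more than max_age_days ago.

    Each alert is a dict with 'name' (str), 'raised' (datetime) and
    'resolved' (bool). The field names are illustrative assumptions.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        alert["name"]
        for alert in alerts
        if not alert["resolved"] and alert["raised"] < cutoff
    ]
```

Passing the current time in as a parameter keeps the function easy to test against a fixed date.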

Alert: Script or executable failed to run.

Issue: On the domain controllers, failure on ADLocalDiscoveryDC.vbs on each domain controller prior to OpsMgr 2007 SP1.

Resolution: Looking at the http://forums.microsoft.com/technet/showpost.aspx?postid=1628491&siteid=17&sb=0&d=1&at=7&ft=11&tf=0&pageid=1 thread on the Microsoft TechNet website, this appears to be a pre-SP1 issue, so we disabled the rule until SP1 releases. To disable, navigate to Authoring -> Management Pack Objects -> Object Discoveries, and perform a Find on "AD DC Local Discovery". You may have two of these rules (Windows 2000 Server, Windows Server 2003), depending on the versions of the management pack that were imported into your management group. Create an override to disable both rules for all objects of "Windows Domain Controller". Remove these overrides when you implement Service Pack 1 for OpsMgr 2007.

Problem: We can't disable this until ALL domain controllers are integrated into OpsMgr. If the rule is disabled before the domain controllers are added, they will never get added.

 

Additional Thoughts: Install the support tools on the domain controllers so you can take advantage of the tasks and use the tools as part of the diagnostics and recoveries.

11/07/2007

OpsMgr by Example: Configuring Baselines - Part II

This entry is an update to the June 20 2007 article on Configuring Baselines.

After configuring the inner and outer sensitivities (Steps 4 and 5 in that article), several of the rules were still generating large volumes of alerts. Those rules include:

  • IS Virtual Bytes is outside the calculated baseline
  • Number of RPC requests is outside the calculated baseline

The alerts were identified as "Above Inner Envelope." To minimize their frequency, we previously changed both the rule and the monitor's sensitivity from 2.81 to 3.31 on the overrides.

From reading up on this sensitivity concept, it appears that increasing this value decreases the frequency of the alerts, as it decreases the sensitivity to the difference from the calculated baseline.

In theory, if the 3.31 override was not sufficient, then one should next try 3.81; this is because the increase from 2.81 to 3.31 is an increase of .5; therefore another .5 increase seems logical if another value change is required. This is an extrapolation based on what we have seen so far, as we do not know the internal workings of the algorithm! 

 ... Feedback received indicates the change to 3.81 was even better. 

7/19/2007 7:57 AM
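To make the sensitivity reasoning in this entry concrete, the sketch below uses a deliberately simplified model in which a sample is outside the envelope when it differs from the baseline by more than the sensitivity times a spread value. This model is our assumption for illustration only; as noted above, we do not know the internal workings of the OpsMgr algorithm. The function name and the sample values are made up.

```python
def count_outside_envelope(samples, baseline, spread, sensitivity):
    """Count samples farther than sensitivity * spread from the baseline.

    This is a simplified stand-in for a baseline envelope, not the
    algorithm OpsMgr uses.
    """
    limit = sensitivity * spread
    return sum(1 for value in samples if abs(value - baseline) > limit)


# Made-up sample data around a baseline of 100 with a spread of 10.
SAMPLES = [100, 103, 97, 129, 131, 135, 140, 95]

# Step the sensitivity by 0.5 each time, as described in this entry.
COUNTS = {
    sensitivity: count_outside_envelope(SAMPLES, 100, 10, sensitivity)
    for sensitivity in (2.81, 3.31, 3.81)
}
```

Under this simplified model, each 0.5 increase widens the envelope, so fewer samples fall outside it and fewer alerts would be raised.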

09/07/2007

More on TechEd 2007: OpsMgr podcast (In the Fish bowl)

While Kerrie and Cameron were at TechEd in June, Microsoft asked us to do a podcast for Virtual TechEd. Over 60 videotaped interviews were recorded in the "Fish Bowl" that week. The fish bowl was a little soundproof recording studio behind the TechEd bookstore, in the middle of all the action. It had a big leather sofa which filled up the entire room :)

Our podcast discusses our authoring experiences for the MOM 2005 Unleashed book and our upcoming System Center Operations Manager 2007 Unleashed, which will be published by SAMS. The OpsMgr 2007 book is now over 50% author complete!

If you want to see the podcast (and see what we look like), it is available at Authoring Microsoft Operations Manager 2005 Unleashed! The main page for Virtual TechEd is www.virtualteched.com.

We also did a podcast for Pearson (who publishes the SAMS Unleashed series) while we were at TechEd; we'll let you know how to access that once it is available.