


26/09/2007

OpsMgr by Example: The Dell Management Pack

This blog entry is the next in a series of Operations Manager-related items which review the steps performed to install, configure and tune management packs in real-world environments.

The Dell management pack guide is part of the Dell Management Pack download available at http://support.dell.com/FileLib/Format.aspx?c=us&l=en&ReleaseID=R158716. The OpsMgr Dell management pack only takes the information provided by the Dell OpenManage agent and integrates it into OpsMgr. As a result, the alerts it raises relate directly to the hardware issues shown in the logs available through the OpenManage interface.

Prerequisites:
Before you install the Dell MP, you must install the updated Microsoft Operations Manager 2005 Backward Compatibility Management Pack. You can download the Backward Compatibility MP at http://go.microsoft.com/fwlink/?LinkId=98874. Without the updated Backward Compatibility MP, you may experience CPU spikes!

Lessons Learned:

Make sure that all of the systems in your environment that have the Dell OpenManage software installed are at version 5.2 or later. A variety of errors occur if you try to monitor an older version of Dell OpenManage with the OpsMgr management pack (including many "Script or Executable Failed to run" alerts). You can check the version either by running Dell Server Administrator and checking the version it lists, or by checking the Version value under the registry key HKLM\SOFTWARE\Dell Computer Corporation\OpenManage\Applications\SystemsManagement.
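If you have many servers to check, the comparison can be scripted. The following is a minimal Python sketch, not part of the Dell MP: it reads the Version value from the registry key named above (Windows only, through the standard `winreg` module) and compares it numerically against 5.2. The function names are hypothetical, and the sketch assumes the Version value is a dotted numeric string.

```python
import re

# Minimum OpenManage version expected by the OpsMgr Dell MP (per the text above).
MINIMUM_VERSION = (5, 2)

# Registry location named in the text; reading it requires Windows.
KEY_PATH = r"SOFTWARE\Dell Computer Corporation\OpenManage\Applications\SystemsManagement"


def parse_version(text):
    """Turn a string such as '5.2.0' into a tuple of ints, e.g. (5, 2, 0)."""
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(p) for p in parts)


def meets_minimum(text, minimum=MINIMUM_VERSION):
    """Compare numerically, padding with zeros so that (5, 2) equals (5, 2, 0)."""
    version = parse_version(text)
    width = max(len(version), len(minimum))
    version += (0,) * (width - len(version))
    minimum += (0,) * (width - len(minimum))
    return version >= minimum


def read_installed_version():
    """Read the Version value from HKLM on the local Windows machine."""
    import winreg  # Windows-only standard library module
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, "Version")
    return str(value)
```

On a managed server, `meets_minimum(read_installed_version())` should return False for the 1.8 and 1.9 installations described below. Comparing tuples of integers rather than raw strings avoids ranking a version such as "10.0" below "5.2".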

Tuning/Alerts to Look for: The following are alerts that were found and resolved while tuning the Dell management pack.

Alert: Dell.Connections.ServerAdministrator.Alert.1306.Critical

Issue: "Redundancy lost. Redundancy unit: System Power Unit. Chassis location: Main System Chassis. Previous redundancy state was Normal. Number of devices required for full redundancy: 2." Checked the server with the Launch Server Administrator task and did not find any current issues. The actual issue was in the alert log, not in the hardware log.

Resolution: Found the issue in the Dell OpenManage alert log. This appears to be an issue with a sensor or a power supply on the system. Recorded the time of the alert and the server it occurred on in company knowledge, to help determine whether this component may be failing.

Alert: Dell.Connections.ServerAdministrator.Alert.1104

Issue: "Fan sensor detected a failure. Sensor location: ESM MB Fan7 RPM. Chassis location: Main System Chassis. Previous state was: OK (Normal). Fan sensor value (in RPM): 0." Checked the server with the Launch Server Administrator task and did not find any current issues. The actual issue was in the alert log, not in the hardware log.

Resolution: Found the issue in the Dell OpenManage alert log. This appears to be an issue with a sensor or a fan on the system.

Alert: Script or Executable Failed to run

Issue: DellStorageDiscovery.vbs failing on an Exchange 2003 server (process exited with 0). Searched the XML files to confirm that this script is part of the Dell management pack. Checked Dell Server Administrator on the system; it was running version 1.9 (5.2 is required).

Resolution: Upgrade the version of the Dell Server Administrator software or disable the alert on this system.

Alert: Script or Executable Failed to run

Issue: DellServerFansUnitUnitMonitor.vbs failing on a Windows 2003 server. Checked the Dell Server Administrator on the system and it was running version 1.8 (5.2 is required).

Resolution: Upgrade the version of the Dell Server Administrator software or disable the alert on this system.

Alert: Dell.Connections.ServerAdministrator.Alert.1554

Issue: Log size is full Log type: ESM

Resolution: Used the Launch Server Administrator task to validate that the log was full, then used the Clear ESM Logs task to clear it, because the entries were historical rather than current. Closed the alert.

Alert: Dell.Connections.ServerAdministrator.Alert.1553

Issue: Log size is near or at capacity Log type: ESM

Resolution: Used the Clear ESM Logs task after reviewing them with the Launch Server Administrator task. Closed the alert.

22/09/2007

OpsMgr by Example: Tuning Management Packs

Philosophy

Deploying management packs involves planning, evaluating in a test environment, and finally importing the management pack along with any changes from test to your production environment. You also will want to decide which management packs you want to deploy, and in what order. Some applications have dependencies on others, so you may want to implement the related management packs as you deploy those products.

For example, say you use Exchange Server. Exchange requires Active Directory, which in turn relies on DNS. You would want to test and implement the dependencies first, in order: the DNS MP, then AD, and finally Exchange. Another consideration is where you might get the most "bang for the buck": which management packs give you the most benefit for the least cost (tuning, resources, or effort). The order in which you implement management packs will depend on the priorities of your organization and your goals for monitoring.
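Working out that order by hand is easy for three MPs but harder across a larger set. As a minimal sketch, the dependency chain above can be modeled as a graph and sorted topologically with Python's standard `graphlib` module (Python 3.9 or later). The `MP_DEPENDENCIES` map is a hypothetical example built from the DNS/AD/Exchange chain, not data read from the management packs themselves.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each management pack -> the MPs it depends on.
MP_DEPENDENCIES = {
    "DNS": set(),
    "Active Directory": {"DNS"},
    "Exchange": {"Active Directory"},
}


def deployment_order(dependencies):
    """Return the MPs ordered so that every dependency comes before its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


print(deployment_order(MP_DEPENDENCIES))  # ['DNS', 'Active Directory', 'Exchange']
```

`TopologicalSorter` raises `graphlib.CycleError` if the map contains a circular dependency, which is a useful sanity check before planning a rollout.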

TIPS

  • It is best to introduce a single management pack at a time, as this practice makes it easier to deal with management pack issues as they occur. When there is more than one management pack involved it may be difficult to determine what initially caused a problem.
  • We also suggest you install only the management packs you need, as extra rules and monitors impose a cost on your system resources: increased memory utilization on the agents targeted by the management pack, and increased traffic between the managed computers and the management server.

Initial Tuning - By Function

Once you have determined your strategy and order for deploying management packs, it is time to import selected management packs one at a time into your test environment. As part of your approach, be sure to refer to that MP's management pack guide. The MP guides discuss particulars for installing, configuring, and tuning that particular management pack. The management pack guides are typically included in the download package with the management pack.

It is, of course, best to tune in a test environment, as the impact of a badly performing rule or monitor is smaller there than in production. Testing pre-production also helps minimize the information load and unnecessary work for your production computer operators.

As you evaluate a management pack's behavior you may decide to tune one or more parameters to meet your organization's needs. For example, you may have a performance monitor generating an alert or threshold value that is inappropriate for your particular environment. You can tune that setting by overriding the default settings for that monitor.

You will either work on a server-by-server or application-by-application basis, tuning from the highest severity alerts and dependencies to the lowest.

A server-by-server approach addresses issues identified while deploying servers into OpsMgr. Once that is complete, your process should move to an application-by-application / service-by-service basis, focusing on the overall health of the application or service. Look at alerts first, then open the Health Explorer to drill down into the specific problem.
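As a rough illustration of the server-by-server approach, the Python sketch below orders a list of alerts so that each server's (or application's) alerts are grouped together, highest severity first. The `Alert` class and the sample data are hypothetical stand-ins for alerts you might export from the Operations console; the sketch assumes the three OpsMgr 2007 alert severities Critical, Warning and Information.

```python
from dataclasses import dataclass

# Lower rank = handled first. Assumes the OpsMgr 2007 alert severities.
SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Information": 2}


@dataclass
class Alert:
    name: str
    severity: str
    source: str  # the server (or application) the alert was raised for


def tuning_queue(alerts):
    """Group alerts by source, then order each group from highest to lowest severity."""
    return sorted(alerts, key=lambda a: (a.source, SEVERITY_RANK[a.severity], a.name))


# Hypothetical sample alerts.
alerts = [
    Alert("Example alert 1", "Warning", "SRV02"),
    Alert("Example alert 2", "Critical", "SRV02"),
    Alert("Example alert 3", "Information", "SRV01"),
    Alert("Example alert 4", "Critical", "SRV01"),
]

for alert in tuning_queue(alerts):
    print(alert.source, alert.severity, alert.name)
```

Swapping the first two key fields (severity first, then source) gives the alternative of working the highest-severity alerts across all servers before moving down a level.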

Managing Alerts

When you implement a management pack, it will generate alerts that you will want to review and evaluate for tuning. Some rules and monitors may generate low-severity alerts; depending on your specific environment, these may not be worth investigating or resolving, and you may consider disabling that rule or monitor. Any changes you make are saved to an unsealed management pack. You can document your actions using the Company Knowledge section of the object.

Review the Operations console regularly to see whether information is captured that is unnecessary for your environment, as this could take up an excessive amount of storage within Operations Manager.

Tuning Tips

  • Review any new alerts reported for servers monitored with the new management pack. You can use the Alerts and Most Common Alerts reports to help you discover your most common alerts.
  • Resolve the issue generating the alert. Use the product knowledge base information regarding the specific error. When you first install a management pack, it tends to discover a multitude of previously unknown issues, so monitor the alerts to determine potential areas of concern.
  • Override the monitor or rule as applicable for a particular object type, a group, or a specific object.
  • Disable the monitor or rule if the issue is not severe enough to warrant an alert and you do not need to be made aware of the specific situation being monitored.
  • Change the threshold of the monitor that is generating the alert if you want the underlying condition to be monitored, but the alert is being generated before the condition is actually a problem for your particular environment. Remember OpsMgr incorporates self-tuning thresholds, so in many instances the system will do this for you.
  • If a new management pack generates a ton of alerts, you may want to start by disabling monitors or rules within that management pack. You can then turn them on gradually, making the new management pack easier to tune and troubleshoot.
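The Most Common Alerts report does this counting for you. If you have alert names in a list instead (for example from your own export), the same ranking takes one call to Python's `collections.Counter`. The function name and the counts in the sample below are hypothetical; a ranking like this only shows where tuning effort is likely to pay off, not which alerts are safe to disable.

```python
from collections import Counter


def most_common_alerts(alert_names, top=5):
    """Return (alert name, count) pairs, most frequent first."""
    return Counter(alert_names).most_common(top)


# Hypothetical export: one entry per alert raised during the evaluation period.
raised = [
    "LDAP Search Time - sustained for 5 minutes - Yellow(>50msec)",
    "LDAP Search Time - sustained for 5 minutes - Yellow(>50msec)",
    "LDAP Search Time - sustained for 5 minutes - Yellow(>50msec)",
    "Application log size",
    "Application log size",
    "Crash upload logging disabled",
]

for name, count in most_common_alerts(raised, top=3):
    print(f"{count:>4}  {name}")
```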

Finished?

After you reach a comfort level with each management pack, it is time to put it into production. Because you've already "tuned" the MP and want to keep your changes, export your customizations from the test environment and then import them into production along with the vendor-provided management pack.

This is an abbreviated excerpt of tuning information from our forthcoming book, System Center Operations Manager 2007 Unleashed.

19/09/2007

OpsMgr 2007 Resource Kit: The Vista gadget bar

(This article was contributed by our co-author on the Operations Manager 2007 Unleashed book, John Joyner.)

Last week, Microsoft released the OpsMgr 2007 resource kit.

It includes a Vista gadget bar. If you run Vista, you can attach a gadget similar to the Red/Yellow/Green state counter toolbar in MOM 2005. This feature complements the OM console significantly, so much so that it makes Vista almost a preferred desktop platform for OM 07 operators. The gadget is displayed below:

[Figure: The Vista gadget]

To use the gadget, you select the target class or group you are interested in, such as the Computer class or the All Computers group. The top section lists active alerts from that target; you can modify the sort order. The bottom section lists the state of the objects in the target class or group. For example, Exchange administrators can focus the gadget on the Exchange group to keep a handle on Exchange server events and state.

The next two figures show fly-out views when clicking the Alerts or State section of the gadget, respectively.

[Figure: Clicking on the Alerts section of the gadget]

[Figure: Clicking on the State section of the gadget]


The gadget can be downloaded from http://go.microsoft.com/fwlink/?LinkId=94593 (the System Center Operations Manager TechCenter). You can also get it directly from Microsoft's download center at http://www.microsoft.com/downloads/details.aspx?FamilyID=0f5a060b-635d-4355-b0ac-32ee2943bfd9&DisplayLang=en

17/09/2007

OpsMgr by Example: The Exchange 2007 Management Pack

This blog entry continues our series of Operations Manager-related items that review the steps performed to install, configure and tune management packs in real-world environments.

Since there is not currently an Exchange 2007 management pack for Operations Manager 2007 (but there is one for MOM 2005), the following results are from a tuning exercise on Exchange 2007 using a version of the management pack converted from MOM 2005 to OpsMgr 2007. The core functionality appears to have converted well (excellent call on this one/thanks to our co-author John Joyner for the idea and his help getting this put together!). Tuning on this version of the management pack should provide a jump-start for tuning on the Exchange 2007 management pack designed specifically for OpsMgr 2007. The management pack guide is available at: http://technet.microsoft.com/en-us/library/bb217782.aspx.

Issues:

· No reports. This is expected for a converted management pack; there should be reports in the Exchange 2007 MP when Microsoft releases it.

· Cannot edit company knowledge on the monitors within the Exchange 2007 management pack (there is no Company Knowledge tab within the Alert properties).

· No Exchange wizard for configuration: Whoho!

Tuning/Alerts to Look for: The following are alerts that we found and resolved while tuning the converted Exchange 2007 Management pack.

Alert: LDAP Search Time - sustained for 5 minutes - Red(>100msec).

Issue: This condition occurs sporadically on the servers in the environment.

Resolution: First tried creating an override for the rule [LDAP Search Time – sustained for 5 minutes – Red(>100msec)] to change it from sustained for 5 minutes to sustained for 10 minutes (in another environment we also tried setting the override to 15 minutes). Then re-configured the threshold to a higher value (200msec), because no network issues were found on the systems (gigabit links on each side, direct to the switch), no bottlenecks were found on the systems, and this was occurring in multiple environments. To change the threshold, right-clicked LDAP Search Time – sustained for 5 minutes – Red(>100msec), chose View or edit the settings of this Monitor, and on the Configuration tab changed 100 to 200 within the XML. Also renamed the monitor to say >200msec and renamed the alert as well.

Alert: LDAP Search Time - sustained for 5 minutes - Yellow(>50msec).

Issue: This condition occurs sporadically on the servers in the environment.

Resolution: First tried creating an override for the rule [LDAP Search Time – sustained for 5 minutes – Yellow(>50msec)] to change it from sustained for 5 minutes to sustained for 10 minutes (in another environment we also tried setting the override to 15 minutes). Then re-configured the threshold to a higher value (150msec), because no network issues were found on the systems (gigabit links on each side, direct to the switch), no bottlenecks were found on the systems, and this was occurring in multiple environments. To change the threshold, right-clicked LDAP Search Time – sustained for 5 minutes – Yellow(>50msec), chose View or edit the settings of this Monitor, and on the Configuration tab changed 50 to 150 within the XML. Also renamed the monitor to say >150msec and renamed the alert as well.
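To make the difference between a "sustained for 5 minutes" condition and a sporadic spike concrete, here is a minimal Python sketch of a sustained-threshold check: it reports a breach only when every sample in the window exceeds the threshold. It is not the monitor's actual implementation; the one-sample-per-minute interval and the class name are assumptions for illustration.

```python
from collections import deque


class SustainedThreshold:
    """Report a breach only when every sample in the window exceeds the threshold."""

    def __init__(self, threshold_ms, window_minutes, interval_seconds=60):
        self.threshold_ms = threshold_ms
        # Samples in one window, assuming one sample every interval_seconds.
        self.samples_needed = max(1, (window_minutes * 60) // interval_seconds)
        self.recent = deque(maxlen=self.samples_needed)

    def add(self, value_ms):
        """Record one sample; return True while the breach spans the whole window."""
        self.recent.append(value_ms)
        window_full = len(self.recent) == self.samples_needed
        return window_full and all(v > self.threshold_ms for v in self.recent)


# A sporadic pattern: spikes above 100 msec separated by normal samples.
red_default = SustainedThreshold(threshold_ms=100, window_minutes=5)
print([red_default.add(v) for v in [150, 80, 160, 140, 90, 170, 150]])
# [False, False, False, False, False, False, False]
```

With these assumptions, `SustainedThreshold(200, 10)` needs ten consecutive samples above 200 msec, so raising either the duration or the threshold suppresses short spikes in the way the overrides above were intended to.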

Alert: Application log size.

Issue: The Exchange 2007 application log size was 16 MB; per the Exchange 2007 MP, it should be at least 40 MB on Exchange servers.

Resolution: Increased the application log size on the servers indicated.

Alert: Crash upload logging disabled.

Issue: Exchange fatal information is not being sent to Microsoft.

Resolution: Per the knowledge link, this can be changed with the Exchange UI (http://technet.microsoft.com/en-us/library/2582b127-b826-4eac-88b6-47a79ed49c6d.aspx) to resolve the issue. For those environments where there is a requirement to not send this information, the rule can be disabled.

Alert: WebServices connectivity (Internal) transaction failure - The credentials cannot be used to test Web Services.

Additional Alerts:

Error occurred while executing the Test-ExchangeSearch diagnostic cmdlet.

Error occurred while executing the Test-Mailflow (Remote) diagnostic cmdlet.

Error occurred while executing the Test-Mailflow (Local) diagnostic cmdlet.

Error occurred while executing the Test-MAPIConnectivity diagnostic cmdlet.

Exchange ActiveSync connectivity (Internal) transaction failure - The test credentials cannot be used to test Exchange ActiveSync.

Outlook Web Access connectivity (External) transaction failure - The test credentials cannot be used to test Outlook Web Access.

Outlook Web Access connectivity (Internal) transaction failure - The test credentials cannot be used to test Outlook Web Access.

Issue: Exchange 2007 management pack configuration required.

Resolution: Ran new-TestCasConnectivityUser.ps1 from the Exchange Management Shell on the Mailbox server. To run the script, enter a temporary password for the test account, press Enter to continue, and specify an OrganizationUnit to put the account in (the OU name needs to be unique, or you need to specify the full name of the OU). This creates the account, named CAS_{sid}, in the OU that you specify.

Alert: The Microsoft Exchange Replication Service requires re-seeding a storage group on the passive node.

Issue: Passive node

Resolution: Microsoft provides product knowledge on how to fix this, available at: http://technet.microsoft.com/en-us/library/63367703-1226-44b2-a4b8-205ed7222da0.aspx

Alert: MSExchange Replication: ReplayQueueLength - sustained for 5 minutes - Red(>15).

Issue: Problems were occurring with Cluster Continuous Replication, where the passive node required re-seeding.

Resolution: See the “The Microsoft Exchange Replication Service requires re-seeding a storage group on the passive node.” alert.

Alert: Edge Synchronization transaction failure - Recipients are out of sync.

Issue: Edge Synchronization issue

Resolution: http://technet.microsoft.com/en-us/library/7b5897c5-9c72-40ee-b977-4f4f6821d1ed.aspx

Alert: Percentage of Committed Memory in Use is too high

Issue: Several Microsoft products including Exchange, SQL Server, and the Operations Manager RMS will use all available memory. This is especially noticeable on 64-bit platforms where memory can scale-out more effectively for the applications.

Resolution: Configured servers with Exchange, SQL or Operations Manager RMS to have a 95% threshold instead of 80%.
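The monitor's percentage corresponds to the ratio reported by the Windows `Memory\% Committed Bytes In Use` counter (Committed Bytes divided by Commit Limit), assuming the monitor is based on that counter. The Python sketch below applies the role-based thresholds described above: 95% for Exchange, SQL Server or the RMS, and the default 80% elsewhere. The role labels and function names are hypothetical.

```python
# Roles that, per the text above, legitimately consume most available memory.
HIGH_MEMORY_ROLES = {"Exchange", "SQL Server", "OpsMgr RMS"}
DEFAULT_THRESHOLD = 80.0
HIGH_MEMORY_THRESHOLD = 95.0


def committed_percent(committed_bytes, commit_limit_bytes):
    """Committed Bytes as a percentage of the Commit Limit."""
    return committed_bytes / commit_limit_bytes * 100


def threshold_for(roles):
    """95% for servers running Exchange, SQL Server or the RMS; 80% otherwise."""
    return HIGH_MEMORY_THRESHOLD if HIGH_MEMORY_ROLES & set(roles) else DEFAULT_THRESHOLD


def should_alert(committed_bytes, commit_limit_bytes, roles):
    """True when committed memory exceeds the threshold for this server's roles."""
    return committed_percent(committed_bytes, commit_limit_bytes) > threshold_for(roles)


GB = 1024 ** 3
print(should_alert(17 * GB, 20 * GB, ["Exchange"]))     # False (85% is under 95%)
print(should_alert(17 * GB, 20 * GB, ["File Server"]))  # True (85% is over 80%)
```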

Additional Notes:

It may be necessary to configure the OWA external URL for URL monitoring to work correctly on the managed Exchange 2007 server. The following command sets that configuration, as documented at http://technet.microsoft.com/en-us/library/bb691294.aspx:

set-owavirtualdirectory "Server01\owa (Default Web Site)" -externalurl:"https://Server01.Domain.contoso.com/owa"

Alerts Not Resolved:

Alert: Delay DSNs - increase over 60 minutes - Red(>20) - Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Delay DSNs - increase over 60 minutes - Yellow(>10) - Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Failure DSNs Total - increase over 60 minutes - Red(>40) - Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Failure DSNs Total - increase over 60 minutes - Yellow(>30) - Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Inbound direct trust certificate has expired. Run New-ExchangeCertificate to generate a new direct trust certificate.

Issue: Unknown.

Resolution: Unknown.

10/09/2007

OpsMgr by Example: The Jalasoft Management Pack

This blog entry is the next in a series of Operations Manager-related items which review the steps performed to install, configure and tune management packs in real-world environments.

What is Jalasoft, and why would people want to deploy it with Operations Manager 2007? Jalasoft provides extensions that allow OpsMgr to monitor network devices such as routers and switches, as well as Unix-based operating systems. The full list includes:

  • APC UPS
  • Availability (ICMP only)
  • Cisco PIX/ASA
  • Cisco Routers
  • Cisco Switches
  • Cisco VPN Concentrators
  • Cisco Wireless
  • F5 Big Ip
  • Generic Network Device
  • HP Procurve Switches
  • Linux MySQL
  • Linux Servers
  • Solaris Servers
  • VMware ESX
  • VMware VirtualCenter

There are a variety of vendors which provide competitive products to the Jalasoft offerings. These vendors include:

  • eXc Software’s management packs and virtual agents
  • Quest’s Management Xtensions
  • nworks VMware Management

For the purposes of this example, we used the standard Jalasoft management packs and the following additional management packs:

  • Cisco Switches & Routers
  • Linux Servers
  • General Availability

Jalasoft Installation:

  • Install and configure Operations Manager 2007, including the reporting components.
  • Read the guides on the Jalasoft products, which are available at: http://www.jalasoft.com/jalasoftweb/jsp/products/xianio/ under the More Information section.
  • Downloaded the Jalasoft evaluation from the same site (http://www.jalasoft.com/jalasoftweb/jsp/products/xianio/).
  • Extracted the files from the XianIo.zip file.
  • Ran the Index.hta and then chose Install Xian Network Manager Io. Installed with a custom installation including all of the programs available.
  • Prerequisite included MSMQ as it is used for communication between the Xian services. This was added through Control Panel Add/Remove programs -> Add/Remove Windows Components -> Application Server -> Details -> Message Queueing.
  • During the installation chose a single server installation, connecting to the existing SQL server in the OpsMgr environment, default database name of XIAN on the default port of 8586, with defaults on the remainder of the configurations.
  • This installs the Xian console, which is available via Start -> Programs -> Xian Network Manager Io -> Xian Network Manager Io. After launching the console, we added the evaluation license provided as part of the download.
  • Installed the management packs for evaluation from the Management Packs folder within the directory where Jalasoft was extracted. In our case, we added each of the MPs at the top level of the folder but none of the subfolders (APCUPS, Availability, Cisco, F5BigIP, HPProCurveSwitch, NetworkDevice, Solaris, Linux, MySQLServer, and VMWare).
  • Added the management packs (and reporting management packs) for the folders we were implementing: Cisco Routers, Cisco Switches, Linux.
  • Ended up also adding the availability MPs to test with the minimal configuration and impacts to the first sets of clients.

Jalasoft usage:

  • Availability Monitoring: Adding the availability monitoring (IP Test) worked great. We just added the IP address through the Xian console as part of the availability view. The device appeared in the Operations console under Monitoring -> Xian Network Manager -> All Xian Monitored Network Devices. We removed it from the Xian console and it disappeared out of OpsMgr as well. Testing the availability pieces of this included addition of systems which either could not have the agent deployed or were currently not deployed due to network restrictions on the devices. We added several devices, and then configured an ICMP availability rule to create a critical alert if the device was offline (and to create a warning if response time was too slow).
  • Network Monitoring: Deploying network monitoring for Cisco switches and Cisco routers was very straightforward. Adding the IP address and the read-only community string integrated our switches with the Cisco Switches section of the Jalasoft console, which in turn appear within the Operations console -> Monitoring -> Xian Network Manager -> All Xian Monitored Network Devices State. The default configuration provides up/down information only. Addition of rules to the device in the Jalasoft console provided testing for temperature status and device availability.
  • Unix Monitoring: The Unix monitoring component requires installing software on the Unix system (a daemon) (XianServer-3.1.727.5-727.i386.rpm). Once the daemon is deployed, Unix systems are easily added into Jalasoft from a process very similar to the one used for networking devices.
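The ICMP availability rule described above boils down to a simple classification of each ping result. The Python sketch below shows that logic only; it does not send ICMP packets, and the 500 ms "too slow" threshold and the function name are hypothetical rather than Jalasoft defaults.

```python
from typing import Optional

SLOW_REPLY_MS = 500.0  # hypothetical "too slow" threshold, not a Jalasoft default


def availability_state(rtt_ms: Optional[float], slow_ms: float = SLOW_REPLY_MS) -> str:
    """Classify one ICMP echo result: no reply is Critical, a slow reply is a Warning."""
    if rtt_ms is None:
        return "Critical"  # no echo reply: treat the device as offline
    if rtt_ms > slow_ms:
        return "Warning"   # reachable, but responding too slowly
    return "Healthy"


for rtt in [12.0, 750.0, None]:
    print(rtt, availability_state(rtt))
```

In practice you would likely require several consecutive missed replies before raising the critical alert, so that a single dropped packet does not page anyone.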

General thoughts on Jalasoft’s solution:

  • The Xian console provides an easy method to integrate with OpsMgr 2007. It is unfortunate, however, that a separate product needs to be installed in the environment rather than integrating with the existing OpsMgr network monitoring functionality.
  • The evaluation version appears to be fully functional and supports up to 10 devices for up to 60 days.
  • Once devices are configured to be monitored by Jalasoft, the actual monitoring does not take place until you configure the specific rules that are active for the device (as an example, adding the ICMP availability active rule to the system being monitored within availability).
  • The integration with OpsMgr was easy to use and was able to provide not only up/down information, but also gather and track performance metrics for the various devices.
01/09/2007

Moving the Data Warehouse database to another server

Background: we decided to take a "simpler approach" than we did with moving the Operations Database. Moving that database included registry hacks and security changes. For moving the Data Warehouse database, we wanted to be more straightforward: uninstall the OpsMgr Data Warehouse component, install it on a different server, then copy over the original database. However, this actually required a bit more discovery and work than anticipated, since some things were not intuitively obvious (see step 10).

  1. On the RMS, stop the SDK and Config Services.
  2. On the RMS and all other management servers, stop the Health Service. Stopping the OpsMgr services prevents updates from being posted to the databases while you are moving the data warehouse.
  3. On the current Data Warehouse server, use SQL Management Studio to back up the Data Warehouse database (default name: OperationsManagerDW) to a shared folder on the server. You will want to back up the master database as well, as a precaution.
  4. On the current Data Warehouse server, uninstall the OpsMgr data warehouse component. Open Control Panel, Add/Remove Programs, select System Center Operations Manager 2007 Reporting Server and choose Change; in Reporting Setup, select Modify and set the Data Warehouse component to not be available. Note that this does not physically remove the data warehouse database from SQL Server. After removing the Data Warehouse component from OpsMgr, delete the database manually using SQL Management Studio (this assumes you backed it up in step 3!).
  5. On the new Data Warehouse server, install the OpsMgr data warehouse component by running OMSetup.exe. Select the option to Install Operations Manager 2007 Reporting, selecting ONLY the Data Warehouse component for installation (Mark the Reporting Services component to not be available on this server).
  6. On the new Data Warehouse server, copy the backup of the data warehouse database (step 3) to a local folder. (If the shared folder on the original server is accessible as a mapped drive from SQL Management Studio, you can skip this step.)
  7. On the new Data Warehouse server, use SQL Management Studio to restore the data warehouse database backup (delete the existing database first, be sure the default option to Delete backup and restore history information for databases is checked). Restoring the original data warehouse database is necessary to not lose the report data you have already collected for your management group.
  8. On the new Data Warehouse server, create a login for the SDK Account, Data Warehouse Action Account and the Data Reader Account in SQL Management Studio. Ensure the database permissions are correct for these accounts.
  9. On the RMS, start the SDK service.
  10. On the server running SQL Reporting Services, modify the data source. In Internet Explorer, open http://localhost/reports. On the Properties page, choose Show Details. The data source is named "Data Warehouse Main." Select that data source, and in the connection string, change the name of the database server from the old data warehouse server to the new data warehouse server.
  11. Change the name of the data warehouse server in the OpsMgr databases. Open SQL Server Management Studio to do your edits. For the OperationsManager database, go to the MT_Datawarehouse table and change the value of the MainDatabaseServerName_16781F33_F72D_033C_1DF4_65A2AFF32CA3 column (that really is the column name!) to the new data warehouse database server. For the OperationsManagerDW database, navigate to the MemberDatabase table and change the value of ServerName. Be sure to close the Management Studio when you are through, to save your changes.
  12. Restart the Config and SDK services on the RMS and the Health service on the RMS and all other management servers.
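Steps 10 and 11 are the easiest to get wrong by hand. The Python sketch below only builds text for you to review: a data source connection string with the server name swapped, and the two T-SQL UPDATE statements from step 11. It assumes the connection string uses a `data source=` (or `server=`) key, that both tables are in the `dbo` schema, and that each table currently holds the old server name; the helper names are hypothetical. Review the statements in SQL Server Management Studio before running them, and keep the backups from step 3.

```python
def replace_data_source(connection_string, new_server):
    """Return the connection string with its server (data source) swapped for new_server."""
    parts = []
    for item in connection_string.split(";"):
        key, sep, _value = item.partition("=")
        if sep and key.strip().lower() in ("data source", "server"):
            parts.append(f"{key}={new_server}")  # keep the original key spelling
        else:
            parts.append(item)
    return ";".join(parts)


def sql_literal(value):
    """Quote a value as a T-SQL Unicode literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def step11_statements(old_server, new_server):
    """Build the UPDATE statements for the OperationsManager and OperationsManagerDW databases."""
    column = "MainDatabaseServerName_16781F33_F72D_033C_1DF4_65A2AFF32CA3"
    old, new = sql_literal(old_server), sql_literal(new_server)
    return [
        # OperationsManager database: MT_Datawarehouse table.
        f"USE OperationsManager; UPDATE dbo.MT_Datawarehouse "
        f"SET {column} = {new} WHERE {column} = {old};",
        # OperationsManagerDW database: MemberDatabase table.
        f"USE OperationsManagerDW; UPDATE dbo.MemberDatabase "
        f"SET ServerName = {new} WHERE ServerName = {old};",
    ]


if __name__ == "__main__":
    print(replace_data_source("data source=OLDDW;initial catalog=OperationsManagerDW", "NEWDW"))
    for statement in step11_statements("OLDDW", "NEWDW"):
        print(statement)
```

The WHERE clauses limit each update to rows that still name the old server, so running the statements a second time changes nothing.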