OpsMgr by Example: The Exchange 2007 Management Pack

This blog entry continues our series of Operations Manager-related items that review the steps performed to install, configure and tune management packs in real-world environments.

Because there is currently no Exchange 2007 management pack for Operations Manager 2007 (although there is one for MOM 2005), the following results are from a tuning exercise on Exchange 2007 using a version of the management pack converted from MOM 2005 to OpsMgr 2007. The core functionality appears to have converted well (thanks to our co-author John Joyner for the excellent idea and for his help putting this together!). Tuning this version of the management pack should provide a jump-start for tuning the Exchange 2007 management pack designed specifically for OpsMgr 2007. The management pack guide is available at: http://technet.microsoft.com/en-us/library/bb217782.aspx.

Issues:

· No reports, because this is a converted management pack (there should be reports in the Exchange 2007 MP when Microsoft releases it).

· Cannot edit company knowledge on the monitors within the Exchange 2007 management pack (there is no Company Knowledge tab within the Alert properties).

· No Exchange wizard for configuration: Woohoo!

Tuning/Alerts to Look for: The following are alerts that we found and resolved while tuning the converted Exchange 2007 management pack.

Alert: LDAP Search Time – sustained for 5 minutes – Red(>100msec).

Issue: This condition occurs sporadically on the servers in the environment.

Resolution: First tried to create an override for the rule [LDAP Search Time – sustained for 5 minutes – Red(>100msec)] to move this from sustained for 5 minutes to sustained for 10 minutes (in another environment we also tried setting the override to 15 minutes). Then re-configured the threshold to a higher value (200msec), because no network issues were found on the systems (each side has a gigabit link direct to the switch), no bottlenecks were found on the systems, and the condition occurs in multiple environments. To change the threshold, right-clicked the [LDAP Search Time – sustained for 5 minutes – Red(>100msec)] monitor and chose View or edit the settings of this Monitor; on the Configuration tab, changed 100 to 200 within the XML. Also renamed the monitor to say >200msec and renamed the alert to match.

Alert: LDAP Search Time – sustained for 5 minutes – Yellow(>50msec).

Issue: This condition occurs sporadically on the servers in the environment.

Resolution: First tried to create an override for the rule [LDAP Search Time – sustained for 5 minutes – Yellow(>50msec)] to move this from sustained for 5 minutes to sustained for 10 minutes (in another environment we also tried setting the override to 15 minutes). Then re-configured the threshold to a higher value (150msec), because no network issues were found on the systems (each side has a gigabit link direct to the switch), no bottlenecks were found on the systems, and the condition occurs in multiple environments. To change the threshold, right-clicked the [LDAP Search Time – sustained for 5 minutes – Yellow(>50msec)] monitor and chose View or edit the settings of this Monitor; on the Configuration tab, changed 50 to 150 within the XML. Also renamed the monitor to say >150msec and renamed the alert to match.
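The difference between the two kinds of change tried for the LDAP Search Time alerts (lengthening the sustain window versus raising the threshold) can be illustrated with a small sketch. It assumes the monitor samples LDAP search time once per minute and alerts only when every sample in the window exceeds the threshold. The sampling interval, the function name, and the sample values are hypothetical and are not taken from the management pack.

```python
def sustained_breach(samples_ms, threshold_ms, window):
    """Return True if `window` consecutive samples all exceed threshold_ms."""
    run = 0  # length of the current run of samples above the threshold
    for value in samples_ms:
        run = run + 1 if value > threshold_ms else 0
        if run >= window:
            return True
    return False


# Hypothetical one-per-minute LDAP search times in milliseconds:
# a sporadic six-minute slowdown surrounded by normal readings.
samples = [40, 120, 130, 125, 140, 135, 128, 60, 45, 50]

print(sustained_breach(samples, 100, 5))   # original Red setting
print(sustained_breach(samples, 100, 10))  # override: sustain for 10 minutes
print(sustained_breach(samples, 200, 5))   # raised threshold: 200 msec
```

With these made-up readings the original setting alerts, while either the longer window or the higher threshold stays quiet. Which change is appropriate depends on whether the slow periods in a given environment are short bursts or a consistently higher baseline.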

Alert: Application log size.

Issue: The application log size on the Exchange 2007 servers was 16 MB; per the Exchange 2007 MP, this should be at least 40 MB for Exchange servers.

Resolution: Increased the application log size on the servers indicated.

Alert: Crash upload logging disabled.

Issue: Exchange fatal error information is not being sent to Microsoft.

Resolution: Per the knowledge link, this can be changed with the Exchange UI (http://technet.microsoft.com/en-us/library/2582b127-b826-4eac-88b6-47a79ed49c6d.aspx) to resolve the issue. For those environments where there is a requirement to not send this information, the rule can be disabled.

Alert: WebServices connectivity (Internal) transaction failure – The credentials cannot be used to test Web Services.

Additional Alerts:

Error occurred while executing the Test-ExchangeSearch diagnostic cmdlet.

Error occurred while executing the Test-Mailflow (Remote) diagnostic cmdlet.

Error occurred while executing the Test-Mailflow (Local) diagnostic cmdlet.

Error occurred while executing the Test-MAPIConnectivity diagnostic cmdlet.

Exchange ActiveSync connectivity (Internal) transaction failure – The test credentials cannot be used to test Exchange ActiveSync.

Outlook Web Access connectivity (External) transaction failure – The test credentials cannot be used to test Outlook Web Access.

Outlook Web Access connectivity (Internal) transaction failure – The test credentials cannot be used to test Outlook Web Access.

Issue: Exchange 2007 management pack configuration required.

Resolution: Ran the New-TestCasConnectivityUser.ps1 script from the Exchange Management Shell on the Mailbox server. To run the script, enter a temporary password when prompted, press Enter to continue, and specify an organizational unit (OU) in which to create the account (the OU name needs to be unique, or you need to provide the full name of the OU). This creates the test account, named CAS_{sid}, in the OU that you specify.

Alert: The Microsoft Exchange Replication Service requires re-seeding a storage group on the passive node.

Issue: A storage group copy on the passive node required re-seeding.

Resolution: Microsoft provides product knowledge on how to fix this, available at: http://technet.microsoft.com/en-us/library/63367703-1226-44b2-a4b8-205ed7222da0.aspx

Alert: MSExchange Replication: ReplayQueueLength – sustained for 5 minutes – Red(>15).

Issue: Problems were occurring with Cluster Continuous Replication where the passive node required re-seeding.

Resolution: See the “The Microsoft Exchange Replication Service requires re-seeding a storage group on the passive node.” alert.

Alert: Edge Synchronization transaction failure – Recipients are out of sync.

Issue: Edge Synchronization reported that recipients were out of sync.

Resolution: See http://technet.microsoft.com/en-us/library/7b5897c5-9c72-40ee-b977-4f4f6821d1ed.aspx

Alert: Percentage of Committed Memory in Use is too high

Issue: Several Microsoft products, including Exchange, SQL Server, and the Operations Manager RMS, will use all available memory. This is especially noticeable on 64-bit platforms, where applications can make effective use of much larger amounts of memory.

Resolution: Configured servers with Exchange, SQL or Operations Manager RMS to have a 95% threshold instead of 80%.
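A minimal sketch of the per-role threshold described in this resolution follows. It assumes the percentage is computed as committed bytes divided by the commit limit, as the Windows "% Committed Bytes In Use" counter does. The role labels, function names, and byte values are hypothetical and exist only for illustration.

```python
DEFAULT_THRESHOLD_PCT = 80          # threshold before tuning
MEMORY_HUNGRY_THRESHOLD_PCT = 95    # tuned threshold for servers that use all available memory

# Hypothetical role labels for the servers that received the 95% threshold.
MEMORY_HUNGRY_ROLES = {"exchange", "sql", "opsmgr-rms"}


def committed_pct(committed_bytes, commit_limit_bytes):
    """Percentage of committed memory in use."""
    return 100.0 * committed_bytes / commit_limit_bytes


def should_alert(role, committed_bytes, commit_limit_bytes):
    """Apply the 95% threshold to memory-hungry roles and 80% to everything else."""
    if role in MEMORY_HUNGRY_ROLES:
        threshold = MEMORY_HUNGRY_THRESHOLD_PCT
    else:
        threshold = DEFAULT_THRESHOLD_PCT
    return committed_pct(committed_bytes, commit_limit_bytes) > threshold


GIB = 1024 ** 3
# The same 87.5% usage alerts on an ordinary server but not on an Exchange server.
print(should_alert("fileserver", 14 * GIB, 16 * GIB))
print(should_alert("exchange", 14 * GIB, 16 * GIB))
```

The point of the role-based split is that high committed memory is expected behavior on these servers, so the higher threshold removes noise without disabling the check altogether.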

Additional Notes:

It may be necessary to configure the Outlook Web Access external URL for URL monitoring to work correctly on the managed Exchange 2007 server. The following command sets the configuration, as documented at http://technet.microsoft.com/en-us/library/bb691294.aspx:

set-owavirtualdirectory "Server01\owa (Default Web Site)" -externalurl:"https://Server01.Domain.contoso.com/owa"

Alerts Not Resolved:

Alert: Delay DSNs – increase over 60 minutes – Red(>20) – Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Delay DSNs – increase over 60 minutes – Yellow(>10) – Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Failure DSNs Total – increase over 60 minutes – Red(>40) – Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Failure DSNs Total – increase over 60 minutes – Yellow(>30) – Hub Transport.

Issue: Issue did not recur.

Resolution: No resolution found.

Alert: Inbound direct trust certificate has expired. Run New-ExchangeCertificate to generate a new direct trust certificate.

Issue: Unknown.

Resolution: Unknown.


6 Responses to OpsMgr by Example: The Exchange 2007 Management Pack

  1. Stefano says:

    All very interesting, but I have a problem. I’m using a CCR cluster in
    an active/passive configuration. On the passive node I get a flood of
    errors, because SCOM tries to execute all of the tests there even though
    many services are of course not running. Is there a way to avoid this problem?

  2. Operations says:

    Hi,
    We actually haven’t seen CCR clusters in active/passive in the production world, but we suspect that there are MPs which will spit out errors because of the configuration, since the MP probably isn’t smart enough to identify a passive member of a CCR cluster. Long story short, we don’t know, but if you want to provide specific errors we might be able to track it down.

  3. Heinrich says:

    About: Inbound direct trust certificate has expired. Run New-ExchangeCertificate to generate a new direct trust certificate.

    This occurs when the self-signed certificate has expired. Note that this will affect Server-to-Server mail delivery, so importantly: after running New-ExchangeCertificate, one must run Enable-ExchangeCertificate (with a few parameters 😛 ) to enable the certificate for specific services (e.g. IIS, SMTP).

    Failing to run Enable-ExchangeCertificate will prevent the Exchange HT role from performing correctly (Client Access too, probably; this was the least of my concerns for verifying at the time 🙂 )

    -H.

  4. Micheal says:

    Alert: Failure DSNs Total – increase over 60 minutes – Red(>40) – Hub Transport.
    Issue: Issue did not recur.
    Resolution: No resolution found.

    Alert: Failure DSNs Total – increase over 60 minutes – Yellow(>30) – Hub Transport.
    Issue: Issue did not recur.
    Resolution: No resolution found.
    The organization that I am tuning right now receives these on a regular basis.  They are the hardest thing to truly track for legitimacy when they are happening every hour. 

  5. Jon says:

    RE: Failure DSNs Alerts

    DSNs can be identified and resolved using the Message Tracking Tool (frustrating) or the Exchange Management Shell (easier). In my case, most of the DSNs were from outdated / invalid alert notification lists from various monitoring and HR applications. Working with the application admins to clean up their lists/DBs brought us back under the limit.

    Using EMS to identify the source/root cause of DSNs:

    1. Open EMS and type “get-messagetrackinglog -eventid DSN”. This will identify which senders are getting DSNs. For this example we will assume badappnotify@domain.com was getting a lot of DSNs.

    2. Type “get-messagetrackinglog -sender badappnotify@domain.com -eventid FAIL”. This will identify the users / DLs that badappnotify@domain.com can’t send to.

    3. Clean up the bad app’s notify lists or DL memberships.

    4. Wait a short while after cleanup and re-run “get-messagetrackinglog -sender badappnotify@domain.com -eventid FAIL” to confirm cleanup is complete.

    Note that you can use the -start and -end switches to specify a time and date to narrow the search or to get a specified period of time.

  6. Dave says:

    RE: LDAP Search Time – sustained for 5 minutes – Red(>100msec). How did you change the threshold? We have MOM 2005 and I can’t seem to find the option. Any tips would be appreciated.
