Notes on RMC

Introduction

Resource Monitoring and Control (RMC) is a function that gives you the ability to monitor the state of system resources and respond when predefined thresholds are crossed, so that you can perform many routine tasks automatically.

RMC is a no-charge feature of AIX 5L Version 5.1 that can be configured to monitor resources (disk space, CPU usage, processor status, application processes, and so on) and perform an action in response to a defined condition.


The predefined conditions are ready to use and only need to be enabled or configured to match the exact requirements of your systems. If the predefined conditions do not satisfy your system's requirements, RMC allows you to create new conditions, responses, and actions to tailor the system to respond when and how you require.

Technically, Resource Monitoring and Control (RMC) is a subset function of Reliable Scalable Cluster Technology (RSCT).

Clustered environment

RMC is capable of working in a stand-alone or clustered environment. In AIX 5L Version 5.1, nodes in a cluster may be configured for either of the following cluster domains: Peer domain or Management domain.

In a peer domain, all nodes are considered equal and any node can monitor and control (or be monitored and controlled by) any other node. In a management domain, a management node is aware of all nodes it is managing but the nodes themselves know nothing of each other.

Management domain

In a management domain, nodes are managed by a management server. The management server is aware of all the nodes it manages and all managed nodes are aware of their management server. However, the managed nodes know nothing about each other.

Filesets and packages

cd /usr/sys/inst.data/sys_bundles; grep -i rsct *

man lslpp

To list installed software products, type: lslpp -L

To list the fileset that owns installp, type: lslpp -w /usr/sbin/installp

1. Stop RMC daemons: # /usr/sbin/rsct/bin/rmcctrl -z

2. Reconfigure the necessary information and start the RMC daemons: # /usr/sbin/rsct/bin/rmcctrl -A

Uninstall RMC from your system

To remove all the RMC filesets: # installp -ug rsct.*

Architecture and components

Provide single monitoring and management infrastructure for clusters.

Provide global access to subsystems and resources throughout the cluster.

Support operations for configuring, monitoring, and controlling all cluster resources by RMC clients.

Encapsulate all resource dependent operations.

Provide a common access control mechanism across all resources.

Support integration with other subsystems to achieve the highest levels of availability.

Resource, resource class, and attribute

Resource

The resource is a fundamental concept in the RMC architecture. A resource is an abstraction of an instance of a physical or logical entity that provides services to applications or system components. A system or cluster is composed of numerous resources of various types.

Resource class

A resource class is a collection of resources that have similar characteristics. The resource class provides descriptive information about the properties and characteristics that are common to any resource within the resource class.

Attribute

A resource class and a resource have several attributes. An attribute has a value and a unique name within the resource class or the resource. A resource attribute is classified into either a public or a private property. The property is used as a hint for the RMC client for whether the attribute is to be presented to general users. Private attributes typically contain information that is not relevant to general users; therefore, the private attributes are hidden by default. However, you can display private attributes by specifying a flag in the command.

Attributes fall into two categories: persistent and dynamic.

Persistent attributes

For resources, persistent attributes are configuration parameters and are set either by the resource monitor harvesting real resources, or through the mkrsrc or the chrsrc commands. Persistent attributes define the characteristics of the resource. For resource classes, persistent attributes describe or control the operations of the class.

Dynamic attributes

Dynamic attributes reflect internal states or performance variables of resources and resource classes.

You generally refer to dynamic attributes to define the monitoring condition of the resource you want to monitor.

Resource managers

Each resource manager is the interface between the RMC subsystem and a specific aspect of the AIX instance it is controlling. All resource managers have the same architecture and interact with the other RMC components.

There are four groups of resource managers:

Logging and debugging: Audit Log resource manager

Configuration: Configuration resource manager

Reacting to events: Event Response resource manager

Monitoring data: File System resource manager, Host resource manager, and Sensor resource manager

Event Response resource manager (ERRM)

The Event Response resource manager provides the system administrator with the ability to define a set of conditions to monitor in the various nodes of the cluster, and to define actions to take in response to these events. The conditions are applied to dynamic properties of any resources of any resource manager in the cluster. The Event Response resource manager provides a simple automation mechanism for implementing event driven actions.

RMC command line interface

Environment variables

RMC is a distributed environment, where commands can be executed on any node in a cluster to act on any node in the cluster. Two environment variables, CT_CONTACT and CT_MANAGEMENT_SCOPE, control the behavior of all RMC commands with regard to the geographical aspects of RMC.

CT_CONTACT

This variable defines where the command will be executed. It should contain the host name or the IP address of the host where the command will be executed. If you are not logged on the target node, the CLI contacts the RMC daemon on the target host specified by CT_CONTACT and passes it the commands to be executed. If the variable is unset, the default is to execute the command on the local node.

CT_MANAGEMENT_SCOPE

This variable specifies whether commands apply to only local resources or to resources located on the other nodes in the cluster. This variable can be set to an integer value of 0, 1, 2, or 3.

0 or 1 Resources on local node only

2 Resources on all the nodes in a peer domain

3 Resources on all the nodes in a management domain

If the CT_MANAGEMENT_SCOPE variable is not set to any value, the default value is 0, therefore commands refer to resources on the local node only.
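As a small illustration of the scope values listed above, the following sketch maps a CT_MANAGEMENT_SCOPE value to its meaning and shows one way to set the variable for a single command only. The function name scope_description is made up for this example, and the lsrsrc call is guarded so the snippet does nothing on a host without RSCT.

```shell
# Map a CT_MANAGEMENT_SCOPE value to the scope it selects.
# scope_description is a hypothetical helper, not an RSCT command.
scope_description() {
    case "${1:-0}" in
        0|1) echo "local node" ;;
        2)   echo "peer domain" ;;
        3)   echo "management domain" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# An unset variable behaves like 0 (local node only).
echo "current scope: $(scope_description "${CT_MANAGEMENT_SCOPE:-0}")"

# Set the scope for one command without changing the login session.
if command -v lsrsrc >/dev/null 2>&1; then
    CT_MANAGEMENT_SCOPE=2 lsrsrc IBM.Host Name
fi
```

Prefixing the assignment to the command keeps the session default untouched, which is usually safer in scripts than exporting the variable.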

Command flags and pattern match operators

Most RMC commands accept the following commonly used command flags:

Display format: -l, -V, -t, -x, -d, and -D

Help: -h

Selection: -s

The -s flag takes a selection string argument that contains one or more expressions.

Display format

Most commands support the same flags to specify the output format. The -l and -V flags are meant for interactive use and display long and verbose results, respectively. The -t, -x, -d, and -D flags are more commonly used in scripts: they produce tabular output (-t), suppress the header (-x), or separate fields with a predefined colon delimiter (-d) or a user-specified delimiter (-D), all of which are easier to parse than the default output.

lsrsrc -V IBM.Host NodeNameList NumProcessors RealMemSize OSName
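To show why delimiter-separated output is convenient in scripts, here is a minimal sketch that filters colon-delimited records with awk. The helper name over_threshold and the sample records are invented for this example; they are not captured from a real lsrsrc run, so the exact layout of the real output should be checked before reusing the idea.

```shell
# Print the names whose second field (percent used) exceeds the limit
# given as $1. Reads colon-delimited "name:percent" records on stdin.
# over_threshold is a hypothetical helper.
over_threshold() {
    awk -F: -v limit="$1" '{
        gsub(/"/, "", $1)              # drop quotes around the name, if any
        if ($2 + 0 > limit) print $1
    }'
}

# Hypothetical sample records in name:percent form.
sample='"/":41
"/var":87
"/tmp":12'

printf '%s\n' "$sample" | over_threshold 80
```

On a real system the input would come from an lsrsrc call using the delimiter and no-header flags described above instead of the sample variable.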

Selection

The selection string is an expression that is evaluated against each resource or class. The selection string is made of variables, operators and constants. The variables refer to persistent attributes of the target resource or class. Selection cannot be performed on dynamic attributes.

lsrsrc -s 'Mount == "true"' IBM.FileSystem Name PercentTotUsed

lsrsrc -s 'Mount=="true" && Size < 40000' IBM.FileSystem Name PercentTotUsed Size
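When the value in a selection string comes from a script variable, the quoting gets awkward because the expression itself needs double quotes around string constants. The sketch below builds the string with printf; name_selection is a hypothetical helper, and the lsrsrc call is guarded so nothing runs on a host without RSCT.

```shell
# Build a selection string that picks one file system by name.
# String constants stay in double quotes inside the expression.
# name_selection is a hypothetical helper.
name_selection() {
    printf 'Name == "%s"' "$1"
}

sel=$(name_selection /var)
echo "$sel"

# Pass the finished string to -s as a single argument.
if command -v lsrsrc >/dev/null 2>&1; then
    lsrsrc -s "$sel" IBM.FileSystem Name PercentTotUsed
fi
```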

Pattern match operators

There are two pattern match operators, =~ and =?, and two not pattern match operators, !~ and !?. The =~ and !~ operators match substrings, while the SQL-like =? and !? operators match full strings.

When using the extended regular expression operators =~ and !~ with wild cards, you should use:

The dot (.) to match exactly one character.

The star (*) to match zero or more occurrences of the preceding character.

With the SQL-like syntax operators, =? and !?, you should use the following wild card characters:

The percent sign (%) matches zero or more characters.

The underscore (_) matches exactly one character.

The percentage and underscore characters can be quoted with the pound sign (#) to override their special meaning.

lsrsrc -s 'Mount =? "t%"' IBM.FileSystem Name Mount

lsrsrc -s 'Mount =~ "t.*"' IBM.FileSystem Name Mount
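The difference between substring matching and full-string matching can be tried outside RMC. The sketch below imitates the two operator families with grep; it is only an emulation written for these notes, the function names are made up, and it does not handle the # quoting character or regular expression metacharacters inside a SQL-like pattern.

```shell
# =~ style: extended regular expression, may match anywhere in the value.
ere_match() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# =? style: SQL-like pattern that must cover the whole value.
# Only % and _ are translated; everything else is taken literally.
like_match() {
    regex=$(printf '%s\n' "$2" | sed -e 's/%/.*/g' -e 's/_/./g')
    printf '%s\n' "$1" | grep -Eq -- "^${regex}\$"
}

ere_match  true ru     && echo 'true =~ ru   : match (substring is enough)'
like_match true ru     || echo 'true =? ru   : no match (whole string required)'
like_match true 't%'   && echo 'true =? t%   : match'
like_match true 't__e' && echo 'true =? t__e : match'
```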

A peer domain cluster setup

Preparing your node security: preprpnode svr02 svr04

Creating the cluster

To create a peer domain cluster, issue mkrpdomain on one node that is already prepared with preprpnode (PD is the cluster name): mkrpdomain PD svr02 svr04

To check the status of the domain, use lsrpdomain, which lists all known domains and provides a summary of their status: # lsrpdomain

Bringing the cluster online: # startrpdomain PD
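The creation steps above can be collected into a dry-run helper that only prints the commands in order, which is handy for reviewing a plan before touching a cluster. peer_domain_plan is a hypothetical name; nothing here executes an RSCT command.

```shell
# Print, in order, the commands from these notes that prepare the
# nodes, define the peer domain, bring it online and check it.
# Usage: peer_domain_plan DOMAIN NODE...
peer_domain_plan() {
    domain=$1
    shift
    echo "preprpnode $*"
    echo "mkrpdomain $domain $*"
    echo "startrpdomain $domain"
    echo "lsrpdomain"
}

peer_domain_plan PD svr02 svr04
```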

Adding a node to the cluster

Once the security environment is configured, the next step is to add the node to the online cluster. The command addrpnode performs this operation, and must be run on a node that is online in the cluster you are modifying. Nodes may be defined in several clusters simultaneously, but can only be online to one cluster at any time.

# addrpnode svr05; lsrpdomain; lsrpnode

Bringing a node online: # startrpnode svr05

Stopping (offline) a node in the cluster: # stoprpnode svr05

Removing a node from the cluster

If you must remove a node from the cluster, the node must first be in the offline state, and then it can be removed with the rmrpnode command. This must be executed on an online node of the cluster: # rmrpnode svr05
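Because a node has to be offline before it can be removed, a script needs to decide whether stoprpnode is required first. The sketch below prints the commands for a given node and state; removal_plan and the lowercase state words are conventions of this example only, and the state is passed in by the caller rather than read from the cluster.

```shell
# Print the commands needed to remove a node, given its current state.
# Usage: removal_plan NODE online|offline
removal_plan() {
    node=$1
    state=$2
    case "$state" in
        online)
            echo "stoprpnode $node"   # take the node offline first
            echo "rmrpnode $node"
            ;;
        offline)
            echo "rmrpnode $node"
            ;;
        *)
            echo "unknown state: $state" >&2
            return 1
            ;;
    esac
}

removal_plan svr05 online
```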

Stopping (offline) a cluster

The stoprpdomain command is used to take all the online nodes of a cluster offline. It must be run on an online node.

# stoprpdomain PD

The cluster configuration is preserved after the cluster status is changed to offline.

Removing a cluster

If a cluster must be deleted and the configuration information removed, rmrpdomain can be used.

To remove a peer domain cluster, do the following:

1. Bring the peer domain online, if it is not online yet.

2. Execute the rmrpdomain command on an online node: # rmrpdomain PD

3. If there is any node that is not network accessible when the peer domain is being removed, execute the rmrpdomain command with the -f option on each node to clean up the cluster configuration on the node: # rmrpdomain -f PD

SRC commands

AIX has the system resource controller (SRC), which provides a consistent controlling method for various system daemon processes (referred to as subsystems).

lssrc

The lssrc command provided by AIX SRC can be used to check if the RMC subsystems are active: # lssrc -g rsct; # lssrc -g rsct_rm

The ctrmc subsystem, referred to here as the RMC subsystem, is instantiated as the rmcd daemon (/usr/sbin/rsct/bin/rmcd).

A security subsystem, ctcas, is in charge of authentication in the cluster, by means of UNIX identity-based credentials. The ctcas subsystem is instantiated as the ctcasd daemon (/usr/sbin/rsct/bin/ctcasd).

Both the ctrmc and ctcas subsystems belong to the rsct group.
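For use in scripts, the status column of lssrc output can be extracted with awk. The sketch below works on a hand-typed sample in the usual lssrc layout (the PID shown is made up), and subsystem_status is a hypothetical helper name.

```shell
# Print the status of one subsystem from lssrc-style output on stdin.
# The status is taken as the last field, because the PID column is
# empty for a subsystem that is not running.
subsystem_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Illustrative sample only, not captured from a real system.
sample='Subsystem         Group            PID          Status
 ctrmc            rsct             16282        active
 ctcas            rsct                          inoperative'

for s in ctrmc ctcas; do
    echo "$s: $(printf '%s\n' "$sample" | subsystem_status "$s")"
done

# On an AIX host the same function could read live data:
#   lssrc -g rsct | subsystem_status ctrmc
```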

Other SRC commands

Even if RMC components are subsystems that can be managed with any AIX System Resource Controller (SRC) commands, we recommend you only use the SRC lssrc command with these subsystems. Do not use the startsrc and stopsrc commands to start and stop the RMC daemon, but rather the rmcctrl (RMC Control) command.

The refresh command may be used to force the RMC subsystem to re-read its configuration files.

RSCT commands

The rmcctrl command manages both the RMC subsystem and the resource managers subsystem.

During the AIX installation, the rmcctrl -a command is performed automatically, and the following entry is added to /etc/inittab: # grep ctrmc /etc/inittab

ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1

Therefore, the RMC subsystem is started by default upon every reboot. There is no need to perform any action to manually start it in normal operation.

The -s and -k flags start and stop the ctrmc subsystem only.

Both the -K and -z options stop RMC and the resource managers, but the former is asynchronous while the latter is synchronous. With -z, the command will not return before all subsystems have actually stopped.
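A restart built from the flags in these notes would stop everything synchronously with -z and then reconfigure and start with -A. The sketch below wraps that sequence with a dry-run switch; restart_rmc, RMCCTRL and DRY_RUN are names chosen for this example, and only the dry run is exercised here.

```shell
# Path taken from the notes; override RMCCTRL to experiment elsewhere.
RMCCTRL=${RMCCTRL:-/usr/sbin/rsct/bin/rmcctrl}

# Stop RMC and the resource managers synchronously (-z), then
# reconfigure and start them again (-A). With DRY_RUN=1 the commands
# are only printed, not executed.
restart_rmc() {
    for flag in -z -A; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$RMCCTRL $flag"
        else
            "$RMCCTRL" "$flag" || return 1
        fi
    done
}

DRY_RUN=1 restart_rmc
```

Using -z rather than -K matters in this sequence, since the start step should not begin until the stop has completed.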

RMC commands

There are five categories of generic RMC commands:

Creation mkrsrc

Display lsrsrc, lsrsrcdef, and lsactdef

Modification chrsrc

Deletion rmrsrc

Refresh refrsrc

lsrsrc

Not to be confused with lsrsrc, the SRC command lssrc sends a request to the System Resource Controller to get the status of a subsystem, a group of subsystems, or all subsystems.

To Get All Status lssrc [ -h Host ] -a

To Get Group Status lssrc [ -h Host ] -g GroupName

To Get Subsystem Status lssrc [ -h Host ] [ -l ] -s Subsystem

To Get Status by PID lssrc [ -h Host ] [ -l ] -p SubsystemPID

To Get Subserver Status lssrc [ -h Host ] [ -l ] -t Type [ -p SubsystemPID ] [ -o Object ] [ -P SubserverPID ]

To get the status of the tcpip subsystem group, enter: lssrc -g tcpip

You should verify that some sub-systems are running on your system: # lssrc -g rsct; lssrc -g rsct_rm

The lsrsrc command displays the persistent and dynamic attributes and their values for a resource or a resource class.

1 To list the names of all of the resource classes, enter: lsrsrc

2 To list the persistent attributes for IBM.Host resources that have 4 processors, enter: lsrsrc -s "NumProcessors == 4" -A p -p 0 IBM.Host

3 To list the public dynamic attributes for resource IBM.Host on node 1, enter: lsrsrc -s 'Name == "c175n05.ppd.pok.ibm.com"' -A d IBM.Host

4 To list the Name, Variety, and ProcessorType attributes for the IBM.Processor resource on all the online nodes, enter: lsrsrc IBM.Processor Name Variety ProcessorType

5 To list both the persistent and dynamic attributes for the resource class IBM.Condition, enter: lsrsrc -c -A b -p 0 IBM.Condition

6 To list only the Name attribute of the IBM.Host resources, enter: lsrsrc IBM.Host Name

The first argument of lsrsrc is a resource class name. By default, the result will relate to resources in this class. You need to specify the -c flag to retrieve class attributes and values.

# lsrsrc -c IBM.PagingDevice; # lsrsrc IBM.PagingDevice

Attributes are either persistent (static) or dynamic. If you do not explicitly specify attributes on the command line, the default is to display only persistent attributes. The -a flag followed by p (persistent), d (dynamic) or b (both) overrides the default behavior: # lsrsrc -ad IBM.PagingDevice

By default, lsrsrc displays only public attributes. In general, attributes which are not public are only useful for application developers and not for end users. You need to specify -p0 to list all attributes: lsrsrc -p0 IBM.PagingDevice

lsrsrcdef

The lsrsrcdef command returns the description of a resource class or a resource and their attributes.

You can use it when:

Creating selection strings

Developing new monitoring conditions

Obtaining detailed information about a resource

The -ap or -ad flags instruct the command to return persistent or dynamic attributes, respectively.

The -c flag indicates whether lsrsrcdef returns information pertaining to the class or the resource.

The -e flag makes the command return the full description of an attribute. This description can be long and is not displayed by default.

# lsrsrcdef -ap -e IBM.FileSystem Bf

The lsrsrcdef command also displays only public attributes by default. With the -p0 flag, all attributes are displayed.

refrsrc

The refrsrc command forces the resource manager that owns the specified class to detect any changes in the configuration of resources in that class. The lsrsrc -x command returns all the resource class names.

To strip the double-quote characters that lsrsrc returns around the resource class names:

# for i in `lsrsrc -x | sed 's/\"//g'`; do echo $i; done

To refresh all resource classes using refrsrc:

# for i in `lsrsrc -x | sed 's/\"//g'`; do refrsrc -V $i; done

Sensor commands

When you want to monitor a part of the AIX environment for which RMC does not provide a predefined resource class or attribute, you can create your own resource, called a sensor, using the Sensor resource manager. A sensor is a command that will be periodically executed, returning a set of values that can be monitored with the Event Response resource manager.

The Sensor resource manager provides four commands to list, create, modify, or remove sensors: lssensor, mksensor, chsensor, and rmsensor.

lssensor

The lssensor command displays either a list of sensors or the list of attributes of sensors. With no flags, lssensor lists the sensors defined locally; with the -a flag, it lists all sensors defined in the cluster. The CT_CONTACT variable indicates where the command will be executed.

If you prefer not to modify the CT_CONTACT variable, you can use the -n flag to specify the nodes on which you look for the defined sensors: # lssensor -n svr04,svr05

mksensor

The mksensor command creates a sensor on one node. You cannot create the same sensor on multiple nodes at the same time.

# lssensor -n svr05; # mksensor -n svr05 NumUsers1 "/SharedTools/NumLogins.ksh"; # lssensor -n svr05

The only option that can be set during the creation of the sensor is the period at which the sensor is executed, using the -i flag followed by an integer representing a time in seconds. Apart from the sensor name, you must specify as the second argument of the mksensor command the name of the program you want to execute. You can also specify arguments to this program by surrounding the program name and its arguments with double quotes.

# mksensor -i 300 NumUsers1 "/SharedTools/NumLogins.pl abc"
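The notes do not show what a sensor program such as NumLogins.ksh contains. The sketch below is one guess at its core: it counts login records and prints the count as a single attribute=value line. The Int32=<number> output form is an assumption of this example and should be checked against the Sensor resource manager documentation; logins_to_sensor_value is a hypothetical name.

```shell
# Turn a list of login records (one per line on standard input) into a
# single sensor value line. The Int32=<number> form is an assumption
# of this sketch, not something stated in these notes.
logins_to_sensor_value() {
    awk 'END { print "Int32=" NR }'
}

# Count current logins when the who command is available.
if command -v who >/dev/null 2>&1; then
    who | logins_to_sensor_value
fi
```

Keeping the counting in a function that reads standard input makes the script easy to check with canned input before it is registered with mksensor.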

chsensor

The chsensor command is used to rename a sensor. You cannot use it to modify the sensor period or the command to be executed. If you need such a change, the only solution is to delete the sensor and recreate it.

Using the -a flag, you can rename a sensor on all nodes in the cluster, or you can restrict the renaming to a set of nodes with the -n flag, followed by the node names.

rmsensor

The rmsensor command deletes a sensor on one or several nodes at the same time, using the same -a and -n flags as chsensor. You can delete several sensors in one command by passing several sensor names as arguments to rmsensor.

ERRM commands

The Event Response resource manager handles three types of objects: conditions, responses and associations.

Display lscondition, lsresponse, and lscondresp

Creation mkcondition, mkresponse, and mkcondresp

Modification chcondition and chresponse

Removal rmcondition, rmresponse, and rmcondresp

Activation startcondresp and stopcondresp

The monitoring of events using the CLI follows the same principles as when using the GUI:

1. First, define the kind of event you want to monitor.

2. Define the response you want the system to carry out when the event occurs. A response is built from the predefined actions or from any actions you provide (programs and scripts).

3. Define associations between the conditions you want to monitor and the responses you have chosen.

4. Activate monitoring by starting the associations.
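The four steps above map onto four ERRM commands. The sketch below only prints one possible command sequence and does not run anything. The flags shown (-r and -e for mkcondition, -n and -s for mkresponse), the 90 percent expression, the action name and the script path are illustrative assumptions that should be verified against the man pages before use; errm_plan is a hypothetical helper.

```shell
# Print the four CLI steps for one condition/response pair.
# Usage: errm_plan CONDITION_NAME RESPONSE_NAME
errm_plan() {
    cond=$1
    resp=$2
    # 1. Define the event to monitor (class and expression are examples).
    echo "mkcondition -r IBM.FileSystem -e 'PercentTotUsed > 90' '$cond'"
    # 2. Define the response; action name and script path are placeholders.
    echo "mkresponse -n 'notify' -s '/path/to/notify.sh' '$resp'"
    # 3. Associate the condition with the response.
    echo "mkcondresp '$cond' '$resp'"
    # 4. Start monitoring by starting the association.
    echo "startcondresp '$cond' '$resp'"
}

errm_plan "var almost full" "tell admin"
```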


Resources:

A Practical Guide for Resource Monitoring and Control (RMC)

