Notes on RMC
Resource Monitoring and Control (RMC) is a function that gives you the ability to monitor the state of system resources and respond when predefined thresholds are crossed, so that you can perform many routine tasks automatically.
RMC is a no-charge feature of AIX 5L Version 5.1 that can be configured to monitor resources (disk space, CPU usage, processor status, application processes, and so on) and perform an action in response to a defined condition.
The predefined conditions are ready to use and only need to be enabled or configured to match the exact requirements of your systems. If the predefined conditions do not satisfy your systems' requirements, RMC allows you to create new conditions, responses, and actions to tailor the system to respond when and how you require.
Technically, Resource Monitoring and Control (RMC) is a subset function of Reliable Scalable Cluster Technology (RSCT).
RMC is capable of working in a stand-alone or clustered environment. In AIX 5L Version 5.1, nodes in a cluster may be configured for either of the following cluster domains: peer domain or management domain.
In a peer domain, all nodes are considered equal: any node can monitor and control, or be monitored and controlled by, any other node. In a management domain, nodes are managed by a management server. The management server is aware of all the nodes it manages, and every managed node is aware of its management server; however, the managed nodes know nothing about each other.
Filesets and packages
cd /usr/sys/inst.data/sys_bundles; grep -i rsct *
Lists installed software products: lslpp -L
To list the fileset that owns installp, type: lslpp -w /usr/sbin/installp
1. Stop RMC daemons: # /usr/sbin/rsct/bin/rmcctrl -z
2. Reconfigure necessary information and start RMC daemons:# /usr/sbin/rsct/bin/rmcctrl -A
Uninstall RMC from your system
To remove all the RMC filesets:# installp -ug rsct.*
Architecture and components
Provide a single monitoring and management infrastructure for clusters.
Provide global access to subsystems and resources throughout the cluster.
Support operations for configuring, monitoring, and controlling all cluster resources by RMC clients.
Encapsulate all resource dependent operations.
Provide a common access control mechanism across all resources.
Support integration with other subsystems to achieve the highest levels of availability.
Resource, resource class, and attribute
The resource is a fundamental concept in the RMC architecture. A resource is an abstraction of an instance of a physical or logical entity that provides services to applications or system components. A system or cluster is composed of
numerous resources of various types.
A resource class is a collection of resources that have similar characteristics. The resource class provides descriptive information about the properties and characteristics that are common to any resource within the resource class.
A resource class and a resource each have several attributes. An attribute has a value and a name that is unique within the resource class or the resource. A resource attribute is classified as either a public or a private property. The property serves as a hint to the RMC client about whether the attribute should be presented to general users. Private attributes typically contain information that is not relevant to general users; therefore, private attributes are hidden by default. However, you can display private attributes by specifying a flag on the command.
Attributes fall into two categories: persistent and dynamic.
For resources, persistent attributes are configuration parameters and are set either by the resource monitor harvesting real resources, or through the mkrsrc or the chrsrc commands. Persistent attributes define the characteristics of the
resource. For resource classes, persistent attributes describe or control the operations of the class.
Dynamic attributes reflect internal states or performance variables of resources and resource classes.
You generally refer to dynamic attributes to define the monitoring condition of the resource you want to monitor.
Each resource manager is the interface between the RMC subsystem and a specific aspect of the AIX instance it is controlling. All resource managers have the same architecture and interact with the other RMC components.
Four groups of resource managers:
Logging and debugging >>> Audit Log resource manager
Configuration >>> Configuration resource manager
Reacting to events >>> Event Response resource manager
Data monitoring >>> File system resource manager
>>> Host resource manager
>>> Sensor resource manager
Event Response resource manager (ERRM)
The Event Response resource manager provides the system administrator with the ability to define a set of conditions to monitor in the various nodes of the cluster, and to define actions to take in response to these events. The conditions
are applied to dynamic properties of any resources of any resource manager in the cluster. The Event Response resource manager provides a simple automation mechanism for implementing event driven actions.
RMC command line interface
RMC is a distributed environment, where commands can be executed on any node in a cluster to act on any node in the cluster. Two environment variables, CT_CONTACT and CT_MANAGEMENT_SCOPE, control the behavior of all
RMC commands with regard to the geographical aspects of RMC.
CT_CONTACT: This variable defines where the command will be executed. It should contain the host name or the IP address of the host where the command will be executed. If you are not logged on to the target node, the CLI contacts the RMC daemon on the target host specified by CT_CONTACT and passes it the commands to be executed. If the variable is unset, the default is to execute the command on the local node.
CT_MANAGEMENT_SCOPE: This variable specifies whether commands apply only to local resources or to resources located on the other nodes in the cluster. It can be set to an integer value of 0, 1, 2, or 3.
0 or 1 Resources on local node only
2 Resources on all the nodes in a peer domain
3 Resources on all the nodes in a management domain
If the CT_MANAGEMENT_SCOPE variable is not set to any value, the default value is 0, therefore commands refer to resources on the local node only.
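As a hypothetical sketch (svr04 is an assumed hostname), the two variables might be combined like this to direct commands at a whole peer domain:

```shell
# Direct subsequent RMC commands at the RMC daemon on svr04
# (an assumed hostname) and widen the scope to the peer domain.
export CT_CONTACT=svr04
export CT_MANAGEMENT_SCOPE=2   # 2 = resources on all nodes in the peer domain
# lsrsrc IBM.Host Name         # would now list every host in the domain
echo "contact=$CT_CONTACT scope=$CT_MANAGEMENT_SCOPE"
```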
Command flags and pattern match operators
Most RMC commands accept the following commonly used command flags:
Display format -l, -V, -t, -x, -d, and -D
Help -h; Selection -s
The -s flag takes a selection string argument that can contain at least one expression.
Most commands support the same flags to specify the output format. The -l and -V flags are intended for interactive use and display long and verbose results, respectively. The -t, -x, -d, and -D flags are more commonly used within scripts: they yield results in tabular format, without a header, with a predefined (colon) or user-specified field delimiter, which are easier to parse than the default output.
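For example, a script can split delimiter-separated output on the colon. The sample lines below are invented to stand in for what a command such as lsrsrc -xd IBM.FileSystem Name PercentTotUsed could return:

```shell
# Hypothetical sample of colon-delimited lsrsrc output; the filesystem
# names and percentages are made up for illustration.
sample='"/tmp":75
"/var":40'
printf '%s\n' "$sample" | while IFS=: read -r name used; do
  echo "filesystem $name is ${used}% used"
done
```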
lsrsrc -V IBM.Host NodeNameList NumProcessors RealMemSize OSName
The selection string is an expression that is evaluated against each resource or class. The selection string is made of variables, operators and constants. The variables refer to persistent attributes of the target resource or class. Selection
cannot be performed on dynamic attributes.
lsrsrc -s 'Mount == "true"' IBM.FileSystem Name PercentTotUsed
lsrsrc -s 'Mount=="true" && Size < 40000' IBM.FileSystem Name PercentTotUsed Size
Pattern match operators
There are two pattern match operators, =~ and =?, and two not-pattern-match operators, !~ and !?. The =~ and !~ operators match substrings, while the SQL-like =? and !? operators match full strings.
When using the extended regular expression operators =~ and !~ with wild cards, you should use:
The dot (.) to match exactly one character.
The star (*) to match zero or more occurrences of the preceding characters.
With the SQL-like syntax operators, =? and !?, you should use the following wild card characters:
The percent sign (%) matches zero or more characters.
The underscore (_) matches exactly one character.
The percentage and underscore characters can be quoted with the pound sign (#) to override their special meaning.
lsrsrc -s 'Mount =? "t%"' IBM.FileSystem Name Mount
lsrsrc -s 'Mount =~ "t.*"' IBM.FileSystem Name Mount
A peer domain cluster setup
Preparing your node security: preprpnode svr02 svr04
Creating the cluster
To create a peer domain cluster, issue mkrpdomain on one node that has already been prepared with preprpnode (PD is the cluster name): mkrpdomain PD svr02 svr04
To check the status of the domain, use lsrpdomain to list all known domains and provide a summary of their status:
Bringing the cluster online: # startrpdomain PD
Adding a node to the cluster
Once the security environment is configured, the next step is to add the node to the online cluster. The command addrpnode performs this operation, and must be run on a node that is online in the cluster you are modifying. Nodes may be defined in several clusters simultaneously, but can only be online to one cluster at any time.
# addrpnode svr05; lsrpdomain; lsrpnode
Bringing a node online: # startrpnode svr05
Stopping (offline) a node in the cluster: # stoprpnode svr05
Removing a node from the cluster
If you must remove a node from the cluster, the node must first be in the offline state, and then it can be removed with the rmrpnode command. This must be executed on an online node of the cluster: # rmrpnode svr05
Stopping (offline) a cluster
The stoprpdomain command is used to take all the online nodes of a cluster offline. It must be run on an online node.
# stoprpdomain PD
The cluster configuration is preserved after the cluster status is changed to offline.
Removing a cluster
If a cluster must be deleted and the configuration information removed, rmrpdomain can be used.
To remove a peer domain cluster, do the following:
1. Bring the peer domain online, if it is not online yet.
2. Execute the rmrpdomain command on an online node: # rmrpdomain PD
3. If any node is not network accessible when the peer domain is being removed, execute the rmrpdomain command with the -f option on that node to clean up the cluster configuration there: # rmrpdomain -f PD
AIX has the system resource controller (SRC), which provides a consistent controlling method for various system daemon processes (referred to as subsystems).
The lssrc command provided by AIX SRC can be used to check if the RMC subsystems are active: # lssrc -g rsct; # lssrc -g rsct_rm
The ctrmc subsystem, which we refer to as the RMC subsystem in this redbook, is instantiated as the rmcd daemon (/usr/sbin/rsct/bin/rmcd).
A security subsystem, ctcas, is in charge of authentication in the cluster, by means of UNIX identity-based credentials. The ctcas subsystem is instantiated as the ctcasd daemon (/usr/sbin/rsct/bin/ctcasd).
Both the ctrmc and ctcas subsystems belong to the rsct group.
Other SRC commands
Although RMC components are subsystems that can be managed with the standard AIX System Resource Controller (SRC) commands, we recommend that you use only the SRC lssrc command with these subsystems. Do not use the startsrc and stopsrc commands to start and stop the RMC daemon; use the rmcctrl (RMC Control) command instead.
The refresh command may be used to force the RMC subsystem to re-read its configuration files.
The rmcctrl command manages both the RMC subsystem and the resource managers subsystem.
During the AIX installation, the rmcctrl -a command is run automatically, and the following entry is added to /etc/inittab: # grep ctrmc /etc/inittab
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
Therefore, the RMC subsystem is started by default upon every reboot. There is no need to perform any action to manually start it in normal operation.
The -s and -k flags start and stop the ctrmc only.
Both the -K and -z options stop RMC and the resource managers, but the former is asynchronous while the latter is synchronous. With -z, the command does not return until all subsystems have actually stopped.
There are five categories of generic RMC commands:
Display lsrsrc, lsrsrcdef, and lsactdef
The lssrc command sends a request to the System Resource Controller to get status on a subsystem, a group of subsystems, or all subsystems.
To Get All Status lssrc [ -h Host ] -a
To Get Group Status lssrc [ -h Host ] -g GroupName
To Get Subsystem Status lssrc [ -h Host ] [ -l ] -s Subsystem
To Get Status by PID lssrc [ -h Host ] [ -l ] -p SubsystemPID
To Get Subserver Status lssrc [ -h Host ] [ -l ] -t Type [ -p SubsystemPID ] [ -o Object ] [ -P SubserverPID ]
To get the status of the tcpip subsystem group, enter: lssrc -g tcpip
You should verify that these subsystems are running on your system: # lssrc -g rsct; lssrc -g rsct_rm
The lsrsrc command displays the persistent and dynamic attributes and their values for a resource or a resource class.
To list the names of all of the resource classes: lsrsrc
To list the persistent attributes for IBM.Host resources that have 4 processors, enter: lsrsrc -s "NumProcessors == 4" -A p -p 0 IBM.Host
To list the public dynamic attributes for the IBM.Host resource on node 1, enter: lsrsrc -s 'Name == "c175n05.ppd.pok.ibm.com"' -A d IBM.Host
To list the Name, Variety, and ProcessorType attributes for the IBM.Processor resource on all the online nodes, enter: lsrsrc IBM.Processor Name Variety ProcessorType
To list both the persistent and dynamic attributes for the resource class IBM.Condition: lsrsrc -c -A b -p 0 IBM.Condition; lsrsrc IBM.Host Name
The first argument of lsrsrc is a resource class name. By default, the result will relate to resources in this class. You need to specify the -c flag to retrieve class attributes and values.
# lsrsrc -c IBM.PagingDevice; # lsrsrc IBM.PagingDevice
Attributes are either persistent (static) or dynamic. If you do not explicitly specify attributes on the command line, the default is to display only persistent attributes. The -A flag followed by p (persistent), d (dynamic), or b (both) overrides the default behavior: # lsrsrc -Ad IBM.PagingDevice
By default, lsrsrc displays only public attributes. In general, attributes which are not public are only useful for application developers and not for end users. You need to specify -p0 to list all attributes: lsrsrc -p0 IBM.PagingDevice
The lsrsrcdef command returns the description of a resource class or a resource and their attributes.
You can use it when:
Creating selection strings
Developing new monitoring conditions
Obtaining detailed information about a resource
The -Ap or -Ad flags instruct the command to return persistent or dynamic attribute definitions, respectively.
The -c flag indicates whether lsrsrcdef returns information pertaining to the class or the resource.
The -e flag indicates that the full description of each attribute is returned. This description can be long and is not displayed by default.
# lsrsrcdef -Ap -e IBM.FileSystem Bf
The lsrsrcdef command also displays only public attributes by default. With the -p0 flag, all attributes are displayed.
The refrsrc command forces the resource manager owning a class to detect any changes in the configuration of resources in that class. The lsrsrc -x command returns all the resource class names.
To strip the double-quote characters that lsrsrc returns around the resource class names:
# for i in `lsrsrc -x | sed 's/\"//g'`; do echo $i; done
To refresh all resource classes using refrsrc:
# for i in `lsrsrc -x | sed 's/\"//g'`; do refrsrc -V $i; done
When you want to monitor a part of the AIX environment for which RMC does not provide a predefined resource class or attribute, you can create your own resource, called a sensor, using the Sensor resource manager. A sensor is a command that will be periodically executed, returning a set of values that can be monitored with the Event Response resource manager.
The Sensor resource manager provides four commands to list, create, modify or remove sensors; lssensor, mksensor, chsensor, and rmsensor.
The lssensor command displays either a list of sensors, or the list of attributes of sensors. With no flags, lssensor lists the sensors defined locally, and with the -a flag, all sensors defined in the cluster. The CT_CONTACT variable indicates where the command will be executed.
If you prefer not to modify the CT_CONTACT variable, you can use the -n flag to specify the nodes on which you look for the defined sensors: # lssensor -n svr04,svr05
The mksensor command creates a sensor on one node. You cannot create the same sensor on multiple nodes at the same time.
# lssensor -n svr05; # mksensor -n svr05 NumUsers1 "/SharedTools/NumLogins.ksh"; # lssensor -n svr05
The only option that can be set during the creation of the sensor is the period at which the sensor is executed, using the -i flag followed by an integer representing a time in seconds. Apart from the sensor name, you must specify as the second argument of the mksensor command the name of the program you want to execute. You can also specify arguments to this program by surrounding the program name and its arguments with double quotes.
# mksensor -i 300 NumUsers1 "/SharedTools/NumLogins.pl abc"
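As a sketch of what a sensor script such as /SharedTools/NumLogins.ksh from the examples above might contain (the use of the Int32 attribute is an assumption based on the Sensor resource manager's name=value output convention):

```shell
#!/bin/sh
# Hypothetical sensor script: count current login sessions and report
# the result in the Int32 dynamic attribute, which the Sensor resource
# manager reads from the command's standard output as "name=value".
value=$(who 2>/dev/null | wc -l | tr -d ' ')
echo "Int32=$value"
```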
The chsensor command is used to rename a sensor. You cannot use it to modify the sensor period or the command to be executed. If you need such a change, the only solution is to delete the sensor and recreate it.
Using the -a flag, you can rename a sensor on all nodes in the cluster, or you can restrict the renaming to a set of nodes with the -n flag, followed by the node names.
The rmsensor command deletes sensors on one or several nodes at the same time, using the same -a and -n flags as chsensor. You can delete several sensors with one command by passing several sensor names as arguments to rmsensor.
The Event Response resource manager handles three types of objects: conditions, responses and associations.
Display lscondition, lsresponse, and lscondresp
Creation mkcondition, mkresponse, and mkcondresp
Modification chcondition and chresponse
Removal rmcondition, rmresponse, and rmcondresp
Activation startcondresp and stopcondresp
The monitoring of events using the CLI follows the same principles as when using the GUI:
1. First, define the kind of event you want to monitor.
2. Define the response you want the system to take when the event occurs. It is built from predefined actions or from any actions you provide (programs and scripts).
3. Define associations between the conditions you want to monitor and the responses you have chosen.
4. Activate the monitoring of the actions by starting the associations.
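A hedged sketch of these four steps using the ERRM commands listed above; the condition name, the /tmp threshold, and the notification script path are invented for illustration:

```shell
# 1. Define the condition (hypothetical name and threshold)
mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" "tmp space used"
# 2. Define the response (the script path is an assumption)
mkresponse -n "notify admin" -s /SharedTools/notify_admin.ksh "admin alert"
# 3. Associate the condition with the response
mkcondresp "tmp space used" "admin alert"
# 4. Activate the monitoring by starting the association
startcondresp "tmp space used" "admin alert"
```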