Installing, Configuring, Using, and Troubleshooting RSV¶
About This Guide¶
The Resource and Service Validation (RSV) software helps a site administrator verify that certain site resources and services are working as expected. OSG recommends that sites install and run RSV, but it is optional; further, each site selects which specific tests (called probes) to run.
Use this page to learn more about RSV in general, and how to install, configure, run, test, and troubleshoot RSV from the OSG software repositories. For documentation on specific probes or on how to write your own probes, please check the Reference section.
Introduction to RSV¶
The Resource and Service Validation (RSV) software provides OSG site administrators a scalable and easy-to-maintain resource and service monitoring infrastructure. The components of RSV are:
- RSV Client. The client tools allow a site administrator to run tests against their site by providing a set of tests (which can run on the same or other hosts within a site), HTCondor-Cron for scheduling, and tools for collecting and storing the results (using Gratia). The client package is not installed by default and may be installed on a CE or other host. Generally, you configure the RSV client to run tests at scheduled time intervals and then it makes results available on a local website. Also, the client can upload test results to a central collector (see next item).
- RSV Collector/Server. The central OSG RSV Collector accepts and stores results from RSV clients throughout OSG, which can be viewed in MyOSG, on the “Current RSV Status” page and under the “Resource Group” menu.
- MyOSG and OIM Links. RSV picks up resource information, WLCG interoperability information, etc., from a MyOSG resource group summary listing, which is in turn based on the OSG Information Management (OIM) (topology) system (requires registration). Resource maintenance scheduled in OIM is forwarded to WLCG SAM, if applicable.
Before starting the installation process, consider the following points (consulting the Reference section below as needed):
- User IDs: If they do not exist already, the installation will create the Linux user IDs rsv and cndrcron
- Service certificate: The RSV service requires a service certificate (
/etc/grid-security/rsv/rsvcert.pem) and matching key (
- Network ports: To view results, port 80 must accept incoming requests; outbound connectivity to tested services must work, too
- Host choice: Install RSV on your site CE unless you have specific reasons (e.g., performance) for installing on a separate host
As with all OSG software installations, there are some one-time (per host) steps to prepare in advance:
- Ensure the RSV host has a supported operating system
- Obtain root access to the host
- Prepare the required Yum repositories
- Install CA certificates
An installation of RSV at a site consists of the RSV client software, the Apache web server, parts of HTCondor (for its cron-like scheduling capabilities), and various other small tools. To simplify installation, OSG provides a convenience RPM that installs all required software with a single command.
Consider updating your local cache of Yum repository data and your existing RPM packages:
Note: The yum update command will update all packages on your system.
If you have installed HTCondor already but not by RPM, install a special empty RPM to satisfy the HTCondor dependency of the RSV packages:
root@host # yum install empty-condor --enablerepo=osg-empty
Install RSV and related software:
root@host # yum install rsv
After installation, there are some one-time configuration steps to tell RSV how to operate at your site.
Edit /etc/osg/config.d/30-rsv.ini and follow the instructions in the file. There are detailed comments for each setting. In the simplest case (monitoring only your CE), set the htcondor_ce_hosts variable to the fully qualified hostname of your CE.
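After editing the file, it can help to confirm the value that was actually saved. The following is a minimal sketch, not part of RSV: the helper name ini_get is hypothetical, and it assumes the setting is written as a plain "key = value" line, with comment lines starting with ; or #.

```shell
# ini_get FILE KEY: print the value of the first uncommented "KEY = value" line.
# Hypothetical helper for illustration; not shipped with RSV.
ini_get() {
    awk -v key="$2" '
        /^[ \t]*[;#]/ { next }            # skip comment lines
        index($0, "=") == 0 { next }      # skip lines without an assignment
        {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}
```

For example, ini_get /etc/osg/config.d/30-rsv.ini htcondor_ce_hosts should print the hostname you entered; empty output means the setting is missing or still commented out.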
If you have installed HTCondor already but not by RPM, specify the location of the Condor installation in the condor_location setting. If an HTCondor RPM is installed, you do not need to set it.
Complete the configuration using the
The following configuration steps are optional and will likely not be required for setting up a small or typical site. If you do not need any of the following special configurations, skip to the section on using RSV.
For more advanced configuration options, see the ConfigureRsv page.
Configuring RSV to run probes using a remote server¶
RSV monitors systems by running probes, which can run on the RSV host itself (the default case), via a separate batch system like HTCondor, or via a remote batch system using a Globus gatekeeper and its job manager. With either of the last two options, the probe jobs can be counted and reported to an accounting system, for example Gratia.
In this case, remember to:
- Add the RSV user rsv on all the systems where the probes may run, and
- Map the RSV service certificate to the user you intend to use for RSV. This should be a local user that is used exclusively for RSV and does not belong to an institutional VO, so that RSV probes are not accounted as regular VO jobs in Gratia. The mapping can be done in GUMS or in a grid-mapfile-local (if you use a grid-mapfile). MapServiceCertToRsvUser explains how to configure GUMS or the grid-mapfile. Also see the CE installation document for more information.
Configuring the RSV web server to use HTTPS instead of HTTP¶
If you would like your local RSV web server to use HTTPS instead of the default HTTP (for compatibility or security reasons), complete the steps below. This procedure assumes that you already have an HTTP service certificate (or a copy of the host certificate) in /etc/grid-security/http/. If not, omit the SSLCertificate* modifications below, and your web server will start with its own, self-signed certificate.
root@host # yum install mod_ssl
Make an alternate set of HTTP service certificate files:
Back up existing Apache configuration files:
Change the default port for HTTP connections to 8000 by editing
Set up HTTPS access by editing
Listen 8443
<VirtualHost _default_:8443>
SSLCertificateFile /etc/grid-security/http/httpcert2.pem
SSLCertificateKeyFile /etc/grid-security/http/httpkey2.pem
After these changes, when you start the Apache service, it will listen on ports 8000 (for HTTP) and 8443 (for HTTPS), rather than the default port 80 (for HTTP only).
Note: If you make the changes above, you must restart the Apache server after each CA certificate update to pick up the changes.
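Before restarting Apache, you may want to double-check which ports the edited configuration files actually declare. The sketch below is one way to do that. The helper name listen_ports is hypothetical, and you pass it whichever Apache configuration files you edited.

```shell
# listen_ports FILE...: print the argument of every active Listen directive.
# Hypothetical helper; commented-out directives (e.g. "#Listen 80") are ignored
# because their first field is not the word "Listen".
listen_ports() {
    awk 'tolower($1) == "listen" { print $2 }' "$@"
}
```

With the changes described in this section, the output should include 8000 and 8443, and no longer include 80.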
Managing RSV and associated services¶
In addition to the RSV service itself, there are a number of supporting services in your installation. The specific services are:
||See CA documentation|
Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root):
|To …||Run the command …|
|Start a service||
|Stop a service||
|Enable a service to start during boot||
|Disable a service from starting during boot||
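If you script the start and stop sequence, the "stop in reverse order" rule is easy to get wrong by hand. A minimal sketch, assuming you keep the service names in start order in a single list: the helper name reverse_args is hypothetical, and the service names are whatever your installation lists above.

```shell
# reverse_args A B C: print the arguments on one line, last first.
# Hypothetical helper for deriving the stop order from the start order.
reverse_args() {
    rev=""
    for s in "$@"; do
        rev="$s${rev:+ $rev}"   # prepend each name to the result so far
    done
    printf '%s\n' "$rev"
}
```

For example, reverse_args first second third prints "third second first", which is the order in which to stop services that were started as first, second, third.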
Running RSV manually¶
Normally, the HTCondor-Cron scheduler runs RSV periodically. However, you can run RSV probes manually at any time:
root@host # rsv-control --run --all-enabled
If successful, results will be available from your local RSV web server (e.g., http://localhost/rsv) and, if enabled (which is the default), on MyOSG.
You can also run the metrics individually or pass special parameters as explained in the rsv-control document.
To get assistance, use the help procedure.
RSV has a tool to collect information useful for troubleshooting into a tarball that can be shared with the developers and support staff. To use it:
root@host # rsv-control --profile
Running the rsv-profiler...
OSG-RSV Profiler
Analyzing...
Making tarball (rsv-profiler.tar.gz)
You can find more information on troubleshooting RSV in the rsv-control documentation.
If you are getting assistance via the trouble ticket system, you must add a .txt extension to the tarball so it can be uploaded:
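One way to add the extension is a plain rename. The sketch below wraps it in a small helper so that a missing file is reported instead of silently ignored. The helper name add_txt_ext is hypothetical, and the tarball name is taken from the profiler output shown above.

```shell
# add_txt_ext FILE: rename FILE to FILE.txt and print the new name.
# Hypothetical helper; fails with status 1 if FILE does not exist.
add_txt_ext() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    mv -- "$1" "$1.txt" && printf '%s\n' "$1.txt"
}
```

For example, add_txt_ext rsv-profiler.tar.gz leaves you with rsv-profiler.tar.gz.txt in the same directory.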
Failed to send via Gratia¶
If you see "Failed to send record Failed to send via Gratia: Server unable to receive data:" in /var/log/rsv/consumers/gratia-consumer.output, disable the Gratia consumer using the following commands:
root@host # rsv-control --disable --host <YOUR RSV HOST> gratia-consumer
root@host # rsv-control --off --host <YOUR RSV HOST> gratia-consumer
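To check for this condition without reading the log by hand, you can search the consumer output for the error text. This is a minimal sketch: the helper name gratia_send_failed is hypothetical, and the default path and message are the ones quoted in this section.

```shell
# gratia_send_failed [LOGFILE]: succeed (status 0) if the Gratia upload error
# appears in the consumer log, and return 1 if it does not.
# Hypothetical helper; LOGFILE defaults to the path given in this guide.
gratia_send_failed() {
    log=${1:-/var/log/rsv/consumers/gratia-consumer.output}
    grep -q 'Failed to send via Gratia: Server unable to receive data' "$log"
}
```

A call such as "gratia_send_failed && echo disable the consumer" prints the reminder only when the error is present in the log.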
Important file locations¶
Logs and configuration:
|Metric log files||
|Consumer log files||
||Generally files in this directory should not be edited directly. Use
||To change arguments and environment|
To find the metrics and other RSV files, you can also use the RPM commands rpm -ql rsv-metrics and rpm -ql rsv.
Getting more information from rsv-control¶
The first step to getting more information is to run rsv-control with more verbosity, using the -v flag. This flag can be used with any of rsv-control's abilities (run, enable, list, etc.). The verbosity levels are:
- 0 = print nothing
- 1 = print warnings and errors along with usual output of command being run (1 is the default level)
- 2 = adds informational messages
- 3 = full debugging output
For example, here is the output when running a metric with -v 2:
[root@host condor]# rsv-control -r org.osg.general.osg-version -v 2 -u osg-edu.cs.wisc.edu
INFO: Reading configuration file /etc/rsv/rsv.conf
INFO: Reading configuration file /etc/rsv/consumers.conf
INFO: Validating configuration:
INFO: Validating user:
INFO: Invoked as root. Switching to 'rsv' user (uid: 100 - gid: 102)
INFO: Registered consumers: html-consumer, gratia-consumer
INFO: Loading config file '/etc/rsv/meta/metrics/org.osg.general.osg-version.meta'
INFO: Loading config file '/etc/rsv/metrics/org.osg.general.osg-version.conf'
INFO: Optional config file '/etc/rsv/metrics/osg-edu.cs.wisc.edu/org.osg.general.osg-version.conf' does not exist
INFO: Checking proxy:
INFO: Using service certificate proxy
INFO: Running command with timeout (1200 seconds): /usr/bin/openssl x509 -in /tmp/rsvproxy -noout -enddate -checkend 21600
INFO: Exit code of job: 0
INFO: Service certificate valid for at least 6 hours.
INFO: Pinging host osg-edu.cs.wisc.edu:
INFO: Running command with timeout (1200 seconds): /bin/ping -W 3 -c 1 osg-edu.cs.wisc.edu
INFO: Exit code of job: 0
INFO: Ping successful
Running metric org.osg.general.osg-version:
INFO: Executing job remotely using Condor-G
INFO: Setting up job environment:
INFO: No environment setup declared
INFO: Condor-G working directory: /var/tmp/rsv/condor_g-JiQthF
INFO: Forming arguments:
INFO: Arguments: ''
INFO: List of files to transfer: /usr/libexec/rsv/probes/RSVMetric.pm
INFO: Condor submission: Submitting job(s).
1 job(s) submitted to cluster 2.
INFO: Trimming data to 10000 bytes because details-data-trim-length is set
INFO: Creating record for html-consumer consumer at '/var/spool/rsv/html-consumer/org.osg.general.osg-version.7rgLfn'
INFO: Creating record for gratia-consumer consumer at '/var/spool/rsv/gratia-consumer/org.osg.general.osg-version.-qelnL'
INFO: Result:
metricName: org.osg.general.osg-version
metricType: status
timestamp: 2012-01-25 16:12:40 CST
metricStatus: OK
serviceType: OSG-CE
serviceURI: osg-edu.cs.wisc.edu
gatheredAt: fermicloud016.fnal.gov
summaryData: OK
detailsData: OSG 1.2.26
EOT
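If you call rsv-control from your own scripts, it can be useful to reject verbosity values outside the documented 0 to 3 range before running anything. A minimal sketch, assuming the "-v LEVEL" form used in the example above. The helper name rsv_verbosity_args is hypothetical and is not part of rsv-control.

```shell
# rsv_verbosity_args LEVEL: print "-v LEVEL" if LEVEL is one of the documented
# verbosity levels (0-3); otherwise print an error and return 1.
# Hypothetical helper for building an rsv-control command line.
rsv_verbosity_args() {
    case "$1" in
        0|1|2|3) printf '%s\n' "-v $1" ;;
        *) echo "verbosity must be 0, 1, 2, or 3" >&2; return 1 ;;
    esac
}
```

For example, rsv_verbosity_args 3 prints "-v 3", which you can append to an rsv-control command to get full debugging output.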
To get assistance, please use this page and attach the output of
root@host # rsv-control --profile
Running the rsv-profiler...
OSG-RSV Profiler
Analyzing...
Making tarball (rsv-profiler.tar.gz)
The RSV installation will create two users unless they are already created. The users are created when the
condor-cron packages are installed.
|rsv||Runs the RSV tests; the RSV certificate (below) will need to be owned by this user|
|cndrcron||Runs the Condor Cron processes to schedule the running of the tests|
If you pre-create the RSV user, it should have a working shell. That is, it shouldn't have a default shell of
If you manage your /etc/passwd file with configuration management software such as Puppet, CFEngine or 411, make sure the UID and GID in /etc/condor-cron/config.d/condor_ids match the UID and GID of the cndrcron user and group in /etc/passwd. If they do not, create a file named /etc/condor-cron/config.d/condor_ids_override with the contents:
Here, UID and GID are the UID and GID of the cndrcron user and group.
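To look up the values to compare, you can read the cndrcron entry from the passwd file. The sketch below prints them in the UID.GID form that HTCondor's CONDOR_IDS setting takes. Whether the override file should contain exactly such a line is an assumption, since its contents are not shown here. The helper name passwd_ids is hypothetical.

```shell
# passwd_ids USER [FILE]: print "UID.GID" for USER from a passwd-format file.
# The GID is the user's primary group (4th field). FILE defaults to /etc/passwd.
# Hypothetical helper; returns 1 if USER is not found.
passwd_ids() {
    awk -F: -v user="$1" '
        $1 == user { print $3 "." $4; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/passwd}"
}
```

Running passwd_ids cndrcron and comparing the result with the IDs in /etc/condor-cron/config.d/condor_ids shows whether an override is needed.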
|Certificate||User that owns certificate||Path to certificate|
|RSV service certificate||
Ensure an RSV service certificate is installed in /etc/grid-security/rsv/ and the certificate files are owned by the rsv user. Adjust the permissions if necessary: the certificate must be readable by all, and the key must be readable only by its owner.
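The permission rule above can be checked mechanically. The following is a minimal sketch, assuming GNU stat as found on Linux hosts. The helper name check_cert_perms is hypothetical, and it checks only the permission bits, not ownership by the rsv user.

```shell
# check_cert_perms CERT KEY: succeed only if CERT is world-readable and KEY
# has no group or other permissions. Uses GNU stat (stat -c).
# Hypothetical helper; returns 1 on a permission problem, 2 if stat fails.
check_cert_perms() {
    cert_mode=$(stat -c '%a' "$1") || return 2
    key_mode=$(stat -c '%a' "$2") || return 2
    case "$cert_mode" in
        *[4567]) ;;   # last octal digit includes the read bit for "other"
        *) echo "$1 is not world-readable (mode $cert_mode)" >&2; return 1 ;;
    esac
    case "$key_mode" in
        *00) ;;       # group and other digits are both zero
        *) echo "$2 is accessible beyond its owner (mode $key_mode)" >&2; return 1 ;;
    esac
}
```

Pass the certificate path (/etc/grid-security/rsv/rsvcert.pem) as the first argument and the path of the matching key as the second. Typical passing modes are 644 for the certificate and 600 or 400 for the key.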
You may need another certificate owned by apache if you'd like an authenticated web server; see Configuring the RSV web server to use HTTPS instead of HTTP above.
See instructions to request a service certificate.
|Service Name||Protocol||Port Number||Inbound||Outbound||Comment|
|HTTP||tcp||80||YES||-||RSV runs an HTTP server (Apache) that publishes a page with the RSV testing results|
|HTTP||tcp||80||-||YES||RSV pushes testing results to the OSG Gratia Collectors at opensciencegrid.org|
|various||various||various||-||YES||Allow outbound network connection to all services that you want to test|
Or, if you'd rather have your RSV web page appear as https://...:8443/rsv/ like it used to in OSG 1.2, the first row above would be HTTPS / tcp / 8443. See above for how to configure this.