Readme file for IBM® Spectrum LSF 10.1 Fix 601993

Abstract

P105112. This fix introduces a new feature to LSF called cluster affinity. When a user submits jobs with the same cluster affinity attribute LSF will run the jobs in the same cluster.

Description

Readme documentation for IBM Spectrum LSF 10.1 Fix 601993 including installation-related instructions, prerequisites and co-requisites, and list of fixes.

This fix introduces a new feature to LSF called cluster affinity. When a user submits jobs with the same cluster affinity attribute LSF will run the jobs in the same cluster. This feature is intended for multicluster environments.


LSF scheduling behaviour


If a job does not have a cluster affinity requirement specified, there is no change to how the job is scheduled. If a job does have a cluster affinity requirement specified, but it is not within a multi-cluster environment, there is no change to how the job is scheduled. If a job does have a cluster affinity requirement specified and is within a multi-cluster environment, then the scheduler will use the job’s cluster affinity requirement to determine to which cluster the job should be forwarded. The first time the scheduler sees a job with a specific cluster affinity requirement, the job will be scheduled as normal. LSF will make note of which cluster was used for that specific cluster affinity requirement. The next time a job with the same specific cluster affinity requirement requires scheduling, the scheduler will recall  the first job used and forward the new job to the same cluster. Note that if the configured time-to-live for cluster affinity has elapsed before seeing a new job with the same cluster affinity requirement, LSF will delete the corresponding cluster affinity attribute.

Submitting jobs

To submit a job with a cluster affinity attribute, the  -jobaff option has been extended to allow you to specify a cluster affinity attribute. The first time a job is submitted with a specific cluster affinity attribute, LSF creates the specific cluster affinity attribute:

bsub -jobaff “cluster_affinity(attribute_name)” ./a.out

 

Jobs submitted by the same user with the same cluster affinity attribute name will be forwarded to the same cluster. The scope of the cluster affinity attributes is at the user level within the submission cluster. This means:

• Different users can specify the same cluster affinity attributes, but LSF will treat them as unique.

• Jobs submitted by the same user in different clusters with the same cluster affinity attribute are in a unique namespace based on the submission cluster.


The bmod command can be used to change a job’s cluster affinity attribute if the job is still pending and has not been forwarded to a remote cluster.

Viewing cluster affinity attribute information

The battr show command has been extended to display cluster affinity information:

> battr show -h

Usage:    show        {[-json | -w] [-m "host_name/host_group ... "] [-u user_name] [attr_name ...]} | {-cluster_affinity [-w | -l | -json] [-c cluster_name] [-u user_name | all] [attr_name ...]}

Use the -cluster_affinity option to view the cluster affinity attributes information:

-cluster_affinity              view cluster affinity attribute information
-w                                            wide format
-l                                            long format
-json                                      output JSON format
-c <cluster_name>               show attributes associated with the specified cluster
-u <user_name>                    show attributes associated with the specified user
-u all                                    show attributes associated with all users
attr_name                              show attributes with the specified names

> battr show -cluster_affinity

NAME           TTL(HH:MM:SS) USER       CLUSTER        #JOBS DESCRIPTION
tag1           -             root       -              1     Cluster affinity attri
tag            00:08:17      root       cluster2       0     Cluster affinity attri

> battr show -cluster_affinity -l
         NAME: tag1
TTL(HH:MM:SS): -
         USER: user1
      CLUSTER: -
  DESCRIPTION: Cluster affinity attribute created for job 137.
         JOBS: 137
------------------------------------------------------------------------------
         NAME: tag
TTL(HH:MM:SS): 00:09:24
         USER: user1
      CLUSTER: cluster2
  DESCRIPTION: Cluster affinity attribute created for job 110.
         JOBS:

NAME:   Cluster affinity attribute name.
TTL:     Time-to-live if no running or pending jobs associated with the cluster affinity attribute.
USER:   The user.
DESCRIPTION: LSF generated. The job that caused the creation of this cluster affinity attribute.

If no cluster is associated with a cluster affinity name a hyphen (“-“) will be shown for the CLUSTER field. If there are running or pending jobs associated with a cluster affinity name a hyphen will be shown for the TTL field.

Automatic cluster affinity attribute deletion

LSF will automatically delete a cluster affinity attribute when time-to-live (TTL) value has expired. The TTL counter for a cluster affinity attribute begins to decrement when there are no pending or running jobs that reference it (-jobaff “cluster_affinity(attribute_name)”). When the counter reaches zero it is removed. If another job is submitted that references it the TTL counter is reset.

Clearing cluster associations

Use the battr reset command to clear or reset a cluster affinity attribute association.

This can be useful to deal with the case where a group of jobs get bound to a cluster, but for whatever reason, the whole group cannot complete using the cluster because the cluster goes down. This provides a way to reset,delete, or clear the current cluster association and LSF tries again:

> battr reset -h
Usage:    reset       -cluster_affinity [-u user_name | all] attr_name [attr_name ...]

-cluster_affinity              reset the cluster affinity association
-u <user_name>                    reset attributes associated with the specified user
-u all                                    reset attributes associated with all users
attr_name                              reset attributes with the specified names

> battr reset -cluster_affinity attr1
Cluster affinity attribute <attr1> successfully reset.

The LSF administrator can reset any cluster affinity association. An ordinary user can only reset those that belong to them.

Configuring TTL cluster wide

Configure CLUSTER_AFFINITY_TTL in the lsb.params file. Specifies the time-to-live for newly-created cluster affinity Attributes.

CLUSTER_AFFINITY_TTL=<time_hours>

CLUSTER_AFFINITY_TTL=<time_minutes>m | M

Valid values

Any positive integer between 10 minutes and 100 hours. If this parameter is set to a value that is lower than 10m, CLUSTER_AFFINITY_TTL defaults to the minimum value of 10 minutes. If this parameter is set to a value that is higher than 100, CLUSTER_AFFINITY_TTL defaults to the maximum value of 100 hours.

Default

1 (1 hour)

 

Readme file for: IBM® Spectrum LSF

Product or component release: 10.1

Update name: Fix 601993

Fix ID: LSF-10.1-build601993

Publication date: 24 April 2024

 

Contents

1. List of fixes

2. Download location

3. Product or components affected

4. System requirements

5. Installation and configuration

6. List of files

7. Product notifications

8. Copyright and trademark information

 

1. List of fixes

P105112

 

2. Download locations

Download Fix 601993 from the following location: https://www.ibm.com/support/fixcentral

 

3. Product or components affected

Affected product or components include:

LSF/bparams

LSF/bsub

LSF/bmod

LSF/bjobs

LSF/battr

LSF/bhist

LSF/mbatchd

LSF/mbschd

LSF/ebrokerd

LSF/bhosts

LSF/liblsbstream.so

LSF/liblsbstream.a

LSF/libbat.so

LSF/libbat.a

LSF/liblsf.so

LSF/liblsf.a

LSF/lsbatch.h

LSF/lsf.h

LSF/lsfproxyd

4. System requirements

linux2.6-glibc2.3-x86_64

linux3.10-glibc2.17-x86_64

 

5. Installation and configuration

Before you install

LSF_TOP is the full path to the top-level installation directory of LSF.

1.      Before you apply this fix, ensure that you installed LSF 10.1 Fix Pack 14 or above. You can download LSF 10.1 Fix Pack 14 from https://www.ibm.com/support/fixcentral. Search for build601547.

2.      Log on to the LSF management host as the LSF primary administrator.

3.      Set your environment:
-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

Installation steps

1.      Log on to the LSF management host as the root user and set your environment:
-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

2.      Go to the install directory: cd $LSF_ENVDIR/../10.1/install/

3.      Copy the fix file to the install directory: $LSF_ENVDIR/../10.1/install/

4.      Run patchinstall: ./patchinstall <fix>

After you install

1.      Log on to the LSF management host as the LSF primary administrator and set your environment:

-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

2.      Run badmin mbdrestart -s

Uninstallation

1.      Log on to the LSF management host as the root user and set your environment:

-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

2.      Go to the install directory: cd $LSF_ENVDIR/../10.1/install/

3.      Run ./patchinstall -r <fix>

4.      Log on to the LSF management host as the LSF primary administrator and set your environment:

-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

5.      Run badmin mbdrestart -s

 

6. List of files

The following components in all Linux and UNIX packages:

LSF/bparams

LSF/bsub

LSF/bmod

LSF/bjobs

LSF/battr

LSF/bhist

LSF/mbatchd

LSF/mbschd

LSF/ebrokerd

LSF/bhosts

LSF/liblsbstream.so

LSF/liblsbstream.a

LSF/libbat.so

LSF/libbat.a

LSF/liblsf.so

LSF/liblsf.a

LSF/lsbatch.h

LSF/lsf.h

LSF/lsfproxyd

 

7. Product Notifications

To receive information about product solution and fix updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notification about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.

 

8. Copyright and Trademark Information

©Copyright IBM Corporation 2024

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.