Readme file for IBM® Spectrum LSF 10.1 Fix 602128

Abstract

P105195. This fix prevents jobs from randomly exiting with a 255 exit status code and stops the sbatchd daemon from core dumping when v2 cgroups are enabled on a Linux system.

Description

Readme documentation for IBM Spectrum LSF 10.1 Fix 602128, including installation-related instructions, prerequisites and co-requisites, and a list of fixes.

This fix addresses the following issue:

When a Linux system uses v2 cgroups, jobs can randomly exit with a 255 (abnormal termination) status code, and the sbatchd daemon can core dump if it cannot find the job file while executing.

 

Readme file for: IBM® Spectrum LSF

Product or component release: 10.1

Update name: Fix 602128

Fix ID: LSF-10.1-build602128

Publication date: 13 August 2024

 

Contents

1. List of fixes

2. Download location

3. Product or components affected

4. System requirements

5. Installation and configuration

6. List of files

7. Product notifications

8. Copyright and trademark information

 

1. List of fixes

P105195

 

2. Download location

Download Fix 602128 from the following location: https://www.ibm.com/support/fixcentral

 

3. Product or components affected

Affected product or components include:

LSF/sbatchd

LSF/res

 

4. System requirements

linux2.6-glibc2.3-x86_64

linux3.10-glibc2.17-x86_64

 

5. Installation and configuration

Before you install

LSF_TOP is the full path to the top-level installation directory of LSF.

1.      Before you apply this fix, make sure that LSF 10.1 Fix Pack 12 or later is installed. You can download LSF 10.1 Fix Pack 12 or later from https://www.ibm.com/support/fixcentral.

2.      Starting in LSF 10.1 Fix Pack 13, the following three GPU parameters have new default values:
LSF_GPU_AUTOCONFIG=Y
LSB_GPU_NEW_SYNTAX=extend
LSF_GPU_RESOURCE_IGNORE=Y

If you have Fix Pack 13 or later installed and these GPU parameters are not configured in the lsf.conf configuration file, the new default values are used. Parameters that are already configured in the lsf.conf file are not affected.

If you want to keep the former GPU behavior and any of the three parameters is missing from the lsf.conf configuration file, you must explicitly configure the missing parameters with the default settings from Fix Pack 12 or earlier:
LSF_GPU_AUTOCONFIG=N
LSB_GPU_NEW_SYNTAX=N
LSF_GPU_RESOURCE_IGNORE=N

3.      If you are using the lsfd.service file from Fix Pack 14 or from Fix build601849, make sure that you also install Fix build602067. You can download this fix from https://www.ibm.com/support/fixcentral by searching for build602067. Contact IBM LSF Support if you have any questions or problems installing build602067.

4.   Log on to the LSF management host as the LSF primary administrator.

5.      Set your environment:
-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
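The check in item 2 for GPU parameters that are missing from lsf.conf can be scripted once the environment is set. The following is a minimal sketch, not an IBM-provided tool: the function name `missing_gpu_params` is hypothetical, and it assumes that lsf.conf is located at `$LSF_ENVDIR/lsf.conf`, that it contains one `NAME=value` setting per line, and that lines starting with `#` are comments. For each of the three parameters that is not explicitly set, it prints the setting from Fix Pack 12 or earlier so that you can review the output before adding it to lsf.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print the Fix Pack 12 or earlier
# setting for each GPU parameter that is not explicitly configured in the
# given lsf.conf file. Assumes one NAME=value setting per line; lines that
# start with '#' are comments and do not count as configured.
missing_gpu_params() {
    conf_file=$1
    if [ ! -r "$conf_file" ]; then
        echo "cannot read $conf_file" >&2
        return 1
    fi
    for param in LSF_GPU_AUTOCONFIG LSB_GPU_NEW_SYNTAX LSF_GPU_RESOURCE_IGNORE; do
        # A parameter counts as configured if a line starts with optional
        # blanks, the parameter name, optional blanks, and '='.
        if ! grep -Eq "^[[:space:]]*${param}[[:space:]]*=" "$conf_file"; then
            # All three parameters default to N in Fix Pack 12 or earlier.
            echo "${param}=N"
        fi
    done
}

# Example (assumes LSF_ENVDIR was set by profile.lsf or cshrc.lsf):
# missing_gpu_params "$LSF_ENVDIR/lsf.conf"
```

Because a value such as `LSB_GPU_NEW_SYNTAX=extend` also counts as configured, the sketch reports only parameters that are absent, which matches the condition in item 2.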

Installation steps

1.      Run badmin hclose all

2.    Run badmin qinact all

3.    Log on to the LSF management host as root and set the LSF cluster environment

4.      Go to the install directory: cd $LSF_ENVDIR/../10.1/install/

5.      Copy the fix file to the install directory: $LSF_ENVDIR/../10.1/install/

6.      Run patchinstall: ./patchinstall <fix>
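Steps 4 to 6 can be combined into one shell function. This is a minimal sketch, not an IBM-provided script: it assumes that it runs as root after the LSF cluster environment is set (so `LSF_ENVDIR` is defined), and that `patchinstall` takes the name of the fix file after the file is copied into the install directory, as in steps 5 and 6. The function name `install_fix` and the `DRY_RUN` variable are hypothetical. Unlike the numbered steps, the sketch copies the fix file before it changes directory so that a relative path to the fix file still works. Steps 1 and 2 still run first as the LSF primary administrator.

```shell
#!/bin/sh
# Hypothetical sketch (not an IBM-provided script) of installation steps 4-6,
# run as root after the LSF cluster environment is set (LSF_ENVDIR defined).
# Set DRY_RUN=1 to print the commands instead of running them.
install_fix() {
    fix_file=$1
    if [ -z "$fix_file" ]; then
        echo "usage: install_fix <fix_file>" >&2
        return 1
    fi
    if [ -z "$LSF_ENVDIR" ]; then
        echo "LSF_ENVDIR is not set; set the LSF cluster environment first" >&2
        return 1
    fi
    install_dir="$LSF_ENVDIR/../10.1/install"
    fix_name=$(basename "$fix_file")

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cp $fix_file $install_dir/"
        echo "cd $install_dir"
        echo "./patchinstall $fix_name"
        return 0
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "install_fix must run as root" >&2
        return 1
    fi
    # Copy first so that a relative path to the fix file still resolves, then
    # run patchinstall from the install directory in a subshell so that the
    # caller's working directory does not change.
    cp "$fix_file" "$install_dir/" || return 1
    (cd "$install_dir" && ./patchinstall "$fix_name")
}

# Example (placeholder path): DRY_RUN=1 install_fix /path/to/<fix>
```

Running it once with `DRY_RUN=1` shows the exact commands, including the resolved install directory, before anything is changed.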

After you install

1.      Log on to the LSF management host as the LSF primary administrator and set your environment:

-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

2.    Run lsadmin resrestart all

3.    Run badmin hrestart all

4.    Run badmin hopen all

5.    Run badmin qact all
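Steps 2 to 5 always run in the same order, so they can be wrapped in a small function. This is a minimal sketch, assuming it runs as the LSF primary administrator in a shell where profile.lsf has been sourced; the function name `post_install_restart` and the `DRY_RUN` variable are hypothetical. It stops at the first command that returns a nonzero exit status. Some of these commands may ask for confirmation when given `all`; the sketch leaves those prompts to the operator.

```shell
#!/bin/sh
# Hypothetical sketch of "After you install" steps 2-5, run as the LSF
# primary administrator after the environment is set. Stops at the first
# failing command. Set DRY_RUN=1 to print the commands without running them.
post_install_restart() {
    # Each entry is one command from the steps above, in order.
    for cmd in "lsadmin resrestart all" \
               "badmin hrestart all" \
               "badmin hopen all" \
               "badmin qact all"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # $cmd is intentionally unquoted so that it splits into the
            # command name and its arguments.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Stopping at the first failure keeps hosts closed and queues inactive if a daemon restart fails, so you can investigate before jobs are dispatched again.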

 

Uninstallation

1.    Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment

2.    Run badmin hclose all

3.    Run badmin qinact all

4.    Log on to the LSF management host as root and set the LSF cluster environment

5.    Go to the patch install directory: cd $LSF_ENVDIR/../10.1/install/

6.    Run ./patchinstall -r <patch>

7.    Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment

8.   Run lsadmin resrestart all

9.    Run badmin hrestart all

10.   Run badmin hopen all

11.   Run badmin qact all
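Steps 5 and 6 are the only uninstallation steps that run as root. The following is a minimal sketch of them, assuming the LSF cluster environment is set so that `LSF_ENVDIR` is defined, and that `<patch>` is passed as the first argument exactly as you would pass it to `patchinstall -r`. The function name `uninstall_fix` and the `DRY_RUN` variable are hypothetical. Steps 1 to 3 and 7 to 11 still run as described above.

```shell
#!/bin/sh
# Hypothetical sketch of uninstallation steps 5-6, run as root after the
# LSF cluster environment is set. Set DRY_RUN=1 to print the commands only.
uninstall_fix() {
    patch=$1
    if [ -z "$patch" ]; then
        echo "usage: uninstall_fix <patch>" >&2
        return 1
    fi
    if [ -z "$LSF_ENVDIR" ]; then
        echo "LSF_ENVDIR is not set; set the LSF cluster environment first" >&2
        return 1
    fi
    install_dir="$LSF_ENVDIR/../10.1/install"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cd $install_dir"
        echo "./patchinstall -r $patch"
        return 0
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "uninstall_fix must run as root" >&2
        return 1
    fi
    # Run in a subshell so that the caller's working directory is unchanged.
    (cd "$install_dir" && ./patchinstall -r "$patch")
}
```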

 

6. List of files

This fix includes the following components for all Linux and Unix packages:

LSF/sbatchd

LSF/res

 

7. Product notifications

To receive information about product solution and fix updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.

 

8. Copyright and trademark information

© Copyright IBM Corporation 2024

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.