Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 5.0.x applies for all supported platforms.
Problems fixed in IBM Spectrum Scale 5.0.3.2 [July 18, 2019]
- Problem description: A long waiter like the following appears: Waiting 8349.1305 sec since 00:03:05, monitored, thread 133060 AcquireBRTHandlerThread: on ThCond 0x3FFE74012E78 (MsgRecordCondvar), reason 'RPC wait' for tmMsgBRRevoke on node 192.168.117.82
- Work around: No
- Problem trigger: A race condition between handling an inbound connection and a node joining the cluster
- Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High IJ17133
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmlslicense command with the -Y option displays product edition information for all nodes in the list based on the local node's information. This is incorrect: it should display the edition only for the local node and "-" for all other nodes. All other options on this command likewise display only the local edition information.
- Work around: Ignore the edition information for any node that is not the local node.
- Problem trigger: Running the command on a cluster with two or more nodes
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: Suggested IJ17136
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmlsfileset command with the "-i -d" options could run into an infinite loop when there is not enough free memory and there are too few indirect block descriptors in the system. A similar loop could happen during mmrestripefs, snapshot deletion, and the ACL garbage collection process.
- Work around: Increase maxFilesToCache to allow more indirect block descriptors in the cache, and make sure there is enough free physical memory in the system (see the sketch after this entry).
- Problem trigger: Run the mmlsfileset -i -d, snapshot delete, or mmrestripefs commands, or enable ACLs, when there is not enough free physical memory in the system and maxFilesToCache is at its default or a low value.
- Symptom: The mmlsfileset, snapshot delete, and mmrestripefs commands hang, and other mm* commands cannot proceed either. The background ACL garbage collection thread runs in a loop if ACLs are enabled.
- Platforms affected: All
- Functional Area affected: mmlsfileset, mmrestripefs, snapshot delete commands and ACL garbage collection process.
- Customer Impact: Critical IJ16674
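A minimal sketch of the work-around; the value 1000000 is an illustrative assumption and should be sized to the node's available memory:
   # allow more indirect block descriptors to be cached (example value, not a recommendation)
   mmchconfig maxFilesToCache=1000000
   # confirm the new setting
   mmlsconfig maxFilesToCache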
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfs.log file may contain an entry like this: "[E] sdrServ: Communication error on socket /var/mmfs/mmsysmon/mmsysmonitor.socket, [err 79] Can not access a needed shared library"
- Work around: N/A. The reported error code "79" is internally used, and means "connection refused".
- Problem trigger: No recreate procedure is available for the reported issue. The underlying issue was that GPFS internal error codes were not mapped to Linux system codes, which produced the wrong message text when the corresponding system message was printed for such a code.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: System Health
- Customer Impact: has little or no impact on customer operation IJ16707
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: File system unmounted when an application overwrites data blocks
- Work around: None
- Problem trigger: Overwriting a data block followed by a disk going down in the file system.
- Symptom: File system unmounted
- Platforms affected: All
- Functional Area affected: gpfs core
- Customer Impact: High Importance IJ16712
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: On a RHEL 7.6 node with supported GPFS versions 4.2.3.13 or higher and 5.0.2.2 or higher, when the kernel is upgraded to version 3.10.0-957.19.1 or 3.10.0-957.21.2 (after applying RHBA-2019:1337) or higher, the node may encounter a kernel crash while running IO operations.
- Work around: Disable SELinux (see the sketch after this entry)
- Problem trigger: An inconsistency between the GPFS kernel portability layer and the kernel level
- Symptom: Abend/Crash
- Platforms affected: RHEL7.6 with kernel 3.10.0-957.19.1 or higher
- Functional Area affected: All
- Customer Impact: High Importance IJ16783
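A sketch of the SELinux work-around; these are the standard RHEL mechanisms rather than Scale-specific commands:
   # switch SELinux to permissive mode immediately (lasts until reboot)
   setenforce 0
   # make the change persistent across reboots
   sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config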
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A user may create a file system with an unhealthy number of allocated inodes in the root fileset. This can cause the inode allocation map to become suboptimal when further independent filesets are created that do not have as many allocated inodes. The only way to reformat the inode allocation map is to recreate the file system.
- Work around: Recreate the file system with favorable inode allocation map parameters (see the sketch after this entry).
- Problem trigger: Create file system with very large NumInodesToPreallocate.
- Symptom: Performance Impact/Degradation
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16716
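A hedged sketch of recreating the file system with explicit inode preallocation; fs1, the stanza file, and the inode counts are placeholders, not recommended values:
   # recreate the file system with a bounded number of preallocated root-fileset inodes
   mmcrfs fs1 -F nsd.stanza --inode-limit 2000000:500000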
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Raising the fsstruct_fixed event as stated in the documentation does not work in version 5.0.2-x and returns an error instead.
- Work around: Include the file system name twice as arguments of mmsysmonc to raise fsstruct_fixed (see the sketch after this entry)
- Problem trigger: Spectrum Scale Version 5.0.2-x is installed
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: System Health
- Customer Impact: Suggested: has little or no impact on customer operation IJ16782
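A minimal sketch of the 5.0.2-x work-around described above; fs1 is a placeholder file system name, passed twice:
   /usr/lpp/mmfs/bin/mmsysmonc event filesystem fsstruct_fixed fs1 fs1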
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: mmlslicense --capacity fails to report the correct disk size
- Work around: Manually get the disk size with the blockdev command (see the example after this entry).
- Problem trigger: Underlying device names are not found on all NSD servers
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Admin Commands
- Customer Impact: Suggested: has little or no impact on customer operation IJ16678
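A hedged example of the manual check on an NSD server; /dev/sdx is a placeholder for the underlying device:
   # report the device size in bytes
   blockdev --getsize64 /dev/sdx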
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: There are 3 problems: 1. If an upload of a file larger than 2 GB crashes, all further call home uploads that are not service ticket-related are blocked forever. 2. The call home feature of resending failed scheduled uploads does not work. 3. If any call home group member crashes during data collection, the mmsysmonitor.log on the group master will contain a persistently repeating error entry.
- Work around: For the 3 aforementioned issues: 1. In LOCKINFO (/var/mmfs/callhome/log/ecc/rsENCallECCLock.dat), change FILE_SIZE to a value which is less than 2 GB. 2. None. 3. On the call home master node, delete the contents of /callhome/incomingFTDC2CallHome.
- Problem trigger: 1. Upload of files larger than 2 GB which are not service ticket-related. 2. Unstable connection to ECuRep. 3. Call home group members crashing during the call home scheduled data collection.
- Symptom: Component Level Outage
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Callhome
- Customer Impact: Suggested IJ17147
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A node in the home cluster hit the following assertion when a remote node joins the cluster: 2019-04-16_14:55:37.346+0200: [X] logAssertFailed: (nodesPP[nidx] == NULL || nodesPP[nidx] == niP)
- Work around: No
- Problem trigger: remote node joins and leaves the cluster
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16676
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Clearing the READONLY attribute of an immutable file through SMB incorrectly succeeded within the retention period.
- Work around: No
- Problem trigger: A Windows SMB client is trying to clear the READONLY attribute on an immutable file that has not expired.
- Symptom: Error output/message
- Platforms affected: Windows Only
- Functional Area affected: SMB/Immutability
- Customer Impact: High Importance IJ17524
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When an encryption policy references a key identifier that is longer than 64 characters, policy application fails.
- Work around: No
- Problem trigger: Create an encryption policy that references a key identifier which is longer than 64 characters and attempt to apply the policy
- Symptom: Policy application fails.
- Platforms affected: All
- Functional Area affected: encryption
- Customer Impact: Low IJ17569
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Memory leak when the gateway node joins the cluster. Reply data is not freed after obtaining the lead gateway node. Lead gateway functionality is no longer used.
- Work around: No
- Problem trigger: Gateway node joining the cluster.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17534
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Memory leak when the gateway node is not yet ready to handle the requests when the node designation is changed
- Work around: No
- Problem trigger: Gateway node joining the cluster.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17537
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Advisory locks are recorded in the Linux kernel on the local node via file_lock structures, and GPFS maintains an additional structure to accomplish locking across nodes. If the inode object has already been freed when a blocked lock waiter is resumed by GPFS, GPFS will try to free the file_lock along with the GPFS structure and access the obsolete inode structure data, which causes a kernel crash.
- Work around: No
- Problem trigger: A large fcntl locking workload and lock contention.
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: All
- Customer Impact: High Importance IJ17471
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: To find the single gateway node for the fileset to which an application IO request must be queued, the code goes through all the nodes in the cluster, including remote cluster nodes that mount the local filesystem. On clusters with a huge number of remote mounted nodes, this causes considerable application performance degradation.
- Work around: No
- Problem trigger: Have a large number of remote cluster nodes mounting the filesystem from the owning cluster (one customer has about 9000 such nodes mounting the FS). Every time an application node sends a request to the gateway node, it must go through the entire list of 9K nodes to find that single node; in similar fashion, the gateway node must verify against the same 9K-node list that it is indeed the serving gateway node for the request. This adds considerable time in the application IO path, both when queuing the request from the application node to the gateway and when acknowledging from the gateway back to the application node to complete the application IO request.
- Symptom: Silent performance degradation for the applications performing IO to the AFM fileset.
- Platforms affected: ALL Linux OS environments (AFM Gateway nodes). All Linux and AIX environments (Application nodes running IO to the AFM fileset).
- Functional Area affected: AFM - NFS and GPFS backend filesets, with afmHashVersion 2 and 5 and the afmFastHashVersion tunable turned on.
- Customer Impact: High Importance IJ17170
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: getxattr for the 'security' namespace is not properly blocked during quiesce, which may cause the assert "SGNotQuiesced"
- Work around: No
- Problem trigger: When the file system is quiesced (for example while running mmcrsnapshot/mmdelsnapshot), all vfs operations should be blocked. If applications access a file's 'security' namespace extended attributes (for example via the 'getcap' command), that getxattr vfs operation is not properly blocked and may cause the assert "SGNotQuiesced"
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: Snapshots
- Customer Impact: High Importance IJ17112
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When an RDMA connection is in a bad state, new NSD requests go to the remaining RDMA connections, but in-flight NSD requests fall back to the TCP socket even though other RDMA connections remain available.
- Work around: No
- Problem trigger: Port or link error on a node which has multiple IB ports
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: Suggested IJ17172
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM prefetch does not work if the files have 64-bit inode numbers assigned to them. When checking the file for the cached bit, a 32-bit inode number is used, and the integer overflow might cause the file's cached state to be returned as true.
- Work around: No
- Problem trigger: AFM prefetch
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17557
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The primary fileset might run out of inode space if a large number of files are created/deleted.
- Work around: No
- Problem trigger: Inode space might be exhausted.
- Symptom: Abend/Crash
- Platforms affected: Linux Only
- Functional Area affected: AFM DR
- Customer Impact: IJ17175
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When UID remapping is enabled, daemon asserts or a kernel crash occurs on the nodes in the client cluster. This happens when the remapping script does not remap any credentials or enableStatUIDremap is not enabled.
- Work around: 1. For the daemon assert, correct the remap scripts to remap the credentials. 2. For the kernel crash, enable the enableStatUIDremap config option.
- Problem trigger: UID remapping with an incorrect mmname2uid script, and file metadata modification when enableStatUIDremap is not enabled.
- Symptom: Abend/crash
- Platforms affected: All
- Functional Area affected: Remote cluster mount/UID remapping
- Customer Impact: High Importance IJ17114
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM prefetch of small files has a performance issue because the file is flushed to disk without closing the open instance. This prevents the file from being shrunk to fit into subblocks, so a full block of data is transferred to the NSD server.
- Work around: No
- Problem trigger: AFM prefetch
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17576
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: In the role reversal feature, a data write operation is performed even if the file was already synced but is migrated at the secondary. If files are migrated, the write operation should be skipped during role reversal and only the attributes should be set.
- Work around: None
- Problem trigger: Migrated files are present during role reversal
- Symptom: A write operation happens on a migrated file.
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM and AFM DR
- Customer Impact: Suggested IJ17570
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: RPC message was reported as lost, like below: Message ID 735239 was lost by node ip_address node_name wasLost 1
- Work around: None
- Problem trigger: An unreliable network, which leads to reconnect happening several times
- Symptom: Node expel/Lost Membership
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ17538
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSSTRUCT error, FSErrValidate, could be generated in the system log after adding a new disk to a file system.
- Work around: None
- Problem trigger: Add a new disk to a file system while running GPFS 5.0.1.0 through 5.0.3.1
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ17554
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: After reboot of a node, the systemhealth NFS monitoring was started, but not the SMB component and its monitoring. AD authentication was configured for NFS, which depends on a running SMB component. This constellation yielded a "winbind-down" event but gave no hint about the root cause.
- Work around: mmshutdown followed by mmstartup might help, since the entire stack (including SMB/NFS and their monitors) is restarted (see the sketch after this entry). The log level could be increased during the startup and check phase (mmces log level 3) to get more details in the mmfs.log file. For production, this log level should be lowered (to 0 or 1).
- Problem trigger: The circumstances which led to the detected mismatch were not repeatable. This seems to be a rare race situation, and was not reported before.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: High Importance IJ17559
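A sketch of the work-around sequence on the affected CES node, using the commands named above:
   mmshutdown
   mmstartup
   # raise the monitor log level during the startup and check phase
   mmces log level 3
   # once verified, lower it again for production
   mmces log level 1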
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: mmexpelnode fails when the network cable of the cluster manager and file system manager is pulled on CCR-enabled clusters with tiebreaker disks configured. GPFS file systems got unmounted on other hosts.
- Work around: None
- Problem trigger: mmexpelnode executed in a CCR-enabled cluster with tiebreaker disks configured.
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: Admin Commands (mmexpelnode)
- Customer Impact: High Importance IJ17580
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When accessing a GPFS zlib-compressed file by mmap (or executing a GPFS zlib-compressed executable file), the kernel may crash with the oops message "unable to handle kernel paging request" in the IoDone routine
- Work around: None
- Problem trigger: Accessing a GPFS zlib-compressed file by mmap (or executing a zlib-compressed executable file)
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: GPFS Native Compression
- Customer Impact: High Importance IJ17593
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Deadlock when AFM filesets are accessed using the remote mounted file system due to a mismatch in the gateway node configuration between the client (remote) and storage (home) clusters. It is unclear how the configuration mismatch happens.
- Work around: None
- Problem trigger: AFM filesets accessed through a remote mounted file system when the gateway node configuration differs between the client and storage clusters.
- Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
- Platforms affected: All OS environments
- Functional Area affected: AFM and AFM DR
- Customer Impact: Critical IJ17581
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A stripe group / file system manager panic occurs while another node (non-SGmgr) is accessing files in a snapshot. These accesses can be part of the snapshot deletion itself, another maintenance command (such as mmdeldisk or mmrestripefs), or even ordinary user accesses from the kernel. The diagnostic error reported in the log on the stripe group (SG) manager node looks like this, though the line number may vary: 2019-05-06_23:23:22.122-0300: [X] File System fs1 unmounted by the system with return code 2 reason code 0, at line 4646 in /afs/apd.pok.ibm.com/u/gpfsbld/buildh/ttn423ptf13/src/avs/fs/mmfs/ts/fs/llio.C The "unmount in llio.C" message is usually followed by a message mentioning "Reason: SGPanic", but this does not always occur, and an SGPanic can be caused by other unrelated problems. The error is triggered by a snapshot listed as DeleteRequired by mmlssnapshot. The snapshot access that causes the error, however, will be to an earlier snapshot (with a smaller snapId), though it may be difficult to determine which access or which node caused the panic. Further, at least one snapshot must be a fileset snapshot (file systems with only global snapshots are not affected). The specific enabling factors, however, are complicated and quite rare for most customers, so this is not a common problem.
- Work around: Remove DeleteRequired snapshots with an mmdelsnapshot command with an explicit -N argument listing only the SG manager node (see the sketch after this entry).
- Problem trigger: The error is triggered by a snapshot listed as DeleteRequired by mmlssnapshot. The snapshot access that causes the error, however, will be to an earlier snapshot (with a smaller snapId). Further, at least one snapshot must be a fileset snapshot (file systems with only global snapshots are not affected). The specific enabling factors are complicated and quite rare for most customers, so this is not a common problem.
- Symptom: Cluster/File System Outage
- Platforms affected: All OS environments
- Functional Area affected: Snapshots
- Customer Impact: Suggested: has little or no impact on customer operation IJ17595
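A hedged sketch of the work-around; fs1, snap1, and sgmgr-node are placeholders:
   # find the SG manager node for the file system
   mmlsmgr fs1
   # list snapshots and their status (look for DeleteRequired)
   mmlssnapshot fs1
   # delete the snapshot, restricting the work to the SG manager node only
   mmdelsnapshot fs1 snap1 -N sgmgr-node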
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: For AFM migration, provide an option to revalidate with home only once after the cutover to the new system, improving performance during fileset access.
- Work around: None
- Problem trigger: AFM migration
- Symptom: Performance Impact/Degradation
- Platforms affected: All OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17582
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: gnrhealthcheck does not catch the case where an ESS system is set up without having verified that both servers see the enclosures/drives.
- Work around: None
- Problem trigger: This problem is caused by an invalid ESS deployment.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: ESS/GNR
- Customer Impact: Suggested IJ17583
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When running I/O with NFS, unexpected failovers occurred without an obvious reason. NFS is reported as 'not active', even though it is still working.
- Work around: No workaround available. There is a manual way to temporarily modify the event declaration for the observed "nfs_not_active" event by modifying the event action in the event configuration file (ask L2 for support).
- Problem trigger: In the reported cases, some high I/O load led to the situation that NFS v3 and/or v4 (whatever is configured) NULL requests failed, and a following internal statistics check reported no activity regarding the number of internal NFS operations. The monitor interpreted this as a "hung" state and triggered a failover. In fact, the system might still be functional, and the internally detected "unresponsive" state might be only temporary, so a failover is not advised in this case. However, at the time of monitoring there was no further indication available.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: Systemhealth
- Customer Impact: High Importance IJ17598
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: QOS may deadlock on the file system manager node, particularly if there are many (hundreds of) nodes mounting the file system and the manager node is heavily CPU or network loaded.
- Work around: 1) mmchqos FS stat-slot-time 15000 stat-poll-interval 60, or if that is not sufficient, 2) disable QOS until the fix is available (see the sketch after this entry).
- Problem trigger: See problem description.
- Symptom: Hang or Deadlock
- Platforms affected: ALL
- Functional Area affected: QOS
- Customer Impact: High Importance, especially for customers using QOS with hundreds of nodes mounting the file system. IJ17584
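The two work-around steps as they might be typed, assuming fs1 and a long-option spelling of the parameters (the entry gives them without dashes, so the exact form is an assumption):
   # step 1: lengthen the QOS statistics slot time and polling interval
   mmchqos fs1 --stat-slot-time 15000 --stat-poll-interval 60
   # step 2, only if step 1 is not sufficient: disable QOS until the fix is installed
   mmchqos fs1 --disable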
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A filesystem containing a dot in the name was declared to be ignored by declaring a file /var/mmfs/etc/ignoreAnyMount.<filesystem name>. However, the systemhealth monitor treated it as a missing filesystem.
- Work around: No work around available. Filesystems could be named with an underscore instead of a dot, if a separator is wanted.
- Problem trigger: A filename /var/mmfs/etc/ignoreAnyMount.<filesystem name containing a dot> is split internally at the dots, so it results in three items, which is not wanted.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: Systemhealth
- Customer Impact: little impact on customer operation IJ17600
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The customer cannot create an SMB export under specific conditions.
- Work around: Choose GPFS file system names such that no file system name is a substring of another
- Problem trigger: One GPFS file system name is a substring of another file system name.
- Symptom: The customer is limited to a special naming setup for the GPFS file systems
- Platforms affected: ALL Linux OS environments
- Functional Area affected: SMB
- Customer Impact: Suggested IJ17585
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If a relative pathname is provided in an export definition, the mmnfs command allows it, which causes the Ganesha NFS server to fail.
- Work around: None
- Problem trigger: Relative pathname to --pseudo option of the mmnfs command.
- Symptom: Unexpected results.
- Platforms affected: Linux
- Functional Area affected: Protocols
- Customer Impact: Suggested IJ17607
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM is unable to prefetch the data if the file metadata has changed. For example, if the user changes the metadata (e.g. chmod) on an uncached file, prefetch skips reading the file.
- Work around: Read the file manually without the prefetch (see the example after this entry)
- Problem trigger: AFM prefetch
- Symptom: Unexpected Results/Behavior
- Platforms affected: All OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17601
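A minimal example of the manual-read work-around; the path is a placeholder for an uncached file whose metadata was changed:
   # reading the file through the cache fileset fetches its data from home without prefetch
   cat /gpfs/fs1/afmfileset/file1 > /dev/null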
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: "mmhealth node show" might show degraded status for CLOUDGATEWAY even though "mmcloudgateway service status -N tctServers" shows all OK
- Work around: None
- Problem trigger: If Cloudgateway was in a degraded state and changed to the "only_ensures_cloud_container_exists" status, it did not trigger mmhealth to return to a "healthy" state.
- Symptom: Unexpected Results/Behavior
- Platforms affected: Linux
- Functional Area affected: System Health TCT
- Customer Impact: High Importance: an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ17665
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- This update addresses the following APARs: IJ16674 IJ16676 IJ16678 IJ16707 IJ16712 IJ16716 IJ16782 IJ16783 IJ17112 IJ17114 IJ17133 IJ17136 IJ17147 IJ17170 IJ17172 IJ17175 IJ17471 IJ17524 IJ17534 IJ17537 IJ17538 IJ17554 IJ17557 IJ17559 IJ17569 IJ17570 IJ17576 IJ17580 IJ17581 IJ17582 IJ17583 IJ17584 IJ17585 IJ17593 IJ17595 IJ17598 IJ17600 IJ17601 IJ17607 IJ17665.
Problems fixed in Spectrum Scale 5.0.3.1 [May 31, 2019]
- Problem description: When creating a DMAPI session there is a small window where memory gets corrupted, causing a GPFS daemon crash with sig 11.
- Work Around: None
- Problem trigger: Creating lots of DMAPI sessions with heavy workload
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ15859
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: RPCs sent via RDMA remain pending forever in the 'sending' state. Long waiters with Verbs RDMA appear, like: Waiting 2273.0813 sec since 11:05:04, monitored, thread 113229 BackgroundSyncThread: for RDMA send completion fast on node 192.168.1.1
- Work Around: None
- Problem trigger: Reply lost on RDMA network
- Symptom: Hang
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: High IJ15892
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If GPFS is shut down on a node, it is possible that CES IPs are assigned to this node two minutes after shutdown. These CES IPs are not usable for the customer.
- Work Around: Suspend the node before GPFS shutdown (see the sketch after this entry).
- Problem trigger: The node still has a valid GPFS lease two minutes after shutdown.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL
- Functional Area affected: CES
- Customer Impact: High IJ15912
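A sketch of the work-around on a CES node; node arguments are omitted, so the commands act on the local node:
   # suspend the node so its CES IPs are reassigned cleanly before shutdown
   mmces node suspend
   mmshutdown
   # later, after bringing GPFS back up, resume CES duty
   mmstartup
   mmces node resume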
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A race condition may cause an mmperfmon sensor update to fail with the following message: fput failed: Invalid version on put (err 807) Other commands fail with the above message as well.
- Work Around: Rerun the failed command.
- Problem trigger: The problem is hit more often when using the spectrumscale installation toolkit command to install.
- Symptom: Error output/message "fput failed: Invalid version on put (err 807)" Upgrade/Install failure
- Platforms affected: ALL Operating System environments, but more often on Linux nodes in a CCR environment.
- Functional Area affected: Admin Commands
- Customer Impact: Suggested IJ16079
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: "mmuserauth service create" command failed due to TCP port 445 being blocked. However, error message indicated incorrect credentials which was not the correct reason for failure.
- Work Around: None
- Problem trigger: The issue is seen at the time of configuring authentication, in setups where TCP port 445 is blocked. The command internally tries to connect to the specified DC via the port. Due to the blocked port, it fails to connect with a timeout. However, the error message currently shown indicates incorrect credentials, which is not the case.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Authentication
- Customer Impact: Suggested IJ16084
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSErrInodeCorrupted FSSTRUCT error could be written to the system log as a result of a stale buffer for a directory block.
- Work Around: None
- Problem trigger: A change in the token manager list as a result of either a node failure or a change in the number of manager nodes.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: All
- Customer Impact: Suggested IJ16085
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The output of mmlscluster --ces shows multiple entries for the same IP address. The cesiplist file (stored in CCR) did contain these multiple entries, so mmlscluster just displayed them. This was obviously a misconfiguration.
- Work Around: A reassignment of IPs (moves, failover, suspend/resume) triggers a rewrite of the cesiplist file, which cleans up these inconsistencies. The affected node must be involved in the IP movement.
- Problem trigger: The circumstances which may lead to multiple entries of the same IP for a node are not known. This seems to happen occasionally, but very rarely.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: has little or no impact on customer operation IJ16091
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Unexpected wndb down after SMB startup without a known reason at log level 0.
- Work Around: Start wndb manually.
- Problem trigger: Unknown
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: CES
- Customer Impact: Medium Importance IJ16093
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Trying to delete an immutable file through SMB fails after the retention period expires. The problem is that Samba as SMB server denies deletion when the READONLY flag is set.
- Work Around: None
- Problem trigger: A Windows SMB client is trying to delete an immutable file after the retention period expires.
- Symptom: Error output/message
- Platforms affected: Windows Only
- Functional Area affected: SMB/Immutability
- Customer Impact: High Importance IJ16094
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If too many pdisks are unreadable (not missing), so that a vtrack cannot be written, it is possible that stale strip information is committed to the metadata log. When the scrubber tries to scrub the vtrack, it will examine this stale strip data and declare data loss.
- Work Around: None
- Problem trigger: Unavailability of pdisks needed to write a vtrack.
- Symptom: IO error.
- Platforms affected: ALL Linux OS environments
- Functional Area affected: ESS/GNR
- Customer Impact: Critical IJ16095
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSErrCheckHeaderFailed error could be incorrectly issued and logged in the system log.
- Work Around: None
- Problem trigger: A user application moves files out of a directory before deleting the directory.
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: Suggested IJ15910
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The GPFS daemon will sig11 or log assert with "offset < ddbP->mappedLen" when a user application, log recovery, or the tsdbfs or mmfsck command accesses a corrupted directory (a directory whose file size is smaller than 32 bytes, the size of the directory block header structure).
- Work Around: None
- Problem trigger: This kind of corrupted directory could be caused by a previous code bug.
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15909
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A few operations on an IW fileset take longer than expected because an unintended dependency is created on previous operations performed on the fileset, which the currently performed operation then attempts to replicate to the remote/home side.
- Work Around: None
- Problem trigger: Users running 5.0.3 with workloads on AFM IW mode filesets may see a few elongated operations (performance impact) on the filesets, owing to a few dependent operations performed on the same file/fileset earlier which are waiting to be asynchronously pushed to the home/remote site.
- Symptom: A few operations on the IW fileset might take longer than expected, since they carry other asynchronous operations as dependents to the remote site. A few waiters might be seen to linger for a few extra seconds; once the dependencies are resolved, the waiters should vanish.
- Platforms affected: ALL Operating System environments (AFM application and Gateway nodes).
- Functional Area affected: AFM - and Specifically users on AFM IW mode filesets only.
- Customer Impact: High Importance. IJ16110
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Enable AFM prefetch for a single fileset to run from multiple gateway nodes to improve migration performance
- Work Around: None
- Problem trigger: AFM prefetch, slow performance
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: Suggested IJ16112
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: GPFS daemon crash when an application writes data into the file system
- Work Around: None
- Problem trigger: A memory failure of newBuffer in a busy system.
- Symptom: Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15993
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Enable AFM prefetch for a single fileset to run from multiple gateway nodes to improve migration performance. This enhancement also handles the scenario where the same file is being read from multiple gateway nodes.
- Work Around: None
- Problem trigger: AFM prefetch, slow performance
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: Suggested IJ16113
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: On a system without the ifup/ifdown commands installed, nearly any call to an mm-command shows messages like: which: no ifup in (/bin:/usr/bin:/sbin:/usr/sbin:/usr/lpp/mmfs/bin) which: no ifdown in (/bin:/usr/bin:/sbin:/usr/sbin:/usr/lpp/mmfs/bin) and terminates the called mm-program
- Work Around: Not available. An install of ifup/ifdown would resolve the issue, but might lead to other issues
- Problem trigger: Any mm-command may run into this issue if the ifup/ifdown commands are not installed on the system
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: CES
- Customer Impact: High Importance IJ16114
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfsadm dump command could run into an infinite loop when dumping the token objects.
- Work Around: Avoid running the mmfsadm dump command.
- Problem trigger: Running the mmfsadm dump command while workloads are running in the cluster.
- Symptom: The mmfsadm dump command hangs.
- Platforms affected: ALL Operating System environments except Windows
- Functional Area affected: mmfsadm dump command
- Customer Impact: Suggested IJ15996
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If the file system was formatted with narrow disk addresses (version 2.2 or older) and the GPFS version is 4.2.3 or 5.0.x, GPFS daemon asserts would happen randomly.
- Work Around: None
- Problem trigger: Application I/O into a narrow disk address file system using GPFS versions 4.2.3 or 5.0.x.
- Symptom: Crash, like assert subblocksPerFileBlock==(1<<(tinodeP->getFblockSize()))
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16116
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmrepquota -q and -t option command usage is ambiguous. Options -q and -t should not be used when combined with Device:Fileset because they are file system attributes.
- Work Around: None
- Problem trigger: The current mmrepquota command usage allows invoking -q option as follows: mmrepquota -q Device:fileset
- Symptom: mmrepquota -q Device:fileset gives file system default quota information and not perfileset-quota.
- Platforms affected: All
- Functional Area affected: Quotas
- Customer Impact: Suggested IJ15914
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: For file systems created with large NumNodes and large NumInodesToPreallocate arguments, the inode allocation map ends up with a large value for nRegions and nBitsPerSubsegment. For subsequent independent filesets created with orders of magnitude less NumInodesToPreallocate, this can leave most of the inode map segments as unusable/surplus. During inode lookup as part of inode allocation, these surplus segments may be read from disk many times causing performance degradation.
- Work Around: Increase the allocated inodes in the problem fileset (see the sketch after this entry).
- Problem trigger: File systems created with large NumNodes and large NumInodesToPreallocate arguments. Then independent filesets are created with orders of magnitude less NumInodesToPreallocate.
- Symptom: Performance Impact/Degradation
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15991
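A hedged sketch of the work-around; fs1, fset1, and the inode counts are placeholders to be sized for the actual fileset:
   # raise the maximum and preallocated inode counts of the affected independent fileset
   mmchfileset fs1 fset1 --inode-limit 1000000:1000000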
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A fileset might get stuck and prevent filesystem quiesce when an AFM DR fileset finds that an inode does not have remote attributes and tries to build them using the tsfindinode command after blocking the filesystem quiesce. Remote attributes are used to find the remote file by its file handle for replication.
- Work Around: None
- Problem trigger: AFM DR with renames to the deleted directories
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM DR
- Customer Impact: Critical IJ16024
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: FSErrInodeCorrupted FSSTRUCT error could be issued incorrectly during lookup when both directory and its parent directory are being deleted.
- Work Around: None
- Problem trigger: Perform lookup on '..' entry of a directory that is being deleted.
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: Suggested IJ15916
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: During file manager takeover, the new manager broadcasts to all mount nodes to invalidate their cached low level file metadata. If a low level file is being opened on a mount node at the same time, the two can race and cause logAssertFailed "ibdP->llfileP == this" or logAssertFailed "inode.indirectionLevel >= 1"
- Work Around: One of our customers reported hitting this problem while running mmdelsnapshot. For the mmdelsnapshot scenario, deleting the oldest snapshot first will greatly reduce the risk.
- Problem trigger: The race exists between file manager takeover and low level file opening (the latter can happen for many reasons, including but not limited to mmdelsnapshot)
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15961
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: GPFS admin commands may cause high CPU usage. This is because remote GPFS command calls invoke the find command to clean up tmp files, which is expensive on systems with a large number of subdirectories and files under /var/mmfs/tmp.
- Work Around: Manually clean up to reduce the number of subdirectories and files under /var/mmfs/tmp (see the sketch after this entry). Kill running find processes invoked from /usr/lpp/mmfs/mmremote processes.
- Problem trigger: Nodes with a large number of subdirectories and files under /var/mmfs/tmp are most likely affected.
- Symptom: Performance Impact/Degradation, hang
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: High Importance an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ15858
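A hedged sketch of the manual cleanup; the 7-day age threshold is an assumption, not a documented value:
   # remove aged temporary files under /var/mmfs/tmp
   find /var/mmfs/tmp -type f -mtime +7 -delete
   # kill lingering find processes spawned by mmremote
   pkill -f 'find /var/mmfs/tmp'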
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The session info length is not checked when creating a DMAPI session; it is supposed to be less than or equal to 256 bytes. Per the DMAPI standard, the E2BIG errno should be returned; instead, GPFS truncates the length to 256 bytes and proceeds with the session creation.
- Work Around: None
- Problem trigger: Creating DMAPI session with very long session info string
- Symptom: None
- Platforms affected: All
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ16117
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The arping command is used by the NFS failover mechanism but was not found on the system. It was installed, but the log files show a No such file or directory message, which indicates that the arping command was not found in the expected path.
- Work Around: It would probably help to set a symbolic link from the arping command to "/usr/bin/arping", which is the default if the distro could not be properly detected. Note that using links is generally not advised, since they could be a security issue.
- Problem trigger: The circumstances which lead to the issue are not fully understood. Most likely the OS detection using the /etc/redhat-release file did not work, so the wrong distro was assumed, which led to a wrong expected path name for the arping command location, and so it was not found. This older CentOS version does not yet have the /etc/os-release file provided by newer distros, which is meanwhile used too.
- Symptom: Error output/message
- Platforms affected: All CentOS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: has little or no impact on customer operation IJ15998
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Deadlock during the AFM fileset recovery due to lock ordering issue when rename operations are being executed
- Work Around: None
- Problem trigger: AFM fileset recovery with renames to newly created directories.
- Symptom: Long Waiters/Deadlock
- Platforms affected: All Linux OS
- Functional Area affected: AFM and AFM DR
- Customer Impact: Critical IJ15963
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The gpfs systemd service (gpfs.service) may report failure after shutdown
- Work Around: The failed systemd status is not an error condition of GPFS shutdown and can be ignored.
- Problem trigger: When shutting down GPFS, if the main systemd process (runmmfs) does not exit quickly, a kill signal is sent to the main process either by the shutdown subroutine or by systemd manager itself.
- Symptom: Error output/message Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments with systemd version >= 219
- Functional Area affected: Admin Commands/systemd
- Customer Impact: has little or no impact on customer operation IJ15962
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Using the mmchcluster command to enable CCR may fail. While the mmchcluster command is working to enable CCR, any other mm command can remove the authorized_ccr_keys file, which is needed in the final step of enabling CCR. This problem occurs more often when the first quorum node in the list is on a GPFS-supported systemd system. If the mmchcluster command is running on a quorum node, the command considers that node to be the first quorum node in the list.
- Work Around: Run mmchcluster on a quorum node that does not support GPFS systemd, or temporarily disable system health: chmod 000 /usr/lpp/mmfs/bin/mmsysmon* (see the sketch after this entry)
- Problem trigger: While the mmchcluster command is working to enable CCR, any other mm command can remove the authorized_ccr_keys file, which is needed in the final step of enabling CCR.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments with systemd version >= 219
- Functional Area affected: CCR Admin Commands
- Customer Impact: High Importance to customers that want to enable CCR. IJ15915
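A sketch of the second work-around; mode 555 for the restore step is an assumption about the original permissions:
   # temporarily disable system health monitoring
   chmod 000 /usr/lpp/mmfs/bin/mmsysmon*
   # enable CCR while the monitor cannot interfere
   mmchcluster --ccr-enable
   # restore the monitor binaries afterwards
   chmod 555 /usr/lpp/mmfs/bin/mmsysmon*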
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Some GPFS commands don't work correctly if the cluster name contains special characters.
- Work Around: Change the name of the cluster so that it does not contain any special characters (see the sketch after this entry).
- Problem trigger: A cluster name with a special character like the ampersand "&" causes commands like mmauth show . to fail
- Symptom: GPFS admin commands error. Error output/message Unexpected Results/Behavior
- Platforms affected: all
- Functional Area affected: admin commands
- Customer Impact: Low Importance IJ15908
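A minimal sketch of the work-around; newcluster.example.com is a placeholder name containing no special characters:
   mmchcluster -C newcluster.example.com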
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM does not keep directory mtime in sync while reading the directory contents from the home. This may be a problem for some users during the migration
- Work Around: None
- Problem trigger: AFM migration/prefetch or cache readdir/lookup
- Symptom: Unexpected results
- Platforms affected: All Linux OS
- Functional Area affected: AFM
- Customer Impact: Critical IJ15990
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The NFS/Ganesha service did not process I/O, and the systemhealth monitor showed that the NFS NULL checks for protocol versions 3 and 4 failed. The Ganesha process was shown in the process list, and logging and replies to requests via Dbus worked. There was no failover.
- Work Around: Manually restart NFS/Ganesha: mmces service stop nfs (or kill the gpfs.ganesha process), then mmces service start nfs (see the example after this entry)
- Problem trigger: The reason why NFS/Ganesha hung was not evaluated. The main issue was that the Ganesha process was not entirely "dead": the process was running, it replied to remote requests via Dbus, and it wrote log entries. It was "dead" regarding I/O handling, but the systemhealth monitor did not notice this properly.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES Nodes running NFS)
- Functional Area affected: CES
- Customer Impact: High Importance IJ16036
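The work-around as it might be typed on the affected CES node:
   mmces service stop nfs
   mmces service start nfs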
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Assert exp(totalLen <= extensionLen) in line 16424 of file /project/sprelttn423/build/rttn423s008a/src/avs/fs/mmfs/ts/nsd/nsdServer.C
- Work Around: None
- Problem trigger: This issue affects customers running IBM Spectrum Scale 4.2.3 and later if the following conditions are true: 1) a mixed-endianness cluster, or mixed-endianness remote clusters; 2) RDMA enabled (and an NSD client may send NSD requests to an NSD server which has a different endianness); 3) the NSD client or NSD server is IBM Spectrum Scale 4.2.3. It is a rare-case assert which may happen when the client sends the first NSD request to an NSD server which has a different endianness.
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: Suggested IJ16020
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Err 22 is not returned when the dm_getall_disp DMAPI call is made with a bad sessionId
- Work Around: None
- Problem trigger: dm_getall_disp is called with a bad sessionId
- Symptom: Error output/message
- Platforms affected: ALL
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ16064
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfsck man page provides an instruction to clear the fsstruct error from the mmhealth command: "mmsysmonc event filesystem fsstruct_fixed". But this is not correct. As a result, the documented command will fail with a syntax error.
- Work Around: None
- Problem trigger: Executing command as instructed in man page
- Symptom: Error output/message due to documentation problem
- Platforms affected: ALL Operating System environments
- Functional Area affected: System Health
- Customer Impact: High Importance IJ16329
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A recent performance change in GPFS 5.0.3 makes GPFS commands more sensitive to network congestion. This causes commands like mmgetstate -a to report unknown status, or other GPFS commands to report nodes as unreachable.
- Work Around: Commands like mmgetstate -a can be issued again to get the status.
- Problem trigger: This affects only nodes running GPFS 5.0.3. It affects all GPFS admin commands that need to execute commands remotely.
- Symptom: Error messages like: "The following nodes could not be reached:" mmgetstate -N or -a reports "unknown" state.
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: High Importance: an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ16395
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- This update addresses the following APARs: IJ15858 IJ15859 IJ15892 IJ15908 IJ15909 IJ15910 IJ15912 IJ15914 IJ15915 IJ15916 IJ15961 IJ15962 IJ15963 IJ15990 IJ15991 IJ15993 IJ15996 IJ15998 IJ16020 IJ16024 IJ16036 IJ16064 IJ16079 IJ16084 IJ16085 IJ16091 IJ16093 IJ16094 IJ16095 IJ16110 IJ16112 IJ16113 IJ16114 IJ16116 IJ16117 IJ16329 IJ16395.
Problems fixed in Spectrum Scale 5.0.3.2 for Protocols include the following:
- smb: Return share name in correct case from net rpc conf showshare
- smb: Add gpfs.smb 4.9.8_gpfs_21-1
Problems fixed in Spectrum Scale 5.0.3.1 for Protocols include the following:
- gui: AD names should allow dots
- gui: Better handling on warning message for remote mounted file systems
- gui: Filesets - Corrected the "Type" and "AFM Role" displayed in the export
- gui: Updates required to accurately show GNR User Condition definitions
- gui: The CAPACITY_LICENSE task fails when there are no NSDs
- gui: Edit quota dialog not displayed for user,group,fileset quotas
- gui: No longer display a warning or error icon on SSD endurance percentage
- gui: Hourly call to mmaudit list should not occur
- toolkit: Fixed WCE parsing for some SAS cards
- smb: Version 4.9.7_gpfs_20-1
- smb: Change the memory check to cover the total of main memory and swap space
- smb: Stabilize gencache after gencache flush
- smb: Fill gencache with domain info returned from domain controller
- smb: Enable logging for early startup failures
- smb: Properly track the size of talloc objects
- smb: Remove implementations of SaveKey/RestoreKey
- smb: Pass back what we have in _wbint_Sids2UnixIDs().
- callhome: Updated to 5.0.3-1 nomenclature
- kafka: Updated to 5.0.3-1 nomenclature
Problems fixed in Spectrum Scale Protocols Packages 5.0.3-0 [Apr 19, 2019]
- Please see the "What's New" page in the IBM Knowledge Center