Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 5.0.x applies for all supported platforms.
Problems fixed in IBM Spectrum Scale 5.0.3.2 [July 18, 2019]
- Problem description: A long waiter like the following appears: Waiting 8349.1305 sec since 00:03:05, monitored, thread 133060 AcquireBRTHandlerThread: on ThCond 0x3FFE74012E78 (MsgRecordCondvar), reason 'RPC wait' for tmMsgBRRevoke on node 192.168.117.82
- Work around: No
- Problem trigger: A race condition between handling an inbound connection and a node joining the cluster
- Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High IJ17133
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmlslicense command with the -Y option displays product edition information for all nodes in the list based on the local node's information. This is incorrect: it should display the edition only for the local node and "-" for all other nodes. All other options on this command likewise display only the local edition information.
- Work around: Ignore the edition information for any node that is not the local node.
- Problem trigger: Running the command on a cluster with two or more nodes
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: Suggested IJ17136
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmlsfileset command with the "-i -d" options could run into an infinite loop when there is not enough free memory and there are too few indirect block descriptors in the system. A similar loop could happen during mmrestripefs, snapshot deletion, and the ACL garbage collection process.
- Work around: Increase maxFilesToCache to allow more indirect block descriptors in the cache, and make sure there is enough free physical memory in the system (see the sketch after this entry).
- Problem trigger: Run the mmlsfileset -i -d, snapshot delete, or mmrestripefs commands, or enable ACLs, when there is not enough free physical memory in the system and maxFilesToCache is at its default or a low value.
- Symptom: The mmlsfileset, snapshot delete, and mmrestripefs commands hang, and other mm* commands cannot proceed either. The background ACL garbage collection thread runs in a loop if ACLs are enabled.
- Platforms affected: All
- Functional Area affected: mmlsfileset, mmrestripefs, snapshot delete commands and ACL garbage collection process.
- Customer Impact: Critical IJ16674
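A minimal sketch of the work-around; the value 1000000 is an illustrative assumption and should be sized to the node's available memory:
   # allow more indirect block descriptors to be cached (example value, not a recommendation)
   mmchconfig maxFilesToCache=1000000
   # confirm the new setting
   mmlsconfig maxFilesToCache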
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfs.log file may contain an entry like this: "[E] sdrServ: Communication error on socket /var/mmfs/mmsysmon/mmsysmonitor.socket, [err 79] Can not access a needed shared library"
- Work around: N/A. The reported error code "79" is internally used, and means "connection refused".
- Problem trigger: No recreate procedure is available for the reported issue. The underlying issue was that GPFS internal error codes were not mapped to Linux system codes, which produced the wrong message text when the corresponding system message was printed for such a code.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: System Health
- Customer Impact: has little or no impact on customer operation IJ16707
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: File system unmounted when an application overwrites data blocks
- Work around: None
- Problem trigger: Overwriting a data block followed by a disk going down in the file system.
- Symptom: File system unmounted
- Platforms affected: All
- Functional Area affected: gpfs core
- Customer Impact: High Importance IJ16712
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: On a RHEL 7.6 node with supported GPFS versions 4.2.3.13 or higher and 5.0.2.2 or higher, when the kernel is upgraded to version 3.10.0-957.19.1 or 3.10.0-957.21.2 (after applying RHBA-2019:1337) or higher, the node may encounter a kernel crash while running IO operations.
- Work around: Disable SELinux (see the sketch after this entry)
- Problem trigger: An inconsistency between the GPFS kernel portability layer and the kernel level
- Symptom: Abend/Crash
- Platforms affected: RHEL7.6 with kernel 3.10.0-957.19.1 or higher
- Functional Area affected: All
- Customer Impact: High Importance IJ16783
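A sketch of the SELinux work-around; these are the standard RHEL mechanisms rather than Scale-specific commands:
   # switch SELinux to permissive mode immediately (lasts until reboot)
   setenforce 0
   # make the change persistent across reboots
   sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config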
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A user may create a file system with an unhealthy number of allocated inodes in the root fileset. This can cause the inode allocation map to become suboptimal when further independent filesets are created that do not have as many allocated inodes. The only way to reformat the inode allocation map is to recreate the file system.
- Work around: Recreate the file system with favorable inode allocation map parameters (see the sketch after this entry).
- Problem trigger: Create file system with very large NumInodesToPreallocate.
- Symptom: Performance Impact/Degradation
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16716
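A hedged sketch of recreating the file system with explicit inode preallocation; fs1, the stanza file, and the inode counts are placeholders, not recommended values:
   # recreate the file system with a bounded number of preallocated root-fileset inodes
   mmcrfs fs1 -F nsd.stanza --inode-limit 2000000:500000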
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Raising the fsstruct_fixed event as stated in the documentation does not work in version 5.0.2-x and returns an error instead.
- Work around: Include the file system name twice as arguments of mmsysmonc to raise fsstruct_fixed (see the sketch after this entry)
- Problem trigger: Spectrum Scale Version 5.0.2-x is installed
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: System Health
- Customer Impact: Suggested: has little or no impact on customer operation IJ16782
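A minimal sketch of the 5.0.2-x work-around described above; fs1 is a placeholder file system name, passed twice:
   /usr/lpp/mmfs/bin/mmsysmonc event filesystem fsstruct_fixed fs1 fs1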
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: mmlslicense --capacity fails to report the correct disk size
- Work around: Manually get the disk size with the blockdev command (see the example after this entry).
- Problem trigger: Underlying device names are not found on all NSD servers
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Admin Commands
- Customer Impact: Suggested: has little or no impact on customer operation IJ16678
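A hedged example of the manual check on an NSD server; /dev/sdx is a placeholder for the underlying device:
   # report the device size in bytes
   blockdev --getsize64 /dev/sdx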
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: There are 3 problems: 1. If an upload of a file larger than 2 GB crashes, all further call home uploads that are not service ticket-related are blocked forever. 2. The call home feature of resending failed scheduled uploads does not work. 3. If any call home group member crashes during data collection, the mmsysmonitor.log on the group master will contain a persistently repeating error entry.
- Work around: For the 3 aforementioned issues: 1. In LOCKINFO (/var/mmfs/callhome/log/ecc/rsENCallECCLock.dat), change FILE_SIZE to a value which is less than 2 GB. 2. None. 3. On the call home master node, delete the contents of /callhome/incomingFTDC2CallHome.
- Problem trigger: 1. Upload of files larger than 2 GB which are not service ticket-related. 2. Unstable connection to ECuRep. 3. Call home group members crashing during the call home scheduled data collection.
- Symptom: Component Level Outage
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Callhome
- Customer Impact: Suggested IJ17147
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A node in the home cluster hit the following assertion when a remote node joins the cluster: 2019-04-16_14:55:37.346+0200: [X] logAssertFailed: (nodesPP[nidx] == NULL || nodesPP[nidx] == niP)
- Work around: No
- Problem trigger: remote node joins and leaves the cluster
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16676
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Clearing the READONLY attribute of an immutable file through SMB incorrectly succeeded within the retention period.
- Work around: No
- Problem trigger: A Windows SMB client is trying to clear the READONLY attribute on an immutable file that has not expired.
- Symptom: Error output/message
- Platforms affected: Windows Only
- Functional Area affected: SMB/Immutability
- Customer Impact: High Importance IJ17524
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When an encryption policy references a key identifier that is longer than 64 characters, policy application fails.
- Work around: No
- Problem trigger: Create an encryption policy that references a key identifier which is longer than 64 characters and attempt to apply the policy
- Symptom: Policy application fails.
- Platforms affected: All
- Functional Area affected: encryption
- Customer Impact: Low IJ17569
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Memory leak when the gateway node joins the cluster. Reply data is not freed after obtaining the lead gateway node. Lead gateway functionality is no longer used.
- Work around: No
- Problem trigger: Gateway node joining the cluster.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17534
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Memory leak when the gateway node is not yet ready to handle the requests when the node designation is changed
- Work around: No
- Problem trigger: Gateway node joining the cluster.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17537
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Advisory locks are recorded in the Linux kernel on the local node via file_lock structures, and GPFS maintains an additional structure to accomplish locking across nodes. If the inode object has already been freed when a blocked lock waiter is resumed by GPFS, GPFS will try to free the file_lock along with the GPFS structure and access the obsolete inode structure data, which causes a kernel crash.
- Work around: No
- Problem trigger: A large fcntl locking workload and lock contention.
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: All
- Customer Impact: High Importance IJ17471
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: To find the single gateway node for the fileset to which an application IO request must be queued, the code goes through all the nodes in the cluster, including remote cluster nodes that mount the local filesystem. On clusters with a huge number of remote mounted nodes, this causes considerable application performance degradation.
- Work around: No
- Problem trigger: Have a large number of remote cluster nodes mounting the filesystem from the owning cluster (one customer has about 9000 such nodes mounting the FS). Every time an application node sends a request to the gateway node, it must go through the entire list of 9K nodes to find that single node; in similar fashion, the gateway node must verify against the same 9K-node list that it is indeed the serving gateway node for the request. This adds considerable time in the application IO path, both when queuing the request from the application node to the gateway and when acknowledging from the gateway back to the application node to complete the application IO request.
- Symptom: Silent performance degradation for the applications performing IO to the AFM fileset.
- Platforms affected: ALL Linux OS environments (AFM Gateway nodes). All Linux and AIX environments (Application nodes running IO to the AFM fileset).
- Functional Area affected: AFM - NFS and GPFS backend filesets, with afmHashVersion 2 and 5 and the afmFastHashVersion tunable turned on.
- Customer Impact: High Importance IJ17170
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: getxattr for the 'security' namespace is not properly blocked during quiesce, which may cause the assert "SGNotQuiesced"
- Work around: No
- Problem trigger: When the file system is quiesced (for example while running mmcrsnapshot/mmdelsnapshot), all vfs operations should be blocked. If applications access a file's 'security' namespace extended attributes (for example via the 'getcap' command), that getxattr vfs operation is not properly blocked and may cause the assert "SGNotQuiesced"
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: Snapshots
- Customer Impact: High Importance IJ17112
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When an RDMA connection is in a bad state, new NSD requests go to the remaining RDMA connections, but in-flight NSD requests fall back to the TCP socket even though other RDMA connections remain available.
- Work around: No
- Problem trigger: Port or link error on a node which has multiple IB ports
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: Suggested IJ17172
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM prefetch does not work if the files have 64-bit inode numbers assigned to them. When checking the file for the cached bit, a 32-bit inode number is used, and the integer overflow might cause the file's cached state to be returned as true.
- Work around: No
- Problem trigger: AFM prefetch
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17557
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The primary fileset might run out of inode space if a large number of files are created/deleted.
- Work around: No
- Problem trigger: Inode space might be exhausted.
- Symptom: Abend/Crash
- Platforms affected: Linux Only
- Functional Area affected: AFM DR
- Customer Impact: IJ17175
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When UID remapping is enabled, daemon asserts or a kernel crash occurs on the nodes in the client cluster. This happens when the remapping script does not remap any credentials or enableStatUIDremap is not enabled.
- Work around: 1. For the daemon assert, correct the remap scripts to remap the credentials. 2. For the kernel crash, enable the enableStatUIDremap config option.
- Problem trigger: UID remapping with an incorrect mmname2uid script, and file metadata modification when enableStatUIDremap is not enabled.
- Symptom: Abend/crash
- Platforms affected: All
- Functional Area affected: Remote cluster mount/UID remapping
- Customer Impact: High Importance IJ17114
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM prefetch of small files has a performance issue because the file is flushed to disk without closing the open instance. This prevents the file from being shrunk to fit into subblocks, so a full block of data is transferred to the NSD server.
- Work around: No
- Problem trigger: AFM prefetch
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17576
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: In the role reversal feature, a data write operation is performed even if the file was already synced but is migrated at the secondary. If files are migrated, the write operation should be skipped during role reversal and only the attributes should be set.
- Work around: None
- Problem trigger: Migrated files are present during role reversal
- Symptom: A write operation happens on a migrated file.
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM and AFM DR
- Customer Impact: Suggested IJ17570
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: RPC message was reported as lost, like below: Message ID 735239 was lost by node ip_address node_name wasLost 1
- Work around: None
- Problem trigger: An unreliable network, which leads to reconnect happening several times
- Symptom: Node expel/Lost Membership
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ17538
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSSTRUCT error, FSErrValidate, could be generated in the system log after adding a new disk to a file system.
- Work around: None
- Problem trigger: Add a new disk to a file system while running GPFS 5.0.1.0 through 5.0.3.1
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ17554
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: After reboot of a node, the systemhealth NFS monitoring was started, but not the SMB component and its monitoring. AD authentication was configured for NFS, which depends on a running SMB component. This constellation yielded a "winbind-down" event but gave no hint about the root cause.
- Work around: mmshutdown followed by mmstartup might help, since the entire stack (including SMB/NFS and their monitors) is restarted (see the sketch after this entry). The log level could be increased during the startup and check phase (mmces log level 3) to get more details in the mmfs.log file. For production, this log level should be lowered (to 0 or 1).
- Problem trigger: The circumstances which led to the detected mismatch were not repeatable. This seems to be a rare race situation, and was not reported before.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: High Importance IJ17559
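A sketch of the work-around sequence on the affected CES node, using the commands named above:
   mmshutdown
   mmstartup
   # raise the monitor log level during the startup and check phase
   mmces log level 3
   # once verified, lower it again for production
   mmces log level 1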
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: mmexpelnode fails when the network cable of the cluster manager and file system manager is pulled on CCR-enabled clusters with tiebreaker disks configured. GPFS file systems got unmounted on other hosts.
- Work around: None
- Problem trigger: mmexpelnode executed in a CCR-enabled cluster with tiebreaker disks configured.
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: Admin Commands (mmexpelnode)
- Customer Impact: High Importance IJ17580
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When accessing a GPFS zlib-compressed file by mmap (or executing a GPFS zlib-compressed executable file), the kernel may crash with the oops message "unable to handle kernel paging request" in the IoDone routine
- Work around: None
- Problem trigger: Accessing a GPFS zlib-compressed file by mmap (or executing a zlib-compressed executable file)
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: GPFS Native Compression
- Customer Impact: High Importance IJ17593
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Deadlock when AFM filesets are accessed using the remote mounted file system due to a mismatch in the gateway node configuration between the client (remote) and storage (home) clusters. It is unclear how the configuration mismatch happens.
- Work around: None
- Problem trigger: AFM filesets accessed through a remote mounted file system when the gateway node configuration differs between the client and storage clusters.
- Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
- Platforms affected: All OS environments
- Functional Area affected: AFM and AFM DR
- Customer Impact: Critical IJ17581
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A stripe group / file system manager panic occurs while another node (non-SGmgr) is accessing files in a snapshot. These accesses can be part of the snapshot deletion itself, another maintenance command (such as mmdeldisk or mmrestripefs), or even ordinary user accesses from the kernel. The diagnostic error reported in the log on the stripe group (SG) manager node looks like this, though the line number may vary: 2019-05-06_23:23:22.122-0300: [X] File System fs1 unmounted by the system with return code 2 reason code 0, at line 4646 in /afs/apd.pok.ibm.com/u/gpfsbld/buildh/ttn423ptf13/src/avs/fs/mmfs/ts/fs/llio.C The "unmount in llio.C" message is usually followed by a message mentioning "Reason: SGPanic", but this does not always occur, and an SGPanic can be caused by other unrelated problems. The error is triggered by a snapshot listed as DeleteRequired by mmlssnapshot. The snapshot access that causes the error, however, will be to an earlier snapshot (with a smaller snapId), though it may be difficult to determine which access or which node caused the panic. Further, at least one snapshot must be a fileset snapshot (file systems with only global snapshots are not affected). The specific enabling factors, however, are complicated and quite rare for most customers, so this is not a common problem.
- Work around: Remove DeleteRequired snapshots with an mmdelsnapshot command with an explicit -N argument listing only the SG manager node (see the sketch after this entry).
- Problem trigger: The error is triggered by a snapshot listed as DeleteRequired by mmlssnapshot. The snapshot access that causes the error, however, will be to an earlier snapshot (with a smaller snapId). Further, at least one snapshot must be a fileset snapshot (file systems with only global snapshots are not affected). The specific enabling factors are complicated and quite rare for most customers, so this is not a common problem.
- Symptom: Cluster/File System Outage
- Platforms affected: All OS environments
- Functional Area affected: Snapshots
- Customer Impact: Suggested: has little or no impact on customer operation IJ17595
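A hedged sketch of the work-around; fs1, snap1, and sgmgr-node are placeholders:
   # find the SG manager node for the file system
   mmlsmgr fs1
   # list snapshots and their status (look for DeleteRequired)
   mmlssnapshot fs1
   # delete the snapshot, restricting the work to the SG manager node only
   mmdelsnapshot fs1 snap1 -N sgmgr-node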
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: For AFM migration, provide an option to revalidate with home only once after the cutover to the new system, improving performance during fileset access.
- Work around: None
- Problem trigger: AFM migration
- Symptom: Performance Impact/Degradation
- Platforms affected: All OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17582
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: gnrhealthcheck does not catch the case where an ESS system is set up without having verified that both servers see the enclosures/drives.
- Work around: None
- Problem trigger: This problem is caused by an invalid ESS deployment.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: ESS/GNR
- Customer Impact: Suggested IJ17583
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: When running I/O with NFS, unexpected failovers occurred without an obvious reason. NFS is reported as 'not active', even though it is still working.
- Work around: No workaround available. There is a manual way to temporarily modify the event declaration for the observed "nfs_not_active" event by modifying the event action in the event configuration file (ask L2 for support).
- Problem trigger: In the reported cases, some high I/O load led to the situation that NFS v3 and/or v4 (whatever is configured) NULL requests failed, and a following internal statistics check reported no activity regarding the number of internal NFS operations. The monitor interpreted this as a "hung" state and triggered a failover. In fact, the system might still be functional, and the internally detected "unresponsive" state might be only temporary, so a failover is not advised in this case. However, at the time of monitoring there was no further indication available.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: Systemhealth
- Customer Impact: High Importance IJ17598
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: QOS may deadlock on the file system manager node, particularly if there are many (hundreds of) nodes mounting the file system and the manager node is heavily CPU or network loaded.
- Work around: 1) mmchqos FS stat-slot-time 15000 stat-poll-interval 60, or if that is not sufficient, 2) disable QOS until the fix is available (see the sketch after this entry).
- Problem trigger: See problem description.
- Symptom: Hang or Deadlock
- Platforms affected: ALL
- Functional Area affected: QOS
- Customer Impact: High Importance, especially for customers using QOS with hundreds of nodes mounting the file system. IJ17584
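The two work-around steps as they might be typed, assuming fs1 and a long-option spelling of the parameters (the entry gives them without dashes, so the exact form is an assumption):
   # step 1: lengthen the QOS statistics slot time and polling interval
   mmchqos fs1 --stat-slot-time 15000 --stat-poll-interval 60
   # step 2, only if step 1 is not sufficient: disable QOS until the fix is installed
   mmchqos fs1 --disable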
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A filesystem containing a dot in the name was declared to be ignored by declaring a file /var/mmfs/etc/ignoreAnyMount.<filesystem name>. However, the systemhealth monitor treated it as a missing filesystem.
- Work around: No work around available. Filesystems could be named with an underscore instead of a dot, if a separator is wanted.
- Problem trigger: A filename /var/mmfs/etc/ignoreAnyMount.<filesystem name containing a dot> is split internally at the dots, so it results in three items, which is not wanted.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: Systemhealth
- Customer Impact: little impact on customer operation IJ17600
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The customer cannot create an SMB export under specific conditions.
- Work around: Choose GPFS file system names such that no file system name is a substring of another
- Problem trigger: One GPFS file system name is a substring of another file system name.
- Symptom: The customer is limited to a special naming setup for the GPFS file systems
- Platforms affected: ALL Linux OS environments
- Functional Area affected: SMB
- Customer Impact: Suggested IJ17585
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If a relative pathname is provided in an export definition, the mmnfs command allows it, which causes the Ganesha NFS server to fail.
- Work around: None
- Problem trigger: Relative pathname to --pseudo option of the mmnfs command.
- Symptom: Unexpected results.
- Platforms affected: Linux
- Functional Area affected: Protocols
- Customer Impact: Suggested IJ17607
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM is unable to prefetch the data if the file metadata has changed. For example, if the user changes the metadata (e.g. chmod) on an uncached file, prefetch skips reading the file.
- Work around: Read the file manually without the prefetch (see the example after this entry)
- Problem trigger: AFM prefetch
- Symptom: Unexpected Results/Behavior
- Platforms affected: All OS environments
- Functional Area affected: AFM
- Customer Impact: High Importance IJ17601
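A minimal example of the manual-read work-around; the path is a placeholder for an uncached file whose metadata was changed:
   # reading the file through the cache fileset fetches its data from home without prefetch
   cat /gpfs/fs1/afmfileset/file1 > /dev/null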
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: "mmhealth node show" might show degraded status for CLOUDGATEWAY even though "mmcloudgateway service status -N tctServers" shows all OK
- Work around: None
- Problem trigger: If Cloudgateway was in a degraded state and changed to the "only_ensures_cloud_container_exists" status, it did not trigger mmhealth to return to a "healthy" state.
- Symptom: Unexpected Results/Behavior
- Platforms affected: Linux
- Functional Area affected: System Health TCT
- Customer Impact: High Importance: an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ17665
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- This update addresses the following APARs: IJ16674 IJ16676 IJ16678 IJ16707 IJ16712 IJ16716 IJ16782 IJ16783 IJ17112 IJ17114 IJ17133 IJ17136 IJ17147 IJ17170 IJ17172 IJ17175 IJ17471 IJ17524 IJ17534 IJ17537 IJ17538 IJ17554 IJ17557 IJ17559 IJ17569 IJ17570 IJ17576 IJ17580 IJ17581 IJ17582 IJ17583 IJ17584 IJ17585 IJ17593 IJ17595 IJ17598 IJ17600 IJ17601 IJ17607 IJ17665.
Problems fixed in Spectrum Scale 5.0.3.1 [May 31, 2019]
- Problem description: When creating a DMAPI session there is a small window where memory gets corrupted, causing a GPFS daemon crash with sig 11.
- Work Around: None
- Problem trigger: Creating lots of DMAPI sessions with heavy workload
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ15859
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: RPCs sent via RDMA remain pending forever in the 'sending' state. Long waiters with Verbs RDMA appear, like: Waiting 2273.0813 sec since 11:05:04, monitored, thread 113229 BackgroundSyncThread: for RDMA send completion fast on node 192.168.1.1
- Work Around: None
- Problem trigger: Reply lost on RDMA network
- Symptom: Hang
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: High IJ15892
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If GPFS is shut down on a node, it is possible that CES IPs are assigned to this node two minutes after shutdown. These CES IPs are not usable for the customer.
- Work Around: Suspend the node before GPFS shutdown (see the sketch after this entry).
- Problem trigger: The node still has a valid GPFS lease two minutes after shutdown.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL
- Functional Area affected: CES
- Customer Impact: High IJ15912
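A sketch of the work-around on a CES node; node arguments are omitted, so the commands act on the local node:
   # suspend the node so its CES IPs are reassigned cleanly before shutdown
   mmces node suspend
   mmshutdown
   # later, after bringing GPFS back up, resume CES duty
   mmstartup
   mmces node resume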
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A race condition may cause an mmperfmon sensor update to fail with the following message: fput failed: Invalid version on put (err 807) Other commands fail with the above message as well.
- Work Around: Rerun the failed command.
- Problem trigger: The problem is hit more often when using the spectrumscale installation toolkit command to install.
- Symptom: Error output/message "fput failed: Invalid version on put (err 807)" Upgrade/Install failure
- Platforms affected: ALL Operating System environments, but more often on Linux nodes in a CCR environment.
- Functional Area affected: Admin Commands
- Customer Impact: Suggested IJ16079
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: "mmuserauth service create" command failed due to TCP port 445 being blocked. However, error message indicated incorrect credentials which was not the correct reason for failure.
- Work Around: None
- Problem trigger: The issue is seen at the time of configuring authentication, in setups where TCP port 445 is blocked. The command internally tries to connect to the specified DC via the port. Due to the blocked port, it fails to connect with a timeout. However, the error message currently shown indicates incorrect credentials, which is not the case.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: Authentication
- Customer Impact: Suggested IJ16084
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSErrInodeCorrupted FSSTRUCT error could be written to the system log as a result of a stale buffer for a directory block.
- Work Around: None
- Problem trigger: A change in the token manager list as a result of either a node failure or a change in the number of manager nodes.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: All
- Customer Impact: Suggested IJ16085
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The output of mmlscluster --ces shows multiple entries for the same IP address. The cesiplist file (stored in CCR) did contain these multiple entries, so mmlscluster just displayed them. This was obviously a misconfiguration.
- Work Around: A reassignment of IPs (moves, failover, suspend/resume) triggers a rewrite of the cesiplist file, which cleans up these inconsistencies. The affected node must be involved in the IP movement.
- Problem trigger: The circumstances which may lead to multiple entries of the same IP for a node are not known. This seems to happen occasionally, but very rarely.
- Symptom: Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: has little or no impact on customer operation IJ16091
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Unexpected wndb down after SMB startup without a known reason at log level 0.
- Work Around: Start wndb manually.
- Problem trigger: Unknown
- Symptom: Unexpected Results/Behavior
- Platforms affected: All
- Functional Area affected: CES
- Customer Impact: Medium Importance IJ16093
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Trying to delete an immutable file through SMB fails after the retention period expires. The problem is that Samba as SMB server denies deletion when the READONLY flag is set.
- Work Around: None
- Problem trigger: A Windows SMB client is trying to delete an immutable file after the retention period expires.
- Symptom: Error output/message
- Platforms affected: Windows Only
- Functional Area affected: SMB/Immutability
- Customer Impact: High Importance IJ16094
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If too many pdisks are unreadable (not missing), so that a vtrack cannot be written, it is possible that stale strip information is committed to the metadata log. When the scrubber tries to scrub the vtrack, it will examine this stale strip data and declare data loss.
- Work Around: None
- Problem trigger: Unavailability of pdisks needed to write a vtrack.
- Symptom: IO error.
- Platforms affected: ALL Linux OS environments
- Functional Area affected: ESS/GNR
- Customer Impact: Critical IJ16095
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: An FSErrCheckHeaderFailed error could be incorrectly issued and logged in the system log.
- Work Around: None
- Problem trigger: A user application moves files out of a directory before deleting the directory.
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: Suggested IJ15910
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The GPFS daemon will sig11 or log assert with "offset < ddbP->mappedLen" when a user application, log recovery, or the tsdbfs or mmfsck command accesses a corrupted directory (a directory whose file size is smaller than 32 bytes, the size of the directory block header structure).
- Work Around: None
- Problem trigger: This kind of corrupted directory could be caused by a previous code bug.
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15909
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A few operations on an IW fileset take longer than expected because an unintended dependency is created on previous operations performed on the fileset, which the currently performed operation then attempts to replicate to the remote/home side.
- Work Around: None
- Problem trigger: Users running 5.0.3 with workloads on AFM IW mode filesets may see a few elongated operations (performance impact) on the filesets, owing to a few dependent operations performed on the same file/fileset earlier which are waiting to be asynchronously pushed to the home/remote site.
- Symptom: A few operations on the IW fileset might take longer than expected, since they carry other asynchronous operations as dependents to the remote site. A few waiters might be seen to linger for a few extra seconds; once the dependencies are resolved, the waiters should vanish.
- Platforms affected: ALL Operating System environments (AFM application and Gateway nodes).
- Functional Area affected: AFM - and Specifically users on AFM IW mode filesets only.
- Customer Impact: High Importance. IJ16110
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Enable AFM prefetch for a single fileset to run from multiple gateway nodes to improve migration performance
- Work Around: None
- Problem trigger: AFM prefetch, slow performance
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: Suggested IJ16112
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: GPFS daemon crash when an application writes data into the file system
- Work Around: None
- Problem trigger: A memory failure of newBuffer in a busy system.
- Symptom: Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15993
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Enable AFM prefetch for a single fileset to run from multiple gateway nodes to improve migration performance. This enhancement also handles the scenario where the same file is being read from multiple gateway nodes.
- Work Around: None
- Problem trigger: AFM prefetch, slow performance
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM
- Customer Impact: Suggested IJ16113
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: On a system without the ifup/ifdown commands installed, nearly any call to an mm-command shows messages like: which: no ifup in (/bin:/usr/bin:/sbin:/usr/sbin:/usr/lpp/mmfs/bin) which: no ifdown in (/bin:/usr/bin:/sbin:/usr/sbin:/usr/lpp/mmfs/bin) and terminates the called mm-program
- Work Around: Not available. An install of ifup/ifdown would resolve the issue, but might lead to other issues
- Problem trigger: Any mm-command may run into this issue if the ifup/ifdown commands are not installed on the system
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments
- Functional Area affected: CES
- Customer Impact: High Importance IJ16114
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfsadm dump command could run into an infinite loop when dumping the token objects.
- Work Around: Avoid running the mmfsadm dump command.
- Problem trigger: Running the mmfsadm dump command while workloads are running in the cluster.
- Symptom: The mmfsadm dump command hangs.
- Platforms affected: ALL Operating System environments except Windows
- Functional Area affected: mmfsadm dump command
- Customer Impact: Suggested IJ15996
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: If the file system was formatted with narrow disk addresses (version 2.2 or older) and the GPFS version is 4.2.3 or 5.0.x, GPFS daemon asserts would happen randomly.
- Work Around: None
- Problem trigger: Application I/O into a narrow disk address file system using GPFS versions 4.2.3 or 5.0.x.
- Symptom: Crash, like assert subblocksPerFileBlock==(1<<(tinodeP->getFblockSize()))
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ16116
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmrepquota -q and -t option command usage is ambiguous. Options -q and -t should not be used when combined with Device:Fileset because they are file system attributes.
- Work Around: None
- Problem trigger: The current mmrepquota command usage allows invoking -q option as follows: mmrepquota -q Device:fileset
- Symptom: mmrepquota -q Device:fileset gives file system default quota information and not perfileset-quota.
- Platforms affected: All
- Functional Area affected: Quotas
- Customer Impact: Suggested IJ15914
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: For file systems created with large NumNodes and large NumInodesToPreallocate arguments, the inode allocation map ends up with a large value for nRegions and nBitsPerSubsegment. For subsequent independent filesets created with orders of magnitude less NumInodesToPreallocate, this can leave most of the inode map segments as unusable/surplus. During inode lookup as part of inode allocation, these surplus segments may be read from disk many times causing performance degradation.
- Work Around: Increase the allocated inodes in the problem fileset (see the sketch after this entry).
- Problem trigger: File systems created with large NumNodes and large NumInodesToPreallocate arguments. Then independent filesets are created with orders of magnitude less NumInodesToPreallocate.
- Symptom: Performance Impact/Degradation
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15991
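A hedged sketch of the work-around; fs1, fset1, and the inode counts are placeholders to be sized for the actual fileset:
   # raise the maximum and preallocated inode counts of the affected independent fileset
   mmchfileset fs1 fset1 --inode-limit 1000000:1000000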
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A fileset might get stuck and prevent filesystem quiesce when an AFM DR fileset finds that an inode does not have remote attributes and tries to build them using the tsfindinode command after blocking the filesystem quiesce. Remote attributes are used to find the remote file by its file handle for replication.
- Work Around: None
- Problem trigger: AFM DR with renames to the deleted directories
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments
- Functional Area affected: AFM DR
- Customer Impact: Critical IJ16024
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: FSErrInodeCorrupted FSSTRUCT error could be issued incorrectly during lookup when both directory and its parent directory are being deleted.
- Work Around: None
- Problem trigger: Perform lookup on '..' entry of a directory that is being deleted.
- Symptom: Error output/message
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: Suggested IJ15916
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: During file manager takeover, the new manager broadcasts to all mount nodes to invalidate their cached low level file metadata. If a low level file is being opened on a mount node at the same time, the two can race and cause logAssertFailed "ibdP->llfileP == this" or logAssertFailed "inode.indirectionLevel >= 1"
- Work Around: One of our customers reported hitting this problem while running mmdelsnapshot. For the mmdelsnapshot scenario, deleting the oldest snapshot first will greatly reduce the risk.
- Problem trigger: The race exists between file manager takeover and low level file opening (the latter can happen for many reasons, including but not limited to mmdelsnapshot)
- Symptom: Abend/Crash
- Platforms affected: All
- Functional Area affected: All
- Customer Impact: High Importance IJ15961
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: GPFS admin commands may cause high CPU usage. This is because remote GPFS command calls invoke the find command to clean up tmp files, which is expensive on systems with a large number of subdirectories and files under /var/mmfs/tmp.
- Work Around: Manually clean up to reduce the number of subdirectories and files under /var/mmfs/tmp (see the sketch after this entry). Kill running find processes invoked from /usr/lpp/mmfs/mmremote processes.
- Problem trigger: Nodes with a large number of subdirectories and files under /var/mmfs/tmp are most likely affected.
- Symptom: Performance Impact/Degradation, hang
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: High Importance an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ15858
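A hedged sketch of the manual cleanup; the 7-day age threshold is an assumption, not a documented value:
   # remove aged temporary files under /var/mmfs/tmp
   find /var/mmfs/tmp -type f -mtime +7 -delete
   # kill lingering find processes spawned by mmremote
   pkill -f 'find /var/mmfs/tmp'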
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The session info length is not checked when creating a DMAPI session; it is supposed to be less than or equal to 256 bytes. Per the DMAPI standard, the E2BIG errno should be returned; instead, GPFS truncates the length to 256 bytes and proceeds with the session creation.
- Work Around: None
- Problem trigger: Creating DMAPI session with very long session info string
- Symptom: None
- Platforms affected: All
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ16117
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The arping command is used by the NFS failover mechanism but was not found on the system. It was installed, but the log files show a No such file or directory message, which indicates that the arping command was not found in the expected path.
- Work Around: It would probably help to set a symbolic link from the arping command to "/usr/bin/arping", which is the default if the distro could not be properly detected. Note that using links is generally not advised, since they could be a security issue.
- Problem trigger: The circumstances which lead to the issue are not fully understood. Most likely the OS detection using the /etc/redhat-release file did not work, so the wrong distro was assumed, which led to a wrong expected path name for the arping command location, and so it was not found. This older CentOS version does not yet have the /etc/os-release file provided by newer distros, which is meanwhile used too.
- Symptom: Error output/message
- Platforms affected: All CentOS environments (CES nodes)
- Functional Area affected: CES
- Customer Impact: has little or no impact on customer operation IJ15998
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Deadlock during the AFM fileset recovery due to lock ordering issue when rename operations are being executed
- Work Around: None
- Problem trigger: AFM fileset recovery with renames to newly created directories.
- Symptom: Long Waiters/Deadlock
- Platforms affected: All Linux OS
- Functional Area affected: AFM and AFM DR
- Customer Impact: Critical IJ15963
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The gpfs systemd service (gpfs.service) may report failure after shutdown
- Work Around: The failed systemd status is not an error condition of GPFS shutdown and can be ignored.
- Problem trigger: When shutting down GPFS, if the main systemd process (runmmfs) does not exit quickly, a kill signal is sent to the main process either by the shutdown subroutine or by systemd manager itself.
- Symptom: Error output/message Unexpected Results/Behavior
- Platforms affected: ALL Linux OS environments with systemd version >= 219
- Functional Area affected: Admin Commands/systemd
- Customer Impact: has little or no impact on customer operation IJ15962
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Using the mmchcluster command to enable CCR may fail. While the mmchcluster command is working to enable CCR, any other mm command can remove the authorized_ccr_keys file, which is needed in the final step of enabling CCR. This problem occurs more often when the first quorum node in the list is on a GPFS-supported systemd system. If the mmchcluster command is running on a quorum node, the command considers that node to be the first quorum node in the list.
- Work Around: Run mmchcluster on a quorum node that does not support GPFS systemd, or temporarily disable system health: chmod 000 /usr/lpp/mmfs/bin/mmsysmon* (see the sketch after this entry)
- Problem trigger: While the mmchcluster command is working to enable CCR, any other mm command can remove the authorized_ccr_keys file, which is needed in the final step of enabling CCR.
- Symptom: Error output/message
- Platforms affected: ALL Linux OS environments with systemd version >= 219
- Functional Area affected: CCR Admin Commands
- Customer Impact: High Importance to customers that want to enable CCR. IJ15915
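A sketch of the second work-around; mode 555 for the restore step is an assumption about the original permissions:
   # temporarily disable system health monitoring
   chmod 000 /usr/lpp/mmfs/bin/mmsysmon*
   # enable CCR while the monitor cannot interfere
   mmchcluster --ccr-enable
   # restore the monitor binaries afterwards
   chmod 555 /usr/lpp/mmfs/bin/mmsysmon*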
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Some GPFS commands don't work correctly if the cluster name contains special characters.
- Work Around: Change the name of the cluster so that it does not contain any special characters (see the sketch after this entry).
- Problem trigger: A cluster name with a special character like the ampersand "&" causes commands like mmauth show . to fail
- Symptom: GPFS admin commands error. Error output/message Unexpected Results/Behavior
- Platforms affected: all
- Functional Area affected: admin commands
- Customer Impact: Low Importance IJ15908
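A minimal sketch of the work-around; newcluster.example.com is a placeholder name containing no special characters:
   mmchcluster -C newcluster.example.com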
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: AFM does not keep directory mtime in sync while reading the directory contents from the home. This may be a problem for some users during the migration
- Work Around: None
- Problem trigger: AFM migration/prefetch or cache readdir/lookup
- Symptom: Unexpected results
- Platforms affected: All Linux OS
- Functional Area affected: AFM
- Customer Impact: Critical IJ15990
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The NFS/Ganesha service did not process I/O, and the systemhealth monitor showed that the NFS NULL checks for protocol versions 3 and 4 failed. The Ganesha process was shown in the process list, and logging and replies to requests via Dbus worked. There was no failover.
- Work Around: Manually restart NFS/Ganesha: mmces service stop nfs (or kill the gpfs.ganesha process), then mmces service start nfs (see the example after this entry)
- Problem trigger: The reason why NFS/Ganesha hung was not evaluated. The main issue was that the Ganesha process was not entirely "dead": the process was running, it replied to remote requests via Dbus, and it wrote log entries. It was "dead" regarding I/O handling, but the systemhealth monitor did not notice this properly.
- Symptom: Performance Impact/Degradation
- Platforms affected: ALL Linux OS environments (CES Nodes running NFS)
- Functional Area affected: CES
- Customer Impact: High Importance IJ16036
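The work-around as it might be typed on the affected CES node:
   mmces service stop nfs
   mmces service start nfs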
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Assert exp(totalLen <= extensionLen) in line 16424 of file /project/sprelttn423/build/rttn423s008a/src/avs/fs/mmfs/ts/nsd/nsdServer.C
- Work Around: None
- Problem trigger: This issue affects customers running IBM Spectrum Scale 4.2.3 and later if the following conditions are true: 1) a mixed-endianness cluster, or mixed-endianness remote clusters; 2) RDMA enabled (and an NSD client may send NSD requests to an NSD server which has a different endianness); 3) the NSD client or NSD server is IBM Spectrum Scale 4.2.3. It is a rare-case assert which may happen when the client sends the first NSD request to an NSD server which has a different endianness.
- Symptom: Abend/Crash
- Platforms affected: ALL Linux OS environments
- Functional Area affected: RDMA
- Customer Impact: Suggested IJ16020
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: Err 22 is not returned when the dm_getall_disp DMAPI call is made with a bad sessionId
- Work Around: None
- Problem trigger: dm_getall_disp is called with a bad sessionId
- Symptom: Error output/message
- Platforms affected: ALL
- Functional Area affected: DMAPI
- Customer Impact: Suggested IJ16064
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: The mmfsck man page provides an instruction to clear the fsstruct error from the mmhealth command: "mmsysmonc event filesystem fsstruct_fixed". But this is not correct. As a result, the documented command will fail with a syntax error.
- Work Around: None
- Problem trigger: Executing command as instructed in man page
- Symptom: Error output/message due to documentation problem
- Platforms affected: ALL Operating System environments
- Functional Area affected: System Health
- Customer Impact: High Importance IJ16329
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- Problem description: A recent performance change in GPFS 5.0.3 makes GPFS commands more sensitive to network congestion. This causes commands like mmgetstate -a to report unknown status, or other GPFS commands to report nodes as unreachable.
- Work Around: Commands like mmgetstate -a can be issued again to get the status.
- Problem trigger: This affects only nodes running GPFS 5.0.3. It affects all GPFS admin commands that need to execute commands remotely.
- Symptom: Error messages like: "The following nodes could not be reached:" mmgetstate -N or -a reports "unknown" state.
- Platforms affected: All
- Functional Area affected: Admin Commands
- Customer Impact: High Importance: an issue which will cause a degradation of the system in some manner, or loss of a less central capability IJ16395
- IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
- This update addresses the following APARs: IJ15858 IJ15859 IJ15892 IJ15908 IJ15909 IJ15910 IJ15912 IJ15914 IJ15915 IJ15916 IJ15961 IJ15962 IJ15963 IJ15990 IJ15991 IJ15993 IJ15996 IJ15998 IJ16020 IJ16024 IJ16036 IJ16064 IJ16079 IJ16084 IJ16085 IJ16091 IJ16093 IJ16094 IJ16095 IJ16110 IJ16112 IJ16113 IJ16114 IJ16116 IJ16117 IJ16329 IJ16395.
Problems fixed in Spectrum Scale 5.0.3.2 for Protocols include the following:
- smb: Return share name in correct case from net rpc conf showshare
- smb: Add gpfs.smb 4.9.8_gpfs_21-1
Problems fixed in Spectrum Scale 5.0.3.1 for Protocols include the following:
- gui: AD names should allow dots
- gui: Better handling on warning message for remote mounted file systems
- gui: Filesets - Corrected the "Type" and "AFM Role" displayed in the export
- gui: Updates required to accurately show GNR User Condition definitions
- gui: The CAPACITY_LICENSE task fails when there are no NSDs
- gui: Edit quota dialog not displayed for user,group,fileset quotas
- gui: No longer display a warning or error icon on SSD endurance percentage
- gui: Hourly call to mmaudit list should not occur
- toolkit: Fixed WCE parsing for some SAS cards
- smb: Version 4.9.7_gpfs_20-1
- smb: Change the memory check to cover the total of main memory and swap space
- smb: Stabilize gencache after gencache flush
- smb: Fill gencache with domain info returned from domain controller
- smb: Enable logging for early startup failures
- smb: Properly track the size of talloc objects
- smb: Remove implementations of SaveKey/RestoreKey
- smb: Pass back what we have in _wbint_Sids2UnixIDs().
- callhome: Updated to 5.0.3-1 nomenclature
- kafka: Updated to 5.0.3-1 nomenclature
Problems fixed in Spectrum Scale Protocols Packages 5.0.3-0 [Apr 19, 2019]
- Please see the "What's New" page in the IBM Knowledge Center