Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 4.2.x applies to all supported platforms.
Problems fixed in IBM Spectrum Scale 4.2.0.2 [March 7, 2016]
- Fix an assert caused by an NSD being deleted and then quickly recreated.
- Fix a problem in which a cluster cannot start when a node hosting LROC disks is not available.
- Fix asserts on didEmpty and Signal 11 faults in delSnapshotEmpty that can occur during snapshot deletion.
- Fix an AFM: Write file system remote error 9 that can occur when a write is in progress on a file being deleted.
- Fix an mmrestorefs assert that can occur during the delete clone file phase. The clone was left in a bad state during a force unlink of a fileset.
- Fix ENOENT failures that can occur during a snapshot restore and during iopen64 API calls.
- Fix a logAssertFailed: cfP->inodeHdr.nlink > 0 that can occur when mmdelfileset is run while unlinking a file in that fileset.
- Fix an incorrect "no space" error that can occur when mmrestripefs and mmdeldisk are run at the same time on a file system that is 3.5 or older.
- Fix assert exp(synched.isNULL()) that can occur during a high workload on an LROC disk.
- Fix a wrong vdisk name that shows up in DA rebuild failure messages.
- Fix a problem in which the primary recovery group server is unable to take control of the recovery group when it comes back up.
- Fix a logAssertFailed: (((LkObj::wa) & ~(opP->lockmode) & 0x3F) == 0), which could occur when a file is truncated by a user while an mmrestripefs command is running and working on the same file to fix its compression (the -z option, which fixes illcompressed files).
- Fix an MD5sum mismatch in data after a resync operation, which can occur if a resync, a touch, and a write all happen at the same time.
- Fix AFM readdir performance over an NFS backend.
- Fix a problem where gpfs_getacl returns a bad ACL entry when called with the GPFS_GETACL_STRUCT flag and acl_level GPFS_ACL_LEVEL_V4FLAGS.
- Fix a GPFS daemon abort that can occur when a GNR backup is performed and the server is down.
- Fix a command failure in a CCR cluster on nodes that have a non-GPFS gskit installed.
- To prevent confusion in messages between GNR, GSS, ESS products, and the GPFS file system metadata, the word "metadata" was removed from all GNR errors and log messages.
- Fix a deadlock that can occur when the NFS server remote mount is not responding.
- Allow snapshots to be created while snapshots are being deleted.
- Fix a daemon assert that can occur on an IW fileset during a prefetch that happens while commands like stat and readdir are being run.
- Fix a signal 11 that can occur during failure recovery of a node while running a command that gets node information.
- Fix an unexpected CES IP assignment and movement to CES nodes that are not ready to host CES IPs when the address distribution policy node-affinity is selected.
- Fix AFM sync messages getting stuck over an internal NFS mount if the server is not responding. This can cause a deadlock.
- Add a new optional parameter "--allowed-nfs-clients" to the mmcesdr CLI command.
- Fix a problem in which an SW fileset is placed in the NeedsResync state. This can happen if droppending and resync commands are run while a file is being created and written to, and then, while the resync is in progress, a hard link is created and both files are deleted.
- Fix an mmapplypolicy command failure that can occur when multiple commands are issued nearly simultaneously and tscCmdPortRange has been configured in a SONAS environment.
- Fix an E_NOATTR prefetch error that can occur after a fileset is converted from SW to RO.
- Fix a problem where GPFS did not allow a non-root user who is the owner of an immutable or appendOnly file to advance the retention time.
- Fix a problem where GPFS did not flush the data when setting a file to immutable or appendOnly. The data is now flushed to avoid data loss if the node crashes immediately afterward.
- Fix clone corruption that can occur during resync/failover.
- Fix an E_ROFS write error that can occur when you write over a clone file and make it a clone parent and then run recovery.
- Add support for different --list-file formats for eviction, prefetch, and flushPending.
- Fix a problem where GPFS does not remove the write permission bit of a file under IAM modes that is set to immutable via "mmchattr -i yes".
- Fix a problem where GPFS resets the retention time after changing the file under IAM modes to immutable.
- Fix a problem which stops autorecovery from being triggered if a node which has only dataAndMetadata disks is down.
- Fix unexpected CES IP assignments and movements to CES nodes which are not ready to host CES IPs (e.g. suspended or not healthy) when the address distribution policy node-affinity is selected.
- This update addresses the following APARs: IV78972 IV81069 IV81070 IV81339 IV81340 IV81341 IV81343 IV81346 IV81657 IV81662 IV81667 IV81671 IV81676 IV81869 IV81871 IV81874 IV81875 IV81876 IV81880 IV81917.
Problems fixed in IBM Spectrum Scale 4.2.0.1 [January 15, 2016]
- Fix an issue that could cause the GPFS daemon to terminate abnormally or could cause incorrect performance data to be reported when the GPFS SNMP subagent, mmpmon, or Zimon is being used.
- Fix GNR AU log long waiters seen during SSD replacement.
- Fix a snapshot restore issue in which some files in a live file system are not restored.
- Fix a problem where mmchfs -z, -Q or --perfileset-quota prematurely releases an sdr lock, which can cause the command to fail.
- Fix signal 11 in verbsDisconnect_i when "large" fabnum value is used.
- Fix a problem with GPFS logging code that could cause the GPFS daemon to die with signal 11. This problem can only occur on nodes with LROC enabled.
- On a GSS / ESS / GNR system that uses NVRAM for the log tip, short outages of one of the nodes can cause inappropriately strongly worded error messages in the log, which state "[E] Insufficient spare space; unable to complete rebalance of DA ...". Those messages have been changed to be more sensible.
- Fix a problem that could cause an FSSTRUCT error to be incorrectly logged when reading from a disk. This could only occur when LROC is enabled.
- Fix a signal 11 daemon crash that can occur while running the mmchcarrier or mmchpdisk commands while a disk enclosure is in a failed state. This fix is recommended for all GNR (ESS/GSS) customers.
- Fix a problem, which can occur during a network failure, in which all I/O stops and all nodes go into the arbitrating state.
- Fix logAssertFailed: !(_ownedByCaller((lockWordCopy), (lockWord_t *)&(lockWordCopy))) that can occur during high-stress workloads.
- Fix failures that can occur when trying to resume pdisks, including mmchcarrier command failures. These errors occur on a GSS/ESS system.
- Fix AioWorkerThread so that it cannot steal a dirty buffer, which could cause a deadlock.
- Fix a "Constraint error" that can occur during the mmdiscovercomp command when trying to add servers to the component database. This fix applies to GSS/ESS customers.
- Improve GNR write performance by using more threads to flush internal GNR metadata.
- Fix code to avoid quorum loss declaration of the current cluster manager when the network is broken between two nodes.
- Fix a deadlock that can occur when queue memory crosses the AFM hard memory limit.
- Fix an mmfsd daemon crash that is possible when Zimon is used to monitor the node and a file system is force unmounted due to some unrecoverable file system error.
- This fix removes the restriction that daemon interface changes are not allowed on CCR-enabled clusters. You can now make daemon interface changes but only from non-quorum nodes.
- Fix logAssertFailed: (_ownedByCaller((lockWordCopy), (lockWord_t *)&(lockWordCopy))) that can occur when a fileset goes to disconnected mode.
- Fix I/O errors on GPFS file systems on systems built on GNR/GSS/ESS servers. This applies to I/O errors that are reported all the way to the end-user application (not internal disk I/O errors on individual physical disks) and that happen at a time when some pdisks are unreachable (for example, due to cabling or connectivity issues) but would have been reachable from the backup node of the GNR server. This fix prevents those I/O errors by failing the recovery group that contains the affected vdisk over to the backup node.
- This fix restricts the mmchcluster command from disabling CCR in a cluster that has a CES node.
- Fix a problem in which orphans (files with inode allocated but not initialized) that have been moved to .ptrash cannot be deleted.
- Fix a GNR server node crash that can occur when a network fails to connect a GNR server pair.
- Fix a GPFS daemon assert that can occur during restripe file operations. If a storage pool gets deleted by mmdeldisk -p or mmdeldisk -c, the GPFS daemon assert could occur during either an mmrestripefile command or an mmchattr -I yes command.
- Fix an assert that can occur when adding pdisks with --replace option.
- Fix a logAssert that can occur during a snapshot restore process that is scanning a sparse file whose size is close to the GPFS maximum file size limitation.
- Fix the path to the Linux modprobe command that the mmchfirmware command uses when --type adapter is specified. This fix applies to GSS/ESS customers.
- Starting with 4.1.1, GPFS changed the contents of the Linux NFS filehandle. This means that if the AFM home is upgraded to 4.1.1 or later, existing AFM filesets detect a change in the export because the filehandle changed, and suspend future synchronization with home. Similarly, a change from knfsd to Ganesha at home also causes a filehandle change even though the export is the same. The only solution is to resync the cache using failover, which is expensive. This fix handles upgrades if home is running GPFS by detecting and upgrading cached filehandles when the filehandle changes for an inode.
- Fix an mmbackup failure that can occur when the command line argument list is too long.
- Fix a problem in which a fileset can become stuck in an unmounted state. This can occur if a remote fileset becomes stale and then comes back, and the application and gateway nodes are the same.
- Fix a node crash that can occur during a rolling upgrade.
- On GNR, ESS, and GSS systems, error messages are printed when an I/O to a physical disk does not succeed. These messages were printed even when the I/O operation was not even attempted. In those cases, the I/O error messages are now suppressed.
- Fix an mmdiscovercomp command failure that can occur when adding storage servers in GSS/ESS.
- Fix a problem in which mmaddnode fails to copy the committed key file to the new node. This occurs only on a CCR-disabled cluster and only if there are two key files.
- Fix a problem in which a hard memory limit is not honored when a fileset is in disconnected mode.
- This update addresses the following APARs: IV79381 IV79745 IV79747 IV79749 IV79750 IV79752 IV79753 IV79754 IV79757 IV79759 IV79762 IV79763 IV79764 IV79766 IV79768.
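The 4.2.0.1 entry above about failing a recovery group over to the backup node describes a condition: some pdisks are unreachable from the serving node but would be reachable from the backup node. As an illustration only, that condition can be sketched in Python. The function name, its arguments, and the pdisk names below are hypothetical and are not part of GNR or any GPFS interface; the sketch models the decision described in the entry, not the actual implementation.

```python
def should_fail_over(unreachable_on_primary, reachable_on_backup):
    """Hypothetical model of the failover condition described in the entry.

    Returns True only when at least one pdisk is unreachable from the
    primary server and every such pdisk is reachable from the backup.
    """
    unreachable = set(unreachable_on_primary)
    if not unreachable:
        # Nothing is unreachable, so there is no reason to fail over.
        return False
    # Failing over helps only if the backup can reach all affected pdisks.
    return unreachable <= set(reachable_on_backup)


# Example with made-up pdisk names: the backup can reach the pdisk that
# the primary cannot, so failing over would avoid the I/O error.
print(should_fail_over({"e1d1s01"}, {"e1d1s01", "e1d1s02"}))
```

If the backup node cannot reach one of the affected pdisks either, this model returns False, since a failover would not make the data reachable.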
Copyright and trademark information
http://www.ibm.com/legal/copytrade.shtml
Notices
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. Some jurisdictions do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this
statement may not apply to you.
This information could include technical inaccuracies or
typographical errors. Changes are periodically made to the
information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or
changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Microsoft, Windows, and Windows Server are trademarks of Microsoft
Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino,
Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and
Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Other company, product, or service names may be trademarks or
service marks of others.
THIRD-PARTY LICENSE TERMS AND CONDITIONS, NOTICES AND INFORMATION
The license agreement for this product refers you to this file for
details concerning terms and conditions applicable to third party
software code included in this product, and for certain notices
and other information IBM must provide to you under its license
to certain software code. The relevant terms and conditions,
notices and other information are provided or referenced below.
Please note that any non-English version of the licenses below is
unofficial and is provided to you for your convenience only. The
English version of the licenses below, provided as part of the
English version of this file, is the official version.
Notwithstanding the terms and conditions of any other agreement
you may have with IBM or any of its related or affiliated entities
(collectively "IBM"), the third party software code identified
below are "Excluded Components" and are subject to the following
terms and conditions:
- the Excluded Components are provided on an "AS IS" basis
- IBM DISCLAIMS ANY AND ALL EXPRESS AND IMPLIED WARRANTIES AND CONDITIONS WITH RESPECT TO THE EXCLUDED COMPONENTS, INCLUDING, BUT NOT LIMITED TO, THE WARRANTY OF NON-INFRINGEMENT OR INTERFERENCE AND THE IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- IBM will not be liable to you or indemnify you for any claims related to the Excluded Components
- IBM will not be liable for any direct, indirect, incidental, special, exemplary, punitive or consequential damages with respect to the Excluded Components.