Power8 System Firmware

Applies to:   9119-MHE, 9119-MME, 9080-MHE and 9080-MME.

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.


1.0 Systems Affected

This package provides firmware for Power System E880 (9119-MHE), Power System E880C (9080-MHE), Power System E870 (9119-MME) and Power System E870C (9080-MME) servers only.

The firmware level in this package is:  SC860_118_056 / FW860.40

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" that the System Firmware requires to complete the firmware installation process.  The HMC level must be equal to or higher than the "Minimum HMC Code Level" before the system firmware update is started.  If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code level for this firmware is:  HMC V8 R8.6.0 (PTF MH01654) with Mandatory efix (PTF MH01655) or higher.

Although the Minimum HMC Code level for this firmware is listed above,  HMC V8 R8.6.0 Service Pack 2 (PTF MH01690) with iFix (PTF MH01722) or higher is recommended.
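
The "equal to or higher" rule above can be illustrated with a small comparison. This is only a sketch: it assumes the HMC level is available as a string in the "V8 R8.6.0" form used in this document, the function names are hypothetical, and it does not look at PTF, efix or Service Pack levels, which must still be checked separately.

```python
import re

def parse_hmc_level(text):
    """Parse a string such as 'V8 R8.6.0' into a tuple of integers."""
    match = re.search(r"V(\d+)\s*R(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized HMC level: {text!r}")
    return tuple(int(part) for part in match.groups())

# Minimum level stated in this README (PTF levels are not modeled here).
MINIMUM_HMC_LEVEL = parse_hmc_level("V8 R8.6.0")

def hmc_meets_minimum(current_level):
    """Return True if the HMC level is equal to or higher than the minimum."""
    return parse_hmc_level(current_level) >= MINIMUM_HMC_LEVEL
```

Tuples compare element by element, so a higher version, release or modification number is treated as a higher level.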

For information concerning HMC releases and the latest PTFs,  go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home

NOTES:
                - You must be logged in as hscroot in order for the firmware installation to complete correctly.
                - Systems Director Management Console (SDMC) does not support this System Firmware level.

1.2 AIX iFix Required

For IBM Power System servers with the PCIe 2-port Async EIA-232 Adapter installed on AIX partitions, an AIX fix resolving the async port interrupt handling (APAR IV77596) must be installed before updating to the SC840_056 (FW840.00) or later level of firmware.  The ports on the adapter (feature code EN27/EN28, CCIN 57D4) may become unusable with the installation of that firmware level due to an issue with how interrupts are handled.  Many JAS_RTS error log entries are written to the error log due to this issue.

Prior to this APAR shipping in a future Service Pack, AIX intends to publish ifixes for the latest Service Packs on all active Technology Levels on our ftp server, in ftp://aix.software.ibm.com/aix/ifixes/iv77596/ on or before Oct 13, 2015.  If you need an ifix other than the ones on this server, contact IBM support to request one for your specific situation.

The procedure is intended to be performed by the customer.  If you have questions or concerns about the procedure, contact IBM Support:
US Support: 1.800.IBM.SERV
WW Support (select your country):  http://www.ibm.com/planetwide/

2.0 Important Information

Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

2.1 IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

2.3 DPSS Updates

Power8 servers use a programmable power controller called a DPSS (Digital Power Subsystem Sweep), which is located in each system node. The DPSS controls P8 fan speeds and checks the power supplies in the system node for proper voltage levels and operation.  The DPSS image is persistent and is only reloaded if a system firmware update contains a DPSS change.  If there is a DPSS change and the system firmware update is concurrent, the DPSS update is delayed to the next IPL of the CEC, which adds 18 to 20 minutes to the IPL.  If there is a DPSS change and the firmware update is disruptive, the DPSS update occurs when the service processor is resetting to the service processor stand-by state, and adds 18 to 20 minutes to this transition.  During the DPSS update, the HMC or op-panel displays DPSS update progress codes in the range C100C300 through C100C3FF; on the HMC, these codes may be overwritten.  If there is a DPSS change in a system firmware service pack, the change is designated as deferred in the service pack README.  DPSS changes are described, along with a reminder of the 18 to 20 minute additional time, in the Firmware Information and Description section of the README.

The DPSS download progress codes are documented in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8eai/C1xx_info.htm
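
When watching progress codes during an update, it can help to recognize the DPSS range quickly. The following sketch only tests whether a code falls in the C100C300 through C100C3FF range given above; the function name is hypothetical and the meaning of individual codes is documented at the Knowledge Center link, not here.

```python
# Range of DPSS update progress codes stated in this README.
DPSS_FIRST = 0xC100C300
DPSS_LAST = 0xC100C3FF

def is_dpss_progress_code(code):
    """Return True if the hexadecimal code string is a DPSS progress code."""
    try:
        value = int(code, 16)
    except ValueError:
        # Not a hexadecimal string, so it cannot be a progress code.
        return False
    return DPSS_FIRST <= value <= DPSS_LAST
```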

2.4 Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.

Additional information can be found at:
http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8hat/p8hat_lparmemory.htm
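
As a rough planning aid, the 8% guideline above can be turned into a quick calculation. This is a sketch of the upper-end estimate only: the function name is hypothetical, the real requirement is usually lower, and it does not model the absolute minimums that some server models require.

```python
def estimate_firmware_memory_gb(installed_gb, fraction=0.08):
    """Estimate the memory reserved for server firmware, in GB.

    Uses the approximate 8% guideline from this README as the default.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction

# Example: plan for roughly this much firmware memory on a 1024 GB system.
reserved_gb = estimate_firmware_memory_gb(1024)
```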


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

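01SCxxx_yyy_zzz

xxx is the release level
yyy is the service pack level
zzz is the last disruptive service pack level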

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01SC830_040_040 and 01SC860_040_045 are different service packs.

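An installation is disruptive if:

The release levels (xxx) are different.

            Example: Currently installed release is 01SC850_040_040, new release is 01SC860_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

            Example: SC830_040_040 is disruptive, no matter what level of SC830 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

            Example: Currently installed service pack is SC830_040_040 and new service pack is SC830_050_045.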

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is SC830_040_040, new service pack is SC830_071_040.
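
The examples in this section can be checked mechanically. The sketch below is one reading of them, not an official tool: it treats an update as disruptive when the release levels differ, when the new service pack's yyy and zzz are equal (as in the SC830_040_040 example), or when the installed yyy is lower than the new zzz; anything else is concurrent. The function and class names are hypothetical, and the result applies only to HMC-managed systems, since installation on systems without an HMC is always disruptive.

```python
import re
from typing import NamedTuple

class FirmwareLevel(NamedTuple):
    release: int          # xxx
    service_pack: int     # yyy
    last_disruptive: int  # zzz

# Accepts names such as 'SC830_040_040' or '01SC860_118_056.rpm'.
_LEVEL_RE = re.compile(r"^(?:01)?SC(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?$")

def parse_level(name):
    """Split a firmware level name into its xxx, yyy and zzz parts."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return FirmwareLevel(*(int(group) for group in match.groups()))

def is_disruptive(installed, new):
    """Return True if moving from 'installed' to 'new' is disruptive."""
    cur = parse_level(installed)
    nxt = parse_level(new)
    if cur.release != nxt.release:
        return True  # different release level
    if nxt.service_pack == nxt.last_disruptive:
        return True  # the new level is itself a disruptive service pack
    # Concurrent only if the installed yyy has reached the new zzz.
    return cur.service_pack < nxt.last_disruptive
```

For instance, is_disruptive("SC830_040_040", "SC830_071_040") evaluates the concurrent example above, and is_disruptive("SC830_040_040", "SC830_050_045") evaluates the last disruptive example.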

3.1 Firmware Information and Description

 
Filename               Size       Checksum   md5sum
01SC860_118_056.rpm    85572533   23610      b5a22afb5edfe8fea9d9ad0b76157267

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01SC860_118_056.rpm
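
On a workstation without the AIX sum command, the size and md5sum listed above can still be checked. The following is a minimal sketch using the Python standard library; the function names are hypothetical, and it verifies the md5sum and size only, not the AIX sum checksum.

```python
import hashlib
import os

# Values listed in this README for 01SC860_118_056.rpm.
EXPECTED_SIZE = 85572533
EXPECTED_MD5 = "b5a22afb5edfe8fea9d9ad0b76157267"

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_package(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Return True if the file size and md5 digest both match."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Calling verify_package("01SC860_118_056.rpm") in the directory that holds the downloaded file returns True only when both values match.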

SC860
For Impact, Severity and other Firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The following Fix description table will only contain the N (current) and N-1 (previous) levels.
The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
SC860_118_056 / FW860.40

11/08/17
Impact:  Availability      Severity:  SPE

New features and functions

  • Support was added to the Advanced System Management Interface (ASMI) for providing an "All of the above" cable validation display option so that each individual cable option does not have to be selected to get a full report on the cable status.  Select  "System Service Aids ->  Cable Validation -> Display Cable Status"  "All of the above"  and click "Continue"  to see the status of all the cables.
System firmware changes that affect all systems
  • A problem was fixed for recovery from clock card loss of lock failures that resulted in a clock card FRU unnecessarily being called out for repair.  This error happened whenever there was a loss of lock (PLL or CRC) for the clock card.  With the fix, the firmware will not be calling out the failing clock card, but rather it will be reconfigured as the new backup clock card after doing a clock card failover.  Customers will see a benefit from improved system availability by the avoidance of disruptive clock card repairs.
  • A problem was fixed for the "Minimum code level supported" not being shown by the Advanced System Management Interface (ASMI) when selecting the "System Configuration/Firmware Update Policy" menu.  The message shown is "Minimum code level supported value has not been set".  The workaround to find this value is to use the ASMI command line interface with the "registry -l cupd/MinMifLevel" command.
  • A problem was fixed for "sh: errl: not found " error messages to the service processor console whenever the Advanced System Management Interface (ASMI) was used to display error logs.  These messages did not cause any problems except to clutter the console output as seen in the service processor traces.
  • A problem was fixed for the LineInputVoltage and LastPowerOutputWatts being displayed in millivolts and milliwatts, respectively,  instead of volts and watts for the output from the Redfish API for power properties for the chassis.  The URL affected is the following:  "https://<fsp ip>/redfish/v1/Chassis/<id>/Power"
  • A problem was fixed for system node fans going to maximum RPM speeds after a service processor failover that needed the On-Chip Controllers (OCC) to be reloaded.  Without the fix, the system node fan speeds can be restored to normal speed by changing the Power Mode in the Advanced System Management Interface using steps from the IBM Knowledge Center:  https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/areaa_pmms.htm.  After changing the Power Mode, wait about 10 minutes to change the Power Mode back to the original setting.
    If the fix is applied without rebooting the system, the system node fan speeds can be corrected by either changing the Power Mode as above or using the HMC to do an Administrative Failover (AFO).
  • A problem was fixed for a Power Supply Unit (PSU) failure of  SRC 110015xF  logged with a power supply fan call out when doing a hot re-plug of a PSU.   The power supply may be made operational again by doing a dummy replace of the PSU that was called out (keeping the same PSU for the replace operation).  A re-IPL of the system will also recover the PSU.
  • A problem was fixed for the service processor low-level boot code always running off the same side of the flash image, regardless of what side has been selected for boot ( P-side or T-side).  Because this low-level boot code rarely changes, this should not cause a problem unless corruption occurs in the flash image of the boot code.  This problem does not affect firmware side-switches as the service processor initialization code (higher-level code than the boot code) is running correctly from the selected side.  Without the fix, there is no recovery for boot corruption for systems with a single service processor as the service processor must be replaced.
  • A problem was fixed for a missing serviceable event from a periodic call home reminder.  This occurred if there was an FRU deconfigured for the serviceable event.
  • A problem was fixed for help text in the Advanced System Management Interface (ASMI) not informing the user that system fan speeds would increase if the system Power Mode was changed to "Fixed Maximum Frequency" mode.  If ASMI panel function "System Configuration->Power Management->Power Mode Setup" "Enable Fixed Maximum Frequency mode" help is selected, the updated text states "...This setting will result in the fans running at the maximum speed for proper cooling."
  • A problem was fixed for a degraded PCI link causing a Predictive SRC for a non-cacheable unit (NCU) store time-out that occurred with SRC B113E540 or B181E450 and PRD signature  "(NCUFIR[9]) STORE_TIMEOUT: Store timed out on PB".  With the fix, the error is changed to be an Informational as the problem is not with the processor core and the processor should not be replaced.  The solution for degraded PCI links is different from the fix for this problem, but a re-IPL of the CEC or a reset of the PCI adapters could help to recover the PCI links from their degraded mode.
  • A problem was fixed for a Redfish Patch on the "Chassis"  "HugeDynamicDMAWindowSlotCount" for the validation of incorrect values.  Without the fix, the user will not get proper error messages when providing bad values to the patch.
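
For the Redfish power properties fix listed above, a client can confirm that LineInputVoltage and LastPowerOutputWatts now arrive in volts and watts. The sketch below builds the URL given in that fix description and extracts the two properties from an already-parsed JSON response. It assumes the response follows the standard Redfish Power resource layout, with a "PowerSupplies" array; the function names and the sample payload are hypothetical and are not taken from an actual service processor.

```python
def power_url(fsp_ip, chassis_id):
    """Build the chassis Power URL given in this README."""
    return f"https://{fsp_ip}/redfish/v1/Chassis/{chassis_id}/Power"

def power_supply_readings(power_resource):
    """Extract voltage and wattage readings from a parsed Power resource.

    Assumes the standard Redfish layout with a 'PowerSupplies' array.
    """
    readings = []
    for supply in power_resource.get("PowerSupplies", []):
        readings.append({
            "name": supply.get("Name"),
            "line_input_volts": supply.get("LineInputVoltage"),
            "last_output_watts": supply.get("LastPowerOutputWatts"),
        })
    return readings

# Hypothetical payload, for illustration only.
SAMPLE_POWER_RESOURCE = {
    "PowerSupplies": [
        {"Name": "Power Supply 1", "LineInputVoltage": 208, "LastPowerOutputWatts": 412},
    ]
}
```

With the fix installed, a line input reading is expected to be in the range of normal supply voltages rather than thousands of times larger.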

System firmware changes that affect certain systems

  • DEFERRED:  On systems using PowerVM firmware, a problem was fixed for DPO (Dynamic Platform Optimizer) operations taking a very long time and degrading the performance of the server system.  The problem is triggered by a DPO operation being done on a system with unlicensed processor cores and a very high I/O load.  The fix involves using a different lock type for the memory relocation activities (to prevent lock contention between memory relocation threads and partition threads) that is created at IPL time, so an IPL is needed to activate the fix.  More information on the DPO function can be found at the IBM Knowledge Center:  https://www.ibm.com/support/knowledgecenter/en/8247-42L/p8hat/p8hat_dpoovw.htm
  • On systems using PowerVM firmware,  a problem was fixed for an intermittent service processor core dump and a callout for netsCommonMSGServer with SRC B181EF88.   The HMC connection to the service processor automatically recovers with a new session.
  • On systems using PowerVM firmware, a problem was fixed where the Power Enterprise Pool (PEP) grace period expired early, being short by one hour.  For example, 71 hours may be provided instead of 72 hours in some cases. See https://www.ibm.com/support/knowledgecenter/en/POWER8/p8ha2/entpool_cod_compliance.htm for more information about the PEP grace period.
  • On systems using PowerVM firmware, a problem was fixed for a concurrent firmware update failure with HMC error message "E302F865-PHYPTooBusyToQuiesce".  This error can occur when the error log is full on the hypervisor and it cannot accept more error logs from the service processor.  But the service processor keeps retrying the send of an error log, resulting in a "denial of service" scenario where the hypervisor is kept busy rejecting the error logging attempts.  Without the fix, the problem may be circumvented by starting a  logical partition (if none are running) or by purging the error logs on the service processor.
  • On systems using PowerVM firmware with mirrored memory running IBM i partitions, a problem was fixed for memory fails in the partition that also caused the system to crash.  The system failure will occur any time that IBM i partition memory towards the beginning of the partition's assigned memory fails.  With the fix, the memory failure is isolated to the impacted partition, leaving the rest of the system unaffected.
  • On systems using PowerVM firmware, a problem was fixed for failures deconfiguring SR-IOV Virtual Functions (VFs).  This can occur during Live Partition Mobility (LPM) migrations with HMC error messages of  HSCLAF16, HSCLAF15 and HSCLB602 shown. This results in an LPM migration failure and a system reboot is required to recover the VFs for the I/O adapters.  This error may occur more frequently in cases where the I/O adapter has pending I/O at the time of the deconfigure request for the VF.
  • On systems using PowerVM firmware, a problem was fixed for a vNIC client that has backing devices being assigned an active server that was not the one intended by an HMC user failover for the client adapter.  This only can happen if the vNIC client adapter had never been activated.  A circumvention is to activate the client OS and initialize the vNIC device (ifconfig "xxx" up) and an active backing device will then be selected.
  • On systems using PowerVM firmware, a problem was fixed for partitions with more than 32TB memory failing to IPL with memory space errors.  This can occur if the logical memory block (LMB) size is small as there is a memory loss associated with each LMB.  The problem can be circumvented by reducing the amount of partition memory or increasing the LMB size to reduce the total number of LMBs needed for the memory allocation.
  • On systems using PowerVM firmware,  a problem was fixed for the error handling of EEH events for the SR-IOV Virtual Functions (VFs) that can result in IPL failure with B7006971, B400FF05, and BA210000 SRCs logged.  In these cases, the partition console stops at an OFDBG prompt.  Also, a DLPAR add of a VF may result in a partition crash due to a 300 DSI exception because of a low-level EEH event.  A circumvention for the problem would be to debug the EEH events which should be recovered errors and eliminate the cause of the EEH events.  With the fix, the EEH events still log Predictive Errors but do not cause a partition failure.
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) "not applicable" error messages being displayed when re-entering PEP XML files for PEP updates, in which one of the XML operations calls for Conversion of Perm Resources to PEP Resources.  There is no error as the PEP key was accepted on the first use.  The following message may be seen on the HMC and can be ignored:   "...HSCL0520 A Mobile CoD processor conversion code to convert 0 permanently activated processors to Mobile CoD processors on the managed system has been entered.  HSCL050F This CoD code is not valid for your managed system.  Contact your CoD administrator."
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) busy errors from the system anchor card when creating or updating a PEP pool.  The error returned by the HMC is "HSCL9015 The managed system cannot currently process this operation.  This condition is temporary.  Please try the operation again."  To try again, the customer needs to update the pool again.  Typically on the second PEP update, the code is accepted.  The problem is intermittent and occurs only rarely.
  • On systems using PowerVM firmware, a problem was fixed for an invalid date from the service processor causing the customer date and time to go to the Epoch value (01/01/1970) without a warning or chance for a correction.  With the fix,  the first IPL attempted on an invalid date will be rejected with a message alerting the user to set the time correctly in the service processor.  If the warning is ignored and the date/time is not corrected, the next IPL attempt will complete to the OS with the time reverted to the Epoch time and date.  This problem is very rare but it has been known to occur on service processor replacements when the repair step to set the date and time on the new service processor was inadvertently skipped by the service representative.
  • On systems using PowerVM firmware, a problem was fixed for a Power Enterprise Pool (PEP) system losing its assigned processor and memory resources after an IPL of the system.  This is an intermittent problem caused by a small timing window that makes it possible for the server to not get the IPL-time assignment of resources from the HMC.  If this problem occurs, it can be corrected by the HMC to recover the pool without needing another IPL of the system.
  • On systems using PowerVM firmware with PowerVM NovaLink, a problem was fixed for a loss of a communications channel between the hypervisor and the PowerVM NovaLink during a reset of the service processor.  Various NovaLink tasks, including deploy, could fail with a "No valid host was found" error.  With the fix, PowerVM NovaLink prevents normal operations from being impacted by a reset of the service processor.
  • On systems using PowerVM firmware, a problem was fixed for a rare system hang caused by a process dispatcher deadlock timing window.  If this problem occurs, the HMC will also go to an "Incomplete" state for the managed system.
  • On systems using PowerVM firmware,  a  problem was fixed for communication failures on adapters in SR-IOV shared mode.  This communication failure only occurs when a logical port's VLAN ID ( PVID) is dynamically changed from non-zero to zero.  An SR-IOV logical port is an I/O device created for a partition or a partition profile using the management console (HMC) when a user intends for the partition to access an SR-IOV adapter Virtual Function.  The error can be recovered from by a reboot of the partition.
    This fix updates adapter firmware to 10.2.252.1929, for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
    The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters.  A system reboot will update all SR-IOV shared-mode adapters with the new firmware level.  In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced).  And lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC).  To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates:   https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
    Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running).
  • On systems using PowerVM firmware, a problem was fixed for error logs not getting sent to the OS running in a partition.   This problem could occur if the error log buffer was full in the hypervisor and then a re-IPL of the system occurred.  The error log full condition was persisting across the re-IPL, preventing further logs from being sent to the OS.
  • On systems using PowerVM firmware, a problem was fixed in the text for the Firmware License agreement to correct a link that pointed to a URL that was not specific to microcode licensing.  The message is displayed for a machine during its initial power on.  Once accepted, the message is not displayed again.  The fixed link in the licensing agreement is the following: http://www.ibm.com/support/docview.wss?uid=isg3T1025362.
SC860_103_056 / FW860.30

06/30/17
Impact:  Availability      Severity:  SPE

New features and functions

  • Support was added for the Redfish API to allow the ISO 8601 extended format for the time and date so that the date/time can be represented as an offset from UTC (Coordinated Universal Time).
  • Support was added for the Redfish API for power and thermal properties for the chassis.  The new URIs are as follows:
    https://<fsp ip>/redfish/v1/Chassis/<id>/Power  : Provides power supply data
    https://<fsp ip>/redfish/v1/Chassis/<id>/Thermal : Provides fan data
    Only the Redfish GET operation is supported for these resources.
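
The ISO 8601 extended format mentioned in the new features above writes the date and time with an explicit offset from UTC, for example 2017-06-30T12:00:00-05:00. The sketch below shows that general format using the Python standard library; the function names are hypothetical, and the exact strings produced by the service processor are not specified in this document.

```python
from datetime import datetime, timedelta, timezone

def to_iso8601_extended(moment):
    """Format a timezone-aware datetime with its UTC offset."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")

def from_iso8601_extended(text):
    """Parse a string such as '2017-06-30T12:00:00-05:00'."""
    return datetime.fromisoformat(text)

# Example: noon at five hours behind UTC.
local_noon = datetime(2017, 6, 30, 12, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
as_text = to_iso8601_extended(local_noon)
as_utc = from_iso8601_extended(as_text).astimezone(timezone.utc)
```

Because the offset travels with the value, the same instant can be converted to UTC or to any other zone without knowing where it was recorded.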
System firmware changes that affect all systems
  • A problem was fixed for service actions with SRC B150F138 missing an Advanced System Management Interface (ASMI) Deconfiguration Record.  The deconfiguration records make it easier to organize the repairs that are needed for the system and they need to be consistent with the periodic maintenance reminders that are logged for the failed FRUs.
  • A problem was fixed for a false 1100026B1 (12V power good failure) caused by an I2C bus write error for a LED state.  This error can be triggered by the fan LEDs changing state.
  • A problem was fixed for a fan LED turning amber on solid when there is no fan fault, or when the fan fault is for a different fan.  This error can be triggered anytime a fan LED needs to change its state.  The fan LEDs can be recovered to a normal state concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for sporadic blinking amber LEDs for the system fans with no SRCs logged.  There was no problem with the fans.  The LED corruption occurred when two service processor tasks attempted to update the LED state at the same time.  The fan LEDs can be recovered to a normal state concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for a Redfish Patch on the "Chassis" or "IBMEnterpriseComputerSystem" with empty data that caused a "500 Internal Server Error".  Validation for the empty data case has been added to prevent the server error.
  • A problem was fixed for hardware dumps only collecting data for the master processor if a run-time service processor failover had occurred prior to the dump.  Therefore, there would be only master chip and master core data in the event of a core unit checkstop.  To recover to a system state that is able to do a full collection of debug data for all processors and cores after a run-time failover, a re-IPL of the system is needed.
  • A problem was fixed for a Redfish Patch on power mode to "MaxPowerSaver" that caused a  "500 Internal Server Error" when that power mode was not supported on the system.  With the fix, the Redfish server response is a list of the valid power modes that can be used for the system.
  • A problem was fixed for the loss of Operations Panel function 30 (displaying ethernet port  HMC1 and HMC2 IP addresses) after a concurrent repair of the Operations Panel.  Operations  Panel function 30 can be restored concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for a core dump of the rtiminit (service processor time of day) process that logs an SRC B15A3303  and could invalidate the time on the service processor.  If the error occurs while the system is powered on, the hypervisor has the master time and will refresh the service processor time, so no action is needed for recovery.  If the error occurs while the system is powered off, the service processor time must be corrected on the systems having only a single service processor.  Use the following steps from the IBM Knowledge Center to change the UTC time with the Advanced System Management Interface:  https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/viewtime.htm.
  • A problem was fixed for the service processor boot watch-dog timer expiring too soon during DRAM initialization in the reset/reload, causing the service processor to go unresponsive.  On systems with a single service processor, the SRC B1817212 was displayed on the control panel.  For systems with redundant service processors, the failing service processor was deconfigured.  To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action.  This problem is intermittent and very infrequent as most of the reset/reloads of the service processor will work correctly to restore the service processor to a normal operating state.
  • A problem was fixed for host-initiated resets of the service processor causing the system to terminate.  A prior fix for this problem did not work correctly because some of the host-initiated resets were being translated to unknown reset types that caused the system to terminate.  With this new correction for failed host-initiated resets, the service processor will still be unresponsive but the system and partitions will continue to run.  On systems with a single service processor, the SRC B1817212 will be displayed on the control panel.  For systems with redundant service processors, the failing service processor will be deconfigured.  To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action.  This problem is intermittent and very infrequent as most of the host-initiated resets of the service processor will work correctly to restore the service processor to a normal operating state.
  • A problem was fixed for a service processor reset triggered by a spurious false IIC interrupt request in the kernel.  On systems with a single service processor, the SRC B1817201 is displayed on the Operator Panel.  For systems with redundant service processors, an error failover to the backup service processor occurs.  The problem is extremely infrequent and does not impact processes on the running system.
  • A problem was fixed for the System Attention LED failing to light for an error failover for the redundant service processors with an SRC B1812028 logged.
  • A problem was fixed for a system failure at run time with SRC B111E450 corefir(55) that could not reIPL.  A system node should have been deconfigured for an ABUS error on a processor chip but instead, the system was terminated.  To recover from this problem, manually guard the node containing the failed processor and then the IPL will be successful.
  • A problem was fixed for an incorrect Redfish error message when trying to use the $metadata URI:   "The resource at the URI https://<systemip>/redfish/v1/%24metadata was not found.". This %24 is meaningless.  The "%24" has been replaced with a "$" in the error message.  The Redfish $metadata URI is not supported.
  • A problem was fixed for a system failure caused by Hostboot problems on one node while the other nodes were good.  With the fix, the node that is failing Hostboot is deconfigured and the system is able to IPL on the remaining nodes.  To recover from this problem, manually guard the node that is failing and re-IPL.

System firmware changes that affect certain systems

  • DEFERRED: On systems using PowerVM firmware, the stability of the PCIe3 I/O expansion drawer (#EMX0) links was improved.  The settings for the continuous time linear equalizers (CTLE) were updated for all the PCIe adapters for the PCIe links to the expansion drawer.  The system must be re-IPLed for the fix to activate.
  •  On systems using PowerVM firmware with a Linux Little Endian (LE) partition, a problem was fixed for system reset interrupts returning the wrong values in the debug output for the NIP and MSR registers.  This problem reduces the ability to debug hung Linux partitions using system reset interrupts.  The error occurs every time a system reset interrupt is used on a Linux LE partition.
  • On systems using PowerVM firmware, a problem was fixed for "Time Power On" enabled partitions not being capable of suspend and resume operations.  This means Live Partition Mobility (LPM) would not be able to migrate this type of partition.  As a workaround, the partition could be transitioned to a "Non-time Power On" state and then made capable of suspend and resume operations.
  • On systems using PowerVM firmware, a problem was fixed for manual vNIC failovers (from the HMC, manually "Make the Backing Device Active") so that the selected server is chosen for the failover, regardless of its priority.  With the problem, the server chosen for the vNIC failover was always the one with the most favorable priority.
    There are two possible workarounds to the problem:
    (1) Disable auto-priority-failover; Change priority to the server that is needed as the  target of the failover; Force the vNIC failover; Change priority back to original setting.
    (2) Or use auto-priority-failover and change the priority so the server that is needed as the target of the failover is favored.
  • On systems using PowerVM firmware, a problem was fixed for extra error logs in the VIOS due to failovers taking place while the client vNIC is inactive.  The inactive client vNIC failovers are skipped unless the force flag is on.  When the problem occurs, the Enhanced Error Handling (EEH) Freeze/Temporary Error/Recovery logs posted in the VIOS error log during the client partition boot can be ignored unless an actual problem is experienced.
  • On systems using PowerVM firmware, a problem was fixed for a Live Partition Mobility (LPM) migration abort and reboot on the FW860  target CEC caused by a mismatched address space for the source and target partition.  The occurrence of this problem is very rare and related to performance improvements made in the memory management on the FW860 system that exposed a timing window in the partition memory validation for the migration.  The reboot of the migrated partition recovers from the problem as the migration was otherwise successful.
  • On systems using PowerVM firmware, a problem was fixed for reboot retries for IBM i partitions such that the first load source I/O adapter (IOA) is retried instead of bypassed after the first failed attempt.  The reboot retries are done for an hour before the reboot process gives up.  This error can occur if there is more than one known load source, and the IOA of the first load source is different from the IOA of the last load source.  The error can be circumvented by retrying the boot of the partition after the load source device has become available.
  • On systems using PowerVM firmware, a problem was fixed for adapters failing to transition to shared SR-IOV mode on the IPL after changing the adapter from dedicated mode.  This intermittent problem could occur on systems using SR-IOV with very large memory configurations.
  • On systems using PowerVM firmware,  a  problem was fixed for SR-IOV adapters in shared mode for a transmission stall or time out with SRC B400FF01 logged.  The time out happens during Virtual Function (VF) shutdowns and during Function Level Resets (FLRs) with network traffic running.
    This fix updates adapter firmware to 10.2.252.1927, for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
    The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters.  A system reboot will update all SR-IOV shared-mode adapters with the new firmware level.  In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced).  And lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC).  To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates:   https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
    Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running). 
  • On systems with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), a  problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from overheating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged.  This happened because of On-Chip Controller (OCC) timeout errors when collecting Analog Power Subsystem Sweep (APSS) data, used by the OCC to tune the processor frequency.  This problem occurs more frequently on systems that are running heavy workloads.  Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
    Checking that Safe mode is not active on the system requires a dynamic celogin password from IBM Support in order to use the service processor command line:
    1) Log in to ASMI as celogin with the dynamic celogin password generated by IBM Support
    2) Select System Service Aids
    3) Select Service Processor Command Line
    4) Enter "tmgtclient --query_mode_and_function" from the command line
    The first line of the output, "currSysPwrMode", should say "NOMINAL".  This means the system is in normal mode and Safe mode is not active.
  • A  problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from overheating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged.  This happened because of an On-Chip Controller (OCC) internal queue overflow. The problem has only been observed for systems running heavy workloads with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), but this may not be required to encounter the problem.  Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
    Checking that Safe mode is not active on the system requires a dynamic celogin password from IBM Support in order to use the service processor command line:
    1) Log in to ASMI as celogin with the dynamic celogin password generated by IBM Support
    2) Select System Service Aids
    3) Select Service Processor Command Line
    4) Enter "tmgtclient --query_mode_and_function" from the command line
    The first line of the output, "currSysPwrMode", should say "NOMINAL".  This means the system is in normal mode and Safe mode is not active.
  • On systems using PowerVM firmware,  a  problem was fixed for a partition boot from a USB 3.0 device that has an error log SRC BA210003.  The error is triggered by an Open Firmware entry to the trace buffer during the partition boot.  The error log can be ignored as the boot is successful to the OS.
  • On systems using PowerVM firmware, a problem was fixed for a partition boot failure or hang from a Fibre Channel device having fabric faults.  Some of the fabric errors returned by the VIOS are not interpreted correctly by the Open Firmware VFC driver, causing the hang instead of generating helpful error logs.
  • On systems with redundant service processors,  a problem was fixed for an extra SRC B150F138 logged for a power supply that had already been replaced.  The problem was triggered by a service processor failover and an old power supply fault event that was not cleared on the backup service processor.  This caused the SRC B150F138 to be logged for a second time.  This problem can be circumvented by clearing the error log associated with the bad FRU when the FRU is replaced.
  • On systems using PowerVM firmware, a problem was fixed for a Power Enterprise Pool (PEP) resource Grace Period not being reset when the server is in the "Out of Compliance" state and the resource has been returned to put the server back in Compliance.  The Grace Period was not being reset after a double-commit of a resource (doing a "remove" of an active resource) was resolved by restarting the server with the double-committed resource.  When the Grace Period ends, the "double-committed" resources on the server must have been freed up from use to prevent the server from going to "Out of Compliance".  If the user fails to free up the resource, the PEP is in an "Out of Compliance" state, and the only PEP actions allowed are ones to free up the double-commit.  Once that is completed, the PEP is back In Compliance.  The loss of the Grace Period for the error makes it difficult to move resources around in the PEP.  Without the fix, the user can "Add" another PEP resource to the server, and the action of adding a PEP resource resets the Grace Period timer.  One could then "Remove" that one PEP resource just added, and then any further "removes" of PEP resources would behave as expected with the full Grace Period in effect.
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) IFL processor assignments causing an "Out of Compliance" for normal processor licenses.  The number of IFL processors purchased was first credited as satisfying any "unreturned" PEP processor resources, thus potentially leaving the system "Out of Compliance" since IFL processors should not be taking the place of the normal (expensive) processor usage.  In this situation, without the fix, the user will need to either purchase more "expensive" non-IFL processors to satisfy the non-IFL workloads or adjust the partitions to reduce the usage of non-IFL processors.  This is a very infrequent problem for the following reasons:
    1) PEP processors are infrequently left "unreturned" for short periods of time for specialized operations such as LPM migrations.
    2) The user would have to purchase IFL processors from IBM, which is not a common occurrence.
    3) The user would have to put in a COD key for IFL processors while a PEP processor is still "unreturned".
  • On systems using PowerVM firmware, a problem was fixed for a power off hanging at D200C1FF caused by a vNIC VF failover error with SRC B200F011.  The power off hang error is infrequent because it requires that a VF failover error has occurred first.  The system can be recovered by using the power off immediate option from the Hardware Management Console (HMC).
  • On systems using PowerVM firmware, a problem was fixed for the incorrect reporting of the Universally Unique Identifier (UUID) to the OS, which prevented the tracking of a partition as it moved within a data center.  The UUID value as seen on the HMC or NovaLink did not match the value as displayed in the OS.
  • On systems using PowerVM firmware, a problem was fixed for an error finding the partition load source that has a GPT format.  GUID Partition Table (GPT) is a standard for the layout of the partition table on a physical storage device used in the server, such as a hard disk drive or solid-state drive, using globally unique identifiers (GUID).  Other drives that are working may be using the older master boot record (MBR) partition table format.  This problem occurs whenever load sources utilizing the GPT format occur in other than the first entry of the boot table.  Without the fix, a GPT disk drive must be the first entry in the boot table to be able to use it to boot a partition.
  • On systems using PowerVM firmware, a problem was fixed for an SRC BA090006 serviceable event log occurring whenever an attempt was made to boot from an ALUA  (Asymmetric Logical Unit Access) drive.  These drives are always busy by design and cannot be used for a partition boot, but no service action is required if a user inadvertently tries to do that.  Therefore, the SRC was changed to be an informational log.
SC860_082_056 / FW860.20

03/17/17
Impact:  Availability      Severity:  SPE 
SC860_070_056 / FW860.12

01/13/17
Impact:  Availability      Severity:  SPE
SC860_063_056 / FW860.11

12/05/16
Impact:  N/A      Severity:  N/A
  • This Service Pack contained updates for MANUFACTURING ONLY.
SC860_056_056 / FW860.10

11/18/16
Only DISRUPTIVE and DEFERRED fix descriptions are displayed for this service pack. 
The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
Impact:  New      Severity:  New

System firmware changes that affect certain systems

  • DISRUPTIVE:  On systems using the PowerVM firmware, a problem was fixed for an "Incomplete" state caused by initiating a resource dump with selector macros from NovaLink (vio -dump -lp 1 -fr).  The failure causes the stack frame size of a communication process, HVHMCCMDRTRTASK, to be exceeded, resulting in a hypervisor page fault that disrupts the NovaLink and/or HMC communications.  The recovery action is to re-IPL the CEC, but that will need to be done without the assistance of the management console.  For each partition that has an OS running on the system, shut down the partition from the OS.  Then from the Advanced System Management Interface (ASMI), power off the managed system.  Alternatively, the system power button may also be used to do the power off.  If the management console Incomplete state persists after the power off, the managed system should be rebuilt from the management console.  For more information on management console recovery steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.  The fix is disruptive because the size of the PowerVM hypervisor must be increased to accommodate the over-sized stack frame of the failing task.
  • DEFERRED:  On systems using the PowerVM firmware, a problem was fixed for a CAPI function unavailable condition on a system with the maximum number of CAPI adapters and partitions.  Not enough bytes were allocated for CAPI for the maximum configuration case.  The problem may be circumvented by reducing the number of active partitions or CAPI adapters.   The fix is deferred because the size of the hypervisor must be increased to provide the additional CAPI space.
  • DEFERRED:   On systems using PowerVM firmware, a problem was fixed for cable card capable PCI slots that fail during the IPL.  Hypervisor I/O Bus Interface UE B7006A84 is reported for each cable card capable PCI slot that doesn't contain a PCIe3 Optical Cable Adapter for the PCIe Expansion Drawer (feature code #EJ05).  PCI slots containing a cable card will not report an error but will not be functional.  The problem can be resolved by performing an AC cycle of the system.  The failure is triggered when the I2C devices used to detect the cable cards do not come out of the power-on reset process in the correct state because of a race condition.

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: SC830_123.


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected you will need to download the new firmware level to a USB flash memory device or ftp server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: SCxxx_yyy_zzz

Where xxx = release level
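
The naming pattern above can be split into its fields with a short script, for example to compare the release level of an installed level against that of a new package. The following is a minimal Python sketch, not an IBM tool. The function and field names are hypothetical, and reading yyy as the service pack level and zzz as the last disruptive service pack level is an assumption based on IBM's usual firmware naming convention rather than on this section.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    prefix: str           # two-letter machine family prefix, e.g. "SC"
    release: int          # xxx = release level (stated in this document)
    service_pack: int     # yyy (assumed meaning: service pack level)
    last_disruptive: int  # zzz (assumed meaning: last disruptive service pack)


# Matches names of the form SCxxx_yyy_zzz.
_LEVEL_RE = re.compile(r"^([A-Z]{2})(\d{3})_(\d{3})_(\d{3})$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a firmware level name such as SC860_082_056 into its fields."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not in the SCxxx_yyy_zzz form: {name!r}")
    prefix, xxx, yyy, zzz = match.groups()
    return FirmwareLevel(prefix, int(xxx), int(yyy), int(zzz))


def same_release(installed: str, new: str) -> bool:
    """Return True when both levels carry the same release level (xxx)."""
    return parse_level(installed).release == parse_level(new).release


if __name__ == "__main__":
    print(parse_level("SC860_082_056"))
    print(same_release("SC840_056_056", "SC860_056_056"))
```

Comparing only the release field mirrors the statement above that the installation method depends on the release level; the installation instructions linked below remain the authority on which procedure applies.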

Instructions for installing firmware updates and upgrades can be found at http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8ha1/updupdates.htm

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central: 
http://www-933.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

7.0 Firmware History

The complete Firmware Fix History (including HIPER descriptions)  for this Release level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html

8.0 Change History

Date                 Description
December 08, 2017    Fix Description update for SC860_118 / FW860.40