Power8 System Firmware

Applies to:   9119-MHE, 9119-MME, 9080-MHE and 9080-MME.

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.


1.0 Systems Affected

This package provides firmware for Power System E880 (9119-MHE), Power System E880C (9080-MHE), Power System E870 (9119-MME) and Power System E870C (9080-MME) servers only.

The firmware level in this package is:

1.1 Minimum HMC Code Level

This section is intended to describe the "Minimum HMC Code Level" required by the System Firmware to complete the firmware installation process. When installing the System Firmware, the HMC level must be equal to or higher than the "Minimum HMC Code Level" before starting the system firmware update.  If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level" the firmware update will not proceed.

NOTES:
Although the Minimum HMC Code level for this firmware is listed above,  HMC V8 R8.6.0 Service Pack 2 (PTF MH01690) with ifix (PTF MH01716) or higher is recommended.

For information concerning HMC releases and the latest PTFs,  go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home

NOTES:
                - You must be logged in as hscroot in order for the firmware installation to complete correctly.
                - Systems Director Management Console (SDMC) does not support this System Firmware level.

1.2 AIX iFix Required

For IBM Power System servers with the PCIe 2-port Async EIA-232 Adapter installed on AIX partitions, an AIX fix resolving the async port interrupt handling (APAR IV77596) must be installed before updating to the SC840_056 (FW840.00) or later level of firmware.  The ports on the adapter (feature code EN27/EN28, CCIN 57D4) may become unusable with the installation of that firmware level due to an issue with how interrupts are handled.  When this issue occurs, many JAS_RTS entries are written to the error log.

Prior to this APAR shipping in a future Service Pack, AIX intends to publish ifixes for the latest Service Packs on all active Technology Levels on our ftp server, in ftp://aix.software.ibm.com/aix/ifixes/iv77596/ on or before Oct 13, 2015.  If you need an ifix other than the ones on this server, contact IBM support to request one for your specific situation.

The procedure is intended to be performed by the customer.  If you have questions or concerns about the procedure, contact IBM Support:
US Support: 1.800.IBM.SERV
WW Support (select your country):  http://www.ibm.com/planetwide/

2.0 Important Information

Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

2.1 IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

2.3 DPSS Updates

Power8 servers use a programmable power controller called a DPSS (Digital Power Subsystem Sweep), which is located in each system node. The DPSS is used to control P8 fan speeds and to check that the power supplies in the system node are at the proper voltage levels and operating correctly.  The DPSS image is persistent and is only reloaded if there is a system firmware update that contains a DPSS change.  If there is a DPSS change and the system firmware update is concurrent, the DPSS update is delayed to the next IPL of the CEC, which adds 18 to 20 minutes to that IPL.  If there is a DPSS change and the firmware update is disruptive, the DPSS update occurs when the service processor is resetting to the service processor stand-by state, and adds 18 to 20 minutes to this transition.  During the DPSS update, the HMC or op-panel displays DPSS update progress codes in the range C100C300 through C100C3FF, although these codes may be overwritten on the HMC.  If there is a DPSS change in a system firmware service pack, the change will be designated as deferred in the service pack README.  DPSS changes will be described, along with a reminder of the additional 18 to 20 minutes, in the Firmware Information and Description section of the README.
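
As a small illustration of the progress code range mentioned above, the following Python sketch checks whether a displayed code falls between C100C300 and C100C3FF. The function name is hypothetical and is not part of any IBM tool; it assumes the code is shown as an 8-digit hexadecimal string.

```python
def is_dpss_progress_code(code: str) -> bool:
    """Return True if `code` lies in C100C300..C100C3FF (inclusive).

    Hypothetical helper for illustration; assumes an 8-digit hex string
    as displayed on the HMC or op-panel.
    """
    text = code.strip()
    if len(text) != 8:
        return False
    try:
        value = int(text, 16)
    except ValueError:
        # Not a hexadecimal string, so it cannot be a progress code.
        return False
    return 0xC100C300 <= value <= 0xC100C3FF
```

A monitoring script could use such a check to recognize that a long service processor transition is a DPSS update in progress rather than a hang.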

The DPSS download progress codes are documented in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8eai/C1xx_info.htm

2.4 Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.
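
The 8% guideline can be turned into a quick planning estimate. The sketch below is only an upper-bound approximation under that guideline; the function name and the use of gigabytes are assumptions for illustration, and it does not account for model-specific absolute minimums.

```python
def estimate_firmware_memory_gb(installed_gb: float, fraction: float = 0.08) -> float:
    """Approximate upper-bound estimate of memory used by server firmware.

    Uses the "approximately 8% of installed memory" guideline; the actual
    amount will generally be lower. Hypothetical helper for planning only.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction
```

For a system with 1024 GB installed, this yields roughly 82 GB as a conservative figure to reserve when planning partition memory.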

Additional information can be found at:
http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8hat/p8hat_lparmemory.htm


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

01SCxxx_yyy_zzz

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01SC830_040_040 and 01SC840_040_045 are different service packs.

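An installation is disruptive if:

The release levels (xxx) are different.

            Example: Currently installed release is 01SC840_040_040, new release is 01SC850_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

            Example: SC830_040_040 is disruptive, no matter what level of SC830 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

            Example: Currently installed service pack is SC830_040_040 and new service pack is SC830_050_045.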

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is SC830_040_040, new service pack is SC830_071_040.
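
The naming convention and the examples above can be sketched in Python as follows. This is a minimal illustration, not an IBM utility: the function and class names are hypothetical, the disruptive conditions are inferred from the examples in this section, and the sketch assumes an HMC-managed system (without an HMC, installation is always disruptive).

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: int          # xxx
    service_pack: int     # yyy
    last_disruptive: int  # zzz


# Accepts names such as "01SC840_177_056" or "SC840_177_056".
_LEVEL_RE = re.compile(r"^(?:01)?SC(\d{3})_(\d{3})_(\d{3})$")


def parse_level(name: str) -> FirmwareLevel:
    """Split an SCxxx_yyy_zzz name into its three numeric fields."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level name: {name!r}")
    return FirmwareLevel(*(int(group) for group in match.groups()))


def update_type(installed: str, new: str) -> str:
    """Return "disruptive" or "concurrent" for an HMC-managed update."""
    cur = parse_level(installed)
    nxt = parse_level(new)
    if cur.release != nxt.release:
        return "disruptive"  # different release level (xxx)
    if nxt.service_pack == nxt.last_disruptive:
        return "disruptive"  # new level is itself a disruptive service pack
    if cur.service_pack < nxt.last_disruptive:
        return "disruptive"  # installed level predates the last disruptive one
    return "concurrent"
```

With the example levels from this section, `update_type("SC830_040_040", "SC830_071_040")` returns `"concurrent"`, while `update_type("SC830_040_040", "SC830_050_045")` returns `"disruptive"`.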

3.1 Firmware Information and Description

 
Filename               Size       Checksum   md5sum
01SC840_177_056.rpm    82067107   38928      6600007cd5f0066e5216ecff1c735397

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
Example: sum 01SC840_177_056.rpm
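
On a workstation where the AIX sum command is not available, the md5sum value listed above can be checked instead. The following Python sketch uses only the standard hashlib module; the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with the value published in the README."""
    return md5_of_file(path) == expected.strip().lower()
```

For this package, the call would be `matches_published_md5("01SC840_177_056.rpm", "6600007cd5f0066e5216ecff1c735397")`; a result of False suggests the download is incomplete or corrupted and should be repeated.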

SC840
For Impact, Severity, and other firmware definitions, refer to the 'Glossary of firmware terms' at the URL below:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The following Fix description table will only contain the N (current) and N-1 (previous) levels.
The complete Firmware Fix History (including HIPER descriptions) for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
SC840_177_056 / FW840.60

09/29/17
Impact:  Availability      Severity:  SPE

System firmware changes that affect all systems

  • A problem was fixed for a false 110026B1 (12V power good failure) caused by an I2C bus write error for a LED state.  This error can be triggered by the fan LEDs changing state.
  • A problem was fixed for a fan LED turning solid amber when there is no fan fault, or when the fan fault is for a different fan.  This error can be triggered anytime a fan LED needs to change its state.  The fan LEDs can be recovered to a normal state concurrently by performing a soft reset of the service processor, using the steps at the following link:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for sporadic blinking amber LEDs for the system fans with no SRCs logged.  There was no problem with the fans.  The LED corruption occurred when two service processor tasks attempted to update the LED state at the same time.  The fan LEDs can be recovered to a normal state concurrently by performing a soft reset of the service processor, using the steps at the following link:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for the loss of Operations Panel function 30 (displaying ethernet port HMC1 and HMC2 IP addresses) after a concurrent repair of the Operations Panel.  Operations Panel function 30 can be restored concurrently by performing a soft reset of the service processor, using the steps at the following link:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
  • A problem was fixed for a core dump of the rtiminit (service processor time of day) process that logs an SRC B15A3303  and could invalidate the time on the service processor.  If the error occurs while the system is powered on, the hypervisor has the master time and will refresh the service processor time, so no action is needed for recovery.  If the error occurs while the system is powered off, the service processor time must be corrected on the systems having only a single service processor.  Use the following steps from the IBM Knowledge Center to change the UTC time with the Advanced System Management Interface:  https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/viewtime.htm.
  • A problem was fixed for the "Minimum code level supported" not being shown by the Advanced System Management Interface (ASMI) when selecting the "System Configuration/Firmware Update Policy" menu.  The message shown is "Minimum code level supported value has not been set".  The workaround to find this value is to use the ASMI command line interface with the "registry -l cupd/MinMifLevel" command.
  • A problem was fixed for a degraded PCI link causing a Predictive SRC for a non-cacheable unit (NCU) store time-out that occurred with SRC B113E540 or B181E450 and PRD signature  "(NCUFIR[9]) STORE_TIMEOUT: Store timed out on PB".  With the fix, the error is changed to be an Informational as the problem is not with the processor core and the processor should not be replaced.  The solution for degraded PCI links is different from the fix for this problem, but a re-IPL of the CEC or a reset of the PCI adapters could help to recover the PCI links from their degraded mode.
  • A problem was fixed for system node fans going to maximum RPM speeds after a service processor failover that needed the On-Chip Controllers (OCC) to be reloaded.  Without the fix, the system node fan speeds can be restored to normal speed by changing the Power Mode in the Advanced System Management Interface (ASMI) using steps from the IBM Knowledge Center:  https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/areaa_pmms.htm.  After changing the Power Mode, wait about 10 minutes before changing the Power Mode back to the original setting.
    If the fix is applied concurrently and the fans are already in the maximum RPM speed condition, the system node fan speeds can be corrected by either changing the Power Mode as above, or using the HMC to do an Administrative Failover (AFO).
  • A problem was fixed for the System Attention LED failing to light for an error failover for the redundant service processors with an SRC B1812028 logged.
  • A problem was fixed for a service processor reset triggered by a spurious false IIC interrupt request in the kernel.  On systems with a single service processor, the SRC B1817201 is displayed on the Operator Panel.  For systems with redundant service processors, an error failover to the backup service processor occurs.  The problem is extremely infrequent and does not impact processes on the running system.
  • A problem was fixed for the service processor low-level boot code always running off the same side of the flash image, regardless of what side has been selected for boot ( P-side or T-side).  Because this low-level boot code rarely changes, this should not cause a problem unless corruption occurs in the flash image of the boot code.  This problem does not affect firmware side-switches as the service processor initialization code (higher-level  code than the boot code) is running correctly from the selected side.  Without the fix, there is no recovery for boot corruption for systems with a single service processor as the service processor must be replaced.
  • A problem was fixed for a system failure caused by Hostboot problems with one node but where the other nodes are good.  With the fix, the node that is failing the Hostboot is deconfigured and the system is able to IPL on the remaining nodes.  To recover from this problem, manually guard the node that is failing and re-IPL.
  • A problem was fixed for help text in the Advanced System Management Interface (ASMI) not informing the user that system fan speeds would increase if the system Power Mode was changed to "Fixed Maximum Frequency" mode.  If ASMI panel function "System Configuration->Power Management->Power Mode Setup" "Enable Fixed Maximum Frequency mode" help is selected, the updated text states "...This setting will result in the fans running at the maximum speed for proper cooling."
  • A problem was fixed for a Power Supply Unit (PSU) failure of SRC 110015xF logged with a power supply fan call out when doing a hot re-plug of a PSU.  The power supply may be made operational again by doing a dummy replace of the PSU that was called out (keeping the same PSU for the replace operation).  A re-IPL of the system will also recover the PSU.
  • A problem was fixed for recovery from clock card loss of lock failures that resulted in a clock card FRU unnecessarily being called out for repair.  This error happened whenever there was a loss of lock (PLL or CRC) for the clock card.  With the fix, firmware will not be calling out the failing clock card, but rather it will be re-configured as the new backup clock card after doing a clock card failover.  Customers will see a benefit from improved system availability by the avoidance of disruptive clock card repairs.

System firmware changes that affect certain systems

  • DEFERRED:  On systems using PowerVM firmware, a problem was fixed to improve the link stability of the PCIe3 I/O expansion drawer (#EMX0).  The settings for the continuous time linear equalizers (CTLE) were updated for all the PCIe adapters for the PCIe links to the expansion drawer.  The CEC must be re-IPLed for the fix to activate.
  • On systems using PowerVM firmware,  a problem was fixed for an intermittent service processor core dump and callout for netsCommonMSGServer with SRC B181EF88.   The HMC connection to the service processor automatically recovers with a new session.
  • On systems using PowerVM firmware with a Linux Little Endian (LE) partition, a problem was fixed for system reset interrupts returning the wrong values in the debug output for the NIP and MSR registers.  This problem reduces the ability to debug hung Linux partitions using system reset interrupts.  The error occurs every time a system reset interrupt is used on a Linux LE partition.
  • On systems using PowerVM firmware, a problem was fixed for "Time Power On" enabled partitions not being capable of suspend and resume operations.  This means Live Partition Mobility (LPM) would not be able to migrate this type of partition.  As a workaround, the partition could be transitioned to a "Non-time Power On" state and then made capable of suspend and resume operations.
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) IFL processor assignments causing an "Out of Compliance" for normal processor licenses.  The number of IFL processors purchased was first credited as satisfying any "unreturned" PEP processor resources, thus potentially leaving the system "Out Of Compliance" since IFL processors should not be taking the place of the normal (expensive) processor usage.  In this situation, without the fix, the user will need to either purchase more "expensive" non-IFL processors to satisfy the non-IFL workloads or adjust the partitions to reduce the usage of non-IFL processors.  This is a very infrequent problem for the following reasons:
    1) PEP processors are infrequently left "unreturned" for short periods of time for specialized operations such as LPM migrations
    2) The user would have to purchase IFL processors from IBM, which is not a common occurrence.
    3) The user would have to put in a COD key for IFL processors while a PEP processor is still "unreturned"
  • On systems using PowerVM firmware, a problem was fixed for a Power Enterprise Pool (PEP) resource Grace Period being short by one hour with 71 hours provided instead of 72 hours.  The Grace Period is provided when all PEP resources are assigned and the user double-uses these resources (typically this is done for a Live Partition Mobility (LPM) migration).  This "borrowing" is temporarily permitted in this case even if there are not enough licenses to cover resources in both servers. The PEP goes into "Approaching Out Of Compliance", indicating the user has a certain amount of time to resolve this double-use. The problem here is that the time length of this Grace Period lasts one hour less than stated.  For a 72-hour Grace Period (the standard setting), the user only gets 71 hours.  The user sees "71 hours remaining" (correct) on first display at start,  then right away, if the user displays again, 70 hours is shown remaining.  But thereafter, the Grace Period time decrements correctly for the time remaining.
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) non-applicable error messages being displayed when re-entering PEP XML files for PEP updates, in which one of the XML operations calls for Conversion of Perm Resources to PEP Resources.  There is no error as the PEP key was accepted on the first use.  The following message may be seen on the HMC and can be ignored:   "...HSCL0520 A Mobile CoD processor conversion code to convert 0 permanently activated processors to Mobile CoD processors on the managed system has been entered.  HSCL050F This CoD code is not valid for your managed system.  Contact your CoD administrator."
  • On systems using PowerVM firmware, a problem was fixed for reboot retries for IBM i partitions such that the first load source I/O adapter (IOA) is retried instead of bypassed after the first failed attempt.  The reboot retries are done for an hour before the reboot process gives up.  This error can occur if there is more than one known load source, and the IOA of the first load source is different from the IOA of the last load source.  The error can be circumvented by retrying the boot of the partition after the load source device has become available.
  • On systems using PowerVM firmware with mirrored memory running IBM i partitions, a problem was fixed for memory fails in the partition that also caused the system to crash.  The system failure will occur any time that IBM i partition memory towards the beginning of the partition's assigned memory fails.  With the fix, the memory failure is isolated to the impacted partition, leaving the rest of the system unaffected.
  • On systems using PowerVM firmware, a problem was fixed for failures deconfiguring SR-IOV Virtual Functions (VFs).  This can occur during Live Partition Mobility (LPM) migrations with HMC error messages of HSCLAF16, HSCLAF15 and HSCLB602 shown.  This results in an LPM migration failure, and a system reboot is required to recover the VFs for the I/O adapters.  This error may occur more frequently in cases where the I/O adapter has pending I/O at the time of the deconfigure request for the VF.
  • On systems using PowerVM firmware, a problem was fixed for the incorrect reporting of the Universally Unique Identifier (UUID) to the OS, which prevented the tracking of a partition as it moved within a data center.  The UUID value as seen on HMC or the NovaLink did not match the value as displayed in the OS.
  • On systems using PowerVM firmware,  a  problem was fixed for a partition boot from a USB 3.0 device that has an error log SRC BA210003.  The error is triggered by an Open Firmware entry to the trace buffer during the partition boot.  The error log can be ignored as the boot is successful to the OS.
  • On systems using PowerVM firmware, a problem was fixed for a partition boot fail or hang from a Fibre Channel device having fabric faults.  Some of the fabric errors returned by the VIOS are not interpreted correctly by the Open Firmware VFC driver, causing the hang instead of generating helpful error logs.
  • On systems using PowerVM firmware,  problems were fixed for communication failures on adapters in SR-IOV shared mode:
    1) A problem  was fixed for SR-IOV adapters in shared mode for a transmission stall or time out with SRC B400FF01 logged.  The time out happens during Virtual Function (VF) shutdowns and during Function Level Resets (FLRs) with network traffic running.
    2) A problem was fixed for an SR-IOV logical port whose Port VLAN ID (PVID) changing from non-zero to zero causes a communication failure under certain conditions.  The communication failure only occurs when a logical port's PVID is dynamically changed from non-zero to zero.  An SR-IOV logical port is an I/O device created for a partition or a partition profile using the management console (HMC) when a user intends for the partition to access an SR-IOV adapter Virtual Function.  The error can be recovered from by a reboot of the partition.
    These fixes update adapter firmware to 10.2.252.1929 for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
    The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters.  A system reboot will update all SR-IOV shared-mode adapters with the new firmware level.  In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced).  And lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC).  To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates:   https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
    Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running).
  • On systems using PowerVM firmware with PowerVM NovaLink, a problem was fixed for a loss of a communications channel between the hypervisor and the PowerVM NovaLink during a reset of the service processor.  Various NovaLink tasks, including deploy, could fail with a "No valid host was found" error.  With the fix, PowerVM NovaLink prevents normal operations from being impacted by a reset of the service processor.
  • On systems using PowerVM firmware with PowerVM NovaLink, a problem was fixed for returning to HMC-only management from co-management when a NovaLink partition holding the master mode is deleted.  A circumvention is to release master mode before deleting the NovaLink partition and then reconnect the disconnected management console.  Please refer to IBM Knowledge Center link "http://ibm.biz/novalink-kc" for more information on the PowerVM NovaLink feature and changing the master authority when doing co-management.
  • On systems using PowerVM firmware with PowerVM NovaLink, a problem was fixed for a master management console becoming disconnected and blocking other management consoles from performing virtualization changes. A circumvention is to use the HMC CLI on another management console to request the master mode with the force option.   Please refer to IBM Knowledge Center link "http://ibm.biz/novalink-kc" for more information on the PowerVM NovaLink feature and changing the master authority when doing co-management.
  • On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) busy errors from the system anchor card when creating or updating a PEP pool.    The error returned by the HMC is "HSCL9015 The managed system cannot currently process this operation.  This condition is temporary.  Please try the operation again."  To try again, the customer needs to update the pool again.  Typically on the second PEP update, the code is accepted.
    The problem is intermittent and occurs only rarely.
  • On systems using PowerVM firmware, a problem was fixed for an invalid date from the service processor causing the customer date and time to go to the Epoch value (01/01/1970) without a warning or chance for a correction.  With the fix,  the first IPL attempted on an invalid date will be rejected with a message alerting the user to set the time correctly in the service processor.  If the warning is ignored and the date/time is not corrected, the next IPL attempt will complete to the OS with the time reverted to the Epoch time and date.  This problem is very rare but it has been known to occur on service processor replacements when the repair step to set the date and time on the new service processor was inadvertently skipped by the service representative.
  • On systems using PowerVM firmware, a problem was fixed for a Power Enterprise Pool (PEP) system losing its assigned processor and memory resources after an IPL of the system.  This is an intermittent problem caused by a small timing window that makes it possible for the server to not get the IPL-time assignment of resources from the HMC.  If this problem occurs, it can be corrected by the HMC to recover the pool without needing another IPL of the system.
  • On systems using PowerVM firmware, a problem was fixed for the error handling of EEH events for the SR-IOV Virtual Functions (VFs) that can result in IPL failure with B7006971, B400FF05, and BA210000 SRCs logged.  In these cases, the partition console stops at an OFDBG prompt.  Also, a DLPAR add of a VF may result in a partition crash due to a 300 DSI exception because of a low-level EEH event.  A circumvention for the problem would be to debug the EEH events, which should be recovered errors, and eliminate the cause of the EEH events.  With the fix, the EEH events still log Predictive Errors but do not cause a partition failure.
  • On systems using PowerVM firmware, a problem was fixed for an error finding the partition load source that has a GPT format.  GUID Partition Table (GPT) is a standard for the layout of the partition table on a physical storage device used in the server, such as a hard disk drive or solid-state drive, using globally unique identifiers (GUID).  Other drives that are working may be using the older master boot record (MBR) partition table format.  This problem occurs whenever load sources utilizing the GPT format occur in other than the first entry of the boot table.  Without the fix, a GPT disk drive must be the first entry in the boot table to be able to use it to boot a partition.
  • On systems using PowerVM firmware, a problem was fixed for an SRC BA090006 serviceable event log occurring whenever an attempt was made to boot from an ALUA  (Asymmetric Logical Unit Access) drive.  These drives are always busy by design and cannot be used for a partition boot, but no service action is required if a user inadvertently tries to do that.  Therefore, the SRC was changed to be an informational log.
  • On systems using PowerVM firmware, a problem was fixed for Live Partition Mobility (LPM) migrations from FW860.12 or later to the FW840.50 level of firmware. Subsequent DLPAR add operations of Virtual Adapters will fail with HMC error message HSCLAB2B, which contains text similar to the following:  "The operation to add a virtual NIC in slot 8 on partition 9 failed. The requested amounts of slot(s) to be added is 1 and the completed amount is 0."  The  AIX OS standard error message with return code 3 is the following: "0931-007 You have specified an invalid drc_name."   This issue affects partitions installed with AIX 7.2 TL 1 and later.   Not affected by this issue are partitions installed with VIOS, IBM i, or earlier levels of AIX.  The error can be recovered by a reboot of the affected partition.
SC840_168_056 / FW840.50

04/21/17
Impact:  Availability      Severity:  SPE

New features and functions

  • Support for the Advanced System Management Interface (ASMI) was changed to allow the special characters of "I", "O", and "Q" to be entered for the serial number of the I/O Enclosure under the Configure I/O Enclosure option.  These characters have only been found in an IBM serial number rarely, so typing in these characters will normally be an incorrect action.  However, the special character entry is not blocked by ASMI anymore so it is able to support the exception case.  Without the enhancement, the typing of one of the special characters causes message "Invalid serial number" to be displayed.
  • On systems using PowerVM firmware, support was added  for the Universally Unique IDentifier (UUID) property for each partition.  The UUID provides each partition with an identifier that is persisted by the platform across partition reboots, reconfigurations, OS reinstalls, partition migration,  and hibernation.

System firmware changes that affect all systems

  • A problem was fixed for disabling, from the Advanced System Management Interface (ASMI), the periodic notification for a call home error log SRC B150F138 for Memory Buffer resources (membuf).
  • A problem was fixed for incorrect callouts of the Power Management Controller (PMC) hardware with SRC  B1112AC4 and SRC B1112AB2 logged.  These extra callouts occur when the On-Chip Controller (OCC) has placed the system in the safe state for a prior failure that is the real problem that needs to be resolved.
  • A problem was fixed for device time-outs during an IPL logged with SRC B18138B4.  This error is intermittent and no action is needed for the error log.  The service processor hardware server now allots more time for the device transactions so that they can complete without a time-out error.
  • A problem was fixed for the Advanced System Management Interface (ASMI) "System Service Aids => Error/Event Logs" panel not showing the "Clear" and "Show" log options and also having a truncated error log when there are a large number of error logs on the system.
  • A problem was fixed for the failover to the backup PNOR on a Hostboot Self Boot Engine (SBE) failure.  Without the fix, the failed SBE causes loss of processors and memory with B15050AD logged.  With the fix, the SBE is able to access the backup PNOR and IPL successfully by deconfiguring the failing PNOR and calling it out as a failed FRU.
  • A problem was fixed for System Vital Product Data (SVPD) FRUs  being guarded but not having a corresponding error log entry.  This is a failure to commit the error log entry that has occurred only rarely.
  • A problem was fixed for a system going into safe mode with SRC B1502616 logged as informational without a call home notification.  Notification is needed because the system is running with reduced performance.  If there are unrecoverable error logs and any are marked with reduced performance and the system has not been rebooted, then the system is probably running in safe mode with reduced performance.  With the fix, the SRC B1502616 is an Unrecoverable Error (UE).
  • A problem was fixed for the service processor boot watch-dog timer expiring too soon during DRAM initialization in the reset/reload, causing the service processor to go unresponsive.  On systems with a single service processor, the SRC B1817212 was displayed on the control panel.  For systems with redundant service processors, the failing service processor was deconfigured.  To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action.  This problem is intermittent and very infrequent as most of the reset/reloads of the service processor will work correctly to restore the service processor to a normal operating state.
  • A problem was fixed for host-initiated resets of the service processor causing the system to terminate.  A prior fix for this problem did not work correctly because some of the host-initiated resets were being translated to unknown reset types that caused the system to terminate.  With this new correction for failed host-initiated resets, the service processor will still be unresponsive but the system and partitions will continue to run.  On systems with a single service processor, the SRC B1817212 will be displayed on the control panel.  For systems with redundant service processors, the failing service processor will be deconfigured.  To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action.  This problem is intermittent and very infrequent as most of the host-initiated resets of the service processor will work correctly to restore the service processor to a normal operating state.
  • A problem was fixed for hardware dumps only collecting data for the master processor if a run-time service processor failover had occurred prior to the dump.  Therefore, there would be only master chip and master core data in the event of a core unit checkstop.  To recover to a system state that is able to do a full collection of debug data for all processors and cores after a run-time failover, a re-IPL of the system is needed.
  • A problem was fixed for incorrect error messages from the Advanced System Management Interface (ASMI) functions when the system is powered on but in the  "Incomplete State".  For this condition, ASMI was assuming the system was powered off because it could not communicate to the PowerVM hypervisor.  With the fix, the ASMI error messages will indicate that ASMI functions have failed because of the bad hypervisor connection instead of falsely stating that the system is powered off.
  • A problem was fixed for a single node failure on a multi-node system preventing an IPL.  The error occurred if Hostboot hung on a node and timed out without calling out problem hardware.  With the fix, a service processor failover is used to IPL on an alternate path to recover from the error.  In addition, an error log has been added for the IPL timeout for the node with SRC B111BAAB and a callout for the master processor and PNOR.
  • A  problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from over-heating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged.  This happened  because of an On-Chip Controller (OCC) internal queue overflow. The problem has only been observed for systems running heavy workloads with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), but this may not be required to encounter the problem.  Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
    Checking that Safe mode is not active on the system requires a dynamic celogin password from IBM Support in order to use the service processor command line:
    1) Log into ASMI as celogin with the dynamic celogin password generated by IBM Support
    2) Select System Service Aids
    3) Select Service Processor Command Line
    4) Enter "tmgtclient --query_mode_and_function" from the command line
    The first line of the output, "currSysPwrMode", should say "NOMINAL", which means the system is in normal mode and Safe mode is not active.
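    The check in the steps above can be sketched in Python for cases where the command output has been captured as text.  This is only an illustration: the function name and the sample text are hypothetical, and the exact layout of the tmgtclient output is an assumption beyond the two facts stated above (the first line carries "currSysPwrMode" and should say "NOMINAL").

```python
def safe_mode_inactive(tmgtclient_output: str) -> bool:
    """Return True when the first non-empty output line reports NOMINAL.

    Hypothetical helper: it only looks for the two strings named in the
    fix description and assumes nothing else about the output format.
    """
    for line in tmgtclient_output.splitlines():
        if line.strip():
            # Only the first non-empty line is inspected.
            return "currSysPwrMode" in line and "NOMINAL" in line
    return False  # Empty output: treat as "not confirmed".


# Sample text invented for illustration; real output may look different.
sample = "currSysPwrMode : NOMINAL\n(remaining lines omitted)"
print(safe_mode_inactive(sample))
```

    A result of False only means that normal mode could not be confirmed from the captured text; it does not by itself prove that Safe mode is active.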

System firmware changes that affect certain systems

  • On systems using PowerVM firmware, a problem was fixed for cable card (PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer) capable PCI slots that fail during the IPL.  Hypervisor I/O Bus Interface UE B7006A84 is reported for each cable card capable PCI slot that doesn't contain a cable card.  PCI slots containing a cable card will not report an error but will not be functional.  The problem can be resolved by doing a "power off/power on" re-IPL of the system.  The trigger for the failure is a race condition in which the I2C devices used to detect the cable cards do not come out of the power on reset process in the correct state.  The affected optical cable adapters have feature codes #EJ05, #EJ07, and #EJ08 with CCINs 2B1C, 6B52, and 2CE2, respectively.
  • On systems using PowerVM firmware,  a problem was fixed for a blank SRC in the LPA dump for user-initiated non-disruptive adjunct dumps.  The SRC is needed for problem determination and dump analysis.
  • On systems using PowerVM firmware, a problem was fixed with SR-IOV adapter error recovery where the adapter is left in a failed state in nested error cases for some adapter errors.  The probability of this occurring is very low since the problem trigger is multiple low-level adapter failures.  With the fix, the adapter is recovered and returned to an operational state.
  • On systems using PowerVM firmware  with PCIe adapters in Single Root I/O Virtualization (SR-IOV) shared mode, a problem was fixed for the hypervisor SR-IOV adjunct partition failing during the IPL with SRCs B200F011 and B2009014 logged. The SR-IOV adjunct partition successfully recovers after it reboots and the system is operational.
  • On systems using PowerVM firmware, a problem was fixed for PCIe Host Bridge (PHB) outages and PCIe adapter failures in the PCIe I/O expansion drawer caused by error thresholds being exceeded for the LEM bit [21] errors in the FIR accumulator.  These are typically minor and expected errors in the PHB that occur during adapter updates and do not warrant  a reset of the PHB and the PCIe adapter failures.  Therefore, the threshold LEM[21] error limit has been increased and the LEM fatal error has been changed to a Predictive Error to avoid the outages for this condition.
  • On systems using PowerVM firmware with a large memory configuration (greater than 8 TB), a problem was fixed for a SR-IOV adjunct failure during the IPL, causing loss of SR-IOV function.  The large system memory space causes an overflow in the space calculations for SR-IOV adapters in PCIe slots with Enlarged IO Capacity enabled.  The problem can be avoided by reducing the number of PCIe slots with Enlarged IO Capacity enabled so it does not include adapters in SR-IOV shared-mode.  Another circumvention option is to move the SR-IOV adapters to  SR-IOV capable PCIe slots where Enlarged IO Capacity is not enabled.   Reducing system physical memory to below 8 TB will also work as a circumvention.
  • On systems using PowerVM firmware, a problem was fixed for Live Partition Mobility (LPM) migrations from FW860.10 or FW860.11 to older levels of firmware. Subsequent DLPAR of Virtual Adapters will fail with HMC error message HSCL294C, which contains text similar to the following:  "0931-007 You have specified an invalid drc_name." This issue affects partitions installed with AIX 7.2 TL 1 and later. Not affected by this issue are partitions installed with VIOS, IBM i, or earlier levels of AIX.
  • On a system using PowerVM firmware running a Linux OS, a problem was fixed for support for Coherent Accelerator Processor Interface (CAPI) adapters.  The CAPI related RTAS h-calls for the CAPI devices could not be made by the Linux OS, impacting the CAPI adapter functionality and usability.  This problem involves the following adapters:  the PCIe3 LP CAPI Accelerator Adapter with F/C #EJ16 that is used on the S812L (8247-21L) and S822L (8247-22L) models;  the PCIe3 CAPI FlashSystem Accelerator Adapter with F/C #EJ17 that is used on the S814 (8286-41A) and S824 (8286-42A) models;  and the PCIe3 CAPI FlashSystem Accelerator Adapter with F/C #EJ18 that is used on the S822 (8284-22A), E870 (9119-MME), and E880 (9119-MHE) models.  This problem does not pertain to PowerVM AIX partitions using CAPI adapters.
  • On a system using PowerVM firmware, a problem was fixed for corruption of the partition data in the service processor NVRAM during a power off that causes the managed system to go into the HMC "Recovery" error state.  A circumvention for the error is to restore partition data from the HMC.  If using Novalink to manage the partition, a recovery can be done from the Novalink backup.  The error is very infrequent but more likely to occur on an immediate power off of the system.  Using a delayed power off instead allows the hypervisor to complete all pending operations before shutting down cleanly.
  • On systems using PowerVM firmware, a problem was fixed for a group of shared processor partitions being able to exceed the designated capacity placed on a shared processor pool.  This error can be triggered by using the DLPAR move function for the shared processor partitions, if the pool has already reached its maximum specified capacity.  To prevent this problem from occurring when making DLPAR changes when the pool is at the maximum capacity, do not use the DLPAR move operation but instead break it into two steps:  DLPAR remove followed by DLPAR add.  This gives enough time for the DLPAR remove to be fully completed prior to starting the DLPAR add request.
  • On systems using PowerVM firmware, a problem was fixed for NVRAM corruption and an HMC recovery state when using Simplified Remote Restart partitions.  The failing systems will have at least one Remote Restart partition and on the failed IPL there will be a B70005301 SRC with word 7 being 0X00000002.
  • On systems using PowerVM firmware with an IBM i partition, a problem was fixed for incorrect maximum performance reports based on the wrong number of "maximum" processors for the system.  Certain performance reports that can be generated on IBM i systems contain not only the existing machine information, but also "what-if" information, such as "how would this system perform if it had all the processors possible installed in this system".  This "what-if" report was in error because the maximum number of processors possible was too high for the system.
  • On systems using PowerVM firmware, a problem was fixed for NVRAM corruption that can occur when deleting a partition that owns a CAPI adapter, if that CAPI adapter is not assigned to another partition before the system is powered off.  On a subsequent IPL, the system will come up in recovery mode if there is NVRAM corruption.  To recover, the partitions must be restored from the HMC.  The frequency of this error is expected to be rare.  The CAPI adapters have the following feature codes:  #EC3E, #EC3F, #EC3L, #EC3M, #EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B.
  • On systems using PowerVM firmware, a problem was fixed to improve PCIe3 I/O expansion drawer (#EMX0) link stability.  The settings for the continuous time linear equalizers (CTLE) were updated for all the PCIe adapters for the PCIe links to the expansion drawer.  The CEC must be re-IPLed for the fix to activate.
  • On systems using PowerVM firmware,  the following problems were fixed for SR-IOV adapters:
    1) Insufficient resources reported for SR-IOV logical port configured with promiscuous mode enable and a Port VLAN ID (PVID) when creating new interface on the SR-IOV adapters.
    2) Spontaneous dumps and reboot of the adjunct partition for SR-IOV adapters.
    3) Adapter enters firmware loop when single bit ECC error is detected.  System firmware detects this condition as an adapter command time out.  System firmware will reset and restart the adapter to recover the adapter functionality.  This condition will be reported as a temporary adapter hardware failure.
    4) vNIC interfaces not being deleted correctly causing SRC B400FF01 to be logged and Data Storage Interrupt (DSI) errors with failure on boot of the LPAR.
    This set of fixes updates adapter firmware to 10.2.252.1926, for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57.
    The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters.  A system reboot will update all SR-IOV shared-mode adapters with the new firmware level.  In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced).  And lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC).  To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates:   https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
    Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running).
  • On systems using PowerVM firmware, a problem was fixed for partition boot failures and run time DLPAR failures when adding I/O that log BA210000, BA210003, and/or BA210005 errors.  The fix also applies to run time failures configuring an I/O adapter following an EEH recovery that log BA188001 events.  The problem can impact IBM i partitions running in any processor mode or AIX/Linux partitions running in P7 (or older) processor compatibility modes.  The problem is most likely to occur when the system is configured in the Manufacturing Default Configuration (MDC) mode.  The trigger for the problem is a race condition between the hypervisor and the physical operations panel with a very rare frequency of occurrence.
  • On systems with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), a  problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from over-heating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged.  This happened  because of On-Chip Controller (OCC) time out errors when collecting Analog Power Subsystem Sweep (APSS) data, used by the OCC to tune the processor frequency.  This problem occurs more frequently on systems that are running heavy workloads.  Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the following link steps for a soft reset of the service processor:  https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm.
    Checking that Safe mode is not active on the system requires a dynamic celogin password from IBM Support in order to use the service processor command line:
    1) Log into ASMI as celogin with the dynamic celogin password generated by IBM Support
    2) Select System Service Aids
    3) Select Service Processor Command Line
    4) Enter "tmgtclient --query_mode_and_function" from the command line
    The first line of the output, "currSysPwrMode", should say "NOMINAL", which means the system is in normal mode and Safe mode is not active.
SC840_147_056 / FW840.40

10/28/16
Impact:  Availability      Severity:  SPE
SC840_139_056 / FW840.30

09/28/16
Impact:  Availability      Severity:  SPE
SC840_132_056 / FW840.24

08/31/16
Only HIPER fix descriptions are displayed for this service pack. 
The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
Impact:  Availability      Severity:  HIPER

System firmware changes that affect certain systems

  • HIPER/Non-Pervasive: For a system using PowerVM firmware at a FW840 level and having an AIX partition or VIOS partition at specific back levels,  a problem was fixed for PCI adapters not getting configured in the OS.  DVD boots hang with status code 518 when attempts are made to boot off the AIX or VIOS DVD image.  NIM installs hang with status code 608.  If the firmware is updated to 840_104 through 840_118 for a SAS booted system, the subsequent reboot will hang with status code 554.
    The failing AIX and VIOS levels are as follows:
    AIX:
    AIX 7100-02-06 - AIX 7100-02-07
    AIX 6100-08-06 - AIX 6100-08-07
    VIOS:
    VIOS 2.2.2.6 - VIOS 2.2.2.70
    Without the fix, the problem may be circumvented by upgrading the AIX to 7100-03-03 or 6100-09-03 and the VIOS to 2.2.3.4.
    Depending on the adapter not getting configured, the error may result in Defined devices, EEH errors, and/or failure to boot the partition (if the failing adapter is the boot device).  These errors may also be seen for a rebooted partition after a LPM migration to FW840.
    With the fix applied, the error state for some of the  adapters in the running OS may persist and it will be necessary to reboot the OS to recover from those errors.
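    As an aid to reading the level ranges above, the following Python sketch checks whether a partition's OS level falls inside one of the listed failing ranges.  The function names are hypothetical, and comparing levels field by field as integers is an assumption about how the ranges are meant to be read.

```python
# Failing ranges copied from the fix description above (inclusive bounds).
AFFECTED_RANGES = {
    "AIX": [("7100-02-06", "7100-02-07"), ("6100-08-06", "6100-08-07")],
    "VIOS": [("2.2.2.6", "2.2.2.70")],
}


def parse_level(text):
    """Split a level such as '7100-02-06' or '2.2.2.6' into a tuple of ints."""
    return tuple(int(part) for part in text.replace("-", ".").split("."))


def is_affected(os_name, level):
    """Return True if the level lies inside a listed failing range.

    Assumption: levels are compared numerically, field by field, and any
    extra trailing fields on the partition's level are ignored.
    """
    current = parse_level(level)
    for low, high in AFFECTED_RANGES.get(os_name, []):
        low_t, high_t = parse_level(low), parse_level(high)
        if low_t <= current[:len(low_t)] <= high_t:
            return True
    return False


print(is_affected("AIX", "7100-02-06"))   # inside the first AIX range
print(is_affected("VIOS", "2.2.3.4"))     # the circumvention level, outside the range
```

    The sketch only reflects the levels listed in this fix description; it does not replace the Fix Level Recommendation Tool (FLRT).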
SC840_118_056 / FW840.23

07/28/16
Only HIPER/Deferred fix descriptions are displayed for this service pack. 
The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
Impact: Data            Severity:  HIPER

System firmware changes that affect certain systems

  • HIPER/NON-PERVASIVE: DEFERRED:  On systems with DDR4 memory installed, a problem was fixed for the handling of data errors in the L4 cache.   If a data error occurs in the L4 cache of the memory buffer on an affected system and it is pushed out to mainline memory, the data error will not be correctly handled.   A data error originating in the L4 cache may result in incorrect data being stored into memory.  The DDR4 DRAM has feature code (FC) EM8Y for a 256GB 1600 MHz CDIMM.
    At this firmware level, DDR4 and DDR3 memory cannot be mixed in the system.  At FW860.10, DDR4 and DDR3 can be mixed in a system, but each system node must have either DDR3 or DDR4 only.
    IBM strongly recommends that the customer should plan an outage to install the firmware fix immediately.  Fix activation requires a subsequent platform IPL following the installation of the firmware fix to eliminate any exposure to this issue.
SC840_113_056 / FW840.22

07/06/16
Impact:  Availability      Severity:  ATT
SC840_111_056 / FW840.21

06/24/16
Impact:  Availability      Severity:  SPE
SC840_104_056 / FW840.20

05/31/16
Only Deferred fix descriptions are displayed for this service pack. 
The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
Impact:  Availability      Severity:  SPE

System firmware changes that affect all systems

  • DEFERRED:  A problem was fixed in the dynamic RAM (DRAM) initialization to update the VREF on the DIMMs to the optimal settings and to add an additional margin check test to improve the reliability of the DRAM by screening out more marginal DIMMs before they can result in a run-time memory fault.

System firmware changes that affect certain systems

  • DEFERRED:  On systems using PowerVM firmware, a performance improvement was made by disabling the Hot/Cold Affinity (HCA) hardware feature, which gathers memory usage statistics for consumption by partition operating system memory management algorithms.  The statistics gathering can, in rare cases, cause performance to degrade.  The workloads that may experience issues are memory-intensive workloads that have little locality of reference and thus cannot take advantage of hardware memory cache.  As a consequence, the problem occurs very infrequently or not at all except for very specific workloads in a HPC environment.  This performance fix requires an IPL of the system to activate it after it is applied.
  • DEFERRED:  On systems using 256GB DDR4 DIMMs, a problem was fixed in the 3DS packaging that could result in a recoverable memory error.  This fix requires an IPL of the system to take effect.  Any system with DDR4 DIMMs should be re-IPLed at the next opportunity after applying this service pack to provide the best running conditions for reliable operation of the DDR4 DIMMs.
SC840_087_056 / FW840.11

03/18/16
Impact:  Availability      Severity:  ATT
SC840_079_056 / FW840.10

03/04/16
Impact:  Availability      Severity:  SPE
SC840_056_056 / FW840.00

12/04/15
Impact:  New      Severity:  New

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: SC830_123.


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: SCxxx_yyy_zzz

Where xxx = release level
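
The naming pattern above can be read programmatically.  The Python sketch below extracts the release level from a level name and compares the installed and new release levels.  The function names are hypothetical; the yyy and zzz fields are carried through without interpretation because this section defines only xxx, and accepting the shorter form shown in section 4.0 (for example SC830_123) is an assumption.

```python
import re

# Two-letter prefix, then xxx, yyy, and an optional zzz field.
LEVEL_PATTERN = re.compile(r"([A-Z]{2})(\d{3})_(\d{3})(?:_(\d{3}))?")


def parse_level_name(name):
    """Return (prefix, xxx, yyy, zzz) for a name such as 'SC840_168_056'.

    zzz is None when the name has only two numeric fields.
    """
    match = LEVEL_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized firmware level name: {name!r}")
    prefix, xxx, yyy, zzz = match.groups()
    return prefix, int(xxx), int(yyy), None if zzz is None else int(zzz)


def same_release(installed, new):
    """True when both names share the same prefix and release level (xxx)."""
    return parse_level_name(installed)[:2] == parse_level_name(new)[:2]


print(parse_level_name("SC840_168_056"))            # (‘SC’, 840, 168, 56) as a tuple
print(same_release("SC840_168_056", "SC840_177_056"))
print(same_release("SC830_123", "SC840_168_056"))
```

Whether the two levels share a release level only indicates which installation method applies; the linked instructions remain the authority on the procedure itself.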

Instructions for installing firmware updates and upgrades can be found at http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8ha1/updupdates.htm

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central: 
http://www-933.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

7.0 Firmware History

The complete Firmware Fix History for this Release Level can be reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html

8.0 Change History

Date                 Description
December 08, 2017    Fix Description update for SC840_177 / FW840.60.
November 27, 2017    Fix Description update for SC840_168 / FW840.50.
October 24, 2017     Fix list correction for firmware level SC840_177_056 / FW840.60.