Power8 System Firmware

Applies to:   8247-21L; 8247-22L; 8247-42L; 8284-22A; 8286-41A and 8286-42A.

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.


1.0 Systems Affected

This package provides firmware for Power System S812L (8247-21L), Power System S822L (8247-22L), Power System S824L (8247-42L), Power System S822 (8284-22A), Power System S814 (8286-41A) and Power System S824 (8286-42A) servers only.

The firmware level in this package is: SV810_126 / FW810.31

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" that the System Firmware requires to complete the firmware installation process. The HMC level must be equal to or higher than the "Minimum HMC Code Level" before the system firmware update is started.  If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code levels for this firmware are:

        HMC V8 R8.1.0 Service Pack 1 (PTF MH01420) with Security fix (PTF MH01474) and OPENSSL POODLE Security fix (PTF MH01481), or higher.

        -OR-

        HMC V8 R8.2.0 (PTF MH01453) with Mandatory fix (PTF MH01454) and OPENSSL POODLE Security fix (PTF MH01486), or higher.

NOTE: For the firmware installation to proceed, the HMC must be updated to one of the above minimum levels prior to installing this server firmware level.

Although the Minimum HMC Code level for this firmware is listed above, HMC V8 R8.2.0 Service Pack 1 (PTF MH01455) with security fixes (PTF MH01515 and MH01521) or higher is recommended.

For information concerning HMC releases and the latest PTFs, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/

For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home

NOTES:
                - You must be logged in as hscroot in order for the firmware installation to complete correctly.
                - Systems Director Management Console (SDMC) does not support this System Firmware level.

2.0 Important Information

Note: The installation of this Service Pack (SV810_126 / FW810.31) is concurrent if your system is at firmware level SV810_081 / FW810.10, SV810_087 / FW810.11, SV810_101 / FW810.20, SV810_108 / FW810.21 or SV810_124 / FW810.30.  Otherwise the installation will be disruptive.

Recently, several enhancements were released to improve the reliability and function of new and existing adapters used on Power8 systems. To ensure the highest level of availability and performance, it is important that the following System Firmware, IO, AIX & VIOS maintenance is performed.  For efficiency, IBM recommends that all applicable System Firmware, IO, AIX & VIOS maintenance is consolidated and performed during the same session to reduce the number of scheduled maintenance windows.

System F/W: SV810_081 / FW810.10 (or higher)
- For systems in PowerVM mode, a problem was fixed for unresponsive PCIe adapters after a partition power off or a partition reboot.

I/O:
- Device: PCIe2 4-Port (10GbE SFP+ & 1GbE RJ45) Adapter
   Feature Codes: EN0S EN0T EN0U EN0V
   Version: 30090140 (or higher)
   An enhancement added to support Network Installation on 1GB speed switch ports.

- Device: PCIe2 2-Port 10GbE Base-T Adapter
   Feature Codes: EN0W EN0X
   Version: 20110140 (or higher)
   Fixes a Network Installation issue seen with 1GB speed switch port setting.

AIX/VIOS:
- VIOS 2233/61 TL09 SP3: IV63449
- AIX 71 TL03 SP03: IV63680

For Power8 systems using NIC adapter Feature Codes (FC) EN0U, EN0V, EN0S, EN0T, EL3Z, EN0W, EN0X which translate to:
PCIe2 4-Port Adapter (10GbE SFP+)
PCIe2 4-Port Adapter (1GbE RJ45)
PCIe2 2-Port 10GbE Base-T Adapter

These APARs correct a problem that occurs when promiscuous mode is not set when the adapter gets reset (e.g. when the adapter becomes the backup in SEA failover mode or encounters a transmit error). This would cause the adapter to transmit packets but not receive them.

Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

The concurrent firmware update will cause the system fan speeds to accelerate to maximum RPMs with loud noise emissions.  This increased fan level and loud sound level will persist for several minutes while the service processor is reset and the new firmware level is activated.  Thereafter, the fan speeds will gradually adjust back to normal operating speed and sound levels.

Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.

Additional information can be found at:
http://www-01.ibm.com/support/knowledgecenter/8286-42A/p8hat/p8hat_lparmemory.htm
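
The 8% rule of thumb above can be turned into a quick planning calculation. The following is a minimal sketch only: the function name and the use of gigabytes are assumptions made for illustration, and the result is an approximate upper estimate, not a value reported by the firmware. It does not account for the model-specific absolute minimums mentioned above.

```python
def estimate_firmware_memory_gb(installed_gb: float, fraction: float = 0.08) -> float:
    """Rough estimate of memory used by server firmware.

    Applies the "approximately 8% of installed memory" guideline.
    The actual amount required will generally be lower.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction


if __name__ == "__main__":
    # Hypothetical system with 256 GB of installed memory.
    print(f"{estimate_firmware_memory_gb(256):.2f} GB")
```

For a hypothetical server with 256 GB installed, the estimate is roughly 20 GB set aside for server firmware.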


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

01SVxxx_yyy_zzz

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01SV810_040_040 and 01SV820_040_045 are different service packs.

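An installation is disruptive if:

The release levels (xxx) are different.

            Example: Currently installed release is 01SV810_040_040, new release is 01SV820_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

            Example: SV810_040_040 is disruptive, no matter what level of SV810 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

            Example: Currently installed service pack is SV810_040_040 and new service pack is SV810_050_045.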

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is SV810_040_040, new service pack is SV810_071_040.
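
The rules and examples in this section can be expressed as a short check. The sketch below is illustrative only: the names FirmwareLevel, parse_level and is_concurrent are hypothetical and are not part of any IBM tool. It assumes level names of the form 01SVxxx_yyy_zzz (the "01" prefix and an ".rpm" suffix are treated as optional), and it treats a service pack whose yyy and zzz values are equal as always disruptive, following the SV810_040_040 example above.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # xxx, the release level
    service_pack: int     # yyy, the service pack level
    last_disruptive: int  # zzz, the last disruptive service pack level


# Accepts "01SV810_040_040", "SV810_040_040" and "01SV810_040_040.rpm".
_LEVEL_RE = re.compile(r"^(?:01)?SV(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a firmware level name into its xxx, yyy and zzz parts."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return FirmwareLevel(match.group(1), int(match.group(2)), int(match.group(3)))


def is_concurrent(installed: str, new: str) -> bool:
    """Return True if installing `new` over `installed` would be concurrent."""
    cur = parse_level(installed)
    nxt = parse_level(new)
    if cur.release != nxt.release:
        return False  # different release level: disruptive
    if nxt.service_pack == nxt.last_disruptive:
        return False  # the new service pack is itself a disruptive level
    # Concurrent only if the installed yyy has reached the new pack's zzz.
    return cur.service_pack >= nxt.last_disruptive


if __name__ == "__main__":
    print(is_concurrent("SV810_040_040", "SV810_071_040"))
    print(is_concurrent("SV810_040_040", "SV810_050_045"))
```

This check applies only to HMC-managed systems, because installation on systems without an HMC is always disruptive.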

3.1 Firmware Information and Description

 
Filename                 Size        Checksum
01SV810_126_081.rpm      90611472    49397

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01SV810_126_081.rpm
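
When the check is scripted, the output of the sum command can be compared with the published value. The following is a minimal sketch: the function name checksum_matches is hypothetical, and it assumes the checksum is the first whitespace-separated field of the sum output. The block count in the sample line is illustrative only and is not taken from this document.

```python
def checksum_matches(sum_output: str, expected: str) -> bool:
    """Compare the first field of `sum` output with the listed checksum.

    Only the first 5 digits are compared, because only the first
    5 digits of the checksum are listed for the package.
    """
    fields = sum_output.split()
    if not fields:
        raise ValueError("empty sum output")
    return fields[0][:5] == expected[:5]


if __name__ == "__main__":
    # Sample line; the second field (block count) is a made-up value.
    sample = "49397 88488 01SV810_126_081.rpm"
    print(checksum_matches(sample, "49397"))
```

If the values differ, the file was probably corrupted in transfer and should be downloaded again before the update is attempted.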

SV810
For Impact, Severity and other Firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html
SV810_126_081 / FW810.31

07/08/15
Impact: Usability       Severity: ATT

System firmware changes that affect all systems

  • A problem was fixed for an In-band firmware update exhibiting a 45-minute delay (using the i5 PTF process or the update_flash utility for AIX or Linux) from FW810 firmware to FW830.  During the delay of the new code level activation, SRC D133C002 is displayed on the operations panel.  This delay occurred because an updated level for the Digital Power Systems Sweep (DPSS) chip was needed but the power off to do the DPSS update hung until a time-out allowed the power off operation to complete.  The firmware update to FW830.00 was successful after the 45-minute delay.  The firmware updates done with the Hardware Management Console (HMC) will not experience this 45-minute delay.
SV810_124_081 / FW810.30

05/29/15
Impact: Availability    Severity: SPE

New features and functions

  • Support for setting Power Management Tuning Parameters from the Management Console (Fixed Maximum Frequency (FMF), Idle Power Save, and DPS Tunables) without needing to use the Advanced System Management Interface (ASMI) on the service processor.  This allows FMF mode to be set by default without having to modify any tunable parameters using ASMI.
  • Support was added for a new menu for the Advanced System Management Interface (ASMI) that is used to reset/reload the service processor.  A reset/reload or "soft reset" maintains the state of the hypervisor and the operating systems running in the partitions while rebooting the service processor so it can recover from service processor errors.  The menu that does this function is called "System Service Aids/Soft Reset Service Processor."
  • Support was added to the Advanced System Management Interface (ASMI) to display Anchor card VPD failures in the "Deconfigurations records" menu.
  • Support for the Nvidia Compute Intensive Accelerator (PCIe attached 300W GPU) with F/C #EC4B.  This feature is only supported on the IBM Power System S824L (8247-42L).  It is a PCIe 3 X16/Long/Full High/Double wide adapter with the PCIe connection in the left slot and overlaps another PCIe slot.  This feature ships with an auxiliary power cord used inside the system to support the 300W card.

System firmware changes that affect all systems

  • A problem was fixed for systems with a corrupted date of "1900" showing for the Update Access Key (UAK).  The firmware update is allowed to proceed on systems with a bad UAK date because the fix is in an emergency service pack.  After the fix is installed, the user should correct the UAK date, if needed, by using the original UAK for the system.  On the Management Console, enter the original update access key via the "Enter COD Code" panel.  Alternatively, on the Advanced System Management Interface (ASMI), enter the original update access key via the "On Demand Utilities/COD Activation" panel.
  • A problem was fixed for the iptables process consuming all available memory, causing an out of memory dump and reset/reload of the service processor.
  • A problem was fixed for a CEC IPL hang failure with CEC Hardware Subsystem SRC UE B150BE14 when having persistent L2/L3 cache memory errors.  The IPL was stuck in a loop with progress codes C1C3C200 through C1C3C213 and having repeating error log informational SRCs of HostBoot BC8A1402 and Processor Unit (CPU) BC13E504.  With the fix, the failing core chiplet is guarded out and the IPL is able to complete.
  • A problem was fixed for the NEBS DC power supply showing up in the part inventories for the CEC as "IBM AC PS".  The description string has been changed to "IBM PS" as power supplies can be of DC or AC type.
  • A problem was fixed for missing hardware callouts in Vital Product Data (VPD) error logs.
  • A problem was fixed for the callouts for a checkstop with SRC B111E504 with PBCENTFIR[5] of PB_CENT_CRESP_ADDR_ERROR so that FSPSP16 is added as the high priority callout.  This checkstop is most likely caused by software error, not hardware.
  • A problem was fixed for SRC B1104800 having duplicate FRU call outs for the PNOR flash FRU.
  • A problem was fixed in the hardware server to prevent a UE B181BA07 abort when a host boot dump collection is in progress.
  • A problem was fixed for the unnecessary guarding of DIMMs for a memory bus error for SRC Memory Card/FRU B124E504.  The error recovery has been improved so that DIMMs are not guarded and the failing memory bus lane is replaced by the spare memory bus data lane.
  • A problem was fixed for a processor core unit being deconfigured but not guarded for an SRC B113E504 processor error in host boot with fault isolation register (FIR) code "RC_PMPROC_CHKSLW_NOT_IN_ETR" that caused the CEC to go to termination.  By guarding the failed processor core, the fix ensures the core is not used on the re-IPL of the CEC.
  • A problem was fixed for the On Chip Controller (OCC) taking the system into safe mode under certain work loads by increasing the time allowed for getting an update of the Analog Power Subsystem Sweep (APSS) data for current temperatures and power consumption.  If the OCC does not get data from the APSS within its time-out period,  the OCC will go to safe mode and run the processor at a minimum frequency.
  • A problem was fixed for intermittent firmware database errors that logged a UE SRC of B1818611 and had a fwdbServer core dump.
  • A problem was fixed for an intermittent reset/reload of the service processor during the early part of an IPL with SRC B1814616 logged.
  • A problem was fixed that prevented a second management console from being added to the CEC.  In some cases, network outages caused defunct management console connection entries to remain in the service processor connection table, making connection slots unavailable for new management consoles.  A reset of the service processor could be used to remove the defunct entries.
  • A problem was fixed for a false guarding and call out of a PSI link with SRC B15CDA27.  This failure is very infrequent but sometimes seen after the reset/reload of the service processor during a concurrent firmware update.   Since there is no actual hardware failure, a manual unguarding of the PSI link allows it to be reused.
  • A problem was fixed for performance dumps to speed its processing so it is able to handle partitions with a large number of processors configured.  Previously, for large systems, the performance dump took too long in collecting performance data to be useful in the debugging of some performance problems.
  • A problem was fixed for a CEC power off error with SRC B1818903 logged.  The error causes a dump and reset of the service processor that allows the power off operation to complete.
  • A problem was fixed for firmware update to be able to do a code update downgrade from an SV830 release to SV810.  This error causes the service processor to go to a stopped state, with a user power cycle needed to recover to the P-side, which will be correctly at the SV810 level.
  • A problem was fixed for missing "fastarray" data in hardware dump type HWPROC.  The "fastarray" contains debug information for the processor cores.
  • A problem was fixed for the Dynamic Power Saving (DPS) mode where, when favoring performance,  the system instead favored lower power use.   A work-around for the problem is to use the Advanced System Management Interface (ASMI) menu of System Configuration/Power Management/Tuning parameters to change the parameter labelled "Utilization threshold to determine active cores with slack" to 10.0%.
  • A problem was fixed for the Automatic Power On Policy (APOR) where the system failed to re-IPL after an AC power loss.  The APOR process needed to wait longer for the AC fault to clear before doing the IPL retry.
  • A problem was fixed for the Advanced System Management Interface (ASMI) IPv4 Network Configuration where the IP address was being overwritten by the value in the subnet mask field for the initial values of the panel.  If the network configuration was saved without fixing the IP address, the wrong IP address was also saved.
  • A problem was fixed for missing call outs when having multiple "Memory Card/FRU" failures with SRC B124E504.  There is a call out for the first memory FRU of the failures but any other memory FRUs failing at the same time are not reported.
  • A problem was fixed for errors during a CEC power off with SRCs B1812616 and B1812601.  These occurred if the CEC was powered off immediately after a power on such that the On-Chip Controllers (OCCs) had to shutdown during their initialization.
  • A problem was fixed for a highly intermittent IPL failure with SRC B18187D9 caused by a defunct attention handler process.  For this problem, the IPL will continue to fail until the service processor is reset.
  • A security vulnerability, commonly referred to as GHOST, was fixed in the service processor glibc functions gethostbyname() and gethostbyname2() that allowed remote users of the functions to cause a buffer overflow and execute arbitrary code with the permissions of the server application.  There is no way to exploit this vulnerability on the service processor but it has been fixed to remove the vulnerability from the firmware.  The Common Vulnerabilities and Exposures issue number is CVE-2015-0235.
  • A security problem in GNU Bash was fixed to prevent arbitrary commands hidden in environment variables from being run during the start of a Bash shell.  Although GNU Bash is not actively used on the service processor, it does exist in a library so it has been fixed.  This is IBM Product Security Incident Response Team (PSIRT) issue #2211.  The Common Vulnerabilities and Exposures issue numbers for this problem are CVE-2014-6271, CVE-2014-7169, CVE-2014-7186, and CVE-2014-7187.
  • A security problem was fixed in OpenSSL where the service processor would, under certain conditions, accept Diffie-Hellman client certificates without the use of a private key, allowing a user to falsely authenticate.  The Common Vulnerabilities and Exposures issue number is CVE-2015-0205.
  • A security problem was fixed in OpenSSL to prevent a denial of service when handling certain Datagram Transport Layer Security (DTLS) messages.  A specially crafted DTLS message could exhaust all available memory and cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number is CVE-2015-0206.
  • A security problem was fixed in OpenSSL to prevent a denial of service when handling certain Datagram Transport Layer Security (DTLS) messages.  A specially crafted DTLS message could cause a null pointer de-reference and cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number is CVE-2014-3571.
  • A security problem was fixed in OpenSSL to fix multiple flaws in the parsing of X.509 certificates.  These flaws could be used to modify an X.509 certificate to produce a certificate with a different fingerprint without invalidating its signature, and possibly bypass fingerprint-based blacklisting.  The Common Vulnerabilities and Exposures issue number is CVE-2014-8275.
  • A security problem was fixed in the OpenSSL (Secure Socket Layer) protocol that allowed a man-in-the-middle attacker, via a specially crafted fragmented handshake packet, to force a TLS/SSL server to use TLS 1.0, even if both the client and server supported newer protocol versions. The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3511.
  • A security problem was fixed in OpenSSL for formatting fields of security certificates without null-terminating the output strings.  This could be used to disclose portions of the program memory on the service processor.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3508.
  • Multiple security problems were fixed in the way that OpenSSL handled Datagram Transport Layer Security (DTLS) packets.  A specially crafted DTLS handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2014-3505, CVE-2014-3506 and CVE-2014-3507.
  • A security problem was fixed in OpenSSL to prevent a denial of service when handling certain Datagram Transport Layer Security (DTLS) ServerHello requests.  A specially crafted DTLS handshake packet with an included Supported EC Point Format extension could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3509.
  • A security problem was fixed in OpenSSL to prevent a denial of service by using an exploit of a null pointer de-reference during anonymous Diffie Hellman (DH) key exchange.  A specially crafted handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3510.
  • A problem was fixed for an intermittent problem in a CEC IPL where an On-Chip Controller is stuck in a reset loop, logging repeated SRCs for B1702A17, and eventually places the CEC in safe mode, running at minimum processor clock frequencies.
  • A problem was fixed for NVRAM initialization to support a service processor side switch after an in-band firmware downgrade from SV830 to SV810.  A service processor failure with  SRC B1817212 occurs on a side switch after the downgrade (side switching from SV810 to SV830).  This happens because of the difference in size of the NVRAM used between SV810 and SV830 with a need for more NVRAM initialization on level SV830.  This problem does not affect out of band firmware downgrades to a new release using the Management Console because in that case a code update accept automatically occurs and the T and P sides are updated to the same SV810 release level.
  • A problem was fixed for an error on a re-IPL of a powered-on CEC that fails with a time-of-day topology error with SRC B111BA24.
  • A problem was fixed to provide a service alert for failed VPD on the anchor card.  Previously, only an informational (INF) SRC B155A435 was generated for this failure.  Now the SRC has been made a predictive error (PE) and the failed anchor card VPD is guarded and ready for service.
  • A problem was fixed for the clearing of all guard records associated with one error log entry.  If a FRU is replaced for any of the related guard records, all the related guard records are cleared.  Previously, only the guard record for the replaced FRU was cleared and the association was lost.
  • A problem was fixed to reduce switching noise on the memory address bus for DIMMs.  Noise on the bus could cause a failure for a marginal DIMM, so this fix has the effect of potentially improving the reliability of the memory.
  • A fix was made to prevent processor speculative memory loads from the service processor mailbox Direct Memory Access (DMA) area in the CEC memory.  The speculative loads caused memory cache faults and system checkstops with SRC B181E540.

System firmware changes that affect certain systems

  • For a system with a degraded power supply,  a problem was fixed so that inaccurate output voltage levels would be handled by the Voltage Regulator Modules (VRMs) and not cause a system failure.
  • For a system with a missing or broken operations panel, a problem was fixed for excessive logging of SRC B181A734 for the error condition.
  • On systems using Virtual IO Server (VIOS) with the partitions,  a problem was fixed for a mainstore dump (MSD) failure with SRC B2005123 when it attempted to write to a loadsource DASD connected via VIOS.  VIOS was unable to handle the I/O write request exceeding 256K.
  • On systems using OPAL, the time-outs for errors on the PCIe Host Bridge (PHB) were increased to allow time for PCIe link error recoveries to complete where possible to reduce partition and system errors caused by link errors.
  • A problem was fixed for a PowerVM hypervisor hang after a processor core and system checkstop.  The failed processor core was not put into a guarded state and the hypervisor hung when it tried to use the failed core.
  • On systems using Field Core Override (FCO) feature code #2319 to reduce the number of available cores, a problem was fixed where failed cores were not being replaced by unconfigured cores, causing the system to fail to IPL with a no cores available condition.  The fix now allows unconfigured cores to be substituted for licensed cores that have failed.
  • On systems using OPAL, a problem was fixed for an unnecessary guarding of a processor core on a L2 or L3 cache error.  This error was caused by an errant attempt to repair the cache using an operation that is not supported on OPAL.  Guarding of the processor core on OPAL now only occurs after a daily threshold of cache errors is exceeded instead of guarding on the second cache error for the core.
  • On systems using PowerVM, a problem was fixed for the handling of the error of multiple cache hits in the instruction effective-to-real address translation cache (IERAT).  A multi-hit IERAT error was causing system termination with SRC B700F105.  The multi-hit IERAT is now recognized by the hypervisor and reported to the OS where it is handled.
  • On systems using PowerVM, a problem was fixed to prevent a hypervisor task failure with a B7000602 SRC logged, if multiple resource dumps running concurrently run out of dump buffer space. The failed hypervisor task could prevent basic logical partition operations from working, potentially leading to an Incomplete state on the Management Console.
  • On systems using PowerVM, a problem was fixed for partitions going back to Epoch Time (1970) after a real-time clock (RTC) battery replacement.  If the RTC battery is replaced and the correct time is set using the Advanced System Management Interface, the partitions end up with the wrong time based in 1970.
  • On systems using PowerVM, a problem was fixed to allow partitions to recover PCIe links from multiple link errors occurring at the same time.  The only recovery without the fix would be to re-IPL the CEC.
  • On systems using PowerVM, a problem was fixed for partitions with Virtual Trusted Platform Module (VTPM) resources so they could restart partitions after a CEC power off and power on sequence without hanging at progress code C2006009.
  • On systems using PowerVM, a problem was fixed to fully deconfigure cores that have cache repair failures so they cannot be referenced by an On-Chip Controller (OCC) reset.  This will prevent an OCC reset failure because of the failed cores, logged with SRC B1112AB4 and BC82203B, that forces the OCC into safe mode (minimum processor clock frequency) for all of its remaining cores.  A CEC re-IPL is needed to get an OCC out of safe mode.
  • On systems using Virtual IO Server (VIOS) to share physical I/O resources among client logical partitions using virtual Small Computer System Interface (vSCSI) adapters, a problem was fixed that prevented the VIOS from accessing storage hosted by a physical adapter that had storage mapped to a vSCSI adapter.  The VIOS showed errors on disks under that physical adapter and was unresponsive.  To recover from this problem, the VIOS must be rebooted.
  • On systems using the Virtual I/O Server (VIOS) to share physical I/O resources among client logical partitions, a problem was fixed for memory relocation errors during page migrations for the virtual control blocks.  These errors caused a CEC termination with SRC B700F103.  The memory relocation could be part of the processing for the Dynamic Platform Optimizer (DPO), Active Memory Sharing (AMS) between partitions, mirrored memory defragmentation, or a concurrent FRU repair.
  • On systems using PowerVM, a problem was fixed for the PCIe Host Bridge (PHB) error recovery process which failed, causing the PCIe slots to fail.  The recovery process has been enhanced to allow for delays caused by active power bus operations during the recovery and to handle recovery from simultaneous PCIe switch and PHB errors.  A CEC re-IPL is needed to get the failed PCIe slots working again.
  • On systems using PowerVM, a problem was fixed that could result in unpredictable behavior if a memory UE is encountered while relocating the contents of a logical memory block during one of these operations:
    - Reducing the size of an Active Memory Sharing (AMS) pool.
    - On systems using mirrored memory, using the memory mirroring optimization tool.
    - Performing a Dynamic Platform Optimizer (DPO) operation.
  • On systems using Virtual Shared Processor Pools (VSPP), a problem was fixed for an inaccurate pool idle count over a small sampling period.
  • On systems using PowerVM and Virtual Trusted Platform Module (VTPM) partitions,  a problem was fixed for a Management Console error that occurred while restoring a backup profile that caused the system to go to the Management Console "Incomplete state".  The failed system had a suspended VTPM partition and a B7000602 SRC logged.
  • On systems using PowerVM, a problem was fixed for a partition deletion error on the Management Console with error code 0x4000E002 and message "...insufficient memory for PHYP".  The partition delete operation has been adjusted to accommodate the temporary increase in memory usage caused by memory fragmentation, allowing the delete operation to be successful.
  • On systems using PowerVM, a problem was fixed for Live Partition Mobility (LPM) migrations of Linux partitions running in P8 compatibility mode.  After an active migration, the resumed partition may experience performance degradation.
  • On systems using PowerVM, a problem was fixed for a false error message with error code 0x8006 when creating a virtual ethernet adapter with the Integrated Virtualization Manager (IVM).  The error message can be ignored as the virtual ethernet slot is fully functional.
  • On systems using PowerVM with a PCIe 3D graphics adapter (F/C #EC41 or #EC42) in a partition, a problem was fixed for a partition hang or BA21xxxx error conditions during partition initialization.
  • On systems using PowerVM, a problem was fixed for the Live Partition Mobility (LPM) migration of virtual devices to a Power8 system to update each virtual device location code correctly to reflect the location code in the target system instead of the location code in the source system.  This problem prevented the management console from being able to look up AIX Object Data Manager (ODM) names for the virtual devices so that operations such as remove on the device could not be performed.
  • On systems using PowerVM with a Linux partition, a problem was fixed for the Linux "lsslot" command so that it is able to find the F/C EC41 and EC42 PCIe 3D graphics adapter installed in the CEC, instead of showing the slot as "empty".  The Linux graphics adapter worked correctly even though it showed as "empty".
  • On systems using PowerVM, support was added for USB 2.0 HUBs so that a keyboard plugged into the USB 2.0 HUB will work correctly at the SMS menus.  Previously, a keyboard plugged into a USB 2.0 HUB was not a recognized device.
  • On systems using PowerVM,  a problem was fixed for a hypervisor deadlock that results in the system being in an "Incomplete state" as seen on the management console.  This deadlock is the result of two hypervisor tasks using the same locking mechanism for handling requests between the partitions and the management console.  Except for the loss of the management console control of the system, the system is operating normally when the "Incomplete state" occurs.
  • On systems using OPAL firmware,  a problem was fixed for Coherent Accelerator Processor Interface (CAPI)  devices not being available to the partitions after a re-IPL of a CEC with power on.
  • On systems using OPAL firmware, a problem was fixed to support a kdump of a baremetal Little Endian (LE) kernel using XPS mounts to prevent a hang in Big Endian (BE) Petitboot.  For this problem, there was an endian switch on the re-mount of the XPS and Petitboot was unable to read the XPS logs to do recovery.  Petitboot now mounts the XPS file system read-only with no recovery: "-o ro,norecovery" to prevent the problem.
  • On systems using OPAL firmware, a problem was fixed in Petitboot for the default selection of the OS to use the first grub entry if no matching OS labels are found in the grub configuration file.  Previously, if a grub label did not match, the user had to manually select the OS and boot it.
  • On systems using OPAL firmware,  a security problem was fixed to prevent an out-of-bounds read in the glibc's iconv() function when converting certain encoded data to UTF-8.  This could cause a crash of OPAL.  The Common Vulnerabilities and Exposures issue number is CVE-2014-6040.
  • On systems using OPAL firmware,  a security problem was fixed for Name Service Switch (NSS) to prevent a denial of service attack from an application performing key based look-ups on a database in an infinite loop.   The Common Vulnerabilities and Exposures issue number is CVE-2014-8121.
  • On systems using OPAL firmware,  a security problem was fixed for the snap utility of powerpc-utils to prevent plain text passwords from being extracted from archives containing configuration snapshots of services.  The Common Vulnerabilities and Exposures issue number is CVE-2014-4040.
  • On systems using OPAL firmware,  a problem was fixed for the OPAL lsdevinfo command as it did not correctly process the path to the device, which made the path unreadable in the output.  With the fix, the path is displayed correctly.
  • On systems using OPAL firmware, a problem was fixed for Resource Monitoring and Control (RMC) failing and going inactive after several OPAL Linux partition migrations.  The validation operations failed when the Machine, Type, Model, and Serial number (MTMS) were set incorrectly.
  • On systems using OPAL firmware, a problem was fixed for the OPAL drmgr utility so it correctly gathers Logical Memory Block (LMB) information while performing Memory Dynamic Logical Partitioning (DLPAR) on the little-endian variation of the Power processor.
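The Petitboot change described above (mounting read-only with no log recovery) can be illustrated with a minimal sketch.  Only the option string "-o ro,norecovery" comes from the fix description; the function name, device path and mount point below are hypothetical, and the sketch only builds the command line without running it.

```python
def read_only_norecovery_mount(device, mount_point):
    """Build a mount command that is read-only and skips log recovery.

    The option string is quoted from the fix description; device and
    mount_point are hypothetical placeholders supplied by the caller.
    """
    return ["mount", "-o", "ro,norecovery", device, mount_point]


# Hypothetical device and mount point, for illustration only.
cmd = read_only_norecovery_mount("/dev/sda2", "/mnt/boot")
print(" ".join(cmd))
```

Because the file system log is never replayed, a mount of this kind does not depend on the byte order in which the log was written.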
SV810_108_081 / FW810.21

01/09/15
Impact: Security         Severity:  SPE

System firmware changes that affect all systems

  • A problem was fixed to prevent the Advanced System Management Interface (ASMI) "System Service Aids/Factory Configuration" panel option from restoring to factory configuration for FSP or ALL if one boot side of the service processor is marked invalid.  The following informational message is issued:  "The request cannot be performed because a firmware boot side is marked invalid.  This state may have been caused by a previous firmware update failure."
  • A problem was fixed for firmware updates from USB to allow the code update progress to be seen with the addition of progress code C100B100.  This progress code means that the firmware update is busy unpacking the firmware image file and that the USB key should not be removed until the operation is completed.
  • A security problem was fixed in OpenSSL for padding-oracle attacks known as Padding Oracle On Downgraded Legacy Encryption (POODLE).  This attack allows a man-in-the-middle attacker to obtain a plain text version of the encrypted session data. The Common Vulnerabilities and Exposures issue number is CVE-2014-3566.  The service processor POODLE fix is implemented by disabling SSL protocol SSLv3 and requiring TLSv1.2 protocol on all secured connections.  The Hardware Management Console (HMC) also requires a POODLE fix for APAR MB03867(FIX FOR CVE-2014-3566 FOR HMC V8 R8.1.0 SP1 with PTF MH01481).  This HMC minimum requirement is enforced by the firmware update process for this defect.
  • A security problem was fixed in OpenSSL for memory leaks that allowed remote attackers to cause a denial of service (out of memory on the service processor). The Common Vulnerabilities and Exposures issue numbers are CVE-2014-3513 and CVE-2014-3567.
  • A problem was fixed for two light-emitting diodes (LEDs) turning on incorrectly on the operator panel after a system power off.  These LEDs are the blue LED (Identify) and the amber LED (enclosure fault indicator LED with the exclamation point symbol "!").
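Because the POODLE fix above disables SSLv3 and requires TLSv1.2 on secured connections to the service processor, a client that connects to it must be able to negotiate TLS 1.2.  The following is a minimal client-side sketch using the Python standard library; it is an illustration only and not the service processor implementation.  The function name is hypothetical, and treating TLS 1.2 as a minimum (rather than the only allowed version) is an assumption.

```python
import ssl


def make_tls12_client_context():
    """Return a client context that refuses SSLv3, TLS 1.0 and TLS 1.1."""
    # PROTOCOL_TLS_CLIENT enables certificate and host name checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: TLS 1.2 is treated as the lowest acceptable version.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_client_context()
print(ctx.minimum_version.name)
```

A context built this way fails the handshake against a peer that offers only SSLv3, which is the downgrade path the POODLE attack relies on.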

System firmware changes that affect certain systems

  • On systems with partitions using shared processors, a problem was fixed that could result in latency or timeout issues with IO devices.
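Each entry in this history is headed by a level identifier such as "SV810_108_081 / FW810.21".  The sketch below splits such an identifier into its fields.  The field names are assumptions based on the usual NNSSS_FFF_DDD naming convention (release, service pack level, last disruptive service pack level) and are not defined in this section; the function name is hypothetical.

```python
import re

# Two letters, three 3-digit groups, then the "FWxxx.yy" alias.
LEVEL_RE = re.compile(
    r"^([A-Z]{2})(\d{3})_(\d{3})_(\d{3}) / FW(\d{3})\.(\d{2})$"
)


def parse_level(text):
    """Split a level header line into named fields (names are assumed)."""
    match = LEVEL_RE.match(text.strip())
    if match is None:
        raise ValueError("not a firmware level identifier: %r" % text)
    prefix, release, level, disruptive, fw_release, fw_minor = match.groups()
    return {
        "prefix": prefix,                      # e.g. "SV"
        "release": release,                    # e.g. "810"
        "service_pack": level,                 # e.g. "108"
        "last_disruptive_level": disruptive,   # assumed meaning of field 3
        "fw_alias": "FW%s.%s" % (fw_release, fw_minor),
    }


info = parse_level("SV810_108_081 / FW810.21")
print(info["release"], info["service_pack"], info["last_disruptive_level"])
```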
SV810_101_081 / FW810.20

10/24/14
Impact: Availability    Severity: HIPER

New features and functions

  • Support for the IBM Power System S824L (8247-42L).
  • Support for NEBS-3 48VDC 750 W power supply with CCIN 51D8 and F/C #EB3H on the S822 (8284-22A) and the S822L (8247-22L).
  • Support for 128Gb CDIMM DDR3 DRAM with F/C #EM8E on the IBM Power System S824 (8286-42A).  These need to be ordered in pairs and each DIMM within a DIMM pair must be of the same capacity.
  • Support for the Nvidia Compute Intensive Accelerator (PCIe attached GPU) with F/C #EC47.  This feature is only supported on the IBM Power System S824L(8247-42L).  It is a PCIe 3 X16/Long/Full High/Double wide adapter with the PCIe connection in the left slot.
  • Support was added to enable fast sleep on OPAL systems, allowing for significant power savings.
  • Support for an Intelligent Platform Management Interface (IPMI) enhancement to provide a host Linux boot device path on OPAL systems.
  • Enhancement to the service processor dump for easier problem debugging by collecting full kcore dumps as a gzipped file instead of truncating the large kcore files.
  • Enhancement made to the Advanced System Management Interface (ASMI) "System Service Aids/Factory Configuration" menu to clear all firmware NVRAM for PowerVM and OPAL, regardless of the current firmware selection.  Previously, only the NVRAM for the current firmware type was cleared.
  • Support for additional PCIe adapters, which had previously been supported on Power7+ and earlier servers, to help with server migration:
        Ethernet 1 Gb LAN: 2-port UTP/TX (#5767, #5281), 2-port SX (#5768, #5274), and 4-port UTP/TX (#5717, #5271)
        Ethernet and FCoE: 2-port 10 Gb (#5708, #5270)
        SAS:  3-port 6 Gb/1.8 GB cache (#5913, #ESA3)

System firmware changes that affect all systems

  • A problem was fixed in the error handling of memory channel failures with SRC B181E540 to prevent false processor errors with SRC B113E504 during the next IPL after the memory fault.
  • A problem was fixed for L4 cache errors being assigned an incorrect subsystem of "Memory Controller" in the SRC B121E504 error log instead of "Memory Fru".    L4 cache resides on the DIMM and is not a memory controller.
  • A problem was fixed in the Advanced System Management Interface (ASMI)  "Performance Setup/Logical Memory Block Size" menu that prevented the user from selecting valid Logical Memory Block (LMB) sizes because they were greyed out.
  • A problem was fixed to capture missing trace data for the hardware compression accelerator (NX) checkstop failures to allow for easier debug of the failures.
  • A problem was fixed to add call outs for the operations panel FRU for SRCs B1504804 and B1504805 for operation panel failures.  The FRU call out had been missing in the error log.
  • A problem was fixed that caused the system to hang in the IPL state during a system dump with SRC B182901E shown in the error log.  The hang occurred when system dump detected a prior system dump already in place.  The second system dump would normally be bypassed to allow the IPL to complete.
  • A problem was fixed for the service processor error log handling that caused SRC B150BAC5 errors when converting an error log entry from an object into a flattened array of bytes.
  • A problem was fixed for truncated fan part numbers in the FRU call outs of SRC 110076111 so that 4U systems (8286-41A, 8286-42A, 8247-42L) have FRU 00FV629 for the 80 mm fan and the 2U systems (8284-22A, 8247-21L, 8247-22L)  have FRU 00FV726 for the 60 mm fan.  FRU 00FV62 and FRU 00FV72 were being incorrectly reported, showing the right-most character of the part number truncated.
  • A problem was fixed in the fault isolation of FRUs for errors in the Time Of Day (TOD) oscillator topologies and the processors to reduce the number of incorrect call outs.  When a problem is detected in a connection between the processor and TOD oscillator,  the oscillator is now called out with high priority and processor with low priority but neither is guarded to prevent unnecessary loss of system resources.
  • A problem was fixed with the DIMM pairing rules to ensure that only the one DIMM that is the paired mate of a failing or missing DIMM is guarded.  An error in the pairing rules was causing additional DIMMs to be called out and guarded in the case of a single DIMM failure.
  • A problem was fixed so that when a L2/L3 cache repair cannot be performed because there is no repair available, the error log written is a Predictive Error instead of a hidden Recoverable Error.  This improves the customer awareness that the processor cache is becoming degraded.
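The fan FRU assignment described above for SRC 110076111 can be written as a lookup table, which also makes a truncated part number easy to spot.  The machine types and FRU numbers are taken from the fix description; the function names are hypothetical and the check is only a sketch.

```python
FAN_FRU_4U = "00FV629"  # 80 mm fan
FAN_FRU_2U = "00FV726"  # 60 mm fan

FAN_FRU_BY_MODEL = {
    "8286-41A": FAN_FRU_4U,
    "8286-42A": FAN_FRU_4U,
    "8247-42L": FAN_FRU_4U,
    "8284-22A": FAN_FRU_2U,
    "8247-21L": FAN_FRU_2U,
    "8247-22L": FAN_FRU_2U,
}


def expected_fan_fru(model):
    """Return the fan FRU part number expected for a machine type-model."""
    return FAN_FRU_BY_MODEL[model]


def looks_truncated(reported_fru, model):
    """True if reported_fru is a shortened prefix of the expected FRU."""
    expected = expected_fan_fru(model)
    return reported_fru != expected and expected.startswith(reported_fru)


print(expected_fan_fru("8247-42L"))
print(looks_truncated("00FV62", "8286-41A"))
```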

System firmware changes that affect certain systems

  • HIPER/Pervasive:  On systems using PowerVM firmware, a performance problem was fixed that may affect shared processor partitions where there is a mixture of dedicated and shared processor partitions with virtual IO connections, such as virtual ethernet or Virtual IO Server (VIOS) hosting, between them.  In high availability cluster environments this problem may result in a split brain scenario.
  • On systems using OPAL firmware, a performance problem was fixed where the On-Chip Controller (OCC) failed to establish a session to OPAL, resulting in all the system processors being set to minimum (safe mode) frequencies.
  • On systems using PowerVM firmware, a problem was fixed for systems in networks using the Juniper 1GBe and 10GBe switches (F/Cs #1108, #1145, and #1151) to prevent network ping errors and boot from network (bootp) failures.  The Address Resolution Protocol (ARP) table information on the Juniper aggregated switches is not being shared between the switches and that causes problems for address resolution in certain network configurations.  Therefore, the CEC network stack code has been enhanced to add three gratuitous ARPs (ARP replies sent without a request received) before each ping and bootp request to ensure that all the network switches have the latest network information for the system.
  • On systems using OPAL firmware,  a problem was fixed for the 10/1Gb Ethernet adapter (F/C #EL3Z) where it failed by rebooting into the wrong endian mode.
  • On systems using PowerVM firmware, a problem was fixed for a false error message displayed on the management console during firmware code updates that include Concurrent Core Initialization (CCI) for the processors.  All processor cores are correctly initialized but the management console displays this message:   "An open serviceable event related to system firmware was found.  The firmware update process will not be interrupted.  Please address any open serviceable events on the system(s) ...  HSCF0223".
  • On systems using PowerVM firmware,  a problem was fixed so that a system dump with Advanced System Management Interface (ASMI)  server firmware content of "maximum" or "HCA IO" will not cause the system to fail with a SRC B700F103.  There is no Infiniband (IB) Host Channel Adapter (HCA) on an IBM Power8 system so this caused an unexpected problem in the hypervisor dump data collection for IB adapters.
  • On systems using PowerVM firmware,  a problem was fixed for network boot/install using a null pointer when network adapter buffers are depleted and failing the boot with a SRC BA210003 - "Partition firmware detected a data storage error".
  • On the IBM Power System S824 (8286-42A)  with IBM i partitions, a problem was fixed to block a non-applicable  IBM i console warning message "CPF9E17 - Usage limit exceeded - operator action required".  IBM i software license key 5722-SS1 feature 5052, the user entitlement key for the number of users who are authorized to use the operating system, is not required for the 8286-42A system.  This system has the Software Tier P20 licensing, which does not have user based licensing and includes the 5250 features.
  • On systems using OPAL firmware,  a problem was fixed when switching into the PowerVM mode to prevent the management console from going into recovery mode.
  • On systems using PowerVM firmware, a problem was fixed for a hypervisor time-keeping services topology failover that caused errors to be wrongly attributed to the new time-of-day topology, resulting in processor FRUs being guarded falsely.
  • On systems with a PCIe dual-x4 SAS adapter (F/C #5901, #5278, or #EL10), a problem was fixed for the system fans running too fast and loud.  This PCIe adapter was incorrectly assigned a hot PCIe rating and this caused the system fans to go to high speed for the required extra cooling.
    This fix is not applicable to the IBM Power System S824L (8247-42L).
  • On systems using OPAL firmware,  a problem was fixed for CAPP (Coherent Attached Processor Proxy) system checkstops that should have been recoverable errors.
  • On systems using OPAL firmware,  a problem was fixed for the CEC memory controllers to increase the operation time-out value to be able to handle long-running Coherent Accelerator Processor Interface (CAPI) and Peripheral Component Interconnect Express (PCIe) operations.
  • On systems using OPAL firmware, a problem was fixed in the Advanced System Management Interface (ASMI) "Real Time progress indicator" to not delete the first character of the second line of the display.
  • On systems using PowerVM firmware, a problem was fixed to allow booting off an iSCSI device.  For the failure, the partition firmware error logs had SRC BA012010 "Opening the TCP node failed." and SRC BA010013 "The information in the error log entry for this SRC provides network trace data."  The open firmware standard output trace showed SRC BA012014  "The TCP re-transmission count of 8 was exceeded. This indicates a large number of lost packets between this client and the boot or installation server" followed by SRC BA012010.
  • On systems using PowerVM firmware, a problem was fixed for partition firmware stack corruption that would cause spurious output to the console for failed ping or network boot operations.  When a stack imbalance is encountered, text is displayed on the console indicating a stack depth error along with a number of values and the text string "CUTILS" similar, in format, to the following:
                6 1 2 2 0 da15b007 22901dc
                CUTILS: bad exit depth? SCHEDULER call-c-wrapper exit: depth=7 , _indepth=4 , _#inparms=0
  • On systems using PowerVM firmware, a problem was fixed so that the thermal and power management tunable parameters for the On-Chip Controller (OCC) in the Advanced System Management Interface (ASMI) "System Configuration/Power Management/Tuning Parameters" are not set back to the defaults when the CEC is powered off.
  • On systems using PowerVM firmware, a problem was fixed in checkstop error recovery to force a re-IPL instead of a system termination for checkstops that occur during memory-preserving IPLs.  This allows the system to recover from the IPL error without any operator intervention needed.
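The Juniper switch fix above relies on gratuitous ARPs (ARP replies sent without a request received), three of which are sent before each ping and bootp request.  The following minimal sketch builds such frames; it is not the CEC network stack code.  The function names, the MAC and IP addresses, and the choice of a broadcast target hardware address are assumptions for illustration, and the sketch does not send anything on the network.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REPLY = 2


def gratuitous_arp_frame(mac, ip):
    """Build one Ethernet frame carrying an unsolicited ARP reply.

    mac is 6 raw bytes; ip is a dotted-quad string.  Sender and target
    protocol addresses are both set to the announcing host's own IP.
    """
    ip_bytes = socket.inet_aton(ip)
    ethernet = struct.pack("!6s6sH", BROADCAST_MAC, mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware and protocol address lengths
        ARP_REPLY,
        mac, ip_bytes,  # sender hardware and protocol address
        BROADCAST_MAC,  # target hardware address (assumed broadcast)
        ip_bytes,       # target protocol address equals sender's
    )
    return ethernet + arp


def frames_before_request(mac, ip, count=3):
    """Return the gratuitous ARP frames to emit before a ping or bootp."""
    return [gratuitous_arp_frame(mac, ip) for _ in range(count)]


# Hypothetical addresses, for illustration only.
frames = frames_before_request(b"\x02\x00\x00\x00\x00\x01", "192.0.2.10")
print(len(frames), len(frames[0]))
```

Repeating the announcement gives each switch in the aggregated pair a chance to learn the address mapping on its own, since the fix description notes that the switches do not share their ARP tables.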
SV810_087_081 / FW810.11

09/26/14
Impact: Data            Severity:  HIPER

System firmware changes that affect certain systems

  • HIPER/Pervasive:  A problem was fixed in PowerVM where the effect of the problem is non-deterministic but may include undetected corruption of data.  This problem can occur if VIOS (Virtual I/O Server) version 2.2.3.x or later is installed and either one of following statements is true:

    (A) A storage adapter (including Fibre Channel) is assigned to a VIOS and shared between multiple partitions (one of which must be an IBM i partition, others can be AIX, Linux or IBM i partitions), and at least one of the other partitions is performing LPM (Live Partition Mobility) or an immediate or abnormal shutdown operation.

    -or-

    (B) A Shared Ethernet Adapter (SEA) with fail over enabled is configured on the VIOS.
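
The exposure conditions above can be restated as a small predicate.  This is a sketch only: the class, field names and function name are hypothetical, and the inputs would have to be gathered by the administrator from the actual system configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ViosSetup:
    vios_version: tuple                     # e.g. (2, 2, 3, 1)
    # OS types of the partitions sharing one VIOS storage adapter.
    sharing_partition_os: list = field(default_factory=list)
    # True if another sharing partition does LPM or an immediate or
    # abnormal shutdown.
    other_partition_lpm_or_shutdown: bool = False
    sea_failover_enabled: bool = False


def is_exposed(setup):
    """Return True when the described HIPER conditions are all met."""
    if setup.vios_version[:3] < (2, 2, 3):
        return False
    condition_a = (
        len(setup.sharing_partition_os) >= 2
        and "IBM i" in setup.sharing_partition_os
        and setup.other_partition_lpm_or_shutdown
    )
    condition_b = setup.sea_failover_enabled
    return condition_a or condition_b


example = ViosSetup(
    vios_version=(2, 2, 3, 1),
    sharing_partition_os=["IBM i", "AIX"],
    other_partition_lpm_or_shutdown=True,
)
print(is_exposed(example))
```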
SV810_081_081 / FW810.10

09/08/14
Impact: Availability    Severity: SPE

New features and functions

  • Extended the availability of the IBM Power System S812L (8247-21L) that was enabled in the 810.00 release.
  • Expansion of maximum number of SAS drives on Power System S814 (8286-41A) from 8 (SSD, disk, or combination thereof) to 10 drives.
  • Support for SAS EXP24S expansion drawer (#5887, #EL1S) attached using a PCIe slot.
  • Support for large M64 based BARs for systems in the OPAL environment.
  • Fan speed settings were enhanced for the case of systems with fan failure to set the speed based on system thermal conditions instead of forcing all remaining fans to an overdrive speed setting.
  • Support for a PCIe Gen3 FPGA x 16 slot adapter that acts as a co-processor for the POWER8 processor chip for gzip compressions and decompressions.  Feature codes #EJ12 and #EJ13 are electronically identical with the same CCIN of 59AB.  #EJ12 has full high tail stock and is supported by 8286-41A and 8286-42A.  #EJ13 has a low profile tail stock and is supported by 8284-22A.  OS levels supported are AIX 6.1 and AIX 7.1 or later.  IBM i and Linux are not supported.
  • Support for use of system and partition templates on the management console.
  • Support for Coherent Accelerator Processor Interface (CAPI) for the PCIe Gen 3 FPGA on OPAL.  Operating system supported is Linux.
  • Support was added to allow concurrent initialization of the processor cores.  This expands the range of concurrent firmware updates to accommodate core initialization changes and also allows for dynamic repairs of processor and cache memory.
  • Support was added for cache memory L2/L3 column repair to allow concurrent repair of memory and propagation of memory errors for better fault isolation of memory components.
  • The system operator panel was enhanced to show the firmware mode of the system during the IPL of either PowerVM or OPAL for panel function 1.
  • The service processor Processor Runtime Diagnostics (PRD) was enhanced to collect debug data for failures in host boot initialization for the Self-Boot Engine (SBE).
  • Support was added to the Advanced System Management Interface (ASMI) USB menu to allow a system dump to be collected to USB with the power on to the system.  This allows the dump to be collected with the system memory state intact.
  • Support for enhanced 10 Gb ethernet adapters that were previously announced for Power8 for AIX NIM (Network Install Management) or Linux Network Install capability.  The enhanced adapters are the following:
        PCIe2 4-port(10Gb+1GbE) SR+RJ45 Adapter (#EN0S, #EN0T)
        PCIe2 4-port(10Gb+1GbE) SFP+Copper+RJ45 Adapter (#EN0U, #EN0V)
        The level of adapter microcode required is level 20100130 or later.

        PCIe2 LP 2-port 10/1GbE BaseT RJ45 Adapter (#EN0W, #EN0X, #EL3Z)
        The level of adapter microcode required is level 30080130 or later.
  • Support for a new 4-port Ethernet Adapter with two 10 Gb and two 1 Gb ports (#EN0M, #EN0N with CCIN 2CC0). The adapter offers NIC and FCoE over its 10 Gb ports and NIC over the 1 Gb ports and is SR-IOV capable.  The 10 Gb ports are LR (long range) fiber optic, supporting distances up to 10 km.  Except for the transceivers and cabling of the 10 Gb ports,  this adapter is functionally identical to the 4-port adapter (#EN0H, #EN0J, #EL38) SR optical and (#EN0K, #EN0L, #EL3C) active copper twinax.
  • Support for a new PCIe 2-port Async adapter (#EN27, #EN28) that serves the same function as the  predecessor PCIe 2-port Async adapter (#5289, #5290) on the Power7+ and earlier servers.    This adapter provides connection for 2 asynchronous EIA-232 devices. Ports are programmable to support EIA-232 protocols, at a line speed of 128K bps. Two RJ45 connections are located on the rear of the adapter. To attach to devices using a 9-pin (DB9) connection, use an RJ45-to-DB9 converter. For convenience, one converter is included with this feature. One converter for each connector needing a DB9 connector is needed.
  • Support for additional PCIe adapters, which had previously been supported on Power7+ and earlier servers, to help with server migration:
        Ethernet 10 Gb LAN: 1-port optical SR (#5769, #5275)
        Ethernet and FCoE: 4-port 10 Gb/1 Gb Copper (#EN0K, #EN0L, #EL3C)
        Ethernet RoCE: 2-port 10 Gb copper (#EC27, #EC28, #EL27)
        Fibre Channel: 2-port 4 Gb (#5774, #5276, #EL09)
        SAS: 2-port 3 Gb 380 MB cache (#5805)
  • Support was added for a new Advanced System Management Interface (ASMI) menu to allow the user to choose between an IPMI or a serial console when in OPAL mode.

System firmware changes that affect all systems

  • A problem was fixed in the service processor that caused the SRC B1504804 to be logged as many as 30 times over five minutes for an operations panel voltage regulator error.  The error logging has been reduced to one SRC for this error.
  • A problem was fixed to prevent an intermittent system hang until IPL time-out after a processor core checkstop.  This secondary failure after a core checkstop had a low probability of occurring.
  • A problem was fixed to maintain time-of-day (TOD) clock redundancy for the hypervisor time-keeping services in the case of a TOD error and fail-over to the backup clock topology.  There was a failure in the TOD fail-over process to correctly assign the new backup TOD topology, causing loss of redundancy for the next TOD error.
  • A problem was fixed for the service processor reset/reload process to eliminate an extra dump and SRC B1818601 caused by an internal core dump during the reset/reload.
  • A problem was fixed for a processor error with an incorrect call out of a memory card with SRC B124E504 to eliminate the memory card FRU call out.  The processor error call out of SRC B170E540 was correct.
  • A problem was fixed in the Advanced System Management Interface (ASMI) menus to restore factory settings so that the default for the Hypervisor mode (PowerVM or OPAL) was restored to the factory setting using "System Service Aids/Factory Configuration/Service Processor Reset/All Reset".
  • A problem was fixed in how the processor clock speed was reported to the hypervisor, causing the partitions to show a clock speed that was about 200 MHZ faster than the actual processor clock speed.
  • A problem was fixed for DRAM repair for the case where two DRAM modules are having failures at the same rank such that spares are used to repair each DRAM error.  Without the fix, the second DRAM is not repaired and could eventually be called out and guarded with a UE SRC.
  • A problem was fixed for system hardware dump collection to collect all the hardware registers by stopping all functional clocks before starting the collection.
  • A problem was fixed for repairing spare memory DRAM so that repair solutions for failed spares persist across IPLs of the system by getting the repair solutions written to the Vital Product Data (VPD) of the DRAM.
  • A problem was fixed in the Advanced System Management Interface (ASMI) menus to change the name of the "Hypervisor Configuration" menu to "Firmware Configuration" to more accurately describe the menu function of being able to change firmware between the PowerVM and OPAL modes.
  • A problem was fixed in the Advanced System Management Interface (ASMI) menus to move the IPMI password reset operation from the "Firmware Configuration" menu to the "Login Profile/Change password" menu.  This change was made to put all the password change operations together under one menu.
  • A problem was fixed in the Advanced System Management Interface (ASMI) menu for "Resource Dump" to give the message "This feature is not supported for OPAL environments" when the system is in OPAL mode.  Previously,  ASMI incorrectly stated that the "Resource Dump" function was not supported on the machine type.
  • A problem was fixed in the service processor to add missing call outs for the memory buffer and memory controller FRUs when there is a time-out error on the power bus with PE SRC logged of B170E540.
  • A problem was fixed in memory diagnostics and fault isolation that deconfigured more memory than necessary for memory errors.
  • A problem was fixed that caused the Utility COD display of historical usage data to be truncated on the management console.
  • A problem was fixed to eliminate service processor dumps after AC power cycles of the CEC.
  • A problem was fixed to add a missing hardware call out for service processor FSI bus errors logged with SRC BC8A0A11.  This causes the failing hardware to be deconfigured and guarded for the next IPL of the system.
  • A problem was fixed so that if an IPL failure occurs that causes the system to power off,  error SRCs will be logged instead of the system hanging for ten minutes and not logging any SRCs.
  • A problem was fixed in the system dump data collection for missing memory data to collect memory data after hardware de-configuration checkstop errors.
  • A problem was fixed for in-band code update to prevent loss of a processor support interface (PSI) link that is in a backup role.
  • A problem was fixed in system dump collection for a system hang after a checkstop.  The system failed to go to terminate state and reboot.
  • A problem was fixed in system dump collection to return full dump data when a secondary error occurs during dump data collection for the checkstop primary error.
  • A problem was fixed in the Advanced System Management Interface (ASMI) menu "System Configuration/Hardware Deconfiguration/Memory Deconfiguration" to be able to manually configure and deconfigure DIMMs.
  • A problem was fixed for system terminations that could occur as a result of PCIe adapters using a Level Signaled Interrupt (LSI) before the hypervisor interrupt handler was ready.  This could occur when in PCIe adapter recovery for an error with SRC logs of B7006970 and B700B971.   The PCIe adapters are now held in reset until initialization sequences are completed to ensure all interrupt handlers are ready for PCIe adapter interrupts.
  • A problem was fixed for a management console firmware update "Remove and Activate" operation that fails to activate the OCC (On-Chip Controller for thermal and power management) new code level with SRCs logged of B18B2616 and B1812601.  An IPL is needed to activate the OCC code level to complete the firmware update.
  • A problem was fixed for IPL failures caused by Host Boot PNOR memory corruption.  If an IPL Terminate Immediate (TI) from Host Boot has a SRC without a specific reason code, a corruption check on the Host Boot memory partitions is run and the Host Boot partitions corrected to recover them.
  • A problem was fixed for the power usage regulation of memory to keep memory power usage below its specified limits.  Lack of enough memory throttling was allowing the memory to consume power past its set limits, leaving the system exposed to power faults or unexpected power throttling in other areas of the system.
  • A problem was fixed to guard cores on hang errors.  A processor core was not being guarded on hang errors where a core timed-out waiting for an instruction to complete.
  • A problem was fixed to allow memory diagnostics during a re-IPL of the CEC, ensuring that problem memory will be guarded or recovered and preventing possible error log flooding with memory errors.
  • A problem was fixed for system dump process memory corruption that could cause the wrong dump type to be created for a system failure, resulting in a system dump with the wrong content.
  • A problem was fixed for a service processor reset/reload causing a FSP dump with a Firmware Database (fwdb) core dump captured within it.
  • A problem was fixed for a processor core forward progress parity error so that the core could be guarded without causing a system checkstop.
  • A problem was fixed in the run time diagnostics of DIMMs to read the raw card type correctly, preventing failures in the memory repair.
  • A problem was fixed to prevent an intermittent hostboot IPL deadlock/hang in the deferred work queue with progress code CC009543 and termination with SRC B1813450.
  • A problem was fixed in memory diagnostics to be able to handle multiple DIMM failures without a time-out failure, reducing the amount of memory that needs to be guarded for the errors.
  • A problem was fixed in DIMM initialization to prevent intermittent B181BA08 DIMM failures in host boot during IPL.
  • A problem was fixed to call home guarded FRUs on each IPL.  Only the initial failure of the hardware was being reported to the error log.
  • A problem was fixed for the incorrect fan FRU call outs of SRC 110076111 so that 4U systems (8286-41A, 8286-42A) have FRU 00FV629 for the 80 mm fan and the 2U systems (8284-22A, 8247-21L, 8247-22L)  have FRU 00FV726 for the 60 mm fan.
  • A problem was fixed for a memory write error becoming a system checkstop instead of being handled by the memory error handling and recovery processes.
  • A problem was fixed for the error processing of processor core checkstops at runtime to not ignore the guard on the failed core on the next IPL of the system, thus preventing additional failures with the next IPL during host boot.
  • A problem was fixed for error recovery for a failed processor that has all cores guarded such that host boot is able to re-IPL using the working processor.   In certain situations, the re-IPL on the good processor was failing with SRC B113E504 with PRD signature PB_CENT_CRESP_ADDR_ERROR.
  • A problem was fixed for run-time guarding of a processor core that had resulted in a system checkstop when the core guard attempt failed.  The processor with the non-guarded broken core caused the On-Chip Controller (OCC) to have a power measurement time-out to the processor with SRC B1102A00 that resulted in the system termination.
  • A problem was fixed to prevent incorrect logging of SRC 11007221 whenever the operator panel is missing (or broken).  This SRC indicates ambient temperature of the system is too high and a performance throttle may occur to lower the temperature, causing performance loss.  A missing operator panel should not cause lower performance of the system.
  • A problem was fixed for undefined hardware states in the system that caused an early IPL failure with SRC B1101314 when configuring the Self Boot Engine (SBE) for hostboot.
  • A problem was fixed for the Operator panel where the Enclosure Fault LED was swapped with the Attention/Check Log LED.
  • A problem was fixed for memory diagnostics to guard all unusable memory due to a channel failure.  This prevents the hypervisor from trying to start partitions with memory associated with the bad channel and having the partition crash.
  • A problem was fixed to ensure all memory is scrubbed for correctable errors to prevent run-time memory failures and possible checkstops.  If memory scrubbing actions found the preceding memory rank had persistent ECC errors, the next rank of memory was sometimes skipped.
  • A problem was fixed in the Hostboot Self Boot Engine (SBE) to re-IPL without guarding the processor on a SBE step that has infrequent failures that are recoverable with a retry.

System firmware changes that affect certain systems

  • A problem was fixed for processor local bus errors during an IPL to call out the master and slave bus components with a BC14090F SRC to identify all the possible failing components.  For the problem, only the bus slave components were being called out on a bus error, leaving open the possibility that the faulty component might not be guarded or repaired.
  • On systems that have a boot disk located on a SAN, a problem was fixed where the SAN boot disk would not be found on the default boot list and the boot disk would then have to be selected from the SMS menus.  This problem would normally be seen for new partitions that had tape drives configured before the SAN boot disk.
  • On systems in IPv6 networks, a problem was fixed for DHCP where a duplicate address detection (DAD) message to the DHCP-client on the service processor could fail, resulting in duplicate IP addresses being configured on the network.
  • On systems that have Active Memory Sharing (AMS) partitions, a problem was fixed for Dynamic Logical Partitioning (DLPAR) for a memory remove, leaving a logical memory block (LMB) in an unusable state until partition reboot.
  • On systems in IPv6 networks, a  problem was fixed for a network boot/install failing with SRC B2004158 and IP address resolution failing using neighbor solicitation to the partition firmware client.
  • On systems in Dynamic Power Saver (DPS) mode, a  problem was fixed so SRC B1812A61 is not logged when power throttling is needed for a workload over the power capacity.  In DPS mode,  a system power usage adjustment is not an error condition.
  • On systems in OPAL mode,  a problem was fixed for OPAL network boots to add retries to DHCP to prevent network boot time-out errors caused by network lags and slow downs.
  • On systems in OPAL mode, a problem was fixed in the fault isolation procedures to not call out hardware FRUs for software failures to reduce loss of hardware on errors.
  • On systems in PowerVM mode, a problem was fixed in Live Partition Mobility (LPM) for systems at or near the new 32K maximum for virtual devices, where insufficient space existed to store device attributes of the migrated system, causing RMC failures and incorrect MTMS values for the migrated partition.
  • On systems in PowerVM mode,  a problem was fixed for I/O adapters so that BA400002 errors were changed to informational for memory boundary adjustments made to the size of DMA map-in requests.  These size adjustments were marked as UE previously for a condition that is normal.
  • On Power8 2U systems, a problem was fixed for the C5 PCIe slot failing.  This PCIe configuration was not supported on the 8284-22A, 8247-21L, and 8247-22L systems.
  • On Power8 2U systems, a problem was fixed  in the fan speed management to lower the maximum RPMs of the fans and reduce the noise level of the system.  This problem affects the 8284-22A, 8247-21L, and 8247-22L systems.
  • On systems in PowerVM mode using dedicated processors, a problem with concurrent firmware update was fixed to prevent a quiesce of the hypervisor process that can result in a system hang.
  • On systems in PowerVM mode, a problem was fixed for unresponsive PCIe adapters after a partition power off or a partition reboot.
  • On systems with 64GB DIMM memory (F/C #EM8D), a problem was fixed to allow 64GB DIMM memory error-correcting code (ECC) repairs instead of logging a predictive error with no repair to the memory.
SV810_061_054 / FW810.02

07/29/14
Impact: Data            Severity:  HIPER

System firmware changes that affect all systems

  • HIPER/Pervasive: A problem was fixed in PowerVM where the usage of P8 transactional memory and vector facilities could result in undetected corruption of data if the system is running in Power8 native mode. OS levels that support Power8 native mode are RHEL 7 and AIX 7.1 TL3 SP3 and later.

System firmware changes that affect certain systems

  • HIPER/Pervasive: A problem was fixed with Live Partition Mobility (LPM) on PowerVM when migrating a partition between two Power8 systems that are running in Power8 native mode. This problem could result in unpredictable behavior when the partition resumes execution on the target system, including potential undetected corruption of data, a system crash, or a partition crash. OS levels that support Power8 native mode are RHEL 7 and AIX 7.1 TL3 SP3 and later.
  • A problem was fixed for an IBM i D-mode IPL failure with SRC B2003110 when the alternative load source could not be found.  If a system encounters this issue prior to installing the fix, the Service Pack can be applied via the Management console or using a USB flash drive with the system powered off.
SV810_058_054 / FW810.01

06/23/14
Impact: Security         Severity:  HIPER

System firmware changes that affect all systems

  • HIPER/Pervasive:  A security problem was fixed in the OpenSSL (Secure Socket Layer) protocol that allowed clients and servers, via a specially crafted handshake packet, to use weak keying material for communication.  A man-in-the-middle attacker could use this flaw to decrypt and modify traffic between the management console and the service processor.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0224.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL for a buffer overflow in the Datagram Transport Layer Security (DTLS) when handling invalid DTLS packet fragments.  This could be used to execute arbitrary code on the service processor.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0195.
  • HIPER/Pervasive:  Multiple security problems were fixed in the way that OpenSSL handled read and write buffers when the SSL_MODE_RELEASE_BUFFERS mode was enabled to prevent denial of service.  These could cause the service processor to reset or unexpectedly drop connections to the management console when processing certain SSL commands.  The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2010-5298 and CVE-2014-0198.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL to prevent a denial of service when handling certain Datagram Transport Layer Security (DTLS) ServerHello requests. A specially crafted DTLS handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-0221.
  • HIPER/Pervasive:  A security problem was fixed in OpenSSL to prevent a denial of service by using an exploit of a null pointer de-reference during anonymous Elliptic Curve Diffie Hellman (ECDH) key exchange.  A specially crafted handshake packet could cause the service processor to reset.  The Common Vulnerabilities and Exposures issue number for this problem is CVE-2014-3470.
  • A problem was fixed for hardware dumps on the service processor so that valid dump data could be collected from multiple processor checkstops.  Previously, the hardware data from multiple processor checkstops would only be correct for the first processor.
  • A problem was fixed for platform dumps so that certain operations would work after the platform dump completed.  Operations such as firmware updates or reset/reloads of the service processor after a platform dump would cause the service processor to become inaccessible.
SV810_054_054 / FW810.00

06/10/14
Impact:  New      Severity:  New

New Features and Functions

  • GA Level

    NOTE:
  • POWER8 firmware addresses the security problem in the OpenSSL Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) to not allow Heartbeat Extension packets to trigger a buffer over-read to steal private keys for the encrypted sessions on the service processor.  The Common Vulnerabilities and Exposures issue number is CVE-2014-0160 and it is also known as the heartbleed vulnerability. 
  • POWER8 (and later) servers include an “update access key” that is checked when system firmware updates are applied to the system.  The initial update access keys include an expiration date which is tied to the product warranty. System firmware updates will not be processed if the calendar date has passed the update access key’s expiration date, until the key is replaced.  As these update access keys expire, they need to be replaced using either the Hardware Management Console (HMC) or the Advanced System Management Interface (ASMI) on the service processor.  Update access keys can be obtained via the key management website: http://www.ibm.com/servers/eserver/ess/index.wss .
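
The update access key check described in the note above amounts to a date comparison. The following is a minimal sketch of that rule only, not the firmware's actual implementation; the function name and the dates are hypothetical.

```python
from datetime import date

def update_allowed(today, key_expiration):
    """Return True if the calendar date has not passed the key's expiration date.

    Hypothetical helper: it models only the rule stated in the note, namely
    that updates are not processed once the date has passed the expiration.
    """
    return today <= key_expiration

# Hypothetical dates, for illustration only.
expiration = date(2017, 6, 10)
print(update_allowed(date(2017, 6, 10), expiration))  # on the expiration date
print(update_allowed(date(2017, 6, 11), expiration))  # one day past expiration
```

In this sketch, an update attempted after the expiration date would be refused until the key is replaced through the HMC or ASMI.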

4.0 How to Determine The Currently Installed Firmware Level

For HMC managed systems:  From the HMC, select Updates in the navigation (left-hand) pane, then view the current levels of the desired server(s).

For a standalone system running IBM i without an HMC: From a command line, issue DSPFMWSTS.

For a standalone system running IBM AIX without an HMC: From a command line, issue lsmcode.

Alternatively, use the Advanced System Management Interface (ASMI) Welcome pane. The current server firmware appears in the top right corner. Example: SV810_yyy.
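
Whichever method is used, the level appears as a string of the form SV810_yyy. As a minimal sketch, assuming the displayed text has been captured into a Python string, the level could be picked out with a regular expression. The function name and the sample text are hypothetical and do not reproduce the actual output of DSPFMWSTS, lsmcode, or ASMI.

```python
import re

# "SV", a 3-digit release level, "_", then a 3-digit level (yyy).
# The trailing (?!\d) only rejects a longer run of digits.
LEVEL_PATTERN = re.compile(r"\bSV(\d{3})_(\d{3})(?!\d)")

def find_firmware_level(text):
    """Return (xxx, yyy) from the first SVxxx_yyy string found, or None."""
    match = LEVEL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Hypothetical sample text, for illustration only.
sample = "Current server firmware: SV810_061"
print(find_firmware_level(sample))
```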


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not connected to the internet, you will need to download the new firmware level to a USB flash memory device or an FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: SVxxx_yyy_zzz

Where xxx = release level
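
The filename convention above can be illustrated with a short sketch that splits a firmware level string into its fields and checks whether the new package has the same release level as the installed firmware. Only xxx is defined in this section; treating yyy as the service pack level and zzz as the last disruptive service pack level is an assumption, and the function and field names are hypothetical.

```python
import re
from typing import NamedTuple

class FirmwareLevel(NamedTuple):
    release: str          # xxx, the release level (defined in the text above)
    service_pack: str     # yyy (assumed meaning: service pack level)
    last_disruptive: str  # zzz (assumed meaning: last disruptive service pack level)

_NAME = re.compile(r"SV(\d{3})_(\d{3})_(\d{3})")

def parse_level(filename):
    """Extract the SVxxx_yyy_zzz fields from a firmware filename or level string."""
    match = _NAME.search(filename)
    if match is None:
        raise ValueError("no SVxxx_yyy_zzz level found in %r" % filename)
    return FirmwareLevel(*match.groups())

def same_release(installed, new):
    """True if both levels share the same release level (xxx)."""
    return parse_level(installed).release == parse_level(new).release

print(parse_level("SV810_061_054"))
print(same_release("SV810_054_054", "SV810_061_054"))
```

The comparison only reports whether the release levels match; which installation procedure applies in each case is covered by the instructions linked below.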

HMC Managed Systems:

Instructions for installing firmware updates and upgrades on systems managed by an HMC can be found at:
http://www-01.ibm.com/support/knowledgecenter/8286-42A/p8ha1/updupdates.htm

Systems not Managed by an HMC:

Power Systems:

Instructions for installing firmware on systems that are not managed by an HMC can be found at:
http://www-01.ibm.com/support/knowledgecenter/8286-42A/p8ha5/fix_serv_firm_kick.htm

IBM i Systems:

Refer to "IBM i Support: Recommended Fixes":
http://www-912.ibm.com/s_dir/slkbase.nsf/recommendedfixes

When ordering firmware for IBM i Operating System managed systems from Fix Central, choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

7.0 Firmware History

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html