Power6 High-End System Firmware

Applies to: 9119-FHA

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.




1.0 Systems Affected

This package provides firmware for Power 595 (9119-FHA) Servers only.

The firmware level in this package is EH340_122 (file 01EH340_122_039.rpm).


1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" that the System Firmware requires to complete the firmware installation process. Before you start the system firmware update, the HMC level must be equal to or higher than the "Minimum HMC Code Level". If the HMC managing the server targeted for the System Firmware update is at a lower level, the firmware update will not proceed.

The Minimum HMC Code level for this firmware is:  HMC V7 R3.4.0 with PTFs MH01186, MH01207, MH01210 and MH01211 (or higher).

Although the Minimum HMC Code level for this firmware is V7 R3.4.0, some fixes and functions are only available when the system is managed by a V7 R3.5.0 HMC.
Therefore, HMC level V7 R3.5.0 with PTFs MH01212 and MH01217 (or higher) is recommended for this firmware level.
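
The level comparison described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the "V7 R3.4.0" string format is taken from this section, and the sketch does not check the required PTFs, which still have to be verified separately on the HMC.

```python
import re

# Levels quoted in this section.
MINIMUM_HMC = (7, 3, 4, 0)      # V7 R3.4.0
RECOMMENDED_HMC = (7, 3, 5, 0)  # V7 R3.5.0


def parse_hmc_level(text):
    """Turn a string such as 'V7 R3.4.0' into the tuple (7, 3, 4, 0)."""
    match = re.fullmatch(r"\s*V(\d+)\s*R(\d+)\.(\d+)\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized HMC level: {text!r}")
    return tuple(int(part) for part in match.groups())


def classify_hmc_level(text):
    """Classify an HMC level against the minimum and recommended levels.

    PTFs are NOT checked here; this compares version and release only.
    """
    level = parse_hmc_level(text)
    if level < MINIMUM_HMC:
        return "too low"
    if level < RECOMMENDED_HMC:
        return "minimum met"
    return "recommended"
```

For example, classify_hmc_level("V7 R3.4.0") reports that the minimum is met, while any level below V7 R3.4.0 is reported as too low for the update to proceed.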

For information concerning HMC releases and the latest PTFs, go to the following URL to access the HMC code packages:
http://www14.software.ibm.com/webapp/set2/sas/f/hmcl/home.html

NOTE: You must be logged in as hscroot in order for the firmware installation to complete correctly.

2.0 Cautions and Important Information

2.1 Cautions

Issue following the Concurrent Firmware Update on 9119-FHA Systems Running IBM i.

This issue affects systems that:

- Have RIO-G (HSL2)-attached I/O subsystems with I/O devices assigned to IBM i partitions, and

- Have concurrently installed system firmware EH340_122 without a subsequent reboot of the system.

On these systems, problems with I/O devices that may prevent partition IPL and partition installation may occur after system firmware EH340_122 is installed concurrently. These problems can only be resolved by a full system reboot.

IBM recommends the disruptive installation of EH340_122, or upgrading your firmware to EH350_071 (or higher), to prevent this exposure. This problem only exists in the EH340 release level when concurrently installing EH340_112 or EH340_122.  Concurrent service packs within the EH350 release level are not exposed to this problem.

CEC Concurrent Maintenance

CEC Concurrent Maintenance (CCM) provides the ability to perform maintenance on a system with the operating system running without requiring a reboot of the frame (re-IPL).

Several CEC Concurrent Maintenance issues have been resolved with this firmware level. It is important that HMC 7.3.5.0 with PTFs MH01212 and MH01217 (or higher), and this Service Pack, are installed prior to attempting to perform a CCM function. It is also recommended that CCM is performed in a maintenance window where the system is quiesced (i.e. all applications are shut down and the system is idling at the operating system level).

A pre-planning guide and readiness checklist are available on Info Center.

Prior to performing a CCM, contact your next level of support to ensure there are no restrictions with the repair you are about to perform.

POWER VM Active Memory Sharing

Attention: If the firmware level currently installed on the system is lower than EH340_061, after this level of firmware is installed, the platform must be powered off, then powered on to activate the POWER VM Active Memory Sharing function.

Attention: If EH340_122 has been installed, and the new POWER VM Active Memory Sharing function has been activated, and you want to back-level the system firmware, the active memory sharing pool must be deactivated and deleted prior to back-leveling the system firmware. IBM does not recommend back-leveling the system firmware.

2.2 Important Information

IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.


3.0 Firmware Information and Description

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as deferred. These deferred fixes can be installed concurrently, but will not be activated until the next IPL. Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For deferred fixes within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

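01EHXXX_YYY_ZZZ

where XXX is the release level, YYY is the service pack level, and ZZZ is the last disruptive service pack level.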

NOTE: Values of service pack and last disruptive service pack level (YYY and ZZZ) are only unique within a release level (XXX). For example, 01EH330_067_045 and 01EH340_067_053 are different service packs.

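An installation is disruptive if:

- The release levels (XXX) of the currently installed firmware and the new firmware are different.
  Example: Currently installed release is EH330, new release is EH340.

- The service pack level (YYY) and the last disruptive service pack level (ZZZ) of the new firmware are equal.
  Example: EH330_120_120 is disruptive, no matter what level of EH330 is currently installed on the system.

- The service pack level (YYY) currently installed on the system is lower than the last disruptive service pack level (ZZZ) of the new firmware.
  Example: Currently installed service pack is EH330_120_120 and new service pack is EH330_152_130.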

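An installation is concurrent if:

- The release level (XXX) is unchanged, the service pack level (YYY) of the new firmware is higher than the one currently installed, and the currently installed service pack level is equal to or higher than the last disruptive service pack level (ZZZ) of the new firmware.
  Example: Currently installed service pack is EH330_126_120, new service pack is EH330_143_120.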
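
The following Python sketch shows one way to apply these examples mechanically. It is not an IBM tool: the function and type names are hypothetical, and the rules it encodes are an interpretation read off the four examples above (different release, new YYY equal to new ZZZ, installed YYY below new ZZZ). Always confirm the result against the HMC before scheduling an update.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: int          # XXX, for example 330
    service_pack: int     # YYY
    last_disruptive: int  # ZZZ


def parse_level(name):
    """Parse '01EH330_126_120', 'EH330_126_120' or '01EH330_126_120.rpm'."""
    match = re.fullmatch(
        r"(?:01)?EH(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?", name.strip()
    )
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return FirmwareLevel(*(int(part) for part in match.groups()))


def install_type(installed, new):
    """Return 'disruptive', 'concurrent' or 'no update' for a planned install."""
    cur, nxt = parse_level(installed), parse_level(new)
    if cur.release != nxt.release:
        return "disruptive"   # release level changes
    if nxt.service_pack == nxt.last_disruptive:
        return "disruptive"   # the new service pack is itself disruptive
    if cur.service_pack < nxt.last_disruptive:
        return "disruptive"   # a disruptive level lies in between
    if nxt.service_pack > cur.service_pack:
        return "concurrent"
    return "no update"        # same or older service pack
```

With the levels from the examples, install_type("EH330_126_120", "EH330_143_120") gives a concurrent result, while install_type("EH330_120_120", "EH330_152_130") gives a disruptive one.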

Firmware Information and Update Description

For information about previous firmware release levels, see Section 7.0 Firmware History.

Filename               Size        Checksum
01EH340_122_039.rpm    38486339    55247

Note: The checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
Example: sum 01EH340_122_039.rpm
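
Where the AIX sum command is not available, a 16-bit rotating checksum can be computed with a short Python sketch such as the one below. This is an assumption-laden illustration: it implements the classic BSD byte-by-byte algorithm, and it is assumed, not confirmed by this document, that the default AIX sum output uses the same algorithm. The function names are hypothetical. If the result differs from the table, verify with the AIX sum command before concluding that the file is damaged.

```python
def bsd_sum(data, checksum=0):
    """Fold bytes into a 16-bit BSD-style rotating checksum.

    ASSUMPTION: this matches the default AIX `sum` algorithm.
    Pass the previous result as `checksum` to process data in chunks.
    """
    for byte in data:
        # Rotate the 16-bit value right by one bit, then add the byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def sum_file(path, chunk_size=1 << 20):
    """Return the rotating checksum of a file, read in 1 MiB chunks."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = bsd_sum(chunk, checksum)
    return checksum
```

The value returned by sum_file("01EH340_122_039.rpm") would then be compared with the checksum listed in the table above, and the file size with the listed size. A pure-Python loop over a file of this size takes noticeably longer than the native command.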
 
EH340
EH340_122_039

05/19/10

Impact: Availability         Severity: ATT

System firmware changes that affect all systems

  • DEFERRED: This fix corrects the handling of a specific processor instruction sequence that has the potential to result in undetected data errors.  This specific instruction sequence has only been observed in a small number of highly tuned floating point-intensive applications.  However, it is strongly recommended that this fix be applied to all POWER6 systems.  This fix has the potential to decrease system performance on applications that make extensive use of floating point divide, square root, or estimate instructions.
  • A problem was fixed that prevented an SRC from being recorded in the service processor dump produced by a host-initiated reset.
  • A problem was fixed that caused a reset/reload of a node controller.
  • A problem was fixed that caused the system to become unresponsive and appear to hang when page migration occurred on a PCIe slot.
  • The firmware was enhanced to improve the callouts for certain types of processor failures that log SRC B1xxE504.
  • The firmware was enhanced to improve the callouts when NVRAM corruption is detected in the bulk power controller's (BPC's) service processor.
System firmware changes that affect certain systems
  • A problem was fixed that caused a virtual SCSI or virtual fibre channel adapter to be seen by the operating system as not bootable when it was added to a partition using a dynamic LPAR (DLPAR) operation.
  • In partitions running AIX or Linux, a problem was fixed that caused the addition of an I/O slot to a partition using a dynamic LPAR (DLPAR) add operation to fail.
  • On systems running redundant VIOS partitions, a problem was fixed that prevented Ethernet traffic from being properly bridged between the two partitions.  This problem also prevented shared Ethernet adapter failover from working correctly.
  • A problem was fixed that caused the system to crash with SRC B7000103 when a concurrent maintenance operation was performed on an I/O slot directly from a partition (using AIX SMIT or IBM i HST).
  • A problem was fixed that caused a system or partition running Linux to crash when the "serv_config -l" command was run.
  • On systems running active memory sharing (AMS), the firmware was enhanced so that error messages indicating "out of compliance" issues with the memory (HMC SRC HSCL031F) will not be generated if the user allocates more memory than is installed in the system.  (Allocating more memory than is installed in the system is supported in active memory sharing.)
  • On systems using InfiniBand switches for processor clustering, a problem was fixed that caused InfiniBand ports to intermittently drop out.
  • A problem was fixed that caused the hypervisor to loop unnecessarily and consume too many processor cycles.  This impacted the performance of the system.
Concurrent maintenance (CM) firmware fixes
  • A problem was fixed that caused the concurrent addition of a node to fail with SRC B181A422.
  • A problem was fixed that caused unpredictable system behavior if a capacity on demand (CoD) or a virtualization engine technology (VET) activation code was entered and accepted after a node 0 evacuation was done.  The unpredictable machine behavior might also have occurred, if a node 0 evacuation failed, a system dump was taken, and a memory-preserving IPL was then initiated.
  • A problem was fixed that caused a concurrent maintenance operation after a node evacuation to fail.  When this problem occurred, the system erroneously stated that a platform memory dump was pending.
  • A problem was fixed that prevented a concurrent maintenance operation from completing successfully.
  • On systems with F/C 5803 or F/C 5873 I/O drawers attached and a boot device in the drawer, a problem was fixed that prevented a partition from booting after the concurrent repair of the GX adapter that connects the 5802 or 5877 drawer to the system, or to the node that contains the GX adapter.
EH340_112_039

12/16/09

Impact: Serviceability           Severity:   HIPER

System firmware changes that affect all systems

  • HIPER: A problem was fixed that might cause the system to crash if the server is running AIX and has a F/C 5802 or 5877 drawer (in a 19" rack), or F/C 5803 or 5873 drawer (in a 24" rack), attached.
  • On systems with a lot of memory, the firmware was enhanced to reduce the time partition migrations take from hours to minutes.
  • A problem was fixed that might cause the system to crash with SRC B181E504, then SRC B1813909, being logged.
  • The firmware was enhanced such that SRCs B181F126, B181F127, and B181F129 are correctly handled, and no longer cause unnecessary calls home to be made.
  • The firmware was enhanced such that SRC B1817201, when generated by a bulk power controller (BPC), is correctly handled.
  • A problem was fixed that caused the system to hang with SRCs B182953C, B182954C, and B17BE434 being logged.
  • A problem was fixed that caused SRC 10009135, followed by 10009139, to be erroneously logged.  These SRCs indicate a system power control network (SPCN) loop is being broken, then re-established.
  • The firmware was enhanced to allow a temporary threshold reduction for processor unit book interconnect predictive errors.
System firmware changes that affect certain systems
  •  On a single system running Oracle in multiple partitions, with multiple IBM LHCAs connected in the same subnet, a problem was fixed that caused the remaining partitions to lose their reliable datagram socket (RDS) heartbeat connections after the reboot of a single partition.  There is a greater probability of encountering this problem if the partition being rebooted has a large partition memory assigned to it.
Concurrent maintenance (CM) firmware fixes
  •  On systems with four nodes, a problem was fixed that caused the system controller to perform a reset/reload, which caused a concurrent maintenance operation to fail, on the fourth node (P4).
  •  A problem was fixed that caused the concurrent replacement of an InfiniBand GX adapter or I/O planar to fail if a partition owned an embedded device on the planar.
  • The firmware was enhanced such that if an Ethernet cable is misplugged on a node controller during a concurrent node add operation, the node add operation will be completed successfully. 
EH340_101_039

09/23/09

Impact: Serviceability           Severity:   Attention 

System firmware changes that affect all systems

  • DEFERRED:  The firmware was enhanced to eliminate correctable errors (CEs) being erroneously logged against the memory bus with SRC B124E504.  This change affects only 9117-MMA systems equipped with 4.2GHz quad core processor cards (FC 7540) and all 8234-EMA systems.  This change is not critical.
  • The firmware was enhanced such that SRC B181F126 is correctly managed, and no longer calls home unnecessarily for this problem.
EH340_095_039

08/20/09

Impact: Function           Severity:   HIPER

System firmware changes that affect all systems

  • DEFERRED:  This fix corrects the handling of a specific processor instruction sequence that was generated on a particular heavily-tuned High Performance Computing (HPC) application. This specific instruction sequence has the potential to produce an incorrect result. This instruction sequence has only been observed in a single HPC application.  However, it is strongly recommended that you apply this fix. 
  • The firmware was enhanced such that a generic B1817201 SRC will no longer be logged when a cache error occurs on a node controller (NC).  Unique SRCs will now be logged for cache failures, and upper and lower thresholds have been added to the NC cache error logging scheme. 
System firmware changes that affect certain systems
  • HIPER for systems with F/C 5803 or 5873 drawers attached:  A problem was fixed that prevented node concurrent maintenance operations on systems with F/C 5803 or 5873 drawers attached to them.
  • On systems with F/C 5802 or 5877 drawers attached, a problem was fixed that prevented an I/O slot's power LED from accurately reflecting the state of the I/O slot in a 5802 or 5877 drawer, under certain circumstances.
  • A problem was fixed that under certain rare circumstances caused a partition to crash when a 24" InfiniBand I/O drawer (feature code 5797 or 5798) was concurrently added.  When this problem occurred, rebooting the system was required to recover.
  • On systems running system firmware EH340_075 and Active Memory Sharing, a problem was fixed that might have caused a partition to lose I/O entitlement after the partition was moved from one system to another using PowerVM Mobility.
  • On systems running system firmware EH340_075 and Active Memory Sharing, a problem was fixed that might have caused a partition to fail to boot with SRC B700F103 if the partition had more than 24 virtual processors assigned to it.
  • On systems running system firmware release EH340, a problem was fixed that might have caused I/O performance to be degraded after a repair was complete, if a node evacuation operation was performed as part of a concurrent maintenance operation to fix a failing I/O adapter or drawer.
  • On systems with external I/O towers attached, the firmware was enhanced so that the system will not crash when SRC B7006981 is logged for certain types of I/O hardware failures. 
Concurrent maintenance (CM) firmware fixes
  • A problem was fixed that might have caused the performance of an I/O loop (attached to a 12X I/O adapter) to be degraded if a B7006982, B7006984, B7006985, B70069F2, B70069F3, or B70069F4 SRC is logged after a concurrent maintenance operation on that loop.
  • A problem was fixed that caused concurrent maintenance operations on memory DIMMs to fail if the replacement DIMMs were functionally equivalent to the original DIMMs, but did not have the same CCIN (customer card identification number).
  • A problem was fixed that caused B1xxB889 SRCs to be erroneously logged during a node evacuation operation.  (Node evacuation is one step in a concurrent maintenance operation on a node.)
  • A problem was fixed that caused the system to crash during a hot node or GX adapter repair with certain hardware configurations.
  • A problem was fixed that caused replacement of a system controller with power off, and the system at standby, to fail.
  • A problem was fixed that caused the system to crash during a hot node repair or upgrade.
EH340_075_039

05/26/09

Impact: Function       Severity: HIPER

New features and functions: 

- DEFERRED: Support for F/C 5803 (24" I/O drawer) and F/C 5873 (diskless 24" I/O drawer).

Attention: After this level of firmware is installed, the platform must be powered off, then powered on, before the 5803 or 5873 I/O drawer is added to the system.

- DEFERRED: Support for POWER VM Active Memory Sharing.

Attention: After this level of firmware is installed, the platform must be powered off, then powered on to activate the POWER VM Active Memory Sharing function.

Attention: If EH340_075 has been installed, and the new POWER VM Active Memory Sharing function has been activated, and you want to back-level the system firmware, the active memory sharing pool must be deactivated and deleted prior to back-leveling the system firmware. IBM does not recommend back-leveling the system firmware.

System firmware changes that affect all systems:

  • HIPER: A problem was fixed that caused a system to fail to reboot after a B1xxE504 SRC was logged, due to a processor interconnection bus failure. The same SRC, B1xxE504, was logged when the reboot failed.
  • A problem was fixed that caused non-terminating SRCs (such as B1818A1E) that indicate registry read errors to be logged during a disruptive installation of system firmware.
  • A problem was fixed that prevented the system from powering on after the "reset service processor settings" or "reset all settings" option was selected in the advanced system management interface (ASMI) menus.
  • A problem was fixed that caused the detailed data at the end of an "early power off warning type 5" AIX error log entry to be filled with invalid data instead of zeros.
  • A problem was fixed that caused the secondary system controller to reset/reload with SRC B1xxB741 being logged, if the system controller lost the communication path to one of the node controllers.
  • A problem was fixed that prevented all of the necessary files from being synchronized between the primary and the secondary service processors. One possible symptom of this problem was the time-of-day clocks being out of synch after a service processor failover.
  • A problem was fixed that caused SRC B1818601 to be logged, and a service processor dump to be generated, at runtime.
  • A problem was fixed that caused the number of empty GX adapter slots displayed by the advanced system management interface (ASMI) to be incorrect.
  • A problem was fixed that prevented a newly installed 12X I/O adapter from being recognized if the system controller was at standby, and the newly installed adapter was a 12X RIO adapter and the previous adapter was a 12X InfiniBand adapter, or vice-versa.
  • The firmware was enhanced so that SRC B1xxE458 (with word 6=0000E42B) will be logged as informational instead of generating a call home.
  • The firmware was enhanced such that error logs with relevant information will be created when a system crashes under certain circumstances, rather than a generic SRC (B1813410), with very little debug information, being logged.
  • A problem was fixed that caused the system to hang when terminating if the system had been in power save mode.
  • The firmware was enhanced so that if the secondary system controller remains hung after the primary system controller successfully boots, a predictive error will be logged, and a call home will be made.
  • A problem was fixed that caused SRC B181D312 to be logged, and a call home to be made, when a bulk power controller (BPC) and a hardware management console (HMC) are temporarily disconnected.
  • The firmware was enhanced such that if an attempt is made to enable redundancy when the system is booting, the error log entry that is made will be informational instead of predictive.
  • The firmware was enhanced so that a call home will be made if the hypervisor issues a "terminate immediate" interrupt. 
  • A problem was fixed that caused SRC 11001D12 to be erroneously logged when the system was booting.
  • A problem was fixed that caused incorrect field replaceable unit (FRU) part numbers to be returned for the BPA scroll assembly, UEPO panel and the CEC MDA scroll assembly.
  • The firmware was enhanced so that the service processor only logs SRC B1A38B24 when a valid network set up error is found. The callouts for this SRC were also improved.
  • The firmware was enhanced so that SRCs B181720D, B1818A13, and B1818A0F, and occasionally a service processor dump, will not be generated when the service processor's two Ethernet interfaces are on the same subnet. (This is an invalid configuration.)
  • A problem was fixed that caused a system with I/O drawers attached to crash, and a SYSDUMP to be taken, with SRCs B7000103 and B181D138 being logged.  Another symptom of this failure is informational SRC B7006970 entries constantly posting in the iqyylog.log.
System firmware changes that affect certain systems:
  • In systems using InfiniBand switches for processor clustering, a problem was fixed that caused packets to be dropped under certain circumstances.
  • On systems running firmware release EH340, a problem was fixed that caused data in the platform dump to be invalid.
  • On systems with five or more nodes, a problem was fixed that prevented the identify LED function from turning on the correct node's LED.
  • On systems with a large number of I/O drawers, a communication problem was fixed that caused unnecessary system controller failovers, unnecessary reset/reloads, and unnecessary dumps, and SRC B181F105 to be logged.
  • On systems with a large number of I/O drawers, the firmware was enhanced to reduce the boot time.
Concurrent maintenance (CM) firmware fixes: 
  • DEFERRED: A problem was fixed that caused SRC B150A422 to be erroneously logged, and the advanced system management interface (ASMI) to erroneously show deconfigured processor cores, if system firmware was installed while a node was deactivated due to a concurrent maintenance operation.
  • DEFERRED: A problem was fixed that caused SRC B181B171 to be logged, and the system to crash, during a concurrent node repair or concurrent GX adapter repair.
  • A problem was fixed that prevented a concurrent add or repair of a GX adapter from being re-attempted if a reset/reload of the primary system controller occurred during the GX add part of the initial procedure.
  • A problem was fixed that might cause a concurrent node repair, a concurrent I/O expansion unit repair, a concurrent PCI slot repair, or a DLPAR removal or moving of I/O slots to fail if the I/O hardware involved is in a failed state.
  • A problem was fixed that caused a hot node repair operation to fail if 16GB huge pages were configured on the system.
  • On systems using on/off (temporary) memory capacity on demand (COD), the firmware was enhanced to improve memory COD's interaction with other tools (such as Inventory Scout in AIX), and to make the billing process easier.
  • A problem was fixed that caused a concurrent node add or repair operation to fail if the operation immediately followed an upgrade of system firmware from EH330_xxx to EH340_039, then a concurrent installation of EH340_061.
EH340_061_039

04/20/09

Impact: Function   Severity: Special Attention

System firmware changes that affect all systems:

  • DEFERRED: A problem was fixed that caused the advanced system management interface (ASMI) menus to become unresponsive, and the system to appear to hang, when a GX adapter slot reservation was attempted when the system was at service processor standby.
  • A problem was fixed that caused the service processor diagnostics to report a "TOD (time-of-day) overflow" error, instead of an uncorrectable memory error, when failures occurred on memory DIMMs.
  • A problem was fixed that prevented the service processor from automatically booting from the permanent (or P) side if the temporary (or T) side of the firmware flash was corrupted. When the problem occurred, the service processor stopped instead of booting from the P side.
  • A problem was fixed that might have caused the system to crash when a processor was dynamically removed when the system was running. If the system is running the EH340 release of system firmware, this problem can also occur during a concurrent maintenance operation.
  • The firmware was enhanced such that data corruption in the Anchor (VPD) will be corrected by the firmware, rather than having to have the Anchor card replaced.
  • A problem was fixed that caused non-terminating SRCs (such as B1818A1E) that indicate registry read errors to be logged during a disruptive installation of system firmware.
  • A problem was fixed that prevented the system from powering on after the "reset to factory settings" option was selected in the advanced system management interface (ASMI) menus.
  • The firmware was enhanced to improve the service processor's capability to recover from bad bits in the flash memory. A predictive error, or an unrecoverable error, will be logged against the card that contains the system firmware if the number of correctable or uncorrectable errors exceeds the threshold.
  • A problem was fixed that caused a partition being migrated to crash on the target system.
  • On systems running the EH340 release of system firmware, a problem was fixed that caused an abort code to be logged in the virtual input/output system (VIOS) error log on the source system after a successful partition migration.
  • A problem was fixed that caused a partition being migrated to become unresponsive on the target system when firmware-assisted dump was enabled.
  • The firmware was enhanced so that SRC BA210012 will not generate a call home when logged.
  • The callouts for SRC B181E6ED, which is logged when a system is booted with service processor redundancy disabled, were improved to indicate that redundancy was disabled rather than calling out a firmware failure.
  • A problem was fixed that caused hardware to be deconfigured when the system encountered network errors, even though the SRCs were being logged as informational.
  • A problem was fixed that prevented all of the necessary files from being synchronized between the primary and secondary service processors. One possible symptom of this problem was the time-of-day clocks being out of synch after a service processor failover.
System firmware changes that affect certain systems:
  • On systems with firmware release EH340 installed, a problem was fixed that caused a system firmware installation to fail with SRC E302F9D3 being erroneously logged.
  • On systems with 16GB DIMMs and firmware release EH340 installed, a problem was fixed that prevented the concurrent replacement of a distributed converter assembly (DCA) in a processor node.
  • On systems with external I/O drawers, a problem was fixed that could cause the system to hang on checkpoint C700406E during a "warm" reboot (a reboot in which the processor drawer is power-cycled but the I/O drawers are not).
  • On systems running system firmware release EH340 and IBM i partitions, a problem was fixed that caused message CPF9E7F, CPF9E2D or CPF9E5E (which indicates a licensing key problem) to be received by the IBM i partitions when the number of physical processors was greater than the number of IBM i licenses.
  • On systems with virtual fiber channel disks, a problem was fixed that prevented the system management services (SMS) from displaying the virtual fiber channel disks if the virtual fiber channel server reported that any of them were reserved. 
Concurrent maintenance (CM) firmware fixes 
  • DEFERRED: On systems running system firmware release EH340, a problem was fixed that caused the system to checkstop during the "hot add" of a GX I/O adapter card.
  • A problem was fixed that caused a concurrent maintenance operation to be halted with SRC B181A433 being logged.
  • A problem was fixed that caused concurrent maintenance operations, if attempted immediately after a disruptive firmware installation, to be disabled.
  • A problem was fixed that caused SRC B150D15E to be erroneously logged during a concurrent node addition or concurrent memory upgrade.
  • On systems with five or more processor nodes, a problem was fixed that caused the identify function to turn on the wrong node's LED.
  • A problem was fixed that caused a concurrent processor add operation, after a disruptive installation of system firmware, to fail with SRC B181A422 being logged.
  • A problem was fixed that caused concurrent maintenance operations, if attempted immediately after a concurrent firmware installation, to be disabled.
  • A problem was fixed that caused a concurrent node add to fail after a disruptive firmware installation with SRC B181A422 being logged.
  • A problem was fixed that prevented a concurrent add or repair of a GX adapter from being re-attempted if a reset/reload of the primary system controller occurred during the GX add part of the initial procedure. 
EH340_039_039

11/21/08

Impact: Function     Severity: Attention

New Features and Functions:

  • Support for concurrent processor node addition, as well as hot and cold node repair. 
  • Support for up to 30 feature code 5791, 5797, 5798, 5807, 5808, and 5809 I/O drawers in two powered I/O racks, with the limitation that no more than 12 of those 30 drawers can be feature codes 5791, 5797, 5798, 5807, 5808, and 5809.
  • Support for migrating memory DIMMs from POWER5 model 59x systems to model FHA systems.
  • Support for concurrently connecting an I/O rack to a model FHA system.
  • Support for the 8GB fiber channel adapter, F/C 5735.
  • Support for a virtual tape device.
  • Support for USB flash memory storage devices.
  • Support in the system controller firmware for IPv6.
  • Support in the hypervisor for three types of hardware performance monitors.
  • Support for installing AIX and Linux using the integrated virtualization manager (IVM).
  • On systems running AIX, support was added for an enhanced power and thermal management capability. When static power save mode is selected, AIX will "fold" processors to free processors which can then be put in the "nap" state.
System firmware changes that affect all systems:
  • A problem was fixed that prevented the default partition environment in the advanced system management interface (ASMI) power on/off menu from being set to "i5/OS" when it was blank.
  • The firmware was enhanced so that SRC B1xx3409, which indicates an invalid state change (such as pushing the power on button twice quickly) will be logged as informational instead of predictive, and will not call home.
  • A problem was fixed that caused a service processor dump to be taken and SRC B181EF88 to be logged, even though the operation of the system was not affected.
  • On systems that are managed by a hardware management console (HMC), a problem was fixed that, under certain rare circumstances, caused SRC B181E411 to be logged, a call home to be made, and a service processor dump to be taken.
  • The firmware was enhanced so that SRC B1812224, which indicates that the user attempted to enable redundancy when the managed system was booting, will be logged as informational instead of predictive.
  • A problem was fixed that prevented error log entries on the secondary service processor (or system controller) from generating a serviceable event on the hardware management console (HMC).
  • A problem was fixed that, under certain rare circumstances, caused SRC B1754202 to be erroneously logged (as a predictive error with a call home) after a disruptive firmware installation.
  • A problem was fixed that caused SRC B1818A0F to be erroneously logged during a firmware installation when service processor (or system controller) failover is disabled.
  • A problem was fixed that prevented the machine type and model data from being added to a node controller's error log entries.
  • On systems with external I/O frames, a problem was fixed that might have prevented the firmware from "unthrottling" processors after entering power save mode.
  • A problem was fixed that caused the system to crash and a SYSDUMP to be taken, with SRCs B170E540, B181D138, or B700F105, with a bad PCI-E adapter installed and in use, or while running a heavy network load.
System firmware changes that affect certain systems
  • On systems with the integrated x-series adapter (IXA), a problem was fixed that prevented the creation of a system plan on the HMC.
  • On systems with multiple host channel adapter (HCA) cards, a problem was fixed that caused logical ports on the HCA cards to be intermittently inactive.
  • In networks using a time server, a problem was fixed that caused the date on a client system to be reset to 1969 if the client system lost power.



4.0 How to Determine Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: EH340_122.

5.0 Downloading the Firmware Package

Follow the instructions on the web page. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not connected to the Internet, you will need to download the new firmware level to a CD-ROM or FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: EHXXX_YYY_ZZZ

Where XXX = release level
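The naming pattern above can be split into its fields programmatically. The following is a minimal sketch, not an IBM-supplied tool: the names FirmwareLevel, parse_level, and same_release are hypothetical, and the regular expression assumes a two-letter prefix followed by three-digit fields, with the third field optional (as in the ASMI example EH340_122). Only XXX is interpreted, as the release level; YYY and ZZZ are kept as opaque strings.

```python
import re
from typing import NamedTuple, Optional


class FirmwareLevel(NamedTuple):
    """Fields of a level name such as EH330_104_034 (hypothetical helper type)."""
    prefix: str          # two-letter prefix, e.g. "EH"
    release: str         # XXX = release level
    yyy: str             # YYY field, kept as an opaque string
    zzz: Optional[str]   # ZZZ field, None when the name has only two fields


# Assumed shape: two capital letters, three digits, then one or two
# underscore-separated three-digit fields.
_LEVEL_RE = re.compile(r"^([A-Z]{2})(\d{3})_(\d{3})(?:_(\d{3}))?$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a firmware level name into its fields, or raise ValueError."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized firmware level name: {name!r}")
    return FirmwareLevel(*match.groups())


def same_release(installed: str, new: str) -> bool:
    """Return True when both names carry the same prefix and release level."""
    a, b = parse_level(installed), parse_level(new)
    return (a.prefix, a.release) == (b.prefix, b.release)
```

For example, same_release("EH330_095_034", "EH330_104_034") compares two levels of release 330, while same_release("EH330_104_034", "EH340_122") compares release 330 against release 340. Which installation procedure applies in each case is described in the instructions referenced below.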

Instructions for installing firmware updates and upgrades can be found at http://publib.boulder.ibm.com/infocenter/systems/scope/hw/topic/ipha1/updupdates.htm


7.0 Firmware History

 
EH330
EH330_104_034

04/26/10

Impact: Availability          Severity:  ATT

System firmware changes that affect all systems

  • DEFERRED:   This fix corrects the handling of a specific processor instruction sequence that has the potential to result in undetected data errors.  This specific instruction sequence has only been observed in a small number of highly tuned floating point-intensive applications.  However, it is strongly recommended that this fix be applied to all POWER6 systems.  This fix has the potential to decrease system performance on applications that make extensive use of floating point divide, square root, or estimate instructions.
  • A problem was fixed that caused SRC B181440C to be erroneously logged, and a call home to be erroneously made, during the installation of system firmware.
  • A problem was fixed that caused SRC B1818A0A to be erroneously logged during a concurrent firmware update.
  • The firmware was enhanced such that SRCs B181F126, B181F127, and B181F129 are correctly logged, and no longer cause unnecessary calls home to be made.
  • The firmware was enhanced so that SRC B181720D, and occasionally a service processor dump, will not be generated when the service processor's two Ethernet interfaces are on the same subnet.  (This is an invalid configuration.)
  • In partitions running AIX or Linux, a problem was fixed that, under certain rare circumstances, caused the addition of an I/O slot to a partition using a dynamic LPAR (DLPAR) add operation to fail.
  • A problem was fixed that caused the system to hang with SRCs B182953C, B182954C and B17BE434 being logged.
  • A problem was fixed that caused SRC B1818902 to be erroneously logged during a firmware installation.
  • A problem was fixed that caused a reset/reload of a node controller.

System firmware changes that affect certain systems
  • On partitions running AIX or Linux, a problem was fixed that caused a dynamic LPAR (DLPAR) operation to add an I/O slot to fail.
  • On systems running redundant VIOS partitions, a problem was fixed that prevented Ethernet traffic from being properly bridged between the two partitions.  This problem also prevented shared Ethernet adapter failover from working correctly.
  • On systems using InfiniBand switches for processor clustering, a problem was fixed that caused InfiniBand ports to intermittently drop out.
EH330_095_034

08/31/09

Impact: Usability          Severity:  HIPER

System firmware changes that affect all systems

  • DEFERRED:  This fix corrects the handling of a specific processor instruction sequence that was generated on a particular heavily-tuned High Performance Computing (HPC) application. This specific instruction sequence has the potential to produce an incorrect result. This instruction sequence has only been observed in a single HPC application.  However, it is strongly recommended that you apply this fix. 
  • HIPER:  A problem was fixed that caused the migration of a partition using shared processors to fail with a reason code of 4180043, or caused the source system to hang or crash.
  • A problem was fixed that caused SRC 1000911B to be erroneously logged during a reset/reload of the service processor.
System firmware changes that affect certain systems
  • On systems with 7311-D11, 7314-G30, 5790, or 5796 19" drawers attached, a problem was fixed that caused SRC 10009138 to be erroneously logged.


Concurrent maintenance (CM) firmware fixes

  • A problem was fixed that caused SRC B7005603 to be erroneously logged when a F/C 5802 or 5877 19" drawer was concurrently added to the system.
EH330_092_034

05/18/09

Impact: Usability      Severity: Special Attention

System firmware changes that affect all systems:

  • DEFERRED: A problem was fixed that caused the advanced system management interface (ASMI) menus to become unresponsive, and the system to appear to hang, when a GX adapter slot reservation was attempted when the system was at service processor standby.
  • The firmware was enhanced to improve the service processor's capability to recover from bad bits in the flash memory. A predictive error, or an unrecoverable error, will be logged against the card that contains the system firmware if the number of correctable or uncorrectable errors exceeds the threshold.
  • A problem was fixed that prevented the service processor from automatically booting from the permanent (or P) side if the temporary (or T) side of the firmware flash was corrupted. When the problem occurred, the service processor stopped instead of booting from the P side.
  • The firmware was enhanced so that SRC B1xxE458 (with word 6=0000E42B) will be logged as informational instead of generating a call home.
  • A problem was fixed that caused non-terminating SRCs (such as B1818A1E) that indicate registry read errors to be logged during a disruptive installation of system firmware.
  • The firmware was enhanced to improve the field replaceable unit (FRU) callouts when a clock failure occurs.
  • A problem was fixed that caused a partition being migrated to become unresponsive on the target system when firmware-assisted dump was enabled.
  • The callouts for SRC B181E6ED, which is logged when a system is booted with service processor redundancy disabled, were improved to indicate that redundancy was disabled rather than calling out a firmware failure.
  • A problem was fixed that caused hardware to be deconfigured when the system encountered network errors, even though the SRCs were being logged as informational.
  • A problem was fixed that caused the detailed data at the end of an "early power off warning type 5" AIX error log entry to be filled with invalid data instead of zeros.
  • A problem was fixed that caused a partition being migrated to crash on the target system.
  • A problem was fixed that might cause a system to crash with SRC B170E504 when a processor was dynamically deconfigured.
  • The firmware was enhanced such that when data is written to the VPD (Anchor) card, the results are verified, resulting in fewer VPD cards being replaced.
  • A problem was fixed that prevented all of the necessary files from being synchronized between the primary and the secondary system controllers. One possible symptom of this problem was the time-of-day clocks being out of synch after a system controller failover.
  • A problem was fixed that caused SRC B1818601 to be logged, and a service processor dump to be generated, at runtime.
System firmware changes that affect certain systems:
  • In systems using InfiniBand switches for processor clustering, a problem was fixed that caused packets to be dropped under certain circumstances.
  • On systems with five or more nodes, a problem was fixed that prevented the identify LED function from turning on the correct node's LED.
EH330_076_034

12/05/08

Impact: Serviceability            Severity: HIPER

System firmware changes that affect all systems:

  • DEFERRED and HIPER: The system initialization settings were changed to reduce the likelihood of a system crash under extremely rare circumstances.
  • HIPER: A problem was fixed that caused a system to fail to reboot after a B1xxE504 SRC was logged, due to a processor interconnection bus failure. The same SRC, B1xxE504, was logged when the reboot failed.
  • A problem was fixed that caused SRC 11001D1x to be erroneously logged during system boot.
  • A problem was fixed that might, if a platform dump occurred, have caused a reset/reload of the service processor, and the platform dump to be corrupted.
  • A problem was fixed that caused incorrect field replaceable unit (FRU) part numbers to be returned for the BPF scroll assembly, UEPO panel and the CEC MDA scroll assembly.
  • A problem was fixed that prevented the system from rebooting if an error occurred during a memory-preserving IPL.
  • The firmware was enhanced so that if a system with redundant system controllers is booted with redundancy disabled, a call home error will be logged.
  • The firmware was enhanced so that a call home will be made if the hypervisor issues a "terminate immediate" interrupt.
  • A problem was fixed that prevented service processor and hypervisor error log entries from being reported to the operating system after a successful partition migration. This problem only affected the partition that was migrated.
  • On systems running AIX or Linux, a problem was fixed that, under certain rare circumstances, might cause the operating system to crash.
  • A problem was fixed that, in certain configurations, caused the removal of a host Ethernet adapter (HEA) port to fail when using a dynamic LPAR (DLPAR) operation.
  • A problem was fixed that, under certain rare circumstances, caused the hypervisor to crash when it was booting with SRC B6000103 being logged.
  • A problem was fixed that, under certain circumstances, prevented the operating system from recovering a PCI-E adapter on which a temporary enhanced error handling (EEH) error occurred.
  • A problem was fixed that, under certain rarely occurring circumstances, caused the system to crash if an L2 or L3 cache failure is not discovered and repaired when it initially occurs.
  • A problem was fixed that caused the service processor diagnostics to call out a processor as the failing item, instead of the memory DIMMs, when a large number of memory error correction coding (ECC) errors occurred.
  • A problem was fixed that prevented the system from powering on after the "reset to factory settings" option was selected in the advanced system management interface (ASMI) menus.
  • A problem was fixed that caused the wrong field replaceable unit (FRU) to be called out when SRC B152F109, which indicates a problem with the NVRAM in a bulk power controller (BPC), was logged.
  • A problem was fixed that might cause a default catch to occur when booting from an iSCSI device.
System firmware changes that affect certain systems:
  • On systems with a host Ethernet adapter (HEA) or host channel adapter (HCA) assigned to a Linux partition, a problem was fixed that prevented the partition from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the partition. When this problem occurred, SRC B700F105 was logged.
  • In systems with clustered processors, various problems were fixed in the InfiniBand interconnection networks.
  • A problem was fixed that, under certain circumstances, caused an AIX or Linux partition to fail to boot with SRC D200E0AF being logged.
  • On systems with external I/O frames, a problem was fixed that might have prevented the firmware from "unthrottling" processors after entering power save mode.
EH330_046_034

08/28/08

Impact: Function       Severity: HIPER

System firmware changes that affect all systems:

  • DEFERRED and HIPER: A problem was fixed in which, under certain rarely occurring circumstances, an application could cause a processor to go into an error state, and the system to crash.
  • HIPER: A problem was fixed that caused the system to terminate abnormally with SRC B131E504.
  • HIPER: A problem was fixed that might cause a partition to crash during a partition migration before the migration was complete.
  • DEFERRED: Enhancements were made to the system firmware to reduce the system boot time on power up.
  • DEFERRED: A problem was fixed in which, under certain rare circumstances, if a system controller failover occurred, the new secondary system controller was not able to communicate with the system.
  • DEFERRED: A problem was fixed that caused SRC B1608CB0 to be logged if a separate I/O frame is attached to the CEC frame.
  • A problem was fixed that caused multiple instances of SRC B1818A03 and B1818A0A to be logged erroneously, and multiple calls home to be made, during a frame connection reset.
  • A problem was fixed that caused SRC B1819506 to be erroneously generated, and a call home to be made, when service processor (or system controller) error log entries were generated faster than they could be processed.
  • A problem was fixed that caused the hardware management console (HMC) to show an "Incomplete" state after it attempted to read a file with an incorrect size from the service processor (or system controller). This problem also occurred if the "factory configuration" option was used on the advanced system management interface (ASMI) menus.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of the time-of-day clock circuitry.
  • A problem was fixed that prevented a dump file larger than 4 GB from being successfully off-loaded to the hardware management console (HMC).
  • On systems with redundant bulk power controllers, a problem was fixed that caused the hardware management console (HMC) to get stuck at "Pending Authentication" for one of the bulk power controllers (BPCs).
  • On systems with I/O drawers attached, a problem was fixed that might have caused some I/O slots in the drawers not to be configured when the system was booted.
  • In systems with clustered processors, various problems were fixed in the InfiniBand interconnection networks.
  • A problem was fixed that caused the location codes of the external InfiniBand ports on a 5791 I/O drawer with the InfiniBand interface to be reported incorrectly on the HMC.
  • A problem was fixed that caused SRC B7006971 to be generated because the firmware was incorrectly performing operations on PCI-Express I/O adapters during dynamic LPAR (DLPAR) operations on memory.
  • A problem was fixed that might have caused an out-of-memory condition in the hypervisor, with SRC B7000200 being logged.
  • A problem was fixed in the thermal management firmware that caused SRCs B1812635 and B1812636 to be logged, and the system or node to run in low power mode when it should have been in nominal, or nominal when it should have been in low power mode.
  • A problem was fixed that caused SRC B1818A10 to be erroneously generated after a successful installation of system firmware.
  • A problem was fixed that caused the AIX commands "lsmcode" and "diag" to fail after a partition migration.
  • A problem was fixed that caused the message "BA330000malloc error!" to be displayed on the operating system console after a partition migration, even though SRC BA330000 had not been logged. When this problem occurred, the partition migration appeared to be successful. However, a process within the partition was either hung or had failed, and in most cases the partition had to be rebooted to fully recover.
  • A problem was fixed that caused the status of the connection between the hardware management console (HMC) and the service processor to be set to an invalid state. This might cause problems when the HMC and service processor tried to communicate.
  • A problem was fixed that caused partitions that were being rebooted to hang at D200E0AF after a concurrent firmware update under certain circumstances.
  • A problem was fixed that prevented the replacement of a system controller from completing successfully if the system controller had been guarded out prior to its replacement.
  • A problem was fixed that caused the system controller to go through an unnecessary reset/reload cycle when a checkstop occurred or the system was powered off.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of the node controller.
  • A problem was fixed that caused predictive SRC B181EF88 to be logged when, under certain circumstances, a system controller failover occurred at runtime.
  • A problem was fixed in which, if redundancy was disabled and the emergency power off (EPO) switch was then used to power off the system, redundancy was erroneously enabled when the system came back up.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of a node controller.
  • A problem was fixed that caused the service processor (or system controller) to lose its communication link with the hypervisor, and SRC A181D000 to be logged, under certain rare circumstances.
  • On systems using virtual shared processor pools (VSPP), a problem was fixed that caused the number of processors assigned to the partitions to be reduced after a memory-preserving IPL. 
EH330_034_034

06/10/08

Impact: Function    Severity: HIPER

This level is a disruptive update from the prior level, EH330_018. The system should be powered off before installing this level of system firmware. If this level is installed when the system is running, the CECs will be rebooted, causing all partitions to be terminated, and a reboot will be required.

System firmware changes that affect all systems:

  • HIPER: A problem was fixed that caused a concurrent firmware installation to hang with SRC BA00E840 being logged. This problem may also cause a partition migration to hang, under certain circumstances, with the same SRC, BA00E840, being logged. This SRC will be logged when this level of firmware is installed and will generate a call home; it should be ignored. It will not be logged during subsequent installations.
  • HIPER: The processor initialization settings were changed to reduce the likelihood of a processor going into an error state and causing a checkstop or system crash.
  • HIPER: A problem was fixed that, under certain circumstances, caused a system termination during a service processor failover.
  • HIPER: A problem was fixed that caused large numbers of enhanced error handling (EEH) errors to be logged against the 4-port gigabit Ethernet adapter, F/C 5740, under certain circumstances.
  • HIPER: On systems with redundant system controllers installed and enabled, a problem was fixed that might cause a communications hang between the two system controllers. When this occurred, it triggered a reset/reload of the primary system controller, and the resulting fail-over to the secondary system controller failed in such a way that the system crashed.
  • Several problems were fixed that might cause one or both of the clock cards to be deconfigured, and erroneously called out as bad, when the system boots up from the power-off state. 
  • A problem was fixed that caused the /tmp directory on the system controllers and the service processor in the bulk power controller (BPC) to fill up, which resulted in an out-of-memory condition. When this problem occurred, the system controllers or service processor in the BPC usually performed a reset/reload. This is one possible cause of SRC B1817201 being logged.
  • A problem was fixed that prevented the "i5/OS enable/disable" setting (in the ASMI power on/off menu) from taking effect when the system is booted. This solution requires the system to be booted up to hypervisor standby twice after the setting is changed to "enabled". This will be fixed in a future service pack to remove the requirement for the second boot to hypervisor standby. 
  • A problem was fixed that caused the firmware to receive a false error indication when reading the registers on the LED controller. SRC B1811340 was logged when this happened.
  • A problem was fixed that prevented an error fail-over to the secondary system controller from completing successfully.
  • A problem was fixed that might have caused a system firmware installation to fail with SRC B18138B7 being logged.
  • A problem was fixed that caused an error log to be generated that called out system controller A (Un-P1-C2), instead of the correct callout, which was system controller B (Un-P2-C5).
  • A problem was fixed that caused the P1 LED on the front light strip to be on when it should have been off.
  • A problem was fixed that caused the wrong memory DIMM location to be called out when certain types of failures occurred.
  • A problem was fixed that might have caused cache chip failures when the system is operating in Power Save mode. Error log entries that might indicate that this problem is occurring include correctable errors and uncorrectable errors in L2, i-cache and d-cache memory, parity errors, and SRC B181E504. 
  • The firmware was enhanced so that the IDs "celogin1" and "celogin2" allow an authorized service provider to log into the bulk power controller (BPC).
  • A problem was fixed that caused a partition using a host channel adapter (HCA) or host Ethernet adapter (HEA) to appear to hang (with progress code D200C1FF being displayed) before successfully shutting down. The amount of time the partition appeared to hang depended on the amount of memory assigned to the partition and the usage of HCA or HEA.
  • A problem was fixed that prevented the HMC from connecting to the managed system if the HMC's DHCP server IP range is changed when the managed system is running.
  • The error logging and FRU callout firmware was enhanced so that if a failure occurs on one or both clock cards, only one will get deconfigured, and the system will continue to try to boot instead of terminating.
  • The firmware was enhanced to improve the system memory error recovery.
  • The firmware was enhanced so that the contents of the /tmp directory are included when a service processor dump is taken.
  • A problem was fixed in the hypervisor that might cause a partition migration to fail.
  • The firmware was enhanced so that:
    • A failure when writing VPD to a P6 processor will cause the node to be deconfigured rather than terminating the system.
    • The failure of a VPD write operation will not corrupt the VPD table. Such corruption could lead to unnecessary system down-time and unnecessary FRU replacement.
System firmware changes that affect certain systems:
  • On systems using QLogic InfiniBand switches, a problem was fixed that caused the PortInfo:linkWidthActive and PortInfo:linkSpeedActive to be inaccurately stored and displayed on the display of subnet parameters.
EH330_018_018

05/13/08

Impact: New   Severity: New
  • GA Level

8.0 Change History

Date             Description
Aug 03, 2010     Added information to the 'Cautions and Important Information' section concerning an issue following a concurrent firmware update on systems running IBM i
June 10, 2010    Added a defect description for firmware level EH340_122
                 Added the "Minimum HMC Code Level" section