EH340 |
EH340_122_039
05/19/10
|
Impact: Availability
Severity: Attention
System firmware changes that affect all systems
- DEFERRED: This fix
corrects the handling of a specific
processor instruction sequence that has the potential to result in
undetected data errors. This specific instruction sequence has
only been observed in a small number of highly tuned floating
point-intensive applications. However, it is strongly recommended
that this fix be applied to all POWER6 systems. This fix has the
potential to decrease system performance on applications that make
extensive use of floating point divide, square root, or estimate
instructions.
- A problem was fixed that
prevented an SRC from being recorded in the service processor dump
produced by a host-initiated reset.
- A problem was fixed that
caused a reset/reload of a node controller.
- A problem was fixed that
caused the system to become unresponsive and appear to hang when
page migration occurred on a PCIe slot.
- The firmware was enhanced to
improve the callouts for certain types of processor failures that log
SRC B1xxE504.
- The firmware was
enhanced to improve the callouts when NVRAM corruption is detected in
the bulk power controller's (BPC's) service processor.
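Several entries in this history give an SRC with "xx" in it, such as B1xxE504, where those two characters vary. When searching an exported error log for such SRCs, a pattern match can help. The following is a minimal sketch, not an IBM tool. It assumes each lowercase "x" stands for exactly one hexadecimal digit (an assumption; these notes do not define the notation), and the function names are hypothetical.

```python
import re


def src_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile an SRC pattern such as 'B1xxE504' into a regex.

    Assumption: a lowercase 'x' is a placeholder for one hex digit;
    every other character must match literally.
    """
    parts = ["[0-9A-F]" if ch == "x" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def src_matches(pattern: str, src: str) -> bool:
    """Return True if the logged SRC matches the pattern exactly."""
    # Upper-case the logged value so 'b181e504' and 'B181E504' compare equal.
    return src_pattern_to_regex(pattern).fullmatch(src.upper()) is not None
```

For example, `src_matches("B1xxE504", "B181E504")` is true, while an unrelated SRC such as B7000103 does not match the pattern.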
System firmware changes that affect certain systems
- A problem was fixed that caused a virtual
SCSI or virtual fibre channel adapter to be seen by the operating
system as not bootable when it was added to a partition using a dynamic
LPAR (DLPAR) operation.
- In partitions running AIX or
Linux, a problem was fixed that caused the addition of an I/O slot to a
partition using a dynamic LPAR (DLPAR) add operation to fail.
- On systems running redundant
VIOS partitions, a problem was fixed that prevented Ethernet traffic
from being properly bridged between the two partitions. This
problem also prevented shared Ethernet adapter failover from working
correctly.
- A problem was fixed that
caused the system to crash with SRC B7000103 when a concurrent
maintenance operation was performed on an I/O slot directly from a
partition (using AIX SMIT or IBM i HST).
- A problem was fixed that
caused a system or partition running Linux to crash when the
"serv_config -l" command was run.
- On systems running active memory
sharing (AMS), the firmware was enhanced so that error messages
indicating "out of compliance" issues with the memory (HMC SRC
HSCL031F) will not be generated if the user allocates more memory than
is installed in the system. (Allocating more memory than is
installed in the system is supported in active memory sharing.)
- On systems using
InfiniBand switches for processor clustering, a problem was fixed that
caused InfiniBand ports to intermittently drop out.
- A problem was fixed that caused the hypervisor to loop
unnecessarily and consume too many processor cycles. This
impacted the performance of the system.
Concurrent maintenance (CM) firmware fixes
- A problem was fixed
that caused the concurrent addition of a node to fail with SRC B181A422.
- A problem was fixed that caused
unpredictable system behavior if a capacity on demand (CoD) or a
virtualization engine technology (VET) activation code was entered and
accepted after a node 0 evacuation was done. The unpredictable
system behavior might also have occurred if a node 0 evacuation
failed, a system dump was taken, and a memory-preserving IPL was then
initiated.
- A problem was fixed that
caused a concurrent maintenance operation after a node evacuation to
fail. When this problem occurred, the system erroneously stated
that a platform memory dump was pending.
- A problem was fixed that
prevented a concurrent maintenance operation from completing
successfully.
- On systems with F/C 5803 or
F/C 5873 I/O drawers attached and a boot device in the drawer, a
problem was fixed that prevented a partition from booting after the
concurrent repair of the GX adapter that connects the 5802 or 5877
drawer to the system, or to the node that contains the GX adapter.
|
EH340_112_039
12/16/09
|
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems
- HIPER: A problem was fixed that might cause the system to crash if
the server is running AIX and has a F/C 5802 or 5877 drawer (in a 19"
rack), or F/C 5803 or 5873 drawer (in a 24" rack), attached.
- On systems with a lot of memory, the firmware was enhanced to reduce
the time partition migrations take from hours to minutes.
- A problem was fixed that might cause the system to crash with SRC
B181E504, then SRC B1813909, being logged.
- The firmware was enhanced such that SRCs B181F126, B181F127, and
B181F129 are correctly handled, and no longer cause unnecessary calls
home to be made.
- The firmware was enhanced such that SRC B1817201, when generated by a
bulk power controller (BPC), is correctly handled.
- A problem was fixed that caused the system to hang with SRCs
B182953C, B182954C, and B17BE434 being logged.
- A problem was fixed that caused SRC 10009135, followed by 10009139,
to be erroneously logged. These SRCs indicate a system power control
network (SPCN) loop is being broken, then re-established.
- The firmware was enhanced to allow a temporary threshold reduction
for processor unit book interconnect predictive errors.
System firmware changes that affect certain systems
- On a single system running Oracle in multiple partitions, with
multiple IBM LHCAs connected in the same subnet, a problem was fixed
that caused the remaining partitions to lose their reliable datagram
socket (RDS) heartbeat connections after the reboot of a single
partition. There is a greater probability of encountering this problem
if the partition being rebooted has a large partition memory assigned
to it.
Concurrent maintenance (CM) firmware fixes
- On systems with four nodes, a problem was fixed that caused the
system controller to perform a reset/reload, which caused a concurrent
maintenance operation on the fourth node (P4) to fail.
- A problem was fixed that caused the concurrent replacement of an
InfiniBand GX adapter or I/O planar to fail if a partition owned an
embedded device on the planar.
- The firmware was enhanced such that if an Ethernet cable is
misplugged on a node controller during a concurrent node add operation,
the node add operation will be completed successfully.
|
EH340_101_039
09/23/09
|
Impact: Serviceability
Severity: Attention
System firmware changes that affect all systems
- DEFERRED: The firmware was enhanced to eliminate correctable errors
(CEs) being erroneously logged against the memory bus with SRC
B124E504. This change affects only 9117-MMA systems equipped with
4.2GHz quad core processor cards (FC 7540) and all 8234-EMA systems.
This change is not critical.
- The firmware was enhanced such that SRC B181F126 is correctly
managed, and no longer calls home unnecessarily for this problem.
|
EH340_095_039
08/20/09
|
Impact: Function
Severity: HIPER
System firmware changes that affect all systems
- DEFERRED: This fix corrects the handling of a specific processor
instruction sequence that was generated on a particular heavily-tuned
High Performance Computing (HPC) application. This specific instruction
sequence has the potential to produce an incorrect result. This
instruction sequence has only been observed in a single HPC
application. However, it is strongly recommended that you apply this
fix.
- The firmware was enhanced such that a generic B1817201 SRC will no
longer be logged when a cache error occurs on a node controller (NC).
Unique SRCs will now be logged for cache failures, and upper and lower
thresholds have been added to the NC cache error logging scheme.
System firmware changes that affect certain systems
- HIPER for systems with F/C 5803 or 5873 drawers attached: A problem
was fixed that prevented node concurrent maintenance operations on
systems with F/C 5803 or 5873 drawers attached to them.
- On systems with F/C 5802 or 5877 drawers attached, a problem was
fixed that prevented an I/O slot's power LED from accurately reflecting
the state of the I/O slot in a 5802 or 5877 drawer, under certain
circumstances.
- A problem was fixed that under certain rare circumstances caused a
partition to crash when a 24" InfiniBand I/O drawer (feature code 5797
or 5798) was concurrently added. When this problem occurred, rebooting
the system was required to recover.
- On systems running system firmware EH340_075 and Active Memory
Sharing, a problem was fixed that might have caused a partition to lose
I/O entitlement after the partition was moved from one system to
another using PowerVM Mobility.
- On systems running system firmware EH340_075 and Active Memory
Sharing, a problem was fixed that might have caused a partition to fail
to boot with SRC B700F103 if the partition had more than 24 virtual
processors assigned to it.
- On systems running system firmware release EH340, a problem was fixed
that might have caused the I/O performance to be degraded if a node
evacuation operation was performed (as part of a concurrent maintenance
operation to fix a failing I/O adapter or drawer) after the repair was
complete.
- On systems with external I/O towers attached, the firmware was
enhanced so that the system will not crash when SRC B7006981 is logged
for certain types of I/O hardware failures.
Concurrent maintenance (CM) firmware fixes
- A problem was fixed that might have caused the performance of an I/O
loop (attached to a 12X I/O adapter) to be degraded if a B7006982,
B7006984, B7006985, B70069F2, B70069F3, or B70069F4 SRC is logged after
a concurrent maintenance operation on that loop.
- A problem was fixed that caused concurrent maintenance operations on
memory DIMMs to fail if the replacement DIMMs were functionally
equivalent to the original DIMMs, but did not have the same CCIN
(customer card identification number).
- A problem was fixed that caused B1xxB889 SRCs to be erroneously
logged during a node evacuation operation. (Node evacuation is one step
in a concurrent maintenance operation on a node.)
- A problem was fixed that caused the system to crash during a hot node
or GX adapter repair with certain hardware configurations.
- A problem was fixed that caused replacement of a system controller
with power off, and the system at standby, to fail.
- A problem was fixed that caused the system to crash during a hot node
repair or upgrade.
|
EH340_075_039
05/26/09
|
Impact: Function
Severity: HIPER
New features and functions:
- DEFERRED: Support for F/C 5803 (24" I/O drawer) and F/C 5873
(diskless 24" I/O drawer).
Attention: After this level of firmware is installed, the platform must
be powered off, then powered on, before the 5803 or 5873 I/O drawer is
added to the system.
- DEFERRED: Support for PowerVM Active Memory Sharing.
Attention: After this level of firmware is installed, the platform must
be powered off, then powered on to activate the PowerVM Active Memory
Sharing function.
Attention: If EH340_075 has been installed, and the new PowerVM Active
Memory Sharing function has been activated, and you want to back-level
the system firmware, the active memory sharing pool must be deactivated
and deleted prior to back-leveling the system firmware. IBM does not
recommend back-leveling the system firmware.
System firmware changes that affect all systems:
- HIPER: A problem was fixed that caused a system to fail to reboot
after a B1xxE504 SRC was logged, due to a processor interconnection bus
failure. The same SRC, B1xxE504, was logged when the reboot failed.
- A problem was fixed that caused non-terminating SRCs (such as
B1818A1E) that indicate registry read errors to be logged during a
disruptive installation of system firmware.
- A problem was fixed that prevented the system from powering on after
the "reset service processor settings" or "reset all settings" option
was selected in the advanced system management interface (ASMI) menus.
- A problem was fixed that caused the detailed data at the end of an
"early power off warning type 5" AIX error log entry to be filled with
invalid data instead of zeros.
- A problem was fixed that caused the secondary system controller to
reset/reload with SRC B1xxB741 being logged, if the system controller
lost the communication path to one of the node controllers.
- A problem was fixed that prevented all of the necessary files from
being synchronized between the primary and the secondary service
processors. One possible symptom of this problem was the time-of-day
clocks being out of sync after a service processor failover.
- A problem was fixed that caused SRC B1818601 to be logged, and a
service processor dump to be generated, at runtime.
- A problem was fixed that caused the number of empty GX adapter slots
displayed by the advanced system management interface (ASMI) to be
incorrect.
- A problem was fixed that prevented a newly installed 12X I/O adapter
from being recognized if the system controller was at standby, and the
newly installed adapter was a 12X RIO adapter and the previous adapter
was a 12X InfiniBand adapter, or vice-versa.
- The firmware was enhanced so that SRC B1xxE458 (with word 6=0000E42B)
will be logged as informational instead of generating a call home.
- The firmware was enhanced such that error logs with relevant
information will be created when a system crashes under certain
circumstances, rather than a generic SRC (B1813410), with very little
debug information, being logged.
- A problem was fixed that caused the system to hang when terminating
if the system had been in power save mode.
- The firmware was enhanced so that if the secondary system controller
remains hung after the primary system controller successfully boots, a
predictive error will be logged, and a call home will be made.
- A problem was fixed that caused SRC B181D312 to be logged, and a call
home to be made, when a bulk power controller (BPC) and a hardware
management console (HMC) were temporarily disconnected.
- The firmware was enhanced such that if an attempt is made to enable
redundancy when the system is booting, the error log entry that is made
will be informational instead of predictive.
- The firmware was enhanced so that a call home will be made if the
hypervisor issues a "terminate immediate" interrupt.
- A problem was fixed that caused SRC 11001D12 to be erroneously logged
when the system was booting.
- A problem was fixed that caused incorrect field replaceable unit
(FRU) part numbers to be returned for the BPA scroll assembly, UEPO
panel and the CEC MDA scroll assembly.
- The firmware was enhanced so that the service processor only logs SRC
B1A38B24 when a valid network setup error is found. The callouts for
this SRC were also improved.
- The firmware was enhanced so that SRCs B181720D, B1818A13, and
B1818A0F, and occasionally a service processor dump, will not be
generated when the service processor's two Ethernet interfaces are on
the same subnet. (This is an invalid configuration.)
- A problem was fixed that caused a system with I/O drawers attached to
crash, and a SYSDUMP to be taken, with SRCs B7000103 and B181D138 being
logged. Another symptom of this failure is informational SRC B7006970
entries constantly posting in the iqyylog.log.
System firmware changes that affect certain systems:
- In systems using InfiniBand switches for processor clustering, a
problem was fixed that caused packets to be dropped under certain
circumstances.
- On systems running firmware release EH340, a problem was fixed that
caused data in the platform dump to be invalid.
- On systems with five or more nodes, a problem was fixed that
prevented the identify LED function from turning on the correct node's
LED.
- On systems with a large number of I/O drawers, a communication
problem was fixed that caused unnecessary system controller failovers,
unnecessary reset/reloads, and unnecessary dumps, and SRC B181F105 to
be logged.
- On systems with a large number of I/O drawers, the firmware was
enhanced to reduce the boot time.
Concurrent maintenance (CM) firmware fixes:
- DEFERRED: A problem was fixed that caused SRC B150A422 to be
erroneously logged, and the advanced system management interface (ASMI)
to erroneously show deconfigured processor cores, if system firmware
was installed while a node was deactivated due to a concurrent
maintenance operation.
- DEFERRED: A problem was fixed that caused SRC B181B171 to be logged,
and the system to crash, during a concurrent node repair or concurrent
GX adapter repair.
- A problem was fixed that prevented a concurrent add or repair of a GX
adapter from being re-attempted if a reset/reload of the primary system
controller occurred during the GX add part of the initial procedure.
- A problem was fixed that might cause a concurrent node repair, a
concurrent I/O expansion unit repair, a concurrent PCI slot repair, or
a DLPAR removal or moving of I/O slots to fail if the I/O hardware
involved is in a failed state.
- A problem was fixed that caused a hot node repair operation to fail
if 16GB huge pages were configured on the system.
- On systems using on/off (temporary) memory capacity on demand (COD),
the firmware was enhanced to improve memory COD's interaction with
other tools (such as Inventory Scout in AIX), and to make the billing
process easier.
- A problem was fixed that caused a concurrent node add or repair
operation to fail if the operation immediately followed an upgrade of
system firmware from EH330_xxx to EH340_039, then a concurrent
installation of EH340_061.
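The firmware levels in this history are written as three underscore-separated fields, for example EH340_061_039. A small sketch for working out which listed levels are newer than an installed one follows. It assumes the first field is the release, the second is the service pack level, and the third is the last disruptive service pack level; that reading follows IBM's general firmware naming convention and is not stated in these notes. The class and function names are hypothetical, and only the full three-field form is handled (short forms such as EH340_075 raise ValueError).

```python
from typing import List, NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # e.g. "EH340"
    service_pack: int     # e.g. 61
    last_disruptive: int  # e.g. 39 (assumed meaning of the third field)


def parse_level(text: str) -> FirmwareLevel:
    """Split a level such as 'EH340_061_039' into its three fields."""
    release, service_pack, last_disruptive = text.strip().split("_")
    return FirmwareLevel(release, int(service_pack), int(last_disruptive))


def newer_levels(installed: str, available: List[str]) -> List[str]:
    """Return levels of the same release with a higher service pack,
    ordered from oldest to newest."""
    current = parse_level(installed)
    candidates = [
        level for level in available
        if parse_level(level).release == current.release
        and parse_level(level).service_pack > current.service_pack
    ]
    return sorted(candidates, key=lambda level: parse_level(level).service_pack)
```

For a system at EH340_075_039, passing the levels listed in this history would return EH340_095_039, EH340_101_039, EH340_112_039, and EH340_122_039 in that order.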
|
EH340_061_039
04/20/09
|
Impact: Function
Severity: Special Attention
System firmware changes that affect all systems:
- DEFERRED: A problem was fixed that caused the advanced system
management interface (ASMI) menus to become unresponsive, and the
system to appear to hang, when a GX adapter slot reservation was
attempted when the system was at service processor standby.
- A problem was fixed that caused the service processor diagnostics to
report a "TOD (time-of-day) overflow" error, instead of an
uncorrectable memory error, when failures occurred on memory DIMMs.
- A problem was fixed that prevented the service processor from
automatically booting from the permanent (or P) side if the temporary
(or T) side of the firmware flash was corrupted. When the problem
occurred, the service processor stopped instead of booting from the P
side.
- A problem was fixed that might have caused the system to crash when a
processor was dynamically removed when the system was running. If the
system is running the EH340 release of system firmware, this problem
can also occur during a concurrent maintenance operation.
- The firmware was enhanced such that data corruption in the Anchor
(VPD) will be corrected by the firmware, rather than having to have the
Anchor card replaced.
- A problem was fixed that caused non-terminating SRCs (such as
B1818A1E) that indicate registry read errors to be logged during a
disruptive installation of system firmware.
- A problem was fixed that prevented the system from powering on after
the "reset to factory settings" option was selected in the advanced
system management interface (ASMI) menus.
- The firmware was enhanced to improve the service processor's
capability to recover from bad bits in the flash memory. A predictive
error, or an unrecoverable error, will be logged against the card that
contains the system firmware if the number of correctable or
uncorrectable errors exceeds the threshold.
- A problem was fixed that caused a partition being migrated to crash
on the target system.
- On systems running the EH340 release of system firmware, a problem
was fixed that caused an abort code to be logged in the Virtual I/O
Server (VIOS) error log on the source system after a successful
partition migration.
- A problem was fixed that caused a partition being migrated to become
unresponsive on the target system when firmware-assisted dump was
enabled.
- The firmware was enhanced so that SRC BA210012 will not generate a
call home when logged.
- The callouts for SRC B181E6ED, which is logged when a system is
booted with service processor redundancy disabled, were improved to
indicate that redundancy was disabled rather than calling out a
firmware failure.
- A problem was fixed that caused hardware to be deconfigured when the
system encountered network errors, even though the SRCs were being
logged as informational.
- A problem was fixed that prevented all of the necessary files from
being synchronized between the primary and secondary service
processors. One possible symptom of this problem was the time-of-day
clocks being out of sync after a service processor failover.
System firmware changes that affect certain systems:
- On systems with firmware release EH340 installed, a problem was fixed
that caused a system firmware installation to fail with SRC E302F9D3
being erroneously logged.
- On systems with 16GB DIMMs and firmware release EH340 installed, a
problem was fixed that prevented the concurrent replacement of a
distributed converter assembly (DCA) in a processor node.
- On systems with external I/O drawers, a problem was fixed that could
cause the system to hang on checkpoint C700406E during a "warm" reboot
(a reboot in which the processor drawer is power-cycled but the I/O
drawers are not).
- On systems running system firmware release EH340 and IBM i
partitions, a problem was fixed that caused message CPF9E7F, CPF9E2D or
CPF9E5E (which indicates a licensing key problem) to be received by the
IBM i partitions when the number of physical processors was greater
than the number of IBM i licenses.
- On systems with virtual fibre channel disks, a problem was fixed that
prevented the system management services (SMS) from displaying the
virtual fibre channel disks if the virtual fibre channel server
reported that any of them were reserved.
Concurrent maintenance (CM) firmware fixes
- DEFERRED: On systems running system firmware release EH340, a problem
was fixed that caused the system to checkstop during the "hot add" of a
GX I/O adapter card.
- A problem was fixed that caused a concurrent maintenance operation to
be halted with SRC B181A433 being logged.
- A problem was fixed that caused concurrent maintenance operations, if
attempted immediately after a disruptive firmware installation, to be
disabled.
- A problem was fixed that caused SRC B150D15E to be erroneously logged
during a concurrent node addition or concurrent memory upgrade.
- On systems with five or more processor nodes, a problem was fixed
that caused the wrong node's LED to be identified.
- A problem was fixed that caused a concurrent processor add operation,
after a disruptive installation of system firmware, to fail with SRC
B181A422 being logged.
- A problem was fixed that caused concurrent maintenance operations, if
attempted immediately after a concurrent firmware installation, to be
disabled.
- A problem was fixed that caused a concurrent node add to fail after a
disruptive firmware installation with SRC B181A422 being logged.
- A problem was fixed that prevented a concurrent add or repair of a GX
adapter from being re-attempted if a reset/reload of the primary system
controller occurred during the GX add part of the initial procedure.
|
EH340_039_039
11/21/08
|
Impact: Function
Severity: Attention
New Features and Functions:
- Support for concurrent processor node addition, as well as hot and
cold node repair.
- Support for up to 30 feature code 5791, 5797, 5798, 5807, 5808, and
5809 I/O drawers in two powered I/O racks, with the limitation that no
more than 12 of those 30 drawers can be feature codes 5791, 5797, 5798,
5807, 5808, and 5809.
- Support for migrating memory DIMMs from POWER5 model 59x systems to
model FHA systems.
- Support for concurrently connecting an I/O rack to a model FHA
system.
- Support for the 8Gb Fibre Channel adapter, F/C 5735.
- Support for a virtual tape device.
- Support for USB flash memory storage devices.
- Support in the system controller firmware for IPv6.
- Support in the hypervisor for three types of hardware performance
monitors.
- Support for installing AIX and Linux using the integrated
virtualization manager (IVM).
- On systems running AIX, support was added for an enhanced power and
thermal management capability. When static power save mode is selected,
AIX will "fold" processors to free processors which can then be put in
the "nap" state.
System firmware changes that affect all systems:
- A problem was fixed that prevented the default partition environment
in the advanced system management interface (ASMI) power on/off menu
from being set to "i5/OS" when it was blank.
- The firmware was enhanced so that SRC B1xx3409, which indicates an
invalid state change (such as pushing the power on button twice
quickly) will be logged as informational instead of predictive, and
will not call home.
- A problem was fixed that caused a service processor dump to be taken
and SRC B181EF88 to be logged, even though the operation of the system
was not affected.
- On systems that are managed by a hardware management console (HMC), a
problem was fixed that, under certain rare circumstances, caused SRC
B181E411 to be logged, a call home to be made, and a service processor
dump to be taken.
- The firmware was enhanced so that SRC B1812224, which indicates that
the user attempted to enable redundancy when the managed system was
booting, will be logged as informational instead of predictive.
- A problem was fixed that prevented error log entries on the secondary
service processor (or system controller) from generating a serviceable
event on the hardware management console (HMC).
- A problem was fixed that, under certain rare circumstances, caused
SRC B1754202 to be erroneously logged (as a predictive error with a
call home) after a disruptive firmware installation.
- A problem was fixed that caused SRC B1818A0F to be erroneously logged
during a firmware installation when service processor (or system
controller) failover is disabled.
- A problem was fixed that prevented the machine type and model data
from being added to a node controller's error log entries.
- On systems with external I/O frames, a problem was fixed that might
have prevented the firmware from "unthrottling" processors after
entering power save mode.
- A problem was fixed that caused the system to crash and a SYSDUMP to
be taken, with SRCs B170E540, B181D138, or B700F105 being logged, when
a bad PCI-E adapter was installed and in use, or while the system was
running a heavy network load.
System firmware changes that affect certain systems
- On systems with the Integrated xSeries Adapter (IXA), a problem was
fixed that prevented the creation of a system plan on the HMC.
- On systems with multiple host channel adapter (HCA) cards, a problem
was fixed that caused logical ports on the HCA cards to be
intermittently inactive.
- In networks using a time server, a problem was fixed that caused the
date on a client system to be reset to 1969 if the client system lost
power.
|