EM320 |
EM320_101_045
10/22/09
|
Impact:
Function
Severity: HIPER
System firmware changes that affect all systems
- HIPER: A problem was fixed that caused the
migration of a
partition using shared processors to fail with a reason code of
4180043,
or caused the source system to hang or crash.
- DEFERRED: This fix corrects the handling of
a
specific processor
instruction sequence that was generated on a particular heavily-tuned
High
Performance Computing (HPC) application. This specific instruction
sequence
has the potential to produce an incorrect result. This instruction
sequence
has only been observed in a single HPC application. However, it
is
strongly recommended that you apply this fix.
- The firmware was enhanced such that SRCs B181F126,
B181F127, and
B181F129
are correctly managed, and no longer cause unnecessary calls home.
- A problem was fixed that caused SRC B7005603 to be
erroneously logged
when
a F/C 5802 or 5877 19" drawer was concurrently added to the system.
- A problem was fixed that caused SRC B1817201 to be
erroneously logged
during
the installation of system firmware.
System firmware changes that affect certain systems
- On systems using on/off (temporary) memory capacity on
demand (COD),
the
firmware was enhanced to improve the billing process for this
feature.
|
EM320_093_045
05/04/09
|
Impact: Function Severity:
Special Attention
System firmware changes that affect all systems:
- DEFERRED: The firmware was enhanced so that the
system recovers
gracefully from an I/O load time-out, rather than issuing a machine
check,
which crashes the system.
- A problem was fixed that caused the service processor
diagnostics to
report
a "TOD (time-of-day) overflow" error, instead of an uncorrectable
memory
error, when failures occurred on memory DIMMs.
- A problem was fixed that, in certain configurations, caused
the removal
of a host Ethernet adapter (HEA) port to fail when using a dynamic LPAR
(DLPAR) operation.
- A problem was fixed that, under certain circumstances,
prevented the
operating
system from recovering a PCI-E adapter on which a temporary enhanced
error
handling (EEH) error occurred.
- A problem was fixed that caused the hardware management
console (HMC)
to
show the managed system's status as incomplete after adding a drawer
using
the concurrent maintenance operation.
- The firmware was enhanced to improve the service
processor's capability
to recover from bad bits in the flash memory. A predictive error, or an
unrecoverable error, will be logged against the card that contains the
system firmware if the number of correctable or uncorrectable errors
exceeds
the threshold.
- The firmware was enhanced so that a call home will be made
if the
hypervisor
issues a "terminate immediate" interrupt.
- A problem was fixed that prevented service processor and
hypervisor
error
log entries from being reported to the operating system after a
successful
partition migration. This problem only affected the partition that was
migrated.
- The firmware was enhanced so that if a system with
redundant service
processors
is booted with redundancy disabled, a call home error will be logged.
- A problem was fixed that prevented the system from powering
on after
the
"reset to factory settings" option was selected in the advanced system
management interface (ASMI) menus.
- A problem was fixed that caused the migration of an AIX or
Linux
partition
to fail when firmware-assisted dump was enabled. When this problem
occurs,
the partition becomes unresponsive on the target system, and the target
system may have to be rebooted to recover.
- A problem was fixed that prevented the service processor
from
automatically
booting from the permanent (or P side) if the temporary (or T side) of
the firmware flash was corrupted. When the problem occurred, the
service
processor stopped instead of booting from the P side.
- A problem was fixed that caused SRC B1818601 to be logged,
and a
service
processor dump to be generated, at runtime.
- The firmware was enhanced to include processor card #1 in
the list of
field
replaceable units (FRUs) that are called out if an I2C bus error occurs
when accessing the processor backplane's vital product data (VPD).
- A problem was fixed that prevented all of the necessary
files from
being
synchronized between the primary and the secondary service processors.
One possible symptom of this problem was the time-of-day clocks being
out
of sync after a service processor failover.
System firmware changes that affect certain systems:
- On systems with a host Ethernet adapter (HEA) or host
channel adapter
(HCA)
assigned to a Linux partition, a problem was fixed that prevented the
partition
from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the
partition.
When this problem occurred, SRC B700F105 was logged.
- On systems with multiple host channel adapter (HCA) cards,
a problem
was
fixed that caused logical ports on the HCA cards to be intermittently inactive.
- On systems with the integrated xSeries adapter (IXA), a
problem was
fixed
that prevented the creation of a system plan on the HMC.
- On systems with redundant service processors, a problem was
fixed that
caused registry read errors or registry value errors to be generated
during
the installation of system firmware.
- On systems running AIX partitions, a problem was fixed that
caused AIX
to erroneously log a hardware error in which the LABEL field is
"INTRPPC_ERR",
and the INTERRUPT LEVEL is "0009 0001", after a concurrent firmware
update
or partition migration. This error did not affect the operation of the
system or partition.
|
EM320_083_045
09/24/08
|
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: A problem was fixed in which,
under
certain rarely
occurring circumstances, an application could cause a processor to go
into
an error state, and the system to crash.
- DEFERRED and HIPER: The system initialization
settings
were changed
to reduce the likelihood of a system crash under certain circumstances.
- HIPER: A problem was fixed that caused the system
to
terminate abnormally
with SRC B131E504.
- HIPER: A problem was fixed that caused a system to
fail to reboot
after a B1xxE504 SRC was logged due to a processor interconnection bus
failure. The same SRC, B1xxE504, was logged when the reboot failed.
- HIPER: A problem was fixed that might cause a
partition to crash
during a partition migration before the migration was complete.
- DEFERRED: A problem was fixed in which, under
certain
rare circumstances,
if a service processor failover occurred, the new secondary service
processor
was not able to communicate with the system.
- A problem was fixed that caused SRC B1818A10 to be
erroneously
generated
after the successful installation of system firmware.
- Enhancements were made to the firmware to improve the FRU
callouts for
certain types of failures of the time-of-day clock circuitry.
- A problem was fixed that, under certain rarely occurring
circumstances,
caused the system to crash if an L2 or L3 cache failure occurred.
- The firmware was enhanced so that the contents of /tmp are
included
when
a service processor dump is taken.
- A problem was fixed that caused a predictive SRC, B181EF88,
to be
erroneously
logged after a successful installation of system firmware and a
subsequent
slow-mode IPL of the system.
- A problem was fixed that, under certain rarely occurring
circumstances,
caused the system to crash with SRC B7005191 being logged.
- A problem was fixed that prevented the system from
rebooting if an
error
occurred during a memory-preserving IPL.
- A problem was fixed that prevented the diagnostic commands
in AIX (diag
and lsmcode, for example) from working after a partition migration.
- A problem was fixed that, under certain rarely occurring
circumstances,
caused a partition shutdown or partition reboot to hang with SRC
D200B077.
- A problem was fixed that, under certain rarely occurring
circumstances,
caused the hypervisor to lose its communication link to the service
processor
and log SRC A181D000.
- A problem was fixed that, under certain rarely occurring
circumstances,
might have caused dynamic LPAR (DLPAR) operations on memory to fail.
- A problem was fixed that prevented I/O hardware operations
from
completing
before dynamic LPAR (DLPAR) operations were performed on memory. This
caused
PCI bus errors, and multiple instances of SRC B7006971 to be logged.
- A problem was fixed in the hypervisor that, under certain
rarely
occurring
circumstances, caused a system-level activation to fail.
- A problem was fixed that caused SRC B7006971 to be
generated because
the
firmware was incorrectly performing operations on PCI-Express I/O
adapters
during dynamic LPAR (DLPAR) operations on memory.
- A problem was fixed that might have caused a processor
checkstop after
a node repair or node add operation.
- A problem was fixed that caused the message "BA330000malloc
error!" to
be displayed on the operating system console after a partition
migration,
even though SRC BA330000 had not been logged. When this problem
occurred,
the partition migration appeared to be successful. However, a process
within
the partition was either hung or had failed, and in most cases the
partition
had to be rebooted to fully recover.
- The firmware was enhanced to improve the description and
service
actions
that are logged with SRC BA210012.
- A problem was fixed that, under certain rare circumstances,
prevented a
partition migration from completing successfully if processors were
removed
from the partition being migrated prior to the migration using dynamic
LPAR (DLPAR) operations.
- A problem was fixed that, under certain rare circumstances,
caused a
system
crash during partition migration operations.
- A problem was fixed that, under certain rare circumstances,
caused the
hypervisor to crash when it was booting.
System firmware changes that affect certain systems:
- On systems that are managed by a hardware management
console (HMC), a
problem
was fixed that caused the HMC to show an "Incomplete" state after it
attempted
to read a file with an incorrect size from the service processor (or
system
controller). This problem also occurred if the "factory configuration"
option was used on the advanced system management interface (ASMI)
menus.
- On systems with I/O drawers attached, a problem was fixed
that might
have
caused some I/O slots in the drawers not to be configured when the
system
was booted.
- On i5 partitions using IOP-based I/O adapters which are
configured to
use
i5 clustering (SAN), a problem was fixed that caused the failover of an
I/O drawer or tower, to a system which previously owned the drawer or
tower,
to fail.
- On systems with a large number of fibre channel disks, a
problem was
fixed
that caused SRC BA210003 to be logged (which called out the fibre channel
adapter) when the system management services (SMS) boot firmware was
searching
for a boot disk.
- In systems with clustered processors, various problems were
fixed in
the
InfiniBand interconnection networks.
|
EM320_076_045
06/09/08
|
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The processor initialization
settings were changed
to reduce the likelihood of a processor going into an error state and
causing
a checkstop or system crash.
- HIPER: A problem was fixed in the hypervisor that
might cause a
partition migration to fail.
- HIPER: A problem was fixed that caused large
numbers
of enhanced
error handling (EEH) errors to be logged against the 4-port gigabit
Ethernet
adapter, F/C 5740, under certain circumstances.
- HIPER: A problem was fixed that caused the
firmware
to erroneously
log VPD errors against the processors. This prevented drawers from
powering
on.
- HIPER: On systems with a redundant service
processor
installed and
enabled, a problem was fixed that caused a communications hang between
the two service processors. When this occurred, it triggered a
reset/reload
of the primary service processor, and the resulting fail-over to the
secondary
service processor failed in such a way that the system crashed and
logged
SRC B1813410. Service processor dumps were also taken.
- HIPER: On systems with redundant service
processors
installed and
enabled, the firmware was enhanced so that if a failure occurs during a
service processor failover, the firmware will attempt to reset/reload
one
of the service processors. This may allow the system to recover and
stay
up instead of crashing.
- HIPER: On systems with redundant service
processors
installed and
enabled, a problem was fixed that caused the system to crash if a
service
processor failover occurred when the VPD files were being synchronized.
- The firmware was enhanced to improve the system memory
error recovery.
- A problem was fixed that caused the /tmp directory on the
service
processor
to fill up, which resulted in an out-of-memory condition. When this
problem
occurred, the service processor usually performed a reset/reload. This
is one possible cause of SRC B1817201 being logged.
- A problem was fixed that caused panel function 02 to fail
when trying
to
set the "next IPL speed" or "next IPL side".
- The firmware was enhanced so that serial port S1 is not
automatically
designated
the local console, even if the console is not selected within 60
seconds
of the system first being booted. This enhancement allows the console to
be
selected again, if no selection was made on the previous boot, instead
of defaulting to the S1 port.
|
EM320_061_031
Mfg Only
05/09/08
|
Impact:
Serviceability Severity:
HIPER
- HIPER: A problem was fixed that caused a
concurrent
firmware installation
to hang with SRC BA00E840 being logged. This problem may also cause a
partition
migration to hang, under certain circumstances, with the same SRC,
BA00E840,
being logged. This SRC will be logged when this level of firmware is
installed
and will generate a call home; it should be ignored. It will not be
logged
during subsequent installations.
|
EM320_059_031
Mfg Only
05/06/08
|
Impact: Function
Severity:
Special Attention
New features and functions:
- Support for the concurrent addition of a node (drawer) was
added.
- Support for the "cold" repair of a node (repair with power
off while
other
nodes are running) was added.
- Support for IPv6 was added. For more information, see
Section 2.1
Cautions,
paragraph Concurrent Maintenance.
- Support for logical volumes bigger than 2 TB was added.
- Virtual switch support for virtual Ethernet devices was
added. This
requires
HMC V7 R3.3.0.0 with efix MH01102 to be running on the HMC.
Fixes that affect all systems:
- HIPER: A problem was fixed that caused
capacity-on-demand (COD)
data to be retrieved in an unreadable format from the Anchor (VPD) card.
- HIPER: A problem was fixed that caused enhanced
error
handling (EEH)
to fail on certain I/O adapters.
- HIPER: A problem was fixed that might cause the
system to terminate
while IPLing partitions soon after a system boot. This problem might
also
have been seen if the partitions were set to "autostart". This failure
is typically seen on systems with a large amount of memory; SRC
B181D138
is usually logged when this error occurs.
- DEFERRED: A problem was fixed that caused the
system
to appear to
hang with C10090B8 in the control (operator) panel during a slow mode
boot.
- A problem was fixed that prevented the processor clock from
being
deconfigured
with the fabric bus after a hardware error.
- A problem was fixed that caused the L2 deconfiguration
option to be
displayed
on the advanced system management interface (ASMI) menus on systems on which
it
is not supported.
- A problem was fixed that caused the GX adapter slot
reservation option
to be displayed on the advanced system management interface (ASMI)
menus
on systems on which it is not supported.
- A problem was fixed that caused the wrong slot location to be
provided in the message that is displayed when no
slot reservations are available for adding the next Feature Code 1800 or
1802
adapter.
- A problem was fixed that caused the location code reported
with
enhanced
error handling (EEH) errors on certain imbedded slots to have a -Cx suffix
instead of the correct -T# suffix for the underlying adapter. This also
impacted the HMC's System Planning tool.
- A problem was fixed that caused the Linux boot loader to
lose its
command
line parameters (and fail to boot a Linux partition) during a
reconfiguration
reboot.
- A problem was fixed that caused the "iSCSI" and "network1"
aliases to
be
created incorrectly in the SMS menus; this might have prevented the
system
or partition from booting from that device.
- A problem was fixed that caused this informational message
to be
erroneously
sent to the operating system console:
subq[5][0] destination address is 0!!!
Check whether the subq is needed. If it is, allocate MEM.
- A problem was fixed that caused the AIX command lsvpd to
hang if it was
executed during a partition migration.
- A problem was fixed that caused the system or partition to
hang at the
"Welcome to AIX" banner, following an iSCSI boot, during the
installation
of AIX.
- A problem was fixed that caused an iSCSI login to fail
under certain
circumstances.
When this failure occurred, the message sent to the console looked
something
like this:
iscsiFailed to LOGIN to target, rc = 1
failed to login.
could not open target 0x9034751 :system04 for r/w, aborting...
tcpOPEN: iscsi open failed
!BA012010 !
- A problem was fixed that caused the location codes of
devices attached
to the integrated USB ports to have a duplicate port suffix. For
example,
when this problem occurred, the location code of the device was shown
as:
/usb-scsi@1 U789D.001.DQDGARW-P1-T2-T2-L1
instead of the correct location code, which is
/usb-scsi@1 U789D.001.DQDGARW-P1-T2-L1
- Two translation issues were fixed. The first one caused the
string "No
alias" to always be displayed on the iSCSI menus in SMS in English even
though it should have been translated into the other languages that the
SMS menus support. The second one caused the NIC (network interface
card)
parameters such as the client IP address in the SMS ping menu to be
displayed
with message strings in English; these should have been translated as
well.
- A problem was fixed that caused the SMS menus to drop into
the open
firmware
prompt with the message "DEFAULT CATCH!" when the ping test failed.
- A problem was fixed that prevented the operating system
from setting
the
boot device list in NVRAM.
- A problem was fixed that caused approximately 20-25
occurrences of
informational
SRC B7005300 to be logged during every IPL, which was filling up the
error
logs.
- A problem was fixed that prevented the "100 Mbps/full
duplex" setting
for
the HEA 1 Gbps ports from being implemented from the HMC. When this
occurred,
there was no error message on the HMC, but the setting never took
effect.
- A problem was fixed that caused the MAC addresses displayed
on the HMC,
in the HEA logical port information for the second port group, to show
invalid addresses.
- A problem was fixed that caused a service processor dump to
be
generated
with SRC B181EF88 when the advanced system management interface (ASMI)
client was closed abruptly, or a network failure disconnected the
client
and the ASMI.
- Enhancements were made to improve the field replaceable
unit (FRU)
isolation
for phase-locked loop (PLL) clock failures on multi-CEC drawer systems.
SRCs B114F6D2, B114F6C1, B113F6C1, B157F12E, B18187EF, and B158E500
were
typically seen with this type of failure.
- Enhancements were made to the error analysis firmware to
provide better
FRU callouts for certain types of processor fabric bus failures. SRCs
B114E504,
B114B2DF, and B181B10B were typically seen with this type of failure.
- Enhancements were made to the firmware to improve the
reliability of
memory
DIMMs.
- A change was made to the firmware such that predictive SRCs
B18138B0,
B1813862,
or B1813882 are now logged as informational.
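The duplicate port suffix described in the list above (a location code shown as -P1-T2-T2-L1 instead of -P1-T2-L1) can be checked for mechanically. The following is a minimal sketch only, not part of the firmware or of any IBM tool. The function name duplicated_suffixes is hypothetical, and the sketch assumes that a location code is a unit identifier followed by dash-separated position suffixes.

```python
def duplicated_suffixes(location_code):
    """Return the suffixes that appear twice in a row in a location code.

    Hypothetical helper for illustration; assumes the code is a unit
    identifier followed by dash-separated suffixes such as P1, T2, L1.
    """
    parts = location_code.split("-")
    # The first part is the unit identifier; the rest are position suffixes.
    suffixes = parts[1:]
    # Compare each suffix with the one that immediately follows it.
    return [a for a, b in zip(suffixes, suffixes[1:]) if a == b]


if __name__ == "__main__":
    # Location code as it was shown when the problem occurred.
    print(duplicated_suffixes("U789D.001.DQDGARW-P1-T2-T2-L1"))
    # The same code without the repeated suffix.
    print(duplicated_suffixes("U789D.001.DQDGARW-P1-T2-L1"))
```

An empty list means that no suffix is repeated back-to-back. A non-empty list names the repeated suffix, which is the symptom of the problem that was fixed.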
System firmware changes that affect certain model MMA systems:
- On systems using the EnergyScale(TM)
technology,
enhancements were made to include status, log, and error information
about
the Power Save mode in the service processor error logs.
- On systems with redundant service processors enabled, a
problem was
fixed
that caused the "restore factory configuration" function on the
Advanced
System Management Interface (ASMI) to fail.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
caused
the InfiniBand I/O device to drop packets, which resulted in an
unrecoverable
error.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
caused
the drawer to fail when performing concurrent maintenance on the
associated
InfiniBand loop.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
caused
the partition to become unresponsive when an InfiniBand cable in a
redundantly-cabled
loop was disconnected.
Note: The last two defects in this section corrected the issues
detailed
in the section titled Signal Cable in an InfiniBand loop, and
InfiniBand
I/O drawer power on/off in earlier levels of the firmware description
file. |
EM320_046_031
06/09/08
|
Impact: Serviceability
Severity:
HIPER
Fixes that affect all model MMA systems:
- HIPER: A problem was fixed that caused a
concurrent
firmware installation
to hang with SRC BA00E840 being logged. This problem may also cause a
partition
migration to hang, under certain circumstances, with the same SRC,
BA00E840,
being logged. This SRC will be logged when this level of firmware is
installed
and will generate a call home; it should be ignored. It will not be
logged
during subsequent installations.
- HIPER: On systems with redundant service
processors
installed and
enabled, a problem was fixed that caused the system to crash if a
service
processor failover occurred when the VPD files were being synchronized.
- HIPER: On systems with redundant service
processors
installed and
enabled, the firmware was enhanced so that if a failure occurs during a
service processor failover, the firmware will attempt to reset/reload
one
of the service processors. This may allow the system to recover and
stay
up instead of crashing.
- HIPER: A problem was fixed that caused the
firmware to
erroneously
log VPD errors against the processors. This prevented drawers from
powering
on.
|
EM320_040_031
03/03/08
|
Impact: Serviceability
Severity: Special
Attention
Fixes that affect all model MMA systems:
- DEFERRED: A problem was fixed that caused a
system
crash (with SRC
B131E504) by changing the initialization settings of the I/O control
hardware.
- A problem was fixed that could cause the hypervisor to hang
after a
reset/reload
of the service processor.
- A problem was fixed that, under certain circumstances,
caused the
InfiniBand
adapter to stop responding to InfiniBand requests.
- A problem was fixed that caused SRC B1813014 to be logged
after a
successful
system firmware installation. This SRC will be logged when this level
of
firmware is installed and will generate a call home; it should be
ignored.
It will not be logged during subsequent installations.
- The FRU list was changed so that clock card failures in a
multi-drawer
system will be easier to debug and require fewer parts to fix.
- A problem was fixed that caused the service processor to
get stuck in a
reset/reload loop, which prevented the system from booting to standby.
System firmware changes that affect certain model MMA systems:
- On systems with redundant service processors enabled, a
problem was
fixed
that could cause a significant increase in system boot time.
- On systems with two service processors installed and with
redundancy
disabled,
a problem was fixed that caused the secondary service processor to go
into
the dump state, and remain in the dump state, after a platform dump.
- On systems with redundant service processors, SRCs B1813833
and
B1813834,
which were being logged intermittently after a side-switch IPL, were
changed
to informational.
- On systems with a 1519-100 tower attached, a problem was
fixed that
caused
the location code of a connector on the integrated virtual IOP to be
displayed
as Un-SE1-SE1-T1 instead of Un-SE1-T1.
- On systems with 7314-G30 I/O drawers attached in certain
cabling
configurations,
a problem was fixed that prevented the I/O port labels from being
displayed
for the port location codes on the hardware topology screens.
|
EM320_031_031
12/03/07
|
Impact: Function Severity:
Attention
New Features and Functions:
- Support for redundant service processors with failover on
model MMA
systems.
- Support for the concurrent addition of a RIO/HSL adapter on
model MMA
systems.
- Support for the concurrent replacement of a RIO/HSL adapter
on model
MMA
systems.
- Support for the "hyperboot" boot speed option in the power
on/off menu
on the Advanced System Management Interface (ASMI).
- Support for the creation of multiple virtual shared
processor pools
(VSPPs)
within the one physical pool. (In order for AIX performance tools to
report
the correct information on systems configured with multiple shared
processor
pools, a minimum of AIX 5.3 TL07 or AIX 6.1 must be running.)
- Support for the capability to move a running AIX or Linux
partition
from
one system to another compatible system with a minimum of
disruption.
- Support for the collection of extended I/O device
information
(independent
of the presence of an operating system) when a system is first
connected
to an HMC and is still in the manufacturing default state.
- Improved VPD collection time on model MMA systems.
- Support for the migration of DDR2 memory DIMMs during the
MES upgrade
from
a 9117-570 server to a 9117-MMA server, if processor card F/C 5621 is
ordered when the initial system upgrade MES order is placed.
- Support for EnergyScale(TM) and Active Energy Manager(TM).
For more information on the energy management features now
available,
please see the EnergyScale(TM) white paper.
|
EM310 |
EM310_074_048
11/10/2008
|
Impact: Serviceability Severity:
HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The system initialization
settings were changed
to reduce the likelihood of a system crash under certain circumstances.
- HIPER: A problem was fixed that caused a system to
fail to reboot
after a B1xxE504 SRC was logged, due to a processor interconnection bus
failure. The same SRC, B1xxE504, was logged when the reboot failed.
- A problem was fixed that, under certain rarely occurring
circumstances,
caused the system to crash if an L2 or L3 cache failure was not
discovered
and repaired when it initially occurred.
- The firmware was enhanced so that the contents of /tmp are
included
when
a service processor dump is taken.
- A problem was fixed that, in certain configurations, caused
the removal
of a host Ethernet adapter (HEA) port using a dynamic LPAR (DLPAR)
operation
to fail.
- A problem was fixed that, under certain rare circumstances,
caused the
hypervisor to crash when it was booting, with SRC B6000103 being
logged.
- A problem was fixed that, under certain circumstances,
prevented the
operating
system from recovering a PCI-E adapter on which a temporary enhanced
error
handling (EEH) error occurred.
- A problem was fixed that prevented service processor and
hypervisor
error
log entries from being reported to the operating system after a
successful
partition migration. This problem only affected the partition that was
migrated.
- The firmware was enhanced so that a call home will be made
if the
hypervisor
issues a "terminate immediate" interrupt.
System firmware changes that affect certain systems:
- In systems with clustered processors, various problems were
fixed in
the
InfiniBand interconnection networks.
- On systems with a host Ethernet adapter (HEA) or host
channel adapter
(HCA)
assigned to a Linux partition, a problem was fixed that prevented the
partition
from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the
partition.
When this problem occurred, SRC B700F105 was logged.
|
EM310_071_048
07/30/2008
|
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The processor initialization
settings were changed
to reduce the likelihood of a processor going into an error state and
causing
a checkstop or system crash.
- HIPER: A problem was fixed that caused large
numbers
of enhanced
error handling (EEH) errors to be logged against the 4-port gigabit
Ethernet
adapter, F/C 5740, under certain circumstances.
- DEFERRED: A problem was fixed that caused
informational SRCs B181B964
and B150D134 to be logged multiple times, and fill the service
processor
error log, during normal operation of the system.
- DEFERRED: The firmware was enhanced so that if an
L3
cache controller
gets deconfigured at runtime, the associated processor cores will also
be deconfigured. This prevents the system from going into an error
state
and causing a checkstop or system crash.
- A problem was fixed that caused the /tmp directory on the
service
processor
to fill up, which resulted in an out-of-memory condition. When this
problem
occurred, the service processor usually performed a reset/reload. This
is one possible cause of SRC B1817201 being logged.
- Enhancements were made to improve the field replaceable
unit (FRU)
isolation
for phase-locked loop (PLL) clock failures on multi-CEC drawer systems.
SRCs B114F6D2, B114F6C1, B113F6C1, B157F12E, B18187EF, and B158E500
were
typically seen with this type of failure.
- A problem was fixed that caused SRC B1813014 to be
erroneously
generated
when a new level of system firmware was installed on the managed system.
- A problem was fixed that caused SRC B7006971 to be
erroneously
generated
during dynamic LPAR (DLPAR) operations on memory.
- A problem was fixed that caused an "HTML viewer error",
followed by the
message "Cannot complete service action for reference code 'xxxxyyyy' "
to occur in Service Focal Point on the HMC when trying to perform the
service
actions for certain SRCs.
- A problem was fixed in partition firmware that could cause
a partition
running AIX to crash under certain circumstances.
System firmware changes that affect certain systems:
- On a partition running Linux, a problem was fixed that
might cause the
hypervisor to erroneously deconfigure a processor core.
- On partitions with a large number of hard disks attached to
fibre
channel
adapters, a problem was fixed that might cause SRC BA210003 to be
erroneously
generated when the partition is booting. The partition might or might
not
boot when this error occurs.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
caused
the port labels to be missing on the hardware topology screens with
certain
cable configurations.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
caused
the partition to become unresponsive when an InfiniBand cable in a
redundantly-cabled
loop was disconnected.
- On systems with 7314-G30 drawers attached, a problem was
fixed that
might
have caused some I/O slots in the drawers not to be configured when the
system was booted.
Note: The last two defects in this section corrected the issues
detailed
in the section titled Signal Cable in an InfiniBand loop, and
InfiniBand
I/O drawer power on/off in earlier levels of the firmware description
file. |
EM310_069_048
02/11/2008
|
Impact: Availability Severity:
HIPER
Fixes that affect all model MMA systems:
- HIPER: A problem was fixed that caused some
functions
that perform
hardware operations during runtime to generate temporary enhanced error
handling (EEH) errors.
- DEFERRED: A problem was fixed that caused a system
crash (with SRC
B131E504) by changing the initialization settings of the I/O control
hardware.
Note: This fix is not in the EM320_031_031 level listed above; it is
included
in the EM320_040_031 level.
- A problem was fixed that prevented a system from recovering
after SRC
B1xxB9xx
was logged.
- A problem was fixed that caused a firmware installation to
fail with
SRC
B1813028.
- A problem was fixed that caused SRC B1818A10 to be
erroneously logged
during
a disruptive firmware installation.
- A problem was fixed that, under certain circumstances,
caused the
buttons
on the control (operator) panel to be inoperative.
- A problem was fixed that prevented the system planning tool
from
deploying
a sysplan with certain HEA MCS values.
- A problem was fixed that caused SRC B1813108 to be
erroneously logged
during
system boot.
- A problem was fixed that, under certain circumstances,
caused the
InfiniBand
adapter to stop responding to InfiniBand requests.
- A problem was fixed that caused the error
"MSGVIOSE0300E002-0154 There
is insufficient memory available for firmware" to be logged on the HMC.
System firmware changes that affect certain model MMA systems:
- On model MMA systems with multiple drawers, a problem was
fixed that
prevented
the pin-hole reset switch on the control (operator) panel from
resetting
the system.
- On model MMA systems with an uninterruptible power supply (UPS)
attached, a problem was fixed that prevented the UPS from notifying the
operating system that a utility failure or low battery condition had
occurred.
- On systems with 3 or more licensed processors and 2 or more unlicensed
processors, a problem was fixed that caused the system boot to be slower
than normal, or to hang with SRC C700406E.
- On model MMA systems with 7314-G30 I/O expansion drawers attached,
problems were fixed that caused the wrong FRUs to be called out with SRC
B70069ED, and caused the hypervisor to loop if certain invalid cabling
configurations were encountered.
- On model MMA systems with a large number of I/O towers
attached, a
problem
was fixed that caused the HMC to go to the incomplete state when an
additional
tower was added to a loop.
|
EM310_063_048
11/19/07
|
Impact: Availability Severity: HIPER
- HIPER: A problem was fixed that caused a time-out in a hardware
device driver. When this time-out occurs, both SRCs B181B920 and
B181D147 are logged. Other SRCs may be present including, but not limited to,
B1xxB9xx, B1xxE504, and B150D141. Occasionally the system crashes. If B181B920
and B181D147 SRCs are logged, check for any resources that were deconfigured
at the time of these errors and reconfigure them using the ASMI menus.
No hardware should be replaced. To recover from this error condition, the
service processor must be reset by removing, then reapplying, the managed
system's power.
- DEFERRED: On multi-drawer model MMA systems, a problem found in
testing was fixed that, when the L3 cache was disabled, could under rare
circumstances cause data to be overwritten in the cache and the system
to crash. Although the exposure to this issue is very low, and there have
been no reported problems from the field, the system impact if this
occurred would be high. Product Engineering recommends that you
schedule time to install this deferred fix at your earliest convenience.
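The SRC signature described in the HIPER item above can be checked mechanically. The following is a minimal sketch, not an IBM tool: it assumes the logged SRCs are already available as a list of strings (how they are exported from the service processor error log is not covered here), and it assumes that each "x" in a pattern such as B1xxB9xx stands for any hexadecimal digit. The function and constant names are hypothetical.

```python
import re

# SRCs that, per the fix description, are both logged when the time-out occurs.
REQUIRED_SRCS = {"B181B920", "B181D147"}

# Companion SRC patterns named in the description.
# Assumption: "x" stands for any single hexadecimal digit.
COMPANION_PATTERNS = ["B1xxB9xx", "B1xxE504", "B150D141"]


def pattern_to_regex(pattern):
    """Turn an SRC pattern such as 'B1xxB9xx' into a compiled regex."""
    body = "".join("[0-9A-F]" if ch == "x" else re.escape(ch) for ch in pattern)
    return re.compile(body)


def classify_srcs(logged_srcs):
    """Return (signature_present, companion_srcs) for a list of logged SRCs.

    signature_present is True only when both required SRCs were logged.
    companion_srcs lists the other logged SRCs that match a known pattern.
    """
    logged = {src.strip().upper() for src in logged_srcs}
    signature_present = REQUIRED_SRCS <= logged
    regexes = [pattern_to_regex(p) for p in COMPANION_PATTERNS]
    companions = sorted(
        src for src in logged - REQUIRED_SRCS
        if any(rx.fullmatch(src) for rx in regexes)
    )
    return signature_present, companions
```

If the first returned value is true, the guidance in the HIPER item applies: look for deconfigured resources and reconfigure them through the ASMI menus rather than replacing hardware.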
|
EM310_057_048
9/14/07
|
Impact: Availability Severity: HIPER
Additional features and functions:
- Added support for 9406-MMA.
System firmware changes that affect all 9117-MMA systems:
- HIPER: A problem was fixed that caused the system
to
crash with
SRC B170E450.
- HIPER: A problem was fixed that, in rare
circumstances, could cause
the system to hang due to the improper handling of certain exceptions.
- HIPER: A problem was fixed that prevented the
operating system from
being notified of certain EPOW conditions that could lead to the system
or partition being shut down, with the possible loss of data. These
EPOW
conditions included the ambient temperature being too high, the loss of
utility power (with or without UPS backup), and a user-initiated power
off using the white power button or the HMC.
- A problem was fixed that could cause a firmware
installation from the
HMC
to fail with SRC E302F85C on the HMC, and SRC B1813088, B1818A0F, or
B1813011
logged in the service processor error log.
- A change was made so that if a failure occurs during a
memory-preserving
reboot, the system continues to reboot rather than remaining in the
termination
(powered off) state.
- A problem was fixed that caused EEH (enhanced error
handling) errors to
be erroneously logged against certain I/O adapters.
- A problem was fixed that prevented "linked" resources that
had been
guarded
out from being reconfigured during the next reboot after a service
action
on one of the guarded parts.
- A problem was fixed that, after the backplane was replaced
in a
7314-G30
I/O drawer, prevented the partition that owned the drawer from seeing
those
resources.
- A problem was fixed that caused the serial connection to a
partition to
be lost. When this occurred, SRCs B181D307, B200E0AA, and/or B200813A
were
generated by the service processor and the hypervisor.
- A problem was fixed in partition firmware that, in some
circumstances,
prevented a CD-ROM or tape device from being in the default service
mode
boot list, even if one was present in the system.
- A problem was fixed that caused the HMC to go to the incomplete state,
and SRC B182953C to be logged in the service processor error log
approximately every five minutes, when the managed system was booted.
- A problem was fixed that caused the system to
intermittently fail to
configure
devices attached to the integrated USB port when booting.
- A problem was fixed that might have caused erroneous
callouts if a
problem
was found with certain levels of memory controller chips.
- A problem was fixed that caused the system to call home and
reboot
instead
of allowing the failing part (a memory controller or DIMM) to be
deconfigured
by PRD (processor runtime diagnostics).
Additional information concerning this service pack:
In addition to the fixes described above, this service pack
also contains
a fix for a low probability problem and content intended for
newly-manufactured
systems, or enhancements to system internal interfaces, which is not
required
for systems already in production use. This content will not be
activated
on systems that install this service pack concurrently. Even though
this
content is not required for systems which are already installed and in
use, a disruptive installation of this service pack or a re-IPL after
installing
it will cause this content to become active. It is not necessary to plan
a window to re-IPL the system to activate this content.
|
EM310_048_048
6/22/07
|
Impact: New Severity: New
|