IBM Power Systems Scale-out LC Server Firmware
Applies to: S922LC (9006-22C)
This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.
This package provides firmware for the Power Systems Scale-out LC S922LC (9006-22C).
The firmware level in this package is:
•V0.55-20190325/BMC 2.06
The package contains the following images:
BMC Firmware: SMT_P9_206.bin
PNOR Firmware: P9DSU20190325_IBM_sign.pnor
pUpdate Utility for powerpc: pUpdate_ppc
Details on the package binaries are included in section 3.1
Verify the ipmitool level on your Linux workstation using the following command:
bash-4.1$ ipmitool -V
ipmitool version 1.8.15
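The version check above can be scripted. The following is a minimal sketch, assuming the "ipmitool version X.Y.Z" output format shown above and a GNU userland (the "sort -V" option is a GNU extension). The function names and the REQUIRED_IPMITOOL variable are illustrative and not part of this package.

```shell
#!/bin/sh
# Minimum ipmitool level named in this document.
REQUIRED_IPMITOOL="1.8.15"

# Extract the version number from "ipmitool -V" style output read on stdin,
# for example "ipmitool version 1.8.15" -> "1.8.15".
parse_ipmitool_version() {
    awk '/^ipmitool version/ { print $3; exit }'
}

# Return 0 when version $1 is at or above version $2.
# Relies on "sort -V" (version sort), which is a GNU coreutils extension.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a workstation with ipmitool installed:
#   found=$(ipmitool -V | parse_ipmitool_version)
#   version_at_least "$found" "$REQUIRED_IPMITOOL" || echo "update ipmitool"
```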
If you need to update or add ipmitool on your Linux workstation, you can compile ipmitool (current level 1.8.15) for Linux from the SourceForge sources as follows:
1.1.1 Download ipmitool tar from http://sourceforge.net/projects/ipmitool/ to your linux system
1.1.2 Extract tarball on linux system
1.1.3 cd to top-level directory
1.1.4 ./configure
1.1.5 make
1.1.6 ipmitool will be under src/ipmitool
You can also install the ipmitool package directly from your Linux distribution's package repository. For example, on Ubuntu 14.04.3:
sudo apt-get install ipmitool
The BMC Web GUI is a web-based application that works within a browser. Supported browser levels are shown below with Chrome being the preferred browser:
•Google Chrome Version 46.0.2490.71 m
•Mozilla Firefox version 41.0.3
For specific fix level information on key components of IBM Power Systems servers and Linux operating systems, please refer to the documentation in the IBM Knowledge Center.
Here are the links for the LC servers:
9006-22C: http://www.ibm.com/support/knowledgecenter/POWER9/p9hdx/9006_22c_landing.htm
Downgrading firmware from any given release level to an earlier release level is not recommended.
If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.
Concurrent Firmware Updates not available for LC servers.
Concurrent system firmware update is not supported on these LC servers.
Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.
For the LC server systems, the installation of system firmware is always disruptive.
The xxx.pnor file updates the primary side of the PNOR. The yyy.bin updates the primary side of the BMC only. The golden sides are unchanged.
Filename | Size (bytes) | MD5 Checksum |
P9DSU20190325_IBM_sign.pnor | 67108992 | 4d76695e50d107668dce8f076e551de8 |
SMT_P9_206.bin | 33554432 | a720a44ac0e96ca2902fa65ee50bcb91 |
pUpdate_ppc | 133824 | 00afbdb0690fa576019331bfab93e743 |
Note: The checksum can be found by running the Linux/Unix/AIX md5sum command against the file (all 32 characters of the checksum are listed), for example: md5sum xxx.pnor.
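As a convenience, the three checksums in the table can be checked in one pass. This is a sketch only: it assumes the images sit in the current directory under the filenames listed in the table, and the function names verify_md5 and verify_package are illustrative.

```shell
#!/bin/sh
# Compare the MD5 checksum of file $1 with the expected value $2.
# Prints a short status line and returns 0 only when they match.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<checksum>  <filename>"; keep only the checksum field.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)"
        return 1
    fi
}

# Checksums taken from the table above; run from the directory that holds
# the unpacked images. Stops at the first file that does not match.
verify_package() {
    verify_md5 P9DSU20190325_IBM_sign.pnor 4d76695e50d107668dce8f076e551de8 &&
    verify_md5 SMT_P9_206.bin a720a44ac0e96ca2902fa65ee50bcb91 &&
    verify_md5 pUpdate_ppc 00afbdb0690fa576019331bfab93e743
}
```

Running verify_package before flashing gives an early warning of a truncated or corrupted download.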
After a successful update to this firmware level, the PNOR components and BMC should be at the following levels. The ipmitool "fru" command can be used to display FRU ID 47 and the ipmitool "mc info" command can be used to display the BMC level.
Note: FRU information for the PNOR level does not show the updated levels via the fru command until the system has been booted once at the updated level.
PNOR firmware levels from FRU ID 47 inventory list for driver:
Product Name : OpenPOWER Firmware
Product Version : open-power-SUPERMICRO-P9DSU-0.55-20190325
Product Extra : op-build-8709070
Product Extra : skiboot-v6.0.19
Product Extra : hostboot-14fd85c-p1737ccb
Product Extra : occ-394de99
Product Extra : linux-4.17.12-openpower1-pad938fe
Product Extra : petitboot-v1.7.5-p675a019
Product Extra : machine-xml-bbf0d7a
Product Extra : sbe-4aa6703
BMC Level:
Display BMC firmware level using the "ipmitool mc info | grep Firmware" command:
Firmware Revision : 2.06
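The two level checks above can be combined into a small post-update check. The sketch below only parses text in the "Name : value" layout shown in this section. The function names and the EXPECTED_* variables are illustrative, and the BMC address and credentials in the usage comment are placeholders to be filled in by the reader.

```shell
#!/bin/sh
# Levels this package is expected to leave on the system (from this document).
EXPECTED_BMC="2.06"
EXPECTED_PNOR="open-power-SUPERMICRO-P9DSU-0.55-20190325"

# Read "ipmitool mc info" output on stdin and print the firmware revision.
bmc_revision() {
    awk -F' *: *' '/^Firmware Revision/ {
        sub(/[ \t\r]+$/, "", $2)   # drop any trailing whitespace
        print $2
        exit
    }'
}

# Read "ipmitool fru print 47" output on stdin and print the Product Version.
pnor_version() {
    awk -F' *: *' '/Product Version/ {
        sub(/[ \t\r]+$/, "", $2)   # drop any trailing whitespace
        print $2
        exit
    }'
}

# Usage (placeholders in angle brackets are assumptions, not real values):
#   ipmitool -I lanplus -H <bmc_ip> -U <user> -P <password> mc info | bmc_revision
#   ipmitool -I lanplus -H <bmc_ip> -U <user> -P <password> fru print 47 | pnor_version
```

Remember that the FRU data only reflects the new PNOR level after the system has been booted once at that level.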
V0.55-20190325/BMC 2.06
04/12/2019 | Impact: Availability Severity: HIPER
New features and functions
Redfish support was extended to version 1.6.0 and the FanMode API was added.
Support for the UART3 and UART4 was disabled in the Linux kernel on the BMC.
The BMC GUI was enhanced to show both the PNOR version and the build date.
Support was added for a new BMC GUI page to control the power capping of the system.
Support was added for 24x7 On-Chip Controller (OCC) counter data collection. It allows a customer to monitor utilization and throughput of memory, buses and other system components. The data it collects is stored in system memory and the firmware provides a call interface for applications to read out this data.
Added BMC support for new PNOR version partition that has a 4k signed header.
Added BMC support to be able to detect Self Boot Engine (SBE) SEEPROM corruption.
Support was added to provide the processor VPD data for the serial number and part number on the host OS. The information can be found in the /proc/device-tree/vpd/root-node-vpd directory path. For example, the following directory path contains the serial-number file for a processor: " /proc/device-tree/vpd/root-node-vpd@a000/enclosure@1e00/backplane@800/processor@1000/serial-number".
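To illustrate the processor VPD item above, the following sketch walks the device-tree VPD directory on the host OS and prints the serial number of each processor node it finds. It assumes the directory layout matches the example path given above. The function name list_processor_serials and the optional root argument are illustrative.

```shell
#!/bin/sh
# Print "<processor node path>: <serial-number>" for every processor node
# found under a device-tree VPD root. The root defaults to the path named in
# the text; passing another directory makes the function easy to try out.
list_processor_serials() {
    root=${1:-/proc/device-tree/vpd}
    find "$root" -type f -name serial-number -path '*/processor@*' 2>/dev/null |
    sort |
    while IFS= read -r f; do
        # Device-tree string properties are NUL-terminated; drop the NUL.
        printf '%s: %s\n' "$(dirname "$f")" "$(tr -d '\000' < "$f")"
    done
}
```

Changing the -name argument to part-number would list the part numbers in the same way, assuming the part-number file sits beside serial-number in each processor node.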
Support was added to enable Call Home ESELs to allow system data such as On-chip Controller (OCC) telemetry to be collected remotely.
Support has been removed from XIVE interrupt controller for the store EOI operation. Hardware has limitations which would require a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This would be performance prohibitive and the PCI Host Bridges (PHBs) do not support the synchronization.
Support was added for voltage-droop monitors (VDMs) to improve system reliability during periods of unstable voltage from the power supply. The P9 processor uses an adaptive clock strategy to reduce the system power usage during power supply droop events by embedding analog VDMs that direct a digital phase-locked loop (DPLL) to immediately reduce clock frequency in response to the droop event.
Support was added for Workload Optimized Frequency (WOF). This feature provides the maximum processor frequency in order to increase system performance based on workload characteristics.
Support was added to make the Self Boot Engine (SBE) fault indicator bits recoverable. This means that if an SBE SEEPROM error occurs, recovery action will be taken to prevent an IPL failure or system outage.
Added support in Redfish for configuring RADIUS (Remote Authentication Dial In User Service), a network protocol for remote user authentication and accounting. It is implemented under redfish/v1/Managers/1/RADIUS. Method supported: Get/Patch. [PATCH]: "RadiusEnabled", "RadiusServerIP", "RadiusPortNumber", "RadiusSecret".
Added support in Redfish for configuring Syslog.
Added an IPMI raw command 0x3a 0x30 to be able to set the Meltdown/Spectre risk level to 0, 1, or 2. The default is risk level 0 to provide full mitigation but slowest performance. Here are the risk levels:
Risk Level 0 = "Speculative execution controls to mitigate user-to-kernel and user-to-user side-channel attacks"
Risk Level 1 = "Speculative execution controls to mitigate user-to-kernel side-channel attacks"
Risk Level 2 = "Speculative execution fully enabled"
More information on these levels can be found at https://www.ibm.com/support/knowledgecenter/en/POWER9/p9hby/p9hby_speculative_execution_control.htm?pos=2. After the risk level setting is changed, the host needs to be powered off and back on again to be running at the new risk level.
Added support in the SNMP client to allow connections to V2 and V3 servers to be running at the same time.
Added support for Active Directory to allow the BMC to make connections to LDAP/AD servers.
System firmware changes that affect all systems
HIPER/Pervasive: A problem was fixed where, under certain conditions, a Power Management Reset (PM Reset) event may result in undetected data corruption. PM Resets occur under various scenarios such as power management mode changes between Dynamic Performance and Maximum Performance, power management controller recovery procedures, or system boot.
HIPER/Pervasive: A problem was fixed for false time-outs in the POWER9 DD2.0 processor that caused system checkstops on a regular basis with INTCQFIR [49:51] reported in the eSEL. The system reboots with the failing processor guarded (de-configured), resulting in the loss of processor and all the memory local to the processor. The trigger for the failure is highly active workloads designed to measure the performance of the system. With the fix, Hostboot disables localized clock gating in the Interrupt Controller of the POWER9 processor and OPAL also disables that portion of the interrupt controller, thus preventing the false time-outs from occurring.
A problem was fixed for a rare Nest Memory Management Unit (NMMU) hang calling out processor hardware incorrectly, masking the real cause of the problem which was an NPU failure. The incorrect error messages take this form on the system:
3 | FQPSPPU0093G | 2018-10-01 01:25:40 | Yes | Warning | CPU 1 has exceeded a correctable error threshold
4 | FQPSPPU0093G | 2018-10-01 03:20:55 | Yes | Warning | CPU 0 has exceeded a correctable error threshold
5 | FQPSPAA0008M | 2018-10-01 04:35:40 | Yes | Critical | Hostboot procedure callout
A security problem was fixed to prevent a buffer overflow when loading the boot image that could cause firmware corruption. The firmware mitigation adds additional checking of the initial boot firmware image's load size and terminates the boot if the size is too big. The Common Vulnerabilities and Exposures issue number is CVE-2018-1992.
A security problem was fixed to prevent host programs from being able to corrupt the BMC using the internal software bridges between the host and BMC. The Common Vulnerabilities and Exposures issue number is CVE-2019-6260.
A security problem was fixed to detect and prevent Self Boot Engine (SBE) SEEPROM corruption. The Common Vulnerabilities and Exposures issue number is CVE-2018-8931.
A problem was fixed for an intermittent opal-prd crash that can happen on the host OS. This is the fault signature: "opal-prd[2864]: unhandled signal 11 at 0000000000029320 nip 0000000102012830 lr 0000000102016890 code 1"
A problem was fixed for diagnostic code trying to read sensor values for PCI Host Bridge (PHB) entries that are unused, which causes debug output to have incorrect values for the unused entries. With the fix, only the used entries are processed by the diagnostic code.
A problem was fixed for certain system boot failures not propagating to the BMC before the boot firmware shuts down. Some details of the error log may still appear in the console output trace, but the details will not be available with the BMC queries. This problem is timing dependent and intermittently possible depending on the timing of the shutdown path. However, immediate shutdowns exacerbate the problem and increase the chance it can occur.
A problem was fixed for an IPv4 address change not persisting after a BMC reboot. This error can occur if the IP address change reduces the number of characters in the last octet of the IP address. In the case where this was observed, the IP address was changed from 50.6.36.100 to 50.6.36.1, but after the BMC reboot the address again had the two trailing zeros because it had reverted to 50.6.36.100.
A problem was fixed for an intermittent IPL failure with BC131705 and BC8A1703 logged with a processor core called out. This is a rare error and does not have a real hardware fault, so the processor core can be unguarded and used again on the next IPL.
A problem was fixed on the BMC for an incorrect "LanDrvinit fails to initial" message. This is not a true error and the message can be ignored.
A problem was fixed for an MSI-X checkstop in CAPI mode. This occurred intermittently when a DMA from the CAPI device targeted an address lower than 4GB and was confused for a 32-bit MSI operation. This is now avoided by disabling the 32-bit MSI when in CAPI mode.
A problem was fixed for L3 cache calling out a LRU Parity error too quickly for hardware that is still good. Without the fix, ignore the L3FIR[28] LRU Parity errors unless they are very persistent with 30 or more occurrences per day.
A problem was fixed for a processor core hang and checkstop during normal operations. This failure occurs only rarely on a race condition in the processor state machine.
A problem was fixed for a failure in DDR4 RCD (Register Clock Driver) memory initialization that causes half of the DIMM memory to be unusable after an IPL. This is an intermittent problem where the memory can sometimes be recovered by doing another IPL. The error is not a hardware problem with the DIMM but an error in the initialization sequence needed to get the DIMM ready for normal operations.
A problem was fixed for a Serial Over Lan (SOL) console slow down during the IPL.
A problem was fixed for a BMC GUI freeze condition when an error event occurs on the backplane.
A problem was fixed for a BMC segmentation fault when running mboxd_v2 with the --help option.
A problem was fixed for the SBE timer being stuck and unavailable to the host applications. This forces OPAL to use legacy timer loops for timers at the cost of additional processor bandwidth. Here are the messages that are logged for the problem that occurs on every boot:
[ 194.494559313,3] SBE: Timer stuck, falling back to OPAL pollers.
[ 194.494624185,3] SBE: You will likely have slower I2C and may have experienced increased jitter.
A problem was fixed for PCIe4 CX5 adapter performance with an increase of performance of 40% for DMA read requests. The adapter affected is the Mellanox CX5 PCIe4 100Gb IB CAPI with feature codes #EC62 with CCIN 2CF1 and #EC64 with CCIN 2CF2. Without the fix, each read request requires a retry to work.
A problem was fixed for Petitboot exiting to the shell with xCAT genesis in the menu when trying to do a network boot. Petitboot was timing out when trying to access the ftpserver but it was not doing the network re-queries necessary for a proper retry. If this error happens on a system, it can be made to boot with the following two steps: 1) Type the word "exit" and press enter key. This brings it back to petitboot menu. 2) Press the enter key again to start the boot of the xCAT image.
A problem was fixed for random "????" characters displayed on the SOL Console during the skiboot boot.
A problem was fixed for Redfish-Service-Validator detected errors in the following two Redfish APIs: "/redfish/v1/Managers/BMC/LogServices/Log/Entries/[ID]" and "/redfish/v1/Chassis/Planar/Assembly".
A problem has been fixed for systems unexpectedly running with all processors at lower frequencies than would be expected for Workload Optimized Frequency (WOF) ultra-turbo mode. There was no eSEL or callout for the processor causing the error that disabled the WOF mode. With the fix, there is an eSEL and callout for the WOF fault that identifies the errant processor that needs to be replaced.
A problem has been fixed for a PCIe adapter running in CAPP mode having a missing MMIO Base Address Register (BAR) entry that causes a failure of the adapter and a fence off of two of the four ports of the adapter.
A problem has been fixed for a slow start up of a process that can occur when the system had been previously in an idle state.
A problem has been fixed for a TOD error that can cause a soft lockup of the kernel. A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds, without giving other tasks a chance to run. The current stack trace is displayed upon detection and, by default, the system will stay locked up.
A problem was fixed for a processor core that cannot be awakened or a timeout in the On Chip Controller when switching Workload Optimized Frequency (WOF) modes from disabled to enabled. These errors can cause a reduction in performance by running with fewer cores or by running at the safe mode frequencies.
A problem was fixed for a false call out of a processor on an INTCQFIR[27]. This FIR bit should not call out the processor as the processor has not failed. The error is recoverable and should only serve as an early warning indication.
A problem was fixed for Workload Optimized Frequency (WOF) where parts may have been manufactured with bad IQ data that requires filtering to prevent WOF from being disabled.
A problem was fixed for the opal-prd service consuming 100% of CPU during and after boot to the host. This is an infrequent intermittent problem that can be circumvented by a reboot of the system.
A problem was fixed for the wrong DIMM being called out on over-temperature failures with B1xx2A30 errors logged. This should be a rare failure as it requires a DIMM to exceed its maximum specified operating temperature.
A problem has been fixed to add part and serial numbers to the processors when accessed through the device tree.
A problem has been fixed to make the OS aware of the DARN random number generator at 0x00200000 (PPC_FEATURE2_DARN) and the SCV syscall at 0x00100000 (PPC_FEATURE2_SCV). Without this fix, these service constants are not defined in the OS userspace.
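The two constants above are single-bit masks, so checking for a feature is a bitwise AND against the AT_HWCAP2 value. The sketch below shows only that arithmetic. How the AT_HWCAP2 value is obtained is left open (for example from getauxval(AT_HWCAP2) in a small C program), and the function name hwcap2_has is illustrative.

```shell
#!/bin/sh
# AT_HWCAP2 feature bits named in the text.
PPC_FEATURE2_DARN=0x00200000
PPC_FEATURE2_SCV=0x00100000

# Return 0 when feature bit mask $2 is set in the AT_HWCAP2 value $1.
# Both arguments are hexadecimal strings such as 0x00300000.
hwcap2_has() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Usage, with <hwcap2> standing in for the value read on the host:
#   hwcap2_has <hwcap2> "$PPC_FEATURE2_DARN" && echo "DARN available"
#   hwcap2_has <hwcap2> "$PPC_FEATURE2_SCV"  && echo "SCV available"
```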
A problem was fixed for Coherent Accelerator Processor Proxy (CAPP) mode for the PCI Host Bridge (PHB) to improve DMA write performance by enabling channel tag streaming for the PHB. With this enabled, the DMA write does not have to wait for a response before sending a new write command on the bus.
A problem was fixed for the Open-Power Flash tool "pflash" failing with a blocklevel_smart_erase error during a pflash. This problem is infrequent and is triggered if pflash detects a smart erase fits entirely within one erase block.
A problem was fixed in the Petitboot user interface to handle cursor mode arrow keys for the VT100 'application' cursor to prevent mis-interpreting an arrow key as an escape key in some situations. For more information on the VT100 cursor keys, see http://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-21.html.
A problem was fixed in the Petitboot user interface to cancel the autoboot if the user has exited the Petitboot user interface. This prevents the user dropping to the shell and then having the machine boot on them instead of waiting until the user is ready for the boot.
A problem was fixed in the Petitboot parsing of manually-specified configuration files that caused the parser to create file paths relative to the downloaded file's path, not the original remote path.
A problem was fixed for a failure to IPL with SRC BC8A0506 logged for a Phase Lock Loop error (PLL) in the PCIe Host Bridge (PHB). This problem is very infrequent. The fix does the correct call out of the failed FRU, allowing the IPL to continue.
A problem was fixed for a system IPL hang that shows in the log as the host going to a quiesce state with the OS inactive. This is a rare problem that may be recovered by a power off and re-IPL of the system. This problem is triggered by a higher than normal level of interrupts from the Power Supply Unit (PSU).
A problem was fixed for the VPD serial number not being updated on the replacement of a planar. The VPD update failed with the following message: "ERROR: (ECMD): ecmd - 'putvpdkeyword' returned with error code 0x20300001 (ERROR OPENING DECODE FILE). ERROR: A problem occurred updating the serial number(OSYS:SS). Please see previous output for reason ".
A problem was fixed for the CAS latency calculation for memory to improve its accuracy to reduce the potential for DIMM failures due to memory timing errors. Column Access Strobe (CAS) latency is the delay time between the moment a memory controller tells the memory module to access a particular memory column on a RAM module, and the moment the data from the given array location is available on the module's output pins.
A problem was fixed for clearing DIMM guard records when there was a repair marked in the VPD and that prevented the DIMM from being unguarded. With the fix, the VPD mark will be cleared if the guard record is cleared for the FRU, allowing it to be enabled on the next IPL.
A problem was fixed for the Self Boot Engine (SBE) error identification on failure. The SRR0/SRR1/LR/Local FI2C registers are now extracted to allow the following SBE errors to be identified:
100 - Program interrupt, promoted
101 - Instruction storage interrupt, promoted
110 - Alignment interrupt, promoted
111 - Data storage interrupt, promoted
A problem was fixed for read margins to improve the margins on DIMMs, reducing the number of DIMM failure occurrences.
A problem was fixed for the Hostboot reset to enable error recovery of Hostboot through the reset path. Without the fix, the Self Boot Engine (SBE) fails to reboot on the Hostboot reset, preventing error recovery for the Hostboot failures.
A problem was fixed for a memory training error that could cause DIMMs to be marked as bad or memory ports to be deconfigured. This problem is rare and triggered by an incorrect internal voltage level.
A problem was fixed for a Phase Lock Loop (PLL) error causing a checkstop but not calling out and guarding the failed hardware. There is a chance the failure can recur on the next IPL of the system.
A problem was fixed to reduce memory latency in memory blocks where bad memory bits have been marked.
A problem was fixed for time and date fields being zero in Hostboot error log entries for the time/date of the error occurrence.
A problem was fixed for an extraneous "MCBISTFIR[3]: broadcast out of sync" error during memory diagnostics if a Register Clock Driver (RCD) parity error occurs. The "broadcast out of sync" error should be ignored when isolating the RCD fault. This problem is triggered if the RCD parity error occurs while the DDR4 memory is in broadcast mode.
A problem was fixed for BC8A2AC5 and BC8A2AC4 errors that prevented the reading of On-Chip Controller (OCC) thermal readings from the Analog Power Subsystem Sweep (APSS) bus. This is a very rare problem.
A problem was fixed for a processor fault that caused the master processor core to guard and prevented an IPL of the system with SRC BC13E540. With the fix, the system will IPL up on the available processor cores. This error only occurs if the master core is faulted. Faults on the other cores are handled correctly and do not stop an IPL.
A problem was fixed for a flood of OPAL error messages that can occur for a processor fault. The message "CPU ATTEMPT TO RE-ENTER FIRMWARE" appears as a large group of messages and precede the relevant error messages for the processor fault. A reboot of the system is needed to recover from this error.
A problem was fixed for an incorrect Redfish access privilege to make it consistent with the BMC GUI webpage.
A problem was fixed for Redfish namespace PhysicalContext.v1_3_0 not being found in schema PhysicalContext_v1.xml.
A problem was fixed for Redfish crashing if an attempt is made to convert a string to an integer when the string does not represent a number.
A problem was fixed for the host console losing data.
A problem was fixed for slow SOL console response for long-running commands.
A problem was fixed for Redfish failing to read a SEL when the SEL log is being flooded with new entries.
A problem was fixed for the BMC remote console not conforming to security standards by not being digitally signed. The function has been updated to be cryptographically signed.
A problem was fixed for a skiboot hang that could occur rarely for an I2C request if the I2C bus is in error or locked by the On-Chip Controller (OCC).
A problem was fixed for "Unexpected TCE size" error messages when Linux tried the default P9 PHB4 page sizes and used the unsupported 2M and 1G page sizes. The TCE page size property is now set correctly, with 4K, 64K, 16M, and 256M supported.
A problem was fixed for PCIe ECC protection in the response data path for Power 9 processor parts. With the fix, PCIe ECC errors detected from the adjacent AIB (Adapter Interface Board) receive data path escalate to a checkstop so that the defective parts can be replaced.
A problem was fixed for an intermittent rare processor core lock failure that is not a real hardware problem. The erroneous failure looks like this in the logs:
LOCK ERROR: Releasing lock we don't hold depth @0x30493d20 (state: 0x0000000000000001)
[13836.000173140,0] Aborting!
CPU 0000 Backtrace:
S: 0000000031c03930 R: 000000003001d840 ._abort+0x60
S: 0000000031c039c0 R: 000000003001a0c4 .lock_error+0x64
S: 0000000031c03a50 R: 0000000030019c70 .unlock+0x54
S: 0000000031c03af0 R: 000000003001a040 .drop_my_locks+0xf4
A problem was fixed for the power-capping range allowed for the user. Changes were made to allow the user to access the entire powercap range, with two minimums exported into the OS: soft power cap minimum "powercap-min" and the hard power cap minimum limit "powercap-hard-min".
A problem was fixed for an OS reboot after a shutdown that intermittently fails after the shutdown. This can happen if the BMC is not ready to receive commands. With the fix, the messages to the BMC are validated and retried as needed. To recover from this error, the system can be rebooted from the BMC interface.
A problem was fixed for a kernel hard lock up that could occur if IPMI synchronous messages were sent from the OS to the BMC while the BMC was rebooting. For this type of message, a processor thread remains waiting in OPAL until a response is returned from the BMC.
Support was added to recognize a port parameter in the URL path for the Preboot eXecution Environment (PXE) in the ethernet adapters. Without the fix, there could be PXE discovery failures if a port was specified in the URL for the PXE.
A problem was fixed to correct the names of the WIO HBA slots. WIO-R Slot2 has been changed back to WIO-R Slot and WIO Slot 3 has been changed back to WIO Slot1.
V0.24.3-20180926/BMC 1.23
10/08/2018 | Impact: Availability Severity: SPE
System firmware changes that affect all systems
A problem was fixed for not checkstopping on ECC faults in the response data on the PCI Host Bridge (PHB) ASIC interconnect bus (AIB) for DD2.0 P9 processors. For these parts, PCIe ECC protection is not available in the response data path. With the fix, the AIB ECC errors detected are escalated to a checkstop so the bad parts can be replaced.
A problem was fixed for an IPL loop/hang with a fatal MCE exception log caused by a probe of a failed PCI Host Bridge (PHB) that had been guarded. This is an infrequent error because it requires a PHB to have previously failed. The exception log has the following format:
Fatal MCE at 000000003006ecd4 .probe_phb4+0x570 CFAR : 00000000300b98a0
...
Aborting!
CPU 0018 Backtrace:
S: 0000000031cc37e0 R: 000000003001a51c ._abort+0x4c
S: 0000000031cc3860 R: 0000000030028170 .exception_entry+0x180
S: 0000000031cc3a40 R: 0000000000001f10 *
S: 0000000031cc3c20 R: 000000003006ecb0 .probe_phb4+0x54c
S: 0000000031cc3e30 R: 0000000030014ca4 .main_cpu_entry+0x5b0
S: 0000000031cc3f00 R: 0000000030002700 boot_entry+0x1b8
A problem was fixed for PCI Host Bridge (PHB) configuration Write Request time-outs being incorrectly set to informational instead of fatal. With this fix, a possible system checkstop is prevented and failed hardware is properly reported.
A problem was fixed for a PCI Host Bridge (PHB) configuration write error that caused the incorrect PCIe device to be frozen. The fault will be attributed to the last device to have a memory-mapped I/O operation (MMIO). With this fix, the freeze action for PHB configuration write errors is disabled in order to not impact functional hardware.
V0.24-20180130/BMC 1.23
07/17/2018 | Impact: Availability Severity: SPE
New features and functions
Support was added for Redfish UpdateService, EventService, Log Service, Computer System Collection and Computer System.
Support was added to display the BMC IP address on the OpenPOWER splash screen.
System firmware changes that affect all systems
A problem was fixed for hangs during OS boots with "HW Power Status Error" in the SEL. Without the fix, I2C bus polling was causing abnormal behavior in the Voltage Regulator Module (VRM).
A problem was fixed for a failure to isolate to an errant FRU for a system checkstop. This is an intermittent error related to the OCC not waiting long enough to collect the failure information for a checkstop that occurs on a busy system. When this error happens, it prevents checkstop diagnosis procedures from identifying the cause of the checkstop fault. For this error, no active error bits are found and the checkstop analysis failure error log is mapped to a SEL which directs the customer to contact support as shown below:
1 | 05/17/2018 | 20:49:27 | System Firmware Progress Boot Progress | Motherboard initialization () | Asserted
2 | 05/17/2018 | 20:50:13 | System Firmware Progress Boot Progress | System boot initiated () | Asserted
3 | 05/18/2018 | 09:27:07 | OEM record c0 | 040020 | ceff6fffffff ==> Checkstop Signal, check other serviceable SELs and resolve them
4 | 05/18/2018 | 09:27:44 | System Firmware Progress Boot Progress | Motherboard initialization () | Asserted
5 | 05/18/2018 | 09:27:56 | OEM record df | 040020 | 12046faa0000
6 | 05/18/2018 | 09:28:05 | OEM record de | 000000 | 100000000005 ==> Procedure callout, decodes to contact next level of support for assistance
7 | 05/18/2018 | 09:28:31 | System Firmware Progress Boot Progress | System boot initiated () | Asserted
The following steps can be used to identify the signature for this problem.
Look for the SEL indicating checkstop signal as shown below:
3 | 05/18/2018 | 09:27:07 | OEM record c0 | 040020 | ceff6fffffff ==> Checkstop Signal, check other serviceable SELs and resolve them
If found, look for an "OEM de" SEL that is logged a few minutes after the checkstop signal SEL, decoding into "contact next level of support for assistance":
5 | 05/18/2018 | 09:27:56 | OEM record df | 040020 | 12046faa0000
6 | 05/18/2018 | 09:28:05 | OEM record de | 000000 | 100000000005 ==> Procedure callout, decodes to contact next level of support for assistance
If found, collect plc.pl output and inspect the decoded eSEL (PEL) associated with the checkstop analysis. If the signature description in the PRD log reads "No active error bits found", the checkstop analysis failure is confirmed.
| Reference Code : BC70E550 |
| Hex Words 2 - 5 : 000000E0 00000B00 00000000 00200000 |
| Hex Words 6 - 9 : 000B0004 00000103 BC4ADD02 00000000 |
| |
| ModuleId : 0x0B |
| Reason Code : 0xE550 |
| Code Location : 0x0103 |
| |
| PRD SRC Type : PRD Detected Hardware Indication |
| PRD SRC Class : Software likely caused a hardware error condition,|
| : smaller possibility of a hardware cause. |
| |
| PRD Signature : 0x000B0004 0xBC4ADD02 |
| Signature Description : mcs(n0p1c0) No active error bits found |
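The first two identification steps above can be approximated with a small filter over a saved SEL listing. This is a sketch under stated assumptions: it expects one entry per line with "|"-separated fields in the order shown in the examples above (ID, date, time, record type, ...), it treats any "OEM record c0" entry as the checkstop signal, and it does not check the time gap between the two entries. The function name find_checkstop_signature is illustrative. The PRD log inspection in the last step still has to be done by hand.

```shell
#!/bin/sh
# Read SEL listing text on stdin (one entry per line, fields separated by "|")
# and report each "OEM record c0" entry that is later followed by an
# "OEM record de" entry. Returns 0 when at least one such pair is found.
find_checkstop_signature() {
    awk -F' *[|] *' '
        # Remember the most recent checkstop-signal entry.
        $4 == "OEM record c0" { checkstop = $0; next }
        # A procedure-callout entry after a checkstop entry completes the pair.
        $4 == "OEM record de" && checkstop != "" {
            print "checkstop SEL : " checkstop
            print "callout SEL   : " $0
            found = 1
            checkstop = ""
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage, assuming the SEL listing was saved to a file named sel.txt:
#   find_checkstop_signature < sel.txt && echo "inspect the PRD log next"
```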
A problem was fixed for the VBAT sensor always reading out of range.
A problem was fixed for a BMC shutdown on a "mc reset cold" command.
A problem was fixed for the IPMI SOL console losing characters from the output stream intermittently.
A problem was fixed for losing sensor polling after using IPMI commands to disable sensor polling and then enable it. Every sensor reading stops working after doing the Disable/Enable sequence.
A problem was fixed for truncated characters in the SEL description of Session Audit.
A problem was fixed for a crash of the system when doing a PNOR reprovision followed by an "opal-gard list --all" command.
A problem was fixed for a long flash time for the PNOR using the pUpdate tool. With the fix, the flash time was reduced to 18 minutes from more than 30 minutes.
A problem was fixed for error message "An error occurred while reading the request" being displayed during a PNOR update using the Web GUI. Without the fix, temporary directory /tmp/rsync_file/ was filled to capacity. With the fix, the BMC temporary work directories are cleaned up before the PNOR update.
V0.24-20180130/BMC 1.14 | Impact: Security Severity: SPE |
This Service Pack includes updates in response to Recent Security Vulnerabilities, New Features & Functions and System Firmware Updates.
Details of each are below:
Response for Recent Security Vulnerabilities
In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.
New features and functions (not related to the above CVEs)
Support was added for DD2.1 versions of the POWER9 processors.
Support was added for compile-time STUNNEL certificates to allow a full 3650 days of validity. This change allows the JAR to have the newest STUNNEL certificates for each build.
System firmware changes that affect all systems
A problem was fixed for the ipmitool dcmi power limit that was ignoring action states. With the fix, the action is set as shown for this example:
"ipmitool dcmi power set_limit action power_off"
Current Limit State: Power Limit Active
Exception actions: Hard Power Off & Log Event to SEL
Power Limit: 1500 Watts
Correction time: 0 milliseconds
Sampling period: 0 seconds
A problem was fixed for the ipmitool chassis poh (Power On Hours) counter getting reset to zero. The counter should have been stored in non-volatile storage to increment the hour count whenever the system is powered up and it should not reset to zero when the system is powered off or the BMC is reset.
A problem was fixed for an intermittent hang in iKVM when used in full-screen mode.
The security of BMC SSH sessions was improved by removing support for CBC ciphers and disabling weak MAC algorithms.
A problem was fixed for a checkstop and re-IPL occurring during a normal reboot of the system.
A problem was fixed for an intermittent hang on a re-IPL of a system. This can occur in ISTEP 10.10 (proc_pcie_scominit) for around 600 seconds, after which the system recovers and completes the IPL. Another possible symptom is an ipmitool request for power limit information failing with reason code 0x250a if it is attempted during the IPL stall state. It is also possible for the IPL to stall at ISTEP 21.1 during host_runtime_setup.
A problem was fixed for an unexpected reboot of the system when NTP is enabled from the BMC web interface.
A problem was fixed for intermittent hangs for PCIe adapters going through an error reset. The wrong PE number was being configured for the device but the PE mapping has been corrected. Larger PE numbers were truncated causing additional PE errors and freeze errors.
A problem was fixed for intermittent host checkstops caused by NCU and PCI timeout mismatches. PCI timeouts that are longer than NCU timeouts may cause checkstops on the host.
A problem was fixed for incorrect guards on the MCA when there are faulty DIMMs.
A problem was fixed for not setting the guard/deconfiguration bits correctly on bad devices, allowing these to be accessed during the reconfiguration loop of the IPL. This can lead to errors that fail the IPL.
A problem was fixed for an IPL failing in step 6.11 after encountering a checkstop.
A problem was fixed to disable out-of-store behavior in PCIe devices and improve MMIO packet rate performance.
|
V0.19-20171030D | Impact: New Severity: New GA Level |
OS levels supported by the LC servers:
•Red Hat Enterprise Linux 7.4 for IBM Power LE (POWER9)
•Red Hat Enterprise Linux 7.4 for IBM Power LE (POWER9) / RHEV 4.1
•Ubuntu 16.04.3
•Ubuntu 16.04.3 / KVM
•Ubuntu 17.04
•Ubuntu 17.04 / KVM
IBM Power LC servers support Linux, which provides a UNIX-like implementation across many computer architectures. Linux supports almost all of the Power Systems I/O, and the configurator verifies support on order. For more information about the software that is available on IBM Power Systems, see the Linux on IBM Power Systems website:
http://www.ibm.com/systems/power/software/linux/index.html
The Linux operating system is an open source, cross-platform OS. It is supported on every Power Systems server IBM sells. Linux on Power Systems is the only Linux infrastructure that offers both scale-out and scale-up choices. One supported version of Linux on the IBM Power LC is Ubuntu Server 17.04 for IBM POWER9. For more information about Ubuntu Server for POWER9, see the following website:
https://wiki.ubuntu.com/ppc64el
Another supported version of Linux on the Power LC servers is Red Hat Enterprise Linux 7.4 LE. For additional questions about the availability of this release and supported Power servers, consult the Red Hat Hardware Catalog at https://hardware.redhat.com.
For information about the PowerLinux Community, see the following website:
https://www.ibm.com/developerworks/group/tpl
For information about the features and external devices that are supported by Linux, see this website:
http://www.ibm.com/systems/power/software/linux/index.html
Use one of the following commands at the Linux command prompt to determine the current Linux level:
•cat /proc/version
•uname -a
The output string from the command will provide the Linux version level.
The opal-prd package on the Linux system collects the OPAL Processor Recovery Diagnostics messages in the log file /var/log/syslog. It is recommended that this package be installed if it is not already present, because it helps with maintaining the system processors by alerting users when processor maintenance is needed.
On Ubuntu Linux, run the command dpkg -l "opal-prd". The output shows whether the package is installed on your system by marking it with ii (installed) or un (not installed).
This package provides a daemon to load and run the OpenPower firmware's Processor Recovery Diagnostics binary. This is responsible for run-time maintenance of Power hardware.
If the package is not installed on your system, the following command can be run on Ubuntu to install it:
sudo apt-get install opal-prd
On Red Hat Linux, run the command "rpm -qa | grep -i opal-prd". The package is installed on your system if the rpm for opal-prd is found and displayed. This package provides a daemon to load and run the OpenPower firmware's Processor Recovery Diagnostics binary. This is responsible for run-time maintenance of Power hardware. If the package is not installed on your system, the following command can be run on Red Hat to install it:
sudo yum install opal-prd
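The Ubuntu check above can also be scripted. The following is a minimal Python sketch that reads the two-letter status column from dpkg -l output. The function names are hypothetical, and the parsing assumes the usual dpkg -l row layout of status, name, version, architecture, and description.

```python
import subprocess


def dpkg_status(output, package="opal-prd"):
    """Return the dpkg status code (for example "ii" or "un") for a package, or None."""
    for line in output.splitlines():
        fields = line.split()
        # Column 1 is the status, column 2 the name (optionally "name:arch").
        if len(fields) >= 2 and fields[1].split(":")[0] == package:
            return fields[0]
    return None


def query_dpkg(package="opal-prd"):
    """Run dpkg -l for the package and return its status code, or None if unknown."""
    result = subprocess.run(["dpkg", "-l", package], capture_output=True, text=True)
    return dpkg_status(result.stdout, package)
```

A result of "ii" means the package is installed. Any other result, including None, means the install command for your distribution still needs to be run.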
Use the ipmitool "fru" command or the BMC Web GUI FRU option to look at the product details of FRU 47.
ipmitool -I lanplus -H <bmc host IP address> -P admin -U ADMIN fru print 47
Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.
Updating and upgrading the system firmware depends on several factors, such as the firmware level that is currently installed and which operating system is running on the system.
These scenarios and the associated installation instructions are comprehensively outlined in the firmware section of Fix Central, found at the following website:
http://www.ibm.com/support/fixcentral/
Any hardware failures should be resolved before proceeding with the firmware updates to help ensure that the system will not be running degraded after the updates.
The pUpdate utility is provided with the firmware update files from IBM Fix Central. It can be used to perform in-band updates (from the host OS), in-band update recovery, and out-of-band updates by selecting the "-i usb", "-i bt", or "-i lan" parameter, respectively, on the command invocation. The code update needs to be done in two steps: 1) Update the BMC firmware and 2) Update the CEC PNOR for the hostboot and the OPAL components. It is recommended that the BMC be updated first unless otherwise specified in the firmware install instructions.
Before using the pUpdate command on the host, make sure that the ipmi driver is loaded in the kernel and the ipmi service is started.
Note: For updates that use the "usb" or "bt" pUpdate option, you must use the root user ID and password to log in to the host operating system. After you log in to the host operating system, ensure that the IPMI service is activated.
# chkconfig ipmi on
# service ipmi start
For more information about activating the IPMI service, see the OpenIPMI Driver: https://www.ibm.com/support/knowledgecenter/POWER8/p8eih/p8eih_ipmi_open_driver.htm
For in-band update, use the following "-i usb" invocation of pUpdate:
BMC update: "pUpdate -f bmc.bin -i usb", where bmc.bin is the name and location of the BMC image file.
PNOR update: "pUpdate -pnor pnor.bin -i usb", where pnor.bin is the name and location of the PNOR image file.
If the in-band update fails on the BMC, use the recovery option with the Block Transfer (bt) invocation of pUpdate:
BMC update: "pUpdate -f bmc.bin -i bt -r y", where bmc.bin is the name and location of the BMC image file.
PNOR update: "pUpdate -pnor pnor.bin -i bt", where pnor.bin is the name and location of the PNOR image file.
For more information on BMC recovery steps, refer to the following link in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8eis/p8eis_console_problem.htm
If the host is not booted, a network connection can be made to the BMC and an out-of-band update done with the following LAN invocation from a Linux companion system:
BMC update: "pUpdate -f bmc.bin -i lan -h xx.xx.xx.xx -u ADMIN -p ADMIN -r y", where bmc.bin is the name and location of the BMC image file and xx.xx.xx.xx is the IP address of the BMC.
PNOR update: "pUpdate -pnor pnor.bin -i lan -h xx.xx.xx.xx -u ADMIN -p ADMIN", where pnor.bin is the name and location of the PNOR image file and xx.xx.xx.xx is the IP address of the BMC.
For more details on how to use the pUpdate utility, refer to the following link:
https://www.ibm.com/support/knowledgecenter/POWER9/p9eit/p9eit_update_firmware_pupdate.htm
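The pUpdate invocations above differ only in the image flag, the interface, and the optional LAN and recovery arguments. The following is a minimal Python sketch that assembles them in the recommended BMC-then-PNOR order. It uses only the flags shown in this section (-f, -pnor, -i, -h, -u, -p, -r y). The function names pupdate_args and update_plan are hypothetical and are not part of the pUpdate package.

```python
def pupdate_args(image, target="bmc", interface="usb",
                 host=None, user=None, password=None, recover=False):
    """Build the pUpdate argument list for one BMC or PNOR image."""
    if target not in ("bmc", "pnor"):
        raise ValueError("target must be 'bmc' or 'pnor'")
    if interface not in ("usb", "bt", "lan"):
        raise ValueError("interface must be 'usb', 'bt', or 'lan'")
    # -f selects a BMC image, -pnor selects a PNOR image.
    args = ["pUpdate", "-f" if target == "bmc" else "-pnor", image, "-i", interface]
    if interface == "lan":
        if not (host and user and password):
            raise ValueError("lan updates need host, user, and password")
        args += ["-h", host, "-u", user, "-p", password]
    if recover:
        # "-r y" appears only on the BMC bt and lan examples in this section.
        args += ["-r", "y"]
    return args


def update_plan(bmc_image, pnor_image, interface="usb",
                host=None, user=None, password=None):
    """Return the two invocations in the recommended order: BMC first, then PNOR."""
    return [
        pupdate_args(bmc_image, "bmc", interface, host, user, password),
        pupdate_args(pnor_image, "pnor", interface, host, user, password),
    ]
```

Each returned list can be passed to subprocess.run on a system where pUpdate is on the PATH. Building a list rather than a shell string avoids quoting problems with image paths.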
You can use diagnostic utilities to diagnose adapter problems.
For more details on how to use the diagnostic utilities, refer to the following link:
https://www.ibm.com/support/knowledgecenter/POWER9/p9eit/p9eit_diags_kickoff.htm
OpenPOWER requires SourceForge ipmitool level v1.8.15 to execute correctly on the POWER9 V0.19-20171030D and later firmware.
Another method to update the system firmware is by using the baseboard management controller (BMC).
The system firmware is a combination of the BMC firmware and the PNOR firmware. To update the system firmware, update both the BMC firmware and the PNOR firmware by using the BMC.
Note: System firmware update from the BMC Web GUI is only supported on Google Chrome and Mozilla Firefox browsers.
Complete the following steps to update the BMC firmware:
1. Log in to the BMC by entering the user name and password. Then, press Enter.
2. From the Maintenance list on the BMC dashboard, select BMC Update.
3. In the BMC Update window, select Enter Update Mode. Click OK.
4. In the BMC Upload window, choose the .bin file from your local system folder and click Upload Firmware. Wait for the file to be uploaded. Then, click OK.
5. The existing and new versions of the BMC firmware are displayed. Ensure that the Preserve Configuration check box is selected and the Preserve SDR check box is not selected. Click Start Upgrade.
Note: You cannot perform other activities by using the BMC interface until the firmware update is complete.
6. The upgrade progress of the firmware update is displayed. After the BMC update is complete, the system is restarted.
7. After the restart of the system is complete, verify the firmware revision level in the System menu of the BMC dashboard.
Complete the following steps to update the PNOR firmware:
1. Log in to the BMC by entering the user name and password. Then, press Enter.
2. From the Maintenance list on the dashboard, select PNOR Update.
3. In the PNOR Upload window, choose the .pnor file from your local system folder and click Upload PNOR. Wait for the file to be uploaded. Then, click OK.
4. The existing and new dates of the PNOR firmware are displayed. Click Start Upgrade.
Note: You cannot perform other activities by using the BMC interface until the PNOR update is complete.
5. The progress of the PNOR update is displayed. After the PNOR update is completed, the system must be restarted to finish installation of the new PNOR firmware.
For more information on updating the firmware using the BMC, refer to the following link:
https://www.ibm.com/support/knowledgecenter/POWER9/p9eit/p9eit_update_firmware_bmc.htm
System I/O devices have firmware that can be updated.
Please see the appropriate IBM Knowledge Center for these servers for applicable I/O firmware update information:
9006-22C:
http://www.ibm.com/support/knowledgecenter/POWER9/p9hdx/9006_22c_landing.htm
The service processor, or baseboard management controller (BMC), provides a hypervisor and operating system-independent layer that uses the robust error detection and self-healing functions that are built into the POWER processor and memory buffer modules. The Open Power Abstraction Layer (OPAL) is the system firmware in the stack of POWER processor-based Linux-only servers.
The service processor, or baseboard management controller (BMC), is the primary control for autonomous sensor monitoring and event logging features on the LC server.
The BMC supports the Intelligent Platform Management Interface (IPMI) for system monitoring and management. The BMC monitors the operation of the firmware during the boot process and also monitors the OPAL hypervisor for termination. Firmware code update is supported through the BMC with IPMI and the BMC Web GUI. The GUI console is accessed using a web browser with an "http:" connection to port. See section 1.2 for the supported browsers that can be used with the BMC Web GUI.
The Open Power Abstraction Layer (OPAL) provides hardware abstraction and run time services to the running host Operating System.
For these LC servers, only OPAL bare-metal installs can be used. KVM can be used on top of the installed OS to run Linux virtual guest machines.
Find out more about OPAL skiboot here:
https://github.com/open-power/skiboot
The Intelligent Platform Management Interface (IPMI) is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. It is the default console to use when you configure PowerKVM. These LC servers provide one 10M/100M baseT IPMI port.
The ipmitool is a utility for managing and configuring devices that support IPMI. It provides a simple command-line interface to the service processor. You can install the ipmitool from the Linux distribution packages in your workstation, sourceforge.net, or another server (preferably on the same network as the installed server). For example, in Ubuntu, use this command:
$ sudo apt-get install ipmitool
For installing ipmitool from sourceforge, please see section 1.1 "Minimum ipmitool Code Level".
Several good references are available for ipmitool commands:
1. The man page
2. The built-in command line help provides a list of ipmitool commands:
# ipmitool help
3. You can also get help for many specific ipmitool commands by adding the word help after the command:
# ipmitool channel help
4. For a list of common ipmitool commands and help on each, you may use the following link:
www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabpcommonipmi.htm
To connect to your host system with IPMI, you need to know the IP address of the server and have a valid password. To power on the server with the ipmitool, follow these steps:
1. Open a terminal program.
2. Power on your server with the ipmitool:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password power on
3. Activate your IPMI console:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password sol activate
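The two ipmitool steps above share the same connection arguments, so they can be wrapped in a small helper. The following is a minimal Python sketch that mirrors the commands exactly as written in this section. The function names are hypothetical, and the sketch assumes that ipmitool is installed and on the PATH of the workstation.

```python
import subprocess


def ipmitool_args(bmc_ip_address, ipmi_password, *command):
    """Build an ipmitool argument list for a lanplus connection to the BMC."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip_address,
            "-P", ipmi_password, *command]


def power_on_and_open_console(bmc_ip_address, ipmi_password):
    """Power on the server, then attach the Serial over LAN console."""
    # Step 2: power on; check=True raises if ipmitool reports a failure.
    subprocess.run(ipmitool_args(bmc_ip_address, ipmi_password, "power", "on"),
                   check=True)
    # Step 3: activate the console; this call blocks until the session ends.
    subprocess.run(ipmitool_args(bmc_ip_address, ipmi_password, "sol", "activate"))
```

Calling power_on_and_open_console with the BMC address and password runs the same two commands in order, and does not open the console if the power-on request fails.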
Petitboot is a kexec-based bootloader used by IBM POWER9 systems for bare-metal installs on these LC servers.
After the POWER9 system powers on, the petitboot bootloader scans local boot devices and network interfaces to find boot options that are available to the system. Petitboot returns a list of boot options that are available to the system. If you are using a static IP or if you did not provide boot arguments in your network boot server, you must provide the details to petitboot. You can configure petitboot to find your boot with the following instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootadvanced.htm
You can edit petitboot configuration options, change the amount of time before Petitboot automatically boots, etc. with these instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootconfig.htm
After you select to boot the ISO media for the Linux distribution of your choice, the installer wizard for that Linux distribution walks you through the steps to set up disk options, your root password, time zones, and so on.
You can read more about the petitboot bootloader program here:
https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html
Kernel-based Virtual Machine (KVM) is a cross-platform, open source hypervisor that provides enterprise-class performance, scalability and security to run Linux and other workloads on a range of processor architectures. For these LC servers, Ubuntu KVM or Red Hat Enterprise Virtualization (RHEV) may be installed on top of a bare-metal install of Ubuntu or RHEL, respectively.
Red Hat Enterprise Virtualization (RHEV) for IBM Power is an enterprise virtualization product produced by Red Hat, based on the KVM hypervisor. For more information, go to this link on the Red Hat portal:
Ubuntu KVM is configured by installing the missing virtualization packages (qemu-user qemu-utils cloud-image-utils qemu-system-ppc qemu-slof libvirt-bin numactl); adding users in a KVM group; disabling the SMT mode of the cpu using the ppc64_cpu tool; and enabling the KVM module in the kernel. For more information on how to complete these steps, refer to this link in the Ubuntu wiki: https://wiki.ubuntu.com/ppc64el/CommonQuestions#How_to_use_Ubuntu_as_a_hypervisor.3F
IBM PowerKVM is not supported on these LC servers.
Note: These Power LC servers and their KVM options do not support AIX or IBM i guest VMs and cannot be managed by an HMC.
For additional questions about the availability of this release and supported Power servers, consult the Red Hat Hardware Catalog at https://hardware.redhat.com.
This guide helps you install Ubuntu on a Linux on Power Systems server.
Overview
Use the information found in http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwkickoff.htm to install Linux, in this case Ubuntu, on a non-virtualized (bare metal) IBM Power LC server. Note that the choice of PowerKVM is offered in the link but that is not a supported OS for these LC servers.
Date | Description |
04/12/2019 | OP910.31 (V0.55-20190325/BMC 2.06) release |
10/08/2018 | V0.24.3-20180926/BMC 1.23 release |
08/28/2018 | Re-publish to change pUpdate tool level (pnor/bmc binaries remain unchanged) |
08/03/2018 | V0.24.2-20180801/BMC 1.23 release |
07/17/2018 | V0.24-20180130/BMC 1.23 release |
02/16/2018 | V0.24-20180130/BMC 1.14 release |
12/01/2017 | V0.19-20171030D release for Power 9 S922LC (9006-22C) and Elastic Storage Server (5104-22C) |