IBM POWER9 Systems LC Server Firmware
Applies to: AC922 (8335-GTG)
This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.
This package provides firmware for the Power System AC922 (8335-GTG) server only.
The firmware level in this package is:
• OP910.20 / PNOR OP9_v1.19_1.154 / BMC ibm-v2.0-0-r44
This section specifies the "Minimum ipmitool Code Level" required by the System Firmware for managing the system. Open Power requires ipmitool level v1.8.15 or later to execute correctly on the OP910 firmware. It must be capable of establishing an IPMI v2 session with the IPMI support on the BMC.
Verify the ipmitool level on your Linux workstation using the following command:
bash-4.1$ ipmitool -V
ipmitool version 1.8.15
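The version check above can also be scripted. The following is a minimal sketch, assuming the output of "ipmitool -V" has the form shown above; the function names are illustrative and are not part of ipmitool.

```python
import re

MIN_IPMITOOL = (1, 8, 15)  # minimum level stated in this document


def parse_ipmitool_version(output):
    """Extract (major, minor, patch) from `ipmitool -V` output."""
    match = re.search(r"ipmitool version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized ipmitool -V output: %r" % output)
    return tuple(int(part) for part in match.groups())


def meets_minimum(output, minimum=MIN_IPMITOOL):
    """Return True when the reported level is at or above the minimum."""
    # Tuples compare numerically, so 1.8.9 sorts below 1.8.15.
    return parse_ipmitool_version(output) >= minimum
```

On a workstation with ipmitool installed, the text to check could be obtained with subprocess.check_output(["ipmitool", "-V"], text=True).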
If you need to update or add ipmitool on your Linux workstation, you can compile ipmitool (current level 1.8.15) for Linux from the SourceForge source as follows:
1.1.1 Download the ipmitool tarball from http://sourceforge.net/projects/ipmitool/ to your Linux system
1.1.2 Extract the tarball on the Linux system
1.1.3 cd to top-level directory
1.1.4 ./configure
1.1.5 make
1.1.6 ipmitool will be under src/ipmitool
You may also install the ipmitool package directly from the Linux distribution packages on your workstation.
For specific fix level information on key components of IBM Power Systems LC and Linux operating systems, please refer to the documentation in the IBM Knowledge Center for the AC922 (8335-GTG):
https://www.ibm.com/support/knowledgecenter/POWER9/p9hdx/8335_gtg_landing.htm
If using xCAT on the host OS to do firmware updates, the minimum xCAT level that should be used is 2.13.4 because it has stability improvements for the firmware update process. See the xCAT 2.13.4 release notes below for more information.
https://github.com/xcat2/xcat-core/wiki/XCAT_2.13.4_Release_Notes
Downgrading firmware from any given release level to an earlier release level is not recommended.
If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.
Concurrent Firmware Updates not available for LC servers.
Concurrent system firmware update is not supported on LC servers.
Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.
For the LC server systems, the installation of system firmware is always disruptive.
The BMC and PNOR image tar files are used to update the primary side of the PNOR and the primary side of the BMC only, leaving the golden sides unchanged.
obmc-phosphor-image-witherspoon.ubi.mtd.tar
witherspoon.pnor.squashfs.tar
Filename | Size | Checksum |
obmc-phosphor-image-witherspoon.ubi.mtd.tar | 18196480 | a40751b9bcc1ac6fa52c2b219ae23704 |
witherspoon.pnor.squashfs.tar | 22609920 | c80b87242c744d64cfdb9b8e4ae64d74 |
Note: The checksum can be found by running the Linux/Unix/AIX md5sum command against each downloaded tar file (all 32 characters of the checksum are listed), for example: md5sum <filename>
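As an alternative to running md5sum by hand, the size and checksum comparison can be sketched in Python. This is an illustrative sketch only: the expected values are copied from the table above, and the function names are assumptions rather than part of any IBM tool.

```python
import hashlib
import os

# Sizes (bytes) and MD5 checksums copied from the table above.
EXPECTED = {
    "obmc-phosphor-image-witherspoon.ubi.mtd.tar":
        (18196480, "a40751b9bcc1ac6fa52c2b219ae23704"),
    "witherspoon.pnor.squashfs.tar":
        (22609920, "c80b87242c744d64cfdb9b8e4ae64d74"),
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_size, expected_md5):
    """Return True when both the file size and the MD5 digest match."""
    return (os.path.getsize(path) == expected_size
            and md5_of(path) == expected_md5.lower())
```

For a downloaded image, image_matches(path, *EXPECTED[os.path.basename(path)]) should return True before the file is used for an update.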
After a successful update to this firmware level, the PNOR components and BMC should be at the following levels.
To display the PNOR level, use the following BMC command: "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION"
To display the BMC level, use the following BMC command: "cat /etc/os-release".
Note: FRU information for the PNOR level does not show the updated levels via the fru command until the system has been booted once at the updated level.
PNOR firmware level: driver content
Display the PNOR firmware level using this command: "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION"
IBM-witherspoon-ibm-OP9_v1.19_1.154
op-build-v1.21.2-243-ga721185-dirty
buildroot-2017.11-5-g65679be
skiboot-v5.10.3-op910-1
hostboot-7329fe0
linux-4.14.24-openpower1-p70ef7b9
petitboot-v1.6.6-p0cd10f8
machine-xml-ecb6952
occ-8cb5727
hostboot-binaries-9bd4056
capp-ucode-p9-dd2-v3
sbe-706e220
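A post-update check of the PNOR level can be sketched as follows. The sketch assumes, based on the listing above, that the first non-blank line of the VERSION file carries the PNOR level; the function names are illustrative.

```python
EXPECTED_PNOR = "OP9_v1.19_1.154"  # PNOR level named in this package


def pnor_level(version_text):
    """Return the first non-blank line of the VERSION file."""
    for line in version_text.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("VERSION text is empty")


def pnor_is_at(version_text, expected=EXPECTED_PNOR):
    """Return True when the first line ends with the expected level."""
    return pnor_level(version_text).endswith(expected)
```

The text passed in would be the output of the "cat" command shown above, captured from an SSH session on the BMC.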
OpenBMC level:
Display the BMC firmware level from an SSH session on the BMC using this command: root@witherspoon:~# cat /etc/os-release
id: openbmc-phosphor
name: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)
version: ibm-v2.0
version_id: ibm-v2.0-0-r44-0-g843c2e1
pretty_name: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.0
build_id: ibm-v2.0-0-r44
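The BMC level can be extracted from the file contents with a small parser. This sketch assumes the file on the BMC uses the standard os-release KEY=value syntax (values optionally quoted), which may differ in layout from the summary listing above; the function name is illustrative.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields
```

With the file contents in a string named text, parse_os_release(text).get("BUILD_ID") would be compared against the BMC level named in this package.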
OP910.10 | |
OP9_v1.19_1.154 / OP910.20
04/18/18 | Impact: Availability Severity: SPE
This Service Pack includes updates in response to Recent Security Vulnerabilities, New Features & Functions and System Firmware Updates. Details of each are below:
Response for Recent Security Vulnerabilities
In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue number CVE-2017-5754 with firmware initializations augmenting an earlier fix provided in FW level OP910.10. Operating System updates are required in conjunction with the new FW level for addressing CVE-2017-5754.
New features and functions (not related to above CVE)
Support for voltage-droop monitors (VDM) to provide for improved system reliability during periods of unstable voltage from the power supply. The P9 processor uses an adaptive clock strategy to reduce the system power usage during power supply droop events by embedding analog VDMs that direct a digital phase-locked loop (DPLL) to immediately reduce clock frequency in response to the droop event.
Support for Workload Optimized Frequency (WOF). This feature provides the maximum processor frequency in order to increase system performance based on workload characteristics.
Support was added for using "ipmitool mc info" from the host OS to get the BMC firmware level.
Support was added to increase the number of NPU2 register contents dumped for NVLINK Hypervisor Maintenance Interrupts (HMIs) and to add logging for the HMI actions.
Support was added to make the Self Boot Engine (SBE) fault indicator bits recoverable. This means that if an SBE SEEPROM error occurs, recovery action will be taken to prevent an IPL failure or system outage.
System firmware changes that affect all systems
A problem was fixed for an On-Chip Controller (OCC) not going active caused by a race condition in the initialization of the OCCs. This problem is intermittent and can be resolved by a re-IPL of the system.
A problem was fixed for the BMC journal file getting overwritten with network change notifications when there is an IPv6 router in the local subnet.
A problem was fixed for the BMC version fields not being set as shown by "ipmitool mc info" and the Petitboot System Information UI. As an alternative, log in to the BMC over SSH (Secure Shell) and run the following command to show the BMC firmware level: "cat /etc/os-release". Look for the "VERSION=" string that has the BMC version identifier appended to it.
A problem was fixed for the display of the power supply output voltage that in one instance was showing as 390V instead of 12V. The voltage is at the right level but recent revisions of the power supply firmware had a change in how the output voltage was calculated, causing the displayed values to read too high.
A problem was fixed for the VLAN ID showing as "Disabled" with "ipmitool lan print 1" after the VLAN was set in-band by the OS. The VLAN is set correctly and functional, but the display of the VLAN information, while initially correct, went to "Disabled" during the first minute after the operation.
A problem was fixed for no amber fault LEDs being lit (or SELs reported) for front or rear fan rotors that have an RPM of zero due to blockage or other hardware error.
A problem was fixed for the host failing during a reset of the BMC when a host to BMC message had a time out. This problem is rare as the host normally stays up and running when the BMC is reset.
A problem was fixed for multi-rotor failures in a fan not causing a system shutdown, making it possible for the system to fail from an overheat condition that could be destructive to other system FRUs. This problem is rare as it requires that more than one rotor fail in a system fan at the same time.
A problem was fixed for a change or enablement of the NTP time server not forcing a network time synchronization, potentially leaving the BMC local time different from the network time. This problem can be circumvented by a reset of the BMC.
A problem was fixed for a BMC reset causing the On-Chip Controller (OCC) to fail and the system going into Safe mode. This is an infrequent problem that is triggered if a BMC reset and an OCC reset happen at the same time such that the BMC is unable to respond to OCC messages, forcing the OCC into a failed state.
A problem was fixed for error log "BC8A2502 - IPMI::RC_INVALID_SENDRECV" occurring during the system IPL. The On-Chip Controller (OCC) error is automatically recovered, so the error log does not impact the system.
A problem was fixed for error log "BC8A2507 - IPMI::RC_SENSOR_NOT_PRESENT" that can occur on a system power on if the BMC was reset at system runtime previously. When the BC8A2507 error occurs, the host uses the default value for the sensor data. The problem will persist for the IPL until the BMC is reset.
A problem was fixed for a power supply FQPSPPW0034M error persisting with enclosure fault LEDs lit even after the power supply problem has been corrected. The fault can be triggered by a momentary loss of AC or by unplugging and plugging AC into the power supply.
A problem was fixed for Coherent Accelerator Processor Proxy (CAPP) mode for the PCI Host Bridge (PHB) to improve DMA write performance by enabling channel tag streaming for the PHB. With this enabled, the DMA write does not have to wait for a response before sending a new write command on the bus.
A problem was fixed for the Open-Power Flash tool "pflash" failing with a blocklevel_smart_erase error during a pflash. This problem is infrequent and is triggered if pflash detects a smart erase fits entirely within one erase block.
A problem was fixed in the Petitboot user interface to handle cursor mode arrow keys for the VT100 'application' cursor to prevent mis-interpreting an arrow key as an escape key in some situations. For more information on the VT100 cursor keys, see http://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-21.html.
A problem was fixed in the Petitboot user interface to cancel the autoboot if the user has exited the Petitboot user interface. This prevents the user dropping to the shell and then having the machine boot on them instead of waiting until the user is ready for the boot.
A problem was fixed in the Petitboot parsing of manually-specified configuration files that caused the parser to create file paths relative to the downloaded file's path, not the original remote path.
A problem was fixed for a failure to IPL with SRC BC8A0506 logged for a Phase Lock Loop (PLL) error in the PCIe Host Bridge (PHB). This problem is very infrequent. The fix does the correct call out of the failed FRU, allowing the IPL to continue.
A problem was fixed for a system IPL hang that shows in the log as the host going to a quiesce state with the OS inactive. This is a rare problem that may be recovered by a power off and re-IPL of the system. This problem is triggered by a higher than normal level of interrupts from the Power Supply Unit (PSU).
A problem was fixed for the VPD serial number not being updated on the replacement of a planar. The VPD update failed with the following message: "ERROR: (ECMD): ecmd - 'putvpdkeyword' returned with error code 0x20300001 (ERROR OPENING DECODE FILE). ERROR: A problem occurred updating the serial number(OSYS:SS). Please see previous output for reason ".
A problem was fixed for the CAS latency calculation for memory to improve its accuracy to reduce the potential for DIMM failures due to memory timing errors. Column Access Strobe (CAS) latency is the delay time between the moment a memory controller tells the memory module to access a particular memory column on a RAM module, and the moment the data from the given array location is available on the module's output pins.
A problem was fixed for clearing DIMM guard records when there was a repair marked in the VPD and that prevented the DIMM from being unguarded. With the fix, the VPD mark will be cleared if the guard record is cleared for the FRU, allowing it to be enabled on the next IPL.
A problem was fixed for the Self Boot Engine (SBE) error identification on failure. The SRR0/SRR1/LR/Local FI2C registers are now extracted to allow the following SBE errors to be identified:
100 - Program interrupt, promoted
101 - Instruction storage interrupt, promoted
110 - Alignment interrupt, promoted
111 - Data storage interrupt, promoted
A problem was fixed for read margins to improve the margins on DIMMs, reducing the number of DIMM failure occurrences.
A problem was fixed for the Hostboot reset to enable error recovery of Hostboot through the reset path. Without the fix, the Self Boot Engine (SBE) fails to reboot on the Hostboot reset, preventing error recovery for the Hostboot failures.
A problem was fixed for a memory training error that could cause DIMMs to be marked as bad or memory ports to be deconfigured. This problem is rare and triggered by an incorrect internal voltage level.
A problem was fixed for a Phase Lock Loop (PLL) error causing a checkstop but not calling out and guarding the failed hardware. There is then a chance the failure will recur on the next IPL of the system.
A problem was fixed to reduce memory latency in memory blocks where bad memory bits have been marked.
A problem was fixed for time and date fields being zero in Hostboot error log entries for the time/date of the error occurrence.
A problem was fixed for an extraneous "MCBISTFIR[3]: broadcast out of sync" error during memory diagnostics if a Register Clock Driver (RCD) parity error occurs. The "broadcast out of sync" error should be ignored when isolating the RCD fault. This problem is triggered if the RCD parity error occurs while the DDR4 memory is in broadcast mode.
A problem was fixed for BC8A2AC5 and BC8A2AC4 errors that prevented the reading of On-Chip Controller (OCC) thermal readings from the Analog Power Subsystem Sweep (APSS) bus. This is a very rare problem.
A problem was fixed for a processor fault that caused the master processor core to guard and prevented an IPL of the system with SRC BC13E540. With the fix, the system will IPL up on the available processor cores. This error only occurs if the master core is faulted. Faults on the other cores are handled correctly and do not stop an IPL.
A problem was fixed for a flood of OPAL error messages that can occur for a processor fault. The message "CPU ATTEMPT TO RE-ENTER FIRMWARE" appears as a large group of messages that precedes the relevant error messages for the processor fault. A reboot of the system is needed to recover from this error.
OP9_v1.19_1.111 / OP910.10 | Impact: Availability Severity: SPE
This Service Pack includes updates in response to Recent Security Vulnerabilities, New Features & Functions and System Firmware Updates. Details of each are below:
Response for Recent Security Vulnerabilities
In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.
New features and functions (not related to above CVEs)
Support was added for increasing the number of BMC error logs from 100 to 200 and changing the error log to roll over old entries when full instead of stopping the logging of errors. Without this feature, the error log would get full at 100 entries and error logging would be stopped until some of the error logs were purged to make room for new entries.
Support was added to enable power supply redundancy.
Enable air-cooled fan control to optimize fan speeds for the temperature conditions and improve fan speed control to minimize fan speed oscillation.
Support was added for advanced power supply fault monitoring to improve fault isolation, error detection, and reliability.
Support was added for forcing a new dump type, Checkstop, if the host has a checkstop. Without this new dump, critical debug information is missing because the /var/lib/obmc_console.log does not
System firmware updates (not related to above CVEs)
A problem was fixed for intermittent processor core hangs that caused checkstops with code "NCU no response to snooped TLBIE".
A problem was fixed for fans being reported as "Nonfunctional". This error occurred during peak loads on the BMC that tripped a watchdog process, causing the fans to speed up to the maximum speed. An error in the fan recovery to normal speed resulted in the "Nonfunctional" status.
A problem was fixed for a processor replacement that caused extra cores to be reported as present that do not exist. This happens if the new processor has fewer cores than the processor that is being replaced. This problem can be recovered by doing a factory reset on the BMC.
A problem was fixed for GPU temperatures not being reported on systems that have maximum DIMM configurations for the memory. Without the fix, reducing the number of DIMMs plugged in would make available On-Chip Controller (OCC) slots for missing GPU temperatures to be reported.
A problem was fixed for the host time inadvertently changing when a BMC time change is requested in NTP mode with Split ownership. The problem can be recovered by IPLing to the host and the NTP server will correct the host time.
A problem was fixed for the host time skewing ahead in time after time ownership is split and the clock has been set from the BMC. The problem can be recovered by setting the correct time from the host.
A problem was fixed to reject the use of the path /org/openbmc on the REST API URIs. This affects the API /org/openbmc/sensors/host/PowerSupplyRedundancy which is no longer valid.
A problem was fixed for the BMC REST server going into a retry hang with the BMC becoming unresponsive when given a REST command with a bad data format. Without the fix, the REST server will repeatedly retry the bad command, causing a denial of service for all other users of the BMC.
A problem was fixed for an On-Chip Controller (OCC) read failure with ERRNO=11 during an IPL. This intermittent problem was caused by an overflow of the total system power value from the OCC. The system can be recovered by retrying the IPL.
A problem was fixed for the ECC error recovery. Error recovery was not working and the ECC errors would prevent the boot.
A problem was fixed for an intermittent power on failure with message "Error in mapper call to get service name". To recover from this problem, power cycle the BMC and try the boot again.
A problem was fixed for an On-Chip Controller (OCC) read failure with ERRNO=19 during a power off of the system. This intermittent problem produces an extraneous error log that can be ignored, as the power off is successful.
A problem was fixed for an intermittent error message when activating firmware during a firmware update. This extraneous error message occurred with moderate frequency. It is an internal server error (500) message returned on the REST enumerate request. The error message can be ignored because there is no problem with the firmware activation.
A problem was fixed for the power button LED not blinking when in the standby state (not powered on). Without the fix, the power button always has a solid green LED, regardless of power on or power off state.
A problem was fixed for intermittent host checkstops caused by NCU and PCI time-out mismatches. PCI timeouts that are longer than NCU timeouts may cause checkstops on the host.
OP9_v1.19_1.94 / OP910.00 | Impact: New Severity: New
New features and functions for MTM 8335-GTG: GA Level
OS levels supported by the LC 8335 servers:
- Red Hat Enterprise Linux 7.4 for IBM Power LE (POWER9)
IBM Power LC 8335 servers support Linux, which provides a UNIX-like implementation across many computer architectures. Linux supports almost all of the Power System I/O and the configurator verifies support on order. For more information about the software that is available on IBM Power Systems, see the Linux on IBM Power Systems website:
http://www.ibm.com/systems/power/software/linux/index.html
The Linux operating system is an open source, cross-platform OS. It is supported on every Power Systems server IBM sells. Linux on Power Systems is the only Linux infrastructure that offers both scale-out and scale-up choices.
A supported version of Linux on the Power LC 8335 servers is Red Hat Enterprise Linux 7.4 LE (POWER9). For additional questions about the availability of this release and supported Power servers, consult the Red Hat Hardware Catalog at
https://access.redhat.com/products/red-hat-enterprise-linux/#addl-arch
For more information about Linux on Power, see the Linux on Power developer center at https://developer.ibm.com/linuxonpower/
For information about the features and external devices that are supported by Linux, see this website:
http://www.ibm.com/systems/power/software/linux/index.html
Use one of the following commands at the Linux command prompt to determine the current Linux level:
• cat /proc/version
• uname -a
The output string from the command will provide the Linux version level.
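The same information can be read programmatically. The sketch below assumes that /proc/version begins with "Linux version" followed by the kernel release, which is the usual layout; the function names are illustrative.

```python
import platform
import re


def kernel_release_from_proc_version(text):
    """Return the kernel release that follows 'Linux version'."""
    match = re.match(r"Linux version (\S+)", text)
    if match is None:
        raise ValueError("unexpected /proc/version content")
    return match.group(1)


def running_kernel_release():
    """Return the release of the running kernel, as `uname -r` prints it."""
    return platform.release()
```

On the host, kernel_release_from_proc_version(open("/proc/version").read()) and running_kernel_release() should report the same release string.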
The opal-prd package on the Linux system collects the OPAL Processor Recovery Diagnostics messages to log file /var/log/syslog. It is recommended that this package be installed if it is not already present as it will help with maintaining the system processors by alerting the users to processor maintenance when needed.
On Red Hat Enterprise Linux, run the command "rpm -qa | grep -i opal-prd". If the opal-prd rpm is found and displayed, the package is installed on your system. This package provides a daemon to load and run the OpenPower firmware's Processor Recovery Diagnostics binary, which is responsible for run-time maintenance of Power hardware. If the package is not installed on your system, the following command can be run on Red Hat to install it:
sudo yum install opal-prd
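The package query described above can also be scripted. This is a minimal sketch that filters "rpm -qa" output the same way the grep command does; the function name is illustrative.

```python
def opal_prd_packages(rpm_qa_output):
    """Return the package lines that mention opal-prd (case-insensitive)."""
    return [line.strip() for line in rpm_qa_output.splitlines()
            if "opal-prd" in line.lower()]
```

On a Red Hat system the input could come from subprocess.run(["rpm", "-qa"], capture_output=True, text=True).stdout. An empty result suggests the package is not installed.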
To display the PNOR level, use the following BMC command: "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION"
To display the BMC level, use the following BMC command: "cat /etc/os-release".
Note: Run the "cat" commands after connecting to the BMC with ssh as root. The default password is 0penBmc (where 0 is the zero character).
Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.
The updating and upgrading of system firmware depends on several factors, such as the current firmware that is installed and which operating system is running on the system.
These scenarios and the associated installation instructions are comprehensively outlined in the firmware section of Fix Central, found at the following website:
http://www.ibm.com/support/fixcentral/
Any hardware failures should be resolved before proceeding with the firmware updates to help ensure the system will not be running degraded after the updates.
The process of updating firmware on the OpenBMC managed servers is documented below.
The sequence of events that must happen is the following:
• Power off the Host
• Update and Activate BMC
• Update and Activate PNOR
• Reboot the BMC (applies new BMC image)
• Power on the Host (applies new PNOR image)
The OpenBMC firmware updates (BMC and PNOR) for the LC 8335 servers can be managed via the command line with the openbmctool.
The openbmctool is obtained using the IBM Support Portal.
1. Go to the IBM Support Portal.
2. In the search field, enter your machine type and model. Then click the correct product support entry for your system.
3. From the Downloads list, click the openbmctool for your machine type and model.
4. Follow the instructions to install and run the openbmctool. You will need to provide the file locations of the BMC firmware image tar and PNOR firmware image tar that must be downloaded from Fix Central for the update level needed.
Information on the openbmctool and the firmware update process can be found in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER9/p9ei8/p9ei8_update_firmware_openbmctool.htm
The service processor, or baseboard management controller (BMC), provides a hypervisor and operating system-independent layer that uses the robust error detection and self-healing functions that are built into the POWER processor and memory buffer modules. The Open Power Abstraction Layer (OPAL) is the system firmware in the stack of POWER processor-based Linux-only servers.
The service processor, or baseboard management controller (BMC), is the primary control for autonomous sensor monitoring and event logging features on the LC server.
The BMC supports the Intelligent Platform Management Interface (IPMI) for system monitoring and management. The BMC monitors the operation of the firmware during the boot process and also monitors the OPAL hypervisor for termination.
The Open Power Abstraction Layer (OPAL) provides hardware abstraction and run time services to the running host Operating System.
For the 8335 servers, only the OPAL bare-metal installs can be used.
Find out more about OPAL skiboot here:
https://github.com/open-power/skiboot
The Intelligent Platform Management Interface (IPMI) is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. The LC 8335 servers provide one 10M/100M baseT IPMI port.
The ipmitool is a utility for managing and configuring devices that support IPMI. It provides a simple command-line interface to the service processor. You can install the ipmitool from the Linux distribution packages in your workstation, sourceforge.net, or another server (preferably on the same network as the installed server).
For installing ipmitool from sourceforge, please see section 1.1 "Minimum ipmitool Code Level".
For more information about ipmitool, there are several good references for ipmitool commands:
The man page
The built-in command line help provides a list of IPMItool commands:
# ipmitool help
You can also get help for many specific IPMItool commands by adding the word help after the command:
# ipmitool channel help
For a list of common ipmitool commands and help on each, you may use the following link:
www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabpcommonipmi.htm
To connect to your host system with IPMI, you need to know the IP address of the server and have a valid password. To power on the server with the ipmitool, follow these steps:
1. Open a terminal program.
2. Power on your server with the ipmitool:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password power on
3. Activate your IPMI console:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password sol activate
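When these commands are driven from a script, building the argument list in one place keeps the connection options consistent. The following is a minimal sketch that reuses only the options shown in the steps above; the function names are illustrative, and the address and password are placeholders.

```python
import subprocess


def ipmi_argv(bmc_ip_address, ipmi_password, *command):
    """Build the ipmitool argument list used in the steps above."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip_address,
            "-P", ipmi_password, *command]


def run_ipmi(bmc_ip_address, ipmi_password, *command):
    """Run ipmitool and raise CalledProcessError on a non-zero exit."""
    argv = ipmi_argv(bmc_ip_address, ipmi_password, *command)
    return subprocess.run(argv, check=True)
```

For example, run_ipmi(address, password, "power", "on") corresponds to step 2 and run_ipmi(address, password, "sol", "activate") to step 3. Passing the arguments as a list avoids shell quoting problems with passwords that contain special characters.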
Petitboot is a kexec-based bootloader used by IBM POWER9 systems for doing the bare-metal installs on the 8335 servers.
After the POWER9 system powers on, the petitboot bootloader scans local boot devices and network interfaces to find boot options that are available to the system. Petitboot returns a list of boot options that are available to the system. If you are using a static IP or if you did not provide boot arguments in your network boot server, you must provide the details to petitboot. You can configure petitboot to find your boot with the following instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootadvanced.htm
You can edit petitboot configuration options, change the amount of time before Petitboot automatically boots, etc. with these instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootconfig.htm
After you select to boot the ISO media for the Linux distribution of your choice, the installer wizard for that Linux distribution walks you through the steps to set up disk options, your root password, time zones, and so on.
You can read more about the petitboot bootloader program here:
https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html
This guide helps you install Linux on a Power Systems server.
Overview
Use the information found in http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwkickoff.htm to install Linux on a non-virtualized (bare metal) IBM Power LC server.
Date | Description |
04/18/2018 | Updated for OP910.20 |
03/22/2018 | Corrections for OP910.10 |
01/18/2018 | Updated for AC922 only for OP910.10 |
12/22/2017 | New for LC server OP910.00 release |