Sx Issue- Debug/Triage BKM



Sx Issue Triage and Debug Steps
Version 0.2
12/05/2014

Table of Contents
1. Sx Issue- Debug/Triage BKM
1.1 Scope
1.2 Target audience
1.2.1 Details
1.3 Requirements
1.3.1 Hardware requirements
1.3.2 Software requirements
1.4 Description
1.5 System reset / shutdown unexpectedly during cycling
1.6 Requirements
1.6.1 Hardware requirement
1.6.2 Software requirements
1.7 Description
1.7.1 Examples on collecting PMC log is shown below
1.8 Collect PMC log using RW tool – as explained below
1.8.1 Requirements
1.8.2 Description
1.9 System hangs in OS Phase
1.10 Requirements
1.10.1 Hardware requirement
1.10.2 Software requirements
1.11 Description
1.12 BCDEDIT
1.13 How to analyze crash dump
1.14 OS Crash (BSOD) during cycling
1.15 System hangs in BIOS Phase (POST code hangs) during cycling
1.16 Requirements
1.16.1 Hardware requirement
1.16.2 Software requirements
1.17 Description
1.18 The hardware connections snap
1.19 Teraterm Configuration
1.20 The cause of Sx failure viz., ME
1.21 Requirements
1.21.1 Hardware requirement
1.21.2 Software requirements
1.22 Description
1.23 Point of Contact

Revision Sheet
Release No.    Date          Revision Description
Rev. 0.2       12/05/2014    Revised Document

1. Sx Issue- Debug/Triage BKM

1.1 Scope
The purpose of this document is to explain the basic triage and debug that must be performed on an Sx failure before filing a sighting. This is first-level triage and debug, so it may not cover a comprehensive list of issue scenarios.

1.2 Target audience
Sx validation and debug team

1.2.1 Details
First-level debug steps for the different Sx issues that we come across are explained below in detail.

System hangs with CATERR during cycling
Collect the MCDump using ITP. Commands to collect the MCDump:
    itp.unlock()
    import sys
    sys.path.append(r"\\hsw-tb\hsw\itp_scripts")
    import BdwMCDump
[Note: this is for BDW CPU]
Collect the AFD dump as explained below.

1.3 Requirements
1.3.1 Hardware requirements
ITP box
5V/2.5A adaptor
1.3.2 Software requirements
Platform debug tool kit
DFx Abstraction Layer Python console

1.4 Description
Step1: Connect the ITP to the board.
Step2: Launch the configuration console and select the appropriate target.
Step3: Start the Intel DAL Python Console.
Step4: Make sure the ITP connection is established with the target by typing 'itp.devicelist'.
Step5: Unlock the ITP with the 'itp.unlock' command.
Step6: Open the Platform debug kit and navigate to State Freeze and dump under View.
Step7: Select Hang Base as the dump type and click Run.
Note: This triggers AFD collection; the progress can be seen in the Message log window.
Step8: The dump file is saved in the output folder specified in the output tab. Once the AFD has been generated, the 'Run' tab changes to 'Stop'.

1.5 System reset / shutdown unexpectedly during cycling
Collect the PMC log file using Stardebug. Refer to this link for the BKM.

1.6 Requirements
1.6.1 Hardware requirement
UTAG
1.6.2 Software requirements
Stardebug application

1.7 Description
Below are the steps to establish a connection between Stardebug and the PCH.
Step1: Connect Stardebug to the rear XDP port.
Step2: Download the latest version of Stardebug from the below link:
Step3: Locate Stardebug.exe in the extracted folder.
Step4: On launching Stardebug.exe, the highlighted debug blocks should be displayed. This indicates that the connection between Stardebug and the PCH is established.

1.7.1 Examples on collecting PMC log is shown below
Once the PCH is connected to Stardebug (Step4 above), log collection can proceed.
Before initiating any log collection, make sure the script file that creates the log is placed in the same folder as stardebug.exe.
Locate the 'dft' tab by typing the 'sw dft' command.
To start log file collection, enter 'run LptPmDump1.5M.lua'.
A text file, Pmdump.text, is created in the same folder; this is the PMC log file.

1.8 Collect PMC log using RW tool – as explained below
1.8.1 Requirements
1.8.1.1 Software requirements
RWEverything

1.8.2 Description
Step1: Launch RWEverything and click on the Memory icon.
Step2: Write 0x03030002 to 0xFED1F320.
Step3: Read a DWORD from 0xFED1F338.
Step4: Decode the PMC value using the table below.

Reg 0x303 (bit 0 to bit 7):
BIT 7: LTRESET# With Policy 1 (LTRST_POL1): This bit is set to '1' by hardware when a global reset is triggered by an LTRESET# assertion with LT_E2STS.LT_RESET_POLICY = 1.
BIT 6: ME-Initiated Global Reset (ME_GBL): This bit is set to '1' by hardware when a global reset is triggered by an ME FW write of 1's to both GENCTL."ME Partition Reset" and GENCTL."ME-Initiated Host Reset with Power Cycle" in the same write cycle (this is ME FW's method of requesting a global reset).
BIT 5: CPU Thermal Trip (CPU_TRIP): This bit is set to '1' by hardware when a global reset is triggered by a CPU thermal trip event (i.e. an assertion of the THRMTRIP# pin).
BIT 4: ME-Initiated Power Button Override (ME_PBO): This bit is set to '1' by hardware when a global reset is triggered by an ME FW write of '1' to GENCTL."ME-Initiated Power Button Override".
BIT 3: ICH Catastrophic Temperature Event (ICH_CAT_TMP): This bit is set to '1' by hardware when a global reset is triggered by a catastrophic temperature event from the ICH internal thermal sensor.
BIT 2: PMC SUS RAM Uncorrectable Error (PMC_UNC_ERR): This bit is set to '1' by hardware when a global reset is triggered due to an uncorrectable parity error on a data read from one of the PMC SUS well register files.
BIT 1: Power Button Override (PB_OVR): This bit is set to '1' by hardware when a global reset is triggered by a power button override (i.e. an assertion of the PWRBTN# pin for 5 seconds).
BIT 0: SUS Well Power Failure Status (SUSFLR_STS): This bit is set to '1' by hardware when a global reset is triggered by loss of SUS well power. This includes DeepSx entry and G3.

Reg 0x304 (bit 8 to bit 15):
BIT 5: AS Well Power Failure (ASW_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of ASW power (i.e. a de-assertion of APWROK at an unexpected time).
BIT 4: SYS_PWROK Failure (SYSPWR_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of SYS_PWROK. FW arms this global reset source via GBLRST_CTL.EN_SYSPWR_FLR.
BIT 3: PCH_PWROK Failure (PCHPWR_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of PCH_PWROK. FW arms this global reset source via GBLRST_CTL.EN_PCHPWR_FLR.
BIT 2: PMC Firmware Global Reset (PMC_FW): This bit is set to '1' by hardware when a global reset is triggered by a request from PMC firmware (i.e. a write of '1' to the GBLRST_CTL.TRIG_GBL bit).
BIT 1: ME Firmware Watchdog Timer (ME_WDT): This bit is set to '1' by hardware when a global reset is triggered by the second expiration of the ME firmware watchdog timer.
BIT 0: PMC Firmware Watchdog Timer (PMC_WDT): This bit is set to '1' by hardware when a global reset is triggered by the second expiration of the PMC firmware watchdog timer.

Reg 0x305 (bit 16 to bit 23):
BIT 4: Over-Clocking WDT Expiration In ICC Survivability Mode (OC_WDT_EXP_ICCSURV): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the over-clocking watchdog timer while running in a mode that has ICC survivability impact (OC_WDT_ICCSURV=1).
BIT 3: Over-Clocking WDT Expiration In Non-ICC Survivability Mode (OC_WDT_EXP_NO_ICCSURV): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the over-clocking watchdog timer while running in a mode that does not have ICC survivability impact (OC_WDT_ICCSURV=0).
BIT 2: ADR GPIO Reset (ADR_GPIO_RST): This bit is set to '1' by hardware when a global reset is triggered by the assertion of the GPIO assigned to ADR.
BIT 1: ME HW Uncorrectable Error (ME_UNCOR_ERR): This bit is set to '1' by hardware when a global reset is triggered by ME hardware due to the detection of an uncorrectable ECC or parity error on a data read from one of its SRAMs.
BIT 0: CPU Thermal Runaway Watchdog Timer (CPU_THRM_WDT): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the CPU Thermal Runaway Watchdog Timer.

Collect the Windows Event Viewer log file. BKM: Run EventVwr and collect the Windows System log.

1.9 System hangs in OS Phase (system control transferred from BIOS to OS) during cycling
Examples: blank screen, Windows display freeze.
Connect WinDbg and analyze the current status of the system. Refer below for the BKM.

1.10 Requirements
1.10.1 Hardware requirement
Ajay's USB debug cable
1.10.2 Software requirements
Windbg setup (x64 preferable)

1.11 Description
Step1: Install the USB-to-USB converter driver on both the host and target machines. (Driver copied here: \\akasha1\PSPV-Tools\windbg-driver). The WinBlue OS has an inbox driver for the cable and installs it automatically.
Step2: Using the USBVIEW tool, find USB port1 on the target machine and connect the debug cable to port1 (usually the debug port is port1).
Step3: Change the BIOS settings as described below; press F2 while booting to enter BIOS setup.
Step4: Go to Intel Advanced Menu -> PCH-IO Configuration -> USB Configuration and set XHCI Mode to Manual.
Step5: Under "Route USB 2.0 pins to which HC?", select Route Per-Pin and set all the pins to XHCI except pin#1 and pin#11. Pin#1 and pin#11 should remain routed to EHCI.
Step6: After saving the BIOS settings, boot into the OS.

1.12 BCDEDIT
From an elevated command prompt, run the commands below:
    bcdedit /debug on
    bcdedit /dbgsettings usb targetname:<name>     (type any name)
    bcdedit /set {dbgsettings} busparams 0.29.0    (bus, device and function of the USB root controller)
Restart the target system. Open WinDbg on the host machine and enter the target name under the USB tab (File -> Kernel Debugging -> USB).
The target will now start sending debug messages to the kernel debug window.

1.13 How to analyze crash dump
Step1: Navigate to File > Open Crash Dump, then select the crash dump to be analyzed.
Step2: In the command bar, type '!analyze -v'; this is the command to analyze the crash dump.

1.14 OS Crash (BSOD) during cycling
Collect the dump file and analyze it. If no dump was created, connect WinDbg and take a dump as explained above.
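
Returning to the reset-cause decode in Step4 of section 1.8.2: the lookup can be scripted instead of done by hand. The following is a minimal sketch, not an official tool. The bit names come from the decode table above; the names RESET_CAUSE_BITS and decode_reset_cause are illustrative, and the sketch assumes that the DWORD read from 0xFED1F338 carries Reg 0x303 in bits 0-7, Reg 0x304 in bits 8-15 and Reg 0x305 in bits 16-23, with each "BIT n" numbered relative to its own register, as the table headings suggest.

```python
# Bit names per register, listed from BIT 0 upward (taken from the decode table).
# Assumption: Reg 0x303 / 0x304 / 0x305 map to DWORD bytes 0 / 1 / 2 respectively.
RESET_CAUSE_BITS = {
    0x303: ["SUSFLR_STS", "PB_OVR", "PMC_UNC_ERR", "ICH_CAT_TMP",
            "ME_PBO", "CPU_TRIP", "ME_GBL", "LTRST_POL1"],
    0x304: ["PMC_WDT", "ME_WDT", "PMC_FW", "PCHPWR_FLR",
            "SYSPWR_FLR", "ASW_FLR"],
    0x305: ["CPU_THRM_WDT", "ME_UNCOR_ERR", "ADR_GPIO_RST",
            "OC_WDT_EXP_NO_ICCSURV", "OC_WDT_EXP_ICCSURV"],
}


def decode_reset_cause(dword):
    """Return the names of all reset-cause bits set in the DWORD from Step3."""
    causes = []
    for byte_index, reg in enumerate(sorted(RESET_CAUSE_BITS)):
        # Extract the byte that corresponds to this register.
        reg_value = (dword >> (8 * byte_index)) & 0xFF
        for bit, name in enumerate(RESET_CAUSE_BITS[reg]):
            if reg_value & (1 << bit):
                causes.append(name)
    return causes


print(decode_reset_cause(0x00000022))  # prints ['PB_OVR', 'CPU_TRIP']
```

An empty list means none of the documented cause bits were set in the value that was read.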
1.15 System hangs in BIOS Phase (POST code hangs) during cycling
Collect the BIOS serial log. Refer below for the BKM.

1.16 Requirements
1.16.1 Hardware requirement
RS232 Null-Modem cable
RVP with debug BIOS flashed
1.16.2 Software requirements
Any UART terminal utility such as Putty or Teraterm

1.17 Description
Step1: Flash the debug BIOS, which is downloadable from client download [ex: HSW_LP_LPT_V106.3_Debug.rom].
Step2: Enter BIOS setup using F2 -> Intel Advance Menu -> Debug Configuration -> Serial Debug messages, and set the value as per your requirement.
Step3: Connect the Null-Modem cable to the host and the RVP (a USB-to-serial adapter may be needed to connect to the host).
Step4: Install a terminal program (Putty, Teraterm, or Termite) on the host with the following settings:
    Port: (look in Device Manager/Ports)
    Baud rate: 115200
    Data: 8 bit
    Parity: none
    Stop: 1 bit
    Flow control: none
Step5: Open Putty -> select Serial -> select the COM port as shown in the client Device Manager.
Step6: Boot the system (the system will start sending debug messages to Putty).
Step7: Stop the log file after the system boots to the OS.
The BIOS serial log will look like: BIOS_Serial dump.txt

1.18 The hardware connections snap

1.19 Teraterm Configuration

1.20 The cause of Sx failure viz., ME
Please collect the ME debug log. Refer to the BKM below.

1.21 Requirements
1.21.1 Hardware requirement
Dediprog hardware
1.21.2 Software requirements
Dediprog flash utility
FITC tool

1.22 Description
Step1: Take the SPI BIOS file.
Step2: Install the FITC tool.
Step3: Browse for the full BIOS image (16MB) file, modify it with the settings below, build a new image and flash it on your target system.
Step4: Make sure it is not a LAN-less image.
Step5: In FITC, under ME Region -> Configuration -> ME Debug Event Service, set as shown below:
Step6: Make sure that in Event Filters, group 87 has the value 0x1 (you can leave other groups as-is).
Step7: To record it, connect another computer to the same LAN as the DUT (note: the DUT must be connected using the built-in LAN, not any external PCIe card). On that other computer, run PDA (Platform Debug Analyzer) or WireShark to record all the packets sent. You should see quite a lot of packets (hundreds or more) sent on UDP port 64507.
Step8: Reproduce your issue on the target.
Step9: Go to the location \\akasha1\temp\nramalin\Tools and install PDA.
Step10: Connect a LAN cable from the target to the host machine and ensure ping is successful.
Step11: Launch the PDA app and start capturing the log.

Check points before proceeding with a sighting:
Make sure to use the latest BKC stack from here:
Make sure the system has all mandatory rework as applicable. Use this link to know the applicable rework-{2DC03B06-DDA9-4EAA-ABA6-CA8A28FBF446}
Make sure to use the recommended BIOS settings. Refer to this link to get the recommended BIOS settings
Check whether a similar issue is already reported in Sx_WG. Refer to this link to get the known issue list - {6D9F3299-A5E6-4D20-B7B3-1FD542C2D19F}&InitialTabId=Ribbon.List&VisibilityContext=WSSTabPersistence

1.23 Point of Contact
Please mail natarajan.ramalingam@ for feedback/query.
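
As a quick sanity check for Step7 of section 1.22, the recording computer can confirm that datagrams are arriving on UDP port 64507 before a full PDA or WireShark capture is started. The following is a minimal sketch using only the Python standard library; it does not replace PDA. The function names open_listener and count_datagrams are illustrative, and the sketch assumes the ME debug packets are broadcast or addressed to the recording computer. If they are addressed to another host, an ordinary socket will not see them and a packet capture tool is still required.

```python
import socket
import time

ME_DEBUG_UDP_PORT = 64507  # UDP port quoted in Step7


def open_listener(port=ME_DEBUG_UDP_PORT, host=""):
    """Bind a UDP socket on the given port (host "" means all interfaces)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def count_datagrams(sock, duration_s):
    """Count the datagrams received on sock within duration_s seconds."""
    deadline = time.monotonic() + duration_s
    count = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            sock.recvfrom(65535)  # payload is discarded; only arrival is counted
        except socket.timeout:
            break
        count += 1
    return count
```

For example, count_datagrams(open_listener(), 10) returns the number of datagrams seen on port 64507 during ten seconds; a count of zero suggests checking the FITC settings from Step5 and Step6 or the LAN connection.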