SQL Server According to Bob – Two guys who work on SQL …



Using SQL Server’s SNITrace to Troubleshoot Networking IssuesIn the process of tracking down a few TCP 10054 issues (highlighted here: ) I also used the SNITrace (SNI Trace) capabilities.SNITrace, formally BIDTrace, is an deeper level of tracing for the Microsoft SQL Server TDS consumers and providers. Terms and Brief HistoryBefore I dive into the details here is a brief history and related terminology.BIDBuilt-In Diagnostics (BID) is the former name replaced with SNI. The original tracing was added as BIDTrace and since then many of the functions and macros were renamed to SNITrace* as improvements were made and features added.SNIServer Network Interface (SNI) is centralized code, shared among the SQL Server and SQL Server client providers, both Windows and Linux that can send, parse and act upon TDS communications.ODSOpen Data Services (ODS) was a separate, public API set allowing a TDS gateway to be created. The API was deprecated several releases ago and the behaviors consumed internally by SQL Server. ODS provides an evening model for processing TDS requests as parsed by the SNI layer.ODBC TraceTo capture an ODBC trace use the driver manager, trace capabilities () Launch odbcad32.exeSelect the tracing tab and then start tracingExecute the reproductionStop tracing and view the log fileExecution: sqlcmd -E -S.\sql17ODBC trace records the ENTER and EXIT activities for the ODBC APIs as well as the various parameters, values and messages. 740c-592cENTER SQLDriverConnectW HDBC 0x0000019DE23D8CC0HWND 0x0000000000000000WCHAR * 0x00007FFF581D21C0 [ -3] "******\ 0"SWORD -3 WCHAR * 0x00007FFF581D21C0 SWORD -3 SWORD * 0x0000000000000000UWORD 0 <SQL_DRIVER_NOPROMPT> 740c-592cEXIT SQLDriverConnectW with return code 1 (SQL_SUCCESS_WITH_INFO)HDBC 0x0000019DE23D8CC0HWND 0x0000000000000000WCHAR * 0x00007FFF581D21C0 [ -3] "******\ 0"SWORD -3 WCHAR * 0x00007FFF581D21C0 <Invalid buffer length!> [-3]SWORD -3 SWORD * 0x0000000000000000UWORD 0 <SQL_DRIVER_NOPROMPT>DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed database context to 'master'. (5701) DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed language setting to us_english. (5703)Going Beyond ODBC TraceFor my specific issue I was attempting to debug why the SQLDriverConnectW errored with TCP 10054 because it was failing the pre-login handshake. For that I first needed to understand and capture the flow of the handshake activities and this is where SNITrace is helpful.10054 The Direction MattersUnderstanding where the RST (TCP closure) occurs if helpful in understanding the problem and how to troubleshoot it.Each stage in the login process uses asynchronous read and write timeouts. For example, when the SQL Server sends the initial Pre-Login Response the SQL Server posts a read request using a 5 second timeout. If the first Ssl/Tls context packet takes longer than 5 seconds to arrive, from the client, SQL Server closes the TCP connection (RST occurs from the SQL Server) and the next write attempt from the SQL Client fails with a 10054 error, the connection has been reset. The Ssl/Tls reads use similar timeout mechanisms and the SQL client provider honors the ‘login timeout.’ For example, if the login packet is sent by the client but the SQL Server does not respond before the established login timeout (30 second default) the client closes the TCP connection (RST from the SQL Client provider) and the server records the 10054, connection reset error and terminates the login attempt.Before diving into SNITrace let’s breakdown the network trace of a successful and failed login.Good Login Sequence (SYN, ACK, … FIN)10054 sequence from SQL Server | Start login at 48.8 and server terminates at 52.9 (SYN, ACK … RST)SQL Server exchanges multiple packets to complete a login request. The entire Tabular Data Stream protocol (TDS) is documented here: A login request performs the following steps.Open the TCP connection (usually to port 1433 on the SQL Server)Send pre-login request from clientReceive pre-login response from serverExchange Ssl/Tls information (InitializeSecurityContext/AcceptSecurityContext) by using TDS to exchange the security context information (1 to 4 packets)Send login information (encrypted)Login AnnotatedThe following table contains a high-level annotation of the SQL login and a select @@VERSION from sqlcmd.No.SourceDestinationProtocolLengthInfoClient sends TCP open request to SQL Server (SYN) computer9489x.x.x.1x.x.x.100TCP6656369 > 1433 [SYN] Seq=0 Win=65280 Len=0 MSS=1360 WS=256 SACK_PERM=1TCP acknowledges request (If port was incorrect this is where server sends RST 10054)9490x.x.x.100x.x.x.1TCP661433 > 56369 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1396 WS=256 SACK_PERM=19491x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=1 Ack=1 Win=262400 Len=0SQL Client sends TDS pre-login9492x.x.x.1x.x.x.100TDS142TDS7 pre-login messageSQL Server responds to pre-login request9493x.x.x.100x.x.x.1TDS102Response9494x.x.x.1x.x.x.100TDS250TDS7 pre-login message9495x.x.x.100x.x.x.1TCP14141433 > 56369 [ACK] Seq=49 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]9496x.x.x.100x.x.x.1TCP14141433 > 56369 [ACK] Seq=1409 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]9497x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=285 Ack=2769 Win=262400 Len=09498x.x.x.100x.x.x.1TCP14141433 > 56369 [ACK] Seq=2769 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]Ssl/Tls negotiation packet exchange (SQL Client InitializeSecurityContext -> SQL Server AcceptSecurityContext)9499x.x.x.100x.x.x.1TDS70TDS7 pre-login message (Not last buffer)9500x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=285 Ack=4145 Win=262400 Len=0Note: The SQL Server can send a RST for 5 second timeouts in this exchange window.9501x.x.x.100x.x.x.1TDS428TDS7 pre-login message9502x.x.x.1x.x.x.100TDS188TDS7 pre-login message9503x.x.x.100x.x.x.1TDS113TDS7 pre-login messageEnd of Ssl/Tls negotiation process9504x.x.x.1x.x.x.100TDS349TLS exchange9505x.x.x.100x.x.x.1TCP541433 > 56369 [ACK] Seq=4578 Ack=714 Win=261632 Len=0SQL Client sends Login information (SendLogin)9506x.x.x.100x.x.x.1TDS800TLS exchange9507x.x.x.1x.x.x.100TDS163TLS exchangeNote: The client can send a RST to honor the login timeout during this window9509x.x.x.100x.x.x.1TDS125TLS exchange9510x.x.x.1x.x.x.100TDS147TLS exchange9511x.x.x.100x.x.x.1TDS120TLS exchangeSQL Server finishes acknowledgement of successful login after sending back config info (language, …)9512x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=916 Ack=5461 Win=261120 Len=0Idle keep alive, client not doing anything (sqlcmd at prompt)9627x.x.x.1x.x.x.100TCP55[TCP Keep-Alive] 56369 > 1433 [ACK] Seq=915 Ack=5461 Win=261120 Len=19628x.x.x.100x.x.x.1TCP66[TCP Keep-Alive ACK] 1433 > 56369 [ACK] Seq=5461 Ack=916 Win=261632 Len=0 SLE=915 SRE=9169629x.x.x.100x.x.x.1TCP55[TCP Keep-Alive] 1433 > 56369 [ACK] Seq=5460 Ack=916 Win=261632 Len=19630x.x.x.1x.x.x.100TCP66[TCP Keep-Alive ACK] 56369 > 1433 [ACK] Seq=916 Ack=5461 Win=261120 Len=0 SLE=5460 SRE=5461SQL Client sends query (select @@VERSION)9741x.x.x.1x.x.x.100TDS149TLS exchangeSQL Server responds with version information9743x.x.x.100x.x.x.1TDS337TLS exchange9744x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=1011 Ack=5744 Win=262400 Len=0SQL Client sends disconnect request and closes TCP connection9772x.x.x.1x.x.x.100TCP5456369 > 1433 [FIN, ACK] Seq=1011 Ack=5744 Win=262400 Len=09773x.x.x.100x.x.x.1TCP541433 > 56369 [ACK] Seq=5744 Ack=1012 Win=261376 Len=09774x.x.x.100x.x.x.1TCP541433 > 56369 [FIN, ACK] Seq=5744 Ack=1012 Win=261376 Len=0Server acknowledges TCP closure9775x.x.x.1x.x.x.100TCP5456369 > 1433 [ACK] Seq=1012 Ack=5745 Win=262400 Len=0SNI TracingUnderstating where the TCP Close (RST) originates provides you with diagnostic guidance. A general rule for login is if the SQL Server sends the RST it is probably a client issue and if the SQL client provider sends the RST is it probably a SQL Server issue. Scenario (SQL Server Side): Client sent RST due to login timeoutStart by tracing the SQL error_reported and/or connectivity_ring_buffer events on the SQL Server. The events and 17803 error highlight the various login steps occurring and provides timing information to help you understand the failure condition. My previous post highlights the login timer and 17803 event in detail: Assume the information reveals the login attempt is taking longer than expected and we want to know more about internal activities. We can accomplish that using the SQL Server side SNI Tracing, exposed with a dynamic XEvent session. Note: The BID/SNI Trace for the SQL Server (SQL 2008’ish days) used to require ETW setup and a restart of SQL Server. Newer versions of SQL Server do not require ETW or restart of SQL Server, instead the SNI trace events are produced to the XEvent session dynamically. You can start and stop the session, apply filters and apply predicates dynamically.The name of the XEvent is not obvious (I am working with the XEvent team to correct this.) The event is named ‘trace’ and here is a script to create an SQL SNI tracing session.CREATE EVENT SESSION [SNIRdorr] ON SERVER ADD EVENT sqlsni.trace( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id))WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)I started the XEvent trace on my SQL Server and established a connection using sqlcmd. Looking at task manager the process id for sqlcmd was (PID = 5188.) I then stopped the XEvent capture and applied the XEvent filter to the client_pid column in the SQL Server Management Studio. The SNI Trace to XEvent logs internal SNI function calls and can be combined with the client SNI Trace and network trace making the scenario easier to understand. ENTER: Highlighted in the screen capture below is the SNIPacketAllocateEx2, a memory allocation. In fact, the SQL Server allocates 2 input packets and 1 output packet for each connection, keeping free packets in an object pool, partitioned by packet size. INFO: Looking closer the highlighted event is the start of the allocation attempt and the next event indicates the packet was allocated from the SNI object pool. You can use a DMV query to see the SNI pools for SNI Packets (select * from sys.dm_os_memory_pools where type = 'OBJECTSTORE_SNI_PACKET'). Each pool accommodating a different packet size range. You can also use sqlcmd’s -a parameter and the free_entries_count column of the DMV changes for different packet size and pool usage scenarios.LEAVE: The 3rd SNIPacketAllocateEx2 is the return or exit from SNIPacketAllocateEx2For the sake of this example assume that the elapsed duration from the ENTER to the LEAVE event took more than 5 seconds. From that you might conclude it is taking a long time to allocate a packet and troubleshoot the memory related issue. There are other SNI events besides the ‘trace’ event. The following query outlines the events contained in the sqlsni package which can provide you with additional debugging capabilities.select o.name, o.description from sys.dm_xe_objects oinner join sys.dm_xe_packages p on p.guid = o.package_guidand o.object_type = 'event'and p.name = 'sqlsni'Using the following XEvent session, reveals the packet allocations occur as part of SNISetInfo activity.CREATE EVENT SESSION [SNIRdorr] ON SERVER ADD EVENT sqlsni.trace( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id)),ADD EVENT sqlsni.enter( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id)),ADD EVENT sqlsni.leave( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id))WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)Scenario (SQL client side): Server sent RST due to pre-login read timeoutThe SQL client is not an XEvent session, instead SNITrace is implemented using ETW tracing for the client providers (SQL ODBC, SQL OLEDB and SQLClient.) For the SQL Server client providers such as sqlncli11.dll, msodbcsql17.dll, msoledbsql.dll, etc. ETW registration and enablement is required.ETW tracing uses a major guid and event guids to capture and store the data. The major guid represents the component (such as a major release of a SQL Server) and each event has a guid with an associated format string defined. ETW uses the guids in much the same way as the Windows event log uses the message files (.mc) format strings (resource dlls.) For optimized tracing formatting should always be done offline and not while producing the event. The offline formatting approach saves storage space, reduces CPU and the impact on performance. When an event is produced the ETW consumer checks to see if that even has been enabled for tracing, if so serializes the binary data values. The ETW reporting features are used to process the capture and format into ‘human’ readable format.During the build process the SQL client providers produce the major guid along with the format string and event guids. These guids are major version aligned. Unless you have a private symbol file (.pdb) that contains the ETW information or a ETW output file (Trace Message Format File .tmf) you do not know what register with ETW or use to enable the tracing. The good news is the information is discoverable.There are some older articles that discuss SNI client tracing and provide some of the older scripts and .MOF registration files. (I have filed an issue to have the doc team update these.) You will still find these helpful.(v=sql.100)?redirectedfrom=MSDN (v=msdn.10)?redirectedfrom=MSDNWhat Providers Are InstalledThe SNI client tracing is controlled by logman.exe command. Start by determining what is registered on your system.logman query providerslogman query providers | findstr /I sqlOn my system I have the Microsoft ODBC Driver for SQL Server registered (msodbcsql13.dll.)Finding the Trace GUID to EnableWhen the ODBC driver loads the SQL client provider the provider registers with ETW (EtwRegister.) At this time the provider determines if/what events should be produced and therefore you usually have to restart the application to enable the trace activities.The MSDADIAG.ETW trace is how you discover the guid of the SQL client providers.{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752} 0xFFFFFFFF 0 MSDADIAG.ETWMake sure MSDADIAG.ETW is a registered provider: logman query providers | findstr /I MSDADIAG.ETW If not present use the registration information from the older package links above to execute the mofcomp for the MSDADIAG.ETW trace provider.Once MSDADIAG.ETW is registered, enable MSDADIAG.ETW tracing. (Hint: The older links contain the _registerSchema.cmd, _startTrace.cmd, _stopTrace.cmd and _report.cmd to assist you.)logman start MyTrace -pf traces.txt -ct perf -o Out.etl -etsTraces.txt should contain the following content{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752} 0xFFFFFFFF 0 MSDADIAG.ETWThe 0xFFFFFFFF enables all event groups tracing for the MSDADIAG.ETW tracing provider. Run your application and then stop the tracing.logman stop MyTrace -etsNow you can convert the out.etl to a CSV file format.traceRPT /y Out.etl /of CSVThe output shows some of the common files and associated guids.For this example, I am looking for msodbcsql13.dll and there you have it. The MSDADIAG.ETW shows the initial registration attempt from the Microsoft SQL Server ODBC provider with the major GUID you need to enable tracing of the provider.Registering the SQL client SNI Trace ProviderNow you create a .MOF file for the msodbcsql provider registration. You can take one of the older .MOF files from the links above, modify the description and guid to look like the following, and then use mofcomp to complete registration.Register: mofcomp msodbcsql.mof#pragma autorecover#pragma classflags("createonly")#pragma namespace ("\\\\.\\Root\\WMI")#pragma deleteclass("Bid2Etw_MSODBCSQL_1", NOFAIL)///////////////////////////////////////////////////////////////////////////////// Microsoft ODBC Driver for SQL Server[ dynamic: ToInstance, Description("Microsoft ODBC Driver for SQL Server"), Guid("{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C}"), locale("MS\\0x409")]class Bid2Etw_MSODBCSQL_1 : EventTrace{};[ dynamic: ToInstance, Description("Microsoft ODBC Driver for SQL Server"), Guid("{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C}"), locale("MS\\0x409")]class Bid2Etw_MSODBCSQL_1_Trace : Bid2Etw_MSODBCSQL_1{};[ dynamic: ToInstance, Description("Microsoft ODBC Driver for SQL Server formatted output (A)"), EventType(17), EventTypeName("TextA"), locale("MS\\0x409")]class Bid2Etw_MSODBCSQL_1_Trace_TextA : Bid2Etw_MSODBCSQL_1_Trace{ [ WmiDataId(1), Description("Module ID"), read ] uint32 ModID; [ WmiDataId(2), Description("Text StringA"), extension("RString"), read ] object msgStr;};[ dynamic: ToInstance, Description("Microsoft ODBC Driver for SQL Server formatted output (W)"), EventType(18), EventTypeName("TextW"), locale("MS\\0x409")]class Bid2Etw_MSODBCSQL_1_Trace_TextW : Bid2Etw_MSODBCSQL_1_Trace{ [ WmiDataId(1), Description("Module ID"), read ] uint32 ModID; [ WmiDataId(2), Description("Text StringW"), extension("RWString"), read ] object msgStr;};You can verify the registration outcome using a logman query.logman query providers | findstr /I sqlEnabling SNI Tracing (Start, Stop and Report)Add the msodbcsql provider to the traces.txt file allowing you to start, stop and report with the msodbcsql events.Traces.txt should contain the following content.{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752} 0xFFFFFFFF 0 MSDADIAG.ETW{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C} 0xFFFFFFFF 0 MSODBCSQL13Here is an example capture of sqlcmd (client side) showing SNIPacketAllocateEx activities.SNI Trace Group MasksYou can control the event groups traced by modifiing the mask in the traces.txt file to one or more of the following values.Default0x1Common Trace0x2Scope0x4Performance0x8Increment, Decrement, Set and Reset0x10Memory0x20Error and HResults0x40Advanced0x80Reservered 10x100Reserveed 20x200Reservered 30x400Bid0xFFFUser0xFFFFF000All0xFFFFFFFFSNI Trace on LinuxSNI Trace is also available for native Linux applications. Instead of using ETW the output is written to a text file, registered in ODBCINST.INI. Here is an example:["ODBC?Driver?17?for?SQL?Server"]BIDTrace=yesBIDTraceFile=<<File path>>BIDTraceFileSize=100240Note: A value of zero(0) for the file size indicates no size limit. Recommendation: Set the file size and allow rollover to provide you with smaller files to troubleshoot.Note: File size is in bytes and limited to 4GB per file.RecapThe SQL Server error messages, ODBC Trace, XEvents, network tracing and SNI Trace provide lots of details about your application and the SQL Server TDS network activities. While it takes a bit more work to register for SNI tracing, the first time on the client, the SNI Trace is powerful and easy to start, stop and report once registered.The tracing facilities can help you resolve networking issues impacting your SQL Server.Bob Dorr – Principal Software Engineer SQL Server ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download