


Accidental DBA – Kimberly Tripp and Paul Randal

The class covers the top things seen on newsgroups, etc., as well as the top problems that they commonly see

Ref #1.

SQL scripts for this class can be found by going to the sqlscripts…past events section of , or you can download them directly from here:

Rule number 1: never change the server name. There is a master job server reference that cannot be modified, failover clusters will cause problems, etc.

Get familiar with the tools (SQL Agent, SSMS, sqlcmd, Profiler, DTA)

Logging and recovery is very important

structures (optimize data partition vs. log partition)

choose appropriate recovery model

create a recovery strategy

All varchars all the time creates a 'too wide' table.

select * cannot be optimized very well.

Ref #2.

Log file: you do not need one log file per object. (Paul wrote a benchmarking script, which can be found on his blog (blogs/paul), to see every file that is being used.) Better performance comes from one log file (.ldf), which is also better for management. Isolate the log from the data, because SQL Server accesses a log file differently (largely sequential writes) than it does the data files. Keep the MDF relatively small; because of the need to keep historical data, she sometimes recommends multiple secondary files for the historical data with a different backup strategy for them. The log should be 10-25% of the data size (10%: read-only, 25%: read/write OLTP). You can use this as a rough starting point, but the larger the database the more this is off. An index rebuild in the full recovery model is a size-of-data operation: you need 25 GB of log space to rebuild a clustered index on 25 GB of data.

Capacity planning: a tough job. Look at your data; 2-3x your raw data size should be allocated (10 GB of raw data: 30-40 GB for all pre-allocated data).

In 2008, BACKUP LOG WITH TRUNCATE_ONLY and BACKUP LOG WITH NO_LOG no longer work. A workaround: change to the simple recovery model (the log will clear), then change back to the normal recovery model.
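
A minimal T-SQL sketch of that workaround, assuming a database named MyDB and a backup path that are both hypothetical:

ALTER DATABASE MyDB SET RECOVERY SIMPLE;    -- a checkpoint now clears the inactive log
CHECKPOINT;
ALTER DATABASE MyDB SET RECOVERY FULL;      -- switch back to the normal recovery model
BACKUP DATABASE MyDB TO DISK = N'C:\Backups\MyDB.bak';    -- restart the log backup chain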

In 2008 rebuild table for compression is available.

Check out links on whitepapers:

Physical database storage design (speaks on consolidation)*

SQL Server I/O basics

Batch compilation, recompilation, plan caching**

Optimize data files: autoshrink is evil. Example on the CD, but the simple way to illustrate the evilness of autoshrink is to do the following (a rough T-SQL sketch follows the steps):

Create a 10 MB filler table

Create a production table: no fragmentation

Drop the filler table

Shrink the database and you will get: ~100% fragmentation in the production table
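
A hedged sketch of that demo (database and table names are made up; row counts are approximate):

USE MyDB;
CREATE TABLE FillerTable (c CHAR(8000) NOT NULL DEFAULT 'x');
GO
INSERT INTO FillerTable DEFAULT VALUES;    -- ~1 row per 8 KB page
GO 1280                                    -- roughly 10 MB of filler
CREATE TABLE ProdTable (id INT IDENTITY PRIMARY KEY, c CHAR(8000) NOT NULL DEFAULT 'y');
GO
INSERT INTO ProdTable DEFAULT VALUES;      -- production rows laid down contiguously after the filler
GO 1280
DROP TABLE FillerTable;                    -- leaves free space at the front of the file
DBCC SHRINKDATABASE (MyDB);                -- shrink moves ProdTable pages back-to-front into that space
SELECT avg_fragmentation_in_percent        -- now close to 100%
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('ProdTable'), NULL, NULL, 'LIMITED');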

Autoshrink should be turned off.

Autoclose should be turned off

Autogrow should be on but don’t let it get out of control. In 2005 it is better: it doesn’t use the same type of locks and doesn’t cause all the blocking it did in 2000. It also uses ‘instant initialization’, which is a great thing because it can grow a file by 5 GB in a second or two. (This is also called ‘instant file initialization’.)

Be sure to choose a good RAID array and a good block size (see *). There is a security consideration with instant initialization: Windows just removes the pointers to deleted files rather than zeroing the space, and if you run a DBCC PAGE command on raw pages it can expose that deleted data, but only in hex.

Ref #2 22:39

Optimizing log files

Only one log file

Isolate log file

Defrag drive on log

Pre-allocating the log. Two factors: the size of the data and the size needed for maintenance

Autogrowth can create problems (internal log fragmentation). She wrote a blog post about this: 8 steps to better transaction log throughput, which covers pre-allocation of transaction logs

Ref #2.26:40

If you need a transaction log > 8 GB, allocate it in 8 GB chunks (for optimally sized VLFs). Why? Each 8 GB chunk is divided into 16 virtual log files of half a GB each, so every 500 MB of activity will be able to be cleared.
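
A hedged sketch of pre-allocating a 24 GB log in 8 GB chunks (database and logical file names are assumptions):

ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 8192MB);     -- first 8 GB chunk: 16 VLFs of ~512 MB
ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 16384MB);    -- grow to 16 GB
ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 24576MB);    -- grow to 24 GB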

Ref #2.30:00

Effective RAID configuration: make sure to choose the right block size, 64K definitely. Ref #2.33:00

Optimize internal file fragmentation: see blog on 8 steps blogs/kimberly

Tempdb is a special case in 2005. Page 9

It has a lot of new objects being created all the time:

• Worktables

• Sort tables

• Anytime you see hash in showplan

• User defined objects

o Table variables

o # or ##

• Brandon Leuthke went to see Sunil Agarwal, who had a great discussion on this. Look him up. My discussion with Brandon:

o Sunil showed how easy it is to fill up tempdb (monitored with DMVs)

o He will post the top 10 DMVs to use to query tempdb (posted online somewhere)

o Tempdb: one file per core, so that there is a file to handle each thread. With a single file on 4 dual-core procs, you have 8 threads queuing up.

In 2000 it used to do this a page at a time, so they added a trace flag in 2000 (KB 23855) which changed it from page-based allocation to extent-based allocation (64 KB instead of 8 KB), so there is more space for the initial object.

In 2005, when a temp table is dropped, instead of de-allocating all the space it caches one data page and one IAM page, and the next time the table is created it reuses them. You don’t need the trace flag any longer.

Ref#2.37:00

dm_db_file_space_usage tells how space is being used in tempdb: temp objects, system objects, mixed extents (single-page allocations), online index operations, triggers, version store, MARS, snapshot isolation. It will also tell you how much is going to the version store (new in 2005). In 2005 triggers now use tempdb (the version store), and table variables are just like temp tables: they are stored in tempdb. For all these reasons, isolate tempdb and pre-allocate tempdb. How to move databases: KB 224071. White paper: Working with tempdb. If on a multi-proc machine there are more users with more objects, so multiple files are recommended. The KB states one file per CPU, but this is overkill. (A sketch of the space-usage query and adding a file is below.)
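
A hedged sketch (the DMV and its columns are real 2005 objects; the added file's name, path, and sizes are illustrative assumptions):

SELECT SUM(user_object_reserved_page_count) * 8     AS user_objects_kb,
       SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
       SUM(version_store_reserved_page_count) * 8   AS version_store_kb,
       SUM(mixed_extent_page_count) * 8             AS mixed_extents_kb,
       SUM(unallocated_extent_page_count) * 8       AS free_kb
FROM tempdb.sys.dm_db_file_space_usage;              -- how tempdb space is being used, in KB (8 KB pages)

ALTER DATABASE tempdb ADD FILE                        -- add another data file
    (NAME = tempdev2, FILENAME = N'T:\tempdb\tempdev2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);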

ANATOMY OF DATA MODIFICATION page 10

Update is highly selective

No manipulation is ever written to disk directly. Everything is brought into cache, all manipulation happens in cache, and the changes are later written to disk.

Update is a single statement batch

Sql uses locks like breadcrumbs

Ref#2.44:51

LOCKS are necessary

• Most locks are non blocking(shared locks)

• Locks are like breadcrumbs

• Live locks would be a problem only if they occur for a long period of time

• Every user gets a database lock. Blocking is only a problem if it lasts an unreasonable amount of time, which leads to a queue. KB 271509 covers sp_blocker_pss80, “who is at the head of the chain”; in this KB there is a bunch of code that can be used. It includes DBCC INPUTBUFFER, which shows the last command of that user. (A quick sketch is below.)
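
A minimal sketch of looking at blocking (the spid value 53 is just an example):

EXEC sp_who2;              -- the BlkBy column shows who is blocking whom
DBCC INPUTBUFFER (53);     -- last command submitted by the blocking spid
EXEC sp_blocker_pss80;     -- the KB 271509 script, once installed, walks the blocking chain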

Custom reports in 05: see the lab on the DVD (SSMS reports)

Ref#3.00:00

Performance dashboards

See custom reports for 2005. See the lab on the DVD (SSMS reports)

Any time you pull down reports, look in the RDL for the command text if you are concerned about security, as it runs under your profile.

When you look at the report you can see the XML for it. If you export that file and save the XML as a .sqlplan, you can open it in the showplan GUI. So when you run the report to see the different performing queries, you can see the query plan of the most expensive queries.

Ref#3.11:00

More on locks and transactions (pages 11-14)

• Everyone that uses a database has a database level lock

• Everyone that uses a table has a table lock

• Everyone that uses a page has a page level lock

• Everyone that uses a row has a row level lock

• An update lock is taken at the row level (with intent-exclusive locks at the page and table level). It signals the intent to modify the resource; data can still be read, so there is better concurrency.

• Then the lock changes to an exclusive lock to make the modification

• The transaction is complete when the trancount is 0. There is no such thing as truly nested transactions; it is an atomic unit. When the transaction is complete SQL Server will

o Write logs to transaction logs on disk

o Release locks

o Notify user

• Data in cache is now up to date. When does data get written from cache to disk? Checkpoint! It batches the items to disk. If sql server crashes, you have to do restart-recovery…depending on how the server is shut down.

Ref#3.19:15

Purpose of checkpoint

• Reduce roll forward recovery time

• Batch i/o to disk

• Help bring what is on disk closer to current

• Not to guarantee consistency. (only way to guarantee consistency is with the logs)

You can’t clear the log while a full database backup is running. You can do log backups in 05, but you can’t clear the log, because recovering from that full backup needs that log: it replays the transactions to get the database back to a consistent state

Purpose of log is to help performance and bring database to a consistent state

Ref#3.25:00

RECOVERY MODELS

Simple recovery model

• You have to be ok with a certain amount of data loss

• Clears inactive portion of log

• Worst option for recovery: no log backups

Bulk-logged

• Batch/bulk operations

o Creating dropping rebuilding indexes

o Bulk load operations

o Select into

o LOB

Full:

• Everything logged

• Operations like creating or rebuilding indexes take as much log as the size of the operation

See performance, log size, and backup capabilities compared on her blog

Ref#3.32:00

Clearing the log should be called ‘removing the inactive portion of the log’: transactions must be complete and the data checkpointed. See pg. 17

Ref#3.36:00

There is no timeout for a held lock.

A lock request has a timeout, but there is no timeout for someone that already has the lock.

sp_who2 will show you who is blocking and when their last batch was.

By selecting from master..sysperfinfo you can track log file space used. DBCC SHRINKFILE can be used. (A sketch is below.)
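
A hedged sketch of both (the log file’s logical name and the 1024 MB target size are assumptions):

SELECT instance_name AS database_name,
       cntr_value    AS percent_log_used
FROM master.dbo.sysperfinfo
WHERE counter_name = 'Percent Log Used';      -- 2005 also exposes this via sys.dm_os_performance_counters

DBCC SQLPERF (LOGSPACE);                      -- log size and % used for every database

DBCC SHRINKFILE (MyDB_log, 1024);             -- shrink the log file back down to 1 GB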

Is it OK to get rid of all the log backups that the full backups cover? Yes, you can, but it’s wise to keep them just in case. The ‘zipped 4 GB’ story: someone zipped their backup files, and zip used to have a 4 GB limit. Before they realized it, 4 weeks had gone by, so they had several corrupt zipped backup files. Because they kept all the log backups, they were able to go to a full backup from 4 weeks back and replay every transaction in the logs. Not a single record was lost.

Ref#3.49:00

A log backup clears only the inactive portion of the log.

See blogpost of kimberly’s on a graphical view of the logging activity.

See blogpost of Paul’s to show size difference in differentials

Back up the tail of the log, then restore the differential, then restore the logs

Key points of a healthy transaction log

• Do not use autogrow

• Size the log very loosely on database size and then tune this log size in testing. For large databases, allocate the log in no larger than 8 GB chunks so that VLFs are optimally sized (Kimberly’s blog post on 8 steps to better transaction log throughput)

• For existing databases, check the current number of VLFs and optimize them

• Minimize long running transactions

• Perform regular transaction log backups to help keep the transaction log small and give you more frequent recovery points. How about 60 per hour.

Common problems and solutions

• Poor performance when higher activity/change rates

o Likely to be caused by data and log on same drive

▪ Find a logical drive that is not competing with the others for frequent writes and move the transaction log there (KB 224071)

• Transaction log that seems to grow and grow and grow

o Likely caused by lack of transaction log maintenance

▪ Solution use the simple recovery model or

▪ Solution: increase frequency of transaction log backups or

▪ Solution: review the oldest active (and open) transaction in the log (DBCC OPENTRAN) and determine if application changes can be made (a sketch follows this list)

• Poor performance seen around transaction log backups

o Likely to be caused by VLF fragmentation

▪ Solution: shrink/manually grow the log

• See Kimberly Tripp’s blog post: 8 steps to better transaction log throughput
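
A minimal sketch of the log-maintenance checks above (the database name and backup path are hypothetical):

DBCC OPENTRAN (MyDB);                                     -- oldest active transaction holding up log clearing
BACKUP LOG MyDB TO DISK = N'C:\Backups\MyDB_log.trn';     -- a more frequent log backup lets the log clear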

Ref #4.00:00 Ref#5.00:00

Data consists of records of various types

Database structure

• Database consists of

o Filegroups consist of

▪ Files consist of

• Extents consist of

o Pages consist of

▪ Records store the data

Page types

• Data

• Index

• Text – LOB data (XML columns, varchar(max), text/ntext/image); a loose structure off the data page

• Sort – if a sort that can’t run in memory

• Boot – metadata of database

• File header – must all be read on startup

• Allocation maps – PFS, GAM, SGAM, IAM

• Diff map – keeps track of every extent changed since the last full backup (used by differential backups)

• ML map – keeps track of minimally logged extents, also for backups after bulk-logged operations

Use Paul’s script on the CD (code: allocation_meta) to find page locations and then use a DBCC command to query pages. With Paul’s script you will find, for example, file 1 : page 153. Use DBCC TRACEON (3604) so DBCC output comes to the client, then DBCC PAGE (‘table’, 1, 153, 3). (A sketch is below.)
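
A hedged sketch of dumping that page (the database name is made up; file 1, page 153 comes from the allocation script example above):

DBCC TRACEON (3604);              -- send DBCC output to the client instead of the error log
DBCC PAGE ('MyDB', 1, 153, 3);    -- database, file id, page id, print option 3 = full record detail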

These commands are useful to find corruption. see page 30

Ref#5.21:01

How is data stored: The record structure

1. Tag bytes: the type of record; they come first

2. Null bitmap: a bit per column to determine whether it is null or not. For optimization: this is checked before checking the record

3. Fixed length columns (integers, bigints)

4. Variable-length columns (e.g., varchar) are last…a series of offsets that point to the end of each value. It’s a reference structure.

o It doesn’t matter what order the columns are in; SQL will reorder them as it likes

o Careful: row datatypes save page space for each

o Semi-deterministic algorithms determine which column value to push off the page. To access that data it will need to follow a pointer and do an extra (random) read. If the row is greater than 8 KB it will push a value off the page and store it somewhere else. This is a great way in which fragmentation occurs. See page 28

5. This shows that column order is not relevant during table creation.

This is why you need to be very careful what datatypes to use.

I don’t know what extents are.

• They are a group of 8 contiguous pages

• Start on an 8-page boundary, starting with page zero in a file

• Tracked in allocation bitmaps

• An optimization in the storage engine.

• They are wholly dedicated to an object/index or can be shared for the initial few pages of any object/index

How is data organized

• A heap

o The simplest storage arrangement

o Comprised entirely of data pages

o Unordered, pages unlinked

o Records are located using a physical RID (file, page in file, slot)

• Clustered index

o Alternative table structure to heap

o Data is stored in defined order

o Fast lookup through b-tree

o Records located through logical RID (cluster keys)

o Non-leaf index records contain a child page id in every record

o Can be unique or non-unique.

▪ Non-unique clustered indexes have an additional hidden 4-byte uniquifier column when two rows have the same key values, to provide uniqueness

• Non-clustered index

o A way to provide different ordering

o Defined on heaps or clustered indexes

o Leaf records contain rid of matching record in base table

o Non-unique use the base table RID as the uniquifier

o Goes through b-tree same as clustered index but the leaf nodes are not data pages. They are index leaf pages. I am unsure as to the differences in this

• LOB data

o Stored off-row

o You follow a link on a data page to another text page

o The overhead comes because it has to do another i/o

o Text, ntext, image

• Other points

o Legacy systems: turn on ‘text in row’ but to turn it off is a big deal so beware

o Sql doesn’t tell you when a row gets pushed off but it can kill performance

o Only way to tell is to look at dbccpage command

o For legacy lob datatypes use ‘text in row’ table option

▪ Sp_tableoption ‘prodtable’,’text in row’,’on’

• Defaults the in-row limit to 256 bytes

▪ Sp_tableoption ‘prodtable’,’text in row’,’1000’

o Turning text in row off is an immediate size-of-data operation. Goes through all records and ……..(does something, can’t remember)

Ref#5.45:00

How is data accessed?

• Singleton lookup (index seek/clustered index seek)

• Range scan (index scan/clustered index scan)

o Returns row keeps scan open.

o Looks at next logical row on page

o Until it doesn’t match search predicate

• Allocation order scan (table scan/unordered clustered index scan – under right conditions)

o This occurs if there is a table lock or exclusive lock

Ref #6.00:00

How does sql server determine what is useful? THE 5 PHASES OF EXECUTION

1. Batch is submitted.

a. Problem is errors received

i. You can catch errors here; sqlcmd has a command-line option to exit if there is an error (osql had something similar added)

2. Once parsing is complete, the batch goes through 4 phases at the statement level

a. The result is a normalized query tree which is now passed to

b. The optimizer

i. First checks to see if there is a plan

ii. Can we find a trivial plan?

iii. How do we find a GOOD PLAN FAST.

1. Story about a developer that worked on the optimizer who said ‘it’s not my job to find the best plan; it’s my job to find a good plan fast.’

2. The point is that if it tried to find the best plan it could take days. It has to assess the usefulness of a reasonable list of candidate plans.

a. It looks at your query and combines these with statistics.

b. It does a cost-based analysis based upon the statistics available

3. It is trying to find a ‘reasonable subset of data’ which is sometimes why it can’t find it fast.

iv. ‘Stored procedures are the best because they are precompiled’

1. Not necessarily. It’s a good thing they are. Not every plan should be saved and reused.

2. Lots of ways that plan caching occurs

3. How to invalidate the plan that is in cache.

a. Command

b. When statistics on the tables get invalidated, then the sp gets invalidated.

c. Schema changes

d. Alter procedure

e. You can specify certain statements to not have a plan cached.

i. This is Kimberly’s favorite way to control it

f. Auto-parameterized queries

g. Execute with recompile

4. sp_executesql is similar to a stored procedure. It’s called forced statement caching, and each execution will use the same plan whether it’s appropriate or not. The plan is probably not good if the statement returns a wildly varying result set (2 rows vs 1000 rows). (A sketch comparing it with dynamic string execution follows item 5 below.)

a. Does varying columns create an optimization nightmare?

5. Dynamic string execution can be good in controlling the plan in cache.

a. How?

b. Sql injection is always a problem with dynamic sql.

c. In SQL 2005, injection can be worse because of EXECUTE AS
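
A hedged sketch contrasting the two approaches (table and column names are made up):

DECLARE @LastName NVARCHAR(50);
SET @LastName = N'Tripp';

-- sp_executesql: forced statement caching; one plan reused for every parameter value
EXEC sp_executesql
     N'SELECT MemberId FROM dbo.Members WHERE LastName = @p1',
     N'@p1 NVARCHAR(50)', @p1 = @LastName;

-- Dynamic string execution: each distinct string can get its own plan, and unvalidated input
-- invites SQL injection
EXEC (N'SELECT MemberId FROM dbo.Members WHERE LastName = N''' + @LastName + N'''');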

Ref#6.15:45

6. When a stored procedure is created page 40(85)

a. A plan is not saved to disk. Only metadata is saved at procedure creation. Resolution is delayed

i. Use sys.procedures, sp_procs, functions and views to see metadata

b. Resolution happens at the first execution (it has to, in case a referenced structure has been dropped and recreated)

i. Optimization

ii. Compilation

iii. Compiled plan placed in plan cache

c. DBCC FREEPROCCACHE will clear all plans in cache and DBCC DROPCLEANBUFFERS will drop all the clean data pages in cache

d. a plan is not generated for subsequent executions (even when indexes are added!)

e. review scripts on cd: dmvplancache

i. shows that the plans in cache are the same. It just shows what the plan should do….so you get the same execute result set

ii. you can directly query the DMVs and get the showplan XML; save it as a .sqlplan and you can see what’s in cache and whether it’s parameterized (a sketch follows item f below)

f. you can do an execute with recompile
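
A hedged sketch of that plan-cache query (the DMVs and functions are real 2005 objects; how you filter is up to you):

SELECT cp.usecounts, cp.objtype, st.text, qp.query_plan        -- query_plan is the showplan XML
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;

DBCC FREEPROCCACHE;        -- test systems only: drop all cached plans
DBCC DROPCLEANBUFFERS;     -- test systems only: drop clean data pages from cache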

Ref #6.27:00

g. her example used select from where like ‘%@param%’

i. the first time it is optimized a certain way based on the parameter.

ii. If it can go straight to the record then it will use a seek and get right to it

iii. But if the second time you supply a leading % then it will reuse that same plan and its effectiveness will be lessened (a better lookup might be a range lookup instead)

h. One recommendation is modularization. Take the one line of code that is offending and isolate it or mark it to recompile on its own. The optimizer will optimize any code in an sp even if it is not executed (check out moduleprocedures.sql). (A sketch follows the sub-items below.)

i. Take offending line and use w/ option (recompile)

ii. However, don’t do this all the time, because reoptimization is recompilation, which is serialization, which makes people wait. So don’t.

iii. Troubleshooting sp recompilation issues kb 243586

iv. Example on cd
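
A minimal sketch of isolating the offending statement with a statement-level recompile (procedure, table, and column names are made up):

CREATE PROCEDURE dbo.SearchMembers @Pattern NVARCHAR(100)
AS
BEGIN
    SELECT COUNT(*) FROM dbo.Members;        -- stable statements keep their cached plan

    SELECT MemberId, LastName                -- only the skewed statement pays for recompilation
    FROM dbo.Members
    WHERE LastName LIKE @Pattern
    OPTION (RECOMPILE);
END;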

i. She also did a groktalk on this (a 15 minute talk on )

j. The more selective the data, the more useful the index

k. Statistics are updated when 20% of the data changes (a sketch of the statistics commands follows the sub-items below)

i. You can manually update statistics rather than wait for auto-update

ii. Leave auto create on

iii. DBCC SHOW_STATISTICS

iv. Stats_date shows when stats were last touched

v. You can turn off a specific stat

1. Statistics_norecompute on the index definition
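
A hedged sketch of those statistics commands (table and index names are hypothetical; index id 2 is assumed for STATS_DATE):

UPDATE STATISTICS dbo.Members;                                        -- manual update
DBCC SHOW_STATISTICS ('dbo.Members', 'IX_Members_LastName');          -- histogram and density
SELECT STATS_DATE(OBJECT_ID('dbo.Members'), 2) AS stats_last_updated; -- when those stats were last touched
CREATE INDEX IX_Members_LastName ON dbo.Members (LastName)
    WITH (STATISTICS_NORECOMPUTE = ON);                               -- turn off auto-update for this index's stats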

l. LIKE, less than, and greater than can produce faulty estimates in the query plan (ref#6.29:00)

i. One easy way to test it: execute with recompile on each query.

m. Index strategies

i. Determine the primary usage patterns of the table (OLTP: fewer indexes; OLAP: more and wider indexes)

ii. Add indexes to foreign key columns

iii. Capture a workload and analyze it with the Database Tuning Advisor (2005)

iv. Add additional indexes to help improve sargs, joins, aggregations

v. Btw, datetime values are unique only to about 3.33 milliseconds : )

Ref#7.0:00

Fragmentation and corruption

Caused by page splits and wasted space. The more wasted space you have the more spread out your data is and the more i/o’s

Schemas that cause page splits

• GUIDs as the high-order key

• Updates to rows w/ variable len columns

• Random insertion points

• Mismanaged fill factor

Logical fragmentation affects read ahead which affects read scans

Detecting fragmentation

• Know which indexes to look at

• Use dm_db_index_physical_stats (DBCC SHOWCONTIG in 2000); a sketch is below

o See page 53 for output example
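
A minimal sketch of checking fragmentation for one table (database, schema, and table names are assumptions):

SELECT index_id, index_level, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID('MyDB'), OBJECT_ID('dbo.Members'), NULL, NULL, 'LIMITED');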

How to fix fragmentation

• Maintenance plan can be used

• Three realistic choices

o Do not drop/recreate indexes

o Alter index rebuild

▪ Will always rebuild and defrag

▪ No knowledge of schema/constraints necessary

▪ Always rebuilds the index’s statistics

• (with the equivalent of a full scan, better than if you update your stats with a sampled scan)

▪ Cons: atomic (all at once)

o Create index with drop_existing

o Alter index reorganize

o Most people rebuild or reorganize (a sketch of both is below)
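
A hedged sketch of the two common fixes (index and table names are made up):

ALTER INDEX IX_Members_LastName ON dbo.Members REBUILD;       -- full rebuild; offline by default, atomic
ALTER INDEX IX_Members_LastName ON dbo.Members REORGANIZE;    -- online and interruptible; compacts the leaf level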

Corruption (i/o) errors

Three types

• 823: hard I/O error: the buffer pool asks the OS for the data and the OS says no

• 824: soft I/O error: SQL says something is wrong with the data (e.g., a torn page)

• 825: read retry: the read succeeds after retries (a stale read). If the retries all fail it returns 823 or 824. Example on page 57 (slide 120)

You can download a corrupt database from his blog

Catch it ASAP:

• CREATE AN OPERATOR.

• CREATE A NEW ALERT SELECT SEVERITY 24. NOTIFY OPERATOR.

o SEVERITIES 19-24 AND A SPECIAL CASE FOR THE 825 ERROR
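
A hedged sketch of the operator and alerts (operator name and e-mail address are made up; repeat the severity alert for 19 through 24):

EXEC msdb.dbo.sp_add_operator @name = N'DBA Team', @email_address = N'dba@example.com';
EXEC msdb.dbo.sp_add_alert @name = N'Severity 24 errors', @severity = 24;
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 24 errors',
     @operator_name = N'DBA Team', @notification_method = 1;    -- 1 = e-mail (needs Database Mail and SQL Agent configured)
EXEC msdb.dbo.sp_add_alert @name = N'Error 825 (read-retry)', @message_id = 825;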

PAGE CHECKSUMS

• Turn this on! Provides the ‘smoking gun’ that the error is not due to sql server

o On by default

o Cpu overhead of 2%

• Per page checksum

o Written as the last thing SQL does on a physical write

o Checked as the first thing sql does on a physical read

• Not available for tempdb

• Blogpost

• Checkdb should be run regularly (at least run with physical_only)
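
A minimal sketch (the database name is an assumption; PAGE_VERIFY CHECKSUM is on by default for new 2005 databases):

ALTER DATABASE MyDB SET PAGE_VERIFY CHECKSUM;
DBCC CHECKDB (MyDB) WITH PHYSICAL_ONLY;      -- lighter-weight check of page-level integrity and checksums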

WHAT TO DO WITH CORRUPTION

• Don’t restart

• Don’t touch logs

• Don’t detach suspect dbs

o If it fails, it won’t reattach (hack: create a dummy database with the same name, shut down the server, replace the files, and start the server back up)

• Don’t depart from recovery plan

• Do use backups and have backups

o There is a 3rd party log analysis tool you can use to look into logs

DATABASE REPAIR IS NAMED REPAIR_ALLOW_DATA_LOSS FOR A REASON

• Purpose is not to prevent data loss. It is to restore a stable system

• Does not fix business constraints, data relationships

• Backup dbs first

• See paul’s blog on corruption last resorts

Where to look for causes

• Sql server error log

• Windows error log

• Sqliosim WILL STRESS THE I/O SUBSYSTEM MORE THAN SQL WILL EVER TEST IT

o RUN ON NEW HARDWARE

o OUTPUT IS hard to read but documentation is online

• Turn on checksums and additional auditing

CAUSES AND SOLUTIONS

• Logfile full unexpectedly

o Transaction log file grows until no more disk space and database grinds to a halt

• Cause: the database is in the full recovery model, a database backup has been taken but no log backups

• Resolution: back up the log and shrink the file (a sketch follows this list)

• If there is no available disk space, you may be forced to use BACKUP LOG WITH TRUNCATE_ONLY (or WITH NO_LOG)
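
A minimal sketch of the resolution (database name, backup path, log file name, and target size are all assumptions):

BACKUP LOG MyDB TO DISK = N'C:\Backups\MyDB_log.trn';    -- clears the inactive portion of the log
DBCC SHRINKFILE (MyDB_log, 1024);                        -- then shrink the log file back to 1 GB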

DATA FILE FULL UNEXPECTEDLY

• Data file grows until no more space. Database grinds to a halt

• Cause: no one monitoring data file growth and or insufficient space planning

• Resolution: add more data files and set alert to know when data file is full

MAINTENANCE PLAN DOESN’T FIX FRAGMENTATION

• Classic mistake of running data file shrink after index rebuilds to reclaim space used by rebuilds

• Data file shrink refragments the indexes

• Resolution: don’t run shrink

ADDING HA BREAKS MAINTENANCE

• Regular reorganize maintenance slows to a crawl after adding database mirroring

• Cause: the transaction log traffic sent to the mirror overloads the network

• Resolution: upgrade network or use less intensive way of removing fragmentation

NEVER ENDING FRAGMENTATION

• Reorganize doesn’t seem to defragment

• Cause: bad schema design causing an incredible amount of fragmentation (ex: a social networking site with a clustered index on ‘per page comments’)

• Resolution: change schema and/or add identity column
