Each problem area below lists its symptoms (where to find them) and solutions (things to try).
Color key for solutions: green = highly recommended; black = recommended; red = practiced, but not highly recommended.

Block/Buffer/Chain Contention

Blocks are loaded into memory (block buffers) and assigned to a particular hash bucket (and underlying hash chain). Latches are used to lock a given hash chain when the instance is attempting to scan (aka "walk") the buffer chain and access a block buffer in memory. Contention occurs when multiple operations exercise the same buffer chain (and/or underlying buffer) simultaneously.
- "latch free", "cache buffer chains" and possibly
"buffer busy" waits will be high in terms of total_waits and/or
average_wait) (v$system_event)
- Use the query in Appendix
B, to help characterize the type of chain contention that is
occurring:
- If unique block#s appear, then likely the
contention is at the hash bucket's chain level
- If the same block# appears in query, then the
block is receiving high-traffic. Contention is most likely at
the block or row level.
If there are a high number
of "buffer busy waits" then block level contention is the most
probable.
Row-level contention would be accompanied by
high enqueue waits (v$system_event) |
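
As a starting point (not the Appendix B query), the wait events above can be read directly from v$system_event; a minimal sketch, with event names that may vary by release:

    -- Sketch: totals for the buffer-chain related wait events.
    SELECT event, total_waits, time_waited, average_wait
      FROM v$system_event
     WHERE event IN ('latch free', 'buffer busy waits', 'free buffer waits')
     ORDER BY time_waited DESC;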

Solutions (Things to Try):
- If the contention is for different blocks on the same chain, consider setting the init.ora parameter _db_block_hash_latches (8i and beyond). This parameter specifies the system-wide number of hash chain latches divided among the number of _db_block_hash_buckets. Increasing the number of hash chain latches per bucket allows for shorter chain lengths, reducing the amount of time required to walk the chain as well as the possibility of more than one hot block residing on the same chain.
- If the contention is for the same block, consider the following:
  - Rebuild the table with a larger process FREELIST count, equal to or greater than the highest number of simultaneous processes acting against the table (a sketch follows this list).
  - If the majority of waits occur on undo header blocks (v$waitstat), consider setting the init.ora parameter transactions_per_rollback_segment to a lower value, or moving to Oracle 9i's System Managed Undo feature, which maintains a 1:1 relationship between bound transactions and undo segments.
  - In Oracle 9i, consider moving the table to a tablespace that uses the Automatic Segment Space Management feature.
- In scenarios where several large inserts are operating against a common table, it is possible that the High-Water mark (HW) enqueue is being used heavily. This usage can be reduced or eliminated by:
  - Using locally managed tablespaces (which eliminate the need for obtaining a HW enqueue).
  - Setting the init.ora parameter _bump_highwater_mark_count. This parameter specifies the number of blocks to be assigned per freelist when advancing the high-water mark. (Default is 0.)
- If all else fails, redesign the application to reduce the amount of same-block contention.
- If contention is at the row level, the application is left with few choices:
  - Re-design the application to eliminate the bottleneck.
  - "Tweak" the application to restrict/throttle the number of processes responsible for the contention.
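
As an illustration of the FREELIST rebuild above, a minimal sketch using a hypothetical orders table; the freelist count of 8 is an assumption and should match the highest number of concurrent processes inserting into the table:

    -- Hypothetical rebuild: copy the table with a larger process freelist count,
    -- then swap it in for the original (indexes, grants, etc. must be recreated).
    CREATE TABLE orders_new
      STORAGE (FREELISTS 8)
      AS SELECT * FROM orders;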

High Waits for Free Block Buffers

This occurs when the database attempts to locate a clean block buffer but cannot, because there are too many outstanding dirty blocks waiting to be written. This type of problem is typical of hybrid databases consisting of heavy mixes of OLTP and OLAP/query operations, and/or of environments with over-utilized or misallocated I/O layouts.
"free buffer waits" will be
high in terms of total_waits and/or average_wait)
In a majority of cases, one or more other
write-related wait events ("db file parallel write", "db file single
write", "direct path write") will also be high.
(v$system_event)
In addition, you should also check the average_wait
for the "checkpoint completed" wait event (v$system_event). If it is
too high then you may want to also want to first investigate your
overall checkpoint performance (see "Poor Checkpoint Performance"
for more information).
|

Solutions (Things to Try):
- If you are not I/O bound, increase the number of db_writer_processes (see "Poor I/O Performance" for more information) to allow more blocks to be flushed at one time. (In Oracle 8i you will likely also need to increase db_block_lru_latches.) A sample init.ora fragment follows this list.
- In hybrid OLTP/query environments, split out pieces of functionality between multiple database instances.
- Increase the number of block buffers(*) and/or log buffers in the SGA to reduce the amount of block flushing required in order to find free space.

(*) Increasing the number of SGA block buffers can decrease checkpoint performance in write-intensive environments. See "Poor Checkpoint Performance" for more information.
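
A minimal init.ora sketch for the first item, assuming the host is not already I/O bound; the values are placeholders, not recommendations:

    # init.ora fragment (illustrative values only)
    db_writer_processes  = 4
    db_block_lru_latches = 8    # Oracle 8i: typically raised along with DBWR processes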

ITL Exhaustion

Each transaction modifying a block requires an ITL (Interested Transaction List) entry in order to modify the rows of the block. The minimum and maximum numbers of ITLs available are controlled by the INITRANS and MAXTRANS values specified during object creation, and are restricted by block size.

Symptoms (Where to Find):
- High enqueue waits (v$system_event).

Solutions (Things to Try):
- Re-create the object with a higher MAXTRANS (a sketch follows this list).
- If your block size is below 8k, increase the block size. Do not exceed 8k for heavy transactional systems.
- Modify the data distribution to reduce the number of simultaneous transactions against a single block.
- Modify the application to limit/reduce the number of simultaneous transactions acting against a single block.
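
A sketch of the first item against a hypothetical table; the INITRANS/MAXTRANS values are assumptions and should be sized to the expected number of concurrent transactions per block:

    -- Rebuild a hypothetical table so newly formatted blocks carry more ITL slots.
    ALTER TABLE orders MOVE INITRANS 10 MAXTRANS 255;
    -- Indexes are invalidated by MOVE and must be rebuilt:
    ALTER INDEX orders_pk REBUILD INITRANS 10;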

Poor Checkpoint Performance

Depending on the SGA size and the number of datafiles in a database, a checkpoint can be a highly resource-intensive operation. If tuned properly, reducing checkpoint completion time decreases the amount of time required for "checkpoint freezes" as well as the time required for instance recovery. (A win-win situation, in most books.)

Symptoms (Where to Find):
- High average_wait on the "checkpoint completed" wait event (v$system_event).
- Additionally, a high number of total_waits and a high average_wait for "free buffer waits" may also be exhibited (v$system_event).

Solutions (Things to Try):
- If you are not I/O bound, increase the number of db_writer_processes (see "Poor I/O Performance" for more information) to allow more blocks to be flushed at one time. (In Oracle 8i you will likely also need to increase db_block_lru_latches.)
- In Oracle 8i, checkpoint completion time can be reduced by setting fast_start_io_target to 1/2 the total number of block buffers. This causes DBWR to write dirty buffers more aggressively between checkpoints, in order to keep the number of dirty blocks below the specified threshold. It is overridden by log_checkpoint_interval.
  Achieving the same functionality is not as easy in Oracle 9i: fast_start_io_target has been deprecated in favor of fast_start_mttr_target. This new parameter specifies the target number of seconds that should be spent performing instance recovery. It, too, is overridden by log_checkpoint_interval.
  A good starting point for fast_start_mttr_target is the average number of seconds for checkpoint completions divided by 10 (a worked sketch follows this list). Tweak this number lower to increase the aggressiveness of DBWR in writing out dirty blocks, decreasing the checkpoint completion time.
  If a good balance cannot be struck between DBWR aggressiveness and checkpoint completion times without a major adverse effect on I/O load and performance, consider tuning your I/O subsystem (see "Poor I/O Performance" for more information).
- Reduce the number of block buffers in the SGA (this reduces the cache hit ratio, increases free buffer waits, etc.).
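
A worked sketch of the 9i starting point described above, with assumed numbers: if checkpoints have been completing in roughly 300 seconds on average, 300 / 10 = 30 seconds becomes the initial target.

    # init.ora fragment (Oracle 9i; illustrative value only)
    fast_start_mttr_target = 30

Progress against the target can then be watched (again, only a sketch) with:

    SELECT target_mttr, estimated_mttr
      FROM v$instance_recovery;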

Poor I/O Performance

When a database's I/O demands exceed the bandwidth and read/write abilities of the underlying disk subsystem, the resulting bottlenecks can affect the overall I/O performance of the database.

Symptoms (Where to Find):
- High average_wait primarily in one or more of the following wait events (v$system_event): "db file parallel write", "db file scattered read", "db file sequential read", "db file single write", "log buffer space", "log file parallel write", "log file single write", "log file single read", "log file sequential read", "log file sync".

Solutions (Things to Try):
- Tuning and/or enhancing the disk I/O subsystem and/or layout can alleviate these problems (a query for locating hot datafiles is sketched after this list).
- In hybrid OLTP/query environments, split out pieces of functionality between multiple database instances.
- Increase the number of block buffers and/or log buffers in the SGA to reduce the amount of block flushing required.
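
Not part of the original list, but before rearranging the I/O layout it is common to check which datafiles carry the load; a hedged sketch joining v$filestat to v$datafile (the timing columns require timed_statistics to be enabled):

    -- Sketch: per-datafile physical I/O, busiest files first.
    SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
      FROM v$filestat f, v$datafile d
     WHERE f.file# = d.file#
     ORDER BY (f.phyrds + f.phywrts) DESC;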

Poor Log Buffer Performance

Oracle uses two primary latches for log buffer manipulation. The "redo allocation" latch is taken first, to allocate space in the log buffer. Once the buffer space is obtained, the "redo copy" latch is acquired (and the "redo allocation" latch is released) and the log buffer is populated. The "redo copy" latch is used for the actual loading and writing of log buffers by LGWR.

Symptoms (Where to Find):
- High sleeps on the "redo allocation" or "redo copy" latches (v$latch).

Solutions (Things to Try):

If there are high sleeps on the "redo allocation" latch:
- Make sure that the init.ora parameters that impact the amount of non-checkpoint writing LGWR performs (fast_start_io_target) are not set too low.
- Modify the application to reduce the number of small, frequent commits.

If there are high sleeps on the "redo copy" latch:
- Verify that there are not also high sleeps on the "redo allocation" latch. If there are, tune that latch first.
- Increase the value of _log_simultaneous_copies.

Shared Pool / Library Cache Contention

Problems generally happen here when there is a high number of unique objects in the library cache (fully qualified SQL statements, etc.), causing the system to spend a significant amount of time inspecting and managing memory, and ultimately performing constant hard parsing of non-shareable SQL.

Symptoms (Where to Find):
- High sleeps on the "shared pool", "library cache", or "row cache objects" latches (v$latch).
- A small difference between hard parses and total parses, i.e. a high proportion of hard parses (v$sysstat).
- Performance improves after the shared pool is flushed.
- A large number of unique application SQL statements with a low average number of parses and/or executions (v$sqlarea).
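
One way to quantify the parse behavior, sketched here rather than taken from the source, is to compare the hard and total parse counters in v$sysstat:

    -- Sketch: hard vs. total parse counts since instance startup.
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('parse count (total)', 'parse count (hard)');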

Solutions (Things to Try):
- Re-design the application to make better use of the shared pool.
- Set the init.ora parameter cursor_sharing to FORCE, which substitutes bind variables in place of the hard-coded literals found in SQL statements (a sketch follows this list).
- If re-use is poor, reduce the size of the shared pool. Be careful: this will impact the hit ratio (gethits/gets) of other shared pool areas in v$librarycache.
- If re-use is not an issue but the hit ratio (gethits/gets) in v$librarycache is low, increase the size of the shared pool.
- Set the init.ora parameter session_cached_cursors to a higher value (for optimal performance, do not exceed 100). This will result in more soft parses being fulfilled out of the session's PGA and will reduce contention on the "library cache" latch. (See Appendix A for more information on how this is used during statement parsing.)
  The effectiveness of the session_cached_cursors value can be examined at the system and session level by looking at the values of the statistics "session cursor cache count" and "session cursor cache hits".
- Perform occasional flushes of the shared pool, reducing the number of searches in the shared pool. Warning: this can be very impactful to the system, forcing momentary system pauses and frequent hard re-parsing of even shareable SQL.
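
An init.ora sketch of the two parameters discussed above; the values are assumptions, not recommendations from the source:

    # init.ora fragment (illustrative values only)
    cursor_sharing         = force
    session_cached_cursors = 50

The cursor-cache statistics mentioned above can then be watched with:

    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('session cursor cache count', 'session cursor cache hits');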

Slow Archiver Performance

Operating a database in archivelog mode requires that redo logs be archived prior to their reuse. If the ARCH process cannot back up a redo log before the database circles around to use it again, the system will have to wait until the backup completes.

Symptoms (Where to Find):
- Several occurrences of the "log file switch (archiving needed)" wait event (v$system_event). Even if the problem occurs regularly, it may or may not show a high average_wait.

Solutions (Things to Try):
- Increase the size of the database redo logs, reducing the frequency of log switches (a sketch follows this list).
- Increase the number of database redo logs, reducing the frequency of log re-use.
- Tuning and/or enhancing the disk I/O and/or layout can alleviate these performance problems (see "Poor I/O Performance" for more information).
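
A sketch of the first two items, with hypothetical group numbers, file path, and size:

    -- Add a larger redo log group; drop a small group only once it is
    -- INACTIVE (see v$log) and has been archived.
    ALTER DATABASE ADD LOGFILE GROUP 5 ('/u02/oradata/prod/redo05.log') SIZE 512M;
    ALTER DATABASE DROP LOGFILE GROUP 1;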