Feed aggregator

Protect your APEX app from URL Tampering – in just a few clicks

Jeff Kemp - Sun, 2022-08-07 19:40

Recently I’ve been reviewing and updating my knowledge of APEX security, especially protection from URL tampering. I’ve read the documentation, a number of blogs, and heard from people with experience in the field such as Lino. By default, when you create a new application in APEX you get the following security settings set automatically, which is a good start:

  • Application Session State Protection is Enabled.
  • Each page has Page Access Protection set to Arguments Must Have Checksum.
  • Each Application Item has Protection Level set to Restricted – May not be set from browser.
  • Each Primary Key Item* created by a wizard has Protection Level set to Checksum Required – Session Level.

(* that is, any item mapped from a table column that is, or forms part of, a Primary Key constraint).

These default settings are considered best practice. If you change these, it becomes your responsibility to ensure that your application is protected against security vulnerabilities from URL tampering.

For page items, however, the Protection Level defaults to Unrestricted. This is ok for Form items because the page fetch process will set their values on page load, rendering any attempt at URL tampering ineffective.

For non-form page items, unless the Page Access Protection is relaxed (Unrestricted), leaving items unrestricted is safe since URL tampering is blocked for the entire page anyway. At runtime, if a malicious visitor tries to modify the item value via the URL, they will get the error “No checksum was provided to show processing for a page that requires a checksum when one or more request, clear cache, or argument values are passed as parameters.”

Error message "No checksum was provided to show processing for a page that requires a checksum when one or more request, clear cache, or argument values are passed as parameters."

However, what if a developer later needs to change the page to Unrestricted? They may unwittingly introduce a potential URL tampering issue because one or more items were not protected.

The majority of these items are editable input items, so the fact that someone may input a value via the URL is not a big deal. However, for Hidden and Display Only items, it is common for application logic to depend on their values; this logic may be adversely affected by malicious values supplied via the URL.

In some cases, this default is needed in order for the application to work. Some examples of when an item must be left Unrestricted are:

  • An item is changed by a Dynamic Action (whether via a Set Item Value action, via the Items to Return of an Execute Server-side Code action, or in some custom JavaScript), and cannot have Value Protected set because the page may be submitted.
  • We do actually intend the item to be set via the URL, e.g. when an external web page has a link that sets the item’s value.

In all these cases, the application must be designed to ensure it does not “trust” the value of these items; it should apply suitable checks to ensure the values are valid.
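
For example, here is a minimal sketch of such a check, written as a PL/SQL Function Body (returning Boolean) validation; the item and table names (P1_RECORD_ID, my_records) are purely illustrative:

declare
    l_count number;
begin
    -- reject anything that is not a positive integer
    if not regexp_like(:P1_RECORD_ID, '^[0-9]+$') then
        return false;
    end if;
    -- confirm the record actually belongs to the current user
    select count(*)
    into   l_count
    from   my_records
    where  record_id = to_number(:P1_RECORD_ID)
    and    owner     = :APP_USER;
    return l_count = 1;
end;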

In most cases, it is best practice to set the item Protection Level to Checksum Required – Session Level (or Restricted – May not be set from browser where supported).

You can use a query like this to discover all items that may need to be reviewed:

select
    i.application_id,
    i.page_id,
    i.page_name,
    i.region,
    i.item_name,
    i.display_as
from apex_application_page_items i
where i.application_id = :app_id
and i.item_protection_level = 'Unrestricted'
and i.display_as_code in ('NATIVE_HIDDEN','NATIVE_DISPLAY_ONLY')
order by i.application_id, i.page_id, i.region, i.item_name;
Report showing APPLICATION_ID, PAGE_ID, PAGE_NAME, REGION, ITEM_NAME, DISPLAY_AS, listing two Display Only items (P1_DISPLAY_ONLY_ITEM, P1_DISPLAY_ONLY_ITEM_DA) and two Hidden items (P1_HIDDEN_ITEM, P1_HIDDEN_ITEM_DA) that have Item Protection Level set to "Unrestricted".

Other excellent options are using third-party security scanners such as APEXSec and APEX-SERT to alert you to potential issues in your application. We mandate the use of tools like these internally at Oracle for our own applications and they are a great help.

Using the Session State Protection Wizard

One easy step you can take right now is to use the Session State Protection wizard. It gives you a quick overview of what level of protection your application has against URL tampering, and gives an easy way of fixing the relevant attributes in bulk.

You can access the wizard via Shared Components > Session State Protection

Screenshot of part of App Builder Shared Components; under the Security heading, we want to click on "Session State Protection".

Alternatively, you can access the wizard via Edit Application Definition > Security > Session State Protection > Manage Session State Protection

Screenshot of part of the App Builder Edit Security Attributes page, under the Security tab. In the Session State Protection section, we want to click on the button "Manage Session State Protection".

The wizard starts by showing an overview of the current state of your application’s protection against URL tampering.

Screenshot of the App Builder Session State Protection overview page.
It indicates that my application has Session State Protection = Enabled.
It shows that one page is set to "Arguments Must Have Checksum", one page allows "No URL Access", and one page is "Unrestricted".
It indicates that all 9 Page Items in the application are set to "Unrestricted".
It indicates that of the application's Application Items, one is set to "Restricted - May not be set from browser", one is "Checksum Required - Session Level", and one is "Unrestricted".
Next to each category a ">" icon button is shown.
At the bottom of the page is the button "Set Protection".

You can see if your application has Session State Protection enabled (which it should, really), and if any pages, page items, and/or application items are unprotected. In my sample app here, it’s obvious that there are some potential security issues that need to be reviewed.

You can click the > buttons next to each category to list all the pages and items that need to be reviewed.

The main things to watch out for are Pages, Page Items, and Application Items that are set to Unrestricted. Other values are generally fine.

If you see any Items which are set to Checksum Required but not at the Session Level, you may find that a developer has simply set them incorrectly and you should consider changing them to Session Level. However, there are some scenarios where the other levels (Application Level, or User Level) are required.
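
A query along these lines (a sketch, assuming the view reports the same display values used in the query above) lists items using the other checksum levels for review:

select
    i.application_id,
    i.page_id,
    i.item_name,
    i.item_protection_level
from apex_application_page_items i
where i.application_id = :app_id
and i.item_protection_level like 'Checksum Required%'
and i.item_protection_level != 'Checksum Required - Session Level'
order by i.application_id, i.page_id, i.item_name;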

Now, I might go through the application page-by-page and set the protection level on each page and item as appropriate. This could be a laborious process for a large application.

A good alternative is to use this wizard to set the protection level in bulk. In this case, I’m going to click Set Protection.

Screenshot of page 1 of the Session State Protection Wizard.
It is asking me to Select an Action - either "Disable" or "Configure". I have selected "Configure".
Buttons at the bottom of the page allow me to Cancel, or go to the Next page.

I’ve selected the action Configure, then click Next.

Screenshot of page 2 of the Session State Protection Wizard.
This page allows me to select the Page Access Protection (defaulted to "Arguments Must Have Checksum"), the Page Data Entry Item Protection, the Page Display-Only Item Protection, and the Application Item Protection. These last three are all defaulted to "Checksum Required - Session Level".
Buttons at the bottom of the page allow me to go to the Previous page, Cancel, or go to the Next page.

The wizard now gives me the opportunity to modify the protection level on my pages and items in bulk. I’m going to accept the defaults (Arguments Must Have Checksum / Checksum Required – Session Level) because they are appropriate for most cases in my application.

Screenshot of the last page of the Session State Protection Wizard.
This allows me to confirm the changes that will be made to pages and items in the application.
Buttons at the bottom of the page allow me to go to the Previous page, Cancel, or Finish.

After reviewing the summaries of the changes that the wizard will make, I click Finish.

Screenshot of the App Builder Session State Protection overview page, after running the wizard.
The page now indicates that all 3 pages in my application are set to "Arguments Must Have Checksum", and that all 9 Page Items and the 3 Application Items are now set to "Checksum Required - Session Level".

Perfect!

Final Steps

Now, I need to check for any hidden page items that are now restricted but might need to be returned to Unrestricted; otherwise, if a dynamic action has changed them, users will see the error “Session state protection violation” when they submit the page.

Error message “Session state protection violation: This may be caused by manual alteration of protected page item P1_DISPLAY_ONLY_ITEM_DA. If you are unsure what caused this error, please contact the application administrator for assistance.”

The following query will alert me to any Hidden items that have Value Protected switched off (e.g. because they need to be submitted):

select
    i.application_id,
    i.page_id,
    i.page_name,
    i.region,
    i.item_name,
    i.display_as
from apex_application_page_items i
where i.application_id = :app_id
and i.item_protection_level != 'Unrestricted'
and i.display_as_code = 'NATIVE_HIDDEN'
and i.attribute_01 = 'N' -- Value Protected
order by i.application_id, i.page_id, i.region, i.item_name;
Report showing APPLICATION_ID, PAGE_ID, PAGE_NAME, REGION, ITEM_NAME, DISPLAY_AS, listing one entry for "P1_HIDDEN_ITEM_DA", a hidden item whose protection level is not "Unrestricted" but which has attribute_01 (Value Protected) set to "N".

Now I can review this item to check if Value Protected really needed to be switched off. If the page is never submitted, or the item is never changed by any dynamic actions, this could be switched On. Otherwise, I need to set the item protection to Unrestricted in order for the page to work.

Having made changes to the application, I need to test to ensure I haven’t introduced any issues. My focus will be mainly on the following areas:

  1. Navigation – e.g. do the View or Edit buttons in all reports still work?
  2. Dynamic actions – e.g. do all the dynamic actions and custom javascript still work on all pages that set item values?

For #1, I’m looking for any links that include item values that were not correctly built. If the application generates any links using just string concatenation, it will fail if the target page expects a checksum. The application should build these links using declarative link attributes if possible, or by calling apex_page.get_url (or apex_util.prepare_url at least).
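
For example, here is a sketch of building such a link in PL/SQL using apex_page.get_url (the page number and item names are illustrative):

declare
    l_url varchar2(4000);
begin
    -- apex_page.get_url computes and appends the checksum required by the target page
    l_url := apex_page.get_url (
                 p_page   => 10,
                 p_items  => 'P10_ID',
                 p_values => :P1_SELECTED_ID );
    -- e.g. use l_url as the target of a report column link
end;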

For #2, I would test to ensure that after triggering a dynamic action or JavaScript code that modifies an item’s value, the form can still be submitted (saved) without error.

Further Reading

The format of the ROWID

Hemant K Chitale - Sun, 2022-08-07 04:31

 A ROWID is a method of identifying the physical location of a row in an Oracle Database.  An Index on a Table captures the ROWIDs for the rows holding the index key values, and these entries in the Index are how an index lookup redirects a query to the row (i.e. the physical location) in the table.

A ROWID (called an Extended ROWID) consists of 4 components :

  • DataObject Number
  • DataFile Number (relative to the Tablespace)
  • DataBlock Number (within the DataFile)
  • RowNumber (within the DataBlock)
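
These components can be extracted with the dbms_rowid package; here is a quick sketch (the table name MY_TABLE is illustrative):

set serveroutput on

declare
    l_rowid_type    number;  -- 1 = Extended, 0 = Restricted
    l_object_number number;  -- DataObject Number
    l_relative_fno  number;  -- DataFile Number relative to the Tablespace
    l_block_number  number;  -- DataBlock Number within the DataFile
    l_row_number    number;  -- RowNumber within the DataBlock
begin
    for r in (select rowid row_id from MY_TABLE where rownum = 1)
    loop
        dbms_rowid.rowid_info(r.row_id, l_rowid_type, l_object_number,
                              l_relative_fno, l_block_number, l_row_number);
        dbms_output.put_line('DataObject Number : ' || l_object_number);
        dbms_output.put_line('Relative File     : ' || l_relative_fno);
        dbms_output.put_line('Block Number      : ' || l_block_number);
        dbms_output.put_line('Row Number        : ' || l_row_number);
    end loop;
end;
/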


A Partitioned Table actually consists of multiple segments.  Each segment has a different DataObject Number.

Here is a quick demo of the difference between a normal (Non-Partitioned) Table and a Partitioned Table :


The Normal Table :



SQL> create table NONPARTITIONED (id_col number, data_col varchar2(1000)) pctfree 99 tablespace HEMANT_DATA;

Table created.

SQL>
SQL> insert into NONPARTITIONED
2 values (1, dbms_random.string('X',999));

1 row created.

SQL> insert into NONPARTITIONED
2 values (2,'This is the second row');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> select rowid, id_col, substr(data_col,1,24)
2 from NONPARTITIONED
3 order by id_col
4 /

ROWID                  ID_COL SUBSTR(DATA_COL,1,24)
------------------ ---------- ------------------------
AAAT70AAgAAAACTAAA          1 CBXBRIP5ZNQ9VPZNC4HHVJJH
AAAT70AAgAAAACXAAA          2 This is the second row

SQL>
SQL> l
1 select id_col,
2 dbms_rowid.rowid_object(rowid) ObjectNumber,
3 dbms_rowid.rowid_relative_fno(rowid) RelativeFileNumber,
4 dbms_rowid.rowid_block_number(rowid) BlockNumber
5 from NONPARTITIONED
6* order by id_col
SQL> /

    ID_COL OBJECTNUMBER RELATIVEFILENUMBER BLOCKNUMBER
---------- ------------ ------------------ -----------
         1        81652                 32         147
         2        81652                 32         151

SQL>


Because I created the Table with PCTFREE 99 and inserted a long string in the first row, the second row was created in a different block.  Both Blocks are in the same Relative File Number (32) and belong to the same Object (ObjectNumber 81652).  Is this really the Object ID ?



The Partitioned Table :


SQL> l
1 create table PARTITIONED (id_col number, data_col varchar2(1000))
2 partition by range (id_col)
3 (
4 partition P_1 values less than (2) tablespace HEMANT_DATA,
5 partition P_2 values less than (3) tablespace HEMANT_DATA,
6 partition P_3 values less than (4) tablespace HEMANT_DATA,
7 partition P_MAX values less than (MAXVALUE) tablespace HEMANT_DATA
8 )
9* tablespace HEMANT_DATA
SQL> /

Table created.

SQL>
SQL> insert into PARTITIONED
2 values (1, dbms_random.string('X',999));

1 row created.

SQL> insert into PARTITIONED
2 values (2,'This is the second row');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> select rowid, id_col, substr(data_col,1,24)
2 from PARTITIONED
3 order by id_col
4 /

ROWID                  ID_COL SUBSTR(DATA_COL,1,24)
------------------ ---------- ------------------------
AAAT77AAfAAAAJ3AAA          1 RFU3DNMCD6GXL2ZNV9DDGBG2
AAAT78AAfAAAAZ3AAA          2 This is the second row

SQL>
SQL> l
1 select id_col,
2 dbms_rowid.rowid_object(rowid) ObjectNumber,
3 dbms_rowid.rowid_relative_fno(rowid) RelativeFileNumber,
4 dbms_rowid.rowid_block_number(rowid) BlockNumber
5 from PARTITIONED
6* order by id_col
SQL> /

    ID_COL OBJECTNUMBER RELATIVEFILENUMBER BLOCKNUMBER
---------- ------------ ------------------ -----------
         1        81659                 31         631
         2        81660                 31        1655

SQL>


In this case, the two rows are in different Blocks not because of the PCTFREE (which has defaulted to 10) but because they are in different Segments -- as you can see from the ObjectNumbers being different for the two rows.
(You might also have noticed that these rows were created in a separate datafile -- Relative File Number 31 instead of 32, as was the case for the first table -- but that is because Oracle tries to allocate new segments across different datafiles.)

In the ROWID format the ObjectNumber is actually the *Data Object Number* that identifies the  Segment, not the Object Number of the Table.

Thus, to verify the Segments of the two tables, I can query and check :


SQL> l
1 select object_name, subobject_name, object_type, object_id, data_object_id
2 from user_objects
3 where object_name in ('NONPARTITIONED','PARTITIONED')
4 and object_type in ('TABLE','TABLE PARTITION')
5* order by 1, 2 nulls first, 4
SQL> /

OBJECT_NAME      SUBOBJECT_NAME   OBJECT_TYPE              OBJECT_ID DATA_OBJECT_ID
---------------- ---------------- ----------------------- ---------- --------------
NONPARTITIONED                    TABLE                         81652          81652
PARTITIONED                       TABLE                         81658
PARTITIONED      P_1              TABLE PARTITION               81659          81659
PARTITIONED      P_2              TABLE PARTITION               81660          81660
PARTITIONED      P_3              TABLE PARTITION               81661          81661
PARTITIONED      P_MAX            TABLE PARTITION               81662          81662

6 rows selected.

SQL>



Thus, for the NONPARTITIONED Table, the Object_ID, the Data_Object_ID and the ObjectNumber returned by DBMS_ROWID are all the same -- 81652.
But the PARTITIONED Table itself is only a logical entry: it has an Object_ID of 81658 but no segment of its own and, therefore, no Data_Object_ID.
The rows in this Partitioned Table are actually created in two different Partition Segments with the corresponding Data_Object_IDs (81659 and 81660).



We know that when we rebuild a Table, the ROWID changes.  But this is actually because a new Segment is allocated.  

Thus, if I were to do a MOVE of the "Normal" Table :



SQL> alter table NONPARTITIONED move;

Table altered.

SQL>
SQL> l
1 select id_col,
2 dbms_rowid.rowid_object(rowid) ObjectNumber,
3 dbms_rowid.rowid_relative_fno(rowid) RelativeFileNumber,
4 dbms_rowid.rowid_block_number(rowid) BlockNumber
5 from NONPARTITIONED
6* order by id_col
SQL> /

    ID_COL OBJECTNUMBER RELATIVEFILENUMBER BLOCKNUMBER
---------- ------------ ------------------ -----------
         1        81663                 32         155
         2        81663                 32         156

SQL>
SQL> l
1 select object_name, subobject_name, object_type, object_id, data_object_id
2 from user_objects
3 where object_name in ('NONPARTITIONED','PARTITIONED')
4 and object_type in ('TABLE','TABLE PARTITION')
5* order by 1, 2 nulls first, 4
SQL> /

OBJECT_NAME      SUBOBJECT_NAME   OBJECT_TYPE              OBJECT_ID DATA_OBJECT_ID
---------------- ---------------- ----------------------- ---------- --------------
NONPARTITIONED                    TABLE                         81652          81663
PARTITIONED                       TABLE                         81658
PARTITIONED      P_1              TABLE PARTITION               81659          81659
PARTITIONED      P_2              TABLE PARTITION               81660          81660
PARTITIONED      P_3              TABLE PARTITION               81661          81661
PARTITIONED      P_MAX            TABLE PARTITION               81662          81662

6 rows selected.

SQL>


Executing a MOVE of the Non-Partitioned Table resulted in a change of the *Data Object Number* (i.e. DATA_OBJECT_ID) from 81652 to 81663 without changing the OBJECT_ID.



For a couple more interesting aspects of ROWIDs, see the YouTube video "Think you know how the ROWID works? Think again!" by Connor McDonald.




Categories: DBA Blogs

The old old coolness - HTMX | Execute HTTP request directly from HTML tag

Andrejus Baranovski - Sat, 2022-08-06 13:47
Developing a modern Web app with Django and Tailwind? Thinking of using React, Vue, or another fancy JS library for UI implementation? I would recommend spending some time and trying HTMX. With HTMX you can execute partial requests and return HTML content to replace web page regions. HTMX properties are defined directly on HTML tags, which makes it easy to use and flexible.

 

If you're going on to a vocational school to become a pâtissier

The Feature - Fri, 2022-08-05 20:53

Being a pâtissier is a popular profession, so there are many vocational schools that train pâtissiers. How do you find the one that suits you among so many schools? One way to start is by looking into the teaching staff.

This approach is recommended for people who have a clear picture of the kind of pâtissier they want to become. Confectionery covers a wide range (tarts, sponge cakes, mousses and more), so if you know which of these you want to specialise in, the key question is whether the school has an instructor who excels at it. Be aware that brochures sometimes feature famous names who have merely lent their name to the school and are involved in teaching only to the extent of an occasional lecture. Every vocational school holds open campus events, so it is a good idea to check there.

Another approach is to choose by where the practical training takes place. Find out whether a school offers internships in addition to its classroom lessons. Here, too, getting information from current students at an open campus event will give you a clearer picture. If you don't yet have a concrete image of the pâtissier you want to become and are only vaguely interested, take a close look at the extras a school can teach you.

Vocational schools now compete with one another, so many of them promote a distinctive selling point. Some include business-etiquette classes that teach the knowledge you need as a working adult before you become a craftsman. Others let you study café work alongside confectionery. You can find this kind of information in the curriculum.

Assess the schools carefully and choose the one that suits you.

Categories: APPS Blogs


Indexes with or without TRIM

Tom Kyte - Fri, 2022-08-05 15:46
If I have an index, for example, on CD_CLIENT, CD_COMMOD, CD_MERCAD, and there is a query that uses them with TRIM, like so: WHERE TRIM(CD_CLIENT) = ... AND TRIM(CD_COMMOD) = ... AND TRIM(CD_MERCAD) = ... Should I make another index that has the TRIM? Or is that enough? Thank you very much!
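
For an indexed access path on those predicates a function-based index matching the TRIM() expressions would typically be needed, since a plain index on the columns cannot be used to look up TRIM(column) values; a minimal sketch (the table name TRADES is hypothetical):

create index trades_trim_idx on
        trades(trim(cd_client), trim(cd_commod), trim(cd_mercad));
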
Categories: DBA Blogs

Your suggestions on OLAP technology

Tom Kyte - Fri, 2022-08-05 15:46
We are evaluating OLAP technology for implementing some analytics for our product. We are reviewing technologies from Oracle, Microsoft and other vendors. Correct me if I am wrong, but OLAP technology from Oracle looks like an early-stage feature. I got this impression when comparing the features, ease of use etc. with other vendors. Even Microsoft is well in advance in that area. I couldn't see many customer references for Oracle OLAP. Please comment on the following, as you have more expertise with the different technologies and the industry: What are your suggestions on choosing OLAP technology? What are your suggestions on Oracle OLAP? Should we consider it? If yes, why?
Categories: DBA Blogs

NVARCHAR2 > 2000 - MAX_STRING_SIZE

Tom Kyte - Fri, 2022-08-05 15:46
Hello, I inherited a schema that is using nvarchar2. I am trying to recreate that schema but I am struggling to create tables with nvarchar2 columns that have a size of 4000 (> 2000). I have changed max_string_size to "extended" but it does not work. It works for varchar2 with e.g. 32676 bytes. Here are the operations I performed:

sqlplus / as sysdba
ALTER SESSION SET CONTAINER=CDB$ROOT;
ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;
shutdown immediate;
startup upgrade;
ALTER PLUGGABLE DATABASE ALL OPEN UPGRADE;
EXIT;

cd $ORACLE_HOME/rdbms/admin
mkdir -p /home/oracle/tmp/log_utl32k
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS --force_pdb_mode 'UPGRADE' -d $ORACLE_HOME/rdbms/admin -l '/home/oracle/tmp/log_utl32k' -b utl32k_cdb_pdbs_output utl32k.sql

sqlplus / as sysdba
shutdown immediate;
startup;
ALTER PLUGGABLE DATABASE ALL OPEN READ WRITE;
EXIT;

mkdir -p /home/oracle/tmp/utlrp
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS --force_pdb_mode 'READ WRITE' -d $ORACLE_HOME/rdbms/admin -l '/home/oracle/tmp/utlrp' -b utlrp_cdb_pdbs_output utlrp.sql

sqlplus / as sysdba
create table testvc (name varchar2(32676));
>>Table created.
create table testnvc (name nvarchar2(4000));
>>ORA-00910: specified length too long for its datatype

I have checked several resources but I cannot find a solution. Any suggestions are welcome.
Categories: DBA Blogs

drop partition

Jonathan Lewis - Fri, 2022-08-05 14:24

This note is about some testing I did on the consequences of the (new in 12c) “deferred global index maintenance” feature that Oracle introduced as a strategy to reduce the impact of dropping partitions from a partitioned table.

Looking at my notes I see that I created my first test in August 2013 on Oracle 12.1.0.1 – probably after reading Richard Foote’s series on the topic.

At the time I didn’t turn my notes into a blog post, but a recent request on the MOS Community Forum (needs a MOS account) prompted me to revisit and extend the tests using 19c.

  1. The Request
  2. The Background
  3. The Model
  4. Tests and Results
  5. Deep Dive
    1. dbms_space
    2. Tree dumps
    3. Dumping Redo
    4. Transactions and locking
  6. Summary
  7. Footnote

The Request

The database is 19.3, two-node RAC with a standby (type and function not specified). There is a table range-partitioned by month holding nearly three years of data. The table size is about 250GB and the OP wants to drop the partitions older than one year.

There had been an issue doing this on some other environment: when the daily maintenance window ran, “it consumes a lot of CPU but I could not find a link between both activities”.

Is there anything that could be done to avoid any db impact, especially as this is a “production 24” environment?

There was no comment about how many indexes there were (and that’s an important detail), nor how many were global, globally partitioned, or local (also important details), but there was a comment that the pmo_deferred_gidx_main_job was run immediately after the drop, generated a lot of redo, and was still running after 10 hours – so it’s reasonably safe to assume that global indexes had a big impact since that’s the “partition maintenance operations deferred global index maintenance” job.

From the comment about the system being “production 24” I assume that the target is to come up with a strategy that doesn’t deny access to the users for a few hours, has the least possible impact on what they normally do, and doesn’t require the standby database to be (partially) rebuilt / unavailable.

Since this is Oracle 19c and the OP wants to drop nearly two-thirds of the data (i.e. significantly more than he’s keeping) the “obvious” strategy to investigate is dropping the partitions (online, with update indexes) then rebuilding the global (or globally partitioned) indexes (online).
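
In outline, that strategy is (a sketch only; the table, partition and index names here are hypothetical):

-- drop the old partitions; deferred maintenance keeps the global indexes valid
alter table big_table drop partition p_2019_01, p_2019_02 update global indexes;

-- then rebuild each global index online to get rid of the orphaned entries
alter index big_table_gidx rebuild online;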

At a minimum it would be sensible to do some modelling to get some idea of why the other system spent so much time in pmo_deferred_gidx_main_job as this might allow you to work out either that it wouldn’t be a problem in this system, or that there was a variation on the method that would be better, or that you just don’t want to use the job because you’ve got a good idea of just how nasty it would be.

The Background

Deferred index maintenance means that global index maintenance does not take place when you drop a partition. Historically Oracle would, as part of the drop, delete every single index entry for the dropped partition from every single global index – doing a lot of work and taking a lot of time at the moment of the drop. Deferred maintenance means Oracle simply notes which object_ids no longer exist and when reading through a global index ignores index entries where the rowid includes the object_id of a dropped object.

Note: the rowid stored in a B-tree index for a global index is made up of 4 components that require 10 bytes of storage. In order these are: (object_id of table partition, file_id, block_id, row number within block). For a local index or index on a simple heap table the object_id of the table can be inferred from the identity of the index so it is not stored and the rowid needs only 6 bytes of storage.
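
As a quick illustration of that first component, here’s a sketch (using the t1 table built in The Model section below) showing a different data object number for the rows of each partition – which is exactly what the rowid in a global index entry records:

select
        dbms_rowid.rowid_object(rowid)  data_object_number,
        count(*)                        row_count
from    t1
group by
        dbms_rowid.rowid_object(rowid)
order by
        1;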

So the benefit of deferred maintenance is that dropping a partition takes virtually no time at all, but (a) Oracle has to clean up the garbage at some point and (b) until the garbage has been cleaned up it has to be read before it can be ignored.

A thought about the second point – if Oracle can check for dropped object_ids very efficiently then it doesn’t necessarily matter that you haven’t cleaned out the garbage. Your application won’t instantly run more slowly, it’s just that you won’t have achieved the (possible) benefit of a smaller index that might allow the application to run a little faster.

So here’s a hypothesis to explain why the OP’s previous experience of deferred maintenance was very slow : if you update global indexes in real time Oracle does that job as efficiently as possible because it can use key values and rowids from the table segment that it’s dropping to create a “delete array” for the index; if you defer the maintenance Oracle has to walk through the index one entry at a time and work out whether or not to delete it – and we all know that single-row processing is more expensive than array-processing. It’s worth noting that there are notes on MOS to support this hypothesis, e.g. Bug 27468233 : ALTER INDEX COALESCE CLEANUP IS GENERATING HUGE AMOUNT OF REDO reports an example of generating 23GB of redo while cleaning up an index of only 1.8GB. (Version 12.2.0.1)

So let’s build a model and do some simple tests.

The Model

I’m going to build a table with 6 (range) partitions and two global indexes. I’ll set up two very different patterns of data for the two indexes to see how much impact the data pattern might have.

I’ll drop the bottom three partitions, then clean up the mess in a variety of ways. There’s the call to the pmo_deferred_gidx_main_job, which normally runs at 2:00 am daily but can be initiated by a call to dbms_scheduler.run_job; then there’s the dbms_part.cleanup_gidx() procedure that has a couple of options; then there’s a simple call to “alter index coalesce cleanup [only] [parallel N]” (which needs improved documentation) and finally, of course, “alter index rebuild [online]”.
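
For reference, those four options boil down to calls and statements like these (a sketch; each is examined in the tests below):

-- Option 1: run the nightly maintenance job on demand
execute dbms_scheduler.run_job('SYS.PMO_DEFERRED_GIDX_MAINT_JOB', true)

-- Option 2: the cleanup procedure ('CLEANUP_ORPHANS' is the default option, 'COALESCE' the alternative)
execute dbms_part.cleanup_gidx(user, 'T1')

-- Option 3: coalesce an index directly
alter index t1_n1 coalesce cleanup only;
alter index t1_n1 coalesce cleanup;

-- Option 4: rebuild an index
alter index t1_n1 rebuild online;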

For at least some runs of the tests it will be worth enabling SQL_trace to see what happens behind the scenes; and it’s always worth checking the session activity stats – and maybe some activity from some other dynamic performance views as well.

So here’s the code to create the data set:

rem
rem     Script:         12c_global_index_maintenance_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested:
rem             19.11.0.0
rem

create table t1 
partition by range(id) (
        partition p09000 values less than ( 9000),
        partition p18000 values less than (18000),
        partition p27000 values less than (27000),
        partition p36000 values less than (36000),
        partition p45000 values less than (45000),
        partition p54000 values less than (54000)
)
as
select
        rownum - 1                      id,
        trunc(dbms_random.value(0,600)) n1,
        rpad('x',100)                   padding
from
        all_objects ao
where
        rownum <= 54000
;

create index t1_n1 on t1(n1) pctfree 0;
create index t1_id on t1(id) pctfree 0;

select 
        index_name, num_rows, s.blocks, leaf_blocks, status, orphaned_entries 
from 
        user_indexes i, user_segments s 
where 
        i.index_name = s.segment_name 
and     i.table_name='T1' 
and     partitioned = 'NO'
;

alter table t1 drop partition p09000, p18000, p27000 update global indexes; 

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

select 
        index_name, num_rows, s.blocks, leaf_blocks, status, orphaned_entries 
from 
        user_indexes i, user_segments s 
where 
        i.index_name = s.segment_name 
and     i.table_name='T1' 
and     partitioned = 'NO'
;


For testing purposes I’ve set the index pctfree to 0; and I’ve reported some of the index stats before and after dropping the three partitions so that we can see what the optimizer thinks the indexes look like:

Index size information before drop
==================================
INDEX_NAME             NUM_ROWS     BLOCKS LEAF_BLOCKS STATUS   ORP
-------------------- ---------- ---------- ----------- -------- ---
T1_ID                     54000        256         134 VALID    NO
T1_N1                     54000        256         128 VALID    NO


Index size information after drop
=================================
INDEX_NAME             NUM_ROWS     BLOCKS LEAF_BLOCKS STATUS   ORP
-------------------- ---------- ---------- ----------- -------- ---
T1_ID                     27000        256          68 VALID    YES
T1_N1                     27000        256         128 VALID    YES

Both indexes are valid (which is good for the application) and their segment sizes have not changed. The number of rows has halved in both indexes, but the number of (populated) leaf blocks has remained unchanged for one index even though it has halved for the other.

If you dumped a few index leaf blocks the explanation for the changes (and the difference in the changes) would become clear. The number of (non-deleted) index entries in the two indexes has not changed, but Oracle is (almost literally) ignoring half of them – the ones that include the object_ids for the original first three table partitions.

The t1_id index is on the (sequential) id and the table is partitioned by id, and we have dropped the partitions that hold (nothing but) the ids less than 27,000 – in earlier versions of Oracle this would have immediately deleted all the index entries from the first half of the index, leaving all the leaf blocks in the 2nd half of the index full. Although the index entries are still present in those first-half blocks, Oracle behaves as if they don’t exist, which means it treats the blocks as empty when calculating the leaf_blocks statistic. The t1_n1 index is on integer values from 0 to 599, randomly distributed across the full range of ids, so by dropping the partitions for ids less than 27,000 we (ought to) have deleted half of the index entries for n1 = 0, half for n1 = 1 and so on – leaving every index leaf block approximately half empty but still counted in the leaf_blocks statistic.

How, then, does Oracle manage to “ignore” the rows that would have been deleted in older versions. We can always enable SQL tracing when gathering stats, run tkprof against the trace file, and look for the SQL that Oracle used – and if that doesn’t reveal all, use the sql_id of the relevant statements to pull their plans from memory. Here’s the query (reformatted) and plan for one of the index stats gathering queries that I pulled from memory after finding it and its sql_id in the tkprof output:

SQL> select * from table(dbms_xplan.display_cursor('gtnd3aphdkp3k'));

SQL_ID  gtnd3aphdkp3k, child number 0
-------------------------------------
select /*+ 
                opt_param('_optimizer_use_auto_indexes' 'on')
                no_parallel_index(t, "T1_ID")  
                dbms_stats  cursor_sharing_exact  use_weak_name_resl 
                dynamic_sampling(0)  no_monitoring  xmlindex_sel_idx_tbl 
                opt_param('optimizer_inmemory_aware' 'false')
                no_substrb_pad  no_expand index(t, "T1_ID") 
        */ 
        count(*)                                          as nrw,
        count(distinct sys_op_lbid(130418, 'L', t.rowid)) as nlb,
        null                                              as ndk,
        sys_op_countchg(substrb(t.rowid, 1, 15), 1)       as clf 
from
        "TEST_USER"."T1" t where "ID" is not null

Plan hash value: 4265068335

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |   136 (100)|          |
|   1 |  SORT GROUP BY   |       |     1 |    17 |            |          |
|*  2 |   INDEX FULL SCAN| T1_ID | 27000 |   448K|   136   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((TBL$OR$IDX$PART$NUM(<?>,0,8,0,"T".ROWID)=1 AND "ID" IS NOT NULL))

If you’ve read Cost Based Oracle – Fundamentals you’ll recognise the SQL as typical of the pattern Oracle uses to gather stats on an index, with a couple of sys_op() function calls that dissect rowids to allow Oracle to calculate the number of leaf_blocks in, and clustering_factor of, the index. What’s new, though, is the filter() predicate on the index scan that (presumably) is checking that the rowid belongs to a table partition that still exists. (In other circumstances the “<?>” would be the table name; possibly it’s a printing error here. The value 8 as the third parameter also appears in queries involving table expansion with partial indexing.)

Unsurprisingly, if you execute a simple query driven through one of the indexes after dropping partitions you’ll see exactly the same filter() predicate generated for the execution plan for the range scan operation.

SQL_ID  822pfkz83jzhz, child number 0
-------------------------------------
select  /*+ index(t1(n1)) */  id from t1 where n1 = 300

Plan hash value: 2152633691

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |      1 |        |    44 (100)|     40 |00:00:00.01 |      41 |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| T1    |      1 |     45 |    44   (0)|     40 |00:00:00.01 |      41 |
|*  2 |   INDEX RANGE SCAN                         | T1_N1 |      1 |     45 |     1   (0)|     40 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=300)
       filter(TBL$OR$IDX$PART$NUM(<?>,0,8,0,"T1".ROWID)=1)

The tbl$or$idx$part$num() function is an important item to consider at this point. How much impact will it have on your processing? It’s hard to give a generic answer since it may depend on exactly what your data looks like and whether or not the function can cache its result effectively. It’s also possible that the performance of the function is related to either the number of partitions dropped or the number of partitions still in existence – so that’s a detail that probably has to be tested at the correct scale before you go into production.

More significantly, perhaps, is how long that impact is going to be relevant and what savings it has to be balanced against. My thought at this point is that if you drop a partition but don’t clean up the index you reduce your workload by not visiting (possibly cached) table blocks, but you pay for the benefit by calling the function for every index entry you visit (whether or not it would have required you to visit the table prior to dropping the partition). Maybe it’s okay to leave the index uncleaned for a few days or even a few weeks before you take any steps to clean up the mess; if that’s the case then maybe you can spread a relatively large number of clean-up jobs over a long enough period of time that their impact doesn’t become visible to the users.

Tests and Results

Test number 1:

What does pmo_deferred_gidx_main_job do?

I had to connect as a dba and grant alter on this job to my normal test user to be able to do the following test. The default role for my test user had also been granted create job, so that might have been needed as well; and you’ll see that I’ve included a number of my typical “v$ snapshot” procedures to measure different aspects of the workload, and I’ve enabled the 10046 trace to see what Oracle does behind the scenes.

alter system flush buffer_cache;

execute snap_enqueues.start_snap
execute snap_events.start_snap
execute snap_my_stats.start_snap
execute snap_redo.start_snap
execute snap_rollstats.start_snap

alter session set events '10046 trace name context forever, level 8';

execute dbms_scheduler.run_job('SYS.PMO_DEFERRED_GIDX_MAINT_JOB',true)

alter session set events '10046 trace name context off';

execute snap_rollstats.end_snap
execute snap_redo.end_snap
execute snap_my_stats.end_snap
execute snap_events.end_snap
execute snap_enqueues.end_snap

The most significant discovery in this test was that the package (ultimately) executed two SQL statements:

ALTER INDEX "TEST_USER"."T1_ID" COALESCE CLEANUP PARALLEL 1
ALTER INDEX "TEST_USER"."T1_N1" COALESCE CLEANUP PARALLEL 1

So really all I needed from that point onwards was to worry about testing and/or optimising variations of the “alter index coalesce” command.

Test number 2:

What does the procedure dbms_part.cleanup_gidx() do?

This procedure takes a schema name and table name as its first inputs with defaults null and has two other parameters called parallel (default 0) and options (default ‘CLEANUP_ORPHANS’); the only other value you can supply for options is ‘COALESCE’. Again the 10046 trace of my wrapper was very useful, as this showed the following SQL when I specified just the schema and table name for the call:

        ALTER INDEX "TEST_USER"."T1_ID" COALESCE CLEANUP ONLY
        ALTER INDEX "TEST_USER"."T1_N1" COALESCE CLEANUP ONLY

The 19c manual does mention the “ONLY” keyword, but doesn’t explain its significance (but I will in a moment). If I re-ran the test with options set to ‘COALESCE’ the SQL statements changed to:

        ALTER INDEX "TEST_USER"."T1_ID" COALESCE CLEANUP
        ALTER INDEX "TEST_USER"."T1_N1" COALESCE CLEANUP

This did more work than the “cleanup only” run (figures in a moment). When I re-ran the tests setting the parallel option to 2 the same SQL statement appeared with “PARALLEL 2”.

So here are some of the most important numbers for the calls to dbms_part.cleanup_gidx(). They are the headline undo and redo figures from the session statistics:

Default: (cleanup_orphans)
==========================
Name                                       Value
----                                       -----
redo entries                              54,283
redo size                             12,412,092
undo change vector size                4,762,432

options=>"Coalesce"
====================
Name                                       Value
----                                       -----
redo entries                              57,427
redo size                             15,518,660
undo change vector size                6,466,264

As you consider those figures, let me remind you that the indexes started out at roughly 1MB each, and we dropped 27,000 rows in three partitions. A quick check of the “cleanup orphans” arithmetic and you can see (approx) 54,000 redo entries = (two indexes * 27,000 rows each), and 12.4M redo size / 54K redo entries = 230 bytes per index entry. I’ve highlighted that last result because that’s a number you can use as a baseline to estimate the redo that will be generated by cleaning up global indexes. How many rows will be dropped from the table, how many global indexes have you got on the table – multiply the two together and multiply by 230.
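
As a worked example of that baseline (illustrative numbers only): dropping 100 million rows from a table with three global indexes suggests roughly 100e6 * 3 * 230 bytes of cleanup redo, i.e. about 64GB:

select round(100e6 * 3 * 230 / power(1024,3)) estimated_redo_gb from dual;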

Of course, that’s just for “cleanup only” and it assumes that every row appears in every index (which, in a well engineered system, won’t be true). Where does the extra 3MB of redo come from? Let’s drop down one more level in the processing, and run explicit “alter index” statements through the test harness.

Test number 3:

What does alter index xxx coalesce cleanup [only] do?

Here are the redo and undo summaries from two sets of tests – “coalesce cleanup only” and “coalesce cleanup”:

Index t1_n1                         Cleanup only            Cleanup
-----------                         -------------------------------
redo entries                              27,139             28,176
redo size                              6,209,140          8,930,900
undo change vector size                2,382,564          3,999,900

Index t1_id                         Cleanup only            Cleanup
-----------                         -------------------------------
redo entries                              27,144             28,966
redo size                              6,202,436          6,547,460
undo change vector size                2,379,868          2,461,908

For “coalesce cleanup only” the workload for the two indexes is (effectively) identical – it’s basically the undo and redo from marking 27,000 index entries as deleted and doing nothing else. The blocks have not been cleaned up in any way; that will be left to future sessions which need to insert an entry into a block and find that it is full but has lots of space that can be reclaimed from deleted entries.

When we use the “coalesce cleanup” (i.e. without “only”) Oracle does some extra work, but the amount of work varies significantly depending on the nature of the index: t1_n1 generates an extra 2.7MB of redo, index t1_id generates only another 345KB. That may be a little surprising, but we’ve already had a clue that something like this might happen, and since every other strategy for “cleaning” the indexes comes down to running these variations of the coalesce command we should look a little further into what they do and how they work.

To get a complete picture we’ll have to do some work with the dbms_space package, the index treedump command, and dumping redo; we also ought to take a look at v$rollstat and v$enqueue_stat. We’ll pursue that in the Deep Dive section.

Test number 4:

What does alter index rebuild online do?

There’s a very important point to check in this test – if your database is in noarchivelog mode the rebuild will be nologging, and you’ll be fooled by the apparent efficiency of the mechanism right up to the point where you go to production and find that you’re generating a huge amount of redo. For the record, my indexes were roughly 64 blocks (512KB) each when rebuilt and produced the following redo figures (and virtually no undo):
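
The statements being timed here are simply of the form:

alter index t1_n1 rebuild online;
alter index t1_id rebuild online;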

Index t1_n1
-----------
redo entries                                 343
redo size                                594,984
redo size for direct writes              527,148
undo change vector size                   18,528
sorts (rows)                              27,058

Index t1_id
-----------
redo entries                                 345
redo size                                625,956
redo size for direct writes              560,092
undo change vector size                   18,096
sorts (rows)                              27,022

I’ve included the sorts statistic as a reminder that there are other potentially nasty overheads to consider. When you do an online rebuild Oracle will have to lock the table briefly to create a journal table (effectively a materialized view log) that captures the changes occurring while the rebuild is running and applies them when the rebuild is nearly complete; and the rebuild has to be based on a tablescan and sort.

Depending on what fraction of the partitions you are dropping, though, this does look like a very promising option – especially when you have to cater for the problem of shifting the redo to a remote site.

Deep Dive

We have seen how much redo was generated for both “coalesce cleanup” and “coalesce cleanup only” and have an idea that we know what’s happening, so we will be taking a look at some redo dumps to see if that confirms our suspicion. Before we get to that extreme, though, it’s worth taking a couple of simpler steps.

The dbms_space package

We can use the dbms_space.space_usage() procedure to see the state of blocks in the two index segments before and after the attempts to clean up. It’s important to remember that for index segments the procedure uses the “FS2” state to report blocks that are “free”, i.e. formatted but containing no data; such blocks aren’t in the index structure and can be linked into the index when an existing block needs to be split.
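
For reference, a sketch of the call that produces figures like the ones below for one of the indexes (the output formatting of the tables is mine):

declare
        l_unf_blocks  number;   l_unf_bytes  number;
        l_fs1_blocks  number;   l_fs1_bytes  number;
        l_fs2_blocks  number;   l_fs2_bytes  number;
        l_fs3_blocks  number;   l_fs3_bytes  number;
        l_fs4_blocks  number;   l_fs4_bytes  number;
        l_full_blocks number;   l_full_bytes number;
begin
        dbms_space.space_usage(
                segment_owner      => user,
                segment_name       => 'T1_N1',
                segment_type       => 'INDEX',
                unformatted_blocks => l_unf_blocks,  unformatted_bytes => l_unf_bytes,
                fs1_blocks         => l_fs1_blocks,  fs1_bytes         => l_fs1_bytes,
                fs2_blocks         => l_fs2_blocks,  fs2_bytes         => l_fs2_bytes,
                fs3_blocks         => l_fs3_blocks,  fs3_bytes         => l_fs3_bytes,
                fs4_blocks         => l_fs4_blocks,  fs4_bytes         => l_fs4_bytes,
                full_blocks        => l_full_blocks, full_bytes        => l_full_bytes
        );
        dbms_output.put_line('Freespace 2 (25 - 50% free): ' || l_fs2_blocks || ' / ' || l_fs2_bytes);
        dbms_output.put_line('Full                       : ' || l_full_blocks || ' / ' || l_full_bytes);
end;
/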

Here are three sets of results from calls to the procedure showing the state immediately before (and after) the partitions were dropped, then after a “coalesce cleanup only” test, and then after a “coalesce cleanup” test.

Index T1_N1                   Blocks       Bytes    |  After "Cleanup Only" |   After "Cleanup" 
----------------------------------------------------|-----------------------|-------------------
Unformatted                   :    0 /         0    |      0 /         0    |      0 /         0
Freespace 1 (  0 -  25% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 2 ( 25 -  50% free) :    1 /     8,192    |      1 /     8,192    |     67 /   548,864
Freespace 3 ( 50 -  75% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 4 ( 75 - 100% free) :    0 /         0    |      0 /         0    |      0 /         0
Full                          :  134 / 1,097,728    |    134 / 1,097,728    |     68 /   557,056


Index T1_ID                   Blocks       Bytes    |  After "Cleanup Only" |  After "Cleanup"
----------------------------------------------------|-----------------------|-------------------
Unformatted                   :    0 /         0    |      0 /         0    |      0 /         0
Freespace 1 (  0 -  25% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 2 ( 25 -  50% free) :    1 /     8,192    |      1 /     8,192    |     65 /   532,480
Freespace 3 ( 50 -  75% free) :    0 /         0    |      0 /         0    |      0 /         0
Freespace 4 ( 75 - 100% free) :    0 /         0    |      0 /         0    |      0 /         0
Full                          :  128 / 1,048,576    |    128 / 1,048,576    |     64 /   524,288


There are two key points in these figures:

  • cleanup only doesn’t change the state of any space usage information.
  • cleanup doesn’t do anything to release space back to the tablespace; it simply compacts the data into a smaller number of blocks and tags empty blocks as “free” (taking them out of the index structure, though that’s not visible in the summary from dbms_space.space_usage).

Index treedumps

If we want to find out how many of the FS2 blocks are in the index structure and how many have been taken out of the tree then we’ll have to do a tree dump – get the object_id of the index and issue:

alter session set events 'immediate trace name treedump level {object id}';
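
For example, putting the two steps together in SQL*Plus (a small sketch):

column object_id new_value m_object_id
select object_id from user_objects where object_name = 'T1_N1';

alter session set events 'immediate trace name treedump level &m_object_id';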

Here are the first few lines of the tree dump for the t1_n1 index in the same three states, created, after cleanup only, after cleanup:

Immediately after drop
branch: 0x9000684 150996612 (0: nrow: 128, level: 1)
   leaf: 0x9000685 150996613 (-1: row:449.449 avs:8)
   leaf: 0x9000686 150996614 (0: row:444.444 avs:4)
   leaf: 0x9000687 150996615 (1: row:444.444 avs:4)
   leaf: 0x9000688 150996616 (2: row:444.444 avs:4)
   leaf: 0x9000689 150996617 (3: row:444.444 avs:4)
   leaf: 0x900068a 150996618 (4: row:444.444 avs:4)
   leaf: 0x900068b 150996619 (5: row:444.444 avs:4)

After "Cleanup only"
branch: 0x9000684 150996612 (0: nrow: 128, level: 1)
   leaf: 0x9000685 150996613 (-1: row:449.214 avs:8)
   leaf: 0x9000686 150996614 (0: row:444.216 avs:4)
   leaf: 0x9000687 150996615 (1: row:444.216 avs:4)
   leaf: 0x9000688 150996616 (2: row:444.226 avs:4)
   leaf: 0x9000689 150996617 (3: row:444.216 avs:4)
   leaf: 0x900068a 150996618 (4: row:444.206 avs:4)
   leaf: 0x900068b 150996619 (5: row:444.231 avs:4)
   leaf: 0x900068c 150996620 (6: row:444.201 avs:4)
 
After "Cleanup"
branch: 0x9000684 150996612 (0: nrow: 64, level: 1)
   leaf: 0x9000685 150996613 (-1: row:446.446 avs:16)
   leaf: 0x9000688 150996616 (0: row:444.444 avs:4)
   leaf: 0x900068a 150996618 (1: row:444.444 avs:4)
   leaf: 0x900068c 150996620 (2: row:444.444 avs:4)
   leaf: 0x900068e 150996622 (3: row:444.444 avs:4)
   leaf: 0x9000690 150996624 (4: row:444.444 avs:4)
   leaf: 0x9000692 150996626 (5: row:444.444 avs:4)
   leaf: 0x9000694 150996628 (6: row:444.444 avs:4)
   leaf: 0x9000696 150996630 (7: row:444.444 avs:4)
   leaf: 0x9000698 150996632 (8: row:444.444 avs:4)
   leaf: 0x900069a 150996634 (9: row:421.421 avs:12)

This is the index where the values 0 to 599 have been spread randomly across the 54,000 different id values, so each n1 value appears in roughly 90 rows – of which we’ve deleted about 45 by dropping the bottom 3 partitions of 6. For leaf blocks “row:X.Y” means there are X rows in the block directory of which Y would remain after rows marked as deleted had been removed; “avs:N” shows N bytes of free space in the block. The t1_n1 index can fit 444 index entries per leaf block and we can see that immediately after the “drop partition” none of them has been marked as deleted. After a “cleanup only”, however, we can see that (as expected) roughly half the rows in every leaf block have been marked as deleted, with half remaining. After a “cleanup” we can see that we’re back to 444 rows per leaf block with no deletions and virtually no freespace.

Notice, however, the way that the leaf block addresses have changed during the cleanup. If we examine just the last 3 digits of the decimal version of the leaf block addresses we start with:

613, 614, 615, 616, 617, 618, 619, 620

but we end with:

613, 616, 618, 620 ..

Effectively, Oracle has “copied back” index entries from block 614 to block 613, copied the remaining rows forward from 614 and all the rows from 615 to 616, copied the “overflow” rows from 616 and all the rows from 617 into 618, and so on, detaching empty blocks from the index structure as they appear. (It’s not likely that Oracle thinks in terms of “copying forward/back”; it’s more likely that Oracle simply reads through the index in order, constructing new leaf blocks in private memory, with a simple algorithm for deciding which block to replace, e.g. “the most recent leaf block that does not have any overflow to use up”.)

If we now examine the tree dumps from index t1_id we can see why the volume of redo differs on the final “cleanup” phase of the code.

Immediately after drop
branch: 0x9000784 150996868 (0: nrow: 134, level: 1)
   leaf: 0x9000785 150996869 (-1: row:426.426 avs:7)
   leaf: 0x9000786 150996870 (0: row:421.421 avs:1)
   leaf: 0x9000787 150996871 (1: row:421.421 avs:1)
   leaf: 0x9000788 150996872 (2: row:421.421 avs:1)
   leaf: 0x9000789 150996873 (3: row:421.421 avs:2)
   leaf: 0x900078a 150996874 (4: row:421.421 avs:1)
   leaf: 0x900078b 150996875 (5: row:421.421 avs:1)
...
   leaf: 0x90007c6 150996934 (64: row:400.400 avs:0)
   leaf: 0x90007c7 150996935 (65: row:400.400 avs:0)
   leaf: 0x90007c8 150996936 (66: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (67: row:400.400 avs:0)
   leaf: 0x90007ca 150996938 (68: row:400.400 avs:0)
   leaf: 0x90007cb 150996939 (69: row:400.400 avs:0)

After "Cleanup only"
branch: 0x9000784 150996868 (0: nrow: 134, level: 1)
   leaf: 0x9000785 150996869 (-1: row:426.0 avs:7)
   leaf: 0x9000786 150996870 (0: row:421.0 avs:1)
   leaf: 0x9000787 150996871 (1: row:421.0 avs:1)
   leaf: 0x9000788 150996872 (2: row:421.0 avs:1)
   leaf: 0x9000789 150996873 (3: row:421.0 avs:2)
   leaf: 0x900078a 150996874 (4: row:421.0 avs:1)
   leaf: 0x900078b 150996875 (5: row:421.0 avs:1)
...
   leaf: 0x90007c6 150996934 (64: row:400.0 avs:0)
   leaf: 0x90007c7 150996935 (65: row:400.303 avs:0)
   leaf: 0x90007c8 150996936 (66: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (67: row:400.400 avs:0)

After "Cleanup"
branch: 0x9000784 150996868 (0: nrow: 68, level: 1)
   leaf: 0x90007c7 150996935 (-1: row:303.303 avs:1940)
   leaf: 0x90007c8 150996936 (0: row:400.400 avs:0)
   leaf: 0x90007c9 150996937 (1: row:400.400 avs:0)
   leaf: 0x90007ca 150996938 (2: row:400.400 avs:0)
   leaf: 0x90007cb 150996939 (3: row:400.400 avs:0)
   leaf: 0x90007cc 150996940 (4: row:400.400 avs:0)
   leaf: 0x90007cd 150996941 (5: row:400.400 avs:0)

I’ve shown two sections of the treedump for the first two extracts: the start of the index, and the start of the “2nd half” of the index where the id values are in the partitions that we kept. You’ll notice that the number of index entries per leaf block drops from 426 to 400 as we move through the index; that’s just the effect of a sequential id generally getting bigger (42 takes 2 bytes, 42,000 takes 3 bytes, 42,042 takes 4 bytes).

After “cleanup only” all the leaf blocks in the first half of the index show “no rows remaining of 4xx”, while all the leaf blocks in the 2nd half of the index show “400 rows remaining of 400” – with the exception of the leaf block numbered 65 at address 0x90007c7, which shows 303 rows remaining of 400; that must be the block that held the highest few rows from partition p27000.

After the final cleanup we can see that this “mid-point” leaf block has become the “low value” leaf block, and the rest of the index leaf blocks look as if they are completely unchanged. (I think we can assume that the “copy forward/back” code caters for a few boundary conditions that (e.g.) stop Oracle from doing something silly with leaf blocks that are already completely full.)

In this case, then, we can guess that Oracle has simply removed 66 leaf blocks (blocks -1 to 64) from the index structure and reconnected block number 65 as the starting block. In the previous case the “cleanup” disconnected a similar number of leaf blocks, but also rewrote all the blocks that were kept.

Dumping Redo

The information we examined so far prompts us to ask two significant questions:

  • Moving from the “cleanup only” to the “cleanup” increased the redo by 345KB in the case of the sequential t1_id index, but by 2.7MB in the case of the “random arrival” t1_n1 index. In both cases the number of leaf blocks to be relinked is very similar, so the difference seems to be due to the work done in compacting roughly 128 leaf blocks down to 64 leaf blocks – how does that generate (2700KB – 345KB)/64 = 36KB per “new” block? We might also wonder why “just” relinking 66 blocks appears to take 345K/66 = 5K per block anyway – that seems a little surprising.
  • For the full cleanup operation does Oracle delete the entries from a few consecutive blocks, compact them and write them and then carry on with the next few blocks; or does it walk the entire index deleting entries and then walk the index again to tidy up and compact (logically) adjacent blocks. If the latter then for a very large index (rather than my tiny test) that means we could see two consecutive full scans physically reading the whole index, and possibly generating more redo thanks to delayed block cleanout.

If we dump the redo log we can answer these questions by extracting some fairly simple details – although the resulting trace file is going to be rather large. For example we could code:

column current_scn new_value start_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter index t1_n1 coalesce cleanup;

column current_scn new_value end_scn
select to_char(current_scn,'9999999999999999') current_scn from v$database;

alter session set tracefile_identifier='n1_index';
alter system dump redo scn min &start_scn scn max &end_scn /* layer 10 */;
alter session set tracefile_identifier='';


I’ve included a commented-out “layer 10” in the dump command to show how you can be selective in what redo you dump. Layer 10 is the set of redo op codes for index-related change vectors. You will find other op codes (in particular 5.1) being dumped as well, because Oracle dumps the whole of any redo record containing a change vector of the specified type.
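
To use it, the option simply goes inline at the end of the command – a sketch, reusing the substitution variables captured above:

alter session set tracefile_identifier='n1_index_layer10';
alter system dump redo scn min &start_scn scn max &end_scn layer 10;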

When I dumped the redo for a single index cleanup the trace file was about 45MB – so not something you would want to read in detail – but you could start with a few simple searches, for example:

grep -n "OP:10"       or19_ora_13388_n1_index.trc >temp_n1_op10.txt
grep -n "REDO RECORD" or19_ora_13388_n1_index.trc >temp_n1_record.txt

grep "OP:"  or19_ora_13388_n1_index.trc | sed "s/^.*OP:/OP:/" | sed "s/ .*$//" | sort | uniq -c | sort -n

The first grep example simply extracts all the index-related op-codes (with line numbers from the trace file to make it easier to spot patterns). The second grep does the same for the start of each redo record, because those lines also report the length of each record, which may make it possible to find out more about the surprising amount of redo generated by the compacting and relinking. The third example is just a bit of showing off: it extracts all the op-code lines, cuts out the actual OP:nn.nn bit, then sorts and counts the number of appearances of each op-code. Here’s the result of that last example from one of the tests:

      1 OP:13.24
      1 OP:17.28
      1 OP:24.1
      2 OP:11.5
      3 OP:10.11
      3 OP:11.3
     15 OP:10.14
     18 OP:14.4
     18 OP:22.5
     28 OP:10.7
     29 OP:5.11
     36 OP:22.2
     64 OP:13.22
     68 OP:10.34
     82 OP:10.39
    138 OP:10.8
    205 OP:5.4
    219 OP:10.6
    259 OP:4.1
    324 OP:10.5
    347 OP:5.6
    689 OP:5.2
  27327 OP:10.4
  27833 OP:5.1

It’s an obvious guess that the 27,327 OP:10.4 at the bottom are “delete index entry” records and that most of the OP:5.1 are their corresponding undo change vectors. The OP:10.4 count is a little high, but the session activity stats showed rollback changes – undo records applied = 376, so some internal transaction rollback took place, and the session stats are a reasonable match for the “excess” OP:10.4, suggesting that at some point a batch of deletes rolled back and restarted. (This also suggests that the entire operation executes as a number of relatively small transactions and could safely be interrupted – a hypothesis supported by the 205 OP:5.4 (commit/rollback change vectors).) The “excess” OP:10.4 are also closely matched by the OP:10.5 / OP:5.6 figures (restore leaf during rollback / mark user undo record applied).
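
For reference, the session activity stats mentioned here can be picked up from the session running the coalesce with a query like:

select sn.name, ms.value
from   v$statname sn, v$mystat ms
where  ms.statistic# = sn.statistic#
and    sn.name in (
           'rollback changes - undo records applied',
           'transaction rollbacks'
       );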

Before chasing any other details let’s answer the question of whether the compacting takes place during the delete phase or starts only after the deletes are all done. If we jump to the end of the “OP:10” file we can check whether there are OP:10.4 all the way through the file with small patches of other stuff scattered through it, or whether it’s non-stop OP:10.4 with a load of “something else” appearing only in the last couple of thousand lines. The answer is that we get regular repeats of a pattern like the following:

OP:10.4 x ~400
OP:10.6 
OP:10.6 
OP:10.6 
OP:10.39 
OP:10.8 
OP:10.34 
OP:10.8 

For example (after lots of 10.4 on blocks 0x9000695 and 0x9000696) we see:

119184:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119210:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000695 OBJ:131045 SCN:0x0000000002165407 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119236:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000694 OBJ:131045 SCN:0x0000000002165405 SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000
119272:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.39 ENC:0 RBL:0 FLG:0x0000
119625:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000694 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
120499:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000695 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:10.34 ENC:0 RBL:0 FLG:0x0000
120862:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x0000000002165409 SEQ:2 OP:10.8 ENC:0 RBL:0 FLG:0x0000

This translates to:

10.6:  lock block 0x09000696
10.6:  lock block 0x09000695
10.6:  lock block 0x09000694
10.39: update branch block 0x09000684
10.8:  new block 0x09000694
10.34: empty block 0x09000695
10.8:  new block 0x09000696

The pattern then repeats starting with deletes from block 0x09000697 and 0x09000698, and then producing an interesting detail:

DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.6 
DBA:0x09000697 OBJ:131045 SCN:0x000000000216540d SEQ:1 OP:10.6 
DBA:0x09000696 OBJ:131045 SCN:0x000000000216540b SEQ:1 OP:10.6 
DBA:0x09000684 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.39 
DBA:0x09000696 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.8 
DBA:0x09000697 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.34 
DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:2 OP:10.8 

Check line 5 above – it’s another OP:10.8 creating another version of block 0x09000696. And that’s one of the reasons why the compaction creates more redo than expected. Oracle may recreate a block several times before the block gets to its final compacted state.

Looking at the detail of the trace file, in particular the pattern of deletes followed by “new block”, it looks as if Oracle deletes all the rows from just two adjacent blocks at a time (perhaps to minimise block-level locking) and then does the best it can with the rows that are left. This may mean (as it does in our fragment) writing one full block and one partial block, though for a sparsely populated index it might mean writing just a single partial block.

To show the total effect on redo generation I’ve extracted the redo records for a complete cycle of the pattern (excluding the several hundred deletes, which generate 228 bytes each). I’ve used egrep to pick out three patterns: “OP:”, “REDO RECORD” and (using hindsight) “new block has”.
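
The command was something like this, producing the extract that follows:

egrep -n "OP:|REDO RECORD|new block has" or19_ora_13388_n1_index.trc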

132650:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e90.0134 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
132652:CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540e SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

132656:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e90.018c LEN: 0x0164 VLD: 0x01 CON_UID: 3792595
132658:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x00000000021653f8 SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
132661:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132682:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132691:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e91.0100 LEN: 0x00e4 VLD: 0x01 CON_UID: 3792595
132693:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:2 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132708:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000697 OBJ:131045 SCN:0x000000000216540d SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132717:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e91.01e4 LEN: 0x00e4 VLD: 0x01 CON_UID: 3792595
132719:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:3 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132734:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x000000000216540b SEQ:1 OP:10.6 ENC:0 RBL:0 FLG:0x0000

132743:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e92.00d8 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
132745:CHANGE #1 CON_ID:3 TYP:2 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x000000000216540a SEQ:1 OP:4.1 ENC:0 RBL:0 FLG:0x0000

132749:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e92.0130 LEN: 0x0128 VLD: 0x01 CON_UID: 3792595
132751:CHANGE #1 CON_ID:3 TYP:0 CLS:18 AFN:17 DBA:0x04406fdf OBJ:4294967295 SCN:0x000000000216540f SEQ:4 OP:5.1 ENC:0 RBL:0 FLG:0x0000
132770:CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000684 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.39 ENC:0 RBL:0 FLG:0x0000

132779:REDO RECORD - Thread:1 RBA: 0x0002c8.00046e93.0068 LEN: 0x4038 VLD: 0x01 CON_UID: 3792595
132781:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
132784:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe0 OBJ:4294967295 SCN:0x000000000216540f SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
133123:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000696 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.8 ENC:0 RBL:0 FLG:0x0000
133139:new block has 444 rows

133653:REDO RECORD - Thread:1 RBA: 0x0002c8.00046eb5.0010 LEN: 0x20e8 VLD: 0x05 CON_UID: 3792595
133656:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x000000000216540f SEQ:2 OP:5.2 ENC:0 RBL:0 FLG:0x0000
133659:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe1 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
133998:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000697 OBJ:131045 SCN:0x000000000216540f SEQ:1 OP:10.34 ENC:0 RBL:0 FLG:0x0000

134007:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ec6.0010 LEN: 0x0064 VLD: 0x01 CON_UID: 3792595
134009:CHANGE #1 CON_ID:3 TYP:0 CLS:8 AFN:36 DBA:0x09000680 OBJ:131045 SCN:0x0000000002165409 SEQ:1 OP:13.22 ENC:0 RBL:0 FLG:0x0000

134017:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ec6.0074 LEN: 0x3a74 VLD: 0x01 CON_UID: 3792595
134019:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
134022:CHANGE #2 CON_ID:3 TYP:1 CLS:18 AFN:17 DBA:0x04406fe2 OBJ:4294967295 SCN:0x0000000002165410 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
134361:CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:36 DBA:0x09000698 OBJ:131045 SCN:0x000000000216540f SEQ:2 OP:10.8 ENC:0 RBL:0 FLG:0x0000
134377:new block has 362 rows

134799:REDO RECORD - Thread:1 RBA: 0x0002c8.00046ee4.00c8 LEN: 0x0058 VLD: 0x01 CON_UID: 3792595
134801:CHANGE #1 CON_ID:3 TYP:0 CLS:17 AFN:17 DBA:0x044001c0 OBJ:4294967295 SCN:0x0000000002165410 SEQ:2 OP:5.4 ENC:0 RBL:0 FLG:0x0000

This shows a complete transaction (the previous redo record is the commit – OP:5.4 – for the preceding few hundred deletes). Five lines in the output are worth picking out: the three redo record headers that report the most space, and the two lines that show how many rows are in the “newed” blocks generated by op code OP:10.8.

  • Line 24 – LEN = 0x4038 = 16,440 bytes. That’s basically the size of two blocks: the OP:5.1 is the old image (8KB) of block 0x9000696, and the OP:10.8 is the new image (also 8KB) of the block.
  • Line 30 – LEN = 0x20e8 = 8,424 bytes. That’s basically one block image: the OP:10.34 is “clear block 0x09000697”, which requires very little redo, but the OP:5.1 is the old image (8KB) of the block.
  • Line 38 – LEN = 0x3a74 = 14,964 bytes. Again that’s basically the size of two blocks: the OP:5.1 is the old image (8KB) of block 0x9000698, and the OP:10.8 is the new image of the block, but it holds only 362 rows (of a final 444), so the dump is only about 6.5KB.

All the other bits add up to about 1,200 bytes of redo, so each cycle of the compacting activity generates about 40KB of redo in total. Since we end up with 68 filled blocks, the total “extra” redo from switching from “coalesce cleanup only” to “coalesce cleanup” should be around 68 * 40KB = 2.65MB, which is pretty close to what we actually saw for the t1_n1 index.

Transactions and Locking

I commented on the presence of the session statistic “rollback changes – undo records applied”, and mentioned the presence of the OP:5.4 records in the redo. These are all pointers to the coalesce command operating as a series of smaller transactions rather than as one large transaction, and even before I had started to dump and examine the redo I had added v$rollstat and v$enqueue_stat to my usual snapshot of dynamic performance views. Here are the results from a typical “coalesce cleanup” test on the t1_n1 index.
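
The figures are deltas between snapshots taken before and after the test; a minimal sketch of the underlying queries (with abbreviated column lists) might be:

select usn, extents, writes, gets, waits, shrinks, hwmsize, optsize
from   v$rollstat
order by usn;

select eq_type, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time
from   v$enqueue_stat
where  total_req# > 0;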

---------------------------------
Rollback stats
---------------------------------
USN   Ex Size K  HWM K  Opt K      Writes     Gets  Waits Shr Grow Shr K  Act K
----  -- ------  -----  -----      ------     ----  ----- --- ---- ----- ------
   0   0      0      0      0           0        2      0   0    0     0      0
   1   0      0      0      0      382184       85      0   0    0     0      0
   2   0      0      0      0      412930       91      0   0    0     0     61
   3   0      0      0      0      404260       90      0   0    0     0      0
   4   0      0      0      0      413412       91      0   0    0     0   -106
   5 -13  -1792      0      0      341294       99      0   2    0    24    -71
   6   0      0      0      0      355986       81      0   0    0     0      0
   7   0      0      0      0      444054       97      0   0    0     0      0
   8 -12  -1728      0      0      418474      114      1   2    0    24    -45
   9   0      0      0      0      422776       92      0   0    0     0     38
  10   0      0      0      0      426976       94      0   0    0     0      0


----------------------------------
System enqueues
----------------------------------
Type    Requests       Waits     Success      Failed    Wait m/s Reason
----    --------       -----     -------      ------    -------- ------
KI             3           0           3           0           0 contention
CR           250          24         250           0          12 block range reuse ckpt
IS            25           0          25           0           0 contention
TM             8           0           8           0           0 contention
TA             2           0           2           0           0 contention
TX           198           0         198           0           0 contention
US             8           0           8           0           0 contention
HW             8           0           8           0           0 contention
TT             4           0           4           0           0 contention
SJ             2           0           2           0           0 Slave Task Cancel
CU             2           0           2           0           0 contention
OD             1           0           1           0           0 Serializing DDLs

-------------------------------------
System REDO stats
-------------------------------------
Name                                       Value
----                                       -----
redo entries                              27,977
redo size                              8,853,604
undo change vector size                3,967,100

I have 10 undo segments online, and each has received about 400KB of undo, which comes to a total of 4MB – a very good match for the “undo change vector size” reported for the session. (I didn’t get any system-generated rollbacks in this run.)

We can also see 198 TX enqueues – a very good match for the 196 OP:5.4 that I found in the session’s redo dump for this test.

As I commented in the section on redo generation, to handle the “coalesce cleanup” the session walks the index in leaf block order and (with variations depending on circumstances such as finding leaf blocks with nothing to delete, or leaf blocks that become empty after all the deletes have been done):

  • locks two leaf blocks
  • deletes the dropped rows
  • commits
  • packs the outstanding rows downwards
  • relinks leaf blocks and adjusts branch blocks as necessary
  • commits
  • repeats until end of index

(Note – “cleanup only” seems to lock, delete from, and commit a single block at a time, and doesn’t do the packing.)

This strategy has two side-effects. First, though I don’t think it’s documented, it looks as if you could simply kill the process if it started putting too much stress on your system. The worst that could happen is that the current (small) transaction would be rolled back, while all the work done up to that moment would persist. I have actually tested this, managing to kill sessions while they were in the middle of deleting a batch (though I never managed to catch an example where a session was in the middle of relinking). When the coalesce was restarted, Oracle simply picked up where it had left off after the previous (small) rollback.
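
For reference, a sketch of the standard kill command (the sid and serial# here are placeholders you would look up in v$session):

alter system kill session '123,45678';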

The second side-effect is another possible overhead. Partitioned tables tend to be big, and each index clean-up is likely to be a fairly big job generating a lot of redo and undo. How much impact could the undo have on your system? This depends partly on your undo retention, how long the clean-up takes, and the risk of other sessions running into ORA-01555 errors. In particular there’s an odd problem of undo segment extension causing updates to restart: if an update statement (or delete, or merge with update clause) causes an undo segment to extend, the statement will roll back and restart using a different locking strategy.

I don’t know if something of this sort would happen with the deletes from an index coalesce – and the deletes are so small that it might not matter – but the coalesce executes a large number of transactions in a relatively short time, rotating around and gradually filling every available undo segment, which means other large updates are more likely to need to extend an undo segment.

It’s also worth noting that overlapping jobs will sometimes need to do a lot of work to check read (and write) consistency and to find “upper bound commit” SCNs: if you have a process executing a large number of transactions in a short period of time, it can be very expensive for another process to find an “upper bound commit”, and you may see that process doing a lot of reads of undo segments, reporting a number of “transaction tables consistent read rollbacks” and a large number of “transaction tables consistent reads – undo records applied” in its session stats. (Worst case scenario: the number of transactions could be similar to the number of (used) leaf blocks in the index.)

One final point to consider – the report from the (system level) Enqueue stats showed a handful of TM locks, so I enabled lock tracing for the session to see if any of them came from my session and whether they were likely to be a concurrency threat.

alter session set events 'trace[ksq] disk=medium';
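
and the matching command to switch the trace off again should be the usual unified-tracing form:

alter session set events 'trace[ksq] off';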

Most of the TM enqueues were from my session, but they appeared only after the coalesce was complete, and they were all related to sys-owned (dictionary) tables:

2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000140-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000061-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000049-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.902*:ksq.c@9175:ksqgtlctx(): *** TM-00000004-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.903*:ksq.c@9175:ksqgtlctx(): *** TM-0000004B-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***
2022-08-05 19:49:27.903*:ksq.c@9175:ksqgtlctx(): *** TM-00000013-00000000-0039DED3-00000000 mode=3 flags=0x401 why=173 timeout=21474836 ***

In case you’re wondering, the first TM lock (object 0x140) is for the index_orphaned_entry$ table, which Oracle had to lock at the end of the “coalesce cleanup” to delete the rows that linked the three dropped table partitions with the one global index that I had cleaned.

Summary

Deferred global index maintenance means you can drop partitions very quickly, with virtually no interruption to service, and then have Oracle clean up the related index entries at a later point in time. The drawback to deferring the cleanup is that Oracle will use a row-by-row mechanism to mark all the index entries for the dropped data as deleted – at a cost of a full index scan and about 230 bytes of redo per dropped row per index – compared to the bulk-processing mechanism that can be applied if the index entries are dropped as part of the partition drop processing.

There are basically two mechanisms you can use to clean up the entries that are awaiting deletion:

  • rebuild the index (online, probably)
  • use one of the special coalesce calls on the index

You can best spread the workload and minimise interference with your normal work by micro-managing the clean up, explicitly issuing whichever “alter index coalesce” or “alter index rebuild” command you prefer for each index in turn that needs to be cleaned up.
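
As a sketch of that micro-management – dba_indexes flags the indexes that still have orphaned entries, and either coalesce variant can then be issued index by index:

select owner, index_name
from   dba_indexes
where  orphaned_entries = 'YES';

alter index t1_n1 coalesce cleanup only;
alter index t1_n1 coalesce cleanup;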

None of the coalesce command variants returns space to the tablespace; nor do they even lower the highwater mark on the index segment. If you want to return space to the tablespace you will need to execute a “shrink space” on the index after the coalesce is complete. All the coalesce options generate a very large amount of redo (230 bytes per index entry for each delete, plus a volume that may be several times larger than the actual volume of data in the index as the remaining rows are compacted into the smallest number of blocks).
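
For example (a sketch, reusing the t1_n1 index):

alter index t1_n1 shrink space;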

The rebuild (online) will generate a much smaller volume of redo – but the penalties include the problem of journalling and applying the changes that took place while the rebuild was running, plus the cost of scanning the table and sorting the data to produce the index.

Footnote

My tests are basically the worst-case scenario (every leaf block has some entries to be deleted) and, at the same time, very close to the best-case scenario (each leaf block either becomes completely empty on deletion or has no rows deleted at all).

Do You Worry Your Company's Data is Being Stolen?

Pete Finnigan - Fri, 2022-08-05 13:26
The number of data breaches is seemingly growing daily and a lot of companies worry that they could be the next statistic of misery and embarrassment. Do you lose sleep worrying that your company could be breached and its data....[Read More]

Posted by Pete On 05/08/22 At 12:47 PM

Categories: Security Blogs

SQL Model

Tom Kyte - Thu, 2022-08-04 21:26
Tom, sorry about the silly question, but I have been trying to do something with SQL Model without any success. I cannot understand how it works – could you show a simple example? (Maybe one already exists.) Thanks,
Categories: DBA Blogs

KVM with an OnPrem Exadata X9-2 M

Tom Kyte - Thu, 2022-08-04 21:26
I have a client that is trying to implement KVM on an Exadata X9-2 M (21.2.11.0.0), and he was told that on the Exadata he could have a single VM running on a single DB node – not a clustered VM, just a single VM on a single DB node. To my understanding that is not possible on Exadata, as they have to be VM clusters across 2 DB nodes rather than 1. Am I incorrect in what I know?
Categories: DBA Blogs

Shrink space row movement

Tom Kyte - Thu, 2022-08-04 21:26
I use the process below to shrink datafiles and reclaim operating system space. It works fine. My question is: is there any downside to ALWAYS leaving row movement enabled instead of disabling it when the process is completed? Below is a sample of what I'm currently doing.

create table t as
select rownum x, lpad('x', 500, 'x') xx
from   dual
connect by level <= 10000;

-- ALTER TABLE t SHRINK SPACE CHECK;
select bytes from user_segments where segment_name = 'T';

delete t where x <= 9900;

select bytes from user_segments where segment_name = 'T';

alter table t enable row movement;

alter table t shrink space CASCADE;

select bytes from user_segments where segment_name = 'T';
Categories: DBA Blogs

Outer Join with OR - Ansi vs Oracle

Tom Kyte - Thu, 2022-08-04 21:26
Hello, I'm curious how you would deal with the following. Table t1 has a mixture of usernames, some in email-address format, some in first_name.last_name format. Lookup table t2 has usernames in email format. The requirement is to return all 3 rows.

create table t1 (taskid number, username varchar2(50));

insert into t1 values (1,'john.smith@123.com');
insert into t1 values (2,'dave.jones@123.com');
insert into t1 values (3,'dave.smith');
commit;

create table t2 (username varchar2(50));

insert into t2 values ('john.smith@123.com');
insert into t2 values ('dave.smith@123.com');
commit;

Via ANSI syntax, quite straightforward:

select *
from   t1
left outer join t2
on     t1.username = t2.username
or     t1.username = substr(t2.username,1,instr(t2.username,'@')-1)
order by 1;

Via Oracle syntax – as you can't use outer joins with OR, do you have any (easier) alternatives than this?

select t1.taskid, t1.username, t2.username
from   t1, t2
where  t1.username = t2.username
union all
select t1.taskid, t1.username, t2.username
from   t1, t2
where  t1.username = substr(t2.username,1,instr(t2.username,'@')-1)
union all
select t1.taskid, t1.username, to_char(null)
from   t1
where  not exists (
         select 1 from t2
         where  t1.username = t2.username
         or     t1.username = substr(t2.username,1,instr(t2.username,'@')-1)
       )
order by 1;

thanks
Categories: DBA Blogs

Tokyo's confectionery vocational schools make it easy to acquire skills

Marian Crkon - Tue, 2022-08-02 20:52
Confectionery vocational schools in Tokyo make it easy to pick up knowledge and skills efficiently in a short time, so they are recommended for finding the job you are hoping fo...

Tokyo's confectionery vocational schools make it easy to acquire skills

The Feature - Tue, 2022-08-02 20:52

Confectionery vocational schools in Tokyo make it easy to pick up knowledge and skills efficiently in a short time, so they are recommended for finding the job you are hoping for. Enthusiastic teachers develop each student's individual strengths as they teach, so you can acquire skills at your own pace, and because the teaching focuses on specialised techniques you can contribute immediately once you start work. There are many confectionery vocational schools in Tokyo, so by carefully comparing in advance which school suits you, you can find the school that matches your ideal. Some confectionery vocational schools in Tokyo also offer trial enrolment, so visiting a few of them to check the facilities and atmosphere with your own eyes will help you choose a good school.

Even though they are all confectionery vocational schools in Tokyo, the kinds of confectionery they make and their facilities can differ greatly, so it is important to think carefully about which field you want to work in before choosing a school. The schools give you plenty of opportunities to actually make sweets, which is not only an advantage for employment but also improves your confectionery skills as a hobby, so one of the good points of a vocational school is that you acquire knowledge and skills that will be useful for life. Beyond study and skills training, some schools have plenty of events such as culture festivals, school festivals and Halloween parties, so choosing a school where you can make friends and enjoy learning makes for a fulfilling student life; rather than picking a school casually, it is important to compare in advance and find the place that suits you. For information about make-up vocational schools in Osaka, see here.

Categories: APPS Blogs

Cluster Interconnect Split Brain Testing (CSS_CRITICAL yes)

Michael Dinh - Tue, 2022-08-02 20:25

Simply put, setting CSS_CRITICAL for a node gives that node more weight, making it less likely to be shut down during a split-brain.

cluster_interconnect_split_brain_CSS_CRITICAL_yes
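
As a sketch, the attribute is set per node with crsctl (run with administrative privileges; my understanding is that it takes effect at the next cluster reconfiguration):

crsctl set server css_critical yes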

analytical vs aggregate functions

Tom Kyte - Mon, 2022-08-01 20:06
Are analytical functions a replacement for aggregate functions? Some of the developers here are hurrying to replace the group-by aggregate functions with analytical functions just because they learnt something new :-). Are analytical functions always faster (compared to aggregates)?
Categories: DBA Blogs

adding sysdate on the file name

Tom Kyte - Mon, 2022-08-01 20:06
Hi Tom, when spooling a file from PL/SQL I would like the file name to have sysdate attached, changing dynamically, e.g. PM-INT-20021129_FLASHNOTIFICATIONS.TXT. Thanks, Zoliswa
Categories: DBA Blogs

FORALL issues...

Tom Kyte - Mon, 2022-08-01 20:06
Hi Tom, I'm using the FORALL statement to update tables in my applications. When the update raises an error, it can be saved using the SAVE EXCEPTIONS clause (and the update continues), or the update can be aborted without using that clause. In both these situations, however, I am unable to pin down which specific record caused the error, i.e. the id of the record. More specifically, is there a way to store user-defined data upon encountering an error during a FORALL execution? Small example:

declare
    type intArray  is table of number index by binary_integer;
    type charArray is table of varchar2(2000) index by binary_integer;
    type recType   is record (idArray intArray, array2 charArray);
    l_tabrec recType;
    lt_id    intArray;
begin
    ...
    /* populating the l_tabrec record of arrays here. */
    ...
    begin
        forall i in 1..l_tabrec.idArray.count
            update table2
            set    value = l_tabrec.array2(i)
            where  id    = l_tabrec.idArray(i);
    exception
        when others then
            /* here is where I'd like to log the id of the record which fails. */
            raise;
    end;
end;

I know the SAVE EXCEPTIONS gives us information about the type of error and the iteration on which it encountered the error, but I'm not sure how I can use that information to find which record actually caused the error. Hope my question is clear. Thanks in advance. Kashif
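
For what it's worth, a minimal sketch of the usual approach: with SAVE EXCEPTIONS, the error_index field of sql%bulk_exceptions holds the FORALL iteration number, which (for densely populated collections) indexes straight back into the source arrays. table2 and its id/value columns are taken from the question; the array names here are illustrative.

declare
    type intArray  is table of number         index by binary_integer;
    type charArray is table of varchar2(2000) index by binary_integer;
    l_ids      intArray;
    l_vals     charArray;
    dml_errors exception;
    pragma exception_init(dml_errors, -24381);   -- ORA-24381: errors in array DML
begin
    -- populate the (dense, 1..n) arrays here, e.g. from a bulk collect
    forall i in 1 .. l_ids.count save exceptions
        update table2
        set    value = l_vals(i)
        where  id    = l_ids(i);
exception
    when dml_errors then
        for j in 1 .. sql%bulk_exceptions.count loop
            -- error_index is the FORALL iteration number, so for dense
            -- collections it indexes straight back into the source arrays
            dbms_output.put_line(
                'failed id: ' || l_ids(sql%bulk_exceptions(j).error_index) ||
                '  (ORA-'    || sql%bulk_exceptions(j).error_code || ')'
            );
        end loop;
end;
/
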
Categories: DBA Blogs
