Oracle SQL Tuning: A practice-oriented guide

As developers and consultants, it’s easy to think our job is done when our SQL returns the correct output. But here’s the reality: a query that takes five minutes to run might as well be a query that fails (especially if that query sits behind an ERP BI Publisher report). The result is the same: a frustrated user staring at a waiting screen, unable to do their job.

Delivering true value means going beyond just correctness. It means being optimal. It means respecting the user’s time and the system’s resources. This is the difference between simply writing code and engineering a solution. This mindset is critical whether you’re building a BI Publisher report in Oracle Fusion or working with a standard Oracle database.

It’s important to shift focus from just ‘getting the data’ to getting it efficiently, turning frustrating delays into real performance gains.

Understand the foundation: SQL & CBO

SQL is a declarative language. This means you tell the database what data you want, but not the specific steps on how to retrieve that information. The complex process of efficiently retrieving this data is driven by a core component of the database: the Cost-Based Optimizer (CBO).

Think of the CBO as the GPS of your database. When you submit a SQL query, you’re telling it the destination (the data you want). The CBO’s job is to analyze all the possible routes (access paths) and choose the one it calculates will be the fastest (the one with the lowest “cost”).

To make this decision, the CBO relies heavily on two things: statistics about your data (cardinality, data distribution, etc.) and indexes. If your statistics are stale or your indexes are missing, the Optimizer might choose a scenic, time-consuming country road instead of the direct highway.

CBO in Action

The true power of the CBO is its ability to adapt. It will automatically re-evaluate and can choose a completely different query path (execution plan) as the volume and distribution of your data change.

Imagine you have a query to find a specific user:

   SELECT * 
   FROM USERS 
   WHERE USERNAME = 'THIAGO';

Scenario 1: Small Table (100 rows)

The CBO looks at the statistics and sees the table is tiny. It calculates that the fastest way to get the data is to just read all 100 rows (Full Table Scan). Using an index here would be like using a detailed map to find a store in a tiny village—it’s overkill and slower than just looking around.

Scenario 2: Large Table (10 Million rows)

A year later, the same USERS table has grown to 10 million rows. The statistics now reflect this massive size. The CBO knows a Full Table Scan would be incredibly expensive. Now, using an Index Scan on the USERNAME column is by far the cheapest and fastest path.

Practical Health Check for Statistics

Since the CBO depends on fresh statistics, how do you make sure it has the right information?

Step 1: Check the LAST_ANALYZED Date
This is the quickest way to see the last time statistics were gathered. A recent date is a good sign. A null or very old date is a red flag:

   SELECT TABLE_NAME,
          NUM_ROWS,
          LAST_ANALYZED
   FROM USER_TABLES
   ORDER BY LAST_ANALYZED DESC;

Step 2: Verify the Automated Maintenance Job
If you notice the LAST_ANALYZED date is stale, it’s worth checking if the automated jobs are running. You can check with your DBA for the run history of these key jobs (found in dba_scheduler_job_run_details):

   ORA$_ATSK_AUTOSTS: The primary job for gathering Optimizer Statistics
   ORA$_ATSK_AUTOSMT: The job for Space Management (Segment Advisor)
   ORA$_ATSK_AUTOSPMT: The job for SQL Plan Baselines maintenance
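
If you have access to the dictionary views yourself, a quick sketch of that check might look like the query below (the job names are the ones listed above; adjust them if your database version names them differently):

   SELECT JOB_NAME,
          STATUS,
          ACTUAL_START_DATE,
          RUN_DURATION
   FROM DBA_SCHEDULER_JOB_RUN_DETAILS
   WHERE JOB_NAME IN ('ORA$_ATSK_AUTOSTS', 'ORA$_ATSK_AUTOSMT', 'ORA$_ATSK_AUTOSPMT')
   ORDER BY ACTUAL_START_DATE DESC;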

Step 3: Manual Intervention (With Caution)
If you find that statistics are stale and need to be updated for a specific table, you can do it manually. However, always align with your DBA before running this in a shared environment, as it can be a resource-intensive operation:

   EXEC DBMS_STATS.GATHER_TABLE_STATS('YOUR_SCHEMA_NAME', 'YOUR_TABLE_NAME');

The CBO’s Output: The Execution Plan

This is the most critical concept in all of SQL tuning. An Execution Plan is the step-by-step recipe the database follows to execute your query. Understanding this plan is the single most important skill for diagnosing and fixing slow-running queries. Think of it as the database’s roadmap; if you can read the map, you can find the performance bottlenecks.
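
To generate a plan for yourself, a minimal sketch (reusing the USERS example from earlier; the table and column names are just that example, not a real schema) is to EXPLAIN the statement and then display it with DBMS_XPLAN:

   EXPLAIN PLAN FOR
   SELECT *
   FROM USERS
   WHERE USERNAME = 'THIAGO';

   -- Display the plan that was just explained
   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);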

When you generate a plan, pay close attention to these key details:

Access Predicates: This shows how the data is being retrieved. For example, is it using an index to pinpoint specific rows or reading the entire table?

Filter Predicates: This shows the conditions used to discard rows after they have been retrieved.

Note Section: A sometimes-overlooked goldmine of information. It can reveal if special behaviors like dynamic sampling were used or if a particular feature influenced the plan.

Access Method: This is a critical part of the plan, as it dictates how the database gets data from your tables. An inefficient access method is a common cause of poor performance:

Full Table Scan: The database reads every single row in the table. While efficient for small tables, it can be disastrously slow for large ones. If you see this on a large, indexed table for a query that should return few rows, it’s a major red flag.

Index Scan: The database uses an index to quickly find the specific rows needed. This is generally much faster and what you want to see for selective queries. There are several types, like Index Range Scan (for range conditions like > or BETWEEN) and the more advanced Index Skip Scan.

Join Order: Start with the smallest table or the join that produces the smallest result set.

The join order is often the single most important factor in determining whether your query returns in seconds or runs for hours. Let’s break down how table size and filter power interact to create a high-performance query:

Default Strategy: Filter First, Reduce Early

The optimizer’s primary goal is to keep the amount of data it has to work with (the intermediate result set) as small as possible at every step.

Imagine a query to find a specific customer’s orders for a certain product.

   Customers (5 million rows)
   Orders (100 million rows)
   Order_Items (500 million rows)

A naive approach might be to join the two largest tables first, creating a monstrous intermediate table, and only then filtering for the specific customer. This is like moving an entire library just to find one book.

The smart approach, and the optimizer’s preferred strategy, is to filter first. If your WHERE clause is WHERE customer_name = 'John Doe', the optimizer will likely start with the Customers table. This filter is highly selective. It reduces 5 million potential rows down to just one. This single row becomes the driving table for the subsequent joins, making the rest of the work incredibly fast and efficient.
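
A rough sketch of that shape of query (all table and column names here are illustrative, not a real schema):

   SELECT O.ORDER_ID,
          OI.PRODUCT_SKU,
          OI.QUANTITY
   FROM CUSTOMERS C
   JOIN ORDERS O       ON O.CUSTOMER_ID = C.CUSTOMER_ID
   JOIN ORDER_ITEMS OI ON OI.ORDER_ID   = O.ORDER_ID
   WHERE C.CUSTOMER_NAME = 'John Doe';

With fresh statistics, the optimizer can see that the CUSTOMER_NAME filter is highly selective, start from that single customer row, and drive the joins from there.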

When the Biggest Table Is the Only Place to Start: Use an Index

What if the most powerful filter in your query applies to the largest table?

This is a very common scenario. Consider a report that needs to find all sales data for a specific, unique product SKU.

WHERE product_sku = 'ABC-123'

Target Table: Order_Items (500 million rows)

Secret Weapon: An index on the product_sku column.

In this case, starting with the 500-million-row Order_Items table is absolutely the correct and most efficient strategy.

The index allows the optimizer to bypass a full table scan. It doesn’t read all 500 million rows. Instead, it uses the index to instantly locate the handful of rows matching 'ABC-123'. Thanks to this highly selective, indexed filter, the effective starting data set isn’t 500 million rows; it might only be 50.
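
A minimal sketch of that setup, again with illustrative names (in a real system the index probably already exists; check with your DBA before creating one):

   CREATE INDEX ORDER_ITEMS_SKU_IDX ON ORDER_ITEMS (PRODUCT_SKU);

   SELECT OI.*
   FROM ORDER_ITEMS OI
   WHERE OI.PRODUCT_SKU = 'ABC-123';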

The Scenario to Avoid: A Low-Selectivity Filter

Imagine the report filters on a broad product category instead of a unique SKU.

WHERE product_category = 'Shoes'

Even with an index, this filter might return 40 million rows. This is a low-selectivity filter, and it doesn’t significantly reduce the data set.

Here, the optimizer will recognize that starting with Order_Items would be a mistake. It will evaluate the other filters in the query. If there’s another clause, like WHERE order_date > '01-SEP-2025', it will estimate how many rows that filter will return from the Orders table.

It will then compare the estimated row counts from each potential starting point and choose the one that yields the smallest initial result set.
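
Sketching that comparison with the same illustrative schema (here PRODUCT_CATEGORY is assumed to sit on ORDER_ITEMS), the optimizer would likely drive from the more selective date filter on ORDERS rather than from the weak category filter:

   SELECT OI.PRODUCT_SKU,
          OI.QUANTITY
   FROM ORDERS O
   JOIN ORDER_ITEMS OI ON OI.ORDER_ID = O.ORDER_ID
   WHERE O.ORDER_DATE > DATE '2025-09-01'
     AND OI.PRODUCT_CATEGORY = 'Shoes';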

As a consultant, try to suggest and refine queries so that they follow the rules from the “Filter First, Reduce Early” and “Use an Index” sections above.

Additional takeaways

Avoid GROUP BY or DISTINCT (where possible): These operations can be very resource-intensive as they require sorting and unique value identification, especially on large datasets. Re-evaluate if you truly need them or if a different query design can achieve the same result with less overhead (e.g., using ROWNUM with an ordered query or designing your joins to avoid duplicates).

Avoid Functions on Indexed Columns: Applying any function (e.g., TO_CHAR, SUBSTR, NVL, UPPER, LOWER, TRUNC) to a column in your WHERE clause will typically prevent the database from using a standard index on that column. The database has to calculate the function’s result for every row before it can compare, negating the index.
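
A small illustration of the difference, using hypothetical table and column names:

   -- The function on HIRE_DATE defeats a normal index on that column:
   SELECT *
   FROM EMPLOYEES
   WHERE TRUNC(HIRE_DATE) = DATE '2025-09-01';

   -- Rewritten as a range on the raw column, the index stays usable:
   SELECT *
   FROM EMPLOYEES
   WHERE HIRE_DATE >= DATE '2025-09-01'
     AND HIRE_DATE <  DATE '2025-09-02';

If the function really can’t be avoided, a function-based index is an option worth discussing with your DBA.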

Avoid repeated use of the DUAL table (where possible): While DUAL is harmless for single calls, repeatedly querying DUAL within complex loops or for every row in a large result set can add unnecessary overhead. Consider alternative ways to generate sequence numbers or retrieve system values without repeated DUAL access.

Avoid nested data sets – use CTEs (Common Table Expressions): Deeply nested subqueries (especially non-correlated ones) can be hard for the optimizer to handle efficiently. CTEs (using the WITH clause) improve readability, often aid the optimizer in creating a better plan, and can make it easier to reuse intermediate results.
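
For instance, a nested subquery can often be lifted into a named block (illustrative names again):

   WITH RECENT_ORDERS AS (
      SELECT ORDER_ID,
             CUSTOMER_ID
      FROM ORDERS
      WHERE ORDER_DATE > DATE '2025-09-01'
   )
   SELECT C.CUSTOMER_NAME,
          RO.ORDER_ID
   FROM CUSTOMERS C
   JOIN RECENT_ORDERS RO ON RO.CUSTOMER_ID = C.CUSTOMER_ID;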

Prefer CASE instead of multiple UNIONs: Multiple UNION or UNION ALL operations can be less efficient than a single query using CASE expressions within SELECT or WHERE clauses, especially if they involve repeated table access. UNION typically involves sorting to remove duplicates, making UNION ALL or CASE often preferable if duplicates are acceptable or handled otherwise.
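
A small sketch of the idea with a hypothetical ORDERS table: two UNION ALL branches that each scan the table can often collapse into a single pass with CASE:

   -- Two passes over ORDERS:
   SELECT ORDER_ID, 'HIGH' AS PRIORITY FROM ORDERS WHERE AMOUNT >= 1000
   UNION ALL
   SELECT ORDER_ID, 'LOW' AS PRIORITY FROM ORDERS WHERE AMOUNT < 1000;

   -- One pass with CASE:
   SELECT ORDER_ID,
          CASE WHEN AMOUNT >= 1000 THEN 'HIGH' ELSE 'LOW' END AS PRIORITY
   FROM ORDERS;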

Prefer tables over value lists: Using joins to look up values from a lookup table is generally more efficient and scalable than embedding long lists of values directly in an IN clause (e.g., IN (1, 2, 3, …, 1000)). Large IN lists can also hit SQL statement size limits.

Prefer NOT EXISTS/EXISTS instead of NOT IN/IN (especially for subqueries): For subqueries, EXISTS and NOT EXISTS often perform better because they can stop scanning as soon as a match (or non-match) is found. IN with a subquery might require evaluating the entire subquery first, and NOT IN has particular issues with NULL values.
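
A sketch of the pattern, with the same illustrative tables:

   -- Customers with at least one order; the probe can stop at the first match
   SELECT C.CUSTOMER_ID,
          C.CUSTOMER_NAME
   FROM CUSTOMERS C
   WHERE EXISTS (
      SELECT 1
      FROM ORDERS O
      WHERE O.CUSTOMER_ID = C.CUSTOMER_ID
   );

NOT EXISTS follows the same shape and, unlike NOT IN, behaves predictably when the subquery can return NULLs.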

Use column aliases to shorten XML/JSON files (for reporting): Shorter aliases mean less data transferred and parsed, which can be significant for large XML/JSON payloads, especially in client-server communication.

Minimize ORDER BY and GROUP BY operations: Both operations can be expensive, especially on large datasets, as they often require sorting. If you can achieve the desired order or aggregation through an indexed access path or a smaller intermediate result, that’s preferable. Only sort when absolutely necessary.

Beware of OR in WHERE clauses: While not always bad, using OR can sometimes prevent index usage, especially if the columns on either side of the OR are from different tables or if one side is unindexed. Consider rewriting with UNION ALL or IN if appropriate.
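
A sketch of one such rewrite, with illustrative columns (the LNNVL condition in the second branch simply keeps rows already returned by the first branch from appearing twice):

   -- OR across two different columns:
   SELECT *
   FROM ORDERS
   WHERE CUSTOMER_ID = 42
      OR SALES_REP_ID = 7;

   -- Rewritten so each branch can use its own index:
   SELECT * FROM ORDERS WHERE CUSTOMER_ID = 42
   UNION ALL
   SELECT * FROM ORDERS WHERE SALES_REP_ID = 7 AND LNNVL(CUSTOMER_ID = 42);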

Avoid SELECT * in production code: Always specify the columns you need. SELECT * retrieves unnecessary data, which consumes more I/O, CPU, and network bandwidth, especially for tables with many columns or large data types (LOBs).

Profile Your Queries Regularly: The most important takeaway is to not guess. Use tools like EXPLAIN PLAN and DBMS_XPLAN to analyze your query execution, understand the actual costs, and identify bottlenecks.
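
EXPLAIN PLAN shows the estimated plan; to see the plan Oracle actually used, one common sketch is DBMS_XPLAN.DISPLAY_CURSOR against the last statement run in your session (this assumes you have the privileges to read the cursor cache):

   -- Run the statement first; the hint asks Oracle to collect row-level runtime statistics
   SELECT /*+ GATHER_PLAN_STATISTICS */ *
   FROM USERS
   WHERE USERNAME = 'THIAGO';

   -- Then display the actual plan with estimated vs. actual rows
   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));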
