Overview
Within the past decade or so, Oracle Business Intelligence Enterprise Edition (OBIEE) has become a de facto standard for reporting across almost all Oracle products, and in some cases for non-Oracle ERPs and custom OLTP systems as well. An OBIEE reporting solution has several components, such as the data warehouse, the RPD, Presentation Services, and BI Publisher, which have become even more complex to manage with the advent of OBIEE 11g. (Note that we will not be covering other OBIA-related components such as the DAC, ETL, etc. in this document.) These components require proper hardware, configuration, and design to deliver optimum query/report response from the system. This document attempts to comprehensively cover the facets of OBIEE 11g performance tuning.
Scope
In this document we will cover performance tuning across the following components:
- Database (Oracle and Teradata)
- Oracle BI Server
- Oracle BI Presentation Server
- Oracle BI Java Host
For any software to perform well, the following aspects should be tuned/optimized:
Hardware: The system should be capable of handling the user and query load.
Configuration: The configuration parameters should be set up appropriately to exploit the functionality of the system on an as-needed basis.
Design: The solution should be designed using best practices for the data warehouse, semantic layer, and reporting.
Database Tuning
3.1 HARDWARE
Below are the hardware requirements for a typical data warehouse. Note that these requirements are based on the Oracle Business Analytics Warehouse (OBAW), which is delivered along with Oracle Business Intelligence Applications (OBIA).
| Source Data Volume | SMALL: Up to 200Gb | MEDIUM: 200Gb to 1Tb | LARGE: 1Tb and higher |
| --- | --- | --- | --- |
| # CPU cores | 8 | 16 | 32* |
| Physical RAM | 16Gb | 32Gb | 64Gb* |
| Storage Space | Up to 400Gb | 400Gb - 2Tb | 2Tb and higher |
| Storage System | Local (PATA, SATA, iSCSI), preferred RAID configuration | Local (PATA, SATA, iSCSI). Recommended two or more I/O controllers | High performance SCSI or network attached storage. Hardware RAID controller with multiple I/O channels. |
3.2 CONFIGURATION
3.2.1 Tablespace settings
Listed below are recommended tablespace sizes for a typical data warehouse:
| Source Data Volume | SMALL: Up to 400Gb | MEDIUM: 400Gb to 2Tb | LARGE: 2Tb and higher |
| --- | --- | --- | --- |
| Temporary Tablespace | 40 – 60Gb | 60 – 150Gb | 150 – 250Gb |
| DATA Tablespace | 350Gb | 350Gb – 1.8Tb | > 1.8Tb |
| INDEX Tablespace | 50Gb | 50 – 200Gb | > 200Gb |
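For reference, a DATA tablespace sized along these lines could be created as follows. This is only a sketch; the tablespace name, datafile path, and sizes are placeholders to be adapted to your environment:
-- Illustrative only: name, path, and sizes are placeholders.
CREATE TABLESPACE OBAW_DATA
  DATAFILE '/u01/oradata/obaw/obaw_data01.dbf' SIZE 10G
  AUTOEXTEND ON NEXT 1G
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;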
3.2.2 Compression of tables:
The Oracle database allows you to store data in a compressed (zipped) format, with three possible benefits:
· Reduced overall storage requirements
· Performance improvement due to less disk reads (each record uses less space)
· Improved memory efficiency (data is stored in memory in compressed format)
The potential drawback is higher CPU activity, since data blocks have to be uncompressed at run time in order to be processed.
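As an illustration, basic table compression can be enabled on an existing table as follows (the table name is a placeholder; the MOVE rebuilds existing rows into compressed blocks, after which any indexes on the table must be rebuilt):
-- Illustrative only: table name is a placeholder.
ALTER TABLE WC_CUST_SALES_F COMPRESS;        -- compress future direct-path loads
ALTER TABLE WC_CUST_SALES_F MOVE COMPRESS;   -- rebuild existing data in compressed blocks
-- Remember to rebuild any indexes on the table after the MOVE.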
3.3 DESIGN
3.3.1 De-normalization (Best Practices in Star Schema design)
De-normalization of data, or converting to a star schema, is one of the basic tenets of data warehousing. Below are some of the generic rules that should be followed. For a comprehensive list, refer to a standard data warehousing book such as “The Data Warehouse Toolkit” by Ralph Kimball.
· Avoid snowflaking
· Build star schemas around business processes, not around reporting needs
· Define conformed dimensions and facts
· Use numeric surrogate keys for joins between dimensions and facts
· The date dimension surrogate key should be a meaningful sequential value (e.g. YYYYMMDD), as it can then be used for partitioning the fact table if needed (see the sketch after this list)
· Entities with fixed, time-invariant, strongly correlated relationships should be modeled within the same dimension
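As a minimal sketch of the last two key rules, assuming hypothetical date dimension and fact tables, the join keys are plain numeric surrogates and the date key is a sequential YYYYMMDD integer:
-- Illustrative only: table and column names are hypothetical.
CREATE TABLE WC_DATE_D (
  DATE_KEY      NUMBER(8)  NOT NULL,  -- sequential surrogate key, e.g. 20240131
  CALENDAR_DATE DATE       NOT NULL
);

CREATE TABLE WC_CUST_SALES_F (
  DATE_KEY  NUMBER(8)    NOT NULL,    -- joins to WC_DATE_D.DATE_KEY
  CUST_KEY  NUMBER(10)   NOT NULL,    -- joins to the customer dimension
  SALES_AMT NUMBER(18,2)
);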
3.3.2 Aggregation
Aggregate tables or summary tables roll up fact data at a higher level of grain than what is present in the detail fact tables. This is accomplished by ETL or by materialized views (a materialized view sketch follows the list below). Aggregate tables significantly enhance query performance by reducing the number of rows a query has to scan. Aggregate tables are generally built based on the reporting/dashboarding needs of the business users. The following aspects should be kept in mind while designing aggregate tables:
· Incremental aggregation can be complex but is preferred for ETL performance
· Aggregates across various dimensional levels should be consolidated whenever possible, thus reducing the number of aggregate tables the ETL/Presentation layer has to deal with
· Conformed/common dimensions should be used when building an aggregate table over a detail fact table. This comes in handy when users intend to drill down from summary to detail
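As an illustration of rolling the detail fact up to a monthly grain, a materialized view along the following lines could serve as an aggregate table (object names are hypothetical, and the refresh strategy should match the ETL schedule):
-- Illustrative only: object names are hypothetical.
CREATE MATERIALIZED VIEW WC_CUST_SALES_MONTH_MV
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT TRUNC(d.CALENDAR_DATE, 'MM') AS CAL_MONTH,
       f.CUST_KEY,
       SUM(f.SALES_AMT)             AS SALES_AMT
FROM   WC_CUST_SALES_F f
JOIN   WC_DATE_D d ON d.DATE_KEY = f.DATE_KEY
GROUP BY TRUNC(d.CALENDAR_DATE, 'MM'), f.CUST_KEY;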
3.3.3 Indices and Statistics:
The Oracle database uses the Cost Based Optimizer (CBO) to determine the best possible way to execute a query. For the CBO to work, statistics have to be gathered on tables. This is accomplished by the following command:
EXEC DBMS_STATS.GATHER_TABLE_STATS (ownname => 'USERNAME', tabname => 'CUSTOMER');
Note that for large tables it might be enough to collect stats on a 30% sample, using the command:
EXEC DBMS_STATS.GATHER_TABLE_STATS (ownname => 'USERNAME', tabname => 'CUSTOMER', estimate_percent => 30);
The query performance improvement from gathering 30% stats and 100% stats should be recorded and compared, and the option that improves performance more should be selected.
Note: Even though the syntax is Oracle specific, gathering statistics enhances performance on other databases such as Teradata as well.
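For example, the Teradata equivalent is COLLECT STATISTICS; a minimal sketch, with placeholder table and column names (verify the exact syntax for your Teradata release):
-- Illustrative Teradata syntax; names are placeholders.
COLLECT STATISTICS ON CUSTOMER COLUMN (CUST_ID);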
Oracle has a feature that can optimize star schema based queries. It can be enabled with the following parameter:
STAR_TRANSFORMATION_ENABLED = TRUE
However for this feature to work correctly, all foreign key columns on the fact tables should be “bitmap indexed” and not just “b-tree indexed”.
For foreign keys with between 2,500 and 10,000 distinct values (or those likely to grow beyond 2,500), trial a bitmap index if the table is large (say, more than 10 million rows). Bitmap indexes on columns with over 10,000 distinct values are unlikely to add benefit; use a B-tree index (i.e. a "regular" index) instead, since Oracle is able to convert the results of a B-tree range scan into a bitmap in order to combine it with other bitmap indexes.
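Putting the two together, a sketch of enabling star transformation and bitmap-indexing a foreign key column on the hypothetical fact table used earlier:
-- Illustrative only: object names are placeholders.
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;   -- or ALTER SYSTEM ... SCOPE=BOTH
CREATE BITMAP INDEX WC_CUST_SALES_F_CUST_BM ON WC_CUST_SALES_F (CUST_KEY);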
“Bitmap join indexes” are recommended by Oracle to substantially increase the performance of reports. A “bitmap join index” is a type of bitmap index that stores the actual result of a join between two or more tables. For example, below is a bitmap join index that stores the result of the join from the fact table to three columns in the “Customer” dimension table:
CREATE BITMAP INDEX WC_CUST_SALES_F_BMJ
ON WC_CUST_SALES_F (D.CUST_NAME, D.CUST_LOC, D.CUST_PHONE)
FROM WC_CUST_SALES_F F, WC_CUST_D D
WHERE F.CUST_ID = D.ID;
Primary key indexes should be created for all dimensions to support star-schema joins; they are optional for fact tables unless they are required to support the ETL.
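For instance, a primary key on the customer dimension used in the example above could be declared as follows; note that the bitmap join index shown earlier also requires the dimension join column to be unique, which this constraint provides:
-- Illustrative only: names follow the earlier example.
ALTER TABLE WC_CUST_D ADD CONSTRAINT WC_CUST_D_PK PRIMARY KEY (ID);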
3.3.4 Partitioning
When dealing with large fact tables, partitioning can be applied to optimize query response times.
Ensure all foreign keys on the fact table being partitioned have a LOCAL bitmap index created on them. Below is sample syntax:
CREATE BITMAP INDEX WC_CUST_SALES_F_1 ON WC_CUST_SALES_F (CUST_KEY) LOCAL;
Set up parallel querying to exploit the partitioning feature:
ALTER SYSTEM SET PARALLEL_DEGREE_POLICY = AUTO;
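A minimal sketch of a range-partitioned version of the hypothetical fact table, keyed on the YYYYMMDD date surrogate key (table, partition names, and boundaries are placeholders):
-- Illustrative only: one partition per year shown.
CREATE TABLE WC_CUST_SALES_F_PART (
  DATE_KEY  NUMBER(8)    NOT NULL,
  CUST_KEY  NUMBER(10)   NOT NULL,
  SALES_AMT NUMBER(18,2)
)
PARTITION BY RANGE (DATE_KEY) (
  PARTITION P2023 VALUES LESS THAN (20240101),
  PARTITION P2024 VALUES LESS THAN (20250101),
  PARTITION PMAX  VALUES LESS THAN (MAXVALUE)
);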
Oracle BI Server Tuning
4.1 HARDWARE
The basic hardware requirements for an OBIEE server are laid out in Oracle's installation and sizing documentation; however, the actual system requirements should be determined based on the expected usage. Reaching out to your Oracle sales representative is also a viable option.
4.2 CONFIGURATION
Let's look at the various configuration parameters that can be set for the OBIEE server to optimize its performance. Note that the associated services need to be restarted after changing these parameters in order for the changes to take effect.
4.2.1 NQSConfig
Each instance of the OBIEE server has its own NQSConfig.ini. In a clustered environment, each instance of the file needs to be updated. Below is a list of parameters that can be tweaked to enhance the performance of the OBIEE server.
Cache Parameters:
o Enable this flag so that query results are cached in the flat file system and are retrieved quickly when a user executes the same query again.
ENABLE = YES;
o Specify where the cached results should be stored. Ideally this should be a path on a storage system that delivers high I/O throughput. Multiple paths, each no more than 4Gb, can be specified.
DATA_STORAGE_PATHS
o Set this flag so that when an aggregate query is executed, OBIEE can aggregate data from a cached detail entry rather than going back to the database. The result of this aggregation will be cached as well.
POPULATE_AGGREGATE_ROLLUP_HITS = YES;
o Set this flag if you want OBIEE to dig deeper for cache hits.
USE_ADVANCED_HIT_DETECTION = YES;
Other Parameters
o Specify multiple directories for temp space by using this variable. All directories should be fully qualified, separated by commas, and enclosed in quotes:
WORK_DIRECTORY_PATHS = "C:\TMP\", "D:\TEMP";
o The processor uses virtual memory when it runs out of the physical memory allocated to the OBIEE process. Depending on the number of concurrent users, this can be set to a multiple of 64KB.
VIRTUAL_TABLE_PAGE_SIZE
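Pulling these together, a sketch of how the relevant entries might look in NQSConfig.ini; the section names, paths, and sizes shown here are illustrative and should be checked against the file shipped with your installation:
[CACHE]
ENABLE = YES;
DATA_STORAGE_PATHS = "D:\OBIEE_CACHE" 4 GB;
POPULATE_AGGREGATE_ROLLUP_HITS = YES;
USE_ADVANCED_HIT_DETECTION = YES;

[GENERAL]
WORK_DIRECTORY_PATHS = "D:\OBIEE_TMP";
VIRTUAL_TABLE_PAGE_SIZE = 128 KB;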
OBIEE caching writes the results of an OBIEE query into a flat XML file. When the same dataset is requested, OBIEE retrieves it from the XML file/cache instead of going after the database again. This is good for the performance of the OBIEE system, as long as the cache is kept small enough. A good rule of thumb is to set the cache size to 4Gb.
For standard reports that are run quite often, it is advisable to use the BI Scheduler/iBots. iBots are automated processes (similar to cron jobs in UNIX) that can be triggered based on an event or time and can be configured to do the following:
· Clear the OBIEE cache after ETL jobs are completed. (This can be a time-based event that executes at a particular time, when we are confident that all ETL jobs are complete.)
· Run all the pre-canned reports/dashboards with default filters (if any) as Administrator. (This step will create cache entries for all the dashboards that are most often used by the user community.)
Any regular user who executes reports after the previous steps will hit the cache.
Logging, or session logging, is meant for debugging purposes only and should be disabled in all testing and especially in production environments. Logging requires the OBIEE server to write to disk the information generated by each query, which includes the Presentation, Logical, and Physical queries; this process is very resource intensive and will have an adverse impact on the performance of the server.
· Use native drivers to connect to the database instead of standard ODBC. For example, use OCI for Oracle.
· Set Max Connections using the following formula:
0.04 * Number_Of_Users * Max_Reports_On_A_Dashboard
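As an illustration with made-up numbers: for 500 users and a maximum of 5 reports on a dashboard, Max Connections would be set to roughly 0.04 * 500 * 5 = 100.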
4.3 DESIGN
· Use simple (non-complex) incremental numeric keys for joining dimensions and facts
· Avoid snowflaking while modeling
· Foreign keys should not be null
· Complexity should be handled at the database level as much as possible. Any kind of manipulation of attributes/data elements at the OBIEE level negatively impacts the performance of queries.
· Use content filters wherever possible
· Avoid using functions on columns. This should be handled in the ETL if possible.
· Avoid outer joins.
· Create an implicit fact column, so that when just a dimension is pulled into a report the results return quicker.
4.3.4 Initialization Blocks and others
· Consolidate Init Blocks: Initialization blocks can affect the performance of the system and of the queries as well, so it is wise to consolidate them and make them efficient. For example, instead of having separate Init Blocks for computing Current Month, Current Year, Previous Month, and Previous Year, compute all these variables in one single Init Block (see the sketch after this list).
· Monitor Init Blocks: Failed Init Blocks hamper the performance of the OBIEE server as well, so monitor the logs for any failed Init Blocks.
· Note: If your database connection requires fully qualified names, make sure that Init Blocks use them as a prefix to table names, even if the fully-qualified-names flag has been checked in the connection pool; otherwise the Init Blocks will keep failing.
· Filters reduce the amount of data that queries have to scan and return, so implement row-level security in the OBIEE RPD, which enforces WHERE clauses on queries executed from the Presentation layer.
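As a sketch of the consolidation point above, a single init block query (Oracle syntax; the column-to-variable mapping is done in the RPD) can return all four date variables at once:
-- Illustrative only: populates CURRENT_MONTH, CURRENT_YEAR,
-- PREVIOUS_MONTH and PREVIOUS_YEAR in one initialization block.
SELECT TO_CHAR(SYSDATE, 'MM')                    AS CURRENT_MONTH,
       TO_CHAR(SYSDATE, 'YYYY')                  AS CURRENT_YEAR,
       TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'MM')    AS PREVIOUS_MONTH,
       TO_CHAR(ADD_MONTHS(SYSDATE, -12), 'YYYY') AS PREVIOUS_YEAR
FROM   DUAL;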
Presentation Layer
The presentation layer is the play area where developers, power users, and end users build and share content. Proper configuration and design techniques should be employed so that users are able to retrieve results in a reasonable amount of time. The SLAs for dashboards and reports may vary from enterprise to enterprise, business unit to business unit, and functional area to functional area. However, the following benchmarks are generally accepted:
Dashboard: < 10 seconds
Publisher/Standard Reports: < 30 seconds
Below are some configurations and design techniques that can be employed for better performance:
OBIEE presentation layer components such as “Chart” or “Publisher” reports allocate and use Java heap space to process and display the results returned by an OBIEE request. Look at the session logs generated by OBIEE to see whether the bottleneck is the database, the OBIEE layer, or the presentation layer. If you notice, based on timestamps, that the response times of the database engine and the OBIEE engine are a fraction of the total time it takes to return the results on a chart or a Publisher report, it implies that the allocated heap size is not enough. To tweak these values, open the following file (sample location; the actual location depends on your installation):
C:\app\Middleware\instances\instance1\config\OracleBIJavaHostComponent\coreapplication_obijh1\config.xml
Edit the following parameter for charts (you may multiply it by a factor of 10; the values are in KB):
- 8192
For Publisher, edit the parameter below:
- 8192
…
Set the following parameter in instanceconfig.xml to limit the maximum number of rows returned by an OBIEE query. In 11g, this presentation layer configuration file is located at:
C:\app\Middleware\instances\instance1\config\OracleBIPresentationServicesComponent\coreapplication_obips1
Below is the value that controls the maximum retrievable result set. It can be set higher or lower to ensure users don’t run queries that exhaust OBIEE and database resources:
65000
Note that OBIEE is an analytical/dashboarding tool, and users should never be required to retrieve more than a few rows for their analysis.
Apart from the above parameter, the Presentation layer can apply controls to individual display methods such as pivot tables/tables using the following parameters (a sketch follows this list):
DefaultRowsDisplayed
DefaultRowsDisplayedInDelivery
DefaultRowsDisplayedInDownload
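A sketch of how these elements might be nested in instanceconfig.xml for the Table view; the element hierarchy is assumed here and should be verified against your version's documentation before applying, and the values shown are placeholders:
<!-- Illustrative fragment only; nesting assumed, values are placeholders. -->
<Views>
  <Table>
    <DefaultRowsDisplayed>75</DefaultRowsDisplayed>
    <DefaultRowsDisplayedInDelivery>75</DefaultRowsDisplayedInDelivery>
    <DefaultRowsDisplayedInDownload>2500</DefaultRowsDisplayedInDownload>
  </Table>
</Views>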
Below are some best practices for designing presentation layer content:
· Restrict ad-hoc access to a limited set of power users who understand the metadata and data layers very well. This will prevent users from running runaway queries that might hog the system.
· Provide users with a data dictionary and training on the various elements of the presentation layer, so that they understand their purpose and impact.
· Avoid the use of pivot tables on large data sets.
· Always use filter(s) during ad-hoc analysis.
· Download result sets in Data (CSV) format instead of XLS.
· Employ caching on all dashboard reports.
· Enforce filters on all dashboards.
· Limit the number of reports in a dashboard to 4; split reports into multiple tabs if needed.
· Ensure all dashboard reports are built on aggregated data.
5.2.3 Publisher
· BI Publisher reports can have many possible sources of data, including Answers queries. If the data source is an Answers query, ensure that the report is cached.
· BI Publisher reports should have filters defaulted/enforced, just like dashboard reports.
· Allocate enough heap size for complex operations.