
Oracle® Content Management SDK Installation and Configuration Guide
Release 10g (9.0.4.1) for Microsoft Windows NT/2000/2003/XP
Part No. B13614-01

A Planning Your Oracle CM SDK Deployment

This appendix presents planning information designed to help you make important decisions about how to configure and deploy Oracle CM SDK.

The following sections are included in this appendix:

See Chapter 1, "Concepts," in the Oracle Content Management SDK Administrator's Guide for detailed information on Oracle CM SDK architecture and integration with key Oracle technologies.

Oracle CM SDK Minimum Hardware Requirements

The requirements described in Table A-1, "Minimum Hardware Requirements for Single-Computer Deployment" and Table A-2, "Minimum Hardware Requirements for Multiple-Computer Deployment for Production Environments" are based on using the Oracle CM SDK Middle-Tier Install.

The information in Table A-1 assumes that you are installing Oracle CM SDK on its own middle-tier computer, and that Oracle Ultra Search and Oracle9iAS Unified Messaging (Email) are run on a separate computer if you are also deploying those components.

Table A-1 and Table A-2 do not include requirements for Oracle Internet Directory. Oracle recommends that you install, configure, and run Oracle Internet Directory on a separate computer.

Table A-1 Minimum Hardware Requirements for Single-Computer Deployment

Description | Requirement
Number of computers | 1
Oracle CM SDK users supported | 2 concurrent connected users (see footnote)
Number of CPUs | 1 (add 1 CPU if Oracle Text is used for indexing)
Minimum processor type | Intel Pentium III, 500 MHz (or higher)
RAM | 1 gigabyte
Hard disk drive space and swap space | 8.5 gigabytes minimum total free hard disk drive space, which includes 6 gigabytes required by the Oracle database and the Oracle Collaboration Suite Middle-Tier Install, and 2 gigabytes of swap space

Footnote: A concurrent connected user is a user who performs operations during a particular hour.

Table A-2 Minimum Hardware Requirements for Multiple-Computer Deployment for Production Environments

Description | Requirement
Number of computers | 2
Oracle CM SDK users supported | 50 concurrent connected users

Computer 1: Middle Tier
Number of CPUs | 1
Minimum processor type | Intel Pentium III, 800 MHz (or higher)
RAM | 1.5 gigabytes
Hard disk drive space and swap space | 4 gigabytes minimum total free hard disk drive space, which includes 2 gigabytes required by the Oracle Collaboration Suite Middle-Tier Install, and 2 gigabytes of swap space

Computer 2: Database
Number of CPUs | 2 (includes 1 CPU for Oracle Text indexing)
RAM, disk, and swap space | See the Oracle Database Server Installation Guide and Release Notes for the requirements of the database computer.

The hardware requirements in Table A-1 can support approximately two Oracle CM SDK concurrent connected users accessing two protocols moderately.

The hardware requirements in Table A-2 support a workgroup of about 50 Oracle CM SDK concurrent connected users accessing all protocols moderately.

Deployment Configuration Options and Requirements

In a production environment, Oracle recommends that you deploy Oracle CM SDK and Oracle Application Server according to the following guidelines:

See the Oracle Application Server 10g Installation Guide for more information.

Using Oracle Application Server, Infrastructure and Oracle Internet Directory

Oracle CM SDK does not require Oracle Application Server, Infrastructure unless you want to use Oracle Internet Directory for credential management. If you do want to use Oracle Internet Directory with Oracle CM SDK, follow these guidelines:

  • Install and configure Oracle Internet Directory on a separate database instance on a separate computer.

    • The only Oracle Internet Directory releases certified for use with Oracle CM SDK 9.0.4 are the releases shipped with Oracle Application Server, Infrastructure (release 9.0.2 and release 9.0.4).

  • To configure the OidCredentialManager during Oracle CM SDK configuration, you must know the orcladmin password, the computer name, the port number (389 is the default LDAP port number), and the root Oracle context of the Oracle Internet Directory.

  • Decide in advance how you want to map the default Oracle CM SDK system, guest, and scott user accounts to Oracle Internet Directory. During the configuration process for the OidCredentialManager, you can either create new accounts for these users (if Oracle Internet Directory does not already contain accounts with these names), or you can map these accounts to an Oracle Internet Directory account of your choice. (See Chapter 3, "Installation and Configuration" for additional information.)

See the Oracle Application Server 10g Installation Guide and Oracle Internet Directory Administrator's Guide for additional recommendations and requirements.

Multiple-Computer Deployment

Oracle CM SDK is designed to run as middle-tier application server software supported by Oracle Application Server. For optimal performance and ease of administration, the tiers should be located on different physical computers: the database should run on one computer, and the Oracle Application Server and Oracle CM SDK software should run on another computer.

To use Oracle Internet Directory for managing Oracle CM SDK user credentials, you should also install and configure Oracle Application Server, Infrastructure on a third computer that meets the requirements for running Oracle Internet Directory and Oracle Application Server, Infrastructure. See the Oracle Application Server 10g Administrator's Guide for more information.

The following provides an overview of the deployment process:

Database Tier (Computer 1)

You must use Oracle9i 9.0.1.4, 9.2.0.4 (or higher), or Oracle10g database server for the database tier. Your database instance must meet the requirements listed in "Oracle Database Requirements and Recommendations".

If you are planning to use Oracle Management Agent for Oracle Enterprise Manager 10g Grid Control and are not planning to use Oracle Internet Directory, install Oracle Enterprise Manager 10g Grid Control on this computer. Optionally, you can configure it to use the Oracle9i database.

To create a new database, see Appendix B, "Creating an Oracle Database," for information about the required settings.

Oracle Application Server Infrastructure Tier (Computer 2)

  1. Install and configure Oracle Application Server, Infrastructure.

  2. Configure Oracle Internet Directory if you plan to use it for authentication of Oracle CM SDK users (and any other Oracle applications).

  3. If you are planning to use Oracle Enterprise Manager 10g Grid Control, install it on this computer into the Oracle home where Oracle Application Server, Infrastructure is configured.

    Optionally, you can configure Oracle Enterprise Manager 10g Grid Control to use the Oracle Application Server, Infrastructure database.

Application Server Tier (Computer 3)

  1. Install and configure Oracle Application Server (Type A, B, or C), release 9.0.4. Oracle recommends using the Oracle Application Server, A. J2EE and Web Cache installation option.


    Note:

    Oracle CM SDK is certified for use with Windows XP Professional only with the J2EE and Web Cache installation option of Oracle Application Server.

  2. Install and configure Oracle CM SDK into the Oracle home where Oracle Application Server (Type A, B, or C) is configured. During configuration, select Create a new Oracle CM SDK domain, using the database instance on the Database Tier.


    Note:

    If you select Oracle Application Server Infrastructure Use during Oracle Application Server (Type A, B, or C) configuration and identify an Oracle Internet Directory instance, the Oracle Internet Directory must be running when you use the ifsca tool.

  3. If you are upgrading, copy all custom classes from a previous Oracle 9iFS or Oracle CM SDK installation to the following directory:

    %ORACLE_HOME%\ifs\cmsdk\custom_classes


    Note:

    If you have a custom class named LINK, you must rename the class before installing Oracle CM SDK.

  4. If you are planning to use Oracle Management Agent, install it on this computer in a separate Oracle home.

Single-Computer Deployment

Oracle CM SDK can be installed on a single computer if the computer meets the recommended hardware and software requirements. If your computer does not meet the recommended requirements, performance in this configuration can be less than satisfactory.

The hardware requirements for single-computer deployment can support only two Oracle CM SDK users accessing two protocols concurrently. If you plan to use Oracle Internet Directory for credential management, the computer requires three Oracle home instances. Because of this, Oracle recommends that you use single-computer deployment for development or evaluation purposes only.

  1. Install and configure Oracle9i Database Server (release 9.0.1.4, 9.2.0.3, or later) or Oracle10g Database Server into one Oracle home.

  2. Install and configure Oracle Application Server, A. J2EE and Web Cache into a new, separate Oracle home, accepting all of the default values.

  3. Install Oracle CM SDK in the same Oracle home that contains Oracle Application Server.

  4. If you are upgrading, copy all custom classes from a previous Oracle 9iFS or Oracle CM SDK installation to the following directory:

    %ORACLE_HOME%\ifs\cmsdk\custom_classes
    

    Note:

    If you have a custom class named LINK, you must rename the class before installing Oracle CM SDK.

  5. Using Oracle CM SDK Configuration Assistant, configure Oracle CM SDK following the instructions in Chapter 3, "Installation and Configuration".

Choice of Protocols

The most important decision regarding performance and scalability is the choice of which protocols to use to access Oracle CM SDK.

When possible, Oracle recommends using Wide Area Network (WAN) protocols as the primary mechanism for accessing Oracle CM SDK, and using Local Area Network (LAN) protocols only as secondary protocols, or only for those users who are unable to use WAN protocols.

WAN protocols include:

LAN protocols include:

WAN protocols generally are much more efficient in terms of network round trips, and perform fewer server operations to accomplish end user requests. Both of these factors improve performance for the end user.

For example, Oracle recommends using Web Folders with Microsoft Office 2000/XP for viewing and editing documents on Windows computers, rather than using NTFS.


Note:

Web Folders are different from the WebDAV File System Redirector on Windows XP, which is not supported.

Web Folders are created by mapping a network drive using the syntax http://server/content, and show up under Network Places without a drive letter.

Web Folders are configurable on all Windows operating systems.

The Windows XP WebDAV File System Redirector is created by mapping a network drive using the syntax \\server\target, and shows up as a mounted drive (for example, E:).


The advantages of Web Folders over NTFS, AFP, or NFS are as follows (NTFS is used as the example):

The disadvantages of using Web Folders are:

Sizing Guidelines

This section describes hardware requirements for a sample deployment of Oracle CM SDK and formulas that allow you to determine the hardware configuration required to deploy Oracle CM SDK in your organization.

This section includes the following topics:

Hardware requirements for Oracle CM SDK are primarily determined by the factors described in Table A-3:

Table A-3 Primary Factors Determining Oracle CM SDK Hardware Requirements

CPU
  Middle-tier computer requirement variables:
  • Peak number of operations performed per second
  Database computer requirement variables:
  • Peak number of operations performed per second
  • Whether Oracle Text indexing is used

Memory
  Middle-tier computer requirement variables:
  • Peak number of operations performed per second
  • Peak number of concurrent connected users
  • Average number of protocols used per concurrent connected user
  • Average number of sessions used per concurrent connected user
  • Number of NTFS/AFP/NFS protocol users
  • Number of documents per folder
  Database computer requirement variables:
  • Peak number of operations performed per second
  • Number of documents

Disk Size
  Middle-tier computer requirement variables: N/A
  Database computer requirement variables:
  • Number of documents
  • Average content size of documents

Disk Throughput (not discussed in this document)
  Middle-tier computer requirement variables: N/A
  Database computer requirement variables:
  • Peak number of documents read and written per second
  • Average content size of documents


In order to determine hardware requirements, assumptions must be made about the type of work that users are performing. The following measurements are averages extrapolated from deployment of Oracle Files (an application built using Oracle CM SDK) within Oracle Corporation (40,000+ users), which can be used as a guideline for projecting Oracle CM SDK usage.

Table A-4 User Profiles

User Task | Number of Operations per Connected User per Hour
Folders opened | 8
Documents read/written | 10
Queries | 0.1


Note:

These sizing guidelines can be inaccurate if the desired user profile is significantly larger than the average measurements detailed in Table A-4, or if the Oracle CM SDK application is not used as a general file server replacement.

These sizing guidelines are based on benchmarks of 10,000 concurrent connected users on Sun Microsystems hardware. The guidelines have been validated against measurements taken from internal Oracle Corporation production usage of Oracle Files by 40,000 Oracle employees, with 17 million documents and 4TB of content. This system uses Intel Linux hardware for the middle-tier computers, and Sun hardware for the database.

Sizing Formulas for Each Middle-Tier Computer

This section provides formulas that you can use to determine specific hardware sizing for each middle-tier computer.

The following table summarizes the sizing formulas:

Table A-5 General Oracle CM SDK Sizing Recommendations for Each Middle-Tier Computer

Component | Sizing Recommendations
Number of CPUs | roundup(peak concurrent connected users / 250 + 33% headroom)
Needed usable disk space | At least 500MB for software
Total machine memory | If HTTP is the primary protocol: 480MB + (3.6MB * peak concurrent connected users)

  If HTTP is not the primary protocol, or if the desired user profile is different from the average measurements described in Table A-4: 480MB + (1MB * peak concurrent connected users * average number of sessions in use by each concurrent connected user) + (3KB * number of objects desired in the Java object cache) + (8MB * number of connections to the database)


Number of CPUs

Use the following formula to determine the number of CPUs required:

roundup(peak concurrent connected users / 250 + 33% headroom)

In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated.

This formula is based on the following assumptions:

  • The formula assumes Sun SPARC Solaris 400MHz UltraSPARC-II processors with 8MB secondary cache.

  • Other RISC processors should perform roughly in proportion to their MHz.

  • Intel Pentium III or IV processors on Windows computers should perform roughly in proportion to half their MHz. For example, an 800MHz Pentium processor is approximately equivalent to a 400MHz RISC processor.
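The CPU formula can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an Oracle-supplied tool: it reads "+ 33% headroom" as dividing the busy-CPU estimate by 0.75 (the 75% allocation ceiling stated above, which adds one third), and it scales for processor speed using the rough MHz proportionality described in the list above. The names cpus_required and risc_equivalent_mhz are hypothetical.

```python
import math
from fractions import Fraction

USERS_PER_CPU = 250                   # peak users per reference CPU
REFERENCE_RISC_MHZ = 400              # 400MHz UltraSPARC-II assumed by the formula
MAX_CPU_UTILIZATION = Fraction(3, 4)  # allocate no more than 75% of the CPU


def risc_equivalent_mhz(mhz, intel=False):
    """Intel Pentium III/IV clock speed counts roughly as half its MHz."""
    return Fraction(mhz, 2) if intel else Fraction(mhz)


def cpus_required(peak_users, mhz=REFERENCE_RISC_MHZ, intel=False):
    """roundup(peak users / 250 + 33% headroom), scaled by processor speed."""
    # Assumption: capacity per CPU scales linearly with RISC-equivalent MHz.
    speed_factor = REFERENCE_RISC_MHZ / risc_equivalent_mhz(mhz, intel)
    busy_cpus = Fraction(peak_users, USERS_PER_CPU) * speed_factor
    # Dividing by 0.75 adds one third (about 33%) of headroom.
    return math.ceil(busy_cpus / MAX_CPU_UTILIZATION)


print(cpus_required(1000))                      # 6
print(cpus_required(500, mhz=800, intel=True))  # 3
```

For example, 1,000 peak concurrent connected users on the reference processor gives 1000 / 250 = 4 busy CPUs, and 4 / 0.75 = 5.33, which rounds up to 6. The database computer uses the same formula (Table A-6), with one more CPU added if Oracle Text indexing is used.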

Required Usable Disk Space

Allocate at least 500MB for software. This does not include the following considerations:

  • Mirroring for backup and reliability

  • Redo log size, which should be determined by how many documents are inserted and their size

  • Unused portion of the last extent in each database, which occurs with pre-created database files or which can be large if the next extent setting is large

Total Computer Memory, HTTP as the Primary Protocol

If HTTP is the primary protocol, use the following formula to determine the total computer memory required:

480MB + (3.6MB * peak concurrent connected users)

The 480MB is for the first Oracle CM SDK middle-tier computer. The value of 3.6MB is calculated from the following assumptions:

  • 1 session per concurrent connected user: This assumes that the primary interface for Oracle CM SDK is through the HTTP node.

  • 0.1 connection pool connections per concurrent connected user: This assumes the stated user profile.

  • 400 objects in the Java data cache per concurrent connected user: This assumes 50 documents per folder and 8 folders opened per hour, assuming the stated user profile.
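A minimal Python sketch of the HTTP formula follows. The function names are hypothetical, and the sketch assumes 1MB = 1000KB when converting the 3KB-per-object cache cost. The per_user_mb helper uses the per-item costs from the non-HTTP formula (1MB per session, 8MB per database connection, 3KB per cached object); with 1.6 sessions per user (the HTTP node value given for that formula), 0.1 connections, and 400 cached objects, it reproduces the 3.6MB figure.

```python
from fractions import Fraction

BASE_MB = 480                   # first Oracle CM SDK middle-tier computer
PER_USER_MB = Fraction(36, 10)  # 3.6MB per peak concurrent connected user
KB_PER_MB = 1000                # assumption: decimal conversion for this sketch


def per_user_mb(sessions, connections, cache_objects):
    """Per-user memory from 1MB/session, 8MB/connection, 3KB/cached object."""
    return (sessions * 1
            + connections * 8
            + Fraction(cache_objects * 3, KB_PER_MB))


def http_memory_mb(peak_users):
    """480MB + (3.6MB * peak concurrent connected users)."""
    return BASE_MB + PER_USER_MB * peak_users


# 1.6 sessions, 0.1 connections, and 400 cached objects per user
assert per_user_mb(Fraction(16, 10), Fraction(1, 10), 400) == PER_USER_MB
print(http_memory_mb(200))  # 1200
```

Exact fractions are used so that the 3.6MB multiplier does not introduce floating-point rounding into the result.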

Total Computer Memory, Non-HTTP Protocol

If HTTP is not the primary protocol, or if the desired user profile is different than the average measurements described in Table A-4, use the following formula to determine the total computer memory required:

480MB + (1MB * peak concurrent connected users * average number of sessions in use by each concurrent connected user) + (3KB * number of objects desired in the Java object cache) + (8MB * number of connections to the database)

The 480MB is for the first Oracle CM SDK middle-tier computer. The other values are calculated from the following assumptions:

  • The value of 1MB is high by design. Oracle CM SDK has been optimized to reduce database CPU load by using middle-tier memory to cache items. This ensures a more scalable and less expensive system, because the database computer is less of a scalability bottleneck, and because memory on one- or two-processor middle-tier computers is typically less expensive than memory or CPU on high-end database computers (computers with large amounts of attached storage or with many processors).

  • Oracle recommends limiting the number of peak concurrent user sessions by using the IFS.SERVICE.MaximumConcurrentSessions parameter in the service configuration. Oracle has tested with Java heaps of up to 2GB. Under this constraint, a node can support up to approximately 700 concurrent connected users, for a total of approximately 1,960MB, if the following are true:

    • Each user uses 1.6 sessions

    • Each session is 1MB (700 * 1.6 * 1MB = 1,120MB)

    • Each user needs 400 Java data cache objects

    • Each object is 3KB in size (700 * 400 * 3KB = 840,000KB, or approximately 840MB)

    For each additional node on the same computer, you must include the node overhead in the sizing. See Table A-7 for more information.

    The HTTP/WebDAV memory overhead includes memory for 10 simultaneous guest user requests. Because of this, guest users should not be counted as connected users for HTTP/WebDAV access.

  • For the average number of sessions in use by each concurrent connected user, use the value 1.6 for the HTTP node. For NTFS, this value can be as high as 10, because for each NTFS concurrent connected user there can be nine additional users who are connected but not concurrently active.

  • Calculate the number of objects desired in the Java object cache by using the following formula:

    (number of folder opens in the peak hour) * (number of objects per folder) * (number of peak concurrent connected users)
    
    

    Use the result to set the value of the IFS.SERVICE.DATACACHE.Size parameter.

  • The number of connections to the database depends on the number of simultaneous read or write operations being performed. Assume 0.1 database connections per user if you are using the standard user profile. This number is the sum of the IFS.SERVICE.CONNECTIONPOOL.WRITEABLE.MaximumSize and IFS.SERVICE.CONNECTIONPOOL.READONLY.MaximumSize parameters for each service.
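The non-HTTP formula and the Java object cache calculation can be combined in a short Python sketch. Function and variable names are hypothetical; the example inputs (100 users, 1.6 sessions for the HTTP node, 8 folder opens in the peak hour, 50 objects per folder, 0.1 connections per user) come from the assumptions above, and the KB-to-MB conversion is assumed to be decimal.

```python
from fractions import Fraction

BASE_MB = 480        # first Oracle CM SDK middle-tier computer
SESSION_MB = 1       # per session in use
CACHE_OBJECT_KB = 3  # per object in the Java object cache
CONNECTION_MB = 8    # per connection to the database
KB_PER_MB = 1000     # assumption: decimal conversion for this sketch


def data_cache_objects(folder_opens_in_peak_hour, objects_per_folder, peak_users):
    """Suggested value for IFS.SERVICE.DATACACHE.Size."""
    return folder_opens_in_peak_hour * objects_per_folder * peak_users


def middle_tier_memory_mb(peak_users, sessions_per_user, cache_objects, db_connections):
    """480MB + sessions + Java object cache + database connections."""
    return (BASE_MB
            + SESSION_MB * peak_users * sessions_per_user
            + Fraction(CACHE_OBJECT_KB * cache_objects, KB_PER_MB)
            + CONNECTION_MB * db_connections)


users = 100
cache = data_cache_objects(8, 50, users)   # 40,000 objects
connections = Fraction(1, 10) * users      # 0.1 connections per user
print(middle_tier_memory_mb(users, Fraction(16, 10), cache, connections))  # 840
```

With the NTFS value of 10 sessions per user instead of 1.6, the same inputs give 480 + 1,000 + 120 + 80 = 1,680MB.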

Sizing Formulas for the Database Computer

This section provides formulas that you can use to determine specific hardware sizing for the database computer.

The following table summarizes the sizing formulas:

Table A-6 General Oracle CM SDK Sizing Recommendations for the Database Computer

Component | Sizing Recommendations
Number of CPUs | roundup(peak concurrent connected users / 250 + 33% headroom)
Needed usable disk space | 4.5GB + total raw file size + (total raw file size * 20%)
Total machine memory | 64MB + 128MB + database buffer cache + (1MB * number of connections to the database) + (500 bytes * number of documents) + (100KB * peak concurrent connected users)

Number of CPUs

Use the following formula to determine the number of CPUs required:

roundup(peak concurrent connected users / 250 + 33% headroom)

In order to ensure optimal efficiency, no more than 75% of the CPU should be allocated. One additional CPU is used for the background Oracle Text indexing of new document content, if you are using Oracle Text indexing.

This formula is based on the following assumptions:

  • The formula assumes Sun SPARC Solaris 400MHz UltraSPARC-II processors with 8MB secondary cache.

  • Other RISC processors should perform roughly in proportion to their MHz.

  • Intel Pentium III or IV processors on Windows computers should perform roughly in proportion to half their MHz. For example, an 800MHz Pentium processor is approximately equivalent to a 400MHz RISC processor.

Required Usable Disk Space

Use the following formula to determine the usable disk space required:

4.5GB + total raw file size + (total raw file size * 20%)

The 4.5GB represents the space required for Oracle software and the initial database configuration. If you are not using Oracle Text to index the content, multiply the total raw file size by 15% instead of 20%.
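A minimal Python sketch of the disk space formula, including the 15% variant for deployments that do not use Oracle Text; the function name is hypothetical and sizes are in gigabytes.

```python
from fractions import Fraction

ORACLE_SOFTWARE_GB = Fraction(9, 2)  # 4.5GB for Oracle software and initial database


def db_disk_gb(total_raw_file_gb, uses_oracle_text=True):
    """4.5GB + raw file size + 20% of raw size (15% without Oracle Text)."""
    overhead = Fraction(20, 100) if uses_oracle_text else Fraction(15, 100)
    return ORACLE_SOFTWARE_GB + total_raw_file_gb + total_raw_file_gb * overhead


print(float(db_disk_gb(100)))         # 124.5
print(float(db_disk_gb(100, False)))  # 119.5
```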

Total Computer Memory

Use the following formula to determine the total computer memory required:

64MB + 128MB + database buffer cache + (1MB * number of connections to the database) + (500 bytes * number of documents) + (100KB * peak concurrent connected users)

This formula is based on the following assumptions:

  • 128MB is the minimum amount of memory required to run a small Oracle Server.

  • Number of documents: The database buffer cache in the default Oracle database configuration is sufficient for approximately 50,000 documents. For deployments with more than 50,000 documents, allocate 500 bytes per document for optimal performance, including wildcard filename searches. Reduce this number if users do not perform wildcard filename searches.

  • 100KB is calculated by assuming that 0.1 database connections are needed per concurrent connected user as in the stated user profile. Each database connection takes approximately 1MB of database memory.
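The database memory formula can be transcribed term by term as the following Python sketch. It keeps both the per-connection and the per-user terms exactly as written, treats the database buffer cache as an input you choose, and assumes decimal units (1MB = 1,000KB = 1,000,000 bytes). The function name and the example inputs are hypothetical.

```python
from fractions import Fraction


def db_memory_mb(buffer_cache_mb, db_connections, documents, peak_users):
    """64MB + 128MB + buffer cache + 1MB/connection + 500B/document + 100KB/user."""
    return (64 + 128
            + buffer_cache_mb
            + 1 * db_connections
            + Fraction(500 * documents, 1_000_000)   # 500 bytes each, decimal MB
            + Fraction(100 * peak_users, 1_000))     # 100KB each, decimal MB


print(db_memory_mb(buffer_cache_mb=256, db_connections=100,
                   documents=1_000_000, peak_users=1_000))  # 1148
```

With one million documents, the 500-bytes-per-document term alone contributes 500MB, which is why the text suggests reducing it when users do not perform wildcard filename searches.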

Memory Requirements: Sample Deployment

Table A-7 details the approximate minimum memory overhead of each component on the middle-tier computers:

Table A-7 Memory Overhead by Component

Values are the approximate amount of minimum memory, in megabytes, for a middle-tier computer running (1) a regular node and an HTTP node, (2) an additional HTTP node, or (3) an additional regular node.

Description | (1) Regular node and HTTP node | (2) Additional HTTP node | (3) Additional regular node
Memory used by the operating system upon booting the computer. | 60 | 60 | 60
Overhead for first Java Virtual Machine (JVM). | 30 | 30 | 30
Domain controller JVM. This only needs to be run once for a single Oracle CM SDK schema, regardless of how many middle-tier computers are running Oracle CM SDK protocols. | 20 | 0 | 0
Oracle Enterprise Manager Web site. This must run on every node to allow managing the node through Oracle Enterprise Manager. | 150 | 150 | 150
Regular Oracle CM SDK node JVM. By default, this runs all the protocols, such as FTP and NTFS, and the Oracle CM SDK agents. | 50 | 0 | 50
Oracle CM SDK node guardian JVM, which monitors the Oracle CM SDK regular node and recovers from node failures. | 10 | 0 | 10
Oracle HTTP Server, including the default HTTP daemons. This only needs to run where HTTP access is required. | 30 | 30 | 0
Oracle CM SDK OC4J process. This only needs to run where Oracle CM SDK HTTP/WebDAV/Oracle FileSync access is required. It must be paired with Oracle HTTP Server. | 130 | 130 | 0
Total | 480 | 400 | 300
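The column totals in Table A-7 can be checked with a few lines of Python; the first-column total is the 480MB base used in the middle-tier memory formulas. The dictionary simply transcribes the table, with shortened descriptions as keys.

```python
# Memory in MB for a middle-tier computer running:
# (regular node and HTTP node, an additional HTTP node, an additional regular node)
MEMORY_OVERHEAD_MB = {
    "Operating system at boot": (60, 60, 60),
    "First JVM overhead": (30, 30, 30),
    "Domain controller JVM": (20, 0, 0),
    "Oracle Enterprise Manager Web site": (150, 150, 150),
    "Regular node JVM": (50, 0, 50),
    "Node guardian JVM": (10, 0, 10),
    "Oracle HTTP Server": (30, 30, 0),
    "Oracle CM SDK OC4J process": (130, 130, 0),
}


def column_totals(table):
    """Sum each configuration column of the overhead table."""
    return tuple(sum(column) for column in zip(*table.values()))


print(column_totals(MEMORY_OVERHEAD_MB))  # (480, 400, 300)
```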

Tablespaces

This section provides guidelines for configuring Oracle CM SDK tablespaces.

This section includes the following topics:

Data Types and Storage Requirements

Table A-8 lists the different types of data stored in Oracle CM SDK and describes the purpose of each tablespace. For information about creating custom tablespaces, see Appendix B, "Creating an Oracle Database".

Table A-8 Tablespace Definitions

Tablespace Type | Name (in Oracle CM SDK Configuration Assistant) | Example Tablespace Name | Description
Document Storage | Indexed Media | IFS_LOB_I | Stores the Large Object (LOB) data for documents that are indexed by Oracle Text, such as text and word processing files.
Document Storage | Non-Indexed Media | IFS_LOB_N | Stores the LOB data for documents that are not indexed by Oracle Text, such as zip files.
Document Storage | interMedia Media | IFS_LOB_M | Stores the LOB data for documents that are indexed by Oracle interMedia, such as image, audio, and video files.
Oracle Text | Oracle Text Data | IFS_CTX_I | Stores words (tokens) extracted by Oracle Text from Oracle CM SDK documents (the Oracle table DR$IFS_TEXT$I).
Oracle Text | Oracle Text Index | IFS_CTX_X | Stores the Oracle B*tree index on the Oracle Text tokens (the Oracle index DR$IFS_TEXT$X).
Oracle Text | Oracle Text Keymap | IFS_CTX_K | Stores miscellaneous Oracle Text tables (the Oracle tables DR$IFS_TEXT$K, DR$IFS_TEXT$N, DR$IFS_TEXT$R).
Metadata | Primary | IFS_MAIN | Stores metadata for documents, information about users and groups, and other Oracle CM SDK object data.
General Oracle Storage | N/A | Various | SYSTEM, ROLLBACK, TEMP, and other tablespaces that store the Oracle data dictionary, temporary data during transactions, and so on.

Typical tablespace storage space and disk I/O are detailed in Table A-9:

Table A-9 Tablespace Storage Requirements and Disk I/O

Tablespace | % of Total I/O Throughput Requirements | % of Disk Space Requirements
IFS_MAIN | 50% | 2%
IFS_CTX_X | 20% | 1%
IFS_CTX_I | 10% | 1%
IFS_LOB_I | 8% | 35%
IFS_LOB_N | 5% | 55%
Various | 5% | 1%
IFS_LOB_M | 1% | 4%
IFS_CTX_K | 1% | 1%
Total | 100% | 100%

Note the following issues regarding the information in Table A-9:

  • I/O rates are highly dependent on the size of the db_block_cache. These measurements were taken on the Oracle-internal Oracle Files implementation, with 8GB db_block_cache, 17 million documents, and 40,000 named users.

  • The IFS_MAIN tablespace is the most important tablespace to spread across disks for maximum I/O capacity.

  • Disk I/O for the IFS_CTX_I, IFS_CTX_X and IFS_CTX_K tablespaces is largely generated from Oracle Text batch processes (ctx_ddl.sync_index, and ctx_ddl.optimize_index), which are not critical to end-user performance. Therefore, these tablespaces can be on disks with lower I/O capacity, if necessary.

Storing Documents in an Oracle Database

The largest consumption of disk space occurs on the disks that actually contain the documents that reside within Oracle CM SDK, namely the Indexed Media, Non-Indexed Media, and interMedia Media tablespaces. This section explains how the documents are stored and how to calculate the amount of space those documents require.

As previously mentioned, documents stored in Oracle CM SDK are actually stored in database tablespaces. Oracle CM SDK makes use of the Large Object (LOB) facility of the Oracle database. All documents are stored as Binary Large Objects (BLOBs), which is one type of LOB provided by the database. LOBs provide for transactional semantics much like the normal data stored in a database. In order to accomplish these semantics, LOBs must be broken down into smaller pieces which are individually modifiable and recoverable. These smaller pieces are referred to as chunks. Chunks are a group of one or more sequential database blocks from a tablespace that contains a LOB column.

Both database blocks and chunk information within those blocks (BlockOverhead) impose some amount of overhead for the stored data. BlockOverhead is presently 60 bytes per block, which consists of the block header, the LOB header, and the block checksum. Oracle CM SDK configures its LOBs to have a 32K chunk size.

As an example, assume that the DB_BLOCK_SIZE parameter of the database is set to 8192 (8K). A chunk would require four contiguous blocks and impose an overhead of 240 bytes. The usable space within a chunk would be 32768-240=32528 bytes.

Each document stored in Oracle CM SDK consists of an integral number of chunks. Using the previous example, a 500K document actually uses 512000/32528 = 15.74 chunks, which rounds up to 16 chunks. Sixteen chunks take up 16*32K = 524288 bytes. The chunking overhead for storing this document is therefore 524288-512000 = 12288 bytes, which is 2.4% of the original document's size.

The chunk size used by Oracle CM SDK is set to optimize access times for documents. Note that small documents, documents less than one chunk, incur a greater disk space percentage overhead since they must use at least a single chunk.

Another structure required for transactional semantics on LOBs is the LOB Index. Each LOB index entry can point to 8 chunks of a specific LOB object (NumLobPerIndexEntry = 8). In this example, where a 500K document takes up 16 chunks, two index entries are required for that object. Each entry takes 46 bytes (LobIndexEntryOverhead) and is then stored in an Oracle B*tree index, which in turn has its own overhead depending upon how fragmented that index becomes.

The last factor affecting LOB space utilization is the PCTVERSION parameter used when creating the LOB column. For information about how PCTVERSION works, please consult the Oracle9i SQL Reference.

Oracle CM SDK uses the default PCTVERSION of 10% for the LOB columns it creates. This reduces the possibility of "ORA-22924 snapshot too old" errors occurring in read-consistent views. By default, therefore, add at least 10 percent to the expected chunk space in your disk usage estimates to allow for persistent PCTVERSION chunks.

For large systems where disk space is an issue, Oracle recommends reducing PCTVERSION to 1, in order to reduce disk storage requirements. This can be done at any time in a running system using the following SQL commands:

alter table odmm_contentstore modify lob (globalindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob) (pctversion 1);
alter table odmm_contentstore modify lob (emailindexedblob_t) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob) (pctversion 1);
alter table odmm_contentstore modify lob (intermediablob_t) (pctversion 1);
alter table odmm_nonindexedstore modify lob (nonindexedblob2) (pctversion 1);

To calculate LOB tablespace usage:

  1. Calculate the available space per chunk: determine the number of blocks per chunk (ChunkSize/BlockSize), multiply it by the BlockOverhead (60 bytes), and subtract the result from the chunk size.

  2. Divide the file size by the available space per chunk to obtain the number of chunks, per the following formula:

    chunks = roundup(FileSize/(ChunkSize-((ChunkSize/BlockSize)*BlockOverhead)))
    
    

    For example, if FileSize = 100,000, ChunkSize = 32768, Blocksize = 8192, and BlockOverhead = 60, then:

    chunks = roundup (100000 /(32768 - ((32768 / 8192) * 60)))= 4 Chunks
    
    
  3. Calculate the disk space for a file by multiplying the number of chunks by the chunk size and by the PCTVERSION factor, and then adding the LOB index space: the number of index entries (chunks divided by NumLobPerIndexEntry, which is 8, rounded up) times LobIndexEntryOverhead (46 bytes). Use the following formula:

    FileDiskSpaceInBytes = roundup(chunks*ChunkSize*PctversionFactor) + (roundup(chunks/NumLobPerIndexEntry)*LobIndexEntryOverhead)
    
    

    For example, if chunks = 4, ChunkSize = 32768, PctversionFactor = 1.1, NumLobPerIndexEntry = 8, and LobIndexEntryOverhead = 46, then:

    FileDiskSpaceInBytes = roundup(4 * 32768 * 1.1) + (roundup(4/8) * 46) = 144180 + 46 = 144226 bytes
    
    
  4. Calculate the total disk space used for file storage by summing the application of the above formulas for each file to be stored in the LOB, per this formula:

    TableSpaceUsage = sum(FileDiskSpaceInBytes) for all files stored 
    
    

Oracle CM SDK creates multiple LOB columns. Make the space calculation separately for each tablespace, based on the amount of content that qualifies for storage in that tablespace.
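The four steps above can be sketched in Python as follows. The constants are the values from the worked examples. The tablespace names and file sizes in the usage example (CONTENT_TS, NONINDEXED_TS) are hypothetical placeholders, because this section does not describe which content qualifies for which LOB column. The function names are also illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Values used in the worked examples above.
BLOCK_OVERHEAD = 60            # bytes of overhead per database block
NUM_LOB_PER_INDEX_ENTRY = 8    # chunks covered by one LOB index entry
LOB_INDEX_ENTRY_OVERHEAD = 46  # bytes per LOB index entry


def chunks_for_file(file_size: int, chunk_size: int = 32768,
                    block_size: int = 8192) -> int:
    """Steps 1 and 2: number of LOB chunks needed to hold one file."""
    blocks_per_chunk = chunk_size // block_size
    usable_per_chunk = chunk_size - blocks_per_chunk * BLOCK_OVERHEAD
    return math.ceil(file_size / usable_per_chunk)


def file_disk_space(file_size: int, chunk_size: int = 32768,
                    block_size: int = 8192,
                    pctversion_factor: float = 1.1) -> int:
    """Step 3: bytes used by one file, including LOB index overhead."""
    chunks = chunks_for_file(file_size, chunk_size, block_size)
    # round() strips float noise (e.g. 360448.00000000006) before rounding up.
    data_bytes = math.ceil(round(chunks * chunk_size * pctversion_factor, 6))
    index_entries = math.ceil(chunks / NUM_LOB_PER_INDEX_ENTRY)
    return data_bytes + index_entries * LOB_INDEX_ENTRY_OVERHEAD


def usage_by_tablespace(files: Iterable[Tuple[str, int]],
                        **params) -> Dict[str, int]:
    """Step 4, applied per tablespace: files are (tablespace, size) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    for tablespace, size in files:
        totals[tablespace] += file_disk_space(size, **params)
    return dict(totals)


if __name__ == "__main__":
    print(chunks_for_file(100_000))   # 4, as in step 2
    print(file_disk_space(100_000))   # 144226, as in step 3
    # Hypothetical tablespace names and file sizes.
    files = [("CONTENT_TS", 100_000), ("CONTENT_TS", 2_500_000),
             ("NONINDEXED_TS", 40_000_000)]
    print(usage_by_tablespace(files))
```

Keeping the per-file calculation separate from the per-tablespace sum mirrors the guide's advice to repeat the calculation for each LOB tablespace.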

Oracle CM SDK Metadata and Infrastructure

The Oracle CM SDK server keeps persistent information about the file system and the contents of that file system in database tables. These tables and their associated structures are stored in the Oracle CM SDK Primary tablespace. This tablespace contains approximately 300 tables and 500 indexes. These structures are required to support both the file system and the various protocols and user interfaces that make use of that file system.

Administering and planning this space is very similar to doing so for a normal Oracle database installation. The system administrator should plan for approximately 6 KB of overhead per document in this tablespace, or about 2% of the overall content. If there is a significant amount of custom metadata, such as categories, this overhead is larger.

The initial disk space allocated for this tablespace is approximately 50 MB for a default installation. Of this 50 MB, 16 MB is actually used when installation completes. This includes the space for all required tables and indexes, plus the metadata for the approximately 700 files that are loaded into Oracle CM SDK during installation. Different tables and indexes within this tablespace grow at different rates, depending on which Oracle CM SDK features a particular installation uses.
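The two rules of thumb above can be turned into a quick estimate. The following Python sketch is illustrative only. It assumes that the 16 MB used at install time is a fixed baseline and that metadata grows linearly at 6 KB per document. Both assumptions ignore the feature-dependent growth and custom metadata mentioned above. The two rules agree when the average document is about 300 KB (6 KB / 0.02). The document count and average size in the usage example are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Figures from this section of the guide.
INSTALL_BASELINE_BYTES = 16 * MB   # used at the end of a default install
METADATA_PER_DOCUMENT = 6 * KB     # approximate overhead per document
METADATA_FRACTION = 0.02           # about 2% of overall content


def primary_ts_by_documents(num_documents: int) -> int:
    """Estimate Primary tablespace use from the document count.

    Assumes linear growth on top of the post-install baseline; custom
    metadata (such as categories) makes the real overhead larger.
    """
    return INSTALL_BASELINE_BYTES + num_documents * METADATA_PER_DOCUMENT


def primary_ts_by_content(total_content_bytes: int) -> float:
    """Alternative estimate: roughly 2% of total document content."""
    return METADATA_FRACTION * total_content_bytes


if __name__ == "__main__":
    docs = 1_000_000  # hypothetical document count
    # 1 GB = 2**30 bytes in these printouts.
    print(f"{primary_ts_by_documents(docs) / 2**30:.2f} GB")
    # Hypothetical 300 KB average document size for the same corpus.
    print(f"{primary_ts_by_content(docs * 300 * KB) / 2**30:.2f} GB")
```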

Oracle Text

When Oracle CM SDK is used with Oracle Text, users can run content-based (full-text) searches on the documents stored in Oracle CM SDK. For optimal performance, the disk space for this capability is divided among three distinct tablespaces.

The Oracle Text Data tablespace contains tables which hold the text tokens (separate words) that exist within the various indexed documents. The storage for these text tokens is roughly proportional to the ASCII content of the document.

The ASCII content percentage varies with the format of the original document. In plain text files, nearly everything except white space becomes text tokens, so these files incur a higher per-document percentage overhead. Document types such as Microsoft Word or PowerPoint contain large amounts of formatting data that does not qualify as text tokens, so their per-document percentage is lower. On a system with diverse content types, the expected overhead is approximately 8% of the sum of the original sizes of the indexed documents.

Table A-10 offers some general guidelines for the amount of ASCII text in a document for several popular file formats:

Table A-10 Average ASCII Content Per Document Type

Format                                   Plain ASCII Content as      Typical Percentage of
                                         Percentage of File Size     All Document Content (1)
Microsoft Excel (2)                      250%                        4%
ASCII                                    100%                        2%
HTML                                     90%                         10%
Rich Text Format                         80%                         2%
Microsoft Word                           70%                         13%
Acrobat PDF                              10%                         18%
Microsoft PowerPoint                     1%                          3%
Images (JPEG, BMP), Compressed files     0%                          50%
  (Zip, TAR), Binary files, etc.
Total                                                                100%

Footnote 1: From statistics of Oracle Corporation's internal usage of Oracle CM SDK.
Footnote 2: By default, Oracle Text indexes each number in an Excel document as a separate word. Excel stores a number more efficiently than its ASCII equivalent, which is why the ASCII content as a percentage of the file size is greater than 100%.
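The per-format percentages in Table A-10 can be used to estimate the ASCII content of a repository whose content mix is known. The following Python sketch applies the "Plain ASCII Content as Percentage of File Size" column as fractions. The dictionary keys are hypothetical short labels for the table rows, and the content mix in the usage example is made up for illustration.

```python
from typing import Mapping

# "Plain ASCII Content as Percentage of File Size" from Table A-10,
# expressed as fractions. Keys are hypothetical labels for the rows.
ASCII_FRACTION = {
    "excel": 2.50,
    "ascii": 1.00,
    "html": 0.90,
    "rtf": 0.80,
    "word": 0.70,
    "pdf": 0.10,
    "powerpoint": 0.01,
    "binary": 0.00,  # images, compressed files, other binary files
}


def estimated_ascii_bytes(bytes_by_format: Mapping[str, int]) -> float:
    """Estimate total ASCII content from the raw bytes stored per format.

    Raises KeyError for a format that has no entry in ASCII_FRACTION.
    """
    return sum(ASCII_FRACTION[fmt] * size
               for fmt, size in bytes_by_format.items())


if __name__ == "__main__":
    GB = 2**30
    # Hypothetical content mix.
    mix = {"word": 10 * GB, "pdf": 20 * GB, "binary": 50 * GB}
    ascii_bytes = estimated_ascii_bytes(mix)
    print(f"Estimated ASCII content: {ascii_bytes / GB:.1f} GB")
```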

The Oracle Text Keymap tablespace contains the tables and indexes required to translate from the Oracle CM SDK locator of a document (the Oracle CM SDK DocID) to the Oracle Text locator of that same document (the Oracle Text DocID). The expected space utilization for this tablespace is approximately 70 bytes per indexed document.

The Oracle Text Index tablespace contains the B*tree database index that is used against the text token information stored in the Oracle Text Data tablespace. This grows as a function of the ASCII content just as the Oracle Text Data tablespace does. On a system with diverse content types the expected overhead is approximately 4% of the sum of the ASCII content of the documents, or approximately 1% of the sum of the total sizes of the indexed documents.
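The three Oracle Text estimates above can be combined as in the following Python sketch. The function name and the fallback logic are assumptions for illustration: the sketch uses 4% of the ASCII content when that figure is known, and otherwise 1% of the total indexed size. The percentages are this section's rules of thumb for systems with diverse content types. The corpus size in the usage example is hypothetical.

```python
from typing import Dict, Optional

# Rule-of-thumb figures from this section, for diverse content types.
DATA_FRACTION_OF_TOTAL = 0.08   # Data tablespace vs. total indexed bytes
KEYMAP_BYTES_PER_DOC = 70       # Keymap tablespace per indexed document
INDEX_FRACTION_OF_ASCII = 0.04  # Index tablespace vs. ASCII content
INDEX_FRACTION_OF_TOTAL = 0.01  # fallback when ASCII content is unknown


def oracle_text_estimate(total_bytes: int, num_docs: int,
                         ascii_bytes: Optional[float] = None) -> Dict[str, float]:
    """Approximate bytes for the three Oracle Text tablespaces.

    total_bytes is the sum of the original sizes of the indexed documents.
    """
    if ascii_bytes is None:
        index = INDEX_FRACTION_OF_TOTAL * total_bytes
    else:
        index = INDEX_FRACTION_OF_ASCII * ascii_bytes
    return {
        "data": DATA_FRACTION_OF_TOTAL * total_bytes,
        "keymap": float(KEYMAP_BYTES_PER_DOC * num_docs),
        "index": index,
    }


if __name__ == "__main__":
    GB = 2**30
    # Hypothetical corpus: 100 GB in 400,000 indexed documents.
    est = oracle_text_estimate(total_bytes=100 * GB, num_docs=400_000)
    for name, size in est.items():
        print(f"{name:6s} {size / GB:8.3f} GB")
```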

Disk Space Requirements: Sample Deployment

This section details disk space requirements and shows how the space needed grows as documents are added to the server.

Based on experience running Oracle CM SDK for Oracle Corporation's internal usage, the disk overhead of Oracle CM SDK for a large system (hundreds of gigabytes of file content) is detailed in Table A-11:

Table A-11 Disk Space Requirements Summary

Tablespace Overhead Type    Overhead Versus Total     Primarily Determined By
                            Raw File Content (1)
Document Storage            12%                       Size of documents relative to chunk size (32 KB by default)
Oracle Text                 5%                        Amount of ASCII content in all documents
Metadata                    2%                        Number of folders, documents, etc.
General Oracle Storage      1%                        Fixed (not configurable) database settings for TEMP, UNDO, and other tablespaces
Total                       20%

Footnote 1: This does not include mirroring for backup and reliability; redo log size, which should be determined by how many documents are inserted and their size; or the unused portion of the last extent in each database file (this occurs with pre-created database files, and can be large if the next extent setting is large).
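Table A-11 amounts to a 20% allowance on top of the raw file content. The following Python sketch applies the table's percentages to a given amount of content and reports the breakdown. Like the table, it excludes mirroring, redo logs, and unused extent space. The 200 GB figure in the usage example is hypothetical.

```python
from typing import Dict, Tuple

# "Overhead Versus Total Raw File Content" from Table A-11.
OVERHEAD_FRACTION = {
    "Document Storage": 0.12,
    "Oracle Text": 0.05,
    "Metadata": 0.02,
    "General Oracle Storage": 0.01,
}


def disk_estimate(raw_content_bytes: float) -> Tuple[float, Dict[str, float]]:
    """Return (total bytes, overhead per tablespace type).

    Excludes mirroring, redo logs, and unused extent space (see the
    footnote to Table A-11); add those separately.
    """
    overhead = {name: frac * raw_content_bytes
                for name, frac in OVERHEAD_FRACTION.items()}
    return raw_content_bytes + sum(overhead.values()), overhead


if __name__ == "__main__":
    GB = 2**30
    total, overhead = disk_estimate(200 * GB)  # hypothetical 200 GB of files
    for name, size in overhead.items():
        print(f"{name:24s} {size / GB:6.1f} GB")
    print(f"{'Total including content':24s} {total / GB:6.1f} GB")
```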

See the Oracle Concepts Guide for explanation of the terms Large Object (LOB), tablespace, chunk size, and extents.

Because much of this overhead is LOB overhead, the overhead for your Oracle CM SDK instance can vary with the average and median sizes of your documents.
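The dependence on document size can be illustrated with the chunk formulas from the LOB tablespace calculation earlier in this appendix. The following Python sketch reuses those formulas with the same example values: 32 KB chunks, 8 KB blocks, and a PctversionFactor of 1.1. A file much smaller than a chunk still occupies a whole chunk, so its relative overhead is very large. For multi-megabyte files, the overhead falls to roughly the PCTVERSION allowance plus a small per-block overhead. The function name is hypothetical, and the result is approximate because the rounding of the data term is skipped.

```python
import math


def lob_overhead_pct(file_size: int, chunk_size: int = 32768,
                     block_size: int = 8192, block_overhead: int = 60,
                     pctversion_factor: float = 1.1) -> float:
    """Approximate LOB storage overhead for one file, as % of its size.

    Uses the chunk formulas from the LOB tablespace calculation
    (8 chunks per LOB index entry, 46 bytes per entry).
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    usable = chunk_size - (chunk_size // block_size) * block_overhead
    chunks = math.ceil(file_size / usable)
    stored = chunks * chunk_size * pctversion_factor + math.ceil(chunks / 8) * 46
    return 100 * (stored - file_size) / file_size


if __name__ == "__main__":
    for size in (1_024, 100_000, 1_048_576, 10_485_760):
        print(f"{size:>10,d} bytes: {lob_overhead_pct(size):8.1f}% overhead")
```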