Oracle® TimesTen In-Memory Database TimesTen to TimesTen Replication Guide Release 11.2.1 Part Number E13072-09 |
|
|
View PDF |
This chapter describes how to design a highly available system and define replication schemes. It includes the following topics:
To reduce the amount of bandwidth required for replication, see "Compressing replicated traffic".
When you are planning an active standby pair, keep in mind the following restrictions:
You can specify at most 127 subscriber databases.
Each master and subscriber database must be on a different node to ensure high availability.
The active database and the standby database should be on the same LAN.
The clock skew between the active node and the standby node cannot exceed 250 milliseconds.
For the initial set-up, you can create a standby database only by duplicating the active database with the ttRepAdmin
-duplicate
utility or the ttRepDuplicateEx
C function.
Read-only subscribers can be created only by duplicating the standby database. If the standby database is unavailable, then the read-only subscribers can be created by duplicating the active database. See "Duplicating a database".
After failover, the new standby database can only be recovered from the active database by duplicating the active database unless return twosafe replication is used between the active and the standby databases. If return twosafe replication is used, the automated master catch-up feature may be used instead. See "Automatic catch-up of a failed master database".
Operations on replicated tables are not allowed on the standby database and the subscriber databases. However, operations on sequences and XLA bookmarks are allowed on the standby database and the subscriber databases.
Replication from the standby database to the read-only subscribers occurs asynchronously.
ALTER ACTIVE STANDBY PAIR
statements can be executed only on the active database. If ALTER ACTIVE STANDBY PAIR
is executed on the active database, then the standby database must be regenerated by duplicating the active database. All subscribers must also be regenerated from the standby database. See "Reversing the roles of the active and standby databases".
Before you define the active standby pair, you need to define the DSNs for the active, standby and read-only subscriber databases. On Linux or UNIX, create an odbc.ini
file. On Windows, use the ODBC Administrator to name the databases and set connection attributes. See "Step 1: Create the DSNs for the master and the subscriber databases" for an example.
Each database "name" specified in a replication scheme must match the prefix of the database file name without the path given for the DataStore
data store attribute in the DSN definition for the database. A replication scheme that uses the names specified in the Data Source Name
data store attributes does not work. To avoid confusion, use the same name for both your DataStore
and Data Source Name
data store attributes in each DSN definition. For example, if the database path is directory/subdirectory/foo.ds0
, then foo
is the database name that you should use.
Use the CREATE ACTIVE STANDBY PAIR
SQL statement to create an active standby pair replication scheme. The complete syntax for the CREATE ACTIVE STANDBY PAIR
statement is provided in the Oracle TimesTen In-Memory Database SQL Reference.
You must have the ADMIN
privilege to use the CREATE ACTIVE STANDBY PAIR
statement and to perform other replication operations.
Table 3-1 shows the components of an active standby pair replication scheme and identifies the parameters associated with the topics in this chapter.
Table 3-1 Components of an active standby pair replication scheme
Component | See... |
---|---|
|
|
|
|
|
|
|
|
|
|
|
Use the full database name described in "Defining the DSNs for the databases". The first database name designates the active database. The second database name designates the standby database. Read-only subscriber databases are indicated by the SUBSCRIBER
clause.
You can also specify the hosts where the databases reside by using an IP address or a literal host name surrounded by double quotes.
The active database and the standby database should be on separate hosts to achieve a highly available system. Read-only subscribers can be either local or remote. A remote subscriber provides protection from site-specific disasters.
You can provide an optional host ID as part of FullDatabaseName
:
DatabaseName [ON Host]
Host
can be either an IP address or a literal host name. It is good practice to surround a host name with double quotes.
Tables that are replicated in an active standby pair must have one of the following:
A primary key
A unique index over non-nullable columns
Replication uses the primary key or unique index to uniquely identify each row in the replicated table. Replication always selects the first usable index that turns up in a sequential check of the table's index array. If there is no primary key, replication selects the first unique index without NULL
columns it encounters. The selected index on the replicated table in the active database must also exist on its counterpart table in the standby database.
Note:
The keys on replicated tables are transported in each update record to the subscribers. Smaller keys transport most efficiently.VARCHAR2
, NVARCHAR2
, VARBINARY
and TT_VARCHAR
columns in replicated tables is limited to a size of 4 megabytes. For a VARCHAR2
column, the maximum length when using character length semantics depends on the number of bytes each character occupies when using a particular database character set. For example, if the character set requires four bytes for each character, the maximum possible length is 64,000 characters. For an NVARCHAR2
column, which requires two bytes for each character, the maximum length when using character length semantics is 128,000 characters.
You can configure your replication scheme with a return service to ensure a higher level of confidence that your replicated data is consistent on the active and standby databases. See "Copying updates between databases". This section describes how to configure and manage the return receipt and return twosafe services. NO RETURN
(asynchronous replication) is the default and provides the fastest performance.
The following sections describe the following return service clauses:
TimesTen provides an optional return receipt service to loosely couple or synchronize your application with the replication mechanism.
You can specify the RETURN RECEIPT
clause to enable the return receipt service for the standby database. With return receipt enabled, when your application commits a transaction for an element on the active database, the application remains blocked until the standby acknowledges receipt of the transaction update.
If the standby is unable to acknowledge receipt of the transaction within a configurable timeout period, your application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
You can use the ttRepXactStatus
procedure to check on the status of a return receipt transaction. See "Checking the status of return service transactions" for details.
You can also configure the replication agent to disable the return receipt service after a specific number of timeouts. See "Managing return service timeout errors" for details.
RETURN RECEIPT
enables notification of receipt for all transactions. You can use RETURN RECEIPT
with the BY REQUEST
clause to enable receipt notification only for specific transactions identified by your application.
If you specify RETURN RECEIPT BY REQUEST
, you must use the ttRepSyncSet
procedure to enable the return receipt service for a transaction. The call to enable the return receipt service must be part of the transaction (autocommit must be off).
If the standby database is unable to acknowledge receipt of the transaction update within a configurable timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
You can use ttRepSyncGet
to check if a return service is enabled and obtain the timeout value. For example:
Command> CALL ttRepSyncGet(); < 01, 45, 1> 1 row found.
TimesTen provides a return twosafe service to fully synchronize your application with the replication mechanism. The return twosafe service ensures that each replicated transaction is committed on the standby database before it is committed on the active database. If replication is unable to verify the transaction has been committed on the standby, it returns notification of the error. Upon receiving an error, the application can either take a unique action or fall back on preconfigured actions, depending on the type of failure.
When replication is configured with RETURN TWOSAFE
, you must disable autocommit mode.
A transaction that contains operations that are replicated with RETURN TWOSAFE
cannot have a PassThrough
setting greater than 0. If PassThrough
is greater than 0, an error is returned and the transaction must be rolled back.
If the standby is unable to acknowledge commit of the transaction update within a configurable timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
RETURN TWOSAFE
enables notification of commit on the standby database for all transactions. You can use RETURN TWOSAFE
with the BY REQUEST
clause to enable notification of a commit on the standby only for specific transactions identified by your application.
A transaction that contains operations that are replicated with RETURN TWOSAFE
cannot have a PassThrough
setting greater than 0. If PassThrough
is greater than 0, an error is returned and the transaction must be rolled back.
If you specify RETURN TWOSAFE BY REQUEST
for a standby database, you must use the ttRepSyncSet
procedure to enable the return twosafe service for a transaction. The call to enable the return twosafe service must be part of the transaction (autocommit must be off).
If the standby is unable to acknowledge commit of the transaction within the timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. The application can then chose how to handle the timeout, in the same manner as described for "RETURN TWOSAFE".
The ALTER TABLE
statement cannot be used to alter a replicated table that is part of a RETURN TWOSAFE BY REQUEST
transaction. If DDLCommitBehavior
=1, this operation results in error 8051. If DDLCommitBehavior
=0, the operation succeeds because a commit is performed before the ALTER TABLE
operation, resulting in the ALTER TABLE
operation being in a new transaction which is not part of the RETURN TWOSAFE BY REQUEST
transaction.
See "Setting the return service timeout period" for more information on setting the return service timeout period.
You can use ttRepSyncGet
to check if a return service is enabled and obtain the timeout value. For example:
Command> CALL ttRepSyncGet(); < 01, 45, 1> 1 row found.
Table 3-2 lists the optional STORE
attributes for the CREATE ACTIVE STANDBY PAIR
statement.
Table 3-2 STORE attribute descriptions
STORE attribute | Description |
---|---|
|
Set the return service policy so that return service blocking is disabled after the number of timeouts specified by See "Establishing return service failure/recovery policies". |
|
Set return services on or off when replication is disabled. See "Establishing return service failure/recovery policies". |
|
If See "Establishing return service failure/recovery policies". |
|
Specifies the number of seconds to wait for return service acknowledgement. A value of 0 means that there is no waiting. The default value is 10 seconds. The application can override this timeout setting by using the |
|
Overrides the See "DURABLE COMMIT". |
|
Specifies the default action to be taken for a return service transaction in the event of a timeout. The options are:
This default setting can be overridden for specific transactions by using the |
|
Compresses replicated traffic to reduce the amount of network bandwidth used. |
|
Sets the port number used by a database to listen for updates from another database. In an active standby pair, the standby database listens for updates from the active database. Read-only subscribers listen for updates from the standby database. If no See "Port assignments". |
|
Set the maximum number of seconds a database waits before re-sending a message to an unresponsive database. In an active standby pair, the active database sends messages to the standby database. The standby database sends messages to the read-only subscribers. |
|
Sets the log failure threshold. |
The rest of this section includes these topics:
If a replication scheme is configured with one of the return services described in "Using a return service", a timeout occurs if the standby database is unable to send an acknowledgement back to the active within the time period specified by TIMEOUT
.
The default return service timeout period is 10 seconds. You can specify a different return service timeout period by:
Specifying the RETURN WAIT TIME
in the CREATE ACTIVE STANDBY PAIR
statement or ALTER ACTIVE STANDBY PAIR
statement. A RETURN WAIT TIME
of 0
indicates no waiting.
Specifying a different return service timeout period programmatically by calling the ttRepSyncSet
procedure with a new value for the returnWait
parameter.
Once set, the timeout period applies to all subsequent return service transactions until you either reset the timeout period or terminate the application session.
A return service may time out because of a replication failure or because replication is so far behind that the return service transaction times out before it is replicated. However, unless there is a simultaneous replication failure, failure to obtain a return service confirmation from the standby does not necessarily mean the transaction has not been or will not be replicated.
You can set other STORE
attributes to establish policies that automatically disable return service blocking if there are excessive timeouts and re-enable return service blocking when conditions improve. See "Managing return service timeout errors" for details.
If the standby database is unable to acknowledge the transaction update from the active database within the timeout period, the application receives an errRepReturnFailed
warning on its commit request.
The default return service timeout period is 10 seconds. You can specify a different timeout period by:
Using the RETURN WAIT TIME
clause in the STORE
clause of the CREATE ACTIVE STANDBY PAIR
statement
Calling the ttRepSyncSet
procedure with a new returnWait
parameter
A return service may time out or fail because of a replication failure or because replication is so far behind that the return service transaction times out before it is replicated. However, unless there is a simultaneous replication failure, failure to obtain a return service confirmation from the subscriber does not mean the transaction has not or will not be replicated.
You can respond to return service timeouts by:
You may want respond in some manner if replication is stopped or return service timeout failures begin to adversely impact the performance of your replicated system. Your "tolerance threshold" for return service timeouts may depend on the historical frequency of timeouts and the performance/availability equation for your particular application, both of which should be factored into your response to the problem.
When using the return receipt service, you can manually respond by:
Using the ALTER ACTIVE STANDBY PAIR
statement to disable return receipt blocking
Calling the ttDurableCommit
built-in procedure to durably commit transactions on the active database that you can no longer verify as being received by the standby
If you decide to disable return receipt blocking, your decision to re-enable it depends on your confidence level that the return receipt transaction is no longer likely to time out.
An alternative to manually responding to return service timeout failures is to establish return service failure and recovery policies in the replication scheme. These policies direct the replication agents to detect changes to the replication state and to keep track of return service timeouts and then automatically respond in a predefined manner.
The following attributes in the CREATE ACTIVE STANDBY PAIR
statement set the failure and recovery policies when using a RETURN RECEIPT
or RETURN TWOSAFE
service:
The policies set by these attributes are applicable for the life of the database or until changed. The replication agent must be running to enforce these policies.
The RETURN SERVICES {ON | OFF} WHEN [REPLICATION] STOPPED
attribute determines whether a return receipt or return twosafe service continues to be enabled or is disabled when replication is stopped. "Stopped" in this context means that either the active replication agent is stopped (for example, by ttAdmin
-repStop
active
) or the replication state of the standby database is set to Stop
or Pause
with respect to the active database (for example, by ttRepAdmin
-state stop
standby
). A failed standby that has exceeded the specified FAILTHRESHOLD
value is set to the Failed
state, but is eventually set to the Stop
state by the master replication agent.
Note:
A standby database may become unavailable for a period of time that exceeds the timeout period specified byRETURN WAIT TIME
but still be considered by the master replication agent to be in the Start
state. Failure policies related to timeouts are set by the DISABLE RETURN
attribute.RETURN SERVICES OFF WHEN REPLICATION STOPPED
disables the return service when replication is stopped and is the default when using the RETURN RECEIPT
service. RETURN SERVICES ON WHEN REPLICATION STOPPED
allows the return service to continue to be enabled when replication is stopped and is the default when using the RETURN TWOSAFE
service.
When a DISABLE RETURN
value is set, the database keeps track of the number of return receipt or return twosafe transactions that have exceeded the timeout period set by RETURN WAIT TIME
. If the number of timeouts exceeds the maximum value set by DISABLE RETURN
, the application reverts to a default replication cycle in which it no longer waits for the standby to acknowledge the replicated updates.
Specifying SUBSCRIBER
is the same as specifying ALL
. Both settings refer to the standby database.
The DISABLE RETURN
failure policy is only enabled when the replication agent is running. If DISABLE RETURN
is specified but RESUME RETURN
is not specified, the return services remain off until the replication agent for the database has been restarted.You can cancel this failure policy by stopping the replication agent and specifying DISABLE RETURN
with a zero value for NumFailures
. The count of timeouts to trigger the failure policy is reset either when you restart the replication agent, when you set the DISABLE RETURN
value to 0, or when return service blocking is re-enabled by RESUME RETURN
.
When we say return service blocking is "disabled," we mean that the applications on the master database no longer block execution while waiting to receive acknowledgements from the subscribers that they received or committed the replicated updates. Note, however, that the master still listens for an acknowledgement of each batch of replicated updates from the standby database.
You can establish a return service recovery policy by setting the RESUME RETURN
attribute and specifying a resume latency value. When this attribute is set and return service blocking has been disabled for the standby database, the return receipt or return twosafe service is re-enabled when the commit-to-acknowledge time for a transaction falls below the value set by RESUME RETURN
. The commit-to-acknowledge time is the latency between when the application issues a commit and when the master receives acknowledgement of the update from the subscriber.
The RESUME RETURN
policy is enabled only when the replication agent is running. You can cancel a return receipt resume policy by stopping the replication agent and then using ALTER ACTIVE STANDBY PAIR
to set RESUME RETURN
to zero.
You can set the DURABLE COMMIT
attribute to specify the durable commit policy for applications that have return service blocking disabled by DISABLE RETURN
. When DURABLE COMMIT
is set to ON
, it overrides the DurableCommits
general connection attribute on the master database and forces durable commits for those transactions that have had return service blocking disabled.
If the replication scheme is configured with RETURN SERVICES ON WHEN REPLICATION STOPPED
, the replication agent must be running to enforce the DURABLE COMMIT
policy.
When you are using the return twosafe service, you can specify how the master replication agent responds to timeouts by setting LOCAL COMMIT ACTION
. You can override the setting for specific transactions by calling the localAction
parameter in the ttRepSyncSet
procedure.
The possible actions upon receiving a timeout during replication of a twosafe transaction are:
COMMIT
- Upon timeout, the replication agent on the active database commits the transaction and no more operations are allowed in the transaction.
NO ACTION
- Upon timeout, the replication agent on the active database does not commit the transaction. The process recovery commits the transaction. This is equivalent to a forced commit.
If you are replicating over a low-bandwidth network, or if you are replicating massive amounts of data, you can set the COMPRESS TRAFFIC
attribute to reduce the amount of bandwidth required for replication. The COMPRESS TRAFFIC
attribute compresses the replicated data from the database specified by the STORE
parameter in the CREATE ACTIVE STANDBY PAIR
or ALTER ACTIVE STANDBY PAIR
statement. TimesTen does not compress traffic from other databases.
Though the compression algorithm is optimized for speed, enabling the COMPRESS TRAFFIC
attribute has some impact on replication throughput and latency.
Example 3-1 Compressing traffic from an active database
For example, to compress replicated traffic from active database dsn1
and leave the replicated traffic from standby database dsn2
uncompressed, the CREATE ACTIVE STANDBY PAIR
statement looks like:
CREATE ACTIVE STANDBY PAIR dsn1 ON "machine1", dsn2 ON "machine2" SUBSCRIBER dsn3 ON "machine3" STORE dsn1 ON "machine1" COMPRESS TRAFFIC ON;
Example 3-2 Compressing traffic from both master databases
To compress the replicated traffic from the dsn1
and dsn2
databases, use:
CREATE ACTIVE STANDBY PAIR dsn1 ON "machine1", dsn2 ON "machine2" SUBSCRIBER dsn3 ON "machine3" STORE dsn1 ON "machine1" COMPRESS TRAFFIC ON STORE dsn2 ON "machine2" COMPRESS TRAFFIC ON;
If you do not assign a PORT
attribute, the TimesTen daemon dynamically selects the port. When ports are assigned dynamically in this manner for the replication agents, then the ports of the TimesTen daemons have to match as well.
When statically assigning ports, it is important to specify the full host name, DSN and port in the STORE
attribute of the CREATE ACTIVE STANDBY PAIR
statement.
You can establish a threshold value that, when exceeded, sets an unavailable standby database or a read-only subscriber to the Failed
state before the available log space is exhausted.
Set the log threshold by specifying the STORE
clause with a FAILTHRESHOLD
value in the CREATE ACTIVE STANDBY PAIR
or ALTER ACTIVE STANDBY PAIR
statement. The default threshold value is 0, which means "no limit."
If an active database sets the standby database or a read-only subscriber to the Failed
state, it drops all of the data for the failed database from its log and transmits a message to the failed database. If the active replication agent can communicate with the replication agent of the failed database, then the message is transmitted immediately. Otherwise, the message is transmitted when the connection is reestablished.
Any application that connects to the failed database receives a tt_ErrReplicationInvalid
(8025) warning indicating that the database has been marked Failed
by a replication peer. Once the database has been informed of its failed status, its state on the active database is changed from Failed
to Stop
.
An application can use the ODBC SQLGetInfo
function to check if the database the application is connected to has been set to the Failed
state.
If a replication host has more than one network interface, you may wish to configure replication to use an interface other than the default interface. Although you must specify the host name returned by the operating system's hostname
command when you specify the database name, you can configure replication to send or receive traffic over a different interface using the ROUTE
clause.
The syntax of the ROUTE
clause is:
ROUTE MASTER FullDatabaseName SUBSCRIBER FullDatabaseName {{MASTERIP MasterHost | SUBSCRIBERIP SubscriberHost} PRIORITY Priority} [...]
In the context of the ROUTE
clause, each master database is a subscriber of the other master database and each read-only subscriber is a subscriber of both master databases.
Example 3-4 Configuring multiple network interfaces
If host machine1
is configured with a second interface accessible by the host name machine1fast
, and machine2
is configured with a second interface at IP address 192.168.1.100, you may specify that the secondary interfaces are used with the replication scheme.
CREATE ACTIVE STANDBY PAIR dns1, dsn2 ROUTE MASTER dsn1 ON "machine1" SUBSCRIBER dsn2 ON "machine2" MASTERIP "machine1fast" PRIORITY 1 SUBSCRIBERIP "192.168.1.100" PRIORITY 1 ROUTE MASTER dsn2 ON "machine2" SUBSCRIBER dsn1 ON "machine1" MASTERIP "192.168.1.100" PRIORITY 1 SUBSCRIBERIP "machine1fast" PRIORITY 1;
Alternately, on a replication host with more than one interface, you may wish to configure replication to use one or more interfaces as backups, in case the primary interface fails or the connection from it to the receiving host is broken. You can use the ROUTE
clause to specify two or more interfaces for each master or subscriber that are used by replication in order of priority.
If replication on the master host is unable to bind to the MASTERIP
with the highest priority, it will try to connect using subsequent MASTERIP
addresses in order of priority immediately. However, if the connection to the subscriber fails for any other reason, replication will try to connect using each of the SUBSCRIBERIP
addresses in order of priority before it tries the MASTERIP
address with the next highest priority.
Example 3-5 Configuring network priority
If host machine1
is configured with two network interfaces at IP addresses 192.168.1.100 and 192.168.1.101, and host machine2
is configured with two interfaces at IP addresses 192.168.1.200 and 192.168.1.201, you may specify that replication use IP addresses 192.168.1.100 and 192.168.200 to transmit and receive traffic first, and to try IP addresses 192.168.1.101 or 192.168.1.201 if the first connection fails.
CREATE ACTIVE STANDBY PAIR dns1, dns2 ROUTE MASTER dsn1 ON "machine1" SUBSCRIBER dsn2 ON "machine2" MASTERIP "192.168.1.100" PRIORITY 1 MASTERIP "192.168.1.101" PRIORITY 2 SUBSCRIBERIP "192.168.1.200" PRIORITY 1 SUBSCRIBERIP "192.168.1.201" PRIORITY 2;
An active standby pair replicates an entire database by default. Use the INCLUDE
clause to replicate only the tables, cache groups and sequences that are listed in the INCLUDE
clause. No other database objects will be replicated in an active standby pair that is defined with an INCLUDE
clause. For example, this INCLUDE
clause specifies three tables to be replicated by the active standby pair:
INCLUDE TABLE employees, departments, jobs
You can choose to exclude specific tables, cache groups or sequences from replication by using the EXCLUDE
clause of the CREATE ACTIVE STANDBY PAIR
statement. Use one EXCLUDE
clause for each object type. For example:
EXCLUDE TABLE ttuser.tab1, ttuser.tab2 EXCLUDE CACHE GROUP ttuser.cg1, ttuser.cg2 EXCLUDE SEQUENCE ttuser.seq1, ttuser.seq2