Thursday, July 7, 2016

Oracle GoldenGate Architecture

Introduction: GoldenGate is the Logical Replication solution provided by Oracle that can be used to replicate data across heterogeneous databases.

Overview of the GoldenGate Architecture:

Oracle GoldenGate provides the following methods for replication across multiple heterogeneous databases:
  1. Initial Load: This method supports the static extraction of data records from one database and the loading of those records into one or more other databases.
  2. Change Synchronization: This method supports the continuous extraction and replication of transactional DML changes, as well as DDL changes on supported databases, between source and target databases to keep them in sync.
Apart from these two methods of replication, GoldenGate also provides a means to extract data from a source database and write it to external files (of the user’s choice) outside the database.
Note: Initial Load, as the name suggests, is primarily used at the beginning of a replication setup, before a Change Synchronization environment is configured, to ensure that the source and target databases are in sync. Initial Load can also be used alongside Change Synchronization to resolve inconsistencies between source and target objects.


The following diagram depicts the complete architecture of a GoldenGate replication process:
[Figure: golden_gate_architecture]

Oracle GoldenGate accomplishes replication with the help of the following integral components:
  • Extract
  • Data Pump
  • Collector
  • Replicat
  • Trails or Extract Files
  • Checkpoint (files and tables)
  • Manager

A brief look at the GoldenGate components:


Extract

Extract is a GoldenGate server process that runs on the source system and captures database changes from the source database.
Extract can be configured for either of the following purposes:
  • Initial Load: For the Initial Load method of replication, Extract captures a static set of data directly from the source tables or objects.
  • Change Synchronization: In this method of replication, Extract continuously captures changes (DML and DDL) from the source database to keep the source and target databases consistent. It is the sole method for implementing continuous replication between source and target databases. Extract should be configured for change synchronization after the initial load, so that the source and target databases are in sync before continuous replication starts.
The data source of the Extract process can be one of the following:
  • The source tables (if Extract is configured for initial load)
  • The database transaction logs or recovery logs (such as Oracle redo logs, Oracle archive logs, SQL/MX audit trails, or Sybase transaction logs), depending on the type of source database
  • A third-party capture module, which can also be used to extract transactional data from the source database. In this method, the data and metadata from an external API are passed to the Extract API.
Extract captures changes from the source database based on its configuration, which lists the objects to be replicated. It is important to note that Extract captures only committed transactions from its data source.
Multiple Extract processes can be configured on a source database to operate on the same or different source objects.
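The "committed transactions only" behavior can be illustrated with a small sketch. This is not GoldenGate code and does not reflect its internal log format: the `(txn_id, action, payload)` record layout and the function name `capture_committed` are assumptions made purely for illustration. The idea is that operations are buffered per transaction and released only when a commit record is seen.

```python
from collections import defaultdict

def capture_committed(log_records):
    """Return operations of committed transactions, in commit order.

    Each log record is a hypothetical (txn_id, action, payload) tuple.
    Operations of rolled-back or still-open transactions are never emitted.
    """
    pending = defaultdict(list)  # txn_id -> operations buffered so far
    captured = []
    for txn_id, action, payload in log_records:
        if action == "COMMIT":
            # Release everything buffered for this transaction.
            captured.extend(pending.pop(txn_id, []))
        elif action == "ROLLBACK":
            # Discard the buffered operations.
            pending.pop(txn_id, None)
        else:
            pending[txn_id].append((txn_id, action, payload))
    return captured

log = [
    (1, "INSERT", {"id": 10}),
    (2, "INSERT", {"id": 20}),
    (1, "COMMIT", None),
    (2, "ROLLBACK", None),
    (3, "UPDATE", {"id": 10}),  # transaction 3 never commits
]
committed = capture_committed(log)
```

Here only the insert from transaction 1 ends up in `committed`; transaction 2 was rolled back and transaction 3 is still open.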
After extracting records from the source database objects, Extract performs one of the following tasks:
  • Delivers the extracted data to the Trail Files on the target server through the Collector process
  • Writes the extracted data to Local Trail Files (discussed in a later section) on the source system

Optionally, Extract can also be configured to perform data filtering, transformation, and mapping while capturing data and/or before transferring it to the target system.

DataPump

This is an optional GoldenGate server process on the source system. It comes into the picture when the data extracted from the source is not transferred directly to the target Trail Files. In a DataPump setup, the Extract process captures records from the source and stores them on the local file system in Local Trail Files. The DataPump then acts as a secondary Extract process: it reads the records from the Local Trail Files and delivers them to the Trail Files on the target system through the Collector.
Optionally, DataPump can also be configured to perform data filtering, transformation, and mapping before transferring the data to the target system.
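As a rough mental model, the DataPump can be pictured as a loop that reads the local trail from its last position, optionally filters records, and appends them to the remote trail. The sketch below uses plain Python lists as stand-ins for trail files; the function `data_pump`, the record layout, and the table names are hypothetical and are not part of GoldenGate.

```python
def data_pump(local_trail, remote_trail, start=0, keep=lambda record: True):
    """Copy records from local_trail[start:] to remote_trail.

    `keep` is an optional filter. Returns the new read position so that
    the next call can resume where this one stopped.
    """
    position = start
    for record in local_trail[start:]:
        if keep(record):
            remote_trail.append(record)
        position += 1
    return position

def only_emp(record):
    # Hypothetical filter: ship only changes to the HR.EMP table.
    return record["table"] == "HR.EMP"

local_trail = [
    {"table": "HR.EMP", "op": "INSERT", "row": {"id": 1}},
    {"table": "HR.DEPT", "op": "INSERT", "row": {"id": 7}},
]
remote_trail = []

pos = data_pump(local_trail, remote_trail, keep=only_emp)

# Extract keeps appending to the local trail independently of the pump.
local_trail.append({"table": "HR.EMP", "op": "DELETE", "row": {"id": 1}})
pos = data_pump(local_trail, remote_trail, start=pos, keep=only_emp)
```

Because the pump only needs the local trail and its own read position, the primary Extract can keep writing locally even while the pump is idle or the network is down.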

Collector

Collector is a server process that runs in the background on the target system in a GoldenGate replication setup where Extract is configured for continuous Change Synchronization.
Collector performs the following roles in GoldenGate replication:
  • When the source Extract sends a connection request, the Collector process on the target system scans for and binds to an available port, then sends the port details back to the Manager for assignment to the requesting Extract process.
  • Collector receives the data sent by the source Extract process and writes it to Trail Files on the target system.
There is one Collector process on the target system for each Extract process on the source system, i.e., there is a one-to-one mapping between Extract and Collector processes.
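The port hand-off and the one-to-one mapping can be sketched with a small in-memory model. This is a simplification that opens no real sockets; the class `PortPool`, the port range 7810-7812, and the Extract names are all assumptions chosen for illustration.

```python
class PortPool:
    """Hypothetical model: each requesting Extract gets exactly one free port."""

    def __init__(self, first, last):
        self.ports = range(first, last + 1)
        self.assigned = {}  # extract name -> port of its dedicated collector

    def assign(self, extract_name):
        # One collector per extract: a repeated request returns the same port.
        if extract_name in self.assigned:
            return self.assigned[extract_name]
        in_use = set(self.assigned.values())
        for port in self.ports:  # scan the range for a free port
            if port not in in_use:
                self.assigned[extract_name] = port
                return port
        raise RuntimeError("no free port in range")

pool = PortPool(7810, 7812)
port_a = pool.assign("EXT_A")
port_b = pool.assign("EXT_B")
```

In this model `EXT_A` and `EXT_B` never share a port, which mirrors the one Collector per Extract rule described above.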

Replicat

The Replicat (note: not "replicate") process runs on the target system and is primarily responsible for applying the extracted data that the source Extract process delivered to the target Trail Files.
The Replicat process scans the Trail Files on the target system, reconstructs the DDL and DML operations from them, and applies those operations to the target database.
Replicat has the following two types of configuration, which correspond to the type of Extract configured on the source system:
  • Initial Load: In an initial load configuration, Replicat can apply the static data copy extracted by the initial load Extract to the target objects, or route it to a high-speed bulk-load utility.
  • Change Synchronization: In a change synchronization configuration, Replicat applies the continuous stream of data extracted from the source objects to the target objects using a native database interface or ODBC drivers, depending on the type of target database.

Optionally, Replicat can also be configured to perform data filtering, transformation, and mapping before applying the transactions to the target database.
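To make the "read the trail, rebuild the operation, apply it" cycle concrete, here is a minimal sketch that replays trail records against a Python dictionary standing in for a target table. The record fields (`op`, `key`, `row`) and the function `apply_record` are hypothetical; a real Replicat issues SQL through a native interface or ODBC rather than mutating a dictionary.

```python
def apply_record(target, record):
    """Apply one hypothetical trail record to `target` (a dict keyed by primary key)."""
    key = record["key"]
    op = record["op"]
    if op == "INSERT":
        target[key] = dict(record["row"])
    elif op == "UPDATE":
        target[key].update(record["row"])  # only changed columns are present
    elif op == "DELETE":
        del target[key]
    else:
        raise ValueError(f"unsupported operation: {op}")

trail = [
    {"op": "INSERT", "key": 1, "row": {"name": "Ann", "dept": 10}},
    {"op": "UPDATE", "key": 1, "row": {"dept": 20}},
    {"op": "INSERT", "key": 2, "row": {"name": "Bob", "dept": 10}},
    {"op": "DELETE", "key": 2, "row": {}},
]

target = {}
for rec in trail:  # records are applied in trail order
    apply_record(target, rec)
```

Applying the records in trail order matters: the update and delete above only make sense after the corresponding inserts have been applied.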

Trail or Extract Files

Trails or Extract Files are operating system files that GoldenGate uses to store the records captured from the source objects by the Extract process. Trail Files can be created on the source system and on the target system, depending on the GoldenGate replication setup. Trail Files on the source system are called Extract Trails or Local Trails, and those on the target system are called Remote Trails.
By using trails, GoldenGate minimizes load on the source database: once the Extract process has read the transaction logs (online redo logs or archive logs) and written the changes to Trail Files, operations such as filtering, conversion, and mapping happen outside the source database. The use of Trail Files also makes the extraction and replication processes independent of each other.

Checkpoint

Checkpoints act as a synchronization mechanism between the source and target systems in GoldenGate replication. A checkpoint stores the current read and write positions of a GoldenGate process on disk in order to ensure data consistency and integrity.
The following types of checkpoints are maintained by GoldenGate:
  • The Extract process on the source system creates checkpoints for its read position in the data source (transaction logs) and its write position in the Local Trail Files.
  • Extract keeps a current checkpoint, which is the position in the transaction log of the last record read by Extract, as well as a recovery checkpoint, which is the position in the transaction log of the record containing the oldest transaction not yet processed by Extract. These checkpoints ensure that Extract captures the correct records from the data source.
  • Extract also keeps a write checkpoint for the Local Trail. This is the position in the trail where Extract is currently writing the records captured from the data source. It ensures that Extract writes the correct records to the Trail Files.
  • The Replicat process on the target system keeps a checkpoint of its last read position in the Remote Trail, i.e., the position of the last record replicated to the target database. This checkpoint ensures that the correct record is replicated to the target database.
Checkpoints help with data synchronization, prevent records from being processed twice, and support recovery in the event of network or target failures.
Checkpoints are maintained in checkpoint files. For Replicat, however, checkpoints can also be maintained in a user-defined table in the target database.
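The way a read checkpoint lets a process resume after a stop can be sketched as follows. This is an illustrative model only: the JSON checkpoint file, the list used as a trail, and the functions `load_checkpoint`, `save_checkpoint`, and `replicate` are assumptions and do not reflect GoldenGate's actual checkpoint file format.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Return the saved trail position, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["position"]

def save_checkpoint(path, position):
    """Persist the position; write to a temp file first, then rename over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"position": position}, f)
    os.replace(tmp, path)

def replicate(trail, applied, checkpoint_path, max_records=None):
    """Apply trail records starting at the checkpointed position."""
    position = load_checkpoint(checkpoint_path)
    end = len(trail) if max_records is None else min(len(trail), position + max_records)
    while position < end:
        applied.append(trail[position])  # stand-in for applying to the target
        position += 1
        save_checkpoint(checkpoint_path, position)
    return position

workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "replicat.ckpt")
trail = ["rec-1", "rec-2", "rec-3"]
applied = []

replicate(trail, applied, ckpt, max_records=2)    # simulate a stop after two records
final_position = replicate(trail, applied, ckpt)  # a restart resumes at position 2
```

In this sketch the apply step and the checkpoint write are separate, so a crash between the two could cause one record to be applied twice on restart. Keeping the checkpoint in a table in the target database allows the position to be committed together with the applied data, which is one plausible motivation for the checkpoint table option mentioned above.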

Manager

Manager can be considered the parent process in a GoldenGate replication setup on both the source and target systems. Manager controls, manages, and maintains the functioning of the other GoldenGate processes and files. A Manager process is responsible for the following tasks:
  • Starting up Oracle GoldenGate processes
  • Maintaining port numbers for processes
  • Starting up dynamic processes
  • Performing GoldenGate trail management
  • Creating event, error, and threshold reports
