Plenty of software uses a database to store data. There are many types of database implementations, but generally two main branches exist - SQL-based relational database management systems (RDBMS) and file-based databases.

One such RDBMS is Interbase. Interbase is a powerful transactional SQL database engine that implements many advanced features - views, triggers, stored procedures, generators, etc. Interbase was developed by Borland as a commercial database, but version 6.0 was released as open source, which means it can be used free of charge for any purpose. A new branch, named Firebird, was founded on the open-source code base. Even though Borland released the next version, Interbase 7, as closed source, the Firebird fork remains open source.

Even though the Interbase architecture is robust and powerful, comparable to its competitors, one missing feature limits it - there is no built-in replication engine such as those in Oracle, MSSQL or even MSAccess.

In many cases the data of an enterprise are not stored in one database but are distributed across several locations. A replication engine tries to keep the data in all the databases as synchronized as possible. To be efficient, the engine captures only data changes and passes them to the other databases.

There are many reasons to split data:

  • unreliable connection to the main database
  • security, a shadow database in the DMZ
  • data backup to a remote shadow database at a different place
  • external databases on sales representatives' laptops
  • etc.

There are many replication models, because two enterprises will probably have different requirements.

Let's look at several examples.

There is a main database and a shadow (read-only) database that serves as a backup. When the main database crashes, the shadow database is activated as the main one.

There are several equivalent databases. Data should be the same in each of them.

An enterprise maintains a network of agents who have databases on their personal laptops. They can change records (place orders, update information), and because they are competitors, each of them can see only their own data.

An enterprise must consolidate data daily from its subsidiaries.

So there are many questions:

  • how often to replicate?
  • is a reliable and fast online connection available?
  • replicate all tables?
  • replicate all records in a table?
  • unidirectional or bidirectional?

Because the replicator must meet all these contradictory requirements, changes fire triggers that log them in the source database, and the logged changes are then replicated to the targets. This means that only changes are transferred. Due to referential integrity, it is necessary to replicate in the same sequential order in which the changes occurred. If new records may be inserted in more than one location, primary keys must be unique - every record must be identifiable by a unique primary key. The logging adds overhead and the source database grows, so replication log records are deleted once they are successfully replicated. You can configure which field values are logged - primary key values must always be logged, logging NOT NULL and foreign key fields is recommended, and the remaining values are optional.
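The logging scheme above can be sketched in a few lines. This is an illustrative model only - the real replicator uses Interbase triggers writing into log tables, and all the names here are assumptions:

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_seq = count(1)

@dataclass
class LogRecord:
    seq: int                 # global sequence number preserves change order
    table: str
    operation: str           # 'INSERT', 'UPDATE' or 'DELETE'
    primary_key: tuple       # always logged - uniquely identifies the record
    values: Optional[dict] = None   # non-key field values, logged optionally

change_log: list = []

def on_change(table, operation, primary_key, values=None):
    """What a row-level trigger would do: append one log record per change."""
    change_log.append(LogRecord(next(_seq), table, operation, primary_key, values))

# Simulated application activity; the ORDERS row references the CUSTOMER row,
# so replay must keep this order to satisfy referential integrity:
on_change("CUSTOMER", "INSERT", (1001,), {"NAME": "Acme"})
on_change("ORDERS",   "INSERT", (5001,), {"CUST_ID": 1001})
on_change("CUSTOMER", "UPDATE", (1001,), {"NAME": "Acme Ltd"})

replay_order = [r.table for r in sorted(change_log, key=lambda r: r.seq)]
```

Replaying records by sequence number guarantees that a child row is never applied before its parent.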

Let's suppose that the replicator connects to the source database, looks into the change log, and if there are records to transfer, connects to the target database and transfers the changes. One replicator is connected to both databases at the same time, hence this is online replication. But what happens if the connection is broken before the transfer is finished? Nothing dangerous: a change log record is removed only when the change is committed in the target database, which simulates a two-phase commit. When a constraint problem occurs, the record is moved to another log where conflicting records are stored. The replicator cannot solve such a problem automatically; it must be solved manually, hence this log is named the manual log.
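A minimal sketch of that online loop, with the two-phase-commit simulation and the manual log (the function and data shapes are assumptions for illustration, not the product's API):

```python
class ConstraintError(Exception):
    pass

def apply_to_target(target, entry):
    """Apply one change to the target; raise on an integrity violation."""
    if entry["op"] == "INSERT" and entry["pk"] in target:
        raise ConstraintError("duplicate primary key %r" % (entry["pk"],))
    if entry["op"] == "DELETE":
        target.pop(entry["pk"], None)
    else:
        target[entry["pk"]] = entry["values"]

def replicate_online(change_log, target, manual_log):
    while change_log:
        entry = change_log[0]
        try:
            apply_to_target(target, entry)   # commit in the target...
        except ConstraintError:
            manual_log.append(entry)         # ...or defer to a human
        change_log.pop(0)  # removed only once the entry is settled; if the
                           # connection broke earlier, the entry would remain

target_db = {10: {"NAME": "existing"}}
manual = []
log = [
    {"op": "INSERT", "pk": 10, "values": {"NAME": "dup"}},    # conflicts
    {"op": "INSERT", "pk": 11, "values": {"NAME": "fresh"}},  # succeeds
]
replicate_online(log, target_db, manual)
```

If an exception other than a constraint failure interrupts the loop, unprocessed entries simply stay in the change log and are retried on the next run - that is the two-phase-commit behaviour in miniature.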

How is it possible to replicate when one of the databases is unavailable from the opposite location? It may be hidden behind a firewall, the machine it runs on may connect via dial-up, or the connection may be unreliable. The solution is offline replication. In this case the first replicator connects to the source database, prepares packages containing the replication data, and passes them to a shared drop-off location, where they are left untouched until the second replicator reads the packages from the shared location and applies the data to the target database. Offline replication uses a robust four-phase acknowledge protocol and stores the package transfer history in a transfer log.
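The drop-off idea can be modelled with a shared directory. The file naming and the simple `.ack` handshake below are assumptions for the sketch - not the product's real package format or its full four-phase acknowledge protocol:

```python
import json
import tempfile
from pathlib import Path

def produce_package(dropoff: Path, seq: int, changes: list) -> Path:
    """First replicator: write one numbered package to the drop-off location."""
    pkg = dropoff / f"pkg_{seq:06d}.json"
    pkg.write_text(json.dumps(changes))
    return pkg

def consume_packages(dropoff: Path, target: dict) -> int:
    """Second replicator: apply packages in sequence order, then acknowledge."""
    applied = 0
    for pkg in sorted(dropoff.glob("pkg_*.json")):
        for change in json.loads(pkg.read_text()):
            target[change["pk"]] = change["values"]
            applied += 1
        pkg.with_suffix(".ack").touch()  # acknowledgement for the producer
        pkg.unlink()
    return applied

dropoff = Path(tempfile.mkdtemp())
produce_package(dropoff, 1, [{"pk": 1, "values": {"NAME": "Acme"}}])
produce_package(dropoff, 2, [{"pk": 2, "values": {"NAME": "Beta"}}])
target: dict = {}
applied_total = consume_packages(dropoff, target)
```

Neither replicator ever needs a direct connection to the other database; the shared location (an FTP site, a mailbox, a network directory) is the only common ground.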

In an ideal world all data are correctly replicated. But in the real world an exception or an accident happens from time to time (a hard disk crashes, replication integrity is corrupted, etc.). Even if no crash happens, there is one exception - the start of replication, because the data must be made synchronous before any replication is done. How can the databases be made ready for replication? The answer is synchronization. Replication transfers only changes, whereas synchronization compares the databases, table by table, record by record. This process is of course much more time consuming, especially for large databases. Note that synchronization is always online.
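The record-by-record comparison can be sketched for a single table, modelled here as a `{primary_key: row}` dict (an illustrative assumption, not the tool's internals):

```python
def synchronize(source: dict, target: dict) -> dict:
    """Compare two tables record by record and make the target match the source."""
    diff = {"insert": [], "update": [], "delete": []}
    for pk, row in source.items():
        if pk not in target:
            diff["insert"].append(pk)
            target[pk] = dict(row)
        elif target[pk] != row:
            diff["update"].append(pk)
            target[pk] = dict(row)
    for pk in [pk for pk in target if pk not in source]:
        diff["delete"].append(pk)
        del target[pk]
    return diff

src = {1: {"NAME": "Acme"}, 2: {"NAME": "Beta"}}
tgt = {2: {"NAME": "Beta (old)"}, 3: {"NAME": "Gone"}}
result = synchronize(src, tgt)
```

Every record in both tables is visited, which is why synchronization costs so much more than replaying a change log.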

Now let's suppose there is a table containing very important data and somebody intentionally overwrites it. This problem is solved by history - the replicator keeps record history in a history log. Unlike ordinary changes, the history is not replicated to the other databases.
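The principle is simply to store the previous row version before every change, so an intentional overwrite can be inspected and undone. A tiny sketch (the structure is an illustrative assumption):

```python
history_log = []

def update_with_history(table, rows, pk, new_values):
    """Record the old row version in the history log, then apply the change."""
    history_log.append({"table": table, "pk": pk, "old": dict(rows[pk])})
    rows[pk].update(new_values)

customers = {1: {"NAME": "Acme", "CITY": "Prague"}}
update_with_history("CUSTOMER", customers, 1, {"NAME": "EvilCorp"})
update_with_history("CUSTOMER", customers, 1, {"NAME": "Acme"})   # repair

previous_name = history_log[0]["old"]["NAME"]
```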

Another way to keep record history is the incremental SQL log. It is a plain text file containing a sequential list of SQL commands. Note that even though the Interbase data manipulation language does not support BLOB values, the incremental log also contains BLOB metadata. Since it is a plain SQL text file, it can be used to feed changes to any SQL database; e.g. you can replicate changes to MySQL. Both the online and the offline methods are supported.
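Generating such a log amounts to turning each change-log entry back into an SQL statement. A simplified sketch (the quoting and statement layout are assumptions; BLOB handling is omitted):

```python
def sql_literal(value) -> str:
    """Render a Python value as a (simplified) SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def to_sql(entry: dict) -> str:
    """Turn one logged change into a plain SQL statement."""
    table, pk_col, pk = entry["table"], entry["pk_col"], entry["pk"]
    if entry["op"] == "INSERT":
        cols = [pk_col] + list(entry["values"])
        vals = [sql_literal(pk)] + [sql_literal(v) for v in entry["values"].values()]
        return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join(vals)});"
    if entry["op"] == "UPDATE":
        sets = ", ".join(f"{c} = {sql_literal(v)}" for c, v in entry["values"].items())
        return f"UPDATE {table} SET {sets} WHERE {pk_col} = {sql_literal(pk)};"
    return f"DELETE FROM {table} WHERE {pk_col} = {sql_literal(pk)};"

log = [
    {"op": "INSERT", "table": "CUSTOMER", "pk_col": "ID", "pk": 1, "values": {"NAME": "O'Hara"}},
    {"op": "UPDATE", "table": "CUSTOMER", "pk_col": "ID", "pk": 1, "values": {"NAME": "Acme"}},
    {"op": "DELETE", "table": "CUSTOMER", "pk_col": "ID", "pk": 1},
]
incremental_log = "\n".join(to_sql(e) for e in log)
```

Because the output is ordinary SQL text, any SQL database that accepts these statements can be fed from the same log.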

If data are changed on both sides, inconsistency can occur when the corresponding record is changed in both the source database and the target database - a field/record conflict has happened. The replicator is able either to recover from the conflict automatically or to pass the conflict to the manual log. The recovery behavior is configurable for every relation.
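Per-relation recovery can be pictured as a policy table. The policy names below are illustrative assumptions, not the product's configuration keywords:

```python
MANUAL = object()  # sentinel meaning "cannot decide automatically"

def resolve(policy: str, source_row: dict, target_row: dict):
    """Pick the winning row version according to the relation's policy."""
    if policy == "source_wins":
        return source_row
    if policy == "target_wins":
        return target_row
    if policy == "newer_wins":
        return max(source_row, target_row, key=lambda r: r["UPDATED"])
    return MANUAL  # e.g. policy "manual" -> record goes to the manual log

# Hypothetical per-relation configuration:
policies = {"CUSTOMER": "newer_wins", "AUDIT": "manual"}

src = {"NAME": "Acme Ltd", "UPDATED": 200}
tgt = {"NAME": "Acme",     "UPDATED": 100}
winner = resolve(policies["CUSTOMER"], src, tgt)
```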

All the configuration data, i.e. the schemes, are stored in a separate configuration database. You can prepare it once and use it for all locations; it does not matter whether you access it online or make a physical copy. This simplifies administration significantly.


Features

  • the Interbase and Firebird (1.0, 1.5) databases are supported
  • SQL Dialect 1 and 3 are supported
  • offline or online synchronization
  • FTP, email or shared network directory transfer for offline replication, open-source transfer libraries
  • offline packages can be encrypted for improved security or compressed; open-source interface, CRC checking
  • visual replication manager (scheme editor)
  • custom configurable record history
  • the replication server runs as a GUI application or a console/NT service application
  • multi-threading supported
  • multi-segment primary keys supported
  • conditional replication and synchronization
  • extended conflict checking
  • replication into tables or stored procedures
  • replication actions and errors are logged
  • replication statistics
  • international support - optional UTC time stamps, Unicode/UTF-8 encoding
  • source/target database recovery from daily and incremental backups
  • SDK - a developer can embed full replication server functionality into their own software
  • password encryption
  • database backup/restore
  • database cloning
  • replication of external files (linked from the database by file name)

Features not yet implemented

  • replication of array fields (not planned)
  • not tested with Asian code pages
  • replication of structure (DDL)
  • record history viewer and SDK
  • GUI-driven database comparer (dbdiff)
  • NSIS custom installer support


Requirements

  • an installed Interbase/Firebird client
  • the SQL server must support CommitRetaining and RollbackRetaining (Interbase v6)
  • Windows 9x/ME/NT/2000/XP or Linux on the x86 platform