SQL Server Replication-
The Crib Sheet
For things you need to know rather than the things you want to know
Contents
Introduction
Replication is intended to be a way of distributing
data automatically from a source database to one or more recipient
databases. As such, it can have obvious uses in a distributed system. It
has also been used to implement high-availability systems. It is not
useful for one-off synchronization, or for simply copying data. It is
intended as a long-term data relationship between databases. Typical
uses are in:
- Data warehousing and reporting
- Integrating data from several, possibly only partially connected, sites
- Improving scalability and availability
- Integrating heterogeneous data from other databases via OLE DB,
integrating data from mobile users, getting data to and from
Point-Of-Sale systems
- Offloading/delegating batch processing tasks.
Examples of the use of replication within an application could be
- Distributing data 'owned' by a particular application to other
applications that are 'consumers' of that data. (For example, sales
records to reporting services, manufacturing stock levels to purchasing
systems.)
- Creating several instances of a database to distribute load on it
- Updating a central database with information from a number of
remote Laptops that might be only partially connected, and to
resynchronise the laptops.
There are three methods of replication: Snapshot, Transactional, and
Merge. These are provided to try to meet the wide range of business
requirements for replication.
Replication uses a 'Magazine Publishing' vocabulary. Anything from an
'Article' to an entire database can be replicated. An Article is the
smallest component of distribution, and can be a table, procedure or
function. If it is a table, then a filter can be applied to it so that
only certain rows or columns are replicated. This 'Magazine Publishing'
analogy can be confusing because, in replication, a Subscriber can
sometimes make updates, and a Publisher usually sends out incremental
changes to the articles in a publication.
A 'Publisher' maintains the original copy of the data. It holds the
definition of the 'Publication', which defines the 'articles' that are
to be 'published'. (The database with the original location of the data
determines what is to be distributed).
A 'Subscriber' receives the articles from a publisher. It can
subscribe to one or more publications. Any database can take on either
role or even both roles at once.
A Distributor is a specialist database that runs the 'Replication agents'.
Replication is not part of the SQL Server engine, but an external
application. This makes it much easier to involve other database systems
in replication. Any SQL Server database, or other database system with
an OLE DB provider, can be a publisher or subscriber in snapshot or
transactional replication.
It is essential to plan out the replication in detail as a first
stage, and to be very certain of the type of replication you wish to
implement. A common mistake is to use replication in cases where a much
less complex solution is possible.
A problem with production systems using replication is the difficulty
of restoring the topology after a disaster.. This requires a fully
documented recovery strategy, which has to be periodically tested and
practiced. This means that the whole replication topology and
configuration must be scripted so it can be re-created in an emergency,
even if the system was originally built using the GUI tools. The Object
Browser (or Enterprise Manager) makes the initial deployment relatively
simple to do, and there are plenty of step-by-step guides, but it is not
the best option when trying to restore an existing topology, and
settings, in order to achieve a recovery from major failures.
Problems often follow from developers adding or dropping articles,
changing publication properties, and changing schema on published
databases. It is therefore best to create the replication once the
design of the publisher is relatively stable.
Replication topologies
In most topologies, it makes sense for publishers, distributors, and subscribers to be on separate physical hardware.
Central Publisher
The commonest form of replication is to have a single
publisher with the source data, with one or more subscribers. The
Distributor database can be on the same, or preferably different,
server.
Central Subscriber
Often, where the data from several databases need to
be 'warehoused' centrally in an OLAP or reporting database, one will
find a single 'reporting' database subscribing to several publications.
One can come across other topologies such as
'bi-directional' and 'peer to peer' which are really special cases of
Central Publisher or Central Subscriber and use transactional
replication
Publishing Subscriber
The distribution of data can be relayed to other
subscribers via a publishing subscriber. This allows replication to be
implemented over low-bandwidth WANs to a subscriber that, in turn,
distributes it to other servers within its high-bandwidth LAN
Replication Methods
How Replication Works
Replication begins with the initial synchronization of the published
objects between the Publisher and Subscribers, using a snapshot. A
snapshot is a copy of all of the objects and data specified by a
publication. After the snapshot is created on the publisher, it is
delivered to the Subscribers via the distributor.
For Snapshot replication, this is sufficient. For other types of
replication, all subsequent data changes to the publication flow to the
Subscriber as they happen, in a queue, or on request.
Snapshot Replication
The snapshot replication process provides the initial
synchronization for transactional and merge publications. However, in
several cases, this initial synchronization is all that is necessary.
This would include circumstances where data hardly changes, or if the
latest version of the data is not essential to the subscriber, where the
amount of data is small, or if a large number of changes takes place
rapidly.
Snapshot replication involves copying the articles
that make up the publication. Normally, if they exist already on the
subscriber, they are over-written, though this behavior can be changed.
Snapshot replication is more expensive in terms of overhead and network
traffic and only takes place at intervals. Because locks are held during
snapshot replication, this can impact other users of the subscriber
database. It is therefore more suitable for static data and
enumerations. In SQL Server 2005, several articles can be processed in
parallel, and interrupted snapshots can be recommenced from the point of
interruption. Snapshots can be queued or immediate.
Data changes are not tracked for snapshot
replication; each time a snapshot is applied, it completely overwrites
the existing data.
Transactional Replication
Transactional replication is used if:
- Changes to the data must be propagated immediately
- The database application taking out a subscription needs to react to every change
- The Publisher has a very high volume of insert, update, and delete activity
- The Publisher or Subscriber is a different database application reached via OLE DB.
Essentially, Transaction replication distributes data
in one direction, but transactional replication does offer options that
allow updates at the Subscriber. Once a snapshot replication has
synchronized the subscribers with the publisher, all committed
transactions on the publisher are then propagated to the subscribers in
sequence, via distributed transactions. One can select a queued update
or immediate, depending on requirements.
Peer-to-peer Replication
This is a special type of transactional replication
in which every participant is both a publisher and subscriber (2005
Enterprise only) and is most useful for up to ten databases in a
load-balancing or high-availability group.
Bidirectional Replication
This is where two databases replicate the same
articles to each other via a distributor. There must be loopback
detection. Data conflicts aren't handled and the replication must be
implemented in code, since the GUI doesn't support it.
Transactional replication tracks changes through the SQL Server transaction log
Merge Replication
Merge replication allows various sites to work autonomously and later merge updates into a single, uniform result.
Merge Replication is complex, but provides the means
to implement part of a high-availability system, as well as its original
purpose of serving mobile and disconnected users. It is designed for
cases where the publishers are not in constant communication with the
subscribers. After the initial snapshot synchronization, subsequent
changes are tracked locally with triggers, and the databases are merged
when in contact, using a series of rules to resolve all possible
conflicts.
Merge replication is used when several Subscribers
might need to update the same data at various times and propagate those
changes back to the Publisher and thence to other Subscribers. It is
also required in applications that involve Subscribers receiving data,
making changes offline, and finally reconnecting with the publisher to
synchronize changes with the Publisher and other Subscribers.
To make this possible, each Subscriber requires a
different partition of data and there has to be a set of rules to
determine how every conflict that takes place in the update of the data
is detected and resolved. These conflicts occur when the data is merged
because there can be no 'locking' and so the same data may have been
updated by the Publisher and by more than one Subscriber.
Merge Replication does not use transactions. Merge
replication uses a set of conflict-resolution rules to deal with all the
problems that occur when two databases alter the same data in different
ways, before updating the subscribers with a 'consensus' version. It
normally works on a row-by-row basis but can group rows of related
information into a logical record. One can specify the order in which
'articles' are processed during synchronisation.
Merge replication tracks changes through triggers and metadata tables.
Replication Agents
Replication is done by several different agents, which
are separate applications each responsible for part of the process. The
replication agents should not be run under the SQL Server Agent account
in a production system. Instead, they need the minimal permissions
necessary to perform their function.
SQL Server Agent
This manages the overall replication process via SQL Server Agent jobs.
The Snapshot agent
Snapshot.exe executes on the
Distributor. It extracts the schema and data defined by the publication,
which is then sent to the subscriber via a 'snapshot folder'. It also
updates status information on the distribution database. . It is used in
all forms of replication
The Log Reader Agent
LogRead.exe is used in transactional
replication to extract relevant committed transactions from the
publisher's log, repackage them and send them to the distributor in the
correct sequence.
Distribution Agent
Distrib.exe takes the snapshots, and log entries from the agents we've described, and dispatches them to the subscribers.
Merge Agent
ReplMer.exe is used only in Merge
Replication to send a snapshot when the subscriber is initialized, and
also exchanges transactions between publisher and subscriber
Queue Reader Agent
QrDrSvc.exe is used to queue the updates in transactional or snapshot replication when queuing has been specified.
Monitoring Replication
Many problems associated with replication can be
avoided by .regular checks. The most obvious check is to make sure that
the data has been transferred as expected. Periodic checks with SQL
Compare and SQL Data Compare can be very useful in addition to the tools
that come with Replication. Additionally the replication processes and
jobs need to be checked to make sure they are working.
Checking throughput
The performance of replication must be regularly monitored, and performance-tuned as necessary.
The Replication Monitor is used to check on the
operational state of publications, and inspect the history, and errors.
Right-clicking the replication node in Object Explorer will gain access
to it.
One of the most important concerns is the time delay,
or latency, of transactions from the publications appearing in the
subscriber database. At times of high transaction throughput on the
publisher database, bottlenecks can occur. Whereas the stored procedure
sp_browseReplCmds on the distribution database can tell you how far
behind the synchronisation is at any particular time, one cannot
determine where the problems lies just from the data. Tracer tokens are
now used to measure the actual throughput of the replication
architecture at any particular time to help diagnose such bottlenecks.
Validating
There is always an element of doubt as to whether the
replication has entirely worked. There are stored procedures provided
to compare the 'articles' on the publisher and subscribers to make sure
they are the same.
The sp_publication_validation stored procedure
validates the data associated with each article by calling
sp_article_validation (after the articles associated with a publication
have been activated). The sp_article_validation stored procedure invokes
sp_table_validation stored procedure, which calculates the number of
lines and, optionally, the checksum of the published table. It is
considered good practice to perform a daily row-count and weekly
checksum. SQL Data Compare is ideal for mending a broken replication.
The Distribution Agent raises the '20574' system
message if validation fails, or the '20575' system message if it passes.
The Distribution Agent will replicate changes to a subscriber even if
the validation shows that the subscriber is out of synchronization. It
is a good policy to configure the Replication Alert on the '20574'
message so as to send E-Mail, Pager, or Network notification.
This validation approach will only work within
certain restrictions. For example, it will not work if certain filters
have been applied. They should be used with caution.
Changing the settings
It is best to use the default replication settings
unless there are clear performance gains to be made, or if the
application design forces the issue. However, one cannot assume that the
changes will be generally beneficial to the entire topology without
comprehensive testing.
Articles
Articles are the smallest unit of a publication. An
article can be a table, view, stored Procedure or function. Where an
article is based on a table or view, it can contain all the data or just
part of it. These filters of two types.: More common are the static
'WHERE' clauses, but filters can be used dynamically in Merge
Replication to publish different 'content' (rows) to different
'subscribers' (databases receiving data). These latter Filters are
called 'Dynamic' and can be simple Row Filters, or Join Filters, where
the selection of rows to publish is based on a join with other tables,
rather than a simple WHERE clause.
Normally, any alteration to an article that is a table is propagated
to all the subscribers. You can also opt to propagate schema objects
associated with the article such as indexes, constraints, triggers,
collation and extended properties.
Updating articles
In Merge replication, the subscriber can update the
article. This is, of course, a recipe for conflict, and these have to be
resolved automatically. When the Merge Agent comes across a row that
might have changed recently, it examines the history or 'lineage' of
each site's version of the row to see if there is a conflict. If so,
then the update that is finally used. Has to be based on either
- A "first wins" resolution,
- a user-specified priority scheme to determine the update to select,
- a customised resolution, using COM and stored procedures.
The 'lineage' is a history of changes in a table row
in MSmerge_contents, which is maintained automatically when a user
updates a row. Each column contains one entry for each site that has
updated the row.
Conflicts to the data in the base table can occur
within a column or a row. Most usual are column-tracked articles this
means that, within any row, updates are only recognized as conflicts if
the same column is updated by more than one subscriber. Occasionally,
however, the business rules of the application may treat simultaneous
changes to the any column within the row as a conflict, in which case
row-level tracking is used.
Programming Replication Topologies
Replication agents and replication topologies can be
administered and monitored remotely via SQL Scripts or RMO scripts. The
task of building and maintaining replication topologies is made much
easier of all parts of the deployments are scripted, even if this is
done after the other methods such as wizards or RMO are used for the
initial operation.
There are other uses for a replication script. It will be required in
end-user application where, for example, such as s a pull subscription
is synchronized when the user clicks a button, or where a routine
administration task such as monitoring replication throughput is
performed from a custom console. It is also generally used for writing
customized business rules that are executed when a merge subscription is
synchronized.
One can use the Object Explorer in SSMS, or the Enterprise Manager in
earlier versions of SQL Server, to set up replication. BOL provide
worked examples. Alternatively, RMO can be used with VB or C# to script
the replication. Whatever system you use, it is a good idea to use the
Transact SQL script as the reference for the replication topology you
create
Ultimately, the functionality in a replication topology is provided
by system stored procedures. The easiest approach is to use Transact-SQL
script files to perform a logical sequence of replication tasks,
because it provides a permanent, repeatable, copy of the steps used to
deploy the replication topology that can, for example, be used to
configure more than one subscriber. It also provides a measure of
documentation and disaster-recovery. A script can be stored as a query
object in a SQL Server Management Studio project.
Replication scripts can be created by hand, by the script generation
of the replication wizards in SQL Server Management Studio, or by using
Replication Management Objects (RMOs) to programmatically generate the
script to create an RMO object.
When creating scripts to configure replication, it is best to use
Windows Authentication so as to avoid storing security credentials in
the script file. Otherwise you must secure the script file
Further reading:
SQL Server ReplicationFor the details of implementing replication, all the steps are documented here in
Implementing Replicationdetails on configuring and maintaining a replication are here
Configuring and Maintaining replicationFor details on peer-to-peer replication, read
Peer-to-Peer Transactional ReplicationSome of the wider issues and dangers of replication are discussed here
Data Replication as an Enterprise SOA Antipattern, in the excellent
Microsoft Architecture Journal