Interview Questions on SQL Server Replication

0
Print Friendly, PDF & Email

1) What is SQL Server replication?

Replication is subset of SQL Server that can move data and database objects in an automated way from one database to another database. This allows users to work with the same data at different locations and changes that are made are transferred to keep the databases synchronized.

2) What are the different types of SQL Server replication?

  • Snapshot replication – As the name implies snapshot replication takes a snapshot of the published objects and applies it to a subscriber. Snapshot replication completely overwrites the data at the subscriber each time a snapshot is applied. It is best suited for fairly static data or if it’s acceptable to have data out of sync between replication intervals. A subscriber does not always need to be connected, so data marked for replication can be applied the next time the subscriber is connected. An example use of snapshot replication is to update a list of items that only changes periodically.
  • Transactional replication – As the name implies, it replicates each transaction for the article being published. To set up transactional replication, a snapshot of the publisher or a backup is taken and applied to the subscriber to synchronize the data. After that, when a transaction is written to the transaction log, the Log Reader Agent reads it from the transaction log and writes it to the distribution database and then to the subscriber. Only committed transactions are replicated to ensure data consistency. Transactional replication is widely applied where high latency is not allowed, such as an OLTP system for a bank or a stock trading firm, because you always need real-time updates of cash or stocks.
  • Merge replication – This is the most complex types of replication which allows changes to happen at both the publisher and subscriber. As the name implies, changes are merged to keep data consistency and a uniform set of data. Just like transactional replication, an initial synchronization is done by applying snapshot. When a transaction occurs at the Publisher or Subscriber, the change is written to change tracking tables. The Merge Agent checks these tracking tables and sends the transaction to the distribution database where it gets propagated. The merge agent has the capability of resolving conflicts that occur during data synchronization. An example of using merge replication can be a store with many branches where products may be centrally stored in inventory. As the overall inventory is reduced it is propagated to the other stores to keep the databases synchronized.
  • Peer to Peer replication – Peer to Peer replication is built on transactional replication. It maintains transactional replication between servers and allows publishers and subscribers to send data to each other. Thus participating servers are updated near real-time based on the scheduled frequency. Since we know the functionality of transactional replication and the role of the publisher, distributor and subscriber, let me explain peer-to-peer in terms of transactional architecture. In the peer-to-peer transactional replication, each node acts as publisher and subscriber to one another. As Server1 and Server2 are both a Publisher and Subscriber, when data are inserted/Updated/Deleted in any of the servers (node) the other one is updated through its publisher and subscriber.

3) What is the difference between Push and Pull Subscription?

  • Push – As the name implies, a push subscription pushes data from publisher to the subscriber. Changes can be pushed to subscribers on demand, continuously, or on a scheduled basis.
  • Pull – As the name implies, a pull subscription requests changes from the Publisher. This allows the subscriber to pull data as needed. This is useful for disconnected machines such as notebook computers that are not always connected and when they connect they can pull the data.

4) What are different replication agents and what’s their purpose?

  • Snapshot Agent– The Snapshot Agent is used with all types of replication. It prepares the schema and the initial bulk copy files of published tables and other objects, stores the snapshot files, and records information about synchronization in the distribution database. The Snapshot Agent runs at the Distributor.
  • Log Reader Agent – The Log Reader Agent is used with transactional replication. It moves transactions marked for replication from the transaction log on the Publisher to the distribution database. Each database published using transactional replication has its own Log Reader Agent that runs on the Distributor and connects to the Publisher (the Distributor can be on the same computer as the Publisher)
  • Distribution Agent – The Distribution Agent is used with snapshot replication and transactional replication. It applies the initial snapshot to the Subscriber and moves transactions held in the distribution database to Subscribers. The Distribution Agent runs at either the Distributor for push subscriptions or at the Subscriber for pull subscriptions.
  • Merge Agent – The Merge Agent is used with merge replication. It applies the initial snapshot to the Subscriber and moves and reconciles incremental data changes that occur. Each merge subscription has its own Merge Agent that connects to both the Publisher and the Subscriber and updates both. The Merge Agent runs at either the Distributor for push subscriptions or the Subscriber for pull subscriptions.
  • Queue Reader Agent – The Queue Reader Agent is used with transactional replication with the queued updating option. The agent runs at the Distributor and moves changes made at the Subscriber back to the Publisher. Unlike the Distribution Agent and the Merge Agent, only one instance of the Queue Reader Agent exists to service all Publishers and publications for a given distribution database.

5) Does a specific recovery model need to be used for a replicated database?

Replication is not dependent on any particular recovery model. A database can participate in replication whether it is in simple, bulk-logged, or full. However how data is tracked for replication depends on the type of replication used.

6) What type of locking occurs during the Snapshot generation?

Locking depends on the type of replication used:
a) In snapshot replication, the snapshot agent locks the object during the entire snapshot generation process.
b) In transactional replication, locks are acquired initially for a very brief time and then released. Normal operations on a database can continue after that. In merge replication, no locks are acquired during the snapshot generation process.

7) What options are there to delete rows on the publisher and not on the subscriber?

One option is to replicate stored procedure execution instead of the actual DELETE command. You can create two different versions of the stored procedures one on the publisher that does the delete and the other on the subscriber that does not do the delete. Another option is to not replicate DELETE commands.

8) Is it possible to run multiple publications and different type of publications from the same distribution database?

Yes this can be done and there are no restrictions on the number or types of publications that can use the same distribution database. One thing to note though is that all publications from a Publisher must use the same Distributor and distribution database.

9) Data is not being delivered to Subscribers, what can be the possible reasons?

There are a number of possible causes for data not being delivered to Subscribers:

  • The table is filtered, and there are no changes to deliver to a given Subscriber.
  • One or more agents are not running or are failing with an error.
  • Data is deleted by a trigger, or a trigger includes a ROLLBACK statement.
  • A transactional subscription was initialized without a snapshot, and changes have occurred on the Publisher since the publication was created.
  • Replication of stored procedure execution for a transactional publication produces different results at the Subscriber.
  • The INSERT stored procedure used by a transactional article includes a condition that is not met.
  • Data is deleted by a user, a replication script, or another application.

10) Explain what stored procedure sp_replcounters is used for?

Sp_replcounters is a system stored procedure that returns information about the transaction rate, latency, and first and last log sequence number (LSN) for each publication on a server. This is run on the publishing server. Running this stored procedure on a server that is acting as the distributor or subscribing to publications from another server will not return any data.

11) How will you monitor replication latency in transactional replication?

Tracer tokens were introduced with SQL Server 2005 transactional replication as a way to monitor the latency of delivering transactions from the publisher to the distributor and from the distributor to the subscriber(s). For details, please refer tip to this tip: Monitor SQL Server replication latency using tracer tokens.

12) If I create a publication with one table as an article, and then change the schema of the published table (for example, by adding a column to the table), will the new schema ever be applied at the Subscribers?

Yes. Schema changes to tables must be made by using Transact-SQL or SQL Server Management Objects (SMO). When schema changes are made in SQL Server Management Studio, Management Studio attempts to drop and re-create the table and since you cannot drop a published objects, the schema change will fail.

13) Is it possible to replicate data from SQL Server to Oracle?

Yes this can be done using heterogeneous replication. In SQL Server 2000, publishing data to other databases such as DB2 or Oracle was supported; however, publishing data from other databases was not supported without custom programming. In SQL Server 2005 and later versions, Oracle databases can be directly replicated to SQL Server in much the same way as standard SQL Server replication. For more information, please read tip: Introduction to SQL Server Heterogeneous Replication.

14) How will you monitor replication activity and performance? What privilege do you need to use replication monitor?

The easiest way to monitor replication activity and performance is to use replication monitor, but you can also use the below tools to monitor replication performance:

T-SQL commands. For more details refer msdn article – http://msdn.microsoft.com/en-us/library/ms147874.aspx

Microsoft SQL Server Management studio. For more details refer msdn article – http://msdn.microsoft.com/en-us/library/ms152763.aspx

To monitor replication, a user must be a member of the sys admin fixed server role at the Distributor or a member of the repl monitor fixed database role in the distribution database. A system administrator can add any user to the repl monitor role, which allows that user to view replication activity in Replication Monitor; however, the user cannot administer replication.

15) Can you tell me some of the common replication DMV’s and their use?

  • dm_repl_articles – Contains information about each article being published. It returns data from the database being published and returns a row for each object being published in each article.
  • dm_repl_schemas – Contains information about each table and column being published. It returns data from the database being published and returns one row for each column in each object being published
  • dm_repl_traninfo – Contains information about each transaction in a transactional replication

16) Why is primary key needed in Transactional replication?

The reason is in the subscriber, rows are updated/deleted one-by-one using primary key.
For example:
If you delete 100 rows in the publisher using a single DELETE statement, in the subscriber 100 DELETE statements would be executed.
— on publisher
DELETE FROM dbo.tbAddress WHERE City = ‘LONDON’
— on subscriber
DELETE FROM dbo.tbAddress WHERE pk = @pk

17) Which all database objects can be included in replication?

  • a)Tables
  • b) Partitioned Tables
  • c) Stored Procedures – Definition (Transact-SQL and CLR)
  • d) Stored Procedures – Execution (Transact-SQL and CLR)
  • e) Views
  • f) Indexed Views
  • g) Indexed Views as Tables
  • h) User-Defined Types (CLR)
  • User-Defined Functions (Transact-SQL and CLR)
  • j) k) Alias Data Types
  • l) Full text indexes
  • m) Schema Objects

18) What are prerequisites of transactional replication?

  • Primary Key: This is a basic rule that every article should have a Primary Key to be a candidate table for Transactional Replication. Primary keys are used to maintain uniqueness of records and to maintain referential integrity between tables, and that is why it is recommended for every article to have a primary key.
  • Securing snapshot folder:
  • Schedule:
  • Network bandwidth:
  • Enough disk space for database being published: We need to make sure that we have ample space available for the transaction log for the published database, as it will continue to grow and won’t truncate the log records until they are moved to the distribution database. Please note that even in simple recovery model, the log file can grow large if replication breaks. That is the reason it is recommended to set T-log’s auto grow option to value “true”. We should also make sure that the distribution database is available and log reader agent is running.
  • Enough disk space for distribution database: It is necessary to have enough disk space allocated to the distribution database. This is because the distribution database will store the transactions marked for replication until it is applied to the subscriber database within the limit of retention period of distribution (which is 72 hours by default), or it will retain the transactions until the snapshot agent re-runs and creates a new snapshot. re-runs and creates a new snapshot.
  • Use domain account as service account: We should always use the domain account as a service account, so that when agents access the shared folder of snapshot files, it won’t have any problem just because they are local to the system and do not have permission to access network share. While mentioning service account, we are asked to choose from two built-in accounts including Local System account, Network Services, and this account, wherein we have to specify the domain account on which the service account will run.

19) Define Distributor, Subscriber & Publisher

  • Publisher: The Publisher is a server that makes data available for replication to other servers. In addition to being the server where you specify which data is to be replicated, the Publisher also detects which data has changed and maintains information about all publications at that site. Usually, any data element that is replicated has a single Publisher, even if it may be updated by several Subscribers or republished by a Subscriber. The publication database is the database on the Publisher that is the source of data and database objects to be replicated. Each database used in replication must be enabled as a publication database either through the Configure Publishing and Distribution Wizard, the Publisher and Distributor properties, by using the sp_replicationdboption system stored procedure, or by creating a publication on that database using the Create Publication Wizard.
  • Distributor: The Distributor is a server that contains the distribution database and stores meta data, history data, and/or transactions. The Distributor can be a separate server from the Publisher (remote Distributor), or it can be the same server as the Publisher (local Distributor). The role of the Distributor varies depending on which type of replication you implement, and in general, its role is much greater for snapshot replication and transactional replication than it is for merge replication.
  • Subscribers: Subscribers are servers that receive replicated data. Subscribers subscribe to publications, not to individual articles within a publication, and they subscribe only to the publications that they need, not necessarily all of the publications available on a Publisher. If you have applications using transactional replication built with Microsoft® SQL Server™ version 6.5 or later, and those applications subscribe directly to articles instead of to publications, the applications will continue to work in SQL Server 2000. However, you should begin to migrate your subscriptions to the publication level where each publication is composed of one or more articles.

20) Define Article, Publication & Subscription.

  • Article: An article identifies a database object that is included in a publication. A publication can contain different types of articles, including tables, views, stored procedures, and other objects. When tables are published as articles, filters can be used to restrict the columns and rows of the data sent to Subscribers.
  • Publication: A publication is a collection of one or more articles from one database. The grouping of multiple articles into a publication makes it easier to specify a logically related set of database objects and data that are replicated as a unit.
  • Subscribe: A subscription is a request for a copy of a publication to be delivered to a Subscriber. The subscription defines what publication will be received, where, and when. There are two types of subscriptions: push and pull.

21) Can we add or drop a single article from a publication. If so, How?

It is not necessary to stop activity on the publication or subscription databases in order to add a table (or another object). Add a table to a publication through the Publication Properties – <Publication> dialog box or the stored procedures sp_addarticle and sp_addmergearticle. Remove a table from the publication using sp_droparticle, sp_dropmergearticle, or the Publication Properties – <Publication> dialog box. You cannot drop articles from
snapshot or transactional publications after subscriptions have been added; you must drop the subscriptions first.

22) Define sp_replcounters

Returns replication statistics about latency, throughput, and transaction count for each published database. This stored procedure is executed at the Publisher on any database.

23) Can we use replication to replicate data across different RDBMS i.e. SQL to Oracle

Oracle and DB2 can subscribe to snapshot and transactional publications using push subscriptions. Subscriptions are supported for the two most recent versions of each database listed using the most recent version of the OLE DB provider listed.

However, Heterogeneous replication to non-SQL Server subscribers is deprecated. Oracle Publishing is deprecated. To move data, create solutions using change data capture and SSIS.

This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

24) Explain Latency in replication. How can you monitor Latency of particular publication?

Transactional replication provides the tracer token feature, which provides a convenient way to measure latency in transactional replication topologies and to validate the connections between the Publisher, Distributor and Subscribers. A token (a small amount of data) is written to the transaction log of the publication database, marked as though it were a typical replicated transaction, and sent through the system, allowing a calculation of:

  1. How much time elapses between a transaction being committed at the Publisher and the corresponding command being inserted in the distribution database at the Distributor?
  2. How much time elapses between a command being inserted in the distribution database and the corresponding transaction being committed at a Subscriber?

25) What permissions are needed to a user to monitor replication?

The replmonitor database role in the distribution database. These users can monitor replication, but cannot change any replication properties.

26) Name some commonly used Replication DMVs and their use.

There are four replication related DMV’s in SQL Server.

  • dm_repl_articles
  • dm_repl_schemas
  • dm_repl_tranhash
  • dm_repl_traninfo

27) What are the advantages and disadvantages of Snapshot replication over Transactional replication?

Snapshot Replication would be good to use if:

  • if you are sure that you would synchronize only once in a day and your business requirements do not include replicating transactions as and when they are committed on the publisher
  • If the size of the replicating articles is small – may be a few MBs/GBs
  • If it is does not matter that for some time the replicating articles would be locked (till the snapshot would be generated)

Transactional Replication would be good to use if:

  • You want incremental changes to be propagated to Subscribers as they occur.
  • The application requires low latency between the time changes are made at the Publisher and the changes arrive at the Subscriber.
  • The application requires access to intermediate data states. For example, if a row changes five times, transactional replication allows an application to respond to each change (such as firing a trigger), not simply the net data change to the row.
  • The Publisher has a very high volume of insert, update, and delete activity.

28) What is peer to peer replication?

Peer-to-peer replication provides a scale-out and high-availability solution by maintaining copies of data across multiple server instances, also referred to as nodes. Built on the foundation of transactional replication, peer-to-peer replication propagates transactionally consistent changes in near real-time. This enables applications that require scale-out of read operations to distribute the reads from clients across multiple nodes. Because data is maintained across the nodes in near real-time, peer-to-peer replication provides data redundancy, which increases the availability of data.

29) What is conflict resolution in merge replication?

Merge replication allows multiple nodes to make autonomous data changes, so situations exist in which a change made at one node may conflict with a change made to the same data at another node. In other situations, the Merge Agent encounters an error such as a constraint violation and cannot propagate a change made at a particular node to another node.

The Merge Agent detects conflicts by using the lineage column of the MSmerge_contents system table; if column-level tracking is enabled for an article, the COLV1 column is also used. These columns contain metadata about when a row or column is inserted or updated, and about which nodes in a merge replication topology made changes to the row or column. You can use the system stored procedure sp_showrowreplicainfo (Transact-SQL) to view this metadata.

As the Merge Agent enumerates changes to be applied during synchronization, it compares the metadata for each row at the Publisher and Subscriber. The Merge Agent uses this metadata to determine if a row or column has changed at more than one node in the topology, which indicates a potential conflict. After a conflict is detected, the Merge Agent launches the conflict resolver specified for the article with a conflict and uses the resolver to determine the conflict winner. The winning row is applied at the Publisher and Subscriber, and the data from the losing row is written to a conflict table.

Conflicts are resolved automatically and immediately by the Merge Agent unless you have chosen interactive conflict resolution for the article.

30) What are data type concerns in transactional replication?

Transactional replication supports publishing LOBs and performs partial updates on LOB columns: if a LOB column is updated, only the fragment of data changed is replicated, rather than all the data in the column.

If a published table includes any LOBs, consider using the following Distribution Agent parameters: -UseOledbStreaming, -OledbStreamThreshold, and -PacketSize. The most straightforward way to set these parameters is to use the Distribution Agent profile titled Distribution Profile for OLEDB streaming.

The process of replicating text, ntext and image data types in a transactional publication is subject to a number of considerations. It is recommend that you use the data types varchar(max), nvarchar(max), varbinary(max) instead of text, ntext, and image data types, respectively.

31) Can we rename a database used in Publication or subscription?

No. we would need to drop the publications, rename the database and re-configure replication all over again. So there is no easy way to do this.

32) Are logins and passwords replicated?

No. You could create a DTS\SSIS package to transfer logins and passwords from a Publisher to one or more Subscribers.

33) Please underline the complications involved in using replication on SQL Cluster.

No special considerations are required because all data is stored on one set of disks on the cluster.

34) Are tables locked during snapshot generation?

The length of time that the locks are taken depends on the type of replication used:

For merge publications, the Snapshot Agent does not take any locks.
For transactional publications, by default the Snapshot Agent takes locks only during the initial phase of snapshot generation. For snapshot publications the Snapshot Agent takes locks during the entire snapshot generation process. Because locks prevent other users from updating the tables, the Snapshot Agent should be scheduled to execute during periods of lower activity on the database, especially for snapshot publications.

35) Can the same objects be published in different publications?

Replication supports publishing articles in multiple publications (including republishing data) with the following restrictions:

If an article is published in a transactional publication and a merge publication, ensure that the @published_in_tran_pub property is set to TRUE for the merge article. An article cannot be published in both a merge publication and a transactional publication with queued updating subscriptions.
Articles included in transactional publications that support updating subscriptions cannot be republished.
Transactional replication and unfiltered merge replication support publishing a table in multiple publications and then subscribing within a single table in the subscription database (commonly referred to as a roll up scenario). Roll up is often used for aggregating subsets of data from multiple locations in one table at a central Subscribe

36) Does replication encrypt data?

No. Replication does not encrypt data that is stored in the database or transferred over the network.

37) What is the effect of running a bulk insert command on a replicated database?

For transactional replication, bulk inserts are tracked and replicated like other inserts. For merge replication, you must ensure that change tracking metadata is updated properly.

38) Why can’t I run TRUNCATE TABLE on a published table?

TRUNCATE TABLE is a non-logged operation that does not fire triggers. It is not permitted because replication cannot track the changes caused by the operation: transactional replication tracks changes through the transaction log; merge replication tracks changes through triggers on published tables.

39) What is NOT FOR REPLICATION option for table constraints?

In some cases, it is desirable for user activity in a replication topology to be treated differently from agent activity. For example, if a row is inserted by a user at the Publisher and that insert satisfies a check constraint on the table, it might not be required to enforce the same constraint when the row is inserted by a replication agent at the Subscriber. The NOT FOR REPLICATION option allows you to specify that the following database objects are treated differently when a replication agent performs an operation:

Foreign key constraints: The foreign key constraint is not enforced when a replication agent performs an insert, update, or delete operation.
Check constraints: The check constraint is not enforced when a replication agent performs an insert, update, or delete operation.
Identity columns: The identity column value is not incremented when a replication agent performs an insert operation.
Triggers: The trigger is not executed when a replication agent performs an insert, update, or delete operation.

40) Does replication resume if a connection is dropped or do we need to reinitialize the replication?

Yes. Replication processing resumes at the point at which it left off if a connection is dropped. If you are using merge replication over an unreliable network, consider using logical records, which ensures related changes are processed as a unit.

41) How do I move or rename files for databases involved in replication?

In versions of SQL Server prior to SQL Server 2005, moving or renaming database files required detaching and reattaching the database. Because a replicated database cannot be detached, replication had to be removed from these databases first. Beginning with SQL Server 2005, you can move or rename files without detaching and re-attaching the database, with no effect on Replication.

References: Thanks to the all the SQL Server techies who wrote and shared the valuable information which helped me a lot to prepare this series of Questions. Also big thanks to Microsoft Documentation which contains each and everything about their product.

Facebook Comments
Share.

About Author

SanjayHumania.Com is a notebook, where I have tried to cover information and tips related to Technology, Social Media, Life Style, Travelling and Others… It’s owned by me and technically supported by my friend, Stiff Roy. I am from India and completed my M.Tech in Mechatronics from Jawaharlal Nehru Technological University Hyderabad in 2013. The main focus of my notebook is to cover information & tips related to the above topics. I will do my best to provide more informative and unique information to the readers at SanjayHumania.Com