Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Tuesday, May 8, 2012

Introduction to Hierarchical Query using a Recursive CTE by pinaldave


This is follow up blog post of my earlier blog post on the same subject - SQL SERVER – Introduction to Hierarchical Query using a Recursive CTE – A Primer. In the article we discussed various basics terminology of the CTE. The article further covers following important concepts of common table expression.
·         What is a Common Table Expression (CTE)
·         Building a Recursive CTE
·         Identify the Anchor and Recursive Query
·         Add the Anchor and Recursive query to a CTE
·         Add an expression to track hierarchical level
·         Add a self-referencing INNER JOIN statement
Above six are the most important concepts related to CTE and SQL Server.  There are many more things one has to learn but without beginners fundamentals one can't learn the advanced  concepts. Let us have small quiz and check how many of you get the fundamentals right.

Quiz

1) You have an employee table with the following data.
EmpID
FirstName
LastName
MgrID
1
David
Kennson
11
2
Eric
Bender
11
3
Lisa
Kendall
4
4
David
Lonning
11
5
John
Marshbank
4
6
James
Newton
3
7
Sally
Smith
NULL
You need to write a recursive CTE that shows the EmpID, FirstName, LastName, MgrID, and employee level. The CEO should be listed at Level 1. All people who work for the CEO will be listed at Level 2. All of the people who work for those people will be listed at Level 3. Which CTE code will achieve this result?
1.    WITH EmpList AS
(SELECT Boss.EmpID, Boss.FName, Boss.LName, Boss.MgrID,
1 AS Lvl
FROM Employee AS Boss WHERE Boss.MgrID IS NULL
UNION ALL
SELECT E.EmpID, E.FirstName, E.LastName, E.MgrID, EmpList.Lvl + 1
FROM Employee AS E INNER JOIN EmpList
ON E.MgrID = EmpList.EmpID)
SELECT * FROM EmpList
2.    WITH EmpListAS
(SELECT EmpID, FirstName, LastName, MgrID, 1 as Lvl
FROM Employee WHERE MgrID IS NULL
UNION ALL
SELECT EmpID, FirstName, LastName, MgrID, 2 as Lvl )
SELECT * FROM BossList
3.    WITH EmpList AS
(SELECT EmpID, FirstName, LastName, MgrID, 1 as Lvl
FROM Employee WHERE MgrID is NOT NULL
UNION
SELECT EmpID, FirstName, LastName, MgrID, BossList.Lvl + 1
FROM Employee INNER JOIN EmpList BossList
ON Employee.MgrID = BossList.EmpID)
SELECT * FROM EmpList
2) You have a table named Employee. The EmployeeID of each employee's manager is in the ManagerID column. You need to write a recursive query that produces a list of employees and their manager. The query must also include the employee's level in the hierarchy. You write the following code segment:
WITH EmployeeList (EmployeeID, FullName, ManagerName, Level)
AS (
--PICK ANSWER CODE HERE
)
1.    SELECT EmployeeID, FullName, '' AS [ManagerID], 1 AS [Level]
FROM Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT emp.EmployeeID, emp.FullName mgr.FullName, 1 + 1 AS [Level]
FROM Employee emp JOIN Employee mgr
ON emp.ManagerID = mgr.EmployeeId
2.    SELECT EmployeeID, FullName, '' AS [ManagerID], 1 AS [Level]
FROM Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT emp.EmployeeID, emp.FullName, mgr.FullName, mgr.Level + 1
FROM EmployeeList mgr JOIN Employee emp
ON emp.ManagerID = mgr.EmployeeId
Now make sure that you write down all the answers on the piece of paper.
Watch following video and read earlier article over here. If you want to change the answer you still have chance.

Solution

1) 1
2) 2
Now compare let us check the answers and compare your answers to following answers. I am very confident you will get them correct.

Tuesday, April 24, 2012

SQL SERVER – Introduction to Hierarchical Query using a Recursive CTE – A Primer


What is a Common Table Expression (CTE)

A CTE can be thought of as a temporary result set and are similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. A CTE is generally considered to be more readable than a derived table and does not require the extra effort of declaring a Temp Table while providing the same benefits to the user. However; a CTE is more powerful than a derived table as it can also be self-referencing, or even referenced multiple times in the same query.
The basic syntax structure for a CTE is shown below:
WITH MyCTE
AS ( SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
WHERE ManagerID IS NULL )
SELECT *
FROM MyCTE

Building a Recursive CTE

In the following examples, you will learn how to harness the power of a recursive CTE query by fulfilling a common business requirement, retrieving hierarchical data. By the time the final query is complete you will be able to easily determine how many levels from the top executive each employee is.
A recursive CTE requires four elements in order to work properly.
1.    Anchor query (runs once and the results ‘seed’ the Recursive query)
2.    Recursive query (runs multiple times and is the criteria for the remaining results)
3.    UNION ALL statement to bind the Anchor and Recursive queries together.
4.    INNER JOIN statement to bind the Recursive query to the results of the CTE.
WITH MyCTE
AS ( SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT EmpID, FirstName, LastName, ManagerID
FROM Employee
INNER JOIN MyCTE ON Employee.ManagerID = MyCTE.EmpID
WHERE Employee.ManagerID IS NOT NULL )
SELECT *
FROM MyCTE

Identify the Anchor and Recursive Query

Anyone who does not have a boss is considered to be at the top level of the company and everyone who does have a boss either works for the person(s) at the top level (upper management), or the people that work for them (mid-management thru base employees).
For example, a CEO is at the top level and thus has a ManagerID of null. Likewise, everyone below the CEO will have a ManagerID. This is demonstrated in the two queries below:
CTE Screenshot 1
The first SELECT statement will become your Anchor query as it will find the employee that has a ManagerID of null (representing Level 1 of the organization). The second SELECT statement will become your Recursive query and it will find all employees that do have a ManagerID (representing Level 2-3 of this organization).
As you can see from the results so far, these queries are unable to give hierarchical data on which level each employee is at within the organization.

Add the Anchor and Recursive query to a CTE

Begin transforming this entire query into a CTE by placing a UNION ALL statement between the Anchor and Recursive queries. Now add parentheses around the entire query, indenting it, moving it down, and adding the declaration WITH EmployeeList AS before the open parenthesis, and then add SELECT * FROM EmployeeList on the next line after the close parenthesis.
Your query should now look like the screenshot below:
CTE Screenshot 2
As you can see, the results from your CTE are exactly the same as the results returned from running the anchor and Recursive queries simultaneously in the previous example.

Add an expression to track hierarchical level

The Anchor query (aliased as ‘Boss’) inside the CTE represents everyone at Level 1 (i.e. Sally Smith). The Recursive query (aliased as ‘Emp’) represents everyone at Levels 2 and 3. In order to visualize each level in a result set, you will need to add an expression field to each query.
Add the expression “1 AS EmpLevel” to the Anchor query and the expression “2 AS EmpLevel” to the Recursive query. Before executing the entire query, look closely at the expression field. The EmpLevel expressions in the Anchor query will hard-code the numeral 1 (for Sally Smith’s level), while the EmpLevel expressions in the Recursive query will hard-code the numeral 2 for everyone else.
Your query should now look like the screenshot below:
CTE Screenshot 3
The two new expression fields were a helpful step. In fact, they show the correct EmpLevel information for Sally Smith and for the people at Level 2 (i.e., Adams, Bender, Brown, Kennson, Lonning and Osako). However, the 2 is just a hard-coded placeholder to help visualize your next step. Lisa Kendall and several other employees need to be at Level 3.
Ideally you would like to make the expression dynamic by replacing “2 AS EmpLevel” with the expression “EmpLevel + 1”.

Add a self-referencing INNER JOIN statement

Let’s take a moment and recognise why this is not going to work quite so simply. The idea to increment EmpLevel in the recursive query of the CTE is on the right track. Unfortunately, the recursive query is trying to reference a field called EmpLevel but can’t find one, since it has only been materialized in the result set of the Anchor query and does not yet exist in the recursive set.
How can you materialize the EmpLevel field for the recursive query? We can use the CTE for this! Remember, a recursive CTE requires an INNER JOIN to connect the recursive query to the CTE itself. Go ahead and write an INNER JOIN statement binding the recursive query ‘Emp’ to the CTE ‘EmployeeList AS EL’ ON Emp.ManagerID = EL.EmpID.
Your query should now look like the screenshot below:
CTE Screenshot 4
Success! You can now see that Sally is at the first level, Alex is at the second level and Lisa appears at the third level. Since a CTE can reference itself, the ‘Emp’ recursive query can now access the EmpLevel field materialized in the EmployeeList CTE.

This blog post is inspired from SQL Queries Joes 2 Pros: SQL Query Techniques For Microsoft SQL Server 2008 – SQL Exam Prep Series 70-433 – Volume 2

Saturday, April 21, 2012

SQL Server Replication

SQL Server Replication-
The Crib Sheet

For things you need to know rather than the things you want to know

Contents

Introduction

Replication is intended to be a way of distributing data automatically from a source database to one or more recipient databases. As such, it can have obvious uses in a distributed system. It has also been used to implement high-availability systems. It is not useful for one-off synchronization, or for simply copying data. It is intended as a long-term data relationship between databases. Typical uses are in:
  • Data warehousing and reporting
  • Integrating data from several, possibly only partially connected, sites
  • Improving scalability and availability
  • Integrating heterogeneous data from other databases via OLE DB, integrating data from mobile users, getting data to and from Point-Of-Sale systems
  • Offloading/delegating batch processing tasks.
Examples of the use of replication within an application could be
  • Distributing data 'owned' by a particular application to other applications that are 'consumers' of that data. (For example, sales records to reporting services, manufacturing stock levels to purchasing systems.)
  • Creating several instances of a database to distribute load on it
  • Updating a central database with information from a number of remote Laptops that might be only partially connected, and to resynchronise the laptops.
There are three methods of replication: Snapshot, Transactional, and Merge. These are provided to try to meet the wide range of business requirements for replication.
Replication uses a 'Magazine Publishing' vocabulary. Anything from an 'Article' to an entire database can be replicated. An Article is the smallest component of distribution, and can be a table, procedure or function. If it is a table, then a filter can be applied to it so that only certain rows or columns are replicated. This 'Magazine Publishing' analogy can be confusing because, in replication, a Subscriber can sometimes make updates, and a Publisher usually sends out incremental changes to the articles in a publication.
A 'Publisher' maintains the original copy of the data. It holds the definition of the 'Publication', which defines the 'articles' that are to be 'published'. (The database with the original location of the data determines what is to be distributed).
A 'Subscriber' receives the articles from a publisher. It can subscribe to one or more publications. Any database can take on either role or even both roles at once.
A Distributor is a specialist database that runs the 'Replication agents'.
Replication is not part of the SQL Server engine, but an external application. This makes it much easier to involve other database systems in replication. Any SQL Server database, or other database system with an OLE DB provider, can be a publisher or subscriber in snapshot or transactional replication.
It is essential to plan out the replication in detail as a first stage, and to be very certain of the type of replication you wish to implement. A common mistake is to use replication in cases where a much less complex solution is possible.
A problem with production systems using replication is the difficulty of restoring the topology after a disaster.. This requires a fully documented recovery strategy, which has to be periodically tested and practiced. This means that the whole replication topology and configuration must be scripted so it can be re-created in an emergency, even if the system was originally built using the GUI tools. The Object Browser (or Enterprise Manager) makes the initial deployment relatively simple to do, and there are plenty of step-by-step guides, but it is not the best option when trying to restore an existing topology, and settings, in order to achieve a recovery from major failures.
Problems often follow from developers adding or dropping articles, changing publication properties, and changing schema on published databases. It is therefore best to create the replication once the design of the publisher is relatively stable.

Replication topologies

In most topologies, it makes sense for publishers, distributors, and subscribers to be on separate physical hardware.

Central Publisher

The commonest form of replication is to have a single publisher with the source data, with one or more subscribers. The Distributor database can be on the same, or preferably different, server.

Central Subscriber

Often, where the data from several databases need to be 'warehoused' centrally in an OLAP or reporting database, one will find a single 'reporting' database subscribing to several publications.
One can come across other topologies such as 'bi-directional' and 'peer to peer' which are really special cases of Central Publisher or Central Subscriber and use transactional replication

Publishing Subscriber

The distribution of data can be relayed to other subscribers via a publishing subscriber. This allows replication to be implemented over low-bandwidth WANs to a subscriber that, in turn, distributes it to other servers within its high-bandwidth LAN

Replication Methods

How Replication Works

Replication begins with the initial synchronization of the published objects between the Publisher and Subscribers, using a snapshot. A snapshot is a copy of all of the objects and data specified by a publication. After the snapshot is created on the publisher, it is delivered to the Subscribers via the distributor.
For Snapshot replication, this is sufficient. For other types of replication, all subsequent data changes to the publication flow to the Subscriber as they happen, in a queue, or on request.

Snapshot Replication

The snapshot replication process provides the initial synchronization for transactional and merge publications. However, in several cases, this initial synchronization is all that is necessary. This would include circumstances where data hardly changes, or if the latest version of the data is not essential to the subscriber, where the amount of data is small, or if a large number of changes takes place rapidly.
Snapshot replication involves copying the articles that make up the publication. Normally, if they exist already on the subscriber, they are over-written, though this behavior can be changed. Snapshot replication is more expensive in terms of overhead and network traffic and only takes place at intervals. Because locks are held during snapshot replication, this can impact other users of the subscriber database. It is therefore more suitable for static data and enumerations. In SQL Server 2005, several articles can be processed in parallel, and interrupted snapshots can be recommenced from the point of interruption. Snapshots can be queued or immediate.
Data changes are not tracked for snapshot replication; each time a snapshot is applied, it completely overwrites the existing data.

Transactional Replication

Transactional replication is used if:
  • Changes to the data must be propagated immediately
  • The database application taking out a subscription needs to react to every change
  • The Publisher has a very high volume of insert, update, and delete activity
  • The Publisher or Subscriber is a different database application reached via OLE DB.
Essentially, Transaction replication distributes data in one direction, but transactional replication does offer options that allow updates at the Subscriber. Once a snapshot replication has synchronized the subscribers with the publisher, all committed transactions on the publisher are then propagated to the subscribers in sequence, via distributed transactions. One can select a queued update or immediate, depending on requirements.

Peer-to-peer Replication

This is a special type of transactional replication in which every participant is both a publisher and subscriber (2005 Enterprise only) and is most useful for up to ten databases in a load-balancing or high-availability group.

Bidirectional Replication

This is where two databases replicate the same articles to each other via a distributor. There must be loopback detection. Data conflicts aren't handled and the replication must be implemented in code, since the GUI doesn't support it.
Transactional replication tracks changes through the SQL Server transaction log

Merge Replication

Merge replication allows various sites to work autonomously and later merge updates into a single, uniform result.
Merge Replication is complex, but provides the means to implement part of a high-availability system, as well as its original purpose of serving mobile and disconnected users. It is designed for cases where the publishers are not in constant communication with the subscribers. After the initial snapshot synchronization, subsequent changes are tracked locally with triggers, and the databases are merged when in contact, using a series of rules to resolve all possible conflicts.
Merge replication is used when several Subscribers might need to update the same data at various times and propagate those changes back to the Publisher and thence to other Subscribers. It is also required in applications that involve Subscribers receiving data, making changes offline, and finally reconnecting with the publisher to synchronize changes with the Publisher and other Subscribers.
To make this possible, each Subscriber requires a different partition of data and there has to be a set of rules to determine how every conflict that takes place in the update of the data is detected and resolved. These conflicts occur when the data is merged because there can be no 'locking' and so the same data may have been updated by the Publisher and by more than one Subscriber.
Merge Replication does not use transactions. Merge replication uses a set of conflict-resolution rules to deal with all the problems that occur when two databases alter the same data in different ways, before updating the subscribers with a 'consensus' version. It normally works on a row-by-row basis but can group rows of related information into a logical record. One can specify the order in which 'articles' are processed during synchronisation.
Merge replication tracks changes through triggers and metadata tables.

Replication Agents

Replication is done by several different agents, which are separate applications each responsible for part of the process. The replication agents should not be run under the SQL Server Agent account in a production system. Instead, they need the minimal permissions necessary to perform their function.

SQL Server Agent

This manages the overall replication process via SQL Server Agent jobs.

The Snapshot agent

Snapshot.exe executes on the Distributor. It extracts the schema and data defined by the publication, which is then sent to the subscriber via a 'snapshot folder'. It also updates status information on the distribution database. . It is used in all forms of replication

The Log Reader Agent

LogRead.exe is used in transactional replication to extract relevant committed transactions from the publisher's log, repackage them and send them to the distributor in the correct sequence.

Distribution Agent

Distrib.exe takes the snapshots, and log entries from the agents we've described, and dispatches them to the subscribers.

Merge Agent

ReplMer.exe is used only in Merge Replication to send a snapshot when the subscriber is initialized, and also exchanges transactions between publisher and subscriber

Queue Reader Agent

QrDrSvc.exe is used to queue the updates in transactional or snapshot replication when queuing has been specified.

Monitoring Replication

Many problems associated with replication can be avoided by .regular checks. The most obvious check is to make sure that the data has been transferred as expected. Periodic checks with SQL Compare and SQL Data Compare can be very useful in addition to the tools that come with Replication. Additionally the replication processes and jobs need to be checked to make sure they are working.

Checking throughput

The performance of replication must be regularly monitored, and performance-tuned as necessary.
The Replication Monitor is used to check on the operational state of publications, and inspect the history, and errors. Right-clicking the replication node in Object Explorer will gain access to it.
One of the most important concerns is the time delay, or latency, of transactions from the publications appearing in the subscriber database. At times of high transaction throughput on the publisher database, bottlenecks can occur. Whereas the stored procedure sp_browseReplCmds on the distribution database can tell you how far behind the synchronisation is at any particular time, one cannot determine where the problems lies just from the data. Tracer tokens are now used to measure the actual throughput of the replication architecture at any particular time to help diagnose such bottlenecks.

Validating

There is always an element of doubt as to whether the replication has entirely worked. There are stored procedures provided to compare the 'articles' on the publisher and subscribers to make sure they are the same.
The sp_publication_validation stored procedure validates the data associated with each article by calling sp_article_validation (after the articles associated with a publication have been activated). The sp_article_validation stored procedure invokes sp_table_validation stored procedure, which calculates the number of lines and, optionally, the checksum of the published table. It is considered good practice to perform a daily row-count and weekly checksum. SQL Data Compare is ideal for mending a broken replication.
The Distribution Agent raises the '20574' system message if validation fails, or the '20575' system message if it passes. The Distribution Agent will replicate changes to a subscriber even if the validation shows that the subscriber is out of synchronization. It is a good policy to configure the Replication Alert on the '20574' message so as to send E-Mail, Pager, or Network notification.
This validation approach will only work within certain restrictions. For example, it will not work if certain filters have been applied. They should be used with caution.

Changing the settings

It is best to use the default replication settings unless there are clear performance gains to be made, or if the application design forces the issue. However, one cannot assume that the changes will be generally beneficial to the entire topology without comprehensive testing.

Articles

Articles are the smallest unit of a publication. An article can be a table, view, stored Procedure or function. Where an article is based on a table or view, it can contain all the data or just part of it. These filters of two types.: More common are the static 'WHERE' clauses, but filters can be used dynamically in Merge Replication to publish different 'content' (rows) to different 'subscribers' (databases receiving data). These latter Filters are called 'Dynamic' and can be simple Row Filters, or Join Filters, where the selection of rows to publish is based on a join with other tables, rather than a simple WHERE clause.
Normally, any alteration to an article that is a table is propagated to all the subscribers. You can also opt to propagate schema objects associated with the article such as indexes, constraints, triggers, collation and extended properties.

Updating articles

In Merge replication, the subscriber can update the article. This is, of course, a recipe for conflict, and these have to be resolved automatically. When the Merge Agent comes across a row that might have changed recently, it examines the history or 'lineage' of each site's version of the row to see if there is a conflict. If so, then the update that is finally used. Has to be based on either
  • A "first wins" resolution,
  • a user-specified priority scheme to determine the update to select,
  • a customised resolution, using COM and stored procedures.
The 'lineage' is a history of changes in a table row in MSmerge_contents, which is maintained automatically when a user updates a row. Each column contains one entry for each site that has updated the row.
Conflicts to the data in the base table can occur within a column or a row. Most usual are column-tracked articles this means that, within any row, updates are only recognized as conflicts if the same column is updated by more than one subscriber. Occasionally, however, the business rules of the application may treat simultaneous changes to the any column within the row as a conflict, in which case row-level tracking is used.

Programming Replication Topologies

Replication agents and replication topologies can be administered and monitored remotely via SQL Scripts or RMO scripts. The task of building and maintaining replication topologies is made much easier of all parts of the deployments are scripted, even if this is done after the other methods such as wizards or RMO are used for the initial operation.
There are other uses for a replication script. It will be required in end-user application where, for example, such as s a pull subscription is synchronized when the user clicks a button, or where a routine administration task such as monitoring replication throughput is performed from a custom console. It is also generally used for writing customized business rules that are executed when a merge subscription is synchronized.
One can use the Object Explorer in SSMS, or the Enterprise Manager in earlier versions of SQL Server, to set up replication. BOL provide worked examples. Alternatively, RMO can be used with VB or C# to script the replication. Whatever system you use, it is a good idea to use the Transact SQL script as the reference for the replication topology you create
Ultimately, the functionality in a replication topology is provided by system stored procedures. The easiest approach is to use Transact-SQL script files to perform a logical sequence of replication tasks, because it provides a permanent, repeatable, copy of the steps used to deploy the replication topology that can, for example, be used to configure more than one subscriber. It also provides a measure of documentation and disaster-recovery. A script can be stored as a query object in a SQL Server Management Studio project.
Replication scripts can be created by hand, by the script generation of the replication wizards in SQL Server Management Studio, or by using Replication Management Objects (RMOs) to programmatically generate the script to create an RMO object.
When creating scripts to configure replication, it is best to use Windows Authentication so as to avoid storing security credentials in the script file. Otherwise you must secure the script file

Further reading:

SQL Server Replication
For the details of implementing replication, all the steps are documented here in Implementing Replication
details on configuring and maintaining a replication are here Configuring and Maintaining replication
For details on peer-to-peer replication, read Peer-to-Peer Transactional Replication
Some of the wider issues and dangers of replication are discussed here Data Replication as an Enterprise SOA Antipattern, in the excellent Microsoft Architecture Journal

Could not find a part of the path ... bin\roslyn\csc.exe

I am trying to run an ASP.NET MVC (model-view-controller) project retrieved from TFS (Team Foundation Server) source control. I have added a...