Tuesday, September 7, 2010

SQL Server- Basics

The smallest unit of data that SQL Server works with is 8KB. This 8KB chunk of data is called pages. SQL Server therefore stores data on the disk in pages. In Memory also SQL Server manipulates those same 8 KB chunks (pages).

An entire record or row must fit within 8kb chunk or page when data is written to the disk. This means that sometime a single page can share more than one row. A row cannot span multiple pages.E.g. If a employee table has employee name, address, city, state, zip, phone then all that combined data must be written in less than a page (or 8KB). Few data types like LOB of text, binary data like files which stores pointer to the real data can be spread across multiple pages or even in file. All this data is gathered in to the data files on the disk which has either .MDF or .NDF as file extension.

When a Structured Query Language (SQL) is written in the query analyze of SQL Server then SQL Server's internal query optimizer looks at the query and constructs a execution plan for executing it (i.e. what all steps it will need to take in order to get that data off of the disk). SQL Server has a number of techniques it can use, some which are better in certain conditions than others. Once SQL Server has the plan, it executes it and retrieves the needed data off of the disk. If the data is requested from the client then the data is sent across the network.

In case of any modification query (DML), SQL Server first modifies the pages of that data in the memory. After modification it does not write those updated pages directly to the disk, but it makes a copy of the DML query result in a log file called transaction log. A transaction log file has .LDF extension. It keeps track of every transaction in the database. If it writes to the disk after every DML query is fired on the table then it would increase the system load to write records on to the disk.

SQL Server has an automated recovery mode that kicks in when it starts back up after a SQL Server crash. Once the server is up after the crash, it will decide to write the modified pages to the disk or not. It first checks in the transaction log file if the transaction is completed then it goes and writes to the disk. This is how SQL Server knows that the change is safe on the disk.

When SQL Server crashed, it has automated recovery mode that gets kick when SQL Server starts back up. It first go to the transaction log and checks for the uncommitted transaction or those which are not yet flagged. It knows that the flagged transactions are safe on the disk and the remaining was not copied to disk but still present in the memory when it crashed. In this case SQL Server reads those transactions from the log file, executes them again and immediately writes/commit those pages to the disk. This process allows SQL Server to catch up with all in-progress work and ensures that no loss of data occurred (provided database files are not corrupted). In short whatever happens in SQL Server it takes place through transaction log. Again it depends on the recovery model of the database. We have these 3 database recovery model – Full, Simple and Bulk-logged.

Individual databases can be changed from one recovery model to other depending upon the requirement or functionality. E.g. in Simple recovery model does not use transaction log (very less usage). It removes the transactions automatically to keep the log file size to minimal. Simple recovery is appropriate for read-only database that have no changes being made. If there is no change then there will not be any data loss during SQL Server crash.

This is how the data moves from disk to memory. This entire process is absolutely essential to how most of SQL Server's functionality actually works, as well as how to administer it.

No comments:

Post a Comment