When we create a new database, two files are created by default: an MDF file, which holds the data, and an LDF file, which stores the transaction log.
SQL Server Transaction Log Architecture
Transaction: A set of changes applied to the database as a single unit of work.
Commit: The successful completion of a transaction.
Rollback: The process of cancelling a transaction and undoing its changes.
Logging: The process of writing log records to ensure database consistency.
Transaction log: The log file in which log records are stored.
Crash: An unexpected shutdown of SQL Server.
Recovery: The process performed after a crash to bring the database back to a consistent state.
Checkpoint: The process of flushing the dirty pages in the buffer pool to disk. It is explained in detail below.
HADR: High Availability / Disaster Recovery.
ACID: (A)tomicity, (C)onsistency, (I)solation, (D)urability.
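The transaction, commit, and rollback terms above can be sketched in T-SQL. This is a minimal illustration, not part of the original article; the table `dbo.Accounts` and its columns are hypothetical.

```sql
BEGIN TRY
    BEGIN TRANSACTION;  -- explicit transaction: both updates form one unit of work

    -- Hypothetical table used only for illustration.
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;     -- Commit: both changes become permanent together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- Rollback: both changes are undone together
END CATCH;
```

If either UPDATE fails, the CATCH block rolls the whole transaction back, which is exactly the atomicity that the logging mechanism described below makes possible.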
The Importance and Necessity of Log and Logging
- Thanks to logging, transactions are stored durably, and recovery operations can be performed in case of disasters.
- All changes are recorded completely and permanently.
- Logging provides Durability, one of the ACID principles: once a transaction commits, its changes are guaranteed to be reflected in the database.
- Logging is of great importance in HADR solutions; almost the entire structure is built on this infrastructure. For example, in the mirroring solution, all data transferred from the principal DB to the mirror DB travels through the logging mechanism. The Availability Groups feature introduced with SQL Server 2012 works with a structure similar to mirroring. Other solutions that rely on it are transactional replication and database snapshots: a database snapshot uses the logging mechanism to preserve, inside the snapshot DB, the state of the source DB as it was when the snapshot was taken. Change data capture (CDC) can also be counted among these.
How a simple update transaction is reflected in the log
- If the query does not explicitly open a transaction, SQL Server creates an implicit transaction for us.
- If the page to which the change will be applied is not in the buffer pool, it is read from disk and pulled into memory (latch, physical read).
- To ensure consistency, locks are placed on the table, the page, and the row, respectively.
- The pages are changed in the buffer pool.
- Log records are created in the buffer pool so that the transaction can be rolled back or committed.
- After the change is applied, the log records in the buffer pool are written to the transaction log (log file).
- If a mirroring or Availability Group solution with a synchronous replica is in place, SQL Server waits for the log records to be delivered to the other nodes and acknowledged. This is why log generation matters so much in such HADR solutions.
- All locks are removed and resources are released in order (row > page > table).
- The transaction is committed, and a message and/or result such as '206 rows affected.' is returned to the source that sent the query.
- The pages changed in the buffer pool are marked as dirty pages, because the changes have not yet been permanently reflected on disk. These dirty pages are flushed to disk by the checkpoint process, which SQL Server manages and runs at certain intervals.
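The steps above can be observed directly. As a sketch, the undocumented (but widely used) `fn_dblog` function exposes the log records a transaction produced; its output columns may change between versions, and `dbo.Accounts` is again a hypothetical table. Run this only in a test database.

```sql
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance + 1 WHERE AccountId = 1;
COMMIT TRANSACTION;

-- Most recent log records: expect LOP_BEGIN_XACT, the row modification
-- (e.g. LOP_MODIFY_ROW), and LOP_COMMIT_XACT for the transaction above.
SELECT TOP (10) [Current LSN], Operation, Context, [Transaction ID]
FROM fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;
```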
For example: wouldn't it be easier if we simply wrote the changes directly to disk instead of going through all these steps?
The answer is no, for reasons of performance, reliability, and consistency alike. If every change were written directly to disk, we would constantly be working against disks that are many times slower than memory (roughly 30 GB/sec versus 550 MB/sec), which would hurt performance. In addition, an unexpected system outage or access problem in the middle of a disk write would cause data loss or inconsistency. The steps above isolate resources, perform I/O, write log records, and flush memory to disk precisely in order to provide reliability, consistency, durability, and performance.
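The dirty-page and checkpoint behaviour described above can be inspected with the `sys.dm_os_buffer_descriptors` DMV; this is a small diagnostic sketch, not part of the original article.

```sql
-- Count dirty (modified-in-memory, not yet flushed) pages per database.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*)             AS dirty_pages
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id;

-- Manually flush the current database's dirty pages to disk.
CHECKPOINT;
```

Running the SELECT again after `CHECKPOINT` should show a noticeably lower dirty-page count for the current database.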
Internal Structure of Transaction Log File
When we create a new database or add a new log file without specifying an initial size, the log file defaults to 0.5 MB or 25% of the total size of the data files, whichever is larger. For example, if we create a 500 GB data file, we get a 125 GB log file. Even when the Windows IFI (Instant File Initialization) best practice is enabled, the transaction log file cannot use this mechanism: Windows allocates the 125 GB file on the NTFS file system, fills it with zeros, and only hands it over to SQL Server once the allocation is complete. The larger the log file, the longer this zeroing takes, and therefore the longer creating or growing the log file takes. For this reason, the initial size and growth values of log files should be chosen deliberately, based on observation.
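To avoid the oversized 25% default, the log file's initial size and growth can be set explicitly at creation time. A minimal sketch follows; the database name, paths, and sizes are placeholders to be replaced with values based on observed log usage.

```sql
-- Hypothetical database; sizes and paths are placeholders.
CREATE DATABASE SalesDb
ON PRIMARY
    (NAME = SalesDb_data, FILENAME = 'D:\Data\SalesDb.mdf', SIZE = 500GB)
LOG ON
    (NAME = SalesDb_log,  FILENAME = 'E:\Log\SalesDb.ldf',
     SIZE = 8GB,          -- explicit initial size instead of the 25% default
     FILEGROWTH = 1GB);   -- fixed growth increment rather than a percentage
```

A fixed `FILEGROWTH` increment keeps each growth (and its zero-filling pause) predictable, unlike percentage growth, which gets slower as the file gets larger.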
We do not normally see this zero-filling process, but we can enable trace flags 3605 and 3004 to observe it; the relevant entries then appear in the SQL Server error log. The ASYNC_IO_COMPLETION wait type can also be observed during zero-filling, so if we are actively monitoring wait types, it is worth examining its values.
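As a sketch of the trace-flag technique just described (the database and file names are hypothetical):

```sql
-- 3004 logs file zeroing/initialization messages; 3605 redirects that
-- output to the SQL Server error log. -1 enables them globally.
DBCC TRACEON (3004, 3605, -1);

-- Grow the log file to trigger zero-filling of the new portion.
ALTER DATABASE SalesDb
MODIFY FILE (NAME = SalesDb_log, SIZE = 16GB);

-- Search the current error log for the "Zeroing ..." entries.
EXEC sys.xp_readerrorlog 0, 1, N'Zeroing';

DBCC TRACEOFF (3004, 3605, -1);
```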
The transaction log file is logically divided into a series of virtual log files (VLFs). The reason for this is to simplify SQL Server's internal log management.
Let's say we have created a new database or a new log file. After the operations described above (zero-filling, etc.) are completed, the log file is divided internally into VLFs. If a database is being created, the first VLF is marked as active, because the database needs at least one active VLF for logging. Apart from this, the VLFs in a newly added log file are marked as passive and unused.
Note: VLFs can be active or passive. At the beginning of the transaction log file there is an 8 KB file header page containing metadata about the log file, such as its size and growth settings.
So how many VLFs are allocated when a new log file is added?
- 4 new VLFs for sizes of 64MB and below.
- 8 new VLFs for sizes between 64MB and 1GB.
- 16 new VLFs for sizes of 1GB and above.
Another important point is that VLF sizes and counts are determined entirely by SQL Server; we cannot change them directly, we can only influence them through the initial size and growth values we choose.
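The VLF count of the current database can be checked against the rules above. This sketch assumes SQL Server 2016 SP2 or later, where the `sys.dm_db_log_info` DMV is available; on older versions the undocumented `DBCC LOGINFO` command returns one row per VLF instead.

```sql
-- One row per VLF; vlf_size_mb and vlf_active show size and status.
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());

-- Older alternative (undocumented): DBCC LOGINFO;
```

A database whose log has grown many times in small increments will show a high `vlf_count`, which is one practical reason to size the log deliberately.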
Each VLF has a unique, sequential number, and the number keeps increasing as each new VLF becomes active. When the active VLF fills up and a new VLF must be marked active, SQL Server starts from the inactive VLF with the smallest sequence number and continues in order. For a new database, the VLF sequence numbers do not start from 1, because those numbers are already owned by the model system database; the sequence simply continues increasing from where model left off.
Each VLF has a VLF header. The header contains the following information:
- Whether the VLF is active.
- The log sequence number (LSN).
- Parity information for its 512-byte blocks.
While VLFs form the internal structure of transaction log files, log blocks form the internal structure of VLFs. A VLF contains log blocks of variable size: the minimum size of a log block is 512 bytes and the maximum is 60KB. When a log block needs to grow, it grows in 512-byte increments, up to the 60KB maximum.
Log blocks consist of log records. Log records are stored in the order in which they are written into the log blocks, and have a structure similar to data file pages. Just like VLFs, each log record has a sequential and unique number, called the Log Sequence Number (LSN).
The LSN of a log record is formed by combining the sequence number of the VLF it belongs to, the number of the log block it belongs to, and its own record number. As a formula:
LSN = [VLF sequence number (4 bytes)]:[log block number (4 bytes)]:[log record number (2 bytes)]
The LSN is of great importance in many areas: backup, restore, recovery, data file pages, and so on.
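The three-part LSN structure can be seen in the output of the undocumented `fn_dblog` function (run in a test database; output format may vary between versions). The example value in the comments is illustrative, not real output.

```sql
-- [Current LSN] is printed as VLF sequence : log block : log record, in hex.
SELECT TOP (5) [Current LSN], Operation
FROM fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;

-- Illustrative shape of a value: 00000021:000001a0:0005
--   00000021 -> VLF sequence number
--   000001a0 -> log block number
--   0005     -> log record number within the block
```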
For example, the header of each data file page contains the LSN of the last log record whose change was applied to that page.
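That page-header LSN can be inspected with the undocumented `DBCC PAGE` command; trace flag 3604 sends its output to the client. This is a sketch only: the database name and page id below are hypothetical, and the output format is version-dependent.

```sql
DBCC TRACEON (3604);
-- Arguments: database, file id, page id, print option (3 = full detail).
DBCC PAGE (N'SalesDb', 1, 100, 3);
-- In the page header section of the output, the m_lsn field is the LSN
-- of the last log record that modified this page.
DBCC TRACEOFF (3604);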