Recent Entries 10
- pattern minor 112d agoDo I really need a backup when using REPAIR_ALLOW_DATA_LOSS?Reluctant DBA here. Do I really need a backup when using REPAIR_ALLOW_DATA_LOSS? I am well aware MS recommends doing a backup. What's the worse that could happen if I run a DBCC CHECKTABLE using REPAIR_ALLOW_DATA_LOSS? A bit of background. A client wants to migrate the DB used by our software from on-prem to Azure. Currently, the DB size is roughly 1TB and he wants to delete a lot of data in order to get it below 500GB before migration. He has no DBA. There are no backups whatsoever. Not enough space for backups. I wrote a program that deletes data older than the threshold agreed upon, but while deleting data from one of the tables for a certain date, I got a Fatal Error 824. DBCC CHECKDB output: ``` Msg 8939, Level 16, State 98, Line 1 Table error: Object ID 914102297, index ID 1, partition ID 72057594051493888, alloc unit ID 72057594057195520 (type In-row data), page (1:3692983). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 133129 and -4. Msg 8928, Level 16, State 1, Line 1 Object ID 914102297, index ID 1, partition ID 72057594051493888, alloc unit ID 72057594057195520 (type In-row data): Page (1:3692983) could not be processed. See other errors for details. Msg 8976, Level 16, State 1, Line 1 Table error: Object ID 914102297, index ID 1, partition ID 72057594051493888, alloc unit ID 72057594057195520 (type In-row data). Page (1:3692983) was not seen in the scan although its parent (1:3693476) and previous (1:3692476) refer to it. Check any previous errors. Msg 8978, Level 16, State 1, Line 1 Table error: Object ID 914102297, index ID 1, partition ID 72057594051493888, alloc unit ID 72057594057195520 (type In-row data). Page (1:3693015) is missing a reference from previous page (1:3692983). Possible chain linkage problem. CHECKDB found 0 allocation errors and 4 consistency errors in table 'schema.FancyTableName' (object ID 914102297). CHECKDB found 0 allocation errors and 4 consistency errors in database 'FancyDatabaseName'. repair_allow_da
- pattern minor 112d agoFormula for memory grant for DBCC CHECKDBHow is the memory grant for a DBCC checkdb determined (other than the max size for memory grants which is from resource governor, possibly the default pool)? I'm not trying to solve a specific problem. I am just curious and trying to understand.
- debug minor 112d agoSQL Server sample database WideWorldImporters fails dbcc checkdbProblem Statement I just ran `dbcc checkdb` on the sample WideWorldImporters database. I received (4) errors about statistics corruption that I do not remember having seen in the past when I ran this command. Although this is a sample database only, I am curious as I'd hate to see a similar error in a real database. SSMS and Windows Versions SSMS: 18.10 (15.0.18390.0) / Windows 10: Version 21H1 (OS Build 19043.1526) SQL Server Developer Version Microsoft SQL Server 2019 (RTM-CU13) (KB5005679) - 15.0.4178.1 (X64) Sep 23 2021 16:47:49 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Pro 10.0 (Build 19043: ) Command Issued in SSMS: ``` dbcc checkdb ([WideWorldImporters]) with all_errormsgs, data_purity, extended_logical_checks , no_infomsgs; go ``` Error Messages Msg 9122, Level 16, State 201, Line 10 The statistics 'sys.TT_OrderIDList_22AA2996.PK__TT_Order__C3905BAE80B57D6F' is corrupt. Msg 9122, Level 16, State 201, Line 10 The statistics 'sys.TT_OrderLineList_24927208.IX_Website_OrderLineList' is corrupt. Msg 9122, Level 16, State 201, Line 10 The statistics 'sys.TT_OrderList_25869641.PK__TT_Order__288FD689F5006DE2' is corrupt. Msg 9122, Level 16, State 201, Line 10 The statistics 'sys.TT_SensorDataList_276EDEB3.PK__TT_Senso__88385F3043E9E8D9' is corrupt. CHECKDB found 0 allocation errors and 4 consistency errors in database 'WideWorldImporters'. My attempts to fix I quickly learned that I do not have the required privileges to delete sys.* indexes. I tried restoring the database from my last backup, but this produced the same exact error messages. Next, I opened this web page to download a fresh copy: https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0 I downloaded "WideWorldImporters-Full.bak" (121 MB), cleared the "Mark of the Web" (MOTW) in properties, and successfully restored the database. However, upon running the same dbcc checkdb command shown above, I receive
- pattern minor 112d agoDBCC CHECKDB on a user database: Time-out occurred while waiting for buffer latch type 2 for page (X:XXX), database ID 2Our Ola Hallengren IntegrityCheck job failed due to a buffer latch timeout while running DBCC CHECKDB on a user database. However, the buffer latch timeout reported was in TempDB (database ID 2). Output from the job: ``` Date and time: 2022-01-22 09:04:15 [SQLSTATE 01000] Database context: [master] [SQLSTATE 01000] Command: SET LOCK_TIMEOUT 600000; DBCC CHECKDB ([SentryOne]) WITH NO_INFOMSGS, ALL_ERRORMSGS, MAXDOP = 4 [SQLSTATE 01000] Msg 845, Sev 17, State 1, Line 1 : Time-out occurred while waiting for buffer latch type 2 for page (6:222), database ID 2. [SQLSTATE 42000] Outcome: Failed [SQLSTATE 01000] Duration: 12:40:32 [SQLSTATE 01000] Date and time: 2022-01-22 21:44:47 [SQLSTATE 01000] ``` Messages in the SQL Error Log: ``` Date 1/22/2022 9:35:22 PM Log SQL Server (Archive #1 - 1/23/2022 12:00:00 AM) Source spid777 Message A time-out occurred while waiting for buffer latch -- type 2, bp 0000016D7A1DE340, page 6:222, stat 0x40d, database id: 2, allocation unit Id: 536870912/281475513581568, task 0x0000016CD624E4E8 : 2, waittime 300 seconds, flags 0x1a, owning task 0x0000016CD624E4E8. Not continuing to wait. BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds. DBCC CHECKDB (SentryOne) WITH all_errormsgs, no_infomsgs, maxdop = 4 executed by NT SERVICE\SQLSERVERAGENT terminated abnormally due to error state 6. Elapsed time: 12 hours 39 minutes 57 seconds. [INFO] Database ID: [12]. Cleaning up StorageArray. LastClosedCheckpointEndTs: '1225176490' [WARNING] ALTER or DROP TABLE could not clean up root row within 10 seconds. ``` The page in question is an IAM page in TempDB. DBCC PAGE output for it: ``` PAGE: (6:222) BUFFER: BUF @0x0000016D6B622C40 bpage = 0x00000161A6C52000 bPmmpage = 0x0000000000000000 bsort_r_nextbP = 0x0000016D6B622AD0 bsort_r_prevbP = 0x0000016D6B622B80 bhash = 0x0000000000000000 bpageno = (6:222) bpart = 4 ckptGen = 0x0000000000000000 bDirtyRefCount = 0 bstat = 0x9 breferences = 3 berrcode = 0 bUse1 = 19321
- pattern minor 112d agoDatabase Cache Memory in Performance Monitor drops down significantly after DBCC CheckDBWe have been monitoring some `SQLServer: Memory Manager`'s metrics, and noticed that after `DBCC CheckDB` job, metric ``` Database Cache Memory (KB) ``` drops down significantly. If to be exact, it dropped from 140 GB cached DB memory to 60 GB. And after that, slowly ramp up again during the week. (Amount of "`Free Memory KB`", went from 20 to 100 GB right after `CheckDB`) `DBCC CheckDB` is run every Sunday, so Database Cache Memory has to ramp up back again each week ``` What is the behavior of this ? Why CheckDB pushes database pages out of memory ? ``` Second question is why "`buffer cache hit ratio`" did not change after `DBCC CheckDB` completes ? It was 99.99% on average and after `DBCC CheckDB` job it drops to ~98.00%, and returns back to 99% pretty fast while I expected "`buffer cache hit ratio`" to drop significantly because database data has to be read from storage to RAM again ?
- pattern minor 112d agoFailed to allocate BUFs during DBCC CHECKDBI am writing to see if anyone can help me out with a problem that I just can't seem to figure out. This one is going to be a doosey and I am going to try and list out as much of the important info that I can but if I miss anything please let me know and I would be happy to provide whatever info you might need to be able to assist if you're willing? The symptom that I am experiencing is that I have dbcc checkdb locking up when running against a VLDB (approx 1TB) as one of the tasks in a Maintenance Plan and the error log is reporting the error: Failed to allocate BUFs: FAIL_BUFFER_ALLOCATION 7 (sometimes 8) and then the error log starts filling up with memory charts (best way I can describe it, see attached screenshot) about physical and virtual memory. So here is the scenario. We are currently beginning testing of our NEWSERVER before we migrate off our OLDSERVER. Everything is working as expected on our OLDSERVER. The problem is occurring on our NEWSERVER in the PROD instance during our nightly Maintenance Plan routine. There are multiple db's present in the instance but the one that we are concerned with is DB1. DB1 is made up of 2 data files and 1 log file. On the OLDSERVER the .mdf (519 GB) is located on H:, the .ndf (200 GB) is located on E:, and the .ldf (313 GB) on D:. On the NEWSERVER both data files are on E: and the log file is on D:. Note: I was not involved in the configuration of the database having 2 data files or their location, or the setup/configuration of either of the servers. On the OLDSERVER the maintenance plan (consisting of a Check Database Integrity task, Full Database Backup, and a Maintenance Cleanup Task and is configured to run against DB1 only) completes nightly with no issues. On the NEWSERVER the maintenance plan (set up exactly the same way) will sometimes complete but is mostly slowing to a snail's crawl (or find something even slower than a snail) and will eventually fail during the Check Database Integrity task . I can run th
- pattern minor 112d agoDistribution database corrupt - What to do?The below error message has been received from one of our SQL Server databases: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x6847f47b; actual: 0x15160a19). It occurred during a read of page (1:934295) in database ID 5 at offset 0x000001c832e000 in file 'C:\XYZ.MDF'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. Obviously I have to run `DBCC CHECKDB` on the corrupt database. Before doing that, I tried to find out what the database name was (using database id) and what object the error was related to, using and the page number that was given in the error message. So I understood that the database name was `distribution` and the table name was `MSrepl_errors`. I'm not much familiar with SQL Server replication features, so would appreciate if someone can please help me with the below questions: - In order to investigate and/or possibly fix the issue, I have to first execute `DBCC CHECKDB`; however, `CHECKDB` cannot be run without first putting the database in single user mode, is this correct? - Regarding the fact that the `distribution` database is not used by end users, do I still have to put it in the single user mode? And, if yes, what will be the impact on the replication jobs?
- pattern moderate 112d agoDBCC checkdb on tempdbIs it recommended to run DBCC CheckDB on tempdb? If yes, can someone please list out the reasons to do so? This could be on any SQL Server version. The one I have is SQL Server 2017.
- pattern minor 112d agoUnexplainable DBCC CHECKDB Command OutputMicrosoft SQL Server 2017 (RTM-CU14-GDR) (KB4494352) - 14.0.3103.1 (X64) Mar 22 2019 22:33:11 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2016 Datacenter 10.0 (Build 14393: ) (Hypervisor) My Production server executes these command inside a SQL Agent job to output results to a table in my Administration database: ``` DECLARE @SQL VARCHAR(MAX) = 'master..xp_cmdshell ''SQLCMD -E -Q"DBCC CHECKDB('+@Database+') WITH NO_INFOMSGS, ALL_ERRORMSGS"'''; -- If all is well, there will be no output. INSERT INTO DBCC_CHECKDB_Results EXEC(@SQL); ``` Last night, one line of information was output and the job stopped executing, with no errors shown in the job history. The one line of information was: "Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C." How is this possible? At 10:00 p.m. Sunday night, no one is logged onto the server. `CTRL+C` is not in the command line. As a test, I executed this command, and while it was executing, typed: `CTRL+C`: ``` USE ARC DBCC CHECKDB WITH NO_INFOMSGS, ALL_ERRORMSGS: Nothing happened. ``` Any ideas on what would cause that line of output? Thanks, Lee
- pattern minor 112d agoCan a full DBCC CHECKDB operation be safely offloaded to a restored backup on a different server?We are currently moving towards a 24/7/365 operating model - as such I am investigating our maintenance strategy. One item I'm having trouble in figuring out is DBCC CHECKDB. I have seen numerous resources report that DBCC can be offloaded by performing a FULL backup, restoring that backup on a separate SQL instance, and running our CHECKDB on that restored copy. I've also found some resources that report this is not safe due to edge-case scenarios. Is running CHECKDB safe if not run on the production database itself? If not, what parts are safe to run on a restored copy?