Hi all, this is REALLY weird, I can t seem to make heads or tails of it. but from my understanding the each datatype has set contrstraint assigned to it ( example int datatype can only except non-decimal numeric values ) as well as a datetime datatype can only except a vaild date.
Though oddly i have found the following dates in a table ( in addtion the when aby kind of data operation is performed on the table sql returns the following error:
Server: Msg 8630, Level 16, State 1, Line 1
Internal Query Processor Error: The query processor encountered an unexpected error during execution.
Here are some of the dates that were present in the table:
DateColumn1
--
1900-01-01 857:44:45.813
1900-01-01 872:51:16.427
1900-01-01 872:54:57.440
1900-01-01 873:09:32.107
1900-01-01 873:13:10.560
1900-01-01 873:16:49.867
1900-01-01 888:27:00.640
DateColumn2
-22063-05-18 00:00:00.000
-20285-02-03 00:00:00.000
In addtion there were some other columns that have had odd data in them(VERY WEIRD)
Char25Column
?Q307000
?Q307000
?Q307000
?Q307000
?Q307000
Any Thoughs?
Thanks
Have you run a DBCC CHECKDB lately? You might have some issue with your hardware, storage, or perhaps some 3rd party filter that's somehow modifying your data.
What version of SQL Server is this? If it is SQL Server 2005 hopefully you have database page checksums enabled so that if something does scribble on database pages, then the changes would be detected the next time you read those pages.
Do you have any XPs (extended stored procedures)? A poorly written/tested XP might scribble across SQL Server's memory.
Check your overall system and consider a call to product support. Also, you probably want to check your recent backups to see if the errors occur there as well.
Don
Hi Don, thanks for the quick reply,to answer your question, i have ran DBCC table checks on the table in question and as expected consistancy error were found.( i would have posted them before but I tried to make the post a readable as possible.) (i'll post them below) the odd thing here is the fact that SQL allowed the corrupted data to be inserted in the first place. i mean you would think that SQL servver would throw up an error or two.
As far as the extended sp yes but they wouldnt have been called during the this data input.(btw was push in via informatic) then again reguardless of the client thats pushing and puling the data, SQL server should have had the last called as to what data got commited.
Here are the dbcc table check results( i have ommited a lot of the extract error that were repeated /per row)
DBCC results for 'tLoadedTable'.
Msg 2570, Level 16, State 3, Line 1
Page (85:10), slot 0 in object ID 1762977507, index ID 0, partition ID 72057594677035008, alloc unit ID 72057594649313280 (type "In-row data"). Column "PaidAmt" value is out of range for data type "decimal". Update column to a legal value.
Msg 2570, Level 16, State 3, Line 1
Page (85:10), slot 0 in object ID 1762977507, index ID 0, partition ID 72057594677035008, alloc unit ID 72057594649313280 (type "In-row data"). Column "PaidDt" value is out of range for data type "datetime". Update column to a legal value.
Msg 2570, Level 16, State 3, Line 1
Page (85:10), slot 0 in object ID 1762977507, index ID 0, partition ID 72057594677035008, alloc unit ID 72057594649313280 (type "In-row data"). Column "InvoiceDt" value is out of range for data type "datetime". Update column to a legal value.
Msg 2570, Level 16, State 3, Line 1
Page (85:10), slot 0 in object ID 1762977507, index ID 0, partition ID 72057594677035008, alloc unit ID 72057594649313280 (type "In-row data"). Column "LoadDate" value is out of range for data type "datetime". Update column to a legal value.
Msg 2570, Level 16, State 3, Line 1
Table error: object ID 1762977507, index ID 0, partition ID 72057594677035008. A row should be on partition number 1 but was found in partition number 10. Possible extra or invalid keys for:
Msg 8988, Level 16, State 1, Line 1
Row (85:10:0) identified by (HEAP RID = (85:10:0)).
Too many errors found (201) for object ID 1762977507. To see all error messages rerun the statement using "WITH ALL_ERRORMSGS".
There are 133508601 rows in 9102393 pages for object "tLoadedTable".
CHECKTABLE found 0 allocation errors and 336633 consistency errors in table 'tLoadedTable' (object ID 1762977507).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Thanks
|||Im running SQL server 2005 Enterprise SP 2, in addtion this has also happened on another database on a different server also running SQL 2005 SP 2 on a different SAN ( same client though)
Thanks
|||Don, sorry I missed your question about the page verify mode, currently i running all databases in torn_page_verify mode, were mostly a OLAP shop and dont have a lot of rouge client make data modifications. though based on this little bout i may be changing it to page checksums ( though there seems to be an considerable overhead due to the constaint checksum verifications right? )
Thanks
|||Since DBCC has detected over 360,000 anomalous data values it's quite likely that you have hardware/storage issues. I suggest you get that looked at before you make any further changes to the data.
It may be that you have a wayward XP that's scribbling outside its intended address space. An XP can write anywhere in SQL Server's writable memory, so even if an XP wasn't supposed to be writing on these pages, it might be doing so accidentally. Of course, a well-written and well-tested XP should do this, but it could have a bug or two that's pointing it in the wrong direction. There's nothing that SQL Server can do to keep your XP from scribbling where it shouldn't, other than replacing XPs with CLR modules as we've enabled with SS2005.
At least page checksums can detect that what was read is not the same as what was written. In your case it would detect these issues if they were caused by the IO system. Torn-page detection might detect an IO problem here, but only if the IO system happened to flip some bits near the torn-page bits on a page (there are only 32 of these bits on an 8K page, so the IO problem could well miss those).
And yes, there is some overhead for the page checksums, but you also would have detected this issue much sooner (if indeed it is an IO issue). You'd have to do some tests to see what the overhead would be for your system; it's hard to predict as it depends on cache hit probabilities and other factors .. but often people find it to be less than they expect .. or they're willing to "pay the price" for it since it will detect issues with the IO/storage system.
So: I suggest you investigate your IO system for issues (check out sqliosim and use it to test your system), and that you double- triple check any XPs to make sure they aren't generating those strings of bytes that you're finding in your dates and decimals (check for any strings/varchars as well .. perhaps it's writing a string of readable characters that you'll be able to use to track down the source of the issues).
Don
|||Hi Don,
Again thanks for the reply.
As far as IO error there were none, ( funny you'd mentioned the sqliosim utility i had ran a check earlier in the week with no errors found) another thing is that this issue is occurring on two separate servers that also reside on separate Sans, oddly the issue occurred on the exact same partition boundary. as it sits I have a trace running on the entire process and found no evidence that any xp or clr routines were being called. ( just the simple sp_prepare and execute calls for table collation definition information. later followed with a API insert bulk process" that shows no errors during the loading)
Well I have been able to re-produce the issue yet its very weird.
Ok, so if I hadn’t mentioned it before the table in question is partitioned. its partition scheme is also shared by other table ( yet they don’t have the same issue.) the data load is being pushed in via infromatioca API insert bulk call. regardless of which partitioned key is used is still allows the records to be entered yet with a dbcc checktable or simple select is performed on the table it fails and returns the above error.
--
I attempting to loading to other partitioned boundaries and some succeeded with no corruption other succeeded with corruption. the thing that really bothers me is that SQL is allowing this errors to be committed. I have checked the input data and the errors are not there so SQL server has to be corrupting them when the insert bulk block is read into memory.(yet again you would think SQL server would be checking it here)
--test 1
I created another table with the same structure with in the same database but did not add it to a partition scheme( just left it on the primary file group( rather PrimaryData file group, I keep the system objects and user object separate) I ran the same ETL loading process into this new table with no errors and the data was clean( I was able to select against it with no issues and the data look good as expected.)
--test 2
I created another database on the built the same table with partitioned scheme and function within it ( so basically the environment is exactly the same as production minus all other table and the extra data within the partitioned table.)
I ran the same ETL load process against the new database... and it loaded fine, no error no corruption.. ughhh!!!
-- additional
I have ran dbcc dbcheck numerous time yet no errors are even returned. i have placed a call with MS but they seem to think its the third party tool that causing the issue ( though I’m not convinced due to I’m able to load into a non-partitioned table and a partitioned table another database on the same server same disk with no issues.)
YIKES!!!!
Thanks
|||Yes, it does seem very odd. It would seem that even though a 3rd party API was being used, it shouldn't be able to insert bad values into a datatype. That's why I suspect either an IO issue or an XP issue since the bytes seem be be changed outside of SQL Server's actual control.
I suggest you pursue this some more with MS support together with the 3rd party .. perhaps the 3rd party API is using some XP of its own (some XPs are named sp_xxx, so that can seem like no XPs are being called if you're expecting xp_xxx).
Don
|||Hi Don, Thanks again for your reply,
I have had my SAN admin review the SAN log to look for any kind latency or stale IO issues ( so far he has been able to find anything. I my self haven’t found anything at the OS or SQL level that you suggest IO or memory issues...) very odd. I have though been able to find out that the insert bulk API that the third party vendor had been using was indeed a odbc call to the bcp.exe, funny though i have since the data being submitted ( I’ve captured the network packets to review the data that was being sent to the bcp process, all is well so the corrupted data is not coming from the third party tool ( potentially its could be the memory block that the odbc driver has consumed when the connection was made to the api. perhaps sql is indeed seeing the correct data value verifying that they are compliant and during the commit process of writing the data to physical disk the memory block is being corrupted by some low level IO driver filter.
In addition after reviewing the trace( I know not all events are trapped by SQL server) again the only operations that are called were the sp_prepare and sp_execute which were unprepared before the api called was made. also in reviewing the third party tools model it show no other sql server functions are called ( just internal application calls)
One other thing was weird, the fact that I was able to load into a non-partitioned table with the same load process (third party tool) as well as another database partitioned table (same scheme, same function, same disk, same server) without data corruption.
Who knows? I’m still perusing the MS avenue and will post my findings( so far it my been a MS bug but there escalation team isn’t in until Monday.)
Meanwhile I have been attempting to read up on driver filters (interesting enough they can modify or FILTER data prior to persisting the data ( i guess it happens in the transfer process from memory to bus to disk.)
Though if anyone has any thought please do tell...
Thanks
|||Could you send me an email so we can look into this a bit more? DonV@.microsoft.com.(nospam)
Don
No comments:
Post a Comment