Recent Entries 10
- pattern minor 112d agoWhat is the collation used while comparing Unicode string literals in SQL Server 2019?My understanding is that the collation for comparing Unicode string literals is determined by the database collation. My database is using `SQL_Latin1_General_CP1_CI_AS` collation. When I compare N'ß' to 'ss', I expect the comparison to fail. But it does not. I am trying to figure out why. Here is the reproduction: ``` set nocount on go use tempdb go SELECT @@version as SqlServerVersion, CONVERT(nvarchar(128), SERVERPROPERTY('collation')) as SqlServerCollation, DB_NAME() AS DatabaseName ,DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CollationUsedBySQLServerDatabase GO declare @ss varchar(255) = 'ss' declare @Nscharfess nvarchar(255) = N'ß' declare @scharfess varchar(255) = 'ß' select case when @Nscharfess = @ss then 'Unicode : Strings match' else 'Unicode : Strings do not match' end, case when @scharfess = @ss then 'SQL_Latin1_General_CP1_CI_AS : Strings match' else 'SQL_Latin1_General_CP1_CI_AS : Strings do not match' end ``` Output: `SqlServerVersion Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64) Sep 24 2019 13:48:23 Copyright (C) 2019 Microsoft Corporation Standard Edition (64-bit) on Windows 10 Enterprise 10.0 (Build 19045: ) (Hypervisor) SqlServerCollation SQL_Latin1_General_CP1_CI_AS DatabaseName tempdb CollationUsedBySQLServerDatabase SQL_Latin1_General_CP1_CI_AS Unicode : Strings match SQL_Latin1_General_CP1_CI_AS : Strings do not match Completion time: 2024-02-15T00:16:59.1968376-08:00 ` I expect 'ß' and 'ss' to match but not if my database is at SQL_Latin1_General_CP1_CI_AS. I wanted to check if there was a gap in my understanding which currently stands as "For Nvarchar() or Nchar() datatype, the SQL Server grabs the collation from the database setting". This translates to "my database is at SQL_Latin1_General_CP1_CI_AS and thus the comparison should fail. If it is succeeding, what collation is the SQL Server using? And why? It is still not
- snippet minor 112d agoHow to handle short UUIDs with Postgres?I see that many web services (Stripe comes to mind) use a special encoding for their UUIDs. Instead of the usual encoding `a44521d0-0fb8-4ade-8002-3385545c3318` they are going to be encoded using a special alphabet (usually lowercase, uppercase letters and numbers), which results in a UUID like `mhvXdrZT4jP5T8vBxuvm75`. Both represent the same UUID but with a different encoding. I'm wondering how these services handle these IDs? Is it possible with Postgres in particular to directly use this short ID encoding in the database? (in other words, any select or insert would use the short ID) Or would it make sense to save these short IDs directly as text in the database? I can't find much info about this so I'm not sure what's the best practice is here.
- debug minor 112d agoIs it possible to use OPENROWSET to import fixed width UTF8 encoded files?I have an example data file with following contents and saved with UTF8 encoding. ``` oab~opqr öab~öpqr öab~öpqr ``` The format of this file is fixed width with columns 1 to 3 each being allocated 1 character and column 4 reserved 5 characters. I have created an XML format file as below ``` ``` Disappointingly running the following SQL... ``` SELECT * FROM OPENROWSET ( BULK 'mydata.txt', FORMATFILE = 'myformat_file.xml', CODEPAGE = '65001' ) AS X ``` Produces the following results ``` Col1 Col2 Col3 Col4 ---- ---- ---- ----- o a b ~opqr � � a b~öp � � a b~öp ``` from which I conclude the `LENGTH` is counting bytes rather than characters. Is there any way I can get this working correctly for fixed character widths with UTF8 encoding? (Target environment is Azure SQL Database reading from Blob storage) NB: It was suggested in the comments that adding `COLLATION="LATIN1_GENERAL_100_CI_AS_SC_UTF8"` to the `FIELD` elements might help but the results remain unchanged with this.
- pattern moderate 112d agoMsg 6355 "Conversion of one or more characters from XML to target collation impossible" when querying sys.dm_exec_query_planI like to find missing indexes on the go, looking at the execution plans! It can potentially give me an indication where further to look at if I want to improve something that is currently running. For doing this I use the following query: ``` SET NOCOUNT ON SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED SELECT er.session_id, er.blocking_session_id, er.start_time, er.status, dbName = DB_NAME(er.database_id), er.wait_type, er.wait_time, er.last_wait_type, er.granted_query_memory, er.reads, er.logical_reads, er.writes, er.row_count, er.total_elapsed_time, er.cpu_time, er.open_transaction_count, er.open_transaction_count, s.text, qp.query_plan, logDate = CONVERT(DATETIME,GETDATE()), logTime = CONVERT(DATETIME,GETDATE()) FROM sys.dm_exec_requests er CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) s CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) qp WHERE CONVERT(VARCHAR(MAX), qp.query_plan) LIKE '%<missing%' ``` It generally works fine; however, I have recently encountered a problem with collation and XML: Error message says: `Msg 6355, Level 16, State 1, Line 40 Conversion of one or more characters from XML to target collation impossible ` I have already found out what is causing it: ``` -- get only the applications from Italy: exec usp_sel_outstandingItems @startdate='2021-04-07 00:00:00', @endDate='2021-08-15 00:00:00', @statusDateStart=NULL, @statusDateEnd=NULL, @office=N'UK', @country=N'IT ', @userState=N'ParticipantPlaced', @outstandingBalance=0 ``` My question is: What inside my query gets
- pattern minor 112d agoDoes a huge key length value for a mulibyte column affect the index performance?When I look at the `EXPLAIN` results, the `key len` value is always calculated based on the actual column length multiplied on the maximum number of bytes for the chosen encoding. Say, for a `varchar(64)` using `utf8` encoding the key len is 192. Does this number affect performance in any way and should I reduce it when possible? I mean, does it make MySQL to reserve some space somewhere that remain unused, or it's just a maximum possible value while the used space is based on the exact data length? So the actual question is: if I have a column that contains only Latin letters and numbers, should I change its encoding to `latin1` from `utf8` in regard of the space occupied by the index/overall index performance?
- principle minor 112d agoDoes MySQL 8 ASCII vs utf8mb4_0900_ai_ci size differ when only using ASCII characters?If I only use only ASCII characters, will `VARCHAR (255)` with `utf8mb4_0900_ai_ci` be larger on disk than `VARCHAR (255)` using ASCII?
- pattern minor 112d agoQuery to find rows containing ASCII characters in a given rangeI am using some scripts from another topic, but the accepted answer isn't working for all my data scenarios. I would have asked my question on the original How to check for Non-Ascii Characters post, but I don't have enough reputation to comment or up-vote as of yet. Questions: - Why is the accepted answer not working for char(150)? - Is there a way to fix the accepted answer? My Testing I created SQL Fiddle with sample data, the stored procedure from one of the answers, and queries to demonstrate the issue. Query 1: sample_table ``` -- Note: The "bad dash" row has char(150) SELECT * FROM sample_table; +-------------------+ | DataColumn | +-------------------+ | test - good dash | | test – bad dash | +-------------------+ ``` Query 2: The other answer by John shows the "bad dash" row containing char(150): ``` SELECT dbo.Find_Invalid_Chars(DataColumn) [Invalid Characters] FROM sample_table WHERE dbo.Find_Invalid_Chars(DataColumn) IS NOT NULL; +----------------------+ | Invalid Characters | +----------------------+ | test [150] bad dash | +----------------------+ ``` Query 3: The accepted answer by Martin Smith returns no results: ``` SELECT DataColumn AS [Bad Data] FROM sample_table WHERE DataColumn LIKE '%[' + CHAR(127)+ '-' +CHAR(255)+']%' COLLATE Latin1_General_100_BIN2; +------------+ | [Bad Data] | +------------+ -- No rows returned. ``` Conclusion Unfortunately, I often need to find characters within (or outside of) a range in databases I can't create stored procedures in. I'd really like to find a fix for the accepted answer or a simple script that wouldn't require creation of any objects (including temp tables). Any suggestions? Thanks in advance. EDIT 1: The solution cannot modify or add any objects or settings in the database. I'm looking for a self-contained query that will select rows with one or more characters in a range between two `CHAR()` numbers, regardless of the ASCII or Extended ASCII number supp
- pattern minor 112d agoDifferent characters, same ASCII code?I have this query that throws two results: ``` SELECT id FROM table1 WHERE id like 'nm041033%' ``` - nm0410331 - nm0410331 And this slightly different query that throws only one result: ``` SELECT id FROM table1 WHERE id='nm0410331' ``` - nm0410331 I tried to check the ASCII of the last character and got the same: ``` SELECT id,ascii(substr(id,9,1)) FROM table1 WHERE id like 'nm041033%' ``` - nm0410331 49 - nm0410331 49 I guess it is a rare encoding problem. How can I solve it? PS: The field `id` is a primary key. The charset is `latin1_general_ci`, and the values were inserted using PHP `utf8_decode()`. UPDATE: I changed the charset to `ascii_general_ci`, and now this query gives me zero results: ``` SELECT id FROM table1 WHERE id='nm0410331' ``` However, those two ids are not the same yet. If I use `SELECT DISTINCT` or `GROUP BY` I get two rows. PS: The last character isn't the number you can type with the keyboard.
- pattern minor 112d agoIs there a MySQL character set and encoding that will allow for both emojis and accents?I've got a database of terms that get added to by one group of users, and queried against by another. I was running into problems when people would query for an emoji in the database and my React app would throw an error (oddly a CORS error, but that's a different issue). I eventually solved this by changing my table's character set to `utf8mb4` with `utf8mb4_unicode_ci` collation. Now I'm getting errors when adding new terms saying, for example, that a duplicate row for "beyoncé" already exists. However, when I query the db for "beyoncé", it returns the row with "beyonce" in it. Is there a combination of charset and collation that can handle this properly? My DB is MySQL 5.6.40 running on Amazon RDS.
- gotcha major 112d agoPostgreSQL: difference between collations 'C' and 'C.UTF-8'In PostgreSQL, what is the difference between collations `C` and `C.UTF-8`? Both show up in rows of `pg_collation`. Is it perhaps the case that `C.UTF-8` is the same as `C` with encoding `UTF-8` regardless or what the actual encoding of a database is?