HiveBrain — Knowledge Dashboard

pattern minor by @import:stackexchange-dba 112d ago

What is the collation used while comparing Unicode string literals in SQL Server 2019?

sqldbasql-serverstackoverflowencoding

snippet minor by @import:stackexchange-dba 112d ago

How to handle short UUIDs with Postgres?

I see that many web services (Stripe comes to mind) use a special encoding for their UUIDs. Instead of the usual encoding `a44521d0-0fb8-4ade-8002-3385545c3318` they are going to be encoded using a special alphabet (usually lowercase, uppercase letters and numbers), which results in a UUID like `mhvXdrZT4jP5T8vBxuvm75`. Both represent the same UUID but with a different encoding. I'm wondering how these services handle these IDs? Is it possible with Postgres in particular to directly use this short ID encoding in the database? (in other words, any select or insert would use the short ID) Or would it make sense to save these short IDs directly as text in the database? I can't find much info about this so I'm not sure what's the best practice is here.

postgresqlsqlstackoverflowencodinguuid

debug minor by @import:stackexchange-dba 112d ago

Is it possible to use OPENROWSET to import fixed width UTF8 encoded files?

I have an example data file with following contents and saved with UTF8 encoding. ``` oab~opqr öab~öpqr öab~öpqr ``` The format of this file is fixed width with columns 1 to 3 each being allocated 1 character and column 4 reserved 5 characters. I have created an XML format file as below ``` ``` Disappointingly running the following SQL... ``` SELECT * FROM OPENROWSET ( BULK 'mydata.txt', FORMATFILE = 'myformat_file.xml', CODEPAGE = '65001' ) AS X ``` Produces the following results ``` Col1 Col2 Col3 Col4 ---- ---- ---- ----- o a b ~opqr � � a b~öp � � a b~öp ``` from which I conclude the `LENGTH` is counting bytes rather than characters. Is there any way I can get this working correctly for fixed character widths with UTF8 encoding? (Target environment is Azure SQL Database reading from Blob storage) NB: It was suggested in the comments that adding `COLLATION="LATIN1_GENERAL_100_CI_AS_SC_UTF8"` to the `FIELD` elements might help but the results remain unchanged with this.

azure-sql-databasesqlsql-serverstackoverflowencoding

pattern moderate by @import:stackexchange-dba 112d ago

Msg 6355 "Conversion of one or more characters from XML to target collation impossible" when querying sys.dm_exec_query_plan

I like to find missing indexes on the go, looking at the execution plans! It can potentially give me an indication where further to look at if I want to improve something that is currently running. For doing this I use the following query: ``` SET NOCOUNT ON SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED SELECT er.session_id, er.blocking_session_id, er.start_time, er.status, dbName = DB_NAME(er.database_id), er.wait_type, er.wait_time, er.last_wait_type, er.granted_query_memory, er.reads, er.logical_reads, er.writes, er.row_count, er.total_elapsed_time, er.cpu_time, er.open_transaction_count, er.open_transaction_count, s.text, qp.query_plan, logDate = CONVERT(DATETIME,GETDATE()), logTime = CONVERT(DATETIME,GETDATE()) FROM sys.dm_exec_requests er CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) s CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) qp WHERE CONVERT(VARCHAR(MAX), qp.query_plan) LIKE '%<missing%' ``` It generally works fine; however, I have recently encountered a problem with collation and XML: Error message says: `Msg 6355, Level 16, State 1, Line 40 Conversion of one or more characters from XML to target collation impossible ` I have already found out what is causing it: ``` -- get only the applications from Italy: exec usp_sel_outstandingItems @startdate='2021-04-07 00:00:00', @endDate='2021-08-15 00:00:00', @statusDateStart=NULL, @statusDateEnd=NULL, @office=N'UK', @country=N'IT ', @userState=N'ParticipantPlaced', @outstandingBalance=0 ``` My question is: What inside my query gets

sqldbasql-serverxmlstackoverflow

pattern minor by @import:stackexchange-dba 112d ago

Does a huge key length value for a mulibyte column affect the index performance?

When I look at the `EXPLAIN` results, the `key len` value is always calculated based on the actual column length multiplied on the maximum number of bytes for the chosen encoding. Say, for a `varchar(64)` using `utf8` encoding the key len is 192. Does this number affect performance in any way and should I reduce it when possible? I mean, does it make MySQL to reserve some space somewhere that remain unused, or it's just a maximum possible value while the used space is based on the exact data length? So the actual question is: if I have a column that contains only Latin letters and numbers, should I change its encoding to `latin1` from `utf8` in regard of the space occupied by the index/overall index performance?

explainsqldbastackoverflowmysql

principle minor by @import:stackexchange-dba 112d ago

Does MySQL 8 ASCII vs utf8mb4_0900_ai_ci size differ when only using ASCII characters?

If I only use only ASCII characters, will `VARCHAR (255)` with `utf8mb4_0900_ai_ci` be larger on disk than `VARCHAR (255)` using ASCII?

sqldbamysql-8.0stackoverflowmysql

pattern minor by @import:stackexchange-dba 112d ago

Query to find rows containing ASCII characters in a given range

I am using some scripts from another topic, but the accepted answer isn't working for all my data scenarios. I would have asked my question on the original How to check for Non-Ascii Characters post, but I don't have enough reputation to comment or up-vote as of yet. Questions: - Why is the accepted answer not working for char(150)? - Is there a way to fix the accepted answer? My Testing I created SQL Fiddle with sample data, the stored procedure from one of the answers, and queries to demonstrate the issue. Query 1: sample_table ``` -- Note: The "bad dash" row has char(150) SELECT * FROM sample_table; +-------------------+ | DataColumn | +-------------------+ | test - good dash | | test – bad dash | +-------------------+ ``` Query 2: The other answer by John shows the "bad dash" row containing char(150): ``` SELECT dbo.Find_Invalid_Chars(DataColumn) [Invalid Characters] FROM sample_table WHERE dbo.Find_Invalid_Chars(DataColumn) IS NOT NULL; +----------------------+ | Invalid Characters | +----------------------+ | test [150] bad dash | +----------------------+ ``` Query 3: The accepted answer by Martin Smith returns no results: ``` SELECT DataColumn AS [Bad Data] FROM sample_table WHERE DataColumn LIKE '%[' + CHAR(127)+ '-' +CHAR(255)+']%' COLLATE Latin1_General_100_BIN2; +------------+ | [Bad Data] | +------------+ -- No rows returned. ``` Conclusion Unfortunately, I often need to find characters within (or outside of) a range in databases I can't create stored procedures in. I'd really like to find a fix for the accepted answer or a simple script that wouldn't require creation of any objects (including temp tables). Any suggestions? Thanks in advance. EDIT 1: The solution cannot modify or add any objects or settings in the database. I'm looking for a self-contained query that will select rows with one or more characters in a range between two `CHAR()` numbers, regardless of the ASCII or Extended ASCII number supp

sqldbasql-servert-sqlstackoverflow

pattern minor by @import:stackexchange-dba 112d ago

Different characters, same ASCII code?

I have this query that throws two results: ``` SELECT id FROM table1 WHERE id like 'nm041033%' ``` - nm0410331 - nm0410331 And this slightly different query that throws only one result: ``` SELECT id FROM table1 WHERE id='nm0410331' ``` - nm0410331 I tried to check the ASCII of the last character and got the same: ``` SELECT id,ascii(substr(id,9,1)) FROM table1 WHERE id like 'nm041033%' ``` - nm0410331 49 - nm0410331 49 I guess it is a rare encoding problem. How can I solve it? PS: The field `id` is a primary key. The charset is `latin1_general_ci`, and the values were inserted using PHP `utf8_decode()`. UPDATE: I changed the charset to `ascii_general_ci`, and now this query gives me zero results: ``` SELECT id FROM table1 WHERE id='nm0410331' ``` However, those two ids are not the same yet. If I use `SELECT DISTINCT` or `GROUP BY` I get two rows. PS: The last character isn't the number you can type with the keyboard.

sqlvarcharstackoverflowmysqlencoding

pattern minor by @import:stackexchange-dba 112d ago

Is there a MySQL character set and encoding that will allow for both emojis and accents?

I've got a database of terms that get added to by one group of users, and queried against by another. I was running into problems when people would query for an emoji in the database and my React app would throw an error (oddly a CORS error, but that's a different issue). I eventually solved this by changing my table's character set to `utf8mb4` with `utf8mb4_unicode_ci` collation. Now I'm getting errors when adding new terms saying, for example, that a duplicate row for "beyoncé" already exists. However, when I query the db for "beyoncé", it returns the row with "beyonce" in it. Is there a combination of charset and collation that can handle this properly? My DB is MySQL 5.6.40 running on Amazon RDS.

sqldbastackoverflowmysqlencoding

gotcha major by @import:stackexchange-dba 112d ago

PostgreSQL: difference between collations 'C' and 'C.UTF-8'

In PostgreSQL, what is the difference between collations `C` and `C.UTF-8`? Both show up in rows of `pg_collation`. Is it perhaps the case that `C.UTF-8` is the same as `C` with encoding `UTF-8` regardless or what the actual encoding of a database is?

postgresqlsqldbastackoverflowencoding