HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMajor

How to create Unicode parameter and variable names

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
createunicodenameshowandvariableparameter

Problem

All of this works:

CREATE DATABASE [¯\_(ツ)_/¯];
GO
USE [¯\_(ツ)_/¯];
GO
CREATE SCHEMA [¯\_(ツ)_/¯];
GO
CREATE TABLE [¯\_(ツ)_/¯].[¯\_(ツ)_/¯]([¯\_(ツ)_/¯] NVARCHAR(20));
GO
CREATE UNIQUE CLUSTERED INDEX [¯\_(ツ)_/¯] ON [¯\_(ツ)_/¯].[¯\_(ツ)_/¯]([¯\_(ツ)_/¯]);
GO
INSERT INTO [¯\_(ツ)_/¯].[¯\_(ツ)_/¯]([¯\_(ツ)_/¯]) VALUES (N'[¯\_(ツ)_/¯]');
GO
CREATE VIEW [¯\_(ツ)_/¯].[vw_¯\_(ツ)_/¯] AS SELECT [¯\_(ツ)_/¯] FROM [¯\_(ツ)_/¯].[¯\_(ツ)_/¯];
GO
CREATE PROC [¯\_(ツ)_/¯].[sp_¯\_(ツ)_/¯] @Shrug NVARCHAR(20) AS SELECT [¯\_(ツ)_/¯] FROM [¯\_(ツ)_/¯].[vw_¯\_(ツ)_/¯] WHERE [¯\_(ツ)_/¯] = @Shrug;
GO
EXEC [¯\_(ツ)_/¯].[¯\_(ツ)_/¯].[sp_¯\_(ツ)_/¯] @Shrug = N'[¯\_(ツ)_/¯]';
GO


But you can probably see where I'm going with this: I don't want @Shrug, I want @¯\_(ツ)_/¯.

Neither of these work on any version from 2008-2017:

CREATE PROC [¯\_(ツ)_/¯].[sp_¯\_(ツ)_/¯] @[¯\_(ツ)_/¯] NVARCHAR(20) AS SELECT [¯\_(ツ)_/¯] FROM [¯\_(ツ)_/¯].[vw_¯\_(ツ)_/¯] WHERE [¯\_(ツ)_/¯] = @[¯\_(ツ)_/¯];
GO
CREATE PROC [¯\_(ツ)_/¯].[sp_¯\_(ツ)_/¯] [@¯\_(ツ)_/¯] NVARCHAR(20) AS SELECT [¯\_(ツ)_/¯] FROM [¯\_(ツ)_/¯].[vw_¯\_(ツ)_/¯] WHERE [¯\_(ツ)_/¯] = [@¯\_(ツ)_/¯];
GO


So, is there a way to use unicode stored procedure parameter names?

Solution

Well, identifiers are always Unicode / NVARCHAR, so technically you can't create anything that doesn't have a Unicode name 🙃.

The problem you are having here is due entirely to the classification of the character(s) being used. The rules for regular (i.e. non-delimited) identifiers are:

  • First letter must be:



  • A letter as defined by the Unicode Standard 3.2.



  • underscore (_), at sign (@), or number sign (#)



  • Subsequent letters can be:



  • Letters as defined in the Unicode Standard 3.2.



  • Decimal numbers from either Basic Latin or other national scripts.



  • underscore (_), at sign (@), number sign (#), or dollar sign ($)



  • Embedded spaces or special characters are not allowed.



  • Supplementary characters are not allowed.



I bolded the only rules that matter in this context. The reason that the "First letter" rules are not relevant here is that the first letter in all local variables and parameters is always the "at sign" @.

And to be clear: what is considered a "letter" and what is considered a "decimal digit" is based upon the properties that each character is assigned in the Unicode Character Database. Unicode assigns many properties to each character, such as: is_uppercase, is_lowercase, is_digit, is_decimal, is_combining, etc, etc. This is not a matter of what we mortals would consider letters or decimal digits, but which characters have been assigned these properties. These properties are often used in Regular Expressions to match on "punctuation", etc. For example, \p{Lu} matches any upper-case letter (across all languages / scripts), and \p{IsDingbats} matches any "Dingbats" character.

So, in your attempt to do:

DECLARE @¯\_(ツ)_/¯ INT;


only the _ (underscore or "low line") and (Katakana Letter Tu U+30C4) characters fit into those rules. Now, all of the characters in ¯\_(ツ)_/¯ are fine for delimited identifiers, but unfortunately it seems that variable / parameter names and GOTO labels cannot be delimited (although cursor names can be).

So, for variable / parameter names, since they cannot be delimited, you are stuck with using only characters that qualify as being either "letters" or "decimal digits" as of Unicode 3.2 (well, according to the documentation; I need to test if classifications have been updated for newer versions of Unicode since classifications are handled differently than sort weights).

HOWEVER #1, things are not as straight-forward as they should be. I have now been able to complete my research and have found that the stated definition is not entirely correct. The precise (and verifiable) definition of which characters are valid for regular identifiers is:

-
First character:

  • Can be anything classified in Unicode 3.2 as "ID_Start" (which does include "Letters" but also "letterlike numeric characters")



  • Can be _ (low line / underscore) or _ (fullwidth low line)



  • Can be @, but only for variables / parameters



  • Can be #, but if schema-bound object, then only for Tables and Stored Procedures (in which case they indicate that the object is temporary)



-
Subsequent characters:

  • Can be anything classified in Unicode 3.2 as "ID_Continue" (which includes "decimal" numbers, but also "spacing and nonspacing combining marks", and "connecting punctuation marks")



  • Can be @, #, or $



  • Can be any of the 26 characters classified in Unicode 3.2 as format control characters



(fun fact: the "ID" in "ID_Start" and "ID_Continue" stands for "Identifier". Imagine that ;-)

According to "Unicode Utilities: UnicodeSet":

-
Valid starting characters

[:Age=3.2:] & [:ID_Start=Yes:]

-- Test one "Letter" from each of 10+ languages, as of Unicode 3.2
DECLARE @ᔠᑥᑒᏯשፙᇏᆇᄳᄈლဪඤaൌgೋӁウﺲﶨ   INT;
-- works

-- Test a Supplementary Character that is a "Letter" as of Unicode 3.2
DECLARE @ INT;-- Mathematical Script Capital W (U+1D4B2)
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '0xd835'.
*/


-
Valid continuation characters

[:Age=3.2:] & [:ID_Continue=Yes:]

-- Test various decimal numbers, but none are Supplementary Characters
DECLARE @६৮༦൯௫୫9 INT;
-- works (including some Hebrew and Arabic, which are right-to-left languages)

-- Test a Supplementary Character that is a "decimal" number as of Unicode 3.2
DECLARE @ INT; -- MATHEMATICAL DOUBLE-STRUCK DIGIT FOUR (U+1D7DC)
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '0xd835'.
*/
-- D835 is the first character in the surrogate pair D835 DFDC that makes up U+1D7DC


HOWEVER #2, not even searching the Unicode database can be that easy. Those two searches do produce a list of valid characters for those categorizations, and those characters are from Unicode 3.2, BUT the definitions of the various categorizations changes across versions of the Unicode Standard. Meaning, the definition of "ID_Start" in Unicode v 10.0 (what that search is using today, 2018-03-26) is not what it was in Unicode v 3.2. So, the online search cannot provide an exact list. But you can

Code Snippets

DECLARE @¯\_(ツ)_/¯ INT;
-- Test one "Letter" from each of 10+ languages, as of Unicode 3.2
DECLARE @ᔠᑥᑒᏯשፙᇏᆇᄳᄈლဪඤaൌgೋӁウﺲﶨ   INT;
-- works


-- Test a Supplementary Character that is a "Letter" as of Unicode 3.2
DECLARE @ INT;-- Mathematical Script Capital W (U+1D4B2)
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '0xd835'.
*/
-- Test various decimal numbers, but none are Supplementary Characters
DECLARE @६৮༦൯௫୫9 INT;
-- works (including some Hebrew and Arabic, which are right-to-left languages)


-- Test a Supplementary Character that is a "decimal" number as of Unicode 3.2
DECLARE @ INT; -- MATHEMATICAL DOUBLE-STRUCK DIGIT FOUR (U+1D7DC)
/*
Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near '0xd835'.
*/
-- D835 is the first character in the surrogate pair D835 DFDC that makes up U+1D7DC
SELECT UNICODE('ツ'); -- 12484

Context

StackExchange Database Administrators Q#201598, answer score: 44

Revisions (0)

No revisions yet.