Recent Entries 10
- pattern minor 112d agoPostgreSQL - Index efficiently on REGEX_REPLACE()I have a query which is designed to loop and search addresses for duplicates, the query uses REGEX_REPLACE. I am trying to index on the regex as on doing an explain and its doing a sequential scan on the user_property table with a filter on the regex ``` EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) with user_detail AS ( SELECT user_id, max(user_property_value) FILTER (WHERE user_property_type_id = 6 ) AS FIRST_NAME, max(user_property_value) FILTER (WHERE user_property_type_id = 7 ) AS LAST_NAME, max(TO_DATE(user_property_value, 'YYYY-MM-DD')) FILTER (WHERE user_property_type_id = 8 ) AS DOB, max(user_property_value) FILTER (WHERE user_property_type_id = 33 ) AS BIRTH_NUMBER FROM PUBLIC.user_property cp JOIN PUBLIC.user c using (user_id) WHERE c.user_group_id= '38' AND cp.user_property_is_active GROUP BY user_id ), duplicate as ( SELECT COALESCE(MAX( CASE WHEN REGEXP_REPLACE((address_line1), E'\\_|\\W','','g') = 'Flat 25 Arliss Court 24' AND ( COALESCE(REGEXP_REPLACE((address_line2), E'\\_|\\W','','g'), '') = '' OR REGEXP_REPLACE((address_line2), E'\\_|\\W','','g') = 'Calderon Road' ) AND REGEXP_REPLACE((address_place), E'\\_|\\W','','g') = 'Dartford' AND address_country_code = 'GB' THEN 1 ELSE 0 END), 0) AS dup_name_address, COALESCE(MAX(CASE WHEN REGEXP_REPLACE(UPPER(address_postcode), E'\\_|\\W','','g') = 'WD17 1JY' THEN 1 ELSE 0 END), 0) AS dup_name_postcode FROM user_detail cd LEFT JOIN PUBLIC.address ad ON cd.user_id = ad.user_id WHERE ( (REGEXP_REPLACE(UPPER(cd.FIRST_NAME), E'\\_|\\W', '', 'g') = 'Clyde' AND REGEXP_REPLACE(UPPER(cd.LAST_NAME), E'\\_|\\W', '', 'g') = 'Len') OR (REGEXP_REPLACE(UPPER(cd.LAST_NAME), E'\
- snippet moderate 112d agoHow can you replace the last occurrence of a character in a string (text column)?Let's say I have a text field with this value in Postgres: ``` 'bar$foo$john$doe$xxx' ``` I'd like to replace just the last occurrence of the dollar (`$`) character with another character, for example '-'. After the replacement, the contents of the field should be: ``` 'bar$foo$john$doe-xxx' ```
- pattern minor 112d agoAlternate Solution For REPLACE() In ntext ColumnOverview I have a picture database that catalogue information of pictures taken with cameras. The database contains a table `pictures` which has 256 columns which contain information about the picture that has been taken. One column `Comments` is formatted as `ntext` and contains CR/LFs. There are another 21 columns which have been configured as `ntext`. I extract the data from the database into a flat file using the `Tasks | Export Data...` function found in SSMS. The exported data is then transferred by an external partner to a new system which will be used in the near future. The export file (CSV) contains some 256 columns with 21 columns that could possibly contain CR/LFs. Problem The `Comments` field/column contains numerous CR/LFs (SQL: `CHAR(13)`, `CHAR(10)`) which is impacting the analysis of the data. I tried using the `REPLACE(expression, value, value)` to search for the CR/LF and replace it with `@@` and was thinking of implementing this during the export using `Export Data` in SSMS. However, the `REPLACE()` function returns a Msg 8116, Level 16, State 1, Line 4 Argument data type ntext is invalid for argument 1 of replace function. ...when I execute something like: ``` SELECT 'Start *******************', REPLACE(Comment,' ','@@'), ID, '********************End' FROM dbo.pictures WHERE Comment LIKE '% %'; ``` Sample Data From Column Comment Issuing the following statement: ``` SELECT Comment FROM dbo.Pictures WHERE Comment like '% %'; ``` ...will retrieve the following sample `Comment` record: ``` Zwei Fotos von Arenenberg auf einer Seite einer englischen Zeitschrift. Seite 148 der Zeitung "The Graphic" vom 4. August 1906 = News from Abroad. "The last stage of all": the retreat for aged actors opened last week near Meaux, in France 1. General view of the home 2. M. Coquelin reciting in the open-air theater The château of Arenenberg which has been presented by the Empress Eugénie to the canton of Thurgovie 3. View from the chateau [Arenenb
- snippet minor 112d agoLarge hexadecimal PRIMARY KEY - how to index substrings of the PK field?I have a requirement for a very large (in terms of the number of rows, not data) database, and need to use a very long, unique, hexadecimal `PRIMARY KEY` instead of the usual `AUTO_INCREMENT` integer. As an example, lets say that the `PK` is similar to an IPv6 address, say something like `2001:0db8:0000:0000:0000:8a2e:0370:7334`. So, apart from the usual searching by the `PK`, there will be a need to search by subfields of it - i.e. find all keys, that begin with "2001" (covered by the `PK`- should be `sargeable`), or (say) has `0db8` in their second set of four hex characters, and then to reference this 2nd key from other tables. What is the best way to store such a primary key (assume that I am using MariaDB or MySQL)? My thoughts are to do one of the following: - one big column, - have 8 columns and make a composite key, - or something else. From a general database point of view, what would be the best technical solution to server my requirements? Apart from the purely technical aspects of the question above, I would also be interested in any recommendations that users here might have as to the best database engine to use in this case? I will only have a couple more tables, that will have `FOREIGN KEY` references to the long hex `PRIMARY KEY`. I can do without the `DRI` (Declarative Referential Integrity) if there are major performance benefits
- pattern minor 112d agoIs using CLR for regular expressions safer than using external scripts?Problem The main problem we need to use regular expression on MS SQL Server 2019, with the capability of at least the level on the POSIX Regular expression. Possible solutions This Q/A from stackoverflow rightly concludes that if you query must rely on regular expressions you shuould use CLR. This Readgate article elaborates this approach more. So one of my colleagues and I proposed this solution, but my other colleague categorically stated that using CLR here would be a huge risk to security and stability, and using external script (Python or R) is more secure. This seems to be dubious claim, since the user code in the CLR can be managed, so perhaps the opposite is true, but I was not able to persuade my colleague. In my other question which I wrote in my desperation because I was forced to use external script and still produce a blazing fast query. SQLpro user states in his comment that: Using Python or R can be worst in terms of security rather than using CLR! Which I tend to believe. Questions So I have two questions: - Which Regexp solution is more secure external script or CLR based (as described here)? And why? - I also proposed to run the python code on the same Windows Server (must be the same server, because of a policy) but with python intrepeter installed on the OS. Because the results are exported into CSV files either way and stored in the SQL Server. So then I would able to use Python's multiprocessing module to achieve the right performance. The answer was the same that running Python inside SQL Server is more secure than in a outside application. Which is also a questionable claim.
- snippet major 112d agoHow can I enforce that values in a table column match the Regular Expression "[axyto0-9\s]{0,2}[\s0-9]{0,10}"?I have a table as below ``` CREATE TABLE dbo.DemoTable ( Value VARCHAR(12) ) ``` I would like to constrain it to only contain rows where `Value` matches the following pattern `[axyto0-9\s]{0,2}[\s0-9]{0,10}` - Start of string - A single character present in the list below between zero and 2 times, as many times as possible - A single character from the list “axyto” (case sensitive) - A character in the range between “0” and “9” - A “whitespace character” - Match a single character present in the list below between zero and 10 times, as many times as possible - A “whitespace character” - A character in the range between “0” and “9” - End of string How can I do this in SQL Server? (inspired by this question on SO)
- pattern moderate 112d agoIs there an operator or an easy way to match an expression one or more times with the LIKE operator in SQL?For example if I run the following query, the expression in the WHERE clause will only match on the first letter before a number but is there a way to match on one or more letters first (e.g. in the case where there's a variable number of letters before the number in the field I'm filtering on)? ``` SELECT FilteredField FROM Table WHERE FilteredField LIKE '[a-zA-Z][0-9]' ``` Example data: - ABC9 - DEF2 - GH7 - Z1 - XYH2 Essentially I'm looking for the SQL Server 2016 equivalent of the RegEx + when using the LIKE operator.
- debug minor 112d agoPostgres function index doesn't work correctly with regular expressionsI have this Postgres function. It forces text to be lowercase and all nulls to become empty strings, so that I more cleanly perform searches for things that don't match something, etc: ``` CREATE OR REPLACE FUNCTION magic_text(txt text) RETURNS text IMMUTABLE PARALLEL SAFE LANGUAGE SQL AS $$ SELECT lower(coalesce(txt,'')) $$; ``` This function is heavily used in many different query types, so I created all of the varieties of text index for it: ``` CREATE INDEX index_magic_cards_on_oracle_text_magic ON magic_cards USING BTREE (magic_text(oracle_text)) WITH (fillfactor = 100); CREATE INDEX index_magic_cards_on_oracle_text_magic_text_pattern ON magic_cards USING BTREE (magic_text(oracle_text) text_pattern_ops) WITH (fillfactor = 100); CREATE INDEX index_magic_cards_on_oracle_text_magic_gist_trgm ON magic_cards USING GIST (magic_text(oracle_text) gist_trgm_ops); CREATE INDEX index_magic_cards_on_oracle_text_magic_gin_trgm ON magic_cards USING GIN (magic_text(oracle_text) gin_trgm_ops) ``` These indexes are interfering somehow with some (but not all) kinds of complex regular expression searches. I haven't been able to determine if specific regex symbols or features cause the problem. Here's an example (explain.depesz): ``` SELECT card_name FROM magic_cards WHERE magic_text(oracle_text) ~ '***:(?n)eldrazi\ (?!scion)'; ``` This returns nothing, and according to the query planner, it performs a bitmap index scan on `index_magic_cards_on_oracle_text_magic_gin_trgm`. This expanded query also scans the same index, and fails to find anything (explain.depesz): ``` SELECT card_name FROM magic_cards WHERE lower(coalesce(oracle_text, '')) ~ '***:(?n)eldrazi\ (?!scion)'; ``` However, if I force Postgres to not use my index, this query has results! (explain.depesz) The following has multiple rows of results and performs a sequential scan on the table. All I did below was change the coalesce fallback to `⌘`, which shouldn't affect the results: ``` SELECT
- pattern major 112d agoQuerying non-ASCII rows from PostgresDoes `[:ascii:]` class work in Postgres at all? It is not listed in their help, however I see examples in the web which utilize it. I have a UTF-8 database, where collation and c_type are `en_US.UTF-8`, and Postgres version is 9.6.2. When I search for non-ASCII rows like this: ``` select title from wallabag_entry where title ~ '[^[:ascii:]]'; ``` I get both Unicode and non-Unicode symbols (full output is here): ``` Сталинская правозащитница: мать Меленкова бабушка Настя Дневник НКВДиста Шабалина: Знает ли Москва положение на фронте? Бег по городу и поездка на осле: как в средневековье наказывали прелюбодеев Как комиссар Крекшин в 1740-е чуть не отменил историю России Have you heard of Saint Death? Don’t pray to her. Архаїчна українська мова: перевага чи недолік? Гренада не их China’s marriage rate is plummeting because women are choosing autonomy over ``` What is wrong with this query?
- pattern minor 112d agoSQLite, ASCII A-Z Check ConstraintI have the following table which I'm trying to limit the column "prefix" to ASCII alphabetical characters. However, I am still able to insert other characters after using the following constraint. Why is it not working? ``` CREATE TABLE test ( id INTEGER NOT NULL PRIMARY KEY, prefix TEXT NOT NULL, CHECK(prefix NOT LIKE '%[^a-zA-Z]%') ) ``` Using Python's sqlite3 package and DB Browser for SQLite