Recent Entries 10
- pattern minor 112d agoCTE get all descendants with parentsSchema: `CREATE TABLE item ( id int primary key, parent_id int, FOREIGN KEY(parent_id) REFERENCES item(id) ); INSERT INTO item (id, parent_id) VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3), (7, 6) ; ` Query: `WITH RECURSIVE descendants(id, parent_id) AS ( SELECT 1, parent_id FROM item UNION ALL SELECT item.id, descendants.parent_id FROM item, descendants WHERE item.parent_id=descendants.id ) SELECT * FROM descendants; ` Fiddle: http://sqlfiddle.com/#!5/27c03/4 Goal: given some parents (eg 2, 3) get all of its descendants. I came up with the following CTE, but it returns far too many items. `SELECT * FROM descendants WHERE parent_id IN (2, 3);` Should give ``` id | parent_id 4 | 2 5 | 3 6 | 3 7 | 3 ``` The answer here is close, but my schema varies: https://dba.stackexchange.com/a/94944/253249
- pattern minor 112d agoRecursively find the path to all leaves descending from a given parent in a treeI'm trying to write a query to identify the path to each furthest descendant of a particular parent node in a table of tree structures like: ``` 0 1 | | 2 3 | | 4 5 / \ | *6* 8 *7* | *9* ``` There are many parents, all children have one parent, parents have 0-5 children (but the graph is very "long" – most parents only have one child). There are no cycles. I'm trying to efficiently identify the path to the furthest descendants of a specific node (and not to any intermediate nodes). E.g. in the above: - `get_leaf_paths(1)` would return 1 row: `{1, 3, 5, 7}` - `get_leaf_paths(2)` would return 2 rows: `{2, 4, 6}` and `{2, 4, 8, 9}` Sample table: ``` CREATE TABLE graph ( id bigint PRIMARY KEY, parent_id bigint, FOREIGN KEY (parent_id) REFERENCES graph(id) ); INSERT INTO graph (id, parent_id) VALUES (0, NULL), (1, NULL), (2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5), (8, 4), (9, 8); ``` I'm hoping for a result that looks something like: ``` SELECT get_leaf_paths.* FROM get_leaf_paths(0); path ----- {0, 2, 4, 6} {0, 2, 4, 8, 9} (2 rows) ``` In my initial attempt at a function with a recursive query, I've had trouble selecting only the furthest leaves, especially since some branches are shorter than others (e.g. `6` and `9` above are at different depths). The paths can be very deep (thousands or millions of elements), so I would also like to avoid the excessive memory usage of generating paths for every intermediate node. Any ideas would be greatly appreciated. Thanks!
- pattern minor 112d agoFind friends of friends (recursively) efficiently using PostgresqlObjective: Users submit their Contact Books, and then the application looks for connections between users, according to their Phone Number. Something like "6 Handshakes" idea (https://en.wikipedia.org/wiki/Six_degrees_of_separation). Problem: Make this query performance close to real time. When the User submits his phone number and gets the full list of other phones, he may know. Plain list - without connections (graph vertices etc), but full, not paginated (this requirement is here because original goal is more complex). Question: is it possible to achieve close-to-realtime performance with pure relational database, without Graph databases (Neo4j, etc), graph extensions (bitnine agensgraph) or workflow redesign? Any denormalization is possible, but to my understanding, it won't help. Given: ``` test=# select * from connections; user_phone | friend_phone ------------+-------------- 1 | 2 1 | 3 1 | 4 2 | 6 2 | 7 2 | 8 8 | 10 8 | 11 8 | 12 20 | 30 40 | 50 60 | 70 ``` I expect to receive the following connections for User with Phone === 1: ``` friend_phone -------------- 2 3 4 6 7 8 10 11 12 (9 rows) ``` It is really difficult to estimate real-world connections numbers. But I was testing at least with: - 10,000 Users (Phone Numbers) - Each user was randomly assigned with 50-1000 Connections with pseudo-random other Users - This resulted in about 1,000,000 Connections If it is impossible to achieve this in general (using some tricky ORDER BYs or subqueries etc) - what metrics should be considered for example to understand that: with 1,000,000 connections you need 128GB RAM instance to get 2 seconds response time an
- pattern minor 112d agoSQL Server CTE Bottom to Top Recursive with Where clauseI have an Employee Table with an EmployeeId, ManagerId and a Name field. The goal is to make a recursive With fetching all the rows from an employee to the top manager (ManagerId is null). I found this link which helped to get the base of the code but I do not manage to make it work for my case ``` DECLARE @EmployeeTable table ([EmployeeId] int, [name] varchar(10), [managerId] int) INSERT @EmployeeTable VALUES (1,'Jerome', NULL ) -- tree is as follows: INSERT @EmployeeTable VALUES (2,'Joe' ,1) -- 1-Jerome INSERT @EmployeeTable VALUES (3,'Paul' ,2) -- / \ INSERT @EmployeeTable VALUES (4,'Jack' ,3) -- 2-Joe 9-Bill INSERT @EmployeeTable VALUES (5,'Daniel',3) -- / \ \ INSERT @EmployeeTable VALUES (6,'David' ,2) -- 3-Paul 6-David 10-Sam INSERT @EmployeeTable VALUES (7,'Ian' ,6) -- / \ / \ INSERT @EmployeeTable VALUES (8,'Helen' ,6) -- 4-Jack 5-Daniel 7-Ian 8-Helen INSERT @EmployeeTable VALUES (9,'Bill ' ,1) -- INSERT @EmployeeTable VALUES (10,'Sam' ,9) -- DECLARE @employeeId int = 3 ;WITH StaffTree AS ( SELECT c.[EmployeeId], c.[name], c.managerId, 0 AS [Level] FROM @EmployeeTable c LEFT OUTER JOIN @EmployeeTable cc ON c.managerId=cc.EmployeeId WHERE c.EmployeeId=@employeeId OR (@employeeId IS NULL AND c.managerId IS NULL) UNION ALL SELECT s.[EmployeeId], s.[name], s.managerId, t.[Level]+1 FROM StaffTree t INNER JOIN @EmployeeTable s ON t.[EmployeeId]=s.managerId WHERE s.managerId=@employeeId OR @employeeId IS NULL OR t.Level>1 ) SELECT * FROM StaffTree ``` In case you select the employee 3 hierarchy, the result should be: ``` EmployeeId | Name | ManagerId 1 | Jerome | NULL 2 | Joe | 1 3 | Paul | 2 ```
- snippet minor 112d agoHow can I pass on a value in the hierarchy based on the value of the parent using an hierarchial query?I have a table with categories and another table with a discount for categories per user: ``` create table category ( id NUMBER NOT NULL, parent_id NUMBER, name nvarchar2(255), PRIMARY KEY( id) ); create table category_discount ( user_id NUMBER NOT NULL, category_id NUMBER NOT NULL, discount NUMBER NOT NULL, PRIMARY KEY( user_id, category_id) ); insert into category( id, parent_id, name ) VALUES( 1, null, 'root'); insert into category( id, parent_id, name ) VALUES( 2, 1, 'C1'); insert into category( id, parent_id, name ) VALUES( 11, 7, 'C11'); insert into category( id, parent_id, name ) VALUES( 12, 7, 'C12'); insert into category( id, parent_id, name ) VALUES( 3, 1, 'C3'); insert into category( id, parent_id, name ) VALUES( 4, 1, 'C4'); insert into category( id, parent_id, name ) VALUES( 5, 4, 'C5'); insert into category( id, parent_id, name ) VALUES( 6, 4, 'C6'); insert into category( id, parent_id, name ) VALUES( 9, 6, 'C9'); insert into category( id, parent_id, name ) VALUES( 10, 6, 'C10'); insert into category( id, parent_id, name ) VALUES( 7, 1, 'C7'); insert into category_discount( user_id, category_id, discount ) VALUES( 1, 1, 30); insert into category_discount( user_id, category_id, discount ) VALUES( 1, 4, 20); insert into category_discount( user_id, category_id, discount ) VALUES( 1, 7, 25); ``` Now when I do this query: ``` SELECT category.*, category_discount.user_id, category_discount.discount, LEVEL FROM category LEFT JOIN category_discount ON category.id = category_discount.category_id AND category_discount.user_id = 1 START WITH parent_id is null CONNECT BY PRIOR id = parent_id; ``` ...I get this output which is fine: ``` | ID | PARENT_ID | NAME | USER_ID | DISCOUNT | LEVEL | |----|-----------|------|---------|----------|-------| | 1 | (null) | root | 1 | 30 | 1 | | 2 | 1 | C1 | (null) | (null) | 2 | | 3 | 1 | C3 | (null) | (null)
- snippet minor 112d agoHow to do hierarchical queries without CTE tables or recusive queries in Snowflake? (parent, child relationships)I have several tables in Postgres that we are migrating to Snowflake. Some of these table are hierarchical (a foreign key point to the same table), normally to query this data I use CTE tables and/or recursive SQL statements. As far as I can tell Snowflake does not support these operations. Are there any other queries that can be made to mimic this or work around this? "Currently, Snowflake does not fully support common table expressions (CTE) in DDL operations." Snowflake documentation: https://docs.snowflake.net/manuals/sql-reference/constructs/with.html see: https://www.postgresql.org/docs/current/static/queries-with.html
- principle minor 112d agoParent/Child Relationship Table Design - What's The Best Practice?I have a single table for storing 'Tasks'. A task can be a parent and/or a child. I use the 'ParentID' as the FK referencing the PK on the same table. It is NULLABLE, so if it is NULL it does not have a parent task. Example is the screenshot below... It has been argued in my team, that it would be much better (for normalization/best practices) to create a separate table to store the ParentIDs and so avoid having NULLS in the table and lead to better normalization design. Would this be a better option? Or will it be more difficult with querying and cause performance issues? We just want to get the design right from the beginning rather than finding issues later. SQL-DDL code for the existing table: ``` CREATE TABLE [Tasks].[TaskDetail] ( [TaskDetailID] [int] IDENTITY(1,1) NOT NULL, [TaskName] [varchar](50) NOT NULL, [TaskDescription] [varchar](250) NULL, [IsActive] [bit] NOT NULL CONSTRAINT [DF_TaskDetail_IsActive] DEFAULT ((1)), [ParentID] [int] NULL, CONSTRAINT [PK_TaskDetail_TaskDetailID] PRIMARY KEY CLUSTERED ([TaskDetailID] ASC), CONSTRAINT [FK_TaskDetail_ParentID] FOREIGN KEY([ParentID]) REFERENCES [Tasks].[TaskDetail]([TaskDetailID]) ); ```
- pattern minor 112d agoNullable, dependent attributesI'm designing a database for students. Computer science students here in Switzerland can decide between different specializations, e.g. application development, system engineering etc. There are students from other professions in this application as well, which have no such specialization. A few examples: ``` +--------------------+-------------------------+ | Profession | Specialization | +--------------------+-------------------------+ | Computer scientist | | | | Application development | | | System engineering | | | Support | | | | | Electrician | | | | none | | | | | Janitor | | | | none | | | | | Architect | | | | Small buildings | | | High buildings | | | | +--------------------+-------------------------+ ``` I hope you get the idea. My question now is, how do I design the database tables with these attributes, since they're dependent of each other? Every user has a profession, some do not have a specialization depending on their profession. So an electrician should not be an application developer, nor should an architect. My thoughts so far 1 ``` +-----------------------------+ | User | +-----------------------------+ | #id | | profession_id | | specialisazion_id, nullable | +-----------------------------+ ``` Enforce logic through constraint checks Approach 2 ``` +-------------------+ +----------------+ +------------+ | User | | Specialization | | Pr
- principle minor 112d agoCondition in WHERE vs condition in CONNECT BYCan anyone explain to me the difference between the following two queries? Even though they seem the same the results are different. ``` select concept.concept_id, concept.PARENT_ID from ebti_thes_concept_v concept start with PARENT_ID = '11025' connect by parent_id = prior concept_id and exists (..) ``` In the second one, the `exists` predicate is moved from the `connect by` clause to the `where` clause. ``` select concept.concept_id, concept.PARENT_ID from ebti_thes_concept_v concept where exists (..) start with PARENT_ID = '11025' connect by parent_id = prior concept_id; ```
- snippet minor 112d agoHow to store hierarchical dimension for timeseries dataI have a requirement to store data about a load of Searches. Each time someone searches we want to record it for analyzing later on. As part of the search the user can select one or more Categories to search within. We would like to record the categories selected and the hierarchy that the category was in. However, the categories can change and move over time. We are likely to want to analyze both by the parent categories that the child categories were in at the time of the search, and also by the parents that the categories are in at the time of analysis. - What is the best way of storing this kind of data? Currently I have a stream of JSON API calls (we need to support 50 a second) which look like this: ``` { "timestamp":"2018-03-08T11:22:33.456+0000", "searchString":"This is a search", "categories":[ "Cat D/Cat B/Cat A", "Cat E/Cat B/Cat A" ] } ``` Categories D and E were selected, Category B is the parent of D & E and Category A (one of several roots) is the parent of B. We are currently just storing the time stamp and search string in a flat table in a normal OLTP style SQL Server DB: ``` CREATE TABLE SearchEvent ( id int PRIMARY KEY, searchString nvarchar(100), timestamp datetimeoffset ) ```