HiveBrain — Knowledge Dashboard

pattern minor by @import:stackexchange-dba 112d ago

Schema: `CREATE TABLE item ( id int primary key, parent_id int, FOREIGN KEY(parent_id) REFERENCES item(id) ); INSERT INTO item (id, parent_id) VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3), (7, 6) ; ` Query: `WITH RECURSIVE descendants(id, parent_id) AS ( SELECT 1, parent_id FROM item UNION ALL SELECT item.id, descendants.parent_id FROM item, descendants WHERE item.parent_id=descendants.id ) SELECT * FROM descendants; ` Fiddle: http://sqlfiddle.com/#!5/27c03/4 Goal: given some parents (eg 2, 3) get all of its descendants. I came up with the following CTE, but it returns far too many items. `SELECT * FROM descendants WHERE parent_id IN (2, 3);` Should give ``` id | parent_id 4 | 2 5 | 3 6 | 3 7 | 3 ``` The answer here is close, but my schema varies: https://dba.stackexchange.com/a/94944/253249

hierarchysqlsqlitectestackoverflow

pattern minor by @import:stackexchange-dba 112d ago

Recursively find the path to all leaves descending from a given parent in a tree

I'm trying to write a query to identify the path to each furthest descendant of a particular parent node in a table of tree structures like: ``` 0 1 | | 2 3 | | 4 5 / \ | *6* 8 *7* | *9* ``` There are many parents, all children have one parent, parents have 0-5 children (but the graph is very "long" – most parents only have one child). There are no cycles. I'm trying to efficiently identify the path to the furthest descendants of a specific node (and not to any intermediate nodes). E.g. in the above: - `get_leaf_paths(1)` would return 1 row: `{1, 3, 5, 7}` - `get_leaf_paths(2)` would return 2 rows: `{2, 4, 6}` and `{2, 4, 8, 9}` Sample table: ``` CREATE TABLE graph ( id bigint PRIMARY KEY, parent_id bigint, FOREIGN KEY (parent_id) REFERENCES graph(id) ); INSERT INTO graph (id, parent_id) VALUES (0, NULL), (1, NULL), (2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5), (8, 4), (9, 8); ``` I'm hoping for a result that looks something like: ``` SELECT get_leaf_paths.* FROM get_leaf_paths(0); path ----- {0, 2, 4, 6} {0, 2, 4, 8, 9} (2 rows) ``` In my initial attempt at a function with a recursive query, I've had trouble selecting only the furthest leaves, especially since some branches are shorter than others (e.g. `6` and `9` above are at different depths). The paths can be very deep (thousands or millions of elements), so I would also like to avoid the excessive memory usage of generating paths for every intermediate node. Any ideas would be greatly appreciated. Thanks!

postgresqlgraphhierarchysqlstackoverflow

pattern minor by @import:stackexchange-dba 112d ago

Find friends of friends (recursively) efficiently using Postgresql

Objective: Users submit their Contact Books, and then the application looks for connections between users, according to their Phone Number. Something like "6 Handshakes" idea (https://en.wikipedia.org/wiki/Six_degrees_of_separation). Problem: Make this query performance close to real time. When the User submits his phone number and gets the full list of other phones, he may know. Plain list - without connections (graph vertices etc), but full, not paginated (this requirement is here because original goal is more complex). Question: is it possible to achieve close-to-realtime performance with pure relational database, without Graph databases (Neo4j, etc), graph extensions (bitnine agensgraph) or workflow redesign? Any denormalization is possible, but to my understanding, it won't help. Given: ``` test=# select * from connections; user_phone | friend_phone ------------+-------------- 1 | 2 1 | 3 1 | 4 2 | 6 2 | 7 2 | 8 8 | 10 8 | 11 8 | 12 20 | 30 40 | 50 60 | 70 ``` I expect to receive the following connections for User with Phone === 1: ``` friend_phone -------------- 2 3 4 6 7 8 10 11 12 (9 rows) ``` It is really difficult to estimate real-world connections numbers. But I was testing at least with: - 10,000 Users (Phone Numbers) - Each user was randomly assigned with 50-1000 Connections with pseudo-random other Users - This resulted in about 1,000,000 Connections If it is impossible to achieve this in general (using some tricky ORDER BYs or subqueries etc) - what metrics should be considered for example to understand that: with 1,000,000 connections you need 128GB RAM instance to get 2 seconds response time an

postgresqlgraphhierarchysqlstackoverflow

pattern minor by @import:stackexchange-dba 112d ago

SQL Server CTE Bottom to Top Recursive with Where clause

hierarchysqlctesql-serverstackoverflow

snippet minor by @import:stackexchange-dba 112d ago

How can I pass on a value in the hierarchy based on the value of the parent using an hierarchial query?

hierarchyoracle-12cstackoverfloworacle-11g-r2dba

snippet minor by @import:stackexchange-dba 112d ago

How to do hierarchical queries without CTE tables or recusive queries in Snowflake? (parent, child relationships)

I have several tables in Postgres that we are migrating to Snowflake. Some of these table are hierarchical (a foreign key point to the same table), normally to query this data I use CTE tables and/or recursive SQL statements. As far as I can tell Snowflake does not support these operations. Are there any other queries that can be made to mimic this or work around this? "Currently, Snowflake does not fully support common table expressions (CTE) in DDL operations." Snowflake documentation: https://docs.snowflake.net/manuals/sql-reference/constructs/with.html see: https://www.postgresql.org/docs/current/static/queries-with.html

snowflakehierarchyqueryctestackoverflow

principle minor by @import:stackexchange-dba 112d ago

Parent/Child Relationship Table Design - What's The Best Practice?

I have a single table for storing 'Tasks'. A task can be a parent and/or a child. I use the 'ParentID' as the FK referencing the PK on the same table. It is NULLABLE, so if it is NULL it does not have a parent task. Example is the screenshot below... It has been argued in my team, that it would be much better (for normalization/best practices) to create a separate table to store the ParentIDs and so avoid having NULLS in the table and lead to better normalization design. Would this be a better option? Or will it be more difficult with querying and cause performance issues? We just want to get the design right from the beginning rather than finding issues later. SQL-DDL code for the existing table: ``` CREATE TABLE [Tasks].[TaskDetail] ( [TaskDetailID] [int] IDENTITY(1,1) NOT NULL, [TaskName] [varchar](50) NOT NULL, [TaskDescription] [varchar](250) NULL, [IsActive] [bit] NOT NULL CONSTRAINT [DF_TaskDetail_IsActive] DEFAULT ((1)), [ParentID] [int] NULL, CONSTRAINT [PK_TaskDetail_TaskDetailID] PRIMARY KEY CLUSTERED ([TaskDetailID] ASC), CONSTRAINT [FK_TaskDetail_ParentID] FOREIGN KEY([ParentID]) REFERENCES [Tasks].[TaskDetail]([TaskDetailID]) ); ```

hierarchysqlsql-serverstackoverflowdba

pattern minor by @import:stackexchange-dba 112d ago

Nullable, dependent attributes

hierarchystackoverflowdbadatabase-designdependencies

principle minor by @import:stackexchange-dba 112d ago

Condition in WHERE vs condition in CONNECT BY

Can anyone explain to me the difference between the following two queries? Even though they seem the same the results are different. ``` select concept.concept_id, concept.PARENT_ID from ebti_thes_concept_v concept start with PARENT_ID = '11025' connect by parent_id = prior concept_id and exists (..) ``` In the second one, the `exists` predicate is moved from the `connect by` clause to the `where` clause. ``` select concept.concept_id, concept.PARENT_ID from ebti_thes_concept_v concept where exists (..) start with PARENT_ID = '11025' connect by parent_id = prior concept_id; ```

hierarchywherestackoverflowdbaoracle

snippet minor by @import:stackexchange-dba 112d ago

How to store hierarchical dimension for timeseries data

I have a requirement to store data about a load of Searches. Each time someone searches we want to record it for analyzing later on. As part of the search the user can select one or more Categories to search within. We would like to record the categories selected and the hierarchy that the category was in. However, the categories can change and move over time. We are likely to want to analyze both by the parent categories that the child categories were in at the time of the search, and also by the parents that the categories are in at the time of analysis. - What is the best way of storing this kind of data? Currently I have a stream of JSON API calls (we need to support 50 a second) which look like this: ``` { "timestamp":"2018-03-08T11:22:33.456+0000", "searchString":"This is a search", "categories":[ "Cat D/Cat B/Cat A", "Cat E/Cat B/Cat A" ] } ``` Categories D and E were selected, Category B is the parent of D & E and Category A (one of several roots) is the parent of B. We are currently just storing the time stamp and search string in a flat table in a normal OLTP style SQL Server DB: ``` CREATE TABLE SearchEvent ( id int PRIMARY KEY, searchString nvarchar(100), timestamp datetimeoffset ) ```

hierarchysqlsql-serverslowly-changing-dimensionstackoverflow