patternsqlMinor
Postgresql text column character Limit
Viewed 0 times
postgresqlcolumnlimittextcharacter
Problem
Im not even close to a DBA, so plain English please. I recently needed to look into a performance issue on a relatively small PG database, but yet is performing very slow. the server is running PGSql 9.0.22. Its largest table is about 30Mb, and the entire db is hardly 80Mb. However, i found that one of the tables has a text column (which i assume has no width limit) which probably stretches about 10 miles long. I copied the text from one column into Word, an i got 377 pages and a word count of 69814 (as in sixty eight thousand). Again, im not im not expert, but it just doesnt seem right to me. Can PG handle such large text columns? Is this considered a normal design/practice?
EDIT
Thank you both for your response. I would have to research its purpose and types of queries running against it. For the sack of argument, lets assume the column has a list of fruits, and is just a point of reference. Meaning, there's a query which just returns a True or Flase if Avocado is on that list, and then moves on to another action based on the True/Flase results. So theres a text "Fruits" column in the "Foods" table, the text in the column might look like this (with just spaces between each fruit):
Apple Avacado Mango Orange Cherry Grape Peach Srawberry ... and on listing about 68,000 fruits
Would it have been beneficial, or perhaps considered a better design, if each fruit was in its own field. In other words, have 68000 fields under the "Fruits" column, as opposed to one text column with 68000 words?
EDIT
I didn't mean a column for each fruit. I meant a row for each fruit under the fruit column
And so on...a list of about 1.5 million
EDIT
In short, I mean having each piece of data in its own field as opposed to using a large text column. So for ex
EDIT
Thank you both for your response. I would have to research its purpose and types of queries running against it. For the sack of argument, lets assume the column has a list of fruits, and is just a point of reference. Meaning, there's a query which just returns a True or Flase if Avocado is on that list, and then moves on to another action based on the True/Flase results. So theres a text "Fruits" column in the "Foods" table, the text in the column might look like this (with just spaces between each fruit):
Apple Avacado Mango Orange Cherry Grape Peach Srawberry ... and on listing about 68,000 fruits
Would it have been beneficial, or perhaps considered a better design, if each fruit was in its own field. In other words, have 68000 fields under the "Fruits" column, as opposed to one text column with 68000 words?
EDIT
I didn't mean a column for each fruit. I meant a row for each fruit under the fruit column
**Columns - Fruits, Vegetables**
Rows -Apple -Corn
-Avocado -Tomatoes
-Mango
-Orange
-Cherry
-Grape
-Peach
-StrawberryAnd so on...a list of about 1.5 million
EDIT
In short, I mean having each piece of data in its own field as opposed to using a large text column. So for ex
Solution
You are right in thinking that items of a list should be stored as individual values of a column, i.e. in separate rows of that column, rather than as a single value (CSV string or anything like that) – at least if you expect to have queries against individual items of the list. Relational databases are designed to work that way, and storing multiple values of a column as a list (single value) is a no-no in such cases.
Regarding your data sample of fruits and vegetables, I am assuming that you are not thinking of splitting two lists of values exactly like that, storing each list's elements alongside each other on the same row without there actually being a relationship between them, just for the sake of normalisation of the way the vegetables and fruits are stored. That would be wrong.
I mean, if currently you have a row with a list of fruits and a list of vegetables such that each fruit and each vegetable are related to this particular row, i.e. like this:
and you want to expand each list into a row set while keeping the relationship of each fruit and each vegetable to the
Again, that would be wrong. Fruit
Commonly, when you want to store a relationship between a subset and an item, you use a separate table. As you have two subsets of different kinds that are related to the same item, you just use two tables. So it would be:
-
table
-
table
-
table
where
This way, when querying against individual items of either list as belonging to that particular row in
More can be said on this topic, which would, however, be beyond the scope of your question. I suggest you look up the terms I highlighted in bold for more information.
Regarding your data sample of fruits and vegetables, I am assuming that you are not thinking of splitting two lists of values exactly like that, storing each list's elements alongside each other on the same row without there actually being a relationship between them, just for the sake of normalisation of the way the vegetables and fruits are stored. That would be wrong.
I mean, if currently you have a row with a list of fruits and a list of vegetables such that each fruit and each vegetable are related to this particular row, i.e. like this:
FoodsID Fruits Vegetables other columns
------- -------------------------- ------------- ----- -------
100 Apple,Avocado,Mango,Orange Corn,Tomatoes ... ...and you want to expand each list into a row set while keeping the relationship of each fruit and each vegetable to the
Foods row, you do not just add more rows and put the first item of one list together with the first item of the other list on the same row, then the second item of each list on another row and so on, like this:FoodsID Fruits Vegetables ...
------- ------- ---------- -----
100 Apple Corn ...
100 Avocado Tomatoes ...
100 Mango ...
100 Orange ...Again, that would be wrong. Fruit
Apple may well be related to food item 100 as may vegetable Corn, but if these two items are not related to each other, there is no reason to store them on the same row.Commonly, when you want to store a relationship between a subset and an item, you use a separate table. As you have two subsets of different kinds that are related to the same item, you just use two tables. So it would be:
-
table
Foods:FoodsID other columns
------- ----- -------
100 ... ...-
table
Fruits:FoodsID Fruit
------- -------
100 Apple
100 Avocado
100 Mango
100 Orange-
table
Vegetables:FoodsID Vegetable
------- ---------
100 Corn
100 Tomatoeswhere
FoodsID in table Foods would be the primary key of that table and same-named column in tables Fruits and Vegetable would serve as a reference (foreign key) to the corresponding row in Foods.This way, when querying against individual items of either list as belonging to that particular row in
Foods, you would probably use a join. Your typical filter conditions would be as simple as Vegetable = 'some vegetable' or Fruit IN ('some fruit', 'some other fruit') – not FindPos(Vegetable, 'some vegetable') > 0 or anything of the kind. The conditions would be simpler because each column would now contain one value per row. That will lend well to further optimisation of performance, because when you query often against a column value (without applying a function to it first, that is), you can make such queries faster by adding an index on the column – something that would be pointless for lists stored as a single value.More can be said on this topic, which would, however, be beyond the scope of your question. I suggest you look up the terms I highlighted in bold for more information.
Code Snippets
FoodsID Fruits Vegetables other columns
------- -------------------------- ------------- ----- -------
100 Apple,Avocado,Mango,Orange Corn,Tomatoes ... ...FoodsID Fruits Vegetables ...
------- ------- ---------- -----
100 Apple Corn ...
100 Avocado Tomatoes ...
100 Mango ...
100 Orange ...FoodsID other columns
------- ----- -------
100 ... ...FoodsID Fruit
------- -------
100 Apple
100 Avocado
100 Mango
100 OrangeFoodsID Vegetable
------- ---------
100 Corn
100 TomatoesContext
StackExchange Database Administrators Q#124966, answer score: 3
Revisions (0)
No revisions yet.