HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

How should I model a database structure to store information about words?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
howwordsstoredatabasestructureaboutshouldmodelinformation

Problem

I have an application where the user can click on any word in a text, which will bring back related information about the base word and the specific inflection (word form) of the word clicked. I've tried representing this in a multitude of ways, but now that I'm close to launch, I'm worried as hell about the best way to represent it (this is my first ever project with programming, so I have no idea what I'm in for).
Considerations

The way I had originally set this up was that there was a base word table that contained the base word, part of speech, and then an array of word forms for each base word.

I then learned that this was a bad idea in terms of normalization, and I also realized that I needed to store specific inflection information for the words (mood, tense, person, etc.), so I rearranged it.

In the second iteration, I had three tables, a base_word table that had the base_word and the part of speech, a word table that had the word and its inflection information, and then a “join” table between them that contained the base_word_id and the word_id.

However, I've been re-thinking this setup, because semantically speaking, a base word is also a word, just a specific type of word. So in this case, I feel like a “self-join” table might be most appropriate, although I could be completely overthinking this. In the “self-join” table, there would be an is_baseword boolean attribute, which would theoretically allow all the inflection columns to store NULL marks if it was set to true. Is this a better idea?

Note: I found these two similar questions which didn't seem to completely answer my questions, so I'm posting my twist on these here:

-
Table design for a dictionary that can have words with many different spellings

-
How can I represent a list of words and related words in a relational structure?

Solution

I believe your design for a one to many join with base_word in one table and word forms in another is fine. The reasons I make this judgement are:

  • The base_word, as an entity, is sufficiently different to word_forms as to represent a different entity. For instance, if you were to include the words origin language, this would also be only applicable to the base word entity, not its word forms.



  • Storing the data in the same table will have implications on the size of the tables due to having sparse columns with attributes that are associated with base_word OR word_form only, meaning they are NULL or empty for the others. You point this out with the is_baseword and inflection columns. Having a bunch of NULL values on a row which will never be changed is usually a reason to use the EAV model or split into different entities.



  • Performance considerations for queries should be taken into account. With the additional size consideration a self join to a sparsely populated table will likely not be as performant. You will probably find yourself indexing a single table for base_words OR word_forms separately. The indexes will be larger and more difficult to maintain for inserts/deletes etc. Better to have indexes on smaller tables and to have as few as possible on a table.



Don't get too concerned with perfect normalization. While a perfectly normalized system may adhere to strict information science principles, it is often better in the real world to have a slightly denormalized database for other reasons.

Context

StackExchange Database Administrators Q#162364, answer score: 2

Revisions (0)

No revisions yet.