
Correct term for “string consisting of words”

Submitted by: @import:stackexchange-cs··

Problem

In a paper I am writing, I want to distinguish between (1) a string consisting of arbitrary characters and (2) a string consisting of a chain of words from a known language, possibly with delimiters. My intuitive idea is to simply use string for meaning (1) and text for meaning (2). It sounds a bit naive, but this terminology could work, provided I define it properly in my paper.

Yet I have an uneasy feeling that meaning (2) has a fancier name in computer science or computational linguistics. So, what are the precise terms for distinguishing between the two types of strings?

UPDATE

Suppose we have an alphabet Σ = {a, b, c, ~}, where ~ is a delimiter symbol, and a language L = {aaa, bbb, abc}.

Now, the following strings satisfy definition (1), but not (2):

  • cba
  • a
  • aaaa
  • a~b~~

And the following strings would satisfy both definitions (because they are made up of words of the language L, possibly separated by delimiters):

  • aaabbbabc
  • abc
  • aaa~bbb~aaa~~~aaa
  • ~
  • (an empty string)
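
The two lists above suggest one possible reading of definition (2): a string qualifies if it can be split completely into pieces that are each either a word of L or the delimiter ~. Below is a minimal sketch of a membership test under that assumption. The function name `is_word_string` and its parameters are hypothetical and chosen only for illustration; the original post does not give an algorithm.

```python
def is_word_string(s, words, delimiter="~"):
    """Return True if s splits entirely into words from `words` and delimiters.

    Assumption: definition (2) means s is in (L ∪ {delimiter})*,
    which matches every example listed in the question.
    """
    tokens = set(words) | {delimiter}
    # reachable[i] is True when the prefix s[:i] can be fully tokenized.
    reachable = [False] * (len(s) + 1)
    reachable[0] = True  # the empty prefix is trivially tokenized
    for end in range(1, len(s) + 1):
        for tok in tokens:
            start = end - len(tok)
            if start >= 0 and reachable[start] and s[start:end] == tok:
                reachable[end] = True
                break
    return reachable[len(s)]


L = {"aaa", "bbb", "abc"}

# Strings from the first list: plain strings over Σ, not word strings.
only_strings = ["cba", "a", "aaaa", "a~b~~"]
# Strings from the second list: satisfy both definitions.
word_strings = ["aaabbbabc", "abc", "aaa~bbb~aaa~~~aaa", "~", ""]

for s in only_strings + word_strings:
    print(repr(s), is_word_string(s, L))
```

Under this reading, the check reports False for each string in the first list and True for each string in the second, including the lone delimiter and the empty string. If delimiters were required between words, the test would need to change accordingly.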



In some applications my strings could be actual text in a human language like English, Lithuanian or Esperanto. But this is not required. It could also be a DNA chain, a binary file, or anything else. Also keep in mind that in practical applications the strings would most likely be long (like a journal article, or an entire corpus for that matter), so calling one a "sentence" would be a bit of an understatement. The meaning of the text is entirely irrelevant here.

So, regarding definition (1), all is clear: I just call it a string over the alphabet Σ. Now the core question is this: what do I call the strings from the second example to distinguish them from those in the first? My initial idea is to call such a string a "text". One of the answers proposed "word string", which I like even better. Maybe you have seen other terms used for this purpose in the literature?

It might seem that I'm in extreme hair-splitting mode here. Yet that term will be all over my PhD thesis, very likely including the title. Therefore I rea

Solution

If definition (1) is intended for any sequence of characters, I would
simply call it a string as you suggest, but I would call it a word or
lexeme if it is intended to denote a word of a language.

Regarding definition (2), it depends again on what you are expecting
to consider. If it is any sequence of words, usually meaningless, with
a variety of separators, the name text would do fine, and I would
not worry too much about computational linguistics, since the only
meaningless piece of text that matters in CL is "Colorless green ideas
sleep furiously".

If it is actually intended to be a sentence of a language, then you
might call it a sentence. I feel that text would rather be used for
larger pieces of discourse. You should be careful, though, that it is
not confused with sentence meaning a string of words. Speech
processing people may speak of an utterance, but that may be
inappropriate for your use. They also use sentence, which they
structure into a word lattice when the separators are not clearly
identified, and which amounts to a word sequence or word string if they
are clearly identified.

This distinction may also depend on whether your separators are one or
many, and whether they have a role.

In other words, it is hard to give you a precise answer without more
details on what you are doing. I first tried, and then realized that
it led me to make unwarranted assumptions about what you are doing.

The one thing that is really important is that you are very clear
about your definitions. And if you can motivate your terminology
choices, that may help the reader. I am still wondering why the borogoves had to be so mimsy.

Context

StackExchange Computer Science Q#27984, answer score: 4
