
Algorithm to find the probability of a given text to be about a large topic

Submitted by: @import:stackexchange-cs

Problem

I want the conditional probability that a text is about a given topic, where the topic is a word supplied as input. For example, take the following text:


I have seen and reviewed your requirements you posted here. If you can give me the fix criteria/category of your data mining then
I can do this job. If you want me to define and allot criteria and
categorize it in then charges will be extra for per categorization
included.

Suppose I give the word research as input. I want to know:

What is the likelihood/probability that the text relates to research?


What algorithm should we use to compute this?

Solution

You can try a simple probabilistic graphical model; the simplest one is Naive Bayes.

One way to do this is to represent each piece of text as a word-frequency vector and associate it with a topic (the "class variable"). You then train the model on many such labeled texts, i.e. you model the probability of a frequency vector given a certain topic. Finally, given a new text, Bayes' rule turns these into the probability of each topic given the text's frequency vector, so you can ask either how probable a particular topic (such as research) is or which topic assignment is most likely.
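The procedure in the previous paragraph can be sketched in a few lines of Python. This is only a minimal illustration, assuming a multinomial Naive Bayes model with add-one (Laplace) smoothing. The function names (`tokenize`, `train`, `posterior`), the `alpha` parameter and the toy training texts are all made up for the example and do not come from the original answer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately crude tokenizer (an assumption): lowercase alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_texts, alpha=1.0):
    """Count documents per topic and words per topic from (text, topic) pairs."""
    topic_docs = Counter()   # number of training texts per topic
    word_counts = {}         # topic -> Counter of word frequencies
    vocab = set()
    for text, topic in labeled_texts:
        topic_docs[topic] += 1
        counts = word_counts.setdefault(topic, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return {"topic_docs": topic_docs, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def posterior(model, text):
    """Return P(topic | text) for every topic seen during training."""
    total_docs = sum(model["topic_docs"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    words = [w for w in tokenize(text) if w in model["vocab"]]  # skip unseen words
    log_scores = {}
    for topic, n_docs in model["topic_docs"].items():
        counts = model["word_counts"][topic]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior P(topic)
        for word in words:
            # Smoothed log likelihood P(word | topic)
            score += math.log((counts[word] + alpha) / (total_words + alpha * vocab_size))
        log_scores[topic] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    unnormalized = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {t: v / z for t, v in unnormalized.items()}


# Toy training data, invented purely for illustration.
training = [
    ("we study data mining methods and publish the analysis", "research"),
    ("the experiment results support the hypothesis", "research"),
    ("buy cheap tickets for the football match", "sports"),
    ("the team won the match in extra time", "sports"),
]
model = train(training)
probs = posterior(model, "criteria for your data mining analysis")
print(probs)
```

Here `probs["research"]` is the quantity the question asks for: the estimated probability that the text relates to research, given the training data. With a realistic corpus you would want far more labeled texts per topic and a better tokenizer.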

Naive Bayes assumes that the word frequencies are conditionally independent given the topic, so it misses dependencies between the frequencies of the various words, but it is worth a shot as it is easy to implement. More complicated models could be used to capture these dependencies.
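To make the independence assumption concrete, here is one common way to write the model, assuming the multinomial variant of Naive Bayes. The notation is introduced here for illustration only: $t$ is a topic, $x_w$ is the number of times word $w$ occurs in the text, and $\theta_{w \mid t}$ is the probability of word $w$ under topic $t$.

```latex
P(t \mid x) \;=\;
\frac{P(t)\,\prod_{w} \theta_{w \mid t}^{\,x_w}}
     {\sum_{t'} P(t')\,\prod_{w} \theta_{w \mid t'}^{\,x_w}}
```

The product over words is where the "naive" assumption enters: each word contributes its own factor, independently of which other words appear. A model that captures dependencies between words would replace that product with a richer joint distribution over the frequency vector.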

Context

StackExchange Computer Science Q#4970, answer score: 3
