patternMinor
Algorithm to find the probability of a given text to be about a large topic
Viewed 0 times
thetopictextalgorithmlargeprobabilityaboutfindgiven
Problem
I want the conditional probability for each topic (being the word that we give as input). For example, the text being
have seen and reviewed your requirements you posted here. If you can
give me the fix criteria/category of your data mining then I can do
this job. If you want me to define and allot criteria and categorize
it in then charges will be extra for per categorization included.
I have seen and reviewed your requirements you posted here. If you can give me the fix criteria/category of your data mining then
I can do this job. If you want me to define and allot criteria and
categorize it in then charges will be extra for per categorization
included.
Assume that I give a word called research as an input, I want to know
What algorithms we should create to get the above?
have seen and reviewed your requirements you posted here. If you can
give me the fix criteria/category of your data mining then I can do
this job. If you want me to define and allot criteria and categorize
it in then charges will be extra for per categorization included.
I have seen and reviewed your requirements you posted here. If you can give me the fix criteria/category of your data mining then
I can do this job. If you want me to define and allot criteria and
categorize it in then charges will be extra for per categorization
included.
Assume that I give a word called research as an input, I want to know
What is the likelihood/probability that the text relates to research?What algorithms we should create to get the above?
Solution
You can try simple probabilistic graphical models, the simplest one being Naive Bayes.
One way to do this would be to represent a portion of text as a word frequency vector, that will be associated with a topic (the "class variable"). Then you use many such texts that are associated with topics to train your model (i.e. you model the probability of a frequency vector given a certain topic). Finally, given a new text you can ask what is the most likely topic assignment.
Naive Bayes, the simplest graphical model, would miss dependencies between the frequencies of the various words, but it is worth a shot as it is easy to implement. More complicated models could be used to capture these dependencies.
One way to do this would be to represent a portion of text as a word frequency vector, that will be associated with a topic (the "class variable"). Then you use many such texts that are associated with topics to train your model (i.e. you model the probability of a frequency vector given a certain topic). Finally, given a new text you can ask what is the most likely topic assignment.
Naive Bayes, the simplest graphical model, would miss dependencies between the frequencies of the various words, but it is worth a shot as it is easy to implement. More complicated models could be used to capture these dependencies.
Context
StackExchange Computer Science Q#4970, answer score: 3
Revisions (0)
No revisions yet.