"AI" chat program

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

codereview ai stackoverflow python python-3.x set

Problem

I've thrown together this simple chat program, and while it works, I think it could certainly use some improvement. Here's how it works:

Obtain user input by removing all punctuation from the input string, splitting it on whitespace, and converting it into a set.

Loop through a list of possible input sets, and check the following condition: \$\text{Possible Input Set}\subseteq\text{User Input Set}\$

Return a random reply based on the above condition, and output that reply.

Can this be made more flexible and extensible?

```
from re import sub as rsub
from random import choice


def obtain_user_input(prompt: str) -> set:
    """
    Obtains user input. This function
    removes all punctuation, lowers text,
    splits on whitespace, and returns a
    set of all the unique elements. An
    example input value could be:

        "Hello, hello, there, there"

    And the return value would be:

        {"hello", "there"}
    """
    user_input = input(prompt)
    # re.sub takes (pattern, replacement, string), in that order.
    user_input = rsub(r"[^a-zA-Z0-9\s]", "", user_input)
    user_input = user_input.lower()
    # split() with no argument splits on any run of whitespace.
    user_input = user_input.split()
    return set(user_input)


def get_replies(user_input: set) -> str:
    """
    This function returns a string as
    a "reply" to user input. This is done
    by checking if any of the possible
    inputs are subsets of the user's input.
    If so, then a random "reply" is returned.
    """
    possible_replies = [
        [{"hi"}, ["Hello there.", "Hello.", "Hi."]],
        [{"hello"}, ["Hello there.", "Hello.", "Hi."]],
        [{"goodbye"}, ["Goodbye user.", "Goodbye", "Bye."]],
        [{"your", "name"}, ["My name is not known.", "I cannot tell you my name,"]],
        [{"idiot"}, ["And so are you.", "Likewise."]],
        [{"you", "are", "smart"}, ["Why thank you!", "Thanks!"]]
    ]

    # The first key set contained in the input wins.
    for possible_reply in possible_replies:
        if possible_reply[0].issubset(user_input):
            return choice(possible_reply[1])
    return "Say again?"


def main():
    whi
```

Solution

Your Python code is great; I can't come up with anything to criticize in your style.

I like the basic idea of your simple AI system, but it needs to be refined a bit to work well in practice. In addition, you can extend it with a lot of funky things.

The issues

First, let's take a look at the subset selection part. Here I can see 3 potential problems.

1. What happens when someone posts a sentence containing words from multiple subsets? E.g. "Hi, what is your name?" contains "hi", "your" and "name". I believe the answer will be a single "Hi." or "Hello there." in return.

2. Some groups seem to overlap: "hi" and "hello" look like the same group to me.

3. Words like "are" and "you" trigger answers like "Thanks!", which can be totally unrelated to the question.
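The first issue can be reproduced with a small sketch. The helpers `first_match` and `all_matches` are hypothetical names, and the rule table is cut down to two entries from the question; `first_match` mirrors the loop in `get_replies`, which stops at the first key set that is a subset of the input.

```python
# Two rules copied from the question's table; the helper names are illustrative.
RULES = [
    ({"hi"}, ["Hello there.", "Hello.", "Hi."]),
    ({"your", "name"}, ["My name is not known.", "I cannot tell you my name,"]),
]


def first_match(words: set):
    """Return the first key set contained in the input, as get_replies does."""
    for keys, _replies in RULES:
        if keys.issubset(words):
            return keys
    return None


def all_matches(words: set) -> list:
    """Return every key set contained in the input, in table order."""
    return [keys for keys, _replies in RULES if keys.issubset(words)]


words = {"hi", "what", "is", "your", "name"}
print(first_match(words))  # only the greeting rule is reported
print(all_matches(words))  # both rules actually apply
```

The question about the name is silently dropped because the greeting rule comes first in the table.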

How to fix them?

Natural language processing is a very hard task with multiple (partial) solutions going from easy approaches to very hard ones. Since you want to build a simple AI system, I would advise the following changes:

Get rid of words that are not important to the key message the AI system replies to, such as "are" and "you". These will trigger too many false positives. Removing them solves issue 3.

Group similar words that should end up with the same answer set together, e.g. ('Hi', 'Hello'). I believe that was your original idea, but you need to use it ;). This can solve issue 2.

Issue 1 has many possible solutions. I believe the simplest is to check the input for matches against multiple subsets. Keep some kind of hierarchy or ranking that decides which subset comes first, put the replies in a queue in that order, and send them one by one. E.g. for the input 'Hi, what is your name', reply 'Hello there.' and then 'I cannot tell you my name'.
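A minimal sketch of the grouping and ranking ideas, assuming a rule table of `(rank, key sets, replies)` tuples. That layout, the rank values, and the list return type are illustrative choices, not part of the original program.

```python
from random import choice

# Each rule: (rank, key sets sharing one reply list, replies).
# Rules with a lower rank are answered first. The layout is an assumption.
RULES = [
    (1, [{"your", "name"}], ["My name is not known.", "I cannot tell you my name,"]),
    (0, [{"hi"}, {"hello"}], ["Hello there.", "Hello.", "Hi."]),
    (2, [{"goodbye"}], ["Goodbye user.", "Goodbye", "Bye."]),
]


def get_replies(words: set) -> list:
    """Collect one random reply per matching rule, ordered by rank."""
    replies = []
    for _rank, key_sets, candidates in sorted(RULES, key=lambda rule: rule[0]):
        # A rule matches when any of its grouped key sets is in the input.
        if any(keys.issubset(words) for keys in key_sets):
            replies.append(choice(candidates))
    return replies or ["Say again?"]


for reply in get_replies({"hi", "what", "is", "your", "name"}):
    print(reply)
```

Here "hi" and "hello" share one rule, and an input that matches several rules receives one reply per rule, greeting first.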

A bit more in detail

I suggest you take a look at the NLTK library. It offers a wide set of functions for natural language processing, plus classification and analysis functionality that lets you do more advanced AI-related things.

But the NLTK library is especially useful in your case for the preprocessing step of the user_input.

You already lower-case the input and split it on whitespace, but the library offers more, and more powerful, ways of preprocessing:

First of all, you could tokenize the text, which splits it into tokens. This is similar to your splitting on whitespace.

You can perform stop word removal, which removes common words such as 'the' and 'a' from the input text. They don't carry any significant meaning.

Next, you can perform stemming, which reduces words like working, worked and works to their stem form: work. This way you don't need multiple verb forms in your subsets; the stem alone will do.

Last and most important, I would suggest using n-grams, probably bigrams or trigrams, in addition to your unigrams. A bigram is every sequence of two adjacent elements in a string of tokens.
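The four steps can be sketched with the standard library alone. This is a simplified stand-in, not NLTK itself: NLTK provides fuller versions of each step (`word_tokenize`, the `stopwords` corpus, `PorterStemmer`, and `nltk.util.ngrams`). The stop word list and the `naive_stem` suffix rules below are deliberately tiny assumptions made for illustration.

```python
import re

# Tiny illustrative stop word list; NLTK ships a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "are", "what", "you"}


def tokenize(text: str) -> list:
    """Lower-case the text and pull out runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list) -> list:
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Strip a few common suffixes. A real stemmer handles far more cases."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bigrams(tokens: list) -> list:
    """Join every pair of adjacent tokens with an underscore."""
    return [f"{first}_{second}" for first, second in zip(tokens, tokens[1:])]


tokens = tokenize("Hi, what is your name?")
print(tokens)                     # ['hi', 'what', 'is', 'your', 'name']
print(remove_stop_words(tokens))  # ['hi', 'your', 'name']
print(bigrams(tokens))            # ['hi_what', 'what_is', 'is_your', 'your_name']
print([naive_stem(t) for t in ["working", "worked", "works"]])  # ['work', 'work', 'work']
```

The bigrams are built before stop word removal here so that adjacency reflects the original sentence; whether to filter first is a design choice.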

By doing all of this, you can achieve the following:

user_input:

Hi, what is your name?

Preprocessing will result in a list of words/tokens:

[Hi, what, be, your, name, Hi_what, what_is, is_your, your_name]

Based on these you can build your subsets and change [{"your", "name"}, ["My name is not known.", "I cannot tell you my name,"]] into [{"your_name"}, ["My name is not known.", "I cannot tell you my name,"]].
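A rough sketch of how matching on a bigram key could look. The `features` and `get_reply` helpers are hypothetical, and the tokenizer is a plain regular expression rather than NLTK; the point is only that `your_name` matches when the two words are adjacent and in that order.

```python
import re
from random import choice

possible_replies = [
    [{"your_name"}, ["My name is not known.", "I cannot tell you my name,"]],
]


def features(text: str) -> set:
    """Return the unigrams plus underscore-joined bigrams of the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pairs = [f"{first}_{second}" for first, second in zip(tokens, tokens[1:])]
    return set(tokens) | set(pairs)


def get_reply(text: str) -> str:
    """Return a random reply for the first key set found in the features."""
    found = features(text)
    for keys, replies in possible_replies:
        if keys.issubset(found):
            return choice(replies)
    return "Say again?"


print(get_reply("What is your name?"))
print(get_reply("Name your price."))  # falls through: the words are not adjacent in that order
```

With the unigram key {"your", "name"}, the second sentence would have matched as well.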

I only touched on these things briefly, and you will probably need to look into them further yourself. But this could make your "simple" AI a bit more "advanced". ;)

Good Luck!

Context

StackExchange Code Review Q#96833, answer score: 11
