Yet another AI chat

A hybrid of AI assistant and AI roleplay, powered by locally running LLMs.

It is probably of limited practical use, since more powerful services exist, but I like the result and want to share it.

Role play: chat with an AI that pretends to be a human. Several personality templates are included.
To answer questions, the AI can use web search via the DuckDuckGo API, and the search results are reasonably well integrated with the role play.
A memory system based on embedding similarity.
LLMs run locally. No external services (except for DuckDuckGo search) are used.
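
As a rough illustration of memory based on embedding similarity, the sketch below ranks stored memories by cosine similarity to a query embedding. It is a toy in pure Python, not the project's actual implementation: the names `cosine_similarity` and `recall` are hypothetical, and the two-dimensional vectors stand in for real embeddings produced by a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recall(query_embedding, memories, top_k=2):
    """Return the texts of the top_k memories most similar to the query.

    memories is a list of (text, embedding) pairs (hypothetical layout).
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy data: real embeddings would have hundreds of dimensions.
memories = [
    ("likes tea", [1.0, 0.0]),
    ("lives in Oslo", [0.0, 1.0]),
    ("drinks green tea", [0.9, 0.1]),
]
print(recall([1.0, 0.0], memories))
```

The recalled texts could then be added to the prompt so the character appears to remember earlier conversation.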

Two instances of llama.cpp server are used at the same time.

Tested with the following models:

https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/blob/main/mythomax-l2-13b.Q4_K_M.gguf (13B, context 4096 tokens) - used to generate chat messages;
https://huggingface.co/TheBloke/openchat_3.5-GGUF/blob/main/openchat_3.5.Q4_K_M.gguf (7B, context 8192 tokens) - used for question classification, logic tasks, and web page summarization.

Running both models on GPU requires about 20 GB of video memory. llama.cpp can also run on CPU, but it is quite slow.
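
A minimal sketch of how a client might talk to two llama.cpp server instances, one per model, using the server's `/completion` endpoint. The ports (8080 and 8081), the constant names, and the helper functions are assumptions made for illustration; the actual values depend on how the servers are started, and this is not necessarily how the project's own code is organized.

```python
import json
import urllib.request

# Assumed addresses: one llama.cpp server per model.
CHAT_URL = "http://127.0.0.1:8080/completion"   # e.g. MythoMax, chat messages
LOGIC_URL = "http://127.0.0.1:8081/completion"  # e.g. OpenChat, classification


def build_request(url, prompt, n_predict=128):
    """Build a POST request for the llama.cpp server /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(url, prompt, n_predict=128, timeout=60):
    """Send the prompt to a running server and return the generated text."""
    request = build_request(url, prompt, n_predict)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

With both servers running, a short classification prompt could go to `complete(LOGIC_URL, ...)` and the in-character reply to `complete(CHAT_URL, ...)`, so that each model handles the kind of task it was tested for.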
