-
I'd like to use Semantic Kernel for a mostly-RAG application that can also call some custom functions (otherwise I'd probably just use Kernel Memory). But when I integrate Kernel Memory as a memory plugin, I think it always needs 2 LLM calls, even for a basic RAG request. That seems wasteful when 99% of requests are basic RAG. Any ideas for working around this?
-
@chaelli the first call is about intent detection. Do you know which code is making that first request? About KM, the ASK API uses these 2 requests:
-
@dluc thanks for the quick reply.
I forgot about the embedding part, so maybe I miscounted. But intent detection will always need a separate call, right?
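To make the call counting concrete, here is a rough sketch of the two paths being compared. It is not Semantic Kernel or Kernel Memory code: every function name below is a hypothetical stub, and it assumes that a basic RAG ask consists of one embedding request plus one text-generation request, which is my reading of the "embedding part" and may not match what KM actually does.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts the model requests made while handling one question."""
    llm_calls: int = 0
    embedding_calls: int = 0


def detect_intent(question: str, counter: CallCounter) -> str:
    # Hypothetical stand-in for the orchestrator's intent-detection LLM call.
    counter.llm_calls += 1
    return "rag"  # stub: pretend the model always picks the memory plugin


def embed(question: str, counter: CallCounter) -> list[float]:
    # Hypothetical stand-in for the embedding request (assumed, see lead-in).
    counter.embedding_calls += 1
    return [0.0]


def generate_answer(question: str, facts: list[str], counter: CallCounter) -> str:
    # Hypothetical stand-in for the answer-generation LLM call.
    counter.llm_calls += 1
    return f"answer to {question!r} from {len(facts)} facts"


def rag_ask(question: str, counter: CallCounter) -> str:
    # Basic RAG: embed the question, search, then generate.
    embed(question, counter)
    facts = ["stub fact"]  # the vector search itself is not a model request
    return generate_answer(question, facts, counter)


def ask_via_orchestrator(question: str) -> CallCounter:
    # Path 1: memory used as a plugin, so intent detection runs first.
    counter = CallCounter()
    if detect_intent(question, counter) == "rag":
        rag_ask(question, counter)
    return counter


def ask_direct(question: str) -> CallCounter:
    # Path 2: calling the memory service directly, with no intent detection.
    counter = CallCounter()
    rag_ask(question, counter)
    return counter
```

Under these assumptions, `ask_via_orchestrator` ends up with one more LLM call than `ask_direct` for the same question, and that extra call is the intent-detection step I'm asking about.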
-
So I wonder if there is any way to get around this?