AI for Confluence Architecture
Architecture Overview
AI for Confluence is a Confluence plugin that adds AI capabilities and supports integration with multiple large language models (LLMs).
Call LLM Directly
In some cases, the system will directly call the LLM after performing some processing on the user query:
Summarize, translate
Content generation in the Confluence page editor (brainstorm, change tone …)
RAG (Retrieval-Augmented Generation) in AI for Confluence
The plugin is built on RAG and integrates intelligent capabilities with the Confluence knowledge base. When a user submits a query, it retrieves relevant content from Confluence and uses a large language model (LLM) to generate an answer, which is returned together with the sources of that content. Typical use cases are:
Chat
Intelligent query
Define
Content indexing
During the indexing stage, documents are pre-processed in a way that enables efficient search during the retrieval stage.
Indexing Steps
Get text from Confluence contents
Split the text into smaller chunks
Create a numerical embedding for each chunk of text
Load the embeddings into the vector database
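The indexing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's implementation: the chunk size, the overlap, the `toy_embed` hashing function (a stand-in for a real embedding model), and the plain list used as the vector store are all assumptions made for the example.

```python
import hashlib
import math
from dataclasses import dataclass

EMBEDDING_DIM = 64  # hypothetical size; real embedding models use far more dimensions


@dataclass
class IndexedChunk:
    content_id: str         # ID of the Confluence page, blog post, or attachment
    text: str               # the chunk text
    embedding: list[float]  # numerical vector for the chunk


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of max_words words; consecutive windows share `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words - overlap):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * EMBEDDING_DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % EMBEDDING_DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_content(content_id: str, text: str, store: list[IndexedChunk]) -> int:
    """Chunk, embed, and load one piece of content; return the number of chunks added."""
    chunks = chunk_text(text)
    for chunk in chunks:
        store.append(IndexedChunk(content_id, chunk, toy_embed(chunk)))
    return len(chunks)
```

Overlapping windows are a common way to avoid cutting a sentence's context at a chunk boundary; whether the plugin chunks by words, tokens, or paragraphs is not stated here.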
What kind of content will be indexed?
Page
Blog
Attachments
Office docs (.docx, .pptx, .xlsx)
PDF (.pdf)
Text files (.txt, .xml, .html, .rtf, etc.)
Will archived content be indexed?
No. The plugin calls the Confluence search API to find new content to index for embedding. Archived documents cannot be searched, so they are not indexed either.
Where are the embeddings stored?
The embeddings are stored in a vector database. Two vector database options are supported:
In Memory
Embeddings are stored in an in-memory vector DB on each Confluence cluster node. Because the store is in memory, the embeddings are lost when Confluence or the plugin restarts; the plugin then re-indexes the content.
Because memory is limited, not all data can be held in the in-memory database. The configuration parameter Max Index Number sets the maximum number of entries.
The in-memory database is recommended only for testing or demo environments. For production environments, use Chroma DB.
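To illustrate how an entry cap such as Max Index Number constrains an in-memory store, here is a small sketch. The class name, the `max_index_number` argument, and the choice to skip new entries once the cap is reached are assumptions; the plugin's actual behavior at the limit is not described here.

```python
class InMemoryVectorStore:
    """Toy in-memory store that refuses new entries once max_index_number is reached."""

    def __init__(self, max_index_number: int):
        self.max_index_number = max_index_number      # hypothetical mirror of Max Index Number
        self._entries: dict[str, list[float]] = {}    # chunk ID -> embedding, lost on restart

    def add(self, chunk_id: str, embedding: list[float]) -> bool:
        """Store or update an embedding; return False when the cap blocks a new entry."""
        is_new = chunk_id not in self._entries
        if is_new and len(self._entries) >= self.max_index_number:
            return False  # assumption: entries beyond the cap are skipped
        self._entries[chunk_id] = embedding
        return True

    def __len__(self) -> int:
        return len(self._entries)
```

Because the dictionary lives only in process memory, a fresh instance after a restart starts empty, which is why re-indexing is needed in that case.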
Chroma DB
Chroma DB is an open-source vector database, recommended when Confluence has a large volume of data to index. The customer must install a Chroma DB instance that Confluence can access.
Scopes of Index
By default, all content in Confluence is indexed. A Confluence administrator can customize the scope by Spaces or by CQL in the Knowledge Configuration section of the plugin configuration.
Spaces
Spaces
All spaces
Site spaces (does not include personal spaces)
Specified spaces by space name
Content Types: Page, Blog, Attachment
Labels
CQL: Customize contents to index with CQL (Confluence Query Language)
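As an illustration of how the scope options above could map onto a CQL expression, the sketch below builds a query string from spaces, content types, and labels. The function `build_scope_cql` and its parameters are hypothetical, and it takes space keys (which the CQL `space` field matches) rather than space names; how the plugin translates its own settings is not documented here.

```python
def build_scope_cql(space_keys=None, site_spaces_only=False,
                    content_types=("page", "blogpost", "attachment"),
                    labels=None) -> str:
    """Build a CQL string for an index scope (hypothetical helper, not a plugin API)."""
    clauses = []
    if space_keys:
        # Specified spaces, identified by space key
        keys = ", ".join(f'"{key}"' for key in space_keys)
        clauses.append(f"space in ({keys})")
    elif site_spaces_only:
        # Site spaces only: excludes personal spaces
        clauses.append('space.type = "global"')
    # Content types: page, blogpost, attachment
    clauses.append("type in (" + ", ".join(content_types) + ")")
    if labels:
        quoted = ", ".join(f'"{label}"' for label in labels)
        clauses.append(f"label in ({quoted})")
    return " AND ".join(clauses)
```

For example, `build_scope_cql(space_keys=["DEV", "OPS"], content_types=("page",), labels=["kb"])` yields `space in ("DEV", "OPS") AND type in (page) AND label in ("kb")`.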
Frequency of Indexing
By default, the indexing job runs once per minute to pick up new and changed content. A Confluence administrator can change the schedule under Scheduled Jobs.
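One way an incremental run could limit itself to recently changed content is to add a `lastmodified` condition to the scope query. This is only a sketch under that assumption; the helper name `incremental_cql` is hypothetical, and the plugin's actual change-detection logic is not described here.

```python
from datetime import datetime


def incremental_cql(scope_cql: str, last_run: datetime) -> str:
    """Restrict a scope query to content modified since the previous indexing run."""
    # CQL accepts date-time literals in the form "yyyy-MM-dd HH:mm"
    stamp = last_run.strftime("%Y-%m-%d %H:%M")
    return f'({scope_cql}) AND lastmodified >= "{stamp}"'
```

With a one-minute schedule, `last_run` would be the start time of the previous job, so each run only has to embed content touched in the last interval.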
Intelligent Query
When a user chooses "Ask AI" in the Confluence search bar, the plugin retrieves relevant Confluence content and passes it to the LLM as context for the query. The LLM generates a response from the query and the context, and the answer is returned to the user along with the source links.
Query Steps:
Convert the query to an embedding
Look up relevant documents using the embedding and the vector database
Filter the relevant documents by Confluence permissions
Send the query together with the filtered context to the LLM
Return the LLM's answer and the source links for the relevant content to the user
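The query steps above can be sketched end to end as follows. Everything here is illustrative: `toy_embed` stands in for a real embedding model, the list of dictionaries stands in for the vector database, `allowed_ids` stands in for the Confluence permission check, and `llm` is any callable that takes a prompt and returns text.

```python
import hashlib
import math

EMBEDDING_DIM = 64  # hypothetical size for the toy embedding


def toy_embed(text):
    """Stand-in for an embedding model: hash words into buckets, then L2-normalize."""
    vec = [0.0] * EMBEDDING_DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % EMBEDDING_DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def similarity(a, b):
    """Dot product; equals cosine similarity because the vectors are normalized."""
    return sum(x * y for x, y in zip(a, b))


def answer_query(query, index, allowed_ids, llm, top_k=3):
    # 1. Convert the query to an embedding
    query_vec = toy_embed(query)
    # 2. Look up the most similar chunks in the (toy) vector store
    ranked = sorted(index, key=lambda e: similarity(query_vec, e["embedding"]), reverse=True)
    candidates = ranked[:top_k]
    # 3. Drop chunks the user is not permitted to view
    visible = [e for e in candidates if e["content_id"] in allowed_ids]
    # 4. Combine the query and the filtered context into one prompt
    context = "\n\n".join(e["text"] for e in visible)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # 5. Return the answer plus de-duplicated source links
    sources = []
    for entry in visible:
        if entry["url"] not in sources:
            sources.append(entry["url"])
    return llm(prompt), sources
```

Filtering after retrieval, as in step 3, means restricted text never reaches the prompt, so the LLM cannot quote content the user is not allowed to see.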
Q&A
Q: Is Confluence data sent to the LLM to train the model?
A: No, Confluence data is not sent to the LLM for model training. We use RAG instead of fine-tuning.
Q: What kind of data is sent to the LLM?
A: Query prompts + relevant context
Q: Can users query content that is blocked by Confluence permissions?
A: No. The Permission Filter checks permissions and returns only the content the user is allowed to see under the Confluence permission settings (e.g. space permissions, page permissions).
Q: Do you (XDevPod) collect user data?
A: No, we do not collect user data.