chromem-go

In-memory vector database for Go with Chroma-like interface.

It's not a client library for connecting to a Chroma database. It's a standalone in-memory database, meant to enable retrieval augmented generation (RAG) applications in Go without having to run a separate database.
As such, the focus is not scale or performance, but simplicity.

⚠️ The initial implementation is fairly naive and offers only the bare minimum of features. We'll improve and extend it over time.

Interface

Our inspiration, the Chroma interface, is the following (taken from their README).

import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)

Our Go library exposes a similar interface:

package main

import (
    "context"
    "fmt"

    "github.com/philippgille/chromem-go"
)

func main() {
    ctx := context.Background()

    // Set up chromem-go in-memory, for easy prototyping. Persistence will be added in the future.
    client := chromem.NewClient()

    // Create collection. GetCollection, GetOrCreateCollection, DeleteCollection will be added in the future.
    collection := client.CreateCollection("all-my-documents", nil, nil)

    // Add docs to the collection. Update and delete will be added in the future.
    // Row-based API will be added when Chroma adds it!
    _ = collection.Add(ctx,
        []string{"doc1", "doc2"}, // unique ID for each doc
        nil, // We handle embedding automatically. You can skip that and add your own embeddings as well.
        []map[string]string{{"source": "notion"}, {"source": "google-docs"}}, // Filter on these!
        []string{"This is document1", "This is document2"},
    )

    // Query/search 2 most similar results. Getting by ID will be added in the future.
    results, _ := collection.Query(ctx,
        "This is a query document",
        2,
        map[string]string{"metadata_field": "is_equal_to_this"}, // optional filter
        map[string]string{"$contains": "search_string"},         // optional filter
    )

    // Errors are ignored above for brevity; check them in real code.
    fmt.Println(results)
}

Initially, only a minimal subset of Chroma's interface is implemented or exported, but we'll add more in future versions.
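
The two optional filter arguments in the Query call can be read as "exact match on metadata" and "substring check on the document text". The following is a minimal sketch of that idea, not the library's implementation: the `doc` type and the `matchesMetadata` and `matchesContent` helpers are hypothetical names made up for illustration, and rejecting unknown operators is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// doc is a hypothetical stand-in for a stored document.
// It is NOT a chromem-go type.
type doc struct {
	ID       string
	Metadata map[string]string
	Content  string
}

// matchesMetadata reports whether every key/value pair in where
// is present in the document's metadata (exact match).
// An empty or nil filter matches every document.
func matchesMetadata(d doc, where map[string]string) bool {
	for k, want := range where {
		if got, ok := d.Metadata[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// matchesContent applies the $contains and $not_contains operators
// to the document text. Any other operator is rejected (an assumption
// of this sketch).
func matchesContent(d doc, whereDocument map[string]string) (bool, error) {
	for op := range whereDocument {
		if op != "$contains" && op != "$not_contains" {
			return false, fmt.Errorf("unsupported operator %q", op)
		}
	}
	if s, ok := whereDocument["$contains"]; ok && !strings.Contains(d.Content, s) {
		return false, nil
	}
	if s, ok := whereDocument["$not_contains"]; ok && strings.Contains(d.Content, s) {
		return false, nil
	}
	return true, nil
}

func main() {
	docs := []doc{
		{ID: "doc1", Metadata: map[string]string{"source": "notion"}, Content: "This is document1"},
		{ID: "doc2", Metadata: map[string]string{"source": "google-docs"}, Content: "This is document2"},
	}
	where := map[string]string{"source": "notion"}
	whereDocument := map[string]string{"$contains": "document"}

	// Print the IDs of all documents that pass both filters.
	for _, d := range docs {
		ok, err := matchesContent(d, whereDocument)
		if err != nil {
			fmt.Println("filter error:", err)
			return
		}
		if ok && matchesMetadata(d, where) {
			fmt.Println(d.ID)
		}
	}
}
```

Applying the filters before the similarity search keeps the candidate set small, which matters for an exhaustive search.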

Features

Embedding creators:
- OpenAI ada v2 (default)
- Bring your own
- Mistral (API)
- ollama
- LocalAI
Similarity search:
- Exact nearest neighbor search using cosine similarity
- Approximate nearest neighbor search with index
  - Hierarchical Navigable Small World (HNSW)
  - Inverted file flat (IVFFlat)
Filters:
- Document filters: $contains, $not_contains
- Metadata filters: Exact matches
- Operators ($and, $or etc.)
Storage:
- In-memory
- Persistent (file)
- Persistent (others (S3, PostgreSQL, ...))
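
To illustrate the "exact nearest neighbor search using cosine similarity" item, here is a minimal sketch of an exhaustive search over in-memory embeddings. It is not the library's code: the `scored` type, the `cosineSimilarity` and `nearest` functions, and the toy two-dimensional vectors are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
func cosineSimilarity(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine similarity is undefined for zero vectors")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

// scored pairs a document ID with its similarity to the query.
// Hypothetical type for this sketch.
type scored struct {
	ID         string
	Similarity float64
}

// nearest compares the query against every stored embedding
// (exact search, no index) and returns the n most similar documents.
func nearest(query []float32, embeddings map[string][]float32, n int) ([]scored, error) {
	if n < 0 {
		return nil, errors.New("n must not be negative")
	}
	results := make([]scored, 0, len(embeddings))
	for id, emb := range embeddings {
		sim, err := cosineSimilarity(query, emb)
		if err != nil {
			return nil, fmt.Errorf("document %q: %w", id, err)
		}
		results = append(results, scored{ID: id, Similarity: sim})
	}
	// Highest similarity first; ties are broken by ID for stable output.
	sort.Slice(results, func(i, j int) bool {
		if results[i].Similarity != results[j].Similarity {
			return results[i].Similarity > results[j].Similarity
		}
		return results[i].ID < results[j].ID
	})
	if n < len(results) {
		results = results[:n]
	}
	return results, nil
}

func main() {
	// Toy embeddings; real ones come from an embedding creator.
	embeddings := map[string][]float32{
		"doc1": {1, 0},
		"doc2": {0, 1},
		"doc3": {1, 1},
	}
	results, err := nearest([]float32{1, 0.1}, embeddings, 2)
	if err != nil {
		fmt.Println("search error:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s %.3f\n", r.ID, r.Similarity)
	}
}
```

The scan is linear in the number of documents, which fits the stated focus on simplicity over scale. An index such as HNSW or IVFFlat trades exactness for speed on larger collections.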

Usage

For a full, working example that uses the vector database for retrieval augmented generation (RAG), see example/main.go.

Inspirations

- Shoutout to @eliben, whose blog post and example code inspired me to start this project!
- Chroma: Among Pinecone, Milvus, Qdrant, Weaviate and others, Chroma stood out by showing its core API in 4 commands in its README and on the landing page of its website. It was also one of the few with an in-memory mode (for Python).
