I made something similar, but used DuckDB as the vector store (and query engine)...

jackbravo · on Nov 5, 2024

I love DuckDB, but its concurrency model is very limiting:

DuckDB has two configurable options for concurrency:

1. One process can both read and write to the database.

2. Multiple processes can read from the database, but no processes can write (access_mode = 'READ_ONLY').

https://duckdb.org/docs/connect/concurrency.html

barrenko · on Oct 30, 2024

Any specific reason to use DuckDB?

I've got a crapload of JSON-formatted Q&A discussions on a topic, and I'm trying to figure out whether to just store them somewhere and query them, or whether to also create vector embeddings. I'm kind of lost with all the possible options.

pjot · on Oct 30, 2024

Embeddings are what encode the “meaning” of a given text. Similarity search works by computing the angle (typically via cosine similarity) between your query vector and each of the vectors already stored, then returning the closest matches. DuckDB (and columnar stores in general) is great at that kind of aggregation. It’s particularly well suited because a DuckDB database is a single file, so there’s no server to muck with.
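
To make the "angle between vectors" idea concrete, here is a minimal sketch in plain Python. It is not the code from the project above and does not use DuckDB. The names `cosine_similarity`, `top_k`, and the toy three-dimensional vectors are made up for illustration; real embeddings would come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; stand-ins for real model output.
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

results = top_k(query, stored, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The scan over every stored vector is the part a database would take over: the same score-then-sort step can be expressed as a query that computes the similarity per row and orders by it.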

ekianjo · on Oct 30, 2024

Is there a vector data type available in DuckDB now?

wild_egg · on Oct 30, 2024

They call it a fixed-size array type but, yes. It was added earlier this year. Works really well.

https://duckdb.org/2024/05/03/vector-similarity-search-vss.h...

pjot · on Oct 30, 2024

Yep! It was added in v0.10.0, which was released a month or two after I made this.

This is using v0.9.1