
I made something similar, but used DuckDB as the vector store (and query engine)! It’s impressively fast.

https://github.com/patricktrainer/duckdb-embedding-search



I love DuckDB, but its concurrency model is very limiting:

DuckDB has two configurable options for concurrency:

1. One process can both read and write to the database.

2. Multiple processes can read from the database, but no processes can write (access_mode = 'READ_ONLY').

https://duckdb.org/docs/connect/concurrency.html


Any specific reason to use DuckDB?

I've got a crapload of Q&A-formatted discussions on a topic, stored as JSON, and I'm trying to figure out whether to just store them somewhere and query them, or to also generate vector embeddings. I'm kinda lost with all the possible options.


Embeddings are what encode the “meaning” of a given text. Similarity search works by computing the angle (usually as cosine similarity) between your query vector and each of the vectors already stored, then keeping the closest ones. DuckDB (and columnar stores in general) is great at that kind of aggregation. It’s particularly well suited because a DuckDB database is a single file. There’s no server to muck with.
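
The angle comparison can be sketched in plain Python, with no database involved. The three-dimensional vectors and the names `stored`, `query`, and `top_k` below are toy values invented for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" keyed by a made-up document id.
stored = {
    "duck": [0.9, 0.1, 0.0],
    "goose": [0.8, 0.3, 0.1],
    "server": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, stored, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A brute-force scan like this is a full pass over every stored vector, which is the access pattern a columnar engine handles well.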


Is there a vector data type available in DuckDB now?


They call it a fixed-size array type but, yes. It was added earlier this year. Works really well.

https://duckdb.org/2024/05/03/vector-similarity-search-vss.h...


Yep! It was added in v0.10.0 - which was released a month or two after I made this.

This is using v0.9.1



