There are some great guides out there on how to create long-term memory for AI applications using embedding-based vector stores like ChromaDB or Pinecone. These vector stores are well-suited for storing unstructured text data. But what if you want to query data that’s already in a SQL database – or what if you have tabular data that doesn’t make sense to write into a dedicated vector store? 

For example, what if we want to ask arbitrary historical questions about how many GitHub issues have been created in the Airbyte repo, how many PRs have been merged, and who has been the most active contributor over all time? Pre-calculated embeddings would not be able to answer these questions, since they rely on dynamic aggregations whose answers change constantly. It would be nearly impossible – and wildly inefficient – to try to answer these questions with pre-formed text documents and vector-based document retrieval.
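To make the contrast concrete, here is a minimal sketch of the kind of aggregation queries those questions boil down to. The schema, table names (`issues`, `pull_requests`), and sample rows are all hypothetical stand-ins for whatever your warehouse actually contains; the point is that each answer is computed fresh from the data at query time, not retrieved from a pre-embedded document.

```python
import sqlite3

# Hypothetical schema for illustration only -- real GitHub data would live
# in your own database with its own table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER, author TEXT, created_at TEXT);
CREATE TABLE pull_requests (id INTEGER, author TEXT, merged_at TEXT);
INSERT INTO issues VALUES
    (1, 'alice', '2023-01-05'),
    (2, 'bob',   '2023-02-10'),
    (3, 'alice', '2023-03-15');
INSERT INTO pull_requests VALUES
    (1, 'alice', '2023-01-20'),
    (2, 'carol', NULL);          -- NULL merged_at = not yet merged
""")

# "How many issues have been created?"
issue_count = conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0]

# "How many PRs have been merged?"
merged_prs = conn.execute(
    "SELECT COUNT(*) FROM pull_requests WHERE merged_at IS NOT NULL"
).fetchone()[0]

# "Who is the most active contributor over all time?"
# (issues and PRs combined, counted per author)
top_contributor = conn.execute("""
    SELECT author, COUNT(*) AS activity FROM (
        SELECT author FROM issues
        UNION ALL
        SELECT author FROM pull_requests
    ) GROUP BY author ORDER BY activity DESC LIMIT 1
""").fetchone()[0]

print(issue_count, merged_prs, top_contributor)  # 3 1 alice
```

Every `INSERT` or merged PR changes these answers immediately, which is exactly why a static embedding of yesterday's repo state cannot serve them.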
