WORK IN PROGRESS: This post is a work in progress as I flesh out my thoughts. Expect it to change over time.

Motivation

Almost all thin wrappers on top of LLMs have already been built. What remains are specialised tools that use LLMs within a niche to process knowledge, perform actions, or both.

LLM specialisation requires the following steps: data management, LLM management, and LLM interfacing.

Anyone working in this space will need to get decently good at all of these. Some businesses have already begun making frameworks that help with these steps. Likewise, open source libraries assist with various parts of these steps at varying degrees of quality and in different ways.

We haven't yet seen an open source framework that makes every part of LLM development as easy as Vercel makes deploying a website.

Can we develop an open source framework to make every part of it easy?

Steps

1. Data management

Data extraction: Extract source data into a usable format

Scraping
Document to text
git repos
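
As a rough illustration of the "document to text" step, here is a minimal sketch that strips an HTML page down to its visible text using only Python's standard library. The class and function names (`TextExtractor`, `html_to_text`) are made up for this example, and a real pipeline would likely need something more robust for messy pages, PDFs, or git repos.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._parts = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._parts)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The output is one text fragment per line, which is a convenient input for the chunking step that follows.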

Data transformation, enrichment, storage

Chunking
Indexing
Embeddings generation
Storage in a Vector DB for RAG
Fine-tuning
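
To make the chunk, embed, store, retrieve flow concrete, here is a toy sketch. Everything in it is an assumption for illustration: `embed` is a bag-of-words counter standing in for a real embedding model, and `InMemoryVectorStore` is a plain list standing in for a real vector DB. Only the shape of the flow is meant to carry over.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(w.strip(".,;:!?").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: a list of (chunk, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, chunk: str) -> None:
        self.rows.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

For RAG, the chunks returned by `search` would be pasted into the prompt as context. Swapping `embed` for a real model and the list for a real database should not change the calling code much, which is roughly the kind of seam a framework could standardise.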

2. LLM management

Prompt engineering
LLM evaluation
Chains or graphs: multi-agent solutions
Tooling integration
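
A small sketch of how prompt templates, chains, and evaluation might fit together. The model here is a deliberately dumb stub (`fake_llm`), and the names `build_prompt`, `chain`, and `evaluate` are hypothetical, not taken from any of the libraries discussed in this post.

```python
from typing import Callable

PROMPT = "Answer in one word.\nContext: {context}\nQuestion: {question}\nAnswer:"


def build_prompt(context: str, question: str) -> str:
    """Prompt engineering in miniature: fill a fixed template."""
    return PROMPT.format(context=context, question=question)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: returns the last word of the context."""
    for line in prompt.splitlines():
        if line.startswith("Context:"):
            words = line[len("Context:"):].split()
            return words[-1].strip(".") if words else ""
    return ""


def chain(*steps: Callable) -> Callable:
    """Compose steps left to right; each step receives the previous output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def evaluate(pipeline: Callable, cases: list[dict]) -> float:
    """Exact-match accuracy of the pipeline over labelled cases."""
    hits = sum(pipeline(case["input"]) == case["expected"] for case in cases)
    return hits / len(cases)


pipeline = chain(lambda inp: build_prompt(**inp), fake_llm, str.lower)

cases = [
    {"input": {"context": "The capital of France is Paris.",
               "question": "What is the capital of France?"},
     "expected": "paris"},
    {"input": {"context": "Water boils at 100 degrees Celsius.",
               "question": "At what temperature does water boil?"},
     "expected": "100"},
]

score = evaluate(pipeline, cases)  # the stub gets the second case wrong
```

The second case fails on purpose: the point of an evaluation set is to catch exactly this kind of miss whenever the prompt or the model changes.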

3. LLM interfacing

For your use case, how do you want to interface with the LLM?

Plug a chatbot into your site
Run the LLM on a wide variety of datapoints
Plug a one-off LLM call into your app (e.g. an internal endpoint that solves a specific problem)
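
The three interfaces above can be sketched as thin layers over the same model callable. This is only an illustration: `stub_llm` echoes its input in place of a real model, and `ChatSession`, `run_batch`, and `label_ticket` are hypothetical names.

```python
from typing import Callable

LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: echoes the last prompt line."""
    lines = prompt.splitlines()
    return "echo: " + (lines[-1] if lines else "")


class ChatSession:
    """Chatbot interface: keeps the conversation and resends it each turn."""

    def __init__(self, llm: LLM):
        self.llm = llm
        self.history: list[tuple[str, str]] = []

    def send(self, message: str) -> str:
        self.history.append(("user", message))
        prompt = "\n".join(f"{role}: {text}" for role, text in self.history)
        reply = self.llm(prompt)
        self.history.append(("assistant", reply))
        return reply


def run_batch(llm: LLM, template: str, datapoints: list[str]) -> list[str]:
    """Batch interface: apply one prompt template to many datapoints."""
    return [llm(template.format(item=item)) for item in datapoints]


def label_ticket(llm: LLM, ticket: str) -> str:
    """One-off interface: a single call that solves one specific problem."""
    return llm(f"Label this support ticket as 'bug' or 'question': {ticket}")
```

Because all three take the model as a plain callable, the same prompt and evaluation work can be reused whichever interface the use case ends up needing.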

[Figure: pipeline]

Existing Tools

There are numerous tools out there trying to solve one or many parts of LLM specialisation. It's somewhat staggering trying to keep up with them all. The need for tooling is generally accepted, but there is no definitive winner in the space that is open source, easy to use, and solves every part of the pipeline, as far as I can tell.

Palantir's AIP
- Cool, but closed source.
LangChain & LangGraph
- Cool, but often syntactically awkward, prone to breaking your build (in TS), and not built for low-code prototyping. Their docs could use some work. This is the closest open source project I know of that attempts to cover the solution end to end.
https://github.com/FlowiseAI/Flowise
- Fully open source
https://github.com/botpress/botpress
- Part open, part closed. Their UI appears to be closed source.
https://github.com/crewAIInc/crewAI
https://github.com/dagworks-inc/burr
LlamaIndex: https://github.com/run-llama/LlamaIndexTS and https://github.com/run-llama/llama_index
https://github.com/Significant-Gravitas/AutoGPT
https://github.com/microsoft/autogen
https://github.com/langflow-ai/langflow
- Open source
- Low-code, graph-based
https://github.com/langroid/langroid
https://github.com/deepset-ai/haystack

Other AI Tools

Awesome LLM Lists

This list is far from comprehensive, but if you find a really great tool (or really great list of tools), please be sure to let me know.