LLM Specialisation

WORK IN PROGRESS: This post is a work in progress as I flesh out my thoughts. Expect it to change over time.

Motivation

Almost all thin wrappers on top of LLMs have already been built. What remains are specialised tools that use LLMs within a niche to process knowledge and/or perform actions.

LLM specialisation requires the following steps: data management, LLM management, and LLM interfacing.

Anyone working in this space will need to get decently good at all of these. Some businesses have already begun building frameworks that help with these steps. Likewise, open source libraries assist with various parts of these steps, with varying degrees of quality and in different ways.

We haven't yet seen an open source framework that makes every part of LLM development as easy as Vercel makes deploying a website.

Can we develop an open source framework to make every part of it easy?

Steps

1. Data management

Data extraction: Extract source data into a usable format

  • Scraping
  • Document to text
  • Git repos
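The following minimal Python sketch makes the extraction step concrete for the git repos case. It assumes the repository has already been cloned to a local directory. The function name `extract_repo_text` and the `TEXT_SUFFIXES` allowlist are hypothetical. Scraping and document-to-text conversion would need third-party tools, so they are not covered here.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical allowlist of file types treated as plain text.
TEXT_SUFFIXES = {".md", ".txt", ".py", ".rst"}


def extract_repo_text(root: str, suffixes: Iterable[str] = TEXT_SUFFIXES) -> List[Dict[str, str]]:
    """Collect the text of every allowlisted file under a local checkout."""
    allowed = {s.lower() for s in suffixes}
    root_path = Path(root)
    records: List[Dict[str, str]] = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip directories and anything inside Git's internal .git folder.
        if not path.is_file() or ".git" in rel.parts:
            continue
        # Skip file types that are not on the allowlist (images, binaries, ...).
        if path.suffix.lower() not in allowed:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Keep the source path so later steps can cite where the text came from.
        records.append({"source": rel.as_posix(), "text": text})
    return records
```

Each record keeps its source path. Later steps, such as chunking or citing sources in RAG answers, can then point back to the original file.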

Data transformation, enrichment, storage

  • Chunking
  • Indexing
  • Embeddings generation
  • Storage in a Vector DB for RAG
  • Fine-tuning
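The chunking, embedding and retrieval steps above can be sketched end to end in plain Python. This is a toy built on explicit assumptions:

- `embed` is a hashed bag-of-words stand-in for a real embedding model.
- `InMemoryVectorStore` is a hypothetical stand-in for a vector DB.
- The chunk size and overlap defaults are arbitrary.

"Indexing" here is only a linear scan over stored vectors. A real vector DB would typically use an approximate nearest-neighbour index.

```python
import hashlib
import math
import re
from typing import List, Tuple

DIM = 1024  # hypothetical vector size for the toy embedder


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once a new window would only repeat the previous window's overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 gives a stable bucket across runs (unlike Python's built-in hash).
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Minimal stand-in for a vector DB: stores (chunk, vector) pairs and ranks by cosine similarity."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> List[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), c) for c, v in self.items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [c for _, c in scored[:k]]
```

The chunks overlap so that a passage cut at a chunk boundary still appears whole in at least one neighbouring chunk. For example, after adding "bananas are yellow fruit" alongside unrelated chunks, `store.search("yellow fruit", k=1)` should normally return that chunk, because it is the only one sharing words with the query.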

2. LLM management

  • Prompt engineering
  • LLM evaluation
  • Chains or graphs: multi-agent solutions
  • Tooling integration
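Prompt engineering and evaluation tend to go together: a prompt change is only an improvement if it scores better on a fixed set of test cases. The sketch below shows that loop. It assumes the model is any Python callable that takes a prompt string and returns a completion. `PROMPT_TEMPLATE`, `EvalCase`, `build_prompt` and `evaluate` are hypothetical names, and exact match is just one simple metric.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: the LLM is wrapped as a function from prompt to completion.
LLM = Callable[[str], str]

# Hypothetical prompt template; this is the part you iterate on.
PROMPT_TEMPLATE = (
    "You are a support assistant for a hypothetical product.\n"
    "Answer with a single word.\n"
    "Question: {question}\n"
    "Answer:"
)


@dataclass
class EvalCase:
    question: str
    expected: str


def build_prompt(question: str) -> str:
    return PROMPT_TEMPLATE.format(question=question)


def evaluate(llm: LLM, cases: List[EvalCase]) -> float:
    """Return exact-match accuracy (case- and whitespace-insensitive) over the cases."""
    if not cases:
        return 0.0
    correct = 0
    for case in cases:
        answer = llm(build_prompt(case.question))
        if answer.strip().lower() == case.expected.strip().lower():
            correct += 1
    return correct / len(cases)
```

To use it, run `evaluate` with the same cases before and after editing `PROMPT_TEMPLATE` and compare the scores. For open-ended answers, exact match is usually too strict. A rubric or a model-graded score would then replace it.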

3. LLM interfacing

For your use case, how do you want to interface with the LLM?

  • Plug a chatbot into your site
  • Run the LLM on a wide variety of datapoints
  • Plug a one-off LLM call into your app (e.g. an internal endpoint that solves a specific problem)
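The one-off and batch modes above differ mostly in how the same call is invoked. This minimal sketch again assumes the model is a plain Python callable from prompt to completion. `classify_ticket` and `run_batch` are hypothetical names, and the ticket-labelling task is only an illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LLM = Callable[[str], str]  # assumption: prompt in, completion out


def classify_ticket(llm: LLM, ticket: str) -> str:
    """One-off call: a hypothetical internal endpoint that labels a support ticket."""
    prompt = f"Label this support ticket as 'billing' or 'technical':\n{ticket}\nLabel:"
    return llm(prompt).strip().lower()


def run_batch(llm: LLM, tickets: List[str], max_workers: int = 4) -> List[str]:
    """Batch mode: apply the same call across many datapoints concurrently.

    Results come back in input order because Executor.map preserves ordering.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: classify_ticket(llm, t), tickets))
```

Threads are a reasonable fit here because LLM calls are typically I/O-bound network requests. A production batch runner would also need rate limiting and retries.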

[Figure: pipeline]

Existing Tools

There are numerous tools out there trying to solve one or many parts of LLM specialisation, and it's somewhat staggering trying to keep up with them all. The need for tooling is generally accepted, but as far as I can tell there is no definitive winner in the space that is open source, easy to use, and solves every part of the pipeline.

Other AI Tools

Awesome LLM Lists

This list is far from comprehensive, but if you find a really great tool (or really great list of tools), please be sure to let me know.