How good is your RAG pipeline? Take the test below to find out.
Caution: This is not a checklist. Completion of all items on this test will likely result in customer churn.
  1. I blindly chunk by character limit instead of looking for stop tokens (see the chunking sketch below).
  2. I use more than 80% of the context window for each generation.
  3. I vectorize full documents, without creating chunks.
  4. I only use vector search for retrieving documents.
  5. I pack the context window with more than 10 results for each generation.
  6. I think I'm too cool to use plaintext search in my RAG pipeline (see the hybrid search sketch below).
  7. I still use the text-embedding-ada-002 embedding model from OpenAI.
  8. I do not use a semantic filter or post-processing step.
  9. I blindly process user queries without protecting against prompt injection.
  10. I blindly dump user documents into my vector index without any pre-processing or reformatting.
  11. I don't inject context into my document chunks before vectorizing them (see the chunk context sketch below).
  12. I still use the standard RAG prompt from the LangChain tutorial.
  13. I evaluate my RAG pipeline performance using vibes (see the eval sketch below).
  14. I don't use LlamaParse for processing PDFs.
  15. I don't use overlapping chunks to prevent information loss.
  16. I rely on cosine similarity alone to rank results (see the reranking sketch below).
  17. I utilize a minimum cosine similarity threshold to filter out irrelevant results.
  18. I do not use query expansion (see the query expansion sketch below).
  19. I have not experimented with chunk sizes.
  20. I have not updated my pipeline in the last 6 months.
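
For the items marked above, here are minimal sketches of the more virtuous alternatives. They are illustrative only: the parameters are made up, and any helper passed in as an argument (retrievers, embedding calls) stands in for your own code.

Chunking (items 1, 15, and 19): a minimal sketch of sentence-aware chunking with overlap, instead of cutting blindly at a character limit. The size and overlap defaults are placeholders to experiment with, not recommendations.

```python
import re

def chunk_text(text: str, max_chars: int = 1000, overlap_sentences: int = 2) -> list[str]:
    # Split on sentence boundaries instead of an arbitrary character offset (item 1).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for sentence in sentences:
        if current and length + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few sentences forward so information that spans a
            # chunk boundary is not lost (item 15).
            current = current[-overlap_sentences:]
            length = sum(len(s) + 1 for s in current)
        current.append(sentence)
        length += len(sentence) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```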
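
Hybrid search (items 4 and 6): a minimal sketch that merges a dense (vector) ranking with a plaintext/BM25 ranking using reciprocal rank fusion. The two retrievers are passed in as callables because this sketch assumes nothing about your vector store or search engine.

```python
from typing import Callable

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Documents that rank well in either list float to the top; k=60 is the value
    # from the original RRF paper, not a tuned choice.
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(
    query: str,
    vector_search: Callable[[str], list[str]],   # your dense retriever, returns ranked doc ids
    keyword_search: Callable[[str], list[str]],  # your BM25 / full-text retriever, returns ranked doc ids
    top_k: int = 5,
) -> list[str]:
    return reciprocal_rank_fusion([vector_search(query), keyword_search(query)])[:top_k]
```

Rank fusion is used here because it only needs ranks, so you never have to reconcile cosine similarities with BM25 scores.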
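
Chunk context (items 10 and 11): a minimal sketch that prepends the document title and section to a chunk before embedding it, so the vector carries provenance the bare chunk text lacks. The `embed` callable stands in for whatever embeddings model you use.

```python
from typing import Callable

def contextualize_chunk(chunk: str, doc_title: str, section: str) -> str:
    # Prepend provenance so the embedding captures more than the bare chunk text (item 11).
    return f"Document: {doc_title}\nSection: {section}\n\n{chunk}"

def index_chunk(
    chunk: str,
    doc_title: str,
    section: str,
    embed: Callable[[str], list[float]],  # your embeddings call
) -> tuple[list[float], dict]:
    vector = embed(contextualize_chunk(chunk, doc_title, section))
    # Keep the original chunk text and its provenance alongside the vector.
    metadata = {"title": doc_title, "section": section, "text": chunk}
    return vector, metadata
```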
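
Reranking (items 8, 16, and 17): a minimal sketch of a post-processing step that rescores candidates with a cross-encoder instead of trusting cosine similarity alone. This assumes the sentence-transformers package; the model name is one common choice, not an endorsement.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, passage) pair with the cross-encoder and keep the best few.
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```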
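
Query expansion (item 18): a minimal sketch that asks an LLM for a few rephrasings of the user query; retrieve for each variant and merge the results (the rank fusion function above works here too). This assumes the openai Python client; the model name and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()

def expand_query(query: str, n: int = 3, model: str = "gpt-4o-mini") -> list[str]:
    # Ask the model for alternative phrasings, one per line; keep the original query too.
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Rewrite the following search query {n} different ways, one per line:\n{query}",
        }],
    )
    content = response.choices[0].message.content or ""
    variants = [line.strip() for line in content.splitlines() if line.strip()]
    return [query] + variants[:n]
```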
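
Eval (item 13): a minimal sketch of measuring retrieval with a number instead of vibes: a small labeled set of (query, expected document id) pairs and hit rate at k. The `retrieve` callable stands in for your pipeline's retrieval step.

```python
from typing import Callable

def hit_rate_at_k(
    labeled_queries: list[tuple[str, str]],     # (query, id of the document that should come back)
    retrieve: Callable[[str, int], list[str]],  # your retrieval step: (query, k) -> ranked doc ids
    k: int = 5,
) -> float:
    # Fraction of queries whose expected document appears in the top-k results.
    hits = sum(1 for query, expected in labeled_queries if expected in retrieve(query, k))
    return hits / len(labeled_queries)
```

Even a few dozen labeled queries gives you something to compare against when you change chunk sizes, embeddings, or prompts (items 19 and 20).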