Comparative and up-to-date information on the selection of Large Language Models for Artificial Intelligence projects.

We will be happy to hear and include news, suggestions and ideas, so don't hesitate to contact us if you have any.

| Name | Aggregated Score | Privacy | User Feedback Score (LMSYS) | Context size | Pricing/Cost |
| --- | --- | --- | --- | --- | --- |
| — | 92.67% | Low. Data not used to train. Can be revised to enforce policies and laws. | 1225 | 8K tokens | Prompt: $0.03/1k tokens; Completion: $0.06/1k tokens |
| — | 92.67% | Low. Data not used to train. Can be revised to enforce policies and laws. | 1225 | 32K tokens | Prompt: $0.06/1k tokens; Completion: $0.12/1k tokens |
| — | 80.23% | Low. Data not used to train. Can be revised to enforce policies and laws. | 1143 | 4K tokens | $0.002/1k tokens |
| — | 78.10% | Low. Enterprise-grade privacy (Vertex AI). Can be revised to enforce policies and laws. | 1042 | 8K tokens | More information here |
| — | 66.70% | High | N/A | 2K tokens | $6200/month |
| — | 55.80% | High | 854 | 2K tokens | $1374/month |
| — | 53.70% | High | 1054 | 2K tokens | $1374/month |
| — | 51.13% | High | 952 | 2K tokens | $2754/month |
| — | N/A | Low. Data not used to train unless feedback was given by the user. Can be revised to enforce policies and laws. | 1195 | 100K tokens | Prompt: $0.01102/1k tokens; Completion: $0.03268/1k tokens |
| — | N/A | Low. Data not used to train unless feedback was given by the user. Can be revised to enforce policies and laws. | 1153 | 100K tokens | Prompt: $0.00163/1k tokens; Completion: $0.00551/1k tokens |
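For the token-priced models above, the cost of a single request can be estimated from the per-1k-token prompt and completion prices. A minimal sketch (the function name and example numbers are ours; prices follow the prompt/completion pattern used in the table):

```python
def request_cost(prompt_tokens, completion_tokens,
                 prompt_price_per_1k, completion_price_per_1k):
    """Estimate the dollar cost of one API request from per-1k-token prices."""
    return (prompt_tokens / 1000 * prompt_price_per_1k
            + completion_tokens / 1000 * completion_price_per_1k)

# Example: a model priced at $0.03/1k prompt tokens and $0.06/1k completion
# tokens, with a 1,500-token prompt and a 500-token answer:
cost = request_cost(1500, 500, 0.03, 0.06)
print(f"${cost:.3f}")  # prints $0.075
```

Multiplying this per-request figure by expected monthly traffic makes the token-priced models directly comparable with the flat monthly costs of the self-hosted options.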
    Criteria used in our LLM Comparison

    We have combined a number of elements that we believe give a fairly comprehensive and overall useful overview. We compare the model’s features with some respected third-party scores and benchmarks.

    • Privacy: Privacy is mostly enhanced if you own your model and infrastructure, giving you full control over your data. Overall, well-known LLMs such as GPT are less private than open-source ones, because with open-source models you are the one who decides where the model is going to be hosted, and you have full control over it.
    • User Feedback Score: Based on the LMSYS leaderboard. LMSYS is an organization that aims to provide LLMs that are open source and available to everyone, being also the creators of Vicuna.
    • Context size: The context size refers to how many tokens the LLM can handle. Tokens are the basic units of text or code that an LLM uses to process and generate language. Tokens can be characters, words, subwords, or other segments of text or code, depending on the chosen tokenization method or scheme.
    • Cost: We’ve gathered information from well-known services (like OpenAI or Anthropic) and, for the open-source models, we’ve utilized different hardware based on the requirements recommended by those models, apart from the ones we have tested ourselves.
    • Aggregated score: An average score between three state-of-the-art benchmarks: MMLU (Massive Multi-task Language Understanding), HellaSwag (Commonsense tasks), and ARC (Reasoning).
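Context size, as described above, is measured in tokens rather than characters or words, so a prompt has to be budgeted against the model's window. A rough sketch of that budgeting, using the common approximation of about 4 characters per token for English text (the function names and the 4-chars heuristic are ours; exact counts require the model's actual tokenizer, e.g. OpenAI's tiktoken):

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate via the ~4 characters/token heuristic for
    English text. A real tokenizer gives exact, model-specific counts."""
    return max(1, round(len(text) / 4))

def fits_context(text: str, context_tokens: int, reply_budget: int = 500) -> bool:
    """Check whether a prompt leaves room for a reply within the window."""
    return approx_token_count(text) + reply_budget <= context_tokens

prompt = "Summarize the quarterly report in three bullet points. " * 100
print(approx_token_count(prompt), fits_context(prompt, 8_192))
```

This is why the 100K-token models stand out for long-document use cases: the same prompt that overflows a 2K window fits comfortably with room for a long reply.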
    Selecting the right LLM

    Selecting the right Large Language Model is crucial because the effectiveness of your AI solution largely depends on it. Each model has its strengths and weaknesses, and some models may be more suitable for certain tasks than others. Additionally, models vary in terms of resource requirements, such as the amount of training data needed and processing power. The correct model choice can also have implications for your project's privacy and cost. Here are a few factors to consider when choosing a Large Language Model:

    • Accuracy: How well can the model perform the task you need? Can it generate accurate and coherent responses? Can it understand and generate text in the language or languages you require?
    • Efficiency: How resource-intensive is the model? Does it require a lot of processing power? How long does it take to generate responses?
    • Privacy: Does the model ensure data privacy? Can it handle sensitive data securely? Is a solid Service Level Agreement enough, or does your use case require strict control over data access and running everything on-prem?
    • Cost: What is the cost to use or refine the model? Are there associated costs with using the necessary infrastructure to run the model? What’s the cost of associated databases for long-term storage in your use case?
    • Flexibility: Can the model adapt to different tasks or contexts? Is it easy to customize or adapt to your specific needs?
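One simple way to make the trade-offs above explicit is a weighted score per candidate model. The sketch below is purely illustrative: the weights, the two candidate names and their 0-10 ratings are made up for the example, not values from our comparison table.

```python
# Illustrative weighted scoring of candidate models. All weights and
# ratings (0-10 scale) are invented for this example.
weights = {"accuracy": 0.35, "efficiency": 0.15, "privacy": 0.25,
           "cost": 0.15, "flexibility": 0.10}

candidates = {
    "hosted-model":      {"accuracy": 9, "efficiency": 8, "privacy": 4,
                          "cost": 5, "flexibility": 6},
    "self-hosted-model": {"accuracy": 7, "efficiency": 6, "privacy": 9,
                          "cost": 7, "flexibility": 8},
}

def weighted_score(ratings):
    """Combine per-criterion ratings into one score using the weights."""
    return sum(weights[k] * ratings[k] for k in weights)

# Rank candidates, best first. With these invented numbers the
# privacy-heavy weighting favors the self-hosted option (7.45 vs 6.70).
for name, ratings in sorted(candidates.items(),
                            key=lambda kv: weighted_score(kv[1]),
                            reverse=True):
    print(f"{name}: {weighted_score(ratings):.2f}")
```

Adjusting the weights to your project (e.g. raising "privacy" for regulated data) changes the ranking, which is exactly the point: the right model depends on your requirements, not on a single leaderboard.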
    Available LLM options

    The market is incredibly active, with new developments happening almost daily, but today we have curated ten models that are powerful and well-established. These models have been chosen for their popularity in both commercial (OpenAI, Google…) and open-source solutions (like Falcon, currently #1 on Hugging Face's leaderboard).

    If you think there are others that should be included, feel free to drop us a line.


    The proprietary options from OpenAI, Anthropic, and Google are currently the most comprehensive and powerful.
    As for open-source models, the best option available is Falcon-40B-instruct, which offers a reasonable cost given its benchmark scores, with new models appearing every week. Open-source models are very appealing because you can host your own LLM instance on a server under your control, opening up different possibilities in terms of security, privacy and cost flexibility. Our approach at PrivateGPT is a combination of models: we are about creating hybrid systems that can combine and optimize the use of different models based on the needs of each part of the project. With the right configuration and design, you can combine different LLMs to offer a great experience while meeting other requirements in terms of security and privacy.

    ChooseLLM is an initiative by PrivateGPT

    If this sounds interesting for your organisation, submit your application and let us know about your needs and ideas, and we'll get in touch if we can help you.

    A spin-off project by the agile monkeys.