As a Chief Information Officer (CIO), you’re likely exploring ways to leverage artificial intelligence (AI) to enhance your business operations. One area of AI that has gained significant traction in recent years is Large Language Models (LLMs). These models can generate human-like text, making them useful for a variety of applications, from customer service chatbots to content creation tools.
In this blog post, we’ll explore two main avenues for deploying LLMs: self-hosted open-weight LLMs and commercial offerings from tech giants like Azure, Google, and OpenAI.
Self-Hosted Open-Weight LLMs
The analogue of open-source software in the world of LLMs is the open-weight model. Self-hosted open-weight LLMs are models that you host on your own servers.
The structure of LLMs is well defined, based on the transformer architecture. What distinguishes a particular LLM within this structure are the learned “weights” between the neural network layers once training is completed. Training an LLM from scratch is a major undertaking requiring expertise and significant hardware resources. Luckily, there are open-weight models released under commercially friendly licenses, such as Llama from Meta and Mistral from Mistral AI. Moreover, there are specialized LLMs produced by fine-tuning open-weight foundational models.
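To make this concrete, here is a minimal sketch of loading an open-weight model for local inference with the Hugging Face transformers library. The model identifier is just an example; substitute any open-weight checkpoint your license permits.

```python
# Minimal sketch: local inference on an open-weight model with the
# Hugging Face `transformers` library. The model name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # example open-weight model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarize the benefits of self-hosted LLMs in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```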
These models offer several benefits:
- Security and Privacy: Hosting the model on your own servers can provide enhanced data security and privacy.
- Customization: You can fine-tune the model to better suit your specific use cases (see the fine-tuning sketch after this list).
- Control: You have full control over the data used for fine-tuning and the outputs generated by the model.
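On the customization point, parameter-efficient fine-tuning is a common approach. Below is a minimal sketch using the Hugging Face peft library with LoRA; the model name and hyperparameters are illustrative, not recommendations, and a real run would also need a dataset and a training loop.

```python
# Minimal sketch of parameter-efficient fine-tuning (LoRA) with the
# Hugging Face `peft` library. Model name and hyperparameters are
# illustrative; a real run also needs data and a training loop
# (e.g., via the `transformers` Trainer).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```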
However, self-hosting LLMs can be challenging. It requires significant computational resources and technical expertise. You’ll need to consider where to host the LLMs and whether you have the necessary infrastructure. There are quantized variants of these models, with the weights stored at lower numerical precision, that can run (for inference, not training) on the CPU alone; however, for reasonable response times in many use cases you will need GPUs. Unfortunately, GPUs are scarce commodities these days, and there are long lead times in procuring server-grade GPUs.
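As an example of the CPU-only option, here is a sketch using the llama-cpp-python bindings with a quantized model in GGUF format. The file path is a placeholder for whichever quantized model you have downloaded.

```python
# Sketch of CPU-only inference on a quantized model using the
# `llama-cpp-python` bindings. The model path is a placeholder for a
# quantized GGUF file you have downloaded; no GPU is required.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=8,   # CPU threads to use
)

result = llm("Q: What is an open-weight LLM? A:", max_tokens=128)
print(result["choices"][0]["text"])
```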
A key task is to right-size the infrastructure: RAM, GPU, CPU, and disk requirements will be driven by the model’s parameter count, i.e., the number of weights (look for 7B, 13B, 70B, etc. in the model name), the quantization level (Q8, Q5, Q4, etc. in the model name), the number of simultaneous users, and the latency budget.
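As a rough back-of-envelope, the memory needed just to hold the weights is approximately the parameter count times the bytes per weight at the chosen quantization level. The sketch below illustrates the arithmetic; real deployments need additional headroom for the KV cache, activations, and the serving stack.

```python
# Back-of-envelope sizing: memory to hold the weights is roughly
# parameter count x bytes per weight. Real deployments need extra
# headroom for the KV cache, activations, and the serving runtime.
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8": 1.0, "Q5": 0.625, "Q4": 0.5}

def weight_memory_gb(params_billions: float, quant: str) -> float:
    return params_billions * 1e9 * BYTES_PER_WEIGHT[quant] / 1024**3

for params in (7, 13, 70):
    for quant in ("FP16", "Q8", "Q4"):
        print(f"{params}B @ {quant}: ~{weight_memory_gb(params, quant):.1f} GB")
# e.g., a 7B model at Q4 needs roughly 3.3 GB just for the weights
```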
It is probably a good idea to deploy dedicated servers for a use case or a group of related use cases instead of a single huge server that handles every use case. This gives you more control over selecting the LLM that best suits each use case, as well as over access permissions. At TAZI, we give Platform administrators control over who can access which LLM instance.
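One way to express such a policy is a simple mapping from use case to a dedicated LLM endpoint plus the roles allowed to call it. The structure below is purely hypothetical, for illustration only, and is not TAZI’s actual configuration format.

```python
# Purely hypothetical sketch of per-use-case LLM routing with access
# control (NOT TAZI's actual configuration format). Each use case gets
# its own dedicated endpoint and an allow-list of roles.
LLM_ROUTES = {
    "claims-summarization": {
        "endpoint": "http://llm-claims.internal:8000",  # hypothetical host
        "model": "mistral-7b-instruct-Q5",
        "allowed_roles": {"claims-analyst", "admin"},
    },
    "marketing-copy": {
        "endpoint": "http://llm-marketing.internal:8000",
        "model": "llama-13b-chat-Q4",
        "allowed_roles": {"marketing", "admin"},
    },
}

def resolve_endpoint(use_case: str, role: str) -> str:
    route = LLM_ROUTES[use_case]
    if role not in route["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not access {use_case!r}")
    return route["endpoint"]
```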
Commercial Offerings
Commercial offerings, such as those from Azure, Google, and OpenAI, provide access to state-of-the-art LLMs with robust support and infrastructure.
While commercial offerings provide ease of use and access to cutting-edge technology, they may not offer the same level of control and customization as self-hosted models. Costs can also be a factor, as these services typically operate on a pay-as-you-go pricing model. Many providers guarantee, in the terms and conditions of their pay-per-use APIs, that your data will not be used for LLM training, which helps prevent sensitive information leakage. You should check with your legal team whether these guarantees satisfy your company policies and industry regulations.
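For comparison, calling a commercial offering is typically a few lines against a hosted API. The sketch below uses the OpenAI Python SDK as one example; the model name is illustrative, and Azure and Google expose similar client libraries.

```python
# Sketch of calling a commercial hosted LLM via the OpenAI Python SDK
# (v1 client). The model name is illustrative; usage is billed per
# token. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "List two risks of sending PII to an LLM."},
    ],
)
print(response.choices[0].message.content)
```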
Conclusion
Choosing between self-hosted open-weight LLMs and commercial offerings depends on your company’s specific needs, resources, and technical capabilities. Both avenues offer their own sets of advantages and challenges. As a CIO, understanding these options can help you make an informed decision about deploying LLMs in your company.
At TAZI AI, we support multiple LLM backends, open-weight and commercial, in recognition of this variety of requirements. The administrator can configure which LLMs are accessible to model and application developers on the Platform.
Remember, the world of LLMs is rapidly evolving. Staying updated with the latest research and developments in the field will ensure that your company can effectively leverage these powerful tools to enhance your business operations.