A product manager today faces a key architectural question with AI : whether to use a small language model or a large language model.
The pace of innovation in the field clouds the answer. Each day, researchers publish novel findings on performance, discover new techniques to implement, & surface new challenges to wrestle with.
This is my current mental model of when to choose a large or small model :
When to choose a large model :
- time to ship is critical : many of these models are available via API & need only the company's data formatted as an index or loaded into a vector database, which an engineer can set up within a few hours for a working beta.
- the company would prefer to rely on external experts to drive innovation within the models.
- the company has no plan or interest in staffing a team to manage AI infrastructure or developing deep machine learning expertise in-house.
- the product lead would like to minimize career risk by choosing a well-known player.
- the company believes the relatively high costs of using these models will decline with time & scale.
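To make the time-to-ship point concrete, here is a minimal sketch of the index-then-prompt pattern. Everything in it is an assumption for illustration : `embed` is a toy bag-of-words stand-in for a real embedding model, `TinyIndex` is a hypothetical in-memory substitute for a vector database, & the call to a hosted model is left out entirely. It only shows the shape of the work, not any vendor's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Hypothetical in-memory substitute for a vector database."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def search(self, query: str, k: int = 2):
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, index: TinyIndex, k: int = 2) -> str:
    """Assemble the text that would be sent to a hosted large model."""
    context = "\n".join(f"- {doc}" for doc in index.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include single sign-on.",
    "Support is available by email on weekdays.",
]
index = TinyIndex(docs)
prompt = build_prompt("How long do refunds take?", index, k=1)
print(prompt)
```

In a real product, the toy pieces would be swapped for an embedding model & a managed vector store; the remaining engineering is mostly formatting the data.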
When to choose a small model :
- the team has or would like to develop intellectual property around machine learning as a competitive advantage or mechanism to increase the value of the business.
- the company uses proprietary or sensitive data within its models and needs strict controls / guarantees for compliance or legal reasons. The company doesn’t believe sensitive data masking & indexes provide enough security.
- the product has an edge architecture : models are trained or run on mobile phones or hardware at the edge, away from the data center. The computing limitations of those devices, plus the benefit of running models locally (primarily cost), demand a smaller model.
- the business would like to minimize vendor lock-in, preserving the agility to switch to another provider.
- the business prefers to manage its AI costs actively by instrumenting code & training built-for-purpose models.
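As one illustration of what instrumenting code to manage AI costs might look like, here is a minimal sketch that attributes token usage to product features. The `CostTracker` class, the whitespace-based token estimate, & the price per 1,000 tokens are all hypothetical; a real system would use the model's own tokenizer & actual billing rates.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Hypothetical per-feature cost meter for model calls."""

    price_per_1k_tokens: float  # assumed price, not a real vendor rate
    tokens_by_feature: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, feature: str, prompt: str, completion: str) -> int:
        """Add a call's estimated tokens to the feature's running total."""
        # Crude estimate: one token per whitespace-separated word.
        tokens = len(prompt.split()) + len(completion.split())
        self.tokens_by_feature[feature] += tokens
        return tokens

    def cost(self, feature: str) -> float:
        """Estimated spend for one feature."""
        return self.tokens_by_feature[feature] / 1000 * self.price_per_1k_tokens

    def report(self) -> dict:
        """Estimated spend for every feature seen so far."""
        return {f: self.cost(f) for f in list(self.tokens_by_feature)}


tracker = CostTracker(price_per_1k_tokens=2.0)
tracker.record("search", "summarize this support ticket", "customer wants a refund")
tracker.record("autocomplete", "dear customer", "thank you for writing")
print(tracker.report())
```

A report like this shows which features drive spend, which in turn suggests where a built-for-purpose model would pay for itself first.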
There’s a third option : MLOps businesses offer managed infrastructure for running small language models, providing simpler management & reduced operating expense while preserving the freedom of smaller models.
As the nascent market matures, customers will elect their preferred deployment option. Today, it’s too early to predict which approach will capture the majority of spend & which infrastructure choice suits different use cases best.
We can say, though, that managed large language models have a head start, as Microsoft’s earnings showed with its $900m ARR AI business.