The Promise and Pitfalls of Chaining Large Language Models for Email by @ttunguz


Over the last few weeks I’ve been experimenting with chaining together large language models.

I dictate emails & blog posts often. Recently, I started using Whisper for drafting emails and documents. (Initially there were some issues with memory management, but I’ve since found whisper.cpp, a compiled version that works well on my Mac.)

After trying Google’s Duet, I wondered if I could replicate something similar. I’ve been chaining the Whisper dictation model together with the LLaMA 2 model from Facebook. When drafting an email, I can dictate a response to LLaMA 2, which will then generate a reply using the context from the original email.
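A minimal sketch of this kind of chain, assuming the two models are wrapped as plain Python callables. The names `transcribe`, `generate`, `build_reply_prompt`, and `draft_reply` are hypothetical and are not part of Whisper, whisper.cpp, or LLaMA 2. The prompt wording is also only an illustration.

```python
from typing import Callable


def build_reply_prompt(original_email: str, dictated_notes: str) -> str:
    """Combine the email being answered with the dictated instructions."""
    return (
        "You are drafting an email reply on my behalf.\n\n"
        "Original email:\n"
        f"{original_email.strip()}\n\n"
        "My dictated notes for the reply:\n"
        f"{dictated_notes.strip()}\n\n"
        "Reply:"
    )


def draft_reply(
    audio_path: str,
    original_email: str,
    transcribe: Callable[[str], str],  # hypothetical wrapper around the speech model
    generate: Callable[[str], str],    # hypothetical wrapper around the language model
) -> str:
    """Stage 1: speech to text. Stage 2: text plus email context to a draft."""
    notes = transcribe(audio_path)
    prompt = build_reply_prompt(original_email, notes)
    return generate(prompt)
```

Passing the models in as callables keeps the chain independent of how each one is actually run, so either stage can be swapped or stubbed out when debugging.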

So far it works sometimes, but there are some clear limitations:

First, the default tone of the generated emails is far too formal.

Second, if I prompt LLaMA 2 to use a more casual tone, it often goes too far in the other direction. The problem is a lack of nuanced context – the appropriate level of familiarity varies greatly between emails to close colleagues versus board communications or potential investors. Without that nuance labeled and incorporated into the training data, it’s hard for the model to strike the right tone.

Third, in multi-party email threads, things can get confusing. If Lauren introduces Rafa to me, then Rafa bccs Lauren on the email, LLaMA 2 often replies as Lauren.
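One possible mitigation, which I have not tested, is to spell out the roles at the top of the prompt instead of leaving the model to infer them from the thread. The sketch below assumes a simplified `Message` record and a hypothetical `role_preamble` helper. Real email headers are messier than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    sender: str
    to: List[str]
    cc: List[str] = field(default_factory=list)  # bcc is not visible to recipients


def role_preamble(thread: List[Message], me: str) -> str:
    """State who is writing, who is being answered, and who is only a bystander."""
    # The reply goes to the sender of the most recent message I did not write.
    latest = next((m for m in reversed(thread) if m.sender != me), None)
    if latest is None:
        raise ValueError("thread has no message from anyone else to reply to")

    everyone = {p for m in thread for p in (m.sender, *m.to, *m.cc)}
    bystanders = sorted(everyone - {me, latest.sender})

    lines = [f"Write the reply as {me}.", f"Address it to {latest.sender}."]
    if bystanders:
        lines.append("Do not write as or on behalf of: " + ", ".join(bystanders) + ".")
    return "\n".join(lines)
```

For the introduction example, the thread would be Lauren's message to Rafa and me followed by Rafa's message to me, and the preamble would name me as the author, Rafa as the addressee, and Lauren as someone not to impersonate.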

Fourth, figuring out exactly the right settings for the model can be tough. Sometimes I dictate long emails, in which case the context window (how much the computer listens to before transcribing) should be very long so the system can remember what I’ve said previously.

Other times I’m just returning a very fast email: a quick “see you soon” or “thank you very much.” In that case, a long context window doesn’t make sense, and I’m left waiting for the system to process.
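One way to handle this trade-off is to keep two presets and choose between them before dictating. This is only a sketch: the field names, the token counts, and the 20-second threshold are made-up placeholders, not actual whisper.cpp or LLaMA 2 options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictationSettings:
    context_tokens: int    # hypothetical: how much earlier speech to keep in view
    max_reply_tokens: int  # hypothetical: cap on the length of the generated draft


# Placeholder values chosen for illustration only.
QUICK = DictationSettings(context_tokens=256, max_reply_tokens=64)
LONG = DictationSettings(context_tokens=2048, max_reply_tokens=512)


def pick_settings(expected_seconds: float, threshold_seconds: float = 20.0) -> DictationSettings:
    """Use the long preset only when the dictation is expected to run long."""
    return LONG if expected_seconds >= threshold_seconds else QUICK
```

A quick thank-you note would take the `QUICK` preset and skip the wait, while a long dictated email would take `LONG`.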

I’m wondering whether small errors in the first model compound in the second model. Bad data from the transcription -> inaccurate prompt to the LLM -> incorrect output.
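A back-of-the-envelope way to think about this: if each stage were right independently with some probability, the chain would be right only when every stage is, so the probabilities multiply. The independence assumption and the numbers below are illustrative, not measurements from my setup.

```python
import math
from typing import Iterable


def chain_accuracy(stage_accuracies: Iterable[float]) -> float:
    """Probability that every stage is correct, assuming independent stages."""
    rates = list(stage_accuracies)
    if any(not 0.0 <= r <= 1.0 for r in rates):
        raise ValueError("each accuracy must be between 0 and 1")
    return math.prod(rates)
```

With made-up rates of 0.95 for transcription and 0.90 for the LLM, the chain comes out at 0.95 × 0.90 = 0.855, worse than either stage alone.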

Overall the potential is exciting, but there are still challenges around tone, context, and multi-party interactions that need to be addressed before this can become a seamless productivity tool. In machine learning systems, achieving an 80% solution is pretty rapid. The marginal 15% – the magic behind ML – takes a huge amount of effort, data, & tuning.
