The Fracking of Information by @ttunguz

Large language models enable the fracking of documents. Historically, extracting value from unstructured text files has been difficult. But LLMs do this beautifully, pumping value from one of the hardest places to mine.

We have a collection of thousands of notes researching startups. We are tinkering with deploying large language models on top of them.

Here are some quick observations about our initial experiments:

The Future is Constellations of Models. When faced with a search box, a user might ask quantitative questions. For example, how many people from Google have I met in the last month?

Unfortunately, large language models – at least the ones that we have tested – do not answer quantitative questions in this way.

That’s problematic because users don’t stop to think about the type of query (quantitative, classification, segmentation, prediction, etc.) before they type it into a search box.

To solve this, knowledge management systems will likely employ a constellation of different models. Perhaps the first model will classify the query, then route it to the right machine learning model to answer.
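To make that routing idea concrete, here is a minimal sketch of the first classification-and-dispatch layer. The category names are illustrative, and the keyword-based classifier is a stub; in a real system that step would itself be a small model call.

```python
from typing import Callable, Dict

def classify_query(query: str) -> str:
    """Label the query type. Stubbed with a keyword heuristic so the
    sketch runs without a model; in practice this would be an LLM call."""
    quantitative_markers = ("how many", "count", "average", "total")
    if any(m in query.lower() for m in quantitative_markers):
        return "quantitative"
    return "summarization"

def answer_quantitative(query: str) -> str:
    # Route to a structured store (contacts DB, CRM, SQL) rather than the LLM.
    return "routed to the analytics backend"

def answer_summarization(query: str) -> str:
    # Route to the retrieval + LLM summarization pipeline.
    return "routed to the LLM summarizer"

ROUTES: Dict[str, Callable[[str], str]] = {
    "quantitative": answer_quantitative,
    "summarization": answer_summarization,
}

def route(query: str) -> str:
    return ROUTES[classify_query(query)](query)

print(route("How many people from Google have I met in the last month?"))
```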

Summarization works out of the box. We have been researching the robotic process automation (RPA) space. Here is a subset of that output. One could imagine replacing the background or introduction sections in an investment memo or producing a blog post from this in about 2 minutes on a laptop. Editing remains essential.

Question:
write a summary of the RPA space & the opportunities & challenges within it

Answer (took 119.74 s.):
The context mentions that there are problems with maintaining consistency
and quality in process discovery documents, which can cause issues for
business continuity. There may be multiple factors contributing to these
difficulties such as high attrition rates or lack of experience among
personnel involved during development phase due diligence on large
enterprises. However the most important factor…

source_documents/redacted.txt
the Rpa market is converging with the process discovery market, so
understanding which process is to automate and then automating them are
really important.Rpa is a bottoms-up business and process discovery is
tops down
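For readers curious what a run like the one above involves, here is a minimal sketch of map-reduce summarization over a folder of notes. The `llm` function is a stub standing in for whatever local or hosted model is in use, and the chunk size, file layout, and prompts are illustrative.

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Stand-in for a local model call; returns a placeholder so the
    sketch runs without downloading any weights."""
    return f"[model output for a prompt of {len(prompt)} characters]"

def chunk(text: str, size: int = 2000) -> list[str]:
    """Split a note into pieces small enough for the model's context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_notes(notes_dir: str, question: str) -> str:
    """Map-reduce summarization: summarize each chunk, then combine the partials."""
    partials = []
    for path in Path(notes_dir).glob("*.txt"):
        for piece in chunk(path.read_text()):
            partials.append(llm(f"Summarize, with respect to '{question}':\n{piece}"))
    return llm(f"Combine these notes into one answer to '{question}':\n" + "\n".join(partials))

print(summarize_notes("notes/", "summarize the RPA space & its opportunities & challenges"))
```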

Source Identification Matters. LLMs now link to the source text. In the example above, the model cites the file (whose name I’ve redacted) & the location of the contributing source.

This behavior matters for two reasons. First, it builds trust & credibility in the model. Questions will inevitably arise from summaries, and drilling down to the underlying source should resolve them.

Second, this pattern should limit hallucinations, in which models “invent” answers without basis in the source or training data.
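Here is a sketch of what carrying citations through retrieval can look like: each retrieved passage keeps its file path, and the paths are returned alongside the answer so the reader can drill down to the source. The word-overlap retriever and the `llm` stub are stand-ins for a real embedding index and model.

```python
from dataclasses import dataclass

def llm(prompt: str) -> str:
    """Stand-in for the model call."""
    return "[answer grounded in the supplied context]"

@dataclass
class Passage:
    path: str   # the file this text came from
    text: str

def retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Naive word-overlap ranking as a placeholder for vector search."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q & set(p.text.lower().split())))
    return ranked[:k]

def answer_with_sources(query: str, corpus: list[Passage]) -> dict:
    hits = retrieve(query, corpus)
    context = "\n\n".join(f"[{h.path}]\n{h.text}" for h in hits)
    answer = llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
    # Return the contributing files alongside the answer so the reader can drill down.
    return {"answer": answer, "sources": [h.path for h in hits]}
```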

Ubiquity means being everywhere. Our business maintains a single knowledge repository, but its outputs will appear in email, presentations, investment memos, blog posts, & search results.

New knowledge management systems will find a way to be integrated into all those outputs while respecting permissions, governance, & other policies that matter to a business.
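One way to respect those policies is to filter documents against the requesting user’s permissions before anything reaches the model. A minimal sketch, assuming an illustrative group-based ACL on each document:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    path: str
    text: str
    allowed_groups: set = field(default_factory=set)  # illustrative group-based ACL

def permitted(doc: Document, user_groups: set) -> bool:
    """Only documents the requesting user may read ever reach the model."""
    return bool(doc.allowed_groups & user_groups)

def build_context(docs: list, user_groups: set) -> str:
    """Assemble the prompt context from permitted documents only."""
    visible = [d for d in docs if permitted(d, user_groups)]
    return "\n\n".join(f"[{d.path}]\n{d.text}" for d in visible)
```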

If data is the new oil, then LLMs are the environmentally friendly fracking rigs, blasting value from unstructured text shale formations.
