Loader
Load Content into a Document
[edit]The Loader
interface provides a way to load content into a document. It is used to transform a text content into a document, a structured representation of the content. LinGoose offers loaders for different types of content:
To use a loader, you need to create an instance of your preferred loader. Here we show how to create an instance of a loader using the plain text loader:
pdfLoader := loader.NewPDFToText().WithPDFToTextPath("/opt/homebrew/bin/pdftotext")
kbDocuments := loader.LoadFromSource(context.Background(),"./kb/mydocument.pdf")
A loader produces a document for each content it loads. However documents may contain a huge amount of text, and it’s convenient to split them into smaller parts.
audioLoader := loader.NewWhisper().
WithTextSplitter(textsplitter.NewRecursiveCharacterTextSplitter(2000, 200)).
LoadFromSource(context.Background(), "audio.mp3")
A text splitter is a component that splits a document into documents of a smaller size. The RecursiveCharacterTextSplitter
accepts as parameters the size of the text chunks and the size of chunk overlap.