Unstructured Workflow Endpoint Quickstart


Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
Unstructured API Workflows S3

RAG with Databricks Vector Search with Context from Multiple Sources


Build RAG on Databricks Vector Search, using context that Unstructured preprocesses from multiple sources.
Databricks Introductory notebook

Agentic RAG with Hugging Face smolagents vs Vanilla RAG


Build agentic RAG with the smolagents library and compare the results with vanilla RAG in pure Python.
GPT-4o smolagents Agents DataStax S3 Advanced notebook

Llama3.2 RAG evaluation on unstructured text


Evaluate Llama3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain
GPT-4o Ragas LangChain Llama3.2 Pinecone S3 Advanced notebook

Multimodal RAG: Enhancing RAG outputs with image results


Process a file in S3 with Unstructured and return images in your RAG output
S3 FAISS GPT-4o-mini Advanced notebook

Quantitative Reasoning with tables inside PDFs


From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex
Unstructured API Hex Advanced notebook

PII removal with GLiNER in unstructured data ETL


Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
Unstructured API PII GLiNER Advanced notebook

Custom metadata extraction and self-querying retrieval


Extract custom metadata, and enable metadata pre-filtering in your RAG.
Unstructured API MongoDB Metadata Advanced notebook

Selecting an embedding model for custom data


End-to-end data processing pipeline using the Unstructured Serverless API.
Unstructured API Hugging Face Advanced notebook

RAG with PDFs, LangChain and Llama 3


A RAG system with the Llama 3 model from Hugging Face.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Unstructured data ETL from S3 to SingleStore DB


Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
Unstructured API SingleStoreDB AWS S3 Introductory notebook

Google Drive to DataStax Astra DB


Embed your Google Drive Docs in an Astra Vector Database with Unstructured Serverless API
Unstructured API Google DataStax Introductory notebook

Weaviate RAG quickstart


Embed your local documents in a Weaviate Vector Database with Unstructured Serverless API
Unstructured API OpenAI Weaviate Introductory notebook

Preprocess PDFs in AWS S3, load into Elasticsearch


Ingest PDF documents from an S3 bucket, transform them into normalized JSON with the Unstructured Serverless API, then chunk, embed, and load them into Elasticsearch.
Unstructured API AWS S3 Elasticsearch Introductory notebook

Preprocess documents in Google Drive, load into Databricks Volume


Preprocess documents from Google Drive with the Unstructured Serverless API and load them into a Databricks Volume.
Unstructured API Google Drive Databricks Introductory notebook

Source references in RAG responses


Add document source references to RAG responses based on document metadata.
Unstructured API RAG LangChain Intermediate notebook

Query processed PDF with HuggingChat


Send a PDF to Unstructured for processing, and send a subset of the returned PDF’s processed text to HuggingChat for chatbot-style querying.
Unstructured API 🤗 Hugging Face 🤗 HuggingChat Introductory notebook

Llama 3 Local RAG with emails


Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
Unstructured API LangChain Ollama Llama 3 Introductory notebook

Building RAG With PowerPoint presentations


A RAG solution that is based on PowerPoint files.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Synthetic test dataset generation


Build a Synthetic Test Dataset for your RAG system in 5 easy steps
Unstructured API GPT-4o Ragas LangChain Advanced notebook