After partitioning and chunking, you can have Unstructured generate text-based summaries of detected images. This summarization is done by using models offered through these providers:
Here is an example of the output for a detected image, summarized by using GPT-4o. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
{ "type": "Image", "element_id": "3303aa13098f5a26b9845bd18ee8c881", "text": "{\n \"type\": \"graph\",\n \"description\": \"The graph shows the relationship between Potential (V) and Current Density (A/cm2). The x-axis is labeled 'Current Density (A/cm2)' and ranges from 0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges from -2.5 to 1.5. There are six different data series represented by different colors: blue (10g), red (4g), green (6g), purple (2g), orange (Control), and light blue (8g). The data points for each series show how the potential changes with varying current density.\"\n}", "metadata": { "filetype": "application/pdf", "languages": [ "eng" ], "page_number": 1, "image_base64": "/9j...<full results omitted for brevity>...Q==", "image_mime_type": "image/jpeg", "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf", "data_source": {} }}
Any embeddings that are produced after these summaries are generated will be based on the text field’s contents.
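If you work with this output programmatically, you can read each detected image's generated summary from the text field. The following is a minimal sketch, assuming the elements were saved locally to a hypothetical file named elements.json and that, as in the example above, the summary is itself JSON-encoded (other models or prompts might return plain text instead):

```python
import json

# Load the partitioned elements from a local file (hypothetical filename).
with open("elements.json", encoding="utf-8") as f:
    elements = json.load(f)

for element in elements:
    if element["type"] == "Image":
        # In this example, the summary in "text" is itself a JSON string.
        summary = json.loads(element["text"])
        print(summary.get("type"), summary.get("description"))
```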
To generate image descriptions, specify the following in an Enrichment node in a workflow:
You can change a workflow’s image description settings only through Custom workflow settings. Image summaries are generated only when the Partitioner node in the workflow is also set to use the High Res partitioning strategy. Learn more.
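For reference, the High Res (hi_res) strategy is also what detects Image elements in the first place. The following is a minimal sketch of hi_res partitioning with the open-source unstructured library, shown only for illustration: a Partitioner node set to High Res performs the equivalent step for you, and the image summarization itself still happens in the Enrichment node.

```python
from unstructured.partition.pdf import partition_pdf

# hi_res partitioning detects Image elements and can attach the image
# payload (base64 and MIME type) to each element's metadata.
elements = partition_pdf(
    filename="example.pdf",                # hypothetical input file
    strategy="hi_res",
    extract_image_block_types=["Image"],
    extract_image_block_to_payload=True,
)

for element in elements:
    if element.category == "Image":
        print(element.id, element.metadata.image_mime_type)
```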
Select Image, and then choose one of the following provider (and model) combinations to use: