Self-hosted plugins tutorial
This hands-on tutorial shows how to use the Unstructured self-hosted plugin framework to create a sample plugin. This sample plugin uses a VertexAI model from Google to perform sentiment analysis on the text that Unstructured extracts from documents. For example, given the following custom prompt:
And the following text:
The model returns a sentiment analysis in this format:
Requirements
- A self-hosted deployment of the Unstructured UI and Unstructured API into infrastructure that you maintain in your Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP) account. If you do not have a self-hosted deployment, stop and contact your Unstructured sales representative, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website first.
- A local development machine with Docker Desktop and the Python package and project manager uv installed.
- For sending requests to the plugin through Docker locally, the curl utility installed on the development machine.
- For deploying the plugin to your self-hosted Unstructured UI, access to a container registry that is compliant with the Open Container Initiative (OCI) and that is also reachable from your AWS, Azure, or GCP account. For example:
- For AWS accounts, Amazon Elastic Container Registry (Amazon ECR).
- For Azure accounts, Azure Container Registry.
- For GCP accounts, Google Artifact Registry (GAR).
You must also have the related command-line interface installed and configured on the development machine:
- For AWS accounts, the AWS CLI.
- For Azure accounts, the Azure CLI.
- For GCP accounts, the Google Cloud CLI.
- To call the VertexAI portions of this tutorial:
- The Vertex AI API enabled in the Google Cloud account. Learn how.
- Within the Google Cloud account, a Google Cloud service account and its related credentials.json key file or its contents in JSON format. Create a service account. Create credentials for a service account.
- A single-line string that contains the contents of the downloaded credentials.json key file for the service account (and not the service account key file itself). To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt. In this command, replace <path-to-downloaded-key-file> with the path to the credentials.json key file that you downloaded by following the preceding instructions.
- For macOS or Linux:
- For Windows:
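The single-line credentials string can also be produced with a short Python script that behaves the same on macOS, Linux, and Windows. The following is a minimal sketch, not part of the plugin framework; the function name `credentials_to_single_line` is hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def credentials_to_single_line(key_file_path: str) -> str:
    """Return the contents of a service account key file as one line of JSON."""
    # Parsing and re-serializing validates the JSON and removes all line breaks.
    # Newlines inside values (such as the private key) stay escaped as \n.
    key_data = json.loads(Path(key_file_path).read_text(encoding="utf-8"))
    return json.dumps(key_data, separators=(",", ":"))
```

To use it, call `print(credentials_to_single_line("<path-to-downloaded-key-file>"))` and copy the printed line.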
Getting started
In this section, you set up the local development environment for this tutorial’s plugin. This includes creating a directory for overall plugin development, creating a virtual environment to isolate and version Python and various code dependencies, installing the Unstructured plugin development tools and their dependencies, and creating and initializing the code project for this tutorial’s plugin.
Identify a directory for overall plugin development
We recommend creating or using a centralized directory on your local development machine to use for developing this and other plugins. If you create a new directory, be sure to switch to it after you create it. This tutorial uses a directory named plugins within the current working directory. For example:
Create a virtual environment within the directory
Use uv to create a virtual environment within the directory that you want to use for overall plugin development. After you create the virtual environment, activate it.
This tutorial uses a virtual environment named plugins_3_12_9. This virtual environment uses Python 3.12.9. If this Python version is not installed on the system, uv installs it first. For example:
Install the Unstructured plugin development tools and their dependencies
Use uv to install the Unstructured plugin development tools and their dependencies into this virtual environment. These tools and their dependencies will be the same for all plugins that you develop that use this virtual environment.
The dependent cookiecutter package is a command-line utility that uses techniques such as wizards along with Python project templates to initialize new projects based on user input.
Create and initialize this tutorial's code project for the plugin
- Use the unstructured-plugins new command to create the starter code for this tutorial’s plugin development project. This command starts a wizard that creates a new directory for developing this plugin and then creates the plugin’s starter files and subdirectories within that directory:
- When prompted, enter a display name for the plugin, and then press Enter. This tutorial uses Sentiment Analysis as the plugin’s display name:
- Next, enter the plugin’s type, and then press Enter. This tutorial uses sentiment as the plugin’s type:
- Next, enter the plugin’s subtype, and then press Enter. This tutorial uses analysis as the plugin’s subtype:
- A project folder is created within the centralized plugins directory. The project folder is named plugin- followed by the plugin’s type, another dash, and the plugin’s subtype. For this tutorial, the project folder is named plugin-sentiment-analysis.
- Switch to the plugin’s project folder and then use uv to install and update this project’s specific code dependencies:
Write the plugin
In this section, you write the plugin’s runtime logic. This tutorial’s logic is primarily within the project’s src/plugin_sentiment_analysis/__init__.py file.
Add user interface settings
In this step, you add the user interface (UI) settings for the plugin. The UI settings are the fields that users see when they add the plugin as a node to a workflow’s visual DAG designer in the UI. The UI settings are defined in the __init__.py file of the plugin project’s src/plugin_<type>_<subtype> subfolder. These settings are specified in the __init__.py file’s PluginSettings class, which is a subclass of the Pydantic BaseModel class. The BaseModel class provides a Pydantic implementation of various type validation, data parsing, and serialization functionality.
- In the project’s src directory, under the plugin_sentiment_analysis subdirectory, open the __init__.py file.
- In the __init__.py file, add the necessary imports to capture VertexAI settings that the user sets in the UI. To do this, add the following from...import statements to the top of the file:
Literal is a type hint in Python that restricts a field to specific literal values (such as strings, numbers, or booleans). It enforces that the input must match one of the specified options.
SecretStr is a specialized string type in Pydantic for sensitive data (such as passwords and API keys). It masks the value in fields by displaying *****.
- In the __init__.py file’s PluginSettings class, replace the sample string_field setting definition with settings for the location, credentials, and model fields. The class definition should now look as follows:
- The location field specifies the location of the VertexAI API. The field in the UI’s help pane for the plugin node will display the title of API Location.
- The credentials field specifies the JSON credentials for the VertexAI API. The field in the UI will have the title of Credentials JSON. Specifying the SecretStr type displays the field’s text with asterisks.
- The model field specifies the model for VertexAI to use. The field in the UI will have the title of Model. The default value for this field is gemini-1.5-flash.
- At run time, the PluginSettings class reads these fields’ values from the UI and writes them as a JSON dictionary into a settings.json file in the project’s root for the plugin to read from later.
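The three settings described above could be declared along the following lines. This is a minimal sketch that assumes only the Pydantic package; the list of allowed locations, the default location, and the description strings are illustrative assumptions, and the class generated by the plugin template may carry additional members.

```python
from typing import Literal

from pydantic import BaseModel, Field, SecretStr


class PluginSettings(BaseModel):
    # Assumption: an illustrative set of regions; list the regions you actually use.
    location: Literal["us-east1", "us-central1", "us-west1"] = Field(
        default="us-east1",
        title="API Location",
        description="Location of the VertexAI API.",
    )
    # SecretStr keeps the raw value out of reprs and logs.
    credentials: SecretStr = Field(
        title="Credentials JSON",
        description="Single-line contents of the service account key file.",
    )
    model: str = Field(
        default="gemini-1.5-flash",
        title="Model",
        description="VertexAI model to use.",
    )
```

Because `credentials` has no default, it is a required field, and the real value is only available through `get_secret_value()`.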
Integrate with VertexAI
- Add the necessary VertexAI dependencies:
- At the top of the __init__.py file, add the necessary import statements for calling the VertexAI API and for standard Python logging and JSON parsing:
- In the __init__.py file’s Plugin class, replace the __post_init__ function body with the following definition:
- The __post_init__ function is called after the Plugin class is initialized. The function reads in the UI field values from the settings.json file that the PluginSettings class wrote to earlier.
- The function then prepares the authorization credentials that were provided in the UI to be used by VertexAI.
- The aiplatform.init function initializes the VertexAI API with the specified location, project ID, and authorization credentials.
- The GenerativeModel class gets the model to be used that was specified in the UI.
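The initialization steps described above can be sketched as two standalone functions. This is an illustration under stated assumptions, not the plugin's actual `__post_init__` body: the function names are hypothetical, and the second function assumes that the google-cloud-aiplatform package (which provides the `vertexai` module) and google-auth are installed.

```python
import json


def parse_service_account_info(credentials_json: str) -> dict:
    """Parse the single-line credentials string and check the fields used below."""
    info = json.loads(credentials_json)
    required = ("project_id", "client_email", "private_key")
    missing = [key for key in required if key not in info]
    if missing:
        raise ValueError(f"credentials JSON is missing: {', '.join(missing)}")
    return info


def init_vertex_model(location: str, credentials_json: str, model_name: str):
    """Initialize VertexAI and return the generative model to call."""
    # Imported lazily so the helper above works without the Google packages.
    from google.cloud import aiplatform
    from google.oauth2 import service_account
    from vertexai.generative_models import GenerativeModel

    info = parse_service_account_info(credentials_json)
    credentials = service_account.Credentials.from_service_account_info(info)
    # The project ID comes from the key file itself, so users do not enter it.
    aiplatform.init(
        project=info["project_id"], location=location, credentials=credentials
    )
    return GenerativeModel(model_name)
```

Validating the key file's fields before calling the API turns a malformed Credentials JSON value into a clear error message instead of an authentication failure later on.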
- In the __init__.py file’s Plugin class, just before the run function, add the prompt text to be sent to VertexAI. At run time, this prompt, along with a piece of text that Unstructured extracts from the document, is sent to VertexAI for sentiment analysis:
- In the __init__.py file’s Plugin class, replace the run function body with the following definition:
- The run function is called once for every file that is processed. The function takes a list of the elements that Unstructured generated from the file as input.
- Each element in the list of elements is a dictionary that contains the text extracted from the document and its related metadata.
- The function sends the prompt and the element’s text to the model.
- The function then adds the sentiment analysis output to the element’s metadata field.
- After the last element’s sentiment analysis is output into the last element’s metadata field, the entire updated list’s contents are given as input into the next node in the workflow’s DAG.
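The per-element loop described above can be sketched independently of VertexAI by passing the model call in as a function. Everything here is an assumption made for illustration: the prompt text, the `add_sentiment` name, and the idea that the model replies with a JSON object containing `toxicity`, `emotion`, and `intent` keys.

```python
import json
from typing import Callable

# Hypothetical prompt; substitute the prompt text that your plugin defines.
PROMPT = (
    "Analyze the sentiment of the following text. Respond only with a JSON "
    'object that has the keys "toxicity", "emotion", and "intent".\n\nText: '
)


def add_sentiment(elements: list[dict], generate: Callable[[str], str]) -> list[dict]:
    """Add sentiment fields to each element's metadata and return the list."""
    for element in elements:
        text = element.get("text", "")
        if not text:
            continue  # nothing to analyze
        reply = generate(PROMPT + text)
        try:
            analysis = json.loads(reply)
        except json.JSONDecodeError:
            continue  # leave the element unchanged if the reply is not JSON
        if not isinstance(analysis, dict):
            continue
        metadata = element.setdefault("metadata", {})
        for key in ("toxicity", "emotion", "intent"):
            if key in analysis:
                metadata[key] = analysis[key]
    return elements
```

In the plugin, `generate` would wrap the model call; in a unit test, it can be a stub that returns a fixed JSON string, which keeps the test fast and independent of network access.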
Run plugin tests locally with pytest
In this section, you manually run the plugin’s tests locally using pytest to make sure that the plugin’s logic is working as expected before further testing in Docker and eventual deployment for use in the UI.
In practice, you would typically use a continuous integration and continuous deployment (CI/CD) pipeline to automate running these tests. If any of the tests fail, the pipeline should stop and notify you of the failure. If all of the tests pass, the pipeline should then continue by running the plugin in Docker as a further test.
- Add the necessary pytest dependencies. Also add a dependency on the dotenv package, which is used to read environment variables from a local .env file:
- In the project’s test directory, at the top of the test_plugin.py file, add the following import statements to enable reading local environment variables. Also, call the load_dotenv function to load the environment variables from the .env file:
- In the test_plugin.py file, update the following from...import statement to find the specified classes that are defined in the src/plugin_sentiment_analysis folder:
- In the root of the project’s test directory, add a blank __init__.py file. This file is required to allow the src directory to be seen by the test directory to enable the preceding from...import statement to work.
- In the test_plugin.py file, replace the plugin function body with the following definition:
- The plugin function is a fixture that sets up the plugin’s infrastructure for the test_plugin test function that follows.
- The function reads the VERTEXAI_CREDENTIALS environment variable from the .env file that you will create next.
- Instead of using the settings.json file that would normally be used by the PluginSettings class, the function creates a temporary settings.json file just for these tests. This temporary file contains sample values for the API Location and Credentials JSON fields that users would have otherwise specified when using the plugin in the UI.
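The temporary settings file that the fixture creates can be sketched with the standard library alone. The helper name `write_temp_settings` and the sample location value are assumptions for illustration; the generated test file may build its settings differently.

```python
import json
import os
from pathlib import Path


def write_temp_settings(directory: str) -> Path:
    """Write a throwaway settings.json for tests and return its path."""
    credentials = os.environ.get("VERTEXAI_CREDENTIALS")
    if not credentials:
        raise RuntimeError("VERTEXAI_CREDENTIALS is not set; check your .env file")
    settings = {
        "location": "us-east1",  # sample value, as a user might choose in the UI
        "credentials": credentials,
    }
    path = Path(directory) / "settings.json"
    path.write_text(json.dumps(settings), encoding="utf-8")
    return path
```

Inside a pytest fixture, the `directory` argument could be the path that pytest's built-in `tmp_path` fixture provides, so the file is cleaned up automatically after the test run.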
- In the project’s root, create a file named .env. In this file, add an environment variable named VERTEXAI_CREDENTIALS, and set it to the single-line representation of the credentials.json file that you generated in this tutorial’s requirements:
If you plan to publish this plugin’s source code to an external repository such as GitHub, do not include the .env file in the repository, as it can expose sensitive information publicly, such as your credentials for the VertexAI API.
To help prevent this file from accidentally being included in the repository, add a .env entry to a .gitignore file in the root of the project.
- In the test_plugin.py file, replace the test_plugin function body with the following definition:
- The test_plugin function is a test case that uses the plugin fixture to run the plugin’s logic.
- The function takes a list of Unstructured-formatted elements as input. The first element in the list contains the text that is used to test the plugin.
- The function then runs the plugin’s logic and checks that the output is as expected.
- The function checks that the output contains the expected values for the toxicity, emotion, and intent fields that are returned. If the expected values match, the test passes. Otherwise, the test fails.
Run the test
To run the test, use the following command to run pytest through the test target in the file named Makefile in the root of the project:
If the test passes, you should see something similar to the following:
Run the plugin in Docker locally
In this section, you proceed with local testing by manually running the plugin in Docker locally. This allows you to more fully test the plugin’s logic in an isolated environment before you deploy it into your self-hosted UI.
In practice, you would typically use a CI/CD pipeline to automate running the plugin in Docker and testing the output against an expected result. If the plugin’s output does not match the expected result, the pipeline should stop and notify you of the failure. If the plugin’s output matches the expected result, the pipeline should then continue by deploying the plugin to the staging version of your self-hosted Unstructured UI.
In your local machine’s home directory, create a hidden file named .vertex-plugin-settings.json. This file contains information that your local installation of Docker passes into the running container. In this file, add the following JSON content:
In the preceding JSON:
- Replace <location> with the location of the VertexAI API that you want to use, for example, us-east1.
- Replace <single-line-credentials-json> with the single-line representation of the credentials.json file that you generated in this tutorial’s requirements.
This .vertex-plugin-settings.json file contains sensitive information and is intended for local Docker testing only. Do not check in this file with your plugin’s source code.
- In the file named Makefile in the root of the project, replace the .PHONY: run-docker definition with the following definition:
The run-docker target builds the Docker image locally and then runs it as a container representing the plugin.
- Start Docker Desktop on your local machine, if it is not already running.
- Run the following command to call the run-docker target, which builds the Docker image and then runs the resulting container, representing the plugin:
You must leave this terminal window open and running while you are testing the plugin locally within the running Docker container. If you interrupt the running process here or close this terminal window, the Docker container stops running, and the plugin stops working.
Send a request to the listening plugin
- In a new terminal window, use the following curl command to send a request to the plugin that is running in the Docker container. The request contains some sample text that you want VertexAI to perform sentiment analysis on, along with some sample metadata in the format that is typically generated by Unstructured during processing.
- If successful, the output should look similar to the following. Notice that the toxicity, emotion, and intent fields were added to the element’s metadata field (JSON formatting has been applied here for better readability):
- When you are done testing, you can stop the plugin by interrupting the running process or by closing the terminal window where the Docker container is running.
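When you repeat this check often, it can help to verify the response programmatically instead of reading it by eye. The following sketch assumes that the response body is a JSON list of elements, each with a metadata dictionary; the helper name `missing_sentiment_fields` is hypothetical.

```python
import json

EXPECTED_FIELDS = ("toxicity", "emotion", "intent")


def missing_sentiment_fields(response_text: str) -> dict[int, list[str]]:
    """Map each element's index to the expected metadata fields that it lacks."""
    elements = json.loads(response_text)
    problems = {}
    for index, element in enumerate(elements):
        metadata = element.get("metadata", {})
        missing = [name for name in EXPECTED_FIELDS if name not in metadata]
        if missing:
            problems[index] = missing
    return problems
```

An empty dictionary means that every element carries all three fields. The same helper could later back the automated comparison step in a CI/CD pipeline.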
Deploy the plugin to your self-hosted UI
In this section, you manually deploy the successfully-tested plugin for your users to add to their workflows’ DAGs within your self-hosted Unstructured UI. This section describes how to deploy the plugin from your local development machine directly into your existing container registry.
In practice, you would typically use a CI/CD pipeline to automate deploying the plugin.
Specify the name of your container registry
In the file named Makefile in the root of the project, set the IMAGE_REGISTRY variable, replacing REGISTRY_NAME_REPLACE_ME with the name of your container registry.
To get the name of your container registry if you do not already know it, run the command that is appropriate for your container registry. For example:
- For AWS ECR, run the AWS CLI command aws ecr describe-repositories with the appropriate command-line options.
- For Azure Container Registry, run the Azure CLI command az acr list with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud artifacts repositories list with the appropriate command-line options.
The container registry name typically takes the following format:
- For AWS ECR, <aws_account_id>.dkr.ecr.<region>.amazonaws.com
- For Azure Container Registry, <acr-name>.azurecr.io
- For GAR, <location>-docker.pkg.dev/<project-id>/<repository-name>
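The three formats above can be captured in a small helper, which is convenient if you generate the IMAGE_REGISTRY value from a script. This is an illustrative sketch; the function name and the short provider keys (`ecr`, `acr`, `gar`) are assumptions, while the templates mirror the formats listed above.

```python
def registry_name(provider: str, **parts: str) -> str:
    """Build a container registry name from the provider's typical format."""
    templates = {
        "ecr": "{aws_account_id}.dkr.ecr.{region}.amazonaws.com",
        "acr": "{acr_name}.azurecr.io",
        "gar": "{location}-docker.pkg.dev/{project_id}/{repository_name}",
    }
    if provider not in templates:
        raise ValueError(f"unknown provider: {provider!r}")
    # A missing placeholder value raises KeyError with the placeholder's name.
    return templates[provider].format(**parts)
```

For example, `registry_name("acr", acr_name="myregistry")` returns `myregistry.azurecr.io`.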
Specify the username and password for access to your container registry
Set the following environment variables to the appropriate username and password for access to your container registry:
PLUGIN_REGISTRY_USERNAME
PLUGIN_REGISTRY_PASSWORD
For example:
In the preceding commands, for <container-registry-login-command>, run the command that is appropriate for your container registry. For example:
- For AWS ECR, you do not run a separate login command here.
- For Azure Container Registry, run the Azure CLI command az acr login with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud auth configure-docker with the appropriate command-line options.
In the preceding commands, to get the value for <password>, run the command that is appropriate for your container registry. For example:
- For AWS ECR, run the AWS CLI command aws ecr get-login-password with the appropriate command-line options.
- For Azure Container Registry, run the Azure CLI command az acr credential show with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud auth print-access-token with the appropriate command-line options.
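Before building and deploying, a quick check that both variables are set can save a failed push. The sketch below is a hypothetical pre-flight helper and not part of the plugin tooling; only the two variable names come from this tutorial.

```python
import os

REQUIRED_VARIABLES = ("PLUGIN_REGISTRY_USERNAME", "PLUGIN_REGISTRY_PASSWORD")


def missing_registry_variables(environ=None) -> list[str]:
    """Return the required registry variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling `missing_registry_variables()` with no arguments inspects the current process environment; passing a dictionary makes the check easy to test.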
Build and deploy the plugin's container
Run the following commands, one command at a time, to build the plugin’s container, deploy it to your container registry, and make the plugin available for use in the staging version of your self-hosted Unstructured UI:
Test the plugin in your UI
Test the plugin in staging
- Sign in to the staging version of your self-hosted Unstructured UI.
- Create a new workflow or open an existing workflow.
- In the workflow’s visual DAG designer, click the + icon anywhere between a Chunker node and a Destination node, and select Plugins > Sentiment Analysis.
- Click the Sentiment Analysis node to open its settings pane.
- In the settings pane, enter the required settings for the plugin. For example, enter the location of the VertexAI API, the single-string version of the credentials.json file’s contents for accessing the VertexAI API, and the model for VertexAI to use.
- Run the workflow.
- When the workflow is finished, go to the destination location, and look for the toxicity, emotion, and intent values that the plugin adds to the metadata field for each element that Unstructured generated based on the source files’ contents.
Make any changes to the plugin
If you need to make any changes to the plugin, you can do so by returning to the previous section titled Write the plugin.
Make the necessary code changes and then:
- Run plugin tests locally with pytest.
- Run the plugin in Docker locally.
- Increment the plugin’s version number. To do this, in the project’s src/plugin_sentiment_analysis/__init__.py file, update the value of version in the PLUGIN_MANIFEST variable, for example from 0.0.1 to 0.0.2. Then save this file.
- Deploy the plugin again to the staging version of your self-hosted Unstructured UI.
- Test the updated plugin again in staging.
Keep repeating this loop until you are satisfied with the plugin’s performance in staging.
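The version increment in this loop follows a simple rule for three-part version numbers, sketched below. The helper `bump_patch` is hypothetical and assumes a version string of exactly three dot-separated integers; you still edit the version value in PLUGIN_MANIFEST yourself.

```python
def bump_patch(version: str) -> str:
    """Return the version with its last (patch) component incremented by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

For example, `bump_patch("0.0.1")` returns `"0.0.2"`, matching the example above.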
Promote the plugin to production
After you have tested the plugin in your staging UI and are satisfied with its performance, you can promote it from staging to production. To do this, run the following command:
Of course, you should immediately sign in to the production version of your self-hosted Unstructured UI and test the plugin from there before you start advertising its availability to your users.
Congratulations! You have successfully created, tested, and deployed your first custom plugin into your self-hosted Unstructured UI that your users can now add to their workflow DAGs to unlock new capabilities and insights for their files and data!