Requirements
- A self-hosted deployment of the Unstructured UI and Unstructured API into infrastructure that you maintain in your Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP) account. If you do not have a self-hosted deployment, stop and contact your Unstructured sales representative, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website first.
- A local development machine with Docker Desktop and the Python package and project manager uv installed.
- For sending requests to the plugin through Docker locally, the curl utility installed on the development machine.
- For deploying the plugin to your self-hosted Unstructured UI, you must have access to a container registry that is compliant with the Open Container Initiative (OCI) and that is also reachable from your AWS, Azure, or GCP account. For example:
- For AWS accounts, Amazon Elastic Container Registry (Amazon ECR).
- For Azure accounts, Azure Container Registry.
- For GCP accounts, Google Artifact Registry (GAR).
- For AWS accounts, the AWS CLI.
- For Azure accounts, the Azure CLI.
- For GCP accounts, the Google Cloud CLI.
- To call the VertexAI portions of this tutorial:
- A Google Cloud account.
- The Vertex AI API enabled in the Google Cloud account.
- Within the Google Cloud account, a Google Cloud service account and its related credentials.json key file or its contents in JSON format. Create a service account, and then create credentials for the service account.
- A single-line string that contains the contents of the downloaded credentials.json key file for the service account (and not the service account key file itself). To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt. In the command, replace <path-to-downloaded-key-file> with the path to the credentials.json key file that you downloaded by following the preceding instructions.
- For macOS or Linux:
- For Windows:
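As a cross-platform alternative to those commands, the following Python sketch prints the key file's contents as a single line. It loads the JSON and re-serializes it without indentation, so line breaks inside values such as private_key stay escaped as \n instead of becoming real newlines. The script name flatten_key.py is hypothetical.

```python
import json
import sys


def to_single_line(path: str) -> str:
    """Return the key file's JSON contents as one line with no line breaks."""
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # json.dumps emits no newlines when indent is not set; newline characters
    # inside string values are written as the escape sequence \n.
    return json.dumps(key)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(to_single_line(sys.argv[1]))
    else:
        print("usage: python flatten_key.py <path-to-downloaded-key-file>", file=sys.stderr)
```

For example, save it as flatten_key.py, run python flatten_key.py <path-to-downloaded-key-file>, and copy the printed line.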
Getting started
In this section, you set up the local development environment for this tutorial’s plugin. This includes creating a directory for overall plugin development, creating a virtual environment to isolate and version Python and various code dependencies, installing the Unstructured plugin development tools and their dependencies, and creating and initializing the code project for this tutorial’s plugin.
1
Identify a directory for overall plugin development
We recommend creating or using a centralized directory on your local development machine to use for developing this and other plugins. If you create a new directory, be
sure to switch to it after you create it. This tutorial uses a directory named
plugins within the
current working directory. For example:
2
Create a virtual environment within the directory
Use
uv to create a virtual environment within the directory that you want to use for overall plugin development. After you
create the virtual environment, activate it. This tutorial uses a virtual environment named
plugins_3_12_9. This virtual environment uses Python 3.12.9. If this Python version is not installed
on the system, uv installs it first. For example:
3
Install the Unstructured plugin development tools and their dependencies
Use uv to install the Unstructured plugin development tools and their dependencies into this virtual environment. These tools and their dependencies will be the same for all plugins that you develop that use this virtual environment. The dependent cookiecutter package is a command-line utility that uses techniques such as wizards along with Python project templates to initialize new projects based on user input.
4
Create and initialize this tutorial's code project for the plugin
- Use the unstructured-plugins new command to create the starter code for this tutorial’s plugin development project. This command starts a wizard that creates a new directory for developing this plugin and then creates the plugin’s starter files and subdirectories within that directory:
- When prompted, enter a display name for the plugin, and then press Enter. This tutorial uses Sentiment Analysis as the plugin’s display name:
- Next, enter the plugin’s type, and then press Enter. This tutorial uses sentiment as the plugin’s type:
- Next, enter the plugin’s subtype, and then press Enter. This tutorial uses analysis as the plugin’s subtype:
- A project folder is created within the centralized plugins directory. The project folder is named plugin- followed by the plugin’s type, another dash, and the plugin’s subtype. For this tutorial, the project folder is named plugin-sentiment-analysis. Switch to the plugin’s project folder, and then use uv to install and update this project’s specific code dependencies:
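The folder naming convention above, together with the src/plugin_<type>_<subtype> package subfolder that the project contains, can be summarized in a small Python sketch. The function name project_names is hypothetical, and it assumes the type and subtype are single lowercase words, as they are in this tutorial.

```python
def project_names(plugin_type: str, plugin_subtype: str) -> tuple[str, str]:
    """Return (project folder name, Python package name) for a plugin."""
    # The project folder uses dashes between its parts.
    folder = f"plugin-{plugin_type}-{plugin_subtype}"
    # Python import names cannot contain dashes, so the src subfolder uses underscores.
    package = f"plugin_{plugin_type}_{plugin_subtype}"
    return folder, package
```

For this tutorial, project_names("sentiment", "analysis") yields the folder plugin-sentiment-analysis and the package plugin_sentiment_analysis.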
Write the plugin
In this section, you write the plugin’s runtime logic. This tutorial’s logic is primarily within the project’s src/plugin_sentiment_analysis/__init__.py file.
1
Add user interface settings
In this step, you add the user interface (UI) settings for the plugin. The UI settings are the fields that
users see when they add the plugin as a node to a workflow’s visual DAG designer in the UI. The UI settings are defined in the
__init__.py file
of the plugin project’s src/plugin_<type>_<subtype> subfolder. These settings are specified in the
__init__.py file’s PluginSettings class, which is a subclass of
the Pydantic BaseModel class. The BaseModel class provides a Pydantic implementation of various type validation,
data parsing, and serialization functionality.
- In the project’s src directory, under the plugin_sentiment_analysis subdirectory, open the __init__.py file.
- In the __init__.py file, add the necessary imports to capture the VertexAI settings that the user sets in the UI. To do this, add the following from...import statements to the top of the file:
Literal is a Python type hint that restricts a field to specific literal values (such as strings, numbers, or booleans), so the input must match one of the specified options. SecretStr is a specialized Pydantic string type for sensitive data (such as passwords and API keys). It masks the value by displaying *****.
- In the __init__.py file’s PluginSettings class, replace the sample string_field setting definition with settings for the location, credentials, and model fields. The class definition should now look as follows:
- The location field specifies the location of the VertexAI API. In the UI’s help pane for the plugin node, this field has the title API Location.
- The credentials field specifies the JSON credentials for the VertexAI API. In the UI, this field has the title Credentials JSON. Because the field uses the SecretStr type, its text is displayed as asterisks.
- The model field specifies the model for VertexAI to use. In the UI, this field has the title Model. The default value for this field is gemini-1.5-flash.
- At run time, the PluginSettings class reads these fields’ values from the UI and writes them as a JSON dictionary into a settings.json file in the project’s root, for the plugin to read later.
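Putting the preceding field descriptions together, a PluginSettings class along these lines is one way to express these settings with Pydantic. This is a sketch, not the tutorial’s exact code: it assumes Pydantic v2, and the allowed location values in the Literal and the us-central1 default are hypothetical choices for illustration. The class generated by your project template may include additional configuration.

```python
from typing import Literal

from pydantic import BaseModel, Field, SecretStr


class PluginSettings(BaseModel):
    """Sketch of the UI settings for the Sentiment Analysis plugin node."""

    # Hypothetical allowed locations and default; replace with the regions you use.
    location: Literal["us-central1", "us-east1", "europe-west1"] = Field(
        default="us-central1", title="API Location"
    )
    # Required field; SecretStr masks the value when printed or logged.
    credentials: SecretStr = Field(title="Credentials JSON")
    model: str = Field(default="gemini-1.5-flash", title="Model")
```

The real key is available through credentials.get_secret_value(), while printing the settings object shows only asterisks.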
2
Integrate with VertexAI
- Add the necessary VertexAI dependencies:
- At the top of the __init__.py file, add the necessary import statements for calling the VertexAI API and for standard Python logging and JSON parsing:
- In the __init__.py file’s Plugin class, replace the __post_init__ function body with the following definition:
- The __post_init__ function is called after the Plugin class is initialized. The function reads in the UI field values from the settings.json file that the PluginSettings class wrote to earlier.
- The function then prepares the authorization credentials that were provided in the UI for use by VertexAI.
- The aiplatform.init function initializes the VertexAI API with the specified location, project ID, and authorization credentials.
- The GenerativeModel class gets the model that was specified in the UI.
- In the __init__.py file’s Plugin class, just before the run function, add the prompt text to be sent to VertexAI. At run time, this prompt, along with a piece of text that Unstructured extracts from the document, is sent to VertexAI for sentiment analysis:
- In the __init__.py file’s Plugin class, replace the run function body with the following definition:
- The run function is called once for every file that is processed. The function takes as input the list of elements that Unstructured generated from the file.
- Each element in the list is a dictionary that contains the text extracted from the document and its related metadata.
- For each element, the function sends the prompt and the element’s text to the model.
- The function then adds the sentiment analysis output to the element’s metadata field.
- After the sentiment analysis output for the last element is added to that element’s metadata field, the entire updated list is passed as input to the next node in the workflow’s DAG.
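The following Python sketch mirrors the __post_init__ and run logic described above, under several assumptions: settings.json holds location, credentials (the single-line key JSON stored as a string), and optionally model; the prompt text is a hypothetical stand-in for the tutorial’s prompt; and the model replies with a JSON object that contains toxicity, emotion, and intent. The function names load_settings, init_model, parse_response, and run are illustrative. The Vertex AI imports are deferred into init_model so that the rest of the sketch runs without google-cloud-aiplatform installed.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical prompt; substitute the prompt text from your own plugin.
PROMPT = (
    "Analyze the following text. Respond with only a JSON object that has "
    'the keys "toxicity", "emotion", and "intent".\n\nText:\n'
)
FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in
SENTIMENT_KEYS = ("toxicity", "emotion", "intent")


def load_settings(path: str | Path) -> dict[str, Any]:
    """Read the UI field values that were written to settings.json."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    # The credentials field holds the service account key as a JSON string.
    credentials_info = json.loads(settings["credentials"])
    return {
        "location": settings["location"],
        "model": settings.get("model", "gemini-1.5-flash"),
        "project_id": credentials_info["project_id"],
        "credentials_info": credentials_info,
    }


def init_model(cfg: dict[str, Any]) -> Any:
    """Initialize Vertex AI and return a GenerativeModel.

    Requires the google-cloud-aiplatform package; the imports are deferred so
    the other functions in this sketch can run without it.
    """
    from google.cloud import aiplatform
    from google.oauth2 import service_account
    from vertexai.generative_models import GenerativeModel

    credentials = service_account.Credentials.from_service_account_info(
        cfg["credentials_info"]
    )
    aiplatform.init(
        project=cfg["project_id"], location=cfg["location"], credentials=credentials
    )
    return GenerativeModel(cfg["model"])


def parse_response(text: str) -> dict[str, Any]:
    """Parse the model's JSON reply, tolerating a surrounding json code fence."""
    cleaned = text.strip().removeprefix(FENCE + "json").removesuffix(FENCE)
    return json.loads(cleaned.strip())


def run(
    elements: list[dict[str, Any]], generate: Callable[[str], str]
) -> list[dict[str, Any]]:
    """Add the sentiment fields to each element's metadata.

    generate takes a prompt and returns the model's text reply. With Vertex AI,
    it could be: lambda prompt: model.generate_content(prompt).text
    """
    for element in elements:
        reply = generate(PROMPT + element.get("text", ""))
        result = parse_response(reply)
        metadata = element.setdefault("metadata", {})
        for key in SENTIMENT_KEYS:
            metadata[key] = result.get(key)
    return elements
```

Passing generate as a callable keeps run testable with a fake model, without calling Vertex AI.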
Run plugin tests locally with pytest
In this section, you manually run the plugin’s tests locally using pytest to make sure that the plugin’s logic is working as expected before further
testing in Docker and eventual deployment for use in the UI.
In practice, you would typically use a continuous integration and continuous deployment (CI/CD) pipeline to automate running these tests.
If any of the tests fail, the pipeline should stop and notify you of the failure. If all of the tests pass, the pipeline should then
continue by running the plugin in Docker as a further test.
1
- Add the necessary pytest dependencies. Also add a dependency on the dotenv package (published on PyPI as python-dotenv), which is used to read environment variables from a local .env file:
- In the project’s test directory, at the top of the test_plugin.py file, add the following import statements to enable reading local environment variables. Also, call the load_dotenv function to load the environment variables from the .env file:
- In the test_plugin.py file, update the following from...import statement to find the specified classes that are defined in the src/plugin_sentiment_analysis folder:
- In the root of the project’s test directory, add a blank __init__.py file. This file is required so that the test directory can see the src directory, which enables the preceding from...import statement to work.
2
- In the test_plugin.py file, replace the plugin function body with the following definition:
- The plugin function is a fixture that sets up the plugin’s infrastructure for the test_plugin test function that follows.
- The function reads the VERTEXAI_CREDENTIALS environment variable from the .env file that you will create next.
- Instead of using the settings.json file that would normally be used by the PluginSettings class, the function creates a temporary settings.json file just for these tests. This temporary file contains sample values for the API Location and Credentials JSON fields that users would otherwise specify when using the plugin in the UI.
- In the project’s root, create a file named .env. In this file, add an environment variable named VERTEXAI_CREDENTIALS, and set it to the single-line representation of the credentials.json file that you generated in this tutorial’s requirements:
If you plan to publish this plugin’s source code to an external repository such as GitHub, do not include the .env file in the repository, as it can publicly expose sensitive information such as your credentials for the VertexAI API. To help prevent this file from accidentally being included in the repository, add a .env entry to a .gitignore file in the root of the project.
- In the test_plugin.py file, replace the test_plugin function body with the following definition:
- The test_plugin function is a test case that uses the plugin fixture to run the plugin’s logic.
- The function takes a list of Unstructured-formatted elements as input. The first element in the list contains the text that is used to test the plugin.
- The function then runs the plugin’s logic and checks that the output is as expected.
- The function checks that the output contains the expected values for the toxicity, emotion, and intent fields that are returned. If the expected values match, the test passes. Otherwise, the test fails.
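The fixture and test described above come down to two small operations, sketched here as plain Python helpers so that they run without pytest. The names write_test_settings and has_sentiment_fields are hypothetical. In test_plugin.py, the first would typically be called from the @pytest.fixture-decorated plugin function with pytest’s built-in tmp_path directory, and the second from test_plugin. Unlike the tutorial’s test, this sketch only checks that the three fields are present and non-empty, because the exact values that a model returns can vary.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

SENTIMENT_KEYS = ("toxicity", "emotion", "intent")


def write_test_settings(directory: Path, location: str = "us-east1") -> Path:
    """Write a temporary settings.json for tests, as the plugin fixture does."""
    # VERTEXAI_CREDENTIALS is expected to be loaded from .env by load_dotenv().
    credentials = os.environ["VERTEXAI_CREDENTIALS"]
    path = directory / "settings.json"
    path.write_text(
        json.dumps({"location": location, "credentials": credentials}),
        encoding="utf-8",
    )
    return path


def has_sentiment_fields(element: dict) -> bool:
    """Return True if every sentiment field is present and non-empty in metadata."""
    metadata = element.get("metadata", {})
    return all(metadata.get(key) for key in SENTIMENT_KEYS)
```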
3
Run the test
To run the test, use the following command to run pytest through the test target in the file named Makefile in the root of the project:
If the test passes, you should see something similar to the following:
Run the plugin in Docker locally
In this section, you proceed with local testing by manually running the plugin in Docker locally. This allows you to more fully test the plugin’s logic in an isolated environment before you deploy it into your self-hosted UI. In practice, you would typically use a CI/CD pipeline to automate running the plugin in Docker and testing the output against an expected result. If the plugin’s output does not match the expected result, the pipeline should stop and notify you of the failure. If the plugin’s output matches the expected result, the pipeline should then continue by deploying the plugin to the staging version of your self-hosted Unstructured UI.
1
In your local machine’s home directory, create a hidden file named .vertex-plugin-settings.json. This file contains information that your local installation of Docker passes into the running container. In this file, add the following JSON content:
In the preceding JSON:
- Replace <location> with the location of the VertexAI API that you want to use, for example, us-east1.
- Replace <single-line-credentials-json> with the single-line representation of the credentials.json file that you generated in this tutorial’s requirements.
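Before starting the container, you could sanity-check this file with a short Python sketch such as the following. It assumes the JSON uses the keys location and credentials, matching the PluginSettings fields; adjust the key names if your file differs. The function name check_settings is hypothetical.

```python
import json
from pathlib import Path

# The hidden settings file in the home directory, as described above.
SETTINGS_PATH = Path.home() / ".vertex-plugin-settings.json"


def check_settings(path: Path = SETTINGS_PATH) -> list[str]:
    """Return a list of problems found in the local Docker settings file."""
    problems: list[str] = []
    settings = json.loads(path.read_text(encoding="utf-8"))

    location = settings.get("location", "")
    if not location or location.startswith("<"):
        problems.append("location is missing or still a placeholder")

    try:
        # The credentials value is itself a JSON document stored as a string.
        creds = json.loads(settings.get("credentials", ""))
    except (json.JSONDecodeError, TypeError):
        problems.append("credentials is not a valid single-line JSON string")
    else:
        if not isinstance(creds, dict) or "project_id" not in creds:
            problems.append("credentials JSON has no project_id")
    return problems
```

Calling check_settings() returns an empty list when both values look usable.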
This .vertex-plugin-settings.json file contains sensitive information and is intended for local Docker testing only. Do not check in this file with your plugin’s source code.
2
- In the file named Makefile in the root of the project, replace the .PHONY: run-docker definition with the following definition:
The run-docker target builds the Docker image locally and then runs it as a container representing the plugin.
- Start Docker Desktop on your local machine, if it is not already running.
- Run the following command to call the run-docker target, which builds the Docker image and then runs the resulting container, representing the plugin:
You must leave this terminal window open and running while you are testing the plugin locally within the running Docker container. If you interrupt the running process here or close this terminal window, the Docker container stops running, and the plugin stops working.
3
Send a request to the listening plugin
- In a new terminal window, use the following curl command to send a request to the plugin that is running in the Docker container. The request contains some sample text that you want VertexAI to perform sentiment analysis on, along with some sample metadata in the format that Unstructured typically generates during processing.
- If successful, the output should look similar to the following. Notice that the toxicity, emotion, and intent fields were added to the element’s metadata field (JSON formatting has been applied here for better readability):
- When you are done testing, you can stop the plugin by interrupting or closing the terminal window where the Docker container is running.
Deploy the plugin to your self-hosted UI
In this section, you manually deploy the successfully tested plugin for your users to add to their workflows’ DAGs within your self-hosted Unstructured UI. This section describes how to deploy the plugin from your local development machine directly into your existing container registry. In practice, you would typically use a CI/CD pipeline to automate deploying the plugin.
1
Specify the name of your container registry
In the file named Makefile in the root of the project, set the IMAGE_REGISTRY variable, replacing REGISTRY_NAME_REPLACE_ME with the name of your container registry. To get the name of your container registry if you do not already know it, run the command that is appropriate for your container registry. For example:
- For AWS ECR, run the AWS CLI command aws ecr describe-repositories with the appropriate command-line options.
- For Azure Container Registry, run the Azure CLI command az acr list with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud artifacts repositories list with the appropriate command-line options.
The name of your container registry typically has one of the following formats:
- For AWS ECR, <aws_account_id>.dkr.ecr.<region>.amazonaws.com
- For Azure Container Registry, <acr-name>.azurecr.io
- For GAR, <location>-docker.pkg.dev/<project-id>/<repository-name>
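The registry name formats listed above can be captured in a small Python helper, for example to generate the IMAGE_REGISTRY value in a script. The provider keys aws, azure, and gcp and the function name registry_name are hypothetical.

```python
def registry_name(provider: str, **parts: str) -> str:
    """Build an IMAGE_REGISTRY value from the per-cloud registry name formats."""
    formats = {
        "aws": "{aws_account_id}.dkr.ecr.{region}.amazonaws.com",
        "azure": "{acr_name}.azurecr.io",
        "gcp": "{location}-docker.pkg.dev/{project_id}/{repository_name}",
    }
    # str.format raises KeyError if a required part is missing.
    return formats[provider].format(**parts)
```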
2
Specify the username and password for access to your container registry
Set the following environment variables to the appropriate username and password for access to your container registry:
PLUGIN_REGISTRY_USERNAME
PLUGIN_REGISTRY_PASSWORD
In the preceding commands, for <container-registry-login-command>, run the command that is appropriate for your container registry. For example:
- For AWS ECR, you do not run a separate login command here.
- For Azure Container Registry, run the Azure CLI command az acr login with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud auth configure-docker with the appropriate command-line options.
For <password>, run the command that is appropriate for your container registry. For example:
- For AWS ECR, run the AWS CLI command aws ecr get-login-password with the appropriate command-line options.
- For Azure Container Registry, run the Azure CLI command az acr credential show with the appropriate command-line options.
- For GAR, run the Google Cloud CLI command gcloud auth print-access-token with the appropriate command-line options.
3
Build and deploy the plugin's container
Run the following commands, one command at a time, to build the plugin’s container, deploy it to your container registry, and make the plugin
available for use in the staging version of your self-hosted Unstructured UI:
Test the plugin in your UI
1
Test the plugin in staging
- Sign in to the staging version of your self-hosted Unstructured UI.
- Create a new workflow or open an existing workflow.
- In the workflow’s visual DAG designer, click the + icon anywhere between a Chunker node and a Destination node, and select Plugins > Sentiment Analysis.
- Click the Sentiment Analysis node to open its settings pane.
- In the settings pane, enter the required settings for the plugin. For example, enter the location of the VertexAI API, the single-line version of the credentials.json file’s contents for accessing the VertexAI API, and the model for VertexAI to use.
- Run the workflow.
- When the workflow is finished, go to the destination location, and look for the toxicity, emotion, and intent values that the plugin adds to the metadata field for each element that Unstructured generated based on the source files’ contents.
2
Make any changes to the plugin
If you need to make any changes to the plugin, return to the previous section titled Write the plugin. Make the necessary code changes, and then:
- Run plugin tests locally with pytest.
- Run the plugin in Docker locally.
- Increment the plugin’s version number. To do this, in the project’s src/plugin_sentiment_analysis/__init__.py file, update the value of version in the PLUGIN_MANIFEST variable, for example from 0.0.1 to 0.0.2. Then save this file.
- Deploy the plugin again to the staging version of your self-hosted Unstructured UI.
- Test the updated plugin again in staging.
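If you automate the version increment described above, for example in a CI/CD pipeline, a minimal Python sketch like the following bumps the last component of the version string. It assumes a plain numeric version such as 0.0.1 with no pre-release or build suffix; the function name bump_patch is hypothetical.

```python
def bump_patch(version: str) -> str:
    """Increment the last numeric component of a dotted version string."""
    parts = version.split(".")
    # int() raises ValueError for non-numeric parts such as "1rc1".
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)
```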
3
Promote the plugin to production
After you have tested the plugin in your staging UI and are satisfied with its performance, you can promote it from staging to production.
To do this, run the following command:
Of course, you should immediately sign in to the production version of your self-hosted Unstructured UI and test the plugin there before you start advertising its availability to your users.

