Processing Images on the Edge

If your account includes access to edge models, you can download and install them on your edge devices, allowing you to run Groundlight's ML models locally. This reduces latency and increases throughput, and inference requests handled on the edge are not counted towards your account's usage limits.

This is achieved through the edge-endpoint, a lightweight, open-source proxy service that runs on your edge devices. The edge-endpoint is responsible for downloading and running models and for communicating with the Groundlight cloud service. You can find the source code and documentation for the edge-endpoint on GitHub.

How the Edge Endpoint Works

The edge-endpoint is a proxy service that sits between your application and the Groundlight cloud service. It intercepts requests and responses, enabling Groundlight's ML models to run locally on your edge devices.

When your application sends an image query to the Groundlight cloud service, the edge-endpoint intercepts the request and downloads the relevant edge-sized model from the cloud. It then runs the model locally on the edge device and returns the result to your application. By default, it returns the edge model's answer without escalating to the cloud whenever that answer meets the detector's confidence threshold; otherwise, it escalates the query to the cloud for a more confident answer. This process also allows Groundlight to learn from examples that are challenging for the edge model: once a new edge model is trained to handle such examples, it is automatically downloaded to the edge device for future queries.
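As a concrete illustration, here is a minimal sketch of this flow from the application's point of view, using the standard Groundlight Python SDK. The endpoint URL, detector name, query, threshold, and image file are example values, not prescribed by the edge-endpoint:

from groundlight import Groundlight

# Point the SDK at a local edge-endpoint (example URL).
gl = Groundlight(endpoint="http://localhost:30101")

# The detector's confidence threshold controls escalation: if the edge
# model's confidence meets it, the answer is returned locally; otherwise
# the query is escalated to the cloud.
detector = gl.get_or_create_detector(
    name="dock-door-open",           # example detector name
    query="Is the dock door open?",  # example query
    confidence_threshold=0.9,
)

iq = gl.submit_image_query(detector=detector, image="frame.jpg")
print(iq.result.label, iq.result.confidence)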

The edge-endpoint operates as a set of containers on an "edge device," which can be an NVIDIA Jetson device, a rack-mounted server, or even a Raspberry Pi. The main container is the edge-endpoint proxy service, which handles requests and manages the other containers, such as the inference model containers responsible for loading and running the ML models.

Installing and Running the Edge Endpoint

To set up an edge-endpoint manually, please refer to the deploy README.

Groundlight also provides managed edge-endpoint servers, with management performed via Balena. To receive a managed edge-endpoint, please contact us.

Using the Edge Endpoint

To use the edge-endpoint, point the Groundlight SDK at the edge-endpoint's URL instead of the cloud endpoint. Your application logic can remain unchanged and will work seamlessly with the Groundlight edge-endpoint, allowing some ML responses to be returned locally and much faster.

Note that image queries processed at the edge-endpoint will not appear on the Groundlight cloud dashboard unless specifically configured to do so; even then, the edge prediction itself will not be reflected in the cloud image query. Additional documentation and configuration options are available in the edge-endpoint repository.

To set the Groundlight Python SDK to submit requests to your edge-endpoint proxy server, you can either pass the endpoint URL to the Groundlight constructor like this:

from groundlight import Groundlight
gl = Groundlight(endpoint="http://localhost:30101")

or set the GROUNDLIGHT_ENDPOINT environment variable, like this:

export GROUNDLIGHT_ENDPOINT=http://localhost:30101
python your_app.py
Tip: In the above example, the edge-endpoint is running on the same machine as the application, so the endpoint URL is http://localhost:30101. If the edge-endpoint is running on a different machine, replace localhost with the IP address or hostname of the machine running the edge-endpoint.
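For example, with a hypothetical hostname:

from groundlight import Groundlight

# "edge-box.local" is a hypothetical hostname for the machine
# running the edge-endpoint.
gl = Groundlight(endpoint="http://edge-box.local:30101")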

Edge Endpoint Performance

We have benchmarked the edge-endpoint handling 500 requests/sec at a latency of less than 50ms on an off-the-shelf Katana 15 B13VGK-1007US laptop (Intel® Core™ i9-13900H CPU, NVIDIA® GeForce RTX™ 4070 Laptop GPU, 32GB DDR5 5200MHz RAM) running Ubuntu 20.04.

The following graphs show the throughput and latency of the edge-endpoint running on the Katana 15 laptop. As time progresses along the x-axis, the benchmark script ramps up the number of requests per second from 1 to 500 (and the number of clients submitting requests from 1 to 60). The y-axes show the throughput in requests per second and the latency in seconds.

Figure: edge-endpoint throughput

Figure: edge-endpoint latency
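The benchmark script itself is not reproduced here, but a minimal load-test sketch along the same lines might look like the following. The endpoint URL, detector ID, image file, and client/request counts are placeholder assumptions to adjust for your setup:

import time
from concurrent.futures import ThreadPoolExecutor

from groundlight import Groundlight

# Assumptions: an edge-endpoint at localhost:30101, an existing
# detector, and a local test image "frame.jpg".
gl = Groundlight(endpoint="http://localhost:30101")
detector = gl.get_detector("det_abc123")  # hypothetical detector ID

def timed_query(_):
    # Submit one image query and return its round-trip latency in seconds.
    # wait=0 asks for an immediate answer rather than waiting for a
    # confident one, so this measures the local inference path.
    start = time.perf_counter()
    gl.submit_image_query(detector=detector, image="frame.jpg", wait=0)
    return time.perf_counter() - start

# Submit 200 queries from 8 concurrent clients.
with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = sorted(pool.map(timed_query, range(200)))

print(f"median latency: {latencies[len(latencies) // 2] * 1000:.1f} ms")
print(f"p95 latency:    {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")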

The edge-endpoint is designed to be lightweight and efficient, and can be run on a variety of edge devices, including NVIDIA Jetson devices, Raspberry Pi, and other ARM- and x86-based devices.