OpenTelemetry

Background

Since version 10.0.0 PyTango provides support for distributed tracing and logging via the OpenTelemetry framework. You can read all about the concepts on their website.

You will need a collector to receive the traces and/or logs from your application. This could either be one running locally, or on a remote server. For configuration, the important thing is the collector’s endpoint URL and protocol.

E.g., if you run the Signoz standalone demo, there will be a collector running locally for gRPC and HTTP traffic. Signoz will also provide a website for viewing the telemetry data.

Alternatively, ask your IT infrastructure team if they already have an OpenTelemetry-compatible backend running, what the configuration details are for traces and logs, and how to view the data.

Warning

Emitting telemetry from a large number of device servers and clients can generate a high load on the backend receiving this data. There is also a small impact on the Tango servers and clients that use this feature. Be careful when enabling this feature, and monitor the performance impact. See the benchmarks.

How to check if your PyTango installation supports telemetry

As a first step, you need at least version 10.0.0, and both PyTango and cppTango must be compiled with telemetry support (i.e., cppTango was built with the CMake option TANGO_USE_TELEMETRY enabled):

$ python -c "import tango; print(tango.__version__)"
10.0.0

$ python -c "import tango; print(tango.constants.TELEMETRY_SUPPORTED)"
True

See the PyTango news page for which versions of PyTango are packaged with telemetry support compiled in.

The global enable for OpenTelemetry in Tango is provided by the environment variable TANGO_TELEMETRY_ENABLE. Set it to on to enable telemetry by default for a new client or device server process.

Next, you need the OpenTelemetry Python dependencies installed. You can check that they are available by attempting to enable telemetry, either via the environment or at runtime from a client.

$ TANGO_TELEMETRY_ENABLE=on python -c "import tango"

If the required Python packages are missing and telemetry is requested, you may see a warning like:

$ TANGO_TELEMETRY_ENABLE=on python -c "import tango; dp = tango.DeviceProxy('sys/tg_test/1'); dp.ping()"
/path/to/python/lib/python3.10/site-packages/tango/_telemetry.py:729:
PyTangoUserWarning: OpenTelemetry SDK packages are not available:
telemetry can be enabled in cppTango, but Python telemetry spans will not be emitted.
Install opentelemetry-sdk and an OTLP exporter package to emit spans.

There are two cases to distinguish:

  • If the OpenTelemetry API packages are missing, PyTango cannot propagate trace context and cannot emit Python spans.

  • If the OpenTelemetry SDK or exporter packages are missing, cppTango telemetry can still be enabled, but Python spans are not emitted.

Install the packages you need, for example pip install "pytango[telemetry]".
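The script below is a minimal sketch that lets you tell the two cases above apart before starting a server. It uses only the standard library. The specific module names it probes (`opentelemetry.trace`, `opentelemetry.sdk.trace`, and the OTLP HTTP/gRPC `trace_exporter` modules) are assumptions about how the OpenTelemetry distributions lay out their packages. This is not how PyTango itself performs the check.

```python
import importlib.util

# Assumed module names; adjust them to the packages you actually install.
API_MODULES = ["opentelemetry.trace", "opentelemetry.context"]
SDK_MODULES = ["opentelemetry.sdk.trace"]
EXPORTER_MODULES = [
    "opentelemetry.exporter.otlp.proto.http.trace_exporter",
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
]


def module_available(name):
    """Return True if `name` can be located (its parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec() raises if a parent package of a dotted name is missing.
        return False


def classify(available=module_available):
    """Map package availability onto the two cases described above."""
    if not all(available(m) for m in API_MODULES):
        # No trace-context propagation and no Python spans.
        return "api-missing"
    if not all(available(m) for m in SDK_MODULES) or not any(
        available(m) for m in EXPORTER_MODULES
    ):
        # cppTango telemetry may still work, but no Python spans are emitted.
        return "sdk-or-exporter-missing"
    return "ok"
```

Running `print(classify())` in the same environment as your device server prints one of the three labels. The `available` parameter exists only so the logic can be exercised without touching the real installation.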

How to run a device server that emits telemetry

The main environment variables are:

  • TANGO_TELEMETRY_ENABLE: global default enable/disable switch.

  • TANGO_TELEMETRY_TRACING_EXPORTERS: comma-separated tracing exporters.

  • TANGO_TELEMETRY_TRACING_ENDPOINTS: comma-separated tracing endpoints.

  • TANGO_TELEMETRY_LOGGING_EXPORTERS: comma-separated logging exporters.

  • TANGO_TELEMETRY_LOGGING_ENDPOINTS: comma-separated logging endpoints.

  • TANGO_TELEMETRY_TYPES: comma-separated telemetry types such as tracing and logging. If unset, it defaults to tracing,logging.

  • TANGO_TELEMETRY_TOPICS: comma-separated telemetry topics.

The old names are no longer usable. If they are set, PyTango emits a warning, but ignores them for configuration:

  • TANGO_TELEMETRY_TRACES_EXPORTER -> TANGO_TELEMETRY_TRACING_EXPORTERS

  • TANGO_TELEMETRY_TRACES_ENDPOINT -> TANGO_TELEMETRY_TRACING_ENDPOINTS

  • TANGO_TELEMETRY_LOGS_EXPORTER -> TANGO_TELEMETRY_LOGGING_EXPORTERS

  • TANGO_TELEMETRY_LOGS_ENDPOINT -> TANGO_TELEMETRY_LOGGING_ENDPOINTS
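The sketch below summarises how the variables listed above fit together. It is a hypothetical helper, not PyTango's own parser. It splits the comma-separated values, applies the documented `tracing,logging` default for `TANGO_TELEMETRY_TYPES`, and warns about (but ignores) the old names. Treating only `on` as "enabled" is an assumption made for this sketch.

```python
import warnings

# Old variable names that PyTango now ignores (from the list above).
DEPRECATED_NAMES = {
    "TANGO_TELEMETRY_TRACES_EXPORTER": "TANGO_TELEMETRY_TRACING_EXPORTERS",
    "TANGO_TELEMETRY_TRACES_ENDPOINT": "TANGO_TELEMETRY_TRACING_ENDPOINTS",
    "TANGO_TELEMETRY_LOGS_EXPORTER": "TANGO_TELEMETRY_LOGGING_EXPORTERS",
    "TANGO_TELEMETRY_LOGS_ENDPOINT": "TANGO_TELEMETRY_LOGGING_ENDPOINTS",
}


def split_list(value):
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def read_telemetry_env(env):
    """Collect TANGO_TELEMETRY_* settings from a mapping such as os.environ."""
    for old, new in DEPRECATED_NAMES.items():
        if old in env:
            # Mirror the documented behaviour: warn, but do not use the value.
            warnings.warn(f"{old} is ignored, use {new} instead")
    return {
        # Assumption: only "on" (any case) counts as enabled in this sketch.
        "enable": env.get("TANGO_TELEMETRY_ENABLE", "").strip().lower() == "on",
        # Documented default when TANGO_TELEMETRY_TYPES is unset.
        "types": split_list(env.get("TANGO_TELEMETRY_TYPES")) or ["tracing", "logging"],
        "topics": split_list(env.get("TANGO_TELEMETRY_TOPICS")),
        "tracing_exporters": split_list(env.get("TANGO_TELEMETRY_TRACING_EXPORTERS")),
        "tracing_endpoints": split_list(env.get("TANGO_TELEMETRY_TRACING_ENDPOINTS")),
        "logging_exporters": split_list(env.get("TANGO_TELEMETRY_LOGGING_EXPORTERS")),
        "logging_endpoints": split_list(env.get("TANGO_TELEMETRY_LOGGING_ENDPOINTS")),
    }
```

Calling `read_telemetry_env(os.environ)` in a shell set up as in the examples below shows the settings a new process would start from.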

Assuming you have a traces collector using the HTTPS protocol listening at URL https://traces.my-institute.org:4319/v1/traces, and a logs collector, also using HTTPS, at URL https://logs.my-institute.org:443/otlp/v1/logs, you can set your environment up as follows:

$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TRACING_EXPORTERS=http
$ export TANGO_TELEMETRY_TRACING_ENDPOINTS=https://traces.my-institute.org:4319/v1/traces
$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=http
$ export TANGO_TELEMETRY_LOGGING_ENDPOINTS=https://logs.my-institute.org:443/otlp/v1/logs
$ export TANGO_TELEMETRY_TYPES=tracing,logging
$ export TANGO_TELEMETRY_TOPICS=user

And then launch your application, as normal.

$ python MySuperDS.py instance

Another example is using a local collector, with the gRPC protocol:

$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TRACING_EXPORTERS=grpc
$ export TANGO_TELEMETRY_TRACING_ENDPOINTS=grpc://localhost:4317
$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=grpc
$ export TANGO_TELEMETRY_LOGGING_ENDPOINTS=grpc://localhost:4317
$ export TANGO_TELEMETRY_TYPES=tracing,logging
$ export TANGO_TELEMETRY_TOPICS=all

For Tango, when using the gRPC protocol, the URLs must start with grpc://, even though your backend might suggest an http:// endpoint for the gRPC traffic.
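A mistyped scheme is easy to miss, so a pre-flight check can help. The following is a minimal sketch with a hypothetical helper name. The gRPC rule comes from the paragraph above. The rule that `http` exporters need an `http://` or `https://` URL is an assumption based on the HTTPS example earlier.

```python
from urllib.parse import urlsplit


def check_endpoint(exporter, url):
    """Return a list of problems for one (exporter, endpoint) pair."""
    scheme = urlsplit(url).scheme
    problems = []
    # Documented rule: gRPC endpoints must use the grpc:// scheme in Tango.
    if exporter == "grpc" and scheme != "grpc":
        problems.append(f"gRPC endpoint must start with grpc://, got {url!r}")
    # Assumed rule for this sketch: HTTP exporters take http(s):// URLs.
    if exporter == "http" and scheme not in ("http", "https"):
        problems.append(f"HTTP endpoint must start with http:// or https://, got {url!r}")
    return problems
```

For example, `check_endpoint("grpc", "http://localhost:4317")` reports the common mistake of copying the backend's suggested `http://` URL for gRPC traffic.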

If you want to emit traces but disable logging via the telemetry backend, set the logging exporter to none. This may be useful if your logs are handled by a different system, or if your telemetry backend doesn't support logs:

$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=none

Alternatively, you could leave the logging exporter and endpoint set, but exclude logging from the telemetry types:

$ export TANGO_TELEMETRY_TYPES=tracing
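Both approaches have the same outcome for telemetry logs. The sketch below expresses that as a single check. It is a hypothetical helper that only encodes the two documented mechanisms plus the documented `tracing,logging` default. It does not say anything about the behaviour when no logging exporter is configured at all.

```python
def _split(value):
    return [item.strip() for item in value.split(",") if item.strip()]


def telemetry_logging_disabled(env):
    """True if either documented mechanism turns telemetry logging off."""
    types = _split(env.get("TANGO_TELEMETRY_TYPES", "")) or ["tracing", "logging"]
    exporters = _split(env.get("TANGO_TELEMETRY_LOGGING_EXPORTERS", ""))
    # Option 1: every configured logging exporter is "none".
    disabled_by_exporter = bool(exporters) and all(e == "none" for e in exporters)
    # Option 2: "logging" is not among the enabled telemetry types.
    disabled_by_types = "logging" not in types
    return disabled_by_exporter or disabled_by_types
```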

Note

The environment variables can be set in a configuration file, similar to TANGO_HOST. See the reference documentation.

How to change device telemetry at runtime

Runtime configuration is driven by cppTango. PyTango follows those changes by recreating or replacing its Python tracer state when needed.

Note

Runtime configuration was first added in version 10.3.0.

From a client, use the telemetry convenience methods on DeviceProxy:

import tango
from tango.telemetry import TelemetryEndpoint, TelemetryExporter

proxy = tango.DeviceProxy("sys/tg_test/1")

proxy.start_telemetry()
proxy.set_telemetry_tracing(True)
proxy.set_telemetry_topics(["user"])
proxy.set_telemetry_tracing_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.HTTP,
            "https://traces.example.org:4319/v1/traces",
        )
    ]
)
proxy.stop_telemetry()

The corresponding admin device commands are implemented in cppTango. When they change a device’s telemetry state, cppTango calls the virtual DeviceImpl methods and PyTango refreshes the Python tracer provider or switches to no-op tracing for future spans.

For high-level Python devices, runtime telemetry reconfiguration is controlled through the admin device and DeviceProxy. Device does not expose runtime set/get telemetry methods as part of its public API.

Changing logging endpoints at runtime uses the same pattern:

proxy.set_telemetry_logging(True)
proxy.set_telemetry_logging_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.GRPC,
            "grpc://collector.example.org:4317",
        )
    ]
)

Runtime changes affect future spans and log export configuration. In-flight spans are not rewritten.

How to persist device telemetry configuration

If you want telemetry settings to survive a device or device-server restart, store them as device properties in the Tango database.

The supported telemetry device properties are:

  • telemetry_enable: global enable or disable flag, usually "1" or "0".

  • telemetry_topics: list of enabled topics.

  • telemetry_types: list of enabled telemetry types, such as tracing and logging.

  • telemetry_tracing_exporters: list of tracing exporter types.

  • telemetry_tracing_endpoints: list of tracing endpoints.

  • telemetry_logging_exporters: list of logging exporter types.

  • telemetry_logging_endpoints: list of logging endpoints.

For the exporter and endpoint properties, the lists are positional: the 0th endpoint belongs to the 0th exporter, the 1st endpoint belongs to the 1st exporter, and so on.
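The positional rule can be illustrated with a small pairing helper. This is a hypothetical function, not part of PyTango. It rejects lists of different lengths. How cppTango handles such a mismatch is not covered here, so treat that check as this sketch's own choice.

```python
def pair_exporters_and_endpoints(exporters, endpoints):
    """Pair the i-th exporter with the i-th endpoint, as the properties require."""
    if len(exporters) != len(endpoints):
        # This sketch's choice: a mismatch is treated as a configuration error.
        raise ValueError(
            f"{len(exporters)} exporter(s) but {len(endpoints)} endpoint(s)"
        )
    return list(zip(exporters, endpoints))
```

For example, `pair_exporters_and_endpoints(["http", "grpc"], ["https://traces.example.org:4319/v1/traces", "grpc://collector.example.org:4317"])` yields one `(exporter, endpoint)` tuple per backend.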

You can write these properties using Database:

import tango

db = tango.Database()
device_name = "sys/tg_test/1"

db.put_device_property(
    device_name,
    {
        "telemetry_enable": "1",
        "telemetry_topics": ["user"],
        "telemetry_types": ["tracing", "logging"],
        "telemetry_tracing_exporters": ["http"],
        "telemetry_tracing_endpoints": [
            "https://traces.example.org:4319/v1/traces"
        ],
        "telemetry_logging_exporters": ["grpc"],
        "telemetry_logging_endpoints": [
            "grpc://collector.example.org:4317"
        ],
    },
)

These properties are read by cppTango before init_device(). That makes them the persistent middle layer in the telemetry configuration order:

  1. Environment variables provide process-wide defaults.

  2. Device properties override those defaults for a specific device.

  3. Admin device commands override the current runtime state until restart.

So if you set TANGO_TELEMETRY_ENABLE=off for the process but store telemetry_enable = "1" for one device, that device starts with telemetry enabled. If you later call proxy.stop_telemetry(), that runtime change applies immediately, but after restart the device returns to the stored property-based configuration.

PyTango and cppTango do not write these properties implicitly. Use runtime commands when you want a temporary change, and database properties when you want the configuration to persist.
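The three-level order above can be written out as a small resolution function. This sketch only covers the enable flag. The function name and argument shapes are illustrative and do not correspond to a PyTango API.

```python
from typing import Optional


def effective_enable(
    env_enable: str,
    device_property: Optional[str],
    runtime_override: Optional[bool],
) -> bool:
    """Resolve telemetry enable in the documented order: env < property < runtime."""
    # 1. Process-wide default from TANGO_TELEMETRY_ENABLE.
    enabled = env_enable.strip().lower() == "on"
    # 2. The telemetry_enable device property ("1" or "0") overrides it.
    if device_property is not None:
        enabled = device_property == "1"
    # 3. An admin-command change wins until the device is restarted.
    if runtime_override is not None:
        enabled = runtime_override
    return enabled
```

The scenario described above corresponds to `effective_enable("off", "1", None)` (enabled at startup), then `effective_enable("off", "1", False)` after `stop_telemetry()`. After a restart the runtime override is gone again, so the result reverts to the first call.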

How to use telemetry topics

PyTango currently documents two practical topic values:

  • user: emit the user-level spans for device methods.

  • all: emit the user-level spans and the current kernel-level tracing.

Example:

proxy.set_telemetry_topics(["user"])

Use ["all"] only when you need the extra kernel detail, since it produces more data. Additional topic names may exist in cppTango, but PyTango does not yet rely on them as stable user-facing behaviour.

How to include PyTango kernel spans

By default, PyTango does not emit kernel-level Python spans, even when TANGO_TELEMETRY_TOPICS=all is set. These spans can add significant overhead and are normally only useful when investigating PyTango internals.

To include them, set PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS to a truthy value before starting the device server:

$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TOPICS=all
$ export PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS=on
$ python MySuperDS.py instance

Leave PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS unset to keep the default behaviour and emit only user-level PyTango spans.
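Combining the topic rules with the kernel-span switch gives the decision sketched below. The set of accepted truthy strings (`1`, `on`, `true`, `yes`) is an assumption for illustration, since the exact values PyTango accepts are not listed here. The function itself is hypothetical.

```python
# Assumed truthy spellings for PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS.
TRUTHY = {"1", "on", "true", "yes"}


def emits_python_span(kind, topics, emit_kernel_spans_env):
    """Decide whether a PyTango span of `kind` ("user" or "kernel") is emitted."""
    if kind == "user":
        # Both documented topics include the user-level spans.
        return "user" in topics or "all" in topics
    if kind == "kernel":
        # Kernel spans need topic "all" AND the explicit opt-in variable.
        opted_in = (emit_kernel_spans_env or "").strip().lower() in TRUTHY
        return "all" in topics and opted_in
    return False
```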

How to run a client that emits telemetry

The environment variables mentioned above also apply to clients, although clients won't emit logs to the Tango Logging System. Simply using the client classes (DeviceProxy, AttributeProxy, Group, and Database) in such an environment will emit telemetry.

The tracer instance (opentelemetry.trace.Tracer) used for client requests depends on the context. If it is within a device method for an attribute, command, device initialisation or shutdown, then the device’s tracer is used. For all other cases the client tracer (singleton) is used.
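One way to picture this selection rule is with a context variable that is set while a device method runs. The sketch below is a hypothetical model, not PyTango's implementation. `FakeTracer` stands in for `opentelemetry.trace.Tracer` so that it runs without the SDK.

```python
import contextvars


class FakeTracer:
    """Stand-in for opentelemetry.trace.Tracer in this sketch."""

    def __init__(self, name):
        self.name = name


_device_tracer = contextvars.ContextVar("device_tracer", default=None)
_client_tracer = None  # process-wide singleton, created on first use


def get_client_tracer():
    global _client_tracer
    if _client_tracer is None:
        _client_tracer = FakeTracer("pytango.client")
    return _client_tracer


def tracer_for_client_call():
    """Device tracer inside a device method, otherwise the client singleton."""
    device_tracer = _device_tracer.get()
    return device_tracer if device_tracer is not None else get_client_tracer()


def call_device_method(device_tracer, method):
    """Run `method` as if it were an attribute or command handler of a device."""
    token = _device_tracer.set(device_tracer)
    try:
        return method()
    finally:
        _device_tracer.reset(token)
```

A context variable (rather than a plain global) keeps the choice correct when several device methods run concurrently in different threads or tasks.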

By default, the OpenTelemetry service name associated with client traces from PyTango is pytango.client. This is very generic, so it is useful to customise this for your own application. This can be done by setting the environment variable PYTANGO_TELEMETRY_CLIENT_SERVICE_NAME to the string you prefer. If the client tracer has not been created yet, PyTango uses the current value when telemetry is first used. If runtime configuration invalidates the client tracer provider, the next traced client call recreates it from the latest config.

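It can also be set programmatically, if the actual environment should be ignored. Set it before the first traced client call, because the value is read when the client tracer is created: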

import os
import tango

if __name__ == "__main__":
    os.environ["PYTANGO_TELEMETRY_CLIENT_SERVICE_NAME"] = "my.client"
    dp = tango.DeviceProxy("sys/tg_test/1")
    dp.ping()
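The timing rule from the previous paragraph (read on first use, re-read only after the provider is invalidated) is modelled below. This is a hypothetical class for illustration only. PyTango's internal caching may be structured differently.

```python
import os


class LazyClientServiceName:
    """Reads the service name on first use, and again after invalidate()."""

    DEFAULT = "pytango.client"

    def __init__(self, environ=None):
        self._environ = os.environ if environ is None else environ
        self._cached = None

    def get(self):
        if self._cached is None:
            # First traced call: take whatever the environment says right now.
            self._cached = self._environ.get(
                "PYTANGO_TELEMETRY_CLIENT_SERVICE_NAME", self.DEFAULT
            )
        return self._cached

    def invalidate(self):
        # Models a runtime reconfiguration dropping the client tracer provider.
        self._cached = None
```

This is why setting `os.environ` before the first `DeviceProxy` call works, while changing it afterwards has no effect until the provider is recreated.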

How to change client telemetry at runtime

Note

Currently, there is no way to change the telemetry settings for clients at runtime. For applications or scripts that use DeviceProxy or other client classes, the client tracer is created based on the environment variables at startup.

How to add process information to the telemetry traces

The OpenTelemetry Python library has many environment variables for configuration. One of them (at least at version 1.25.0) allows additional information about the process to be added to each trace. This is done by setting the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=process.

Note that cppTango uses the C++ OpenTelemetry library, which has different behaviour and configuration.

How to add custom information to device traces

Devices can be customised in two different ways. Firstly, common information can be added to all traces. Secondly, specific information can be added in custom spans when performing tasks within the device.

Adding common information to all traces

To add generic resource information, override the device's create_telemetry_tracer_provider() method, which creates the tracer provider. This method is called when the device is being initialised, before init_device(), and again if runtime telemetry reconfiguration requires a new tracer provider.

from opentelemetry.trace import TracerProvider
from opentelemetry.sdk.resources import DEPLOYMENT_ENVIRONMENT
from tango.server import Device
from tango.telemetry import get_telemetry_tracer_provider_factory


class Example(Device):
    def create_telemetry_tracer_provider(
        self, class_name, device_name
    ) -> TracerProvider:
        # Reuse PyTango's factory so the normal telemetry configuration applies,
        # and add an extra resource attribute to every span from this device.
        tracer_provider_factory = get_telemetry_tracer_provider_factory()
        extra_resource_attributes = {DEPLOYMENT_ENVIRONMENT: "production"}
        return tracer_provider_factory(
            class_name,
            device_name,
            extra_resource_attributes,
            endpoints=self.get_telemetry_tracing_endpoints(),
        )

Even more customisation is possible by overriding the device’s create_telemetry_tracer() method. This method is also called after the tracer provider has been created, including after runtime reconfiguration.

For more extreme cases, the factory used for all device and client tracers can be changed using set_telemetry_tracer_provider_factory().

If you replace the factory at runtime, PyTango uses the new factory the next time it recreates a device or client tracer provider because of a telemetry reconfiguration.

import tango
from tango.telemetry import (
    TelemetryEndpoint,
    TelemetryExporter,
    set_telemetry_tracer_provider_factory,
)


def my_factory(
    service_name,
    service_instance_id=None,
    extra_resource_attributes=None,
    endpoints=None,
):
    # Build and return an opentelemetry.trace.TracerProvider here.
    ...


set_telemetry_tracer_provider_factory(my_factory)

# Trigger provider recreation with the new factory.
proxy = tango.DeviceProxy("sys/tg_test/1")
proxy.set_telemetry_tracing_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.HTTP,
            "https://traces.example.org:4319/v1/traces",
        )
    ]
)

Adding specific information to a span

Each device has its own instance of an opentelemetry.trace.Tracer. This tracer associates the device’s spans with the device’s name, and its Tango device class. The tracer instance can be accessed at runtime using get_telemetry_tracer().

For example, a partial implementation of a device is shown below with a command handler that creates a custom span. This span automatically inherits the trace context of the caller. When creating the span, the handler adds the configuration string as an attribute. Note that only a few simple types are allowed as attribute values (see opentelemetry.util.types.Attributes). The example also emits an event during the span.

import json
from tango.server import Device, command


class Example(Device):

    @command
    def Configure(self, configuration_json: str) -> None:
        device_tracer = self.get_telemetry_tracer()
        with device_tracer.start_as_current_span(
            "manager.configure", attributes={"configuration": configuration_json}
        ) as span:
            span.add_event("configuration requested")
            configuration = json.loads(configuration_json)
            self._comms_library.configure(configuration)
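Because only simple values are accepted as span attributes, richer data has to be flattened first. The helper below is a hypothetical sketch, not part of PyTango or OpenTelemetry. It passes through `str`, `bool`, `int` and `float`, keeps homogeneous sequences of those, and falls back to a JSON string for anything else (for example, a parsed configuration dictionary).

```python
import json
from collections.abc import Sequence

_SCALARS = (str, bool, int, float)


def to_span_attribute(value):
    """Convert `value` into something usable as a span attribute value."""
    if isinstance(value, _SCALARS):
        return value
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        items = list(value)
        # Homogeneous sequences of scalars are allowed; type() separates bool/int.
        if all(isinstance(v, _SCALARS) for v in items) and len({type(v) for v in items}) <= 1:
            return items
    # Anything else (dicts, mixed lists, objects) becomes a JSON string.
    return json.dumps(value, default=str)
```

Inside a handler this could be used as `span.set_attribute("configuration", to_span_attribute(configuration))`. Serialising to JSON keeps the full content searchable in most backends, at the cost of losing per-field attributes.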

It is not necessary to create a new span within a command handler or attribute read/write method, as PyTango has already created a span automatically. This span could be accessed as follows:

import json
from opentelemetry import trace as trace_api
from tango.server import Device, command


class Example(Device):

    @command
    def Configure(self, configuration_json: str) -> None:
        span = trace_api.get_current_span()
        span.set_attribute("configuration", configuration_json)
        span.add_event("configuration requested")
        configuration = json.loads(configuration_json)
        self._comms_library.configure(configuration)

How to manually instrument your own application

Device servers and clients are automatically instrumented, so that they emit spans for the basic operations. However, your custom devices and applications that build on Tango can benefit from additional context. Manual instrumentation is well described in the OpenTelemetry instrumentation docs.

You can create your own custom tracer for your application. It is convenient to use the factory function from PyTango, so that you make use of the same environment variables that Tango uses to configure the tracer endpoint. If you do not pass any endpoints to the returned factory, it will use the current client tracing endpoints derived from the telemetry environment variables.

from opentelemetry import trace as trace_api
from tango.telemetry import get_telemetry_tracer_provider_factory

import my_app  # your own application package, used here for its version

tracer_provider_factory = get_telemetry_tracer_provider_factory()
# No endpoints passed: the client tracing endpoints from the environment are used.
tracer_provider = tracer_provider_factory("my.app")
tracer = trace_api.get_tracer(
    instrumenting_module_name="my.app.reader",
    instrumenting_library_version=my_app.__version__,
    tracer_provider=tracer_provider,
)

Then you can create spans in any interesting functions. Consider a web application that is providing a way to read Tango device attribute values. It may be useful to add details about the requesting client to the span.

import tango
from fastapi import FastAPI, Request

app = FastAPI()


@app.get("/read_attr_value/{device_name}/{attr_name}")
def read_attr_value(device_name: str, attr_name: str, request: Request):
    # `tracer` is the custom tracer created in the previous example.
    with tracer.start_as_current_span(
        "my-web-proxy.read_attr_value",
        attributes={"client.address": request.client.host},
    ):
        proxy = tango.DeviceProxy(device_name)
        value = proxy.read_attribute(attr_name).value
        return {"value": value}

Note

Creating a span around a very long running task is not recommended. The span is only emitted on completion. Users viewing traces related to such a span will not get a complete picture until it completes. Also, having a huge number of child spans (100s to 1000s) will be problematic to view in typical web UIs.

The Tango logs that go to OpenTelemetry are emitted by cppTango. PyTango doesn’t expose a way to use the logging directly for client-only applications. Devices already have a standard way to emit logs. If you want your application’s logs to be emitted from Python, this is still an experimental feature in OpenTelemetry Python (as at v1.26.0). See the logs examples.

How to hide error messages when traces cannot be sent

The traces are sent to the backend in the background. This might fail if the host is unreachable or too busy. If that happens, the error messages from the OpenTelemetry SDK are printed to stdout. For example:

Exception while exporting Span batch.
Traceback (most recent call last):
  ...
[Error] File: /Users/runner/miniforge3/conda-bld/opentelemetry-sdk_1733208709442/work/exporters/otlp/src/otlp_http_exporter.cc:145 [OTLP TRACE HTTP Exporter] ERROR: Export 6 trace span(s) error: 1

For an end-user these messages might be confusing, or a nuisance. It is possible to hide them by changing the OpenTelemetry SDK’s log level. PyTango provides an environment variable, PYTANGO_TELEMETRY_SDK_LOG_LEVEL, to do this. Set the value to fatal before starting your application to hide the error logs.

The standard Python logging levels are all options: critical, fatal, error, warning, info, debug, notset.

The name of the opentelemetry-python logger used for this may change in future, so there is a second environment variable, PYTANGO_TELEMETRY_SDK_LOGGER_NAMES, which can be set to a comma-separated list of logger names. Defaults are used if the environment variable is empty or not set.

(As at version 1.35.0, opentelemetry-python does not support its own OTEL_LOG_LEVEL environment variable).
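If you want the same effect from your own code, the standard `logging` module is sufficient. The sketch below is not PyTango's implementation. Using `"opentelemetry"` as the default logger name is an assumption. Setting the level on that parent logger also silences child loggers that have no level of their own.

```python
import logging

_LEVELS = {
    "critical": logging.CRITICAL,
    "fatal": logging.FATAL,
    "error": logging.ERROR,
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
    "notset": logging.NOTSET,
}

# Assumption: the SDK's loggers live under the "opentelemetry" namespace.
DEFAULT_LOGGER_NAMES = ["opentelemetry"]


def apply_sdk_log_level(level_name, logger_names=""):
    """Set `level_name` on each comma-separated logger (or the defaults)."""
    level = _LEVELS[level_name.strip().lower()]
    names = [n.strip() for n in logger_names.split(",") if n.strip()]
    names = names or DEFAULT_LOGGER_NAMES
    for name in names:
        logging.getLogger(name).setLevel(level)
    return names
```

A typical call mirrors the two PyTango variables: `apply_sdk_log_level(os.environ["PYTANGO_TELEMETRY_SDK_LOG_LEVEL"], os.environ.get("PYTANGO_TELEMETRY_SDK_LOGGER_NAMES", ""))`.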

How to reduce the number of traces being stored

Storing all traces from all Tango devices in your facility is probably not feasible.

One option is to only enable telemetry after a problem has occurred, and further debugging is planned. Unfortunately, this means that rare errors typically won't be captured. PyTango 10.3.0 adds runtime control via DeviceProxy methods and the cppTango admin device commands, so a process restart is no longer required just to enable, disable, or retarget telemetry.

Another option is to have all devices emitting telemetry, but have the collector apply some filtering to reduce the number of traces that get stored. This is the concept of sampling. You may consider a probabilistic sampler, or a tail sampler, or many of the other contributed samplers.
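To see why probabilistic sampling keeps traces consistent across many devices, consider the decision rule used by trace-ID ratio samplers. The sketch below is modelled on the OpenTelemetry SDK's `TraceIdRatioBased` sampler, but it is written here for illustration, not copied from any collector. The key point is that the decision depends only on the trace ID, so every service that sees the same trace makes the same keep/drop choice.

```python
# Only the low 64 bits of the 128-bit trace ID are used, as in TraceIdRatioBased.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deterministically per trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound
```

Because trace IDs are random, `should_sample(trace_id, 0.1)` keeps about one trace in ten. A tail sampler, in contrast, waits until a trace is complete and can keep, for example, all traces containing errors.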

Telemetry API summary

The main telemetry-related methods are listed here for reference.

DeviceProxy

Note

Runtime configuration was first added in version 10.3.0, so these methods will raise DevFailed if used with a Tango device running an older version.

High-level Device

Further examples

The prototyping.py file in the source repository has some further examples, including creating a custom tracer, passing the trace context to a different thread, enabling and disabling telemetry at runtime, and changing telemetry configuration while the process is running.