OpenTelemetry#
Background#
Since version 10.0.0 PyTango provides support for distributed tracing and logging via the OpenTelemetry framework. You can read all about the concepts on their website.
You will need a collector to receive the traces and/or logs from your application. This could either be one running locally, or on a remote server. For configuration, the important thing is the collector’s endpoint URL and protocol.
E.g., if you run the Signoz standalone demo, there will be a collector running locally for gRPC and HTTP traffic. Signoz will also provide a website for viewing the telemetry data.
Alternatively, ask your IT infrastructure team if they already have an OpenTelemetry-compatible backend running, what the configuration details are for traces and logs, and how to view the data.
Warning
Emitting telemetry from a large number of device servers and clients can generate a high load on the backend receiving this data. There is also a small impact on the Tango servers and clients that use this feature. Be careful when enabling this feature, and monitor the performance impact. See the benchmarks.
How to check if your PyTango installation supports telemetry#
As a first step, you need at least version 10.0.0, and both PyTango and cppTango must be
compiled with telemetry support (i.e., the cppTango CMake compiler option TANGO_USE_TELEMETRY was enabled):
$ python -c "import tango; print(tango.__version__)"
10.0.0
$ python -c "import tango; print(tango.constants.TELEMETRY_SUPPORTED)"
True
See the PyTango news page for which versions of PyTango are packaged with telemetry support compiled in.
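The same check can be done from a script, for example before deciding whether to set the telemetry environment variables. The following is a minimal sketch: the helper name telemetry_supported is hypothetical, and it only relies on the tango.constants.TELEMETRY_SUPPORTED flag shown above. It returns False if PyTango cannot be imported or the flag is absent.

```python
import importlib


def telemetry_supported() -> bool:
    """Return True only if PyTango imports and reports telemetry support."""
    try:
        tango = importlib.import_module("tango")
    except ImportError:
        # PyTango is not installed in this environment.
        return False
    constants = getattr(tango, "constants", None)
    # Older PyTango versions have no TELEMETRY_SUPPORTED flag: treat as False.
    return bool(getattr(constants, "TELEMETRY_SUPPORTED", False))


if __name__ == "__main__":
    print("telemetry supported:", telemetry_supported())
```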
The global enable for OpenTelemetry in Tango
is provided by the environment variable TANGO_TELEMETRY_ENABLE.
Set it to on
to enable telemetry by default
for a new client or device server process.
Next, you need the OpenTelemetry Python dependencies installed. You can check that they are available by attempting to enable telemetry, either via the environment or at runtime from a client.
$ TANGO_TELEMETRY_ENABLE=on python -c "import tango"
If the required Python packages are missing and telemetry is requested, you may see a warning like:
$ TANGO_TELEMETRY_ENABLE=on python -c "import tango; dp = tango.DeviceProxy('sys/tg_test/1'); dp.ping()"
/path/to/python/lib/python3.10/site-packages/tango/_telemetry.py:729:
PyTangoUserWarning: OpenTelemetry SDK packages are not available:
telemetry can be enabled in cppTango, but Python telemetry spans will not be emitted.
Install opentelemetry-sdk and an OTLP exporter package to emit spans.
There are two cases to distinguish:
If the OpenTelemetry API packages are missing, PyTango cannot propagate trace context and cannot emit Python spans.
If the OpenTelemetry SDK or exporter packages are missing, cppTango telemetry can still be enabled, but Python spans are not emitted.
Install the packages you need,
for example pip install "pytango[telemetry]".
How to run a device server that emits telemetry#
The main environment variables are:
TANGO_TELEMETRY_ENABLE: global default enable/disable switch.
TANGO_TELEMETRY_TRACING_EXPORTERS: comma-separated tracing exporters.
TANGO_TELEMETRY_TRACING_ENDPOINTS: comma-separated tracing endpoints.
TANGO_TELEMETRY_LOGGING_EXPORTERS: comma-separated logging exporters.
TANGO_TELEMETRY_LOGGING_ENDPOINTS: comma-separated logging endpoints.
TANGO_TELEMETRY_TYPES: comma-separated telemetry types such as tracing and logging. If unset, it defaults to tracing,logging.
TANGO_TELEMETRY_TOPICS: comma-separated telemetry topics.
The old names are no longer usable. If they are set, PyTango emits a warning, but ignores them for configuration:
TANGO_TELEMETRY_TRACES_EXPORTER -> TANGO_TELEMETRY_TRACING_EXPORTERS
TANGO_TELEMETRY_TRACES_ENDPOINT -> TANGO_TELEMETRY_TRACING_ENDPOINTS
TANGO_TELEMETRY_LOGS_EXPORTER -> TANGO_TELEMETRY_LOGGING_EXPORTERS
TANGO_TELEMETRY_LOGS_ENDPOINT -> TANGO_TELEMETRY_LOGGING_ENDPOINTS
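When migrating deployment scripts, it may help to check for the old names before starting a process. The sketch below is not part of PyTango; the helper name find_old_telemetry_variables is hypothetical, and the mapping is copied from the renames listed above.

```python
import os

# Old name -> new name, copied from the renames listed above.
RENAMED_VARIABLES = {
    "TANGO_TELEMETRY_TRACES_EXPORTER": "TANGO_TELEMETRY_TRACING_EXPORTERS",
    "TANGO_TELEMETRY_TRACES_ENDPOINT": "TANGO_TELEMETRY_TRACING_ENDPOINTS",
    "TANGO_TELEMETRY_LOGS_EXPORTER": "TANGO_TELEMETRY_LOGGING_EXPORTERS",
    "TANGO_TELEMETRY_LOGS_ENDPOINT": "TANGO_TELEMETRY_LOGGING_ENDPOINTS",
}


def find_old_telemetry_variables(environ=None):
    """Return (old_name, new_name) pairs for old variables that are still set."""
    environ = os.environ if environ is None else environ
    return [(old, new) for old, new in RENAMED_VARIABLES.items() if old in environ]


if __name__ == "__main__":
    for old, new in find_old_telemetry_variables():
        print(f"{old} is ignored; set {new} instead")
```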
Assuming you have a traces collector using the HTTPS protocol
listening at URL https://traces.my-institute.org:4319/v1/traces,
and a logs collector, also using HTTPS, at URL https://logs.my-institute.org:443/otlp/v1/logs,
you can set your environment up as follows:
$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TRACING_EXPORTERS=http
$ export TANGO_TELEMETRY_TRACING_ENDPOINTS=https://traces.my-institute.org:4319/v1/traces
$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=http
$ export TANGO_TELEMETRY_LOGGING_ENDPOINTS=https://logs.my-institute.org:443/otlp/v1/logs
$ export TANGO_TELEMETRY_TYPES=tracing,logging
$ export TANGO_TELEMETRY_TOPICS=user
And then launch your application, as normal.
$ python MySuperDS.py instance
Another example is using a local collector, with the gRPC protocol:
$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TRACING_EXPORTERS=grpc
$ export TANGO_TELEMETRY_TRACING_ENDPOINTS=grpc://localhost:4317
$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=grpc
$ export TANGO_TELEMETRY_LOGGING_ENDPOINTS=grpc://localhost:4317
$ export TANGO_TELEMETRY_TYPES=tracing,logging
$ export TANGO_TELEMETRY_TOPICS=all
For Tango, when using the gRPC protocol, the URLs must start with grpc://, even though your backend might
suggest an http:// endpoint for the gRPC traffic.
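A small pre-flight check can catch a mismatched URL scheme before the process starts. This is only a sketch with hypothetical helper names (check_endpoint, to_grpc_url); it is not how Tango validates endpoints. Accepting plain http:// for the http exporter is an assumption, since the examples above only show https://.

```python
from urllib.parse import urlsplit

# Exporter names are taken from the examples above.
# Assumption: the http exporter accepts both http:// and https:// URLs.
ALLOWED_SCHEMES = {"grpc": {"grpc"}, "http": {"http", "https"}}


def check_endpoint(exporter: str, endpoint: str) -> str:
    """Return the endpoint unchanged, or raise ValueError on a scheme mismatch."""
    allowed = ALLOWED_SCHEMES.get(exporter)
    if allowed is None:
        raise ValueError(f"unknown exporter {exporter!r}")
    scheme = urlsplit(endpoint).scheme
    if scheme not in allowed:
        raise ValueError(
            f"{exporter} exporter needs one of {sorted(allowed)}, got {scheme!r}"
        )
    return endpoint


def to_grpc_url(endpoint: str) -> str:
    """Rewrite a collector address such as http://host:4317 to grpc://host:4317."""
    return urlsplit(endpoint)._replace(scheme="grpc").geturl()


if __name__ == "__main__":
    print(check_endpoint("grpc", to_grpc_url("http://localhost:4317")))
```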
If you want to emit traces, but disable logging via the telemetry backend, this can be done by setting
the exporter to none. This may be useful if your logs are handled by a different system, or your
telemetry backend doesn’t support logs. This can be done as follows:
$ export TANGO_TELEMETRY_LOGGING_EXPORTERS=none
Alternatively, you could leave the logging exporter and endpoint set, but exclude logging from the telemetry types:
$ export TANGO_TELEMETRY_TYPES=tracing
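The two ways of switching off log export can be summarised in a few lines of Python. This sketch only mirrors the rules described above and is not Tango's own parsing code. The function name is hypothetical, and treating an unset exporter variable as "not disabled" is an assumption.

```python
def _split(value: str) -> list:
    """Split a comma-separated value into its non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def logging_export_requested(environ) -> bool:
    """Return False if log export is disabled by either mechanism."""
    # TANGO_TELEMETRY_TYPES defaults to tracing,logging when unset.
    types = _split(environ.get("TANGO_TELEMETRY_TYPES", "tracing,logging"))
    if "logging" not in types:
        return False
    # Assumption: an unset exporter variable does not disable log export.
    exporters = _split(environ.get("TANGO_TELEMETRY_LOGGING_EXPORTERS", ""))
    if exporters and all(name == "none" for name in exporters):
        return False
    return True


if __name__ == "__main__":
    import os

    print("log export requested:", logging_export_requested(os.environ))
```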
Note
The environment variables can be set in a configuration file, similar to TANGO_HOST. See the reference documentation.
How to change device telemetry at runtime#
Runtime configuration is driven by cppTango. PyTango follows those changes by recreating or replacing its Python tracer state when needed.
Note
Runtime configuration was first added in version 10.3.0.
From a client,
use the telemetry convenience methods
on DeviceProxy:
import tango
from tango.telemetry import TelemetryEndpoint, TelemetryExporter

proxy = tango.DeviceProxy("sys/tg_test/1")
proxy.start_telemetry()
proxy.set_telemetry_tracing(True)
proxy.set_telemetry_topics(["user"])
proxy.set_telemetry_tracing_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.HTTP,
            "https://traces.example.org:4319/v1/traces",
        )
    ]
)
proxy.stop_telemetry()
The corresponding admin device commands
are implemented in cppTango.
When they change a device’s telemetry state,
cppTango calls the virtual DeviceImpl methods
and PyTango refreshes the Python tracer provider
or switches to no-op tracing for future spans.
For high-level Python devices,
runtime telemetry reconfiguration is controlled through
the admin device and DeviceProxy.
Device does not expose telemetry-related
runtime set/get telemetry methods as part of its public API.
Changing logging endpoints at runtime uses the same pattern:
proxy.set_telemetry_logging(True)
proxy.set_telemetry_logging_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.GRPC,
            "grpc://collector.example.org:4317",
        )
    ]
)
Runtime changes affect future spans and log export configuration. In-flight spans are not rewritten.
How to persist device telemetry configuration#
If you want telemetry settings to survive a device or device-server restart, store them as device properties in the Tango database.
The supported telemetry device properties are:
telemetry_enable: global enable or disable flag, usually "1" or "0".
telemetry_topics: list of enabled topics.
telemetry_types: list of enabled telemetry types, such as tracing and logging.
telemetry_tracing_exporters: list of tracing exporter types.
telemetry_tracing_endpoints: list of tracing endpoints.
telemetry_logging_exporters: list of logging exporter types.
telemetry_logging_endpoints: list of logging endpoints.
For the exporter and endpoint properties, the lists are positional: the 0th endpoint belongs to the 0th exporter, the 1st endpoint belongs to the 1st exporter, and so on.
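Because the pairing is positional, a length mismatch between the two lists is easy to introduce by accident. The following sketch shows one way to check the lists before writing them to the database. The function name is hypothetical, and how cppTango itself reacts to mismatched lists is not covered here.

```python
def pair_exporters_with_endpoints(exporters, endpoints):
    """Pair the i-th exporter with the i-th endpoint, rejecting unequal lengths."""
    if len(exporters) != len(endpoints):
        raise ValueError(
            f"{len(exporters)} exporter(s) but {len(endpoints)} endpoint(s)"
        )
    return list(zip(exporters, endpoints))


if __name__ == "__main__":
    pairs = pair_exporters_with_endpoints(
        ["http", "grpc"],
        ["https://traces.example.org:4319/v1/traces", "grpc://collector.example.org:4317"],
    )
    for exporter, endpoint in pairs:
        print(f"{exporter}: {endpoint}")
```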
You can write these properties
using Database:
import tango

db = tango.Database()
device_name = "sys/tg_test/1"
db.put_device_property(
    device_name,
    {
        "telemetry_enable": "1",
        "telemetry_topics": ["user"],
        "telemetry_types": ["tracing", "logging"],
        "telemetry_tracing_exporters": ["http"],
        "telemetry_tracing_endpoints": [
            "https://traces.example.org:4319/v1/traces"
        ],
        "telemetry_logging_exporters": ["grpc"],
        "telemetry_logging_endpoints": [
            "grpc://collector.example.org:4317"
        ],
    },
)
These properties are read by cppTango
before init_device().
That makes them the persistent middle layer
in the telemetry configuration order:
Environment variables provide process-wide defaults.
Device properties override those defaults for a specific device.
Admin device commands override the current runtime state until restart.
So if you set
TANGO_TELEMETRY_ENABLE=off
for the process
but store telemetry_enable = "1"
for one device,
that device starts with telemetry enabled.
If you later call
proxy.stop_telemetry(),
that runtime change applies immediately,
but after restart
the device returns to the stored property-based configuration.
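The precedence described above can be modelled with a layered lookup. This is purely an illustration of the ordering and not how cppTango stores its configuration. Using the single key telemetry_enable for all three layers is a simplification made for this sketch.

```python
from collections import ChainMap

# Lowest priority: process-wide default, e.g. TANGO_TELEMETRY_ENABLE=off.
environment = {"telemetry_enable": "0"}
# Middle layer: device properties stored in the Tango database.
properties = {"telemetry_enable": "1"}
# Highest priority: runtime changes made through admin device commands.
runtime = {}

# ChainMap searches its mappings from left to right.
effective = ChainMap(runtime, properties, environment)

at_startup = effective["telemetry_enable"]  # the property wins over the environment

runtime["telemetry_enable"] = "0"  # models proxy.stop_telemetry()
after_stop = effective["telemetry_enable"]

runtime.clear()  # models a restart: runtime changes are lost
after_restart = effective["telemetry_enable"]

if __name__ == "__main__":
    print(at_startup, after_stop, after_restart)
```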
PyTango and cppTango do not write these properties implicitly. Use runtime commands when you want a temporary change, and database properties when you want the configuration to persist.
How to use telemetry topics#
PyTango currently documents two practical topic values:
user: emit the user-level spans for device methods.
all: emit the user-level spans and the current kernel-level tracing.
Example:
proxy.set_telemetry_topics(["user"])
Use ["all"] only when you need the extra kernel detail,
since it produces more data.
Additional topic names may exist in cppTango,
but PyTango does not yet rely on them
as stable user-facing behaviour.
How to include PyTango kernel spans#
By default,
PyTango does not emit kernel-level Python spans,
even when TANGO_TELEMETRY_TOPICS=all is set.
These spans can add significant overhead
and are normally only useful when investigating PyTango internals.
To include them,
set PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS to a truthy value
before starting the device server:
$ export TANGO_TELEMETRY_ENABLE=on
$ export TANGO_TELEMETRY_TOPICS=all
$ export PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS=on
$ python MySuperDS.py instance
Leave PYTANGO_TELEMETRY_EMIT_KERNEL_SPANS unset
to keep the default behaviour
and emit only user-level PyTango spans.
How to run a client that emits telemetry#
The environment variables mentioned above also apply to clients, although clients won’t emit logs
to the Tango Logging System. Simply using the client classes, DeviceProxy,
AttributeProxy, Group, and Database, in such an
environment will emit telemetry.
The tracer instance (opentelemetry.trace.Tracer) used for client requests depends on the context.
If it is within a device method for an attribute, command, device initialisation or shutdown, then
the device’s tracer is used. For all other cases the client tracer (singleton) is used.
By default, the OpenTelemetry service name associated with client traces from PyTango is
pytango.client. This is very generic, so it is useful to customise this for your own application.
This can be done
by setting the environment variable PYTANGO_TELEMETRY_CLIENT_SERVICE_NAME
to the string you prefer.
If the client tracer has not been created yet,
PyTango uses the current value when telemetry is first used.
If runtime configuration invalidates the client tracer provider,
the next traced client call
recreates it from the latest config.
It can also be set programmatically, if the actual environment should be ignored:
import os
import tango

if __name__ == "__main__":
    os.environ["PYTANGO_TELEMETRY_CLIENT_SERVICE_NAME"] = "my.client"
    dp = tango.DeviceProxy("sys/tg_test/1")
    dp.ping()
How to change client telemetry at runtime#
Note
Currently,
there is no way
to change the telemetry settings
for clients at runtime.
For applications or scripts
that use DeviceProxy
or other client classes,
the client tracer is created
based on the environment variables
at startup.
How to add process information to the telemetry traces#
The OpenTelemetry Python library has many
environment variables
for configuration.
One of them (at least at version 1.25.0) allows additional information about the process to be added
to each trace. This is done by setting the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=process.
Note that cppTango uses the C++ OpenTelemetry library, which has different behaviour and configuration.
How to add custom information to device traces#
Devices can be customised in two different ways. Firstly, common information can be added to all traces. Secondly, specific information can be added in custom spans when performing tasks within the device.
Adding common information to all traces#
To add generic resource information, override the method that creates the tracer provider,
create_telemetry_tracer_provider().
This method is called
when the device is being initialised,
before init_device,
and again if runtime telemetry reconfiguration
requires a new tracer provider.
from opentelemetry.trace import TracerProvider
from opentelemetry.sdk.resources import DEPLOYMENT_ENVIRONMENT
from tango.server import Device
from tango.telemetry import get_telemetry_tracer_provider_factory


class Example(Device):
    def create_telemetry_tracer_provider(
        self, class_name, device_name
    ) -> TracerProvider:
        tracer_provider_factory = get_telemetry_tracer_provider_factory()
        # Extra resource attributes are attached to every trace from this device.
        extra_resource_attributes = {DEPLOYMENT_ENVIRONMENT: "production"}
        return tracer_provider_factory(
            class_name,
            device_name,
            extra_resource_attributes,
            endpoints=self.get_telemetry_tracing_endpoints(),
        )
Even more customisation is possible
by overriding the device’s
create_telemetry_tracer() method.
This method is also called
after the tracer provider has been created,
including after runtime reconfiguration.
For more extreme cases, the factory used for all device and client tracers can be changed using
set_telemetry_tracer_provider_factory().
If you replace the factory at runtime, PyTango uses the new factory the next time it recreates a device or client tracer provider because of a telemetry reconfiguration.
import tango
from tango.telemetry import (
    TelemetryEndpoint,
    TelemetryExporter,
    set_telemetry_tracer_provider_factory,
)


def my_factory(service_name, service_instance_id=None, extra_resource_attributes=None, endpoints=None):
    ...


set_telemetry_tracer_provider_factory(my_factory)

# Trigger provider recreation with the new factory.
proxy = tango.DeviceProxy("sys/tg_test/1")
proxy.set_telemetry_tracing_endpoints(
    [
        TelemetryEndpoint(
            TelemetryExporter.HTTP,
            "https://traces.example.org:4319/v1/traces",
        )
    ]
)
Adding specific information to a span#
Each device has its own instance of an opentelemetry.trace.Tracer. This tracer associates the
device’s spans with the device’s name, and its Tango device class. The tracer instance can be
accessed at runtime using get_telemetry_tracer().
For example, a partial implementation of a device is shown below with a command handler that creates a custom
span. This span automatically inherits the trace context of the caller. When creating the span, it adds
the configuration string as an attribute. Note that only a few simple types are allowed as attribute values
(see opentelemetry.utils.types.Attributes). The example also emits an event during the span.
import json
from tango.server import Device, command


class Example(Device):
    @command
    def Configure(self, configuration_json: str) -> None:
        device_tracer = self.get_telemetry_tracer()
        with device_tracer.start_as_current_span(
            "manager.configure", attributes={"configuration": configuration_json}
        ) as span:
            span.add_event("configuration requested")
            configuration = json.loads(configuration_json)
            self._comms_library.configure(configuration)
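Since attribute values are limited to simple types (strings, booleans, integers, floats, and sequences of one of these), richer values such as dictionaries need to be converted first. The helper below is a hypothetical sketch, not part of PyTango or OpenTelemetry; falling back to a JSON string for unsupported values is one possible choice among several.

```python
import json

PRIMITIVES = (str, bool, int, float)


def to_span_attributes(values: dict) -> dict:
    """Convert arbitrary values into types usable as span attribute values."""
    attributes = {}
    for key, value in values.items():
        if isinstance(value, PRIMITIVES):
            attributes[key] = value
        elif (
            isinstance(value, (list, tuple))
            and value
            and isinstance(value[0], PRIMITIVES)
            and all(type(item) is type(value[0]) for item in value)
        ):
            # Non-empty sequence of a single primitive type: keep it as a list.
            attributes[key] = list(value)
        else:
            # Dicts, None, empty or mixed sequences: store a JSON string instead.
            attributes[key] = json.dumps(value, default=str)
    return attributes


if __name__ == "__main__":
    print(to_span_attributes({"channels": [1, 2], "limits": {"low": 0}}))
```

The resulting dictionary could then be passed as the attributes argument of start_as_current_span().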
It is not necessary to create a new span within a command handler or attribute read/write method, as PyTango has already created a span automatically. This span can be accessed as follows:
import json
from opentelemetry import trace as trace_api
from tango.server import Device, command


class Example(Device):
    @command
    def Configure(self, configuration_json: str) -> None:
        span = trace_api.get_current_span()
        span.set_attribute("configuration", configuration_json)
        span.add_event("configuration requested")
        configuration = json.loads(configuration_json)
        self._comms_library.configure(configuration)
How to manually instrument your own application#
Device servers and clients are automatically instrumented, so that they emit spans for the basic operations. However, your custom devices and applications that build on Tango can benefit from additional context. Manual instrumentation is well described in the OpenTelemetry instrumentation docs.
You can create your own custom tracer for your application. It is convenient to use the factory function from PyTango, so that you make use of the same environment variables that Tango is using to configure the tracer end point. If you do not pass any endpoints to the returned factory, it will use the current client tracing endpoints derived from the telemetry environment variables.
from opentelemetry import trace as trace_api
from tango.telemetry import get_telemetry_tracer_provider_factory

import my_app  # placeholder for your own application package

tracer_provider_factory = get_telemetry_tracer_provider_factory()
tracer_provider = tracer_provider_factory("my.app")
tracer = trace_api.get_tracer(
    instrumenting_module_name="my.app.reader",
    instrumenting_library_version=my_app.__version__,
    tracer_provider=tracer_provider,
)
Then you can create spans in any interesting functions. Consider a web application that is providing a way to read Tango device attribute values. It may be useful to add details about the requesting client to the span.
import tango
from fastapi import FastAPI, Request

app = FastAPI()


# "tracer" is the custom tracer created in the previous example.
@app.get("/read_attr_value/{device_name}/{attr_name}")
def read_attr_value(device_name: str, attr_name: str, request: Request):
    with tracer.start_as_current_span(
        "my-web-proxy.read_attr_value",
        attributes={"client.address": request.client.host}
    ):
        proxy = tango.DeviceProxy(device_name)
        value = proxy.read_attribute(attr_name).value
    return {"value": value}
Note
Creating a span around a very long running task is not recommended. The span is only emitted on completion. Users viewing traces related to such a span will not get a complete picture until it completes. Also, having a huge number of child spans (100s to 1000s) will be problematic to view in typical web UIs.
The Tango logs that go to OpenTelemetry are emitted by cppTango. PyTango doesn’t expose a way to use the logging directly for client-only applications. Devices already have a standard way to emit logs. If you want your application’s logs to be emitted from Python, this is still an experimental feature in OpenTelemetry Python (as at v1.26.0). See the logs examples.
How to hide error messages when traces cannot be sent#
The traces are sent to the backend in the background. This might fail if the host is unreachable or too busy. If that happens, the error messages from the OpenTelemetry SDK are printed to stdout. For example:
Exception while exporting Span batch.
Traceback (most recent call last):
...
[Error] File: /Users/runner/miniforge3/conda-bld/opentelemetry-sdk_1733208709442/work/exporters/otlp/src/otlp_http_exporter.cc:145 [OTLP TRACE HTTP Exporter] ERROR: Export 6 trace span(s) error: 1
For an end-user these messages might be confusing, or a nuisance. It is possible to hide them by changing the
OpenTelemetry SDK’s log level. PyTango provides an environment variable, PYTANGO_TELEMETRY_SDK_LOG_LEVEL,
to do this. Set the value to fatal before starting your application to hide the error logs.
The standard Python logging levels are all options: critical, fatal, error, warning, info, debug, notset.
The name of the opentelemetry-python logger used for
this may change in future, so there is a second environment variable, PYTANGO_TELEMETRY_SDK_LOGGER_NAMES, which
can be set to a comma-separated list of logger names. Defaults are used if the environment variable is empty or
not set.
(As at version 1.35.0, opentelemetry-python
does not support its own OTEL_LOG_LEVEL
environment variable).
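For applications that prefer to do this in code, the effect is comparable to raising the level of the relevant loggers with the standard logging module. The sketch below is not PyTango's implementation. The function name is hypothetical, and the default logger name "opentelemetry" is an assumption based on opentelemetry-python naming its loggers after its modules.

```python
import logging


def set_logger_levels(level_name: str, logger_names: str = "opentelemetry") -> None:
    """Set the level on each logger in a comma-separated list of names."""
    # Standard level names such as "fatal" or "debug" map to logging constants.
    level = getattr(logging, level_name.strip().upper())
    for name in logger_names.split(","):
        name = name.strip()
        if name:
            logging.getLogger(name).setLevel(level)


# Hide everything below fatal from the assumed default logger.
set_logger_levels("fatal")
```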
How to reduce the number of traces being stored#
Storing all traces from all Tango devices in your facility is probably not feasible.
One option is to only enable telemetry after a problem has occurred, and further
debugging is planned. Unfortunately, it means that rare errors typically won’t be captured.
PyTango 10.3.0 adds runtime control
via DeviceProxy methods and the cppTango admin device commands,
so a process restart is no longer required
just to enable, disable, or retarget telemetry.
Another option is to have all devices emitting telemetry, but have the collector apply some filtering to reduce the number of traces that get stored. This is the concept of sampling. You may consider a probabilistic sampler, or a tail sampler, or many of the other contributed samplers.
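To illustrate the idea behind probabilistic sampling, the sketch below makes a keep-or-drop decision from the trace ID alone, so that every span of a trace gets the same decision. It is a simplified illustration with hypothetical names, not the algorithm of any particular collector; in practice the sampling is configured in the collector rather than written by hand.

```python
import random

# Use the lower 64 bits of the 128-bit trace ID for the decision.
TRACE_ID_MASK = (1 << 64) - 1


def keep_trace(trace_id: int, ratio: float) -> bool:
    """Return True for roughly `ratio` of all uniformly distributed trace IDs."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    # The same trace ID always yields the same decision.
    return (trace_id & TRACE_ID_MASK) < bound


if __name__ == "__main__":
    rng = random.Random(0)
    trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
    kept = sum(keep_trace(trace_id, 0.1) for trace_id in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} traces")
```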
Telemetry API summary#
The main telemetry-related methods are listed here for reference.
DeviceProxy#
Note
Runtime configuration was first added in version 10.3.0,
so these methods will raise DevFailed
if used with a Tango device running an older version.
High-level Device#
Further examples#
The prototyping.py file in the source repository has some further examples, including creating a custom tracer, passing the trace context to a different thread, enabling and disabling telemetry at runtime, and changing telemetry configuration while the process is running.