
Troubleshooting

Troubleshooting instruments and tips

MessagePack Processing Log

To enable logging for the slowest and most frequently called rule-nodes, update your logging configuration with the following logger:

<logger name="org.thingsboard.server.service.queue.TbMsgPackProcessingContext" level="DEBUG" />

After this, you can find the following messages in your logs:

2021-03-24 17:01:21,023 [tb-rule-engine-consumer-24-thread-3] DEBUG o.t.s.s.q.TbMsgPackProcessingContext - Top Rule Nodes by max execution time:
2021-03-24 17:01:21,024 [tb-rule-engine-consumer-24-thread-3] DEBUG o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f740670-8cc0-11eb-bcd9-d343878c0c7f] max execution time: 1102. [RuleChain: Thermostat|RuleNode: Device Profile Node(3f740670-8cc0-11eb-bcd9-d343878c0c7f)]
2021-03-24 17:01:21,024 [tb-rule-engine-consumer-24-thread-3] DEBUG o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f6debf0-8cc0-11eb-bcd9-d343878c0c7f] max execution time: 1. [RuleChain: Thermostat|RuleNode: Message Type Switch(3f6debf0-8cc0-11eb-bcd9-d343878c0c7f)]
2021-03-24 17:01:21,024 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - Top Rule Nodes by avg execution time:
2021-03-24 17:01:21,024 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f740670-8cc0-11eb-bcd9-d343878c0c7f] avg execution time: 604.0. [RuleChain: Thermostat|RuleNode: Device Profile Node(3f740670-8cc0-11eb-bcd9-d343878c0c7f)]
2021-03-24 17:01:21,025 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f6debf0-8cc0-11eb-bcd9-d343878c0c7f] avg execution time: 1.0. [RuleChain: Thermostat|RuleNode: Message Type Switch(3f6debf0-8cc0-11eb-bcd9-d343878c0c7f)]
2021-03-24 17:01:21,025 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - Top Rule Nodes by execution count:
2021-03-24 17:01:21,025 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f740670-8cc0-11eb-bcd9-d343878c0c7f] execution count: 2. [RuleChain: Thermostat|RuleNode: Device Profile Node(3f740670-8cc0-11eb-bcd9-d343878c0c7f)]
2021-03-24 17:01:21,028 [tb-rule-engine-consumer-24-thread-3] INFO  o.t.s.s.q.TbMsgPackProcessingContext - [Main][3f6debf0-8cc0-11eb-bcd9-d343878c0c7f] execution count: 1. [RuleChain: Thermostat|RuleNode: Message Type Switch(3f6debf0-8cc0-11eb-bcd9-d343878c0c7f)]
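
If the log file is large, you can filter out just these statistics lines with grep; a quick example for a standalone installation with the default log location described below:

grep TbMsgPackProcessingContext /var/log/tb-edge/tb-edge.log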

Logs

Read logs

Regardless of the deployment type, ThingsBoard Edge logs are stored in the following directory:

/var/log/tb-edge

Different deployment tools provide different ways to view logs:

For a standalone installation, view the latest logs in real time:

tail -f /var/log/tb-edge/tb-edge.log

You can use the grep command to show only the output that contains a desired string. For example, use the following command to check whether there are any errors on the service side:

cat /var/log/tb-edge/tb-edge.log | grep ERROR

For a Docker Compose deployment, view the latest logs in real time:

docker compose logs -f tb-edge

If you still rely on Docker Compose V1 (docker-compose, with a hyphen), execute the following command instead:

docker-compose logs -f tb-edge

You can use the grep command to show only the output that contains a desired string. For example, use the following command to check whether there are any errors on the backend side:

docker compose logs tb-edge | grep ERROR

If you still rely on Docker Compose V1 (docker-compose, with a hyphen), execute the following command instead:

docker-compose logs tb-edge | grep ERROR

Tip: you can redirect the logs to a file and then analyze them with any text editor:

docker compose logs -f tb-edge > tb-edge.log

If you still rely on Docker Compose V1 (docker-compose, with a hyphen), execute the following command instead:

docker-compose logs -f tb-edge > tb-edge.log

Note: you can always log into the ThingsBoard Edge container and view logs there:

docker ps
docker exec -it NAME_OF_THE_CONTAINER bash
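
Once inside the container, the logs live in the same directory described above, so you can, for example, follow them directly:

tail -f /var/log/tb-edge/tb-edge.log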

Enable specific logs

ThingsBoard provides the ability to enable/disable logging for specific parts of the system, depending on the information you need for troubleshooting.

You can do this by modifying the logback.xml file, which is stored in the following directory:

/usr/share/tb-edge/conf

Here’s an example of the logback.xml configuration:

<!DOCTYPE configuration>
<configuration scan="true" scanPeriod="10 seconds">

    <appender name="fileLogAppender"
              class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>/var/log/tb-edge/tb-edge.log</file>
        <rollingPolicy
                class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>/var/log/tb-edge/tb-edge.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>100MB</maxFileSize>
            <maxHistory>30</maxHistory>
            <totalSizeCap>3GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <logger name="org.thingsboard.server" level="INFO" />
    <logger name="org.thingsboard.js.api" level="TRACE" />
    <logger name="com.microsoft.azure.servicebus.primitives.CoreMessageReceiver" level="OFF" />

    <root level="INFO">
        <appender-ref ref="fileLogAppender"/>
    </root>
</configuration>

The parts of the configuration file most useful for troubleshooting are the loggers. They allow you to enable or disable logging for a specific class or a group of classes.

In the example above, the default logging level is set to INFO (meaning that logs will contain only general information, warnings and errors). However, for the org.thingsboard.js.api package we enabled the most detailed level of logging by setting it to TRACE.

It’s also possible to completely disable logging for certain parts of the system. In the example above, we did this for the com.microsoft.azure.servicebus.primitives.CoreMessageReceiver class by setting the log level to OFF.

To enable or disable logging for a specific part of the system, add the appropriate <logger> configuration and wait up to 10 seconds for the change to be picked up (the file is rescanned every 10 seconds, as set by scanPeriod).
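
For example, to capture the rule-node statistics described at the top of this page, you could add the corresponding logger next to the existing ones:

<logger name="org.thingsboard.server.service.queue.TbMsgPackProcessingContext" level="DEBUG" />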

Different deployment tools provide different ways to update the logging configuration:

For a standalone deployment, update /usr/share/tb-edge/conf/logback.xml to change the logging configuration.

For a Docker Compose deployment, the /config folder is mapped to your local system (the ./tb-edge/conf folder), so to change the logging configuration you need to update the ./tb-edge/conf/logback.xml file.
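
For example, with a Docker Compose deployment you can edit the mapped file and watch the service logs to confirm that the new level takes effect (a sketch using nano, but any editor works):

nano ./tb-edge/conf/logback.xml
docker compose logs -f tb-edge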

Metrics

You can enable Prometheus metrics by setting the following environment variables in the configuration file:

  • set METRICS_ENABLED to true
  • set METRICS_ENDPOINTS_EXPOSE to prometheus
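
A minimal sketch for a standalone installation, assuming the environment variables are exported from the tb-edge configuration file (the exact file location depends on your setup):

export METRICS_ENABLED=true
export METRICS_ENDPOINTS_EXPOSE=prometheus

For a Docker Compose deployment, the same two variables can instead be set in the environment section of the tb-edge service in your compose file.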

These metrics are exposed at the path https://<yourhostname>/actuator/prometheus, which can be scraped by Prometheus (no authentication is required).
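
To quickly verify that the endpoint is reachable, you can request it directly; a sketch (substitute your host, and use -k only if the certificate is self-signed):

curl -k https://<yourhostname>/actuator/prometheus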

Prometheus metrics

Some internal state metrics can be exposed by the Spring Boot Actuator using Prometheus.

Here is the list of stats that ThingsBoard exposes to Prometheus:

tb-edge metrics:

  • attributes_queue_${index_of_queue} (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent attribute writes to the database.
    Note that several queues (threads) are used to persist attributes for maximum performance.
  • ruleEngine_${name_of_queue} (statsNames - totalMsgs, failedMsgs, successfulMsgs, tmpFailed, failedIterations, successfulIterations, timeoutMsgs, tmpTimeout): The stats that represent message processing in the Rule Engine. They are persisted for each queue (e.g. Main, HighPriority, SequentialByOriginator etc). Descriptions of some metrics:
    • tmpFailed: The number of messages that failed and got reprocessed later.
    • tmpTimeout: The number of messages that timed out and got reprocessed later.
    • timeoutMsgs: The number of messages that timed out and were discarded.
    • failedIterations: The iterations of processing message packs where at least one message wasn’t processed successfully.
  • ruleEngine_${name_of_queue}_seconds (for each present tenantId): The stats that represent the time it took to process messages in different queues.
  • core (statsNames - totalMsgs, toDevRpc, coreNfs, sessionEvents, subInfo, subToAttr, subToRpc, deviceState, getAttr, claimDevice, subMsgs): The stats that represent the processing of internal system messages. Descriptions of some metrics:
    • toDevRpc: The number of processed RPC responses from Transport services.
    • sessionEvents: The number of session events from Transport services.
    • subInfo: The number of subscription infos from Transport services.
    • subToAttr: The number of subscribes to attribute updates from Transport services.
    • subToRpc: The number of subscribes to RPC from Transport services.
    • getAttr: The number of ‘get attributes’ requests from Transport services.
    • claimDevice: The number of device claims from Transport services.
    • deviceState: The number of processed changes to Device State.
    • subMsgs: The number of processed subscriptions.
    • coreNfs: The number of processed specific ‘system’ messages.
  • jsInvoke (statsNames - requests, responses, failures): The stats that represent the number of total, successful and failed requests to the JS executors.
  • attributes_cache (results - hit, miss): The stats that represent the number of attribute requests that went to the cache.

transport metrics:

  • transport (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the number of requests received by Transport from Core.
  • ruleEngine_producer (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the number of messages pushed from Transport to the Rule Engine.
  • core_producer (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the number of messages pushed from Transport to the ThingsBoard node device actor.
  • transport_producer (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the number of requests from Transport to the Core.

PostgreSQL-specific metrics:

  • ts_latest_queue_${index_of_queue} (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the latest telemetry writes to the database.
    Note that multiple queues (threads) are used to ensure maximum performance.
  • ts_queue_${index_of_queue} (statsNames - totalMsgs, failedMsgs, successfulMsgs): The stats that represent the telemetry writes to the database.
    Note that multiple queues (threads) are used to ensure maximum performance.
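
To see which of these series are actually exposed by your instance, you can filter the scrape output by the names listed above; for example, for the Rule Engine queue stats:

curl -s -k https://<yourhostname>/actuator/prometheus | grep ruleEngine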

Monitoring message statistics

Available since TB Edge version 4.2

To diagnose and resolve issues with message delivery between the Cloud and Edge, you can monitor the state of uplink (Edge → Cloud) and downlink (Cloud → Edge) message flows.

  • Download the preconfigured Edge dashboard.
  • Import the dashboard to your Cloud:
    • Go to the Dashboards section.
    • Click the “+” button, select the “Import dashboard” option, and browse for the .json file on your computer. Click the “Import” button to proceed.

Dashboard overview

Main widgets

  • Entities table (Edges): The widget displays the list of connected Edge instances and includes interactive controls and links to deeper views.
  • Edge quick overview: The widget displays a hierarchical snapshot of key components synced from the Cloud to each Edge (Assets, Devices, Entity Views, Dashboards, and Rule Chains).
  • Map: Visualizes the geographical location of Edge nodes.
  • Message flow widgets (Uplink and Downlink): Time-series widgets that help detect message buildup or delivery issues between the Cloud and Edge, including communication delays or data loss at the Edge.

Edge Details view

When you click on a specific Edge instance, the dashboard opens a detailed view that includes:

  • HTML card: You can fill in the card with any information related to the Edge (e.g., contact details, software version, or current alarm status).
  • Local alarms: The widget tracks recent alarms (e.g., critical events or device failures) originating from this Edge.
  • Uplinks/Downlinks time-series graphs: The message flow widget filtered specifically for the selected Edge.
  • Entities table (Devices): The widget lists all devices connected to this Edge instance.

Telemetry keys for statistics monitoring

ThingsBoard Edge exposes a set of telemetry keys that allow you to monitor message statistics between Edge and Cloud.

  • uplinkMsgsAdded: The number of messages added to the queue.
  • uplinkMsgsPushed: The number of messages successfully sent to the Cloud.
  • uplinkMsgsPermanentlyFailed: The number of permanently failed messages.
  • uplinkMsgsTmpFailed: The number of temporarily failed messages (e.g., due to network issues).
  • uplinkMsgsLag: The number of messages remaining in the queue (lag).
  • downlinkMsgsAdded: The number of messages added to the queue.
  • downlinkMsgsPushed: The number of messages successfully delivered to the Edge.
  • downlinkMsgsPermanentlyFailed: The number of permanently failed messages.
  • downlinkMsgsTmpFailed: The number of temporarily failed messages (e.g., due to network issues).
  • downlinkMsgsLag: The number of messages remaining in the queue (lag).
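
Since these are telemetry keys, they can also be fetched programmatically. A minimal sketch, assuming they are stored as time series on the Edge entity on the Cloud and queried through the standard ThingsBoard REST telemetry endpoint, with $EDGE_ID and $JWT_TOKEN as hypothetical placeholders for the Edge entity id and a valid JWT token:

curl -s "https://<yourhostname>/api/plugins/telemetry/EDGE/$EDGE_ID/values/timeseries?keys=uplinkMsgsLag,uplinkMsgsPushed" \
  -H "X-Authorization: Bearer $JWT_TOKEN"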

Getting help

If your problem isn’t answered by any of the guides above, feel free to contact the ThingsBoard team.

Contact us