
Metrics references

BlueMind's Tick package is used to monitor large amounts of data (metrics). Some of the monitored data is raw, while other data is pre-processed to make it more relevant and easier to interpret and analyze.

Every metric has its own tree structure which can contain:

  • datalocation: server name
  • host: host name or IP
  • meterType: data type
    • gauge: instant measurement
    • counter: incremental counter
    • distsum: data pair comprising a counter and a quantity, for example:
      • lmtpd.emailSize = (number of emails, total size of emails)
      • lmtpd.emailRecipients = (number of emails, number of recipients)
    • timer: identical to distsum but the quantity is always expressed in nanoseconds
  • status: depending on the type of data, this status may be ok/failed (e.g. request successful/failed), success/failure (e.g. authentication successful/failed), etc.
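The meter types above determine how a value should be read. Here is a minimal Python sketch of how a consumer of these metrics might interpret a distsum, a timer and a counter. The class and function names are hypothetical (they are not part of BlueMind or the Tick stack) and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DistSum:
    """A distsum value: how many events, and the summed quantity across them."""
    count: int
    total: float

    def mean(self) -> float:
        """Average quantity per event (0.0 when no event was recorded)."""
        return self.total / self.count if self.count else 0.0


@dataclass
class Timer(DistSum):
    """A timer is a distsum whose quantity is always in nanoseconds."""

    def mean_ms(self) -> float:
        """Average duration per event, converted from nanoseconds to milliseconds."""
        return self.mean() / 1_000_000


def counter_rate(previous: int, current: int, interval_s: float) -> float:
    """Per-second rate between two samples of an incremental counter.

    Assumes the counter was not reset between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (current - previous) / interval_s


# Hypothetical lmtpd.emailSize sample: 4 emails, combined size 200000 (unit assumed to be bytes)
email_size = DistSum(count=4, total=200_000)
print(email_size.mean())  # 50000.0

# Hypothetical timer sample: 2 sessions lasting 3 000 000 ns (3 ms) in total
session = Timer(count=2, total=3_000_000)
print(session.mean_ms())  # 1.5

# Hypothetical counter read twice, 30 seconds apart
print(counter_rate(100, 160, 30))  # 2.0
```

Dividing the quantity by the counter is what turns a distsum into a per-event average, for example the average email size or the average number of recipients per email.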

Common data

As a rule, metrics are grouped by component.

JVM

There are JVM metrics for every JVM component:

  • bm-<component>.hprof: the number of hprof files present on the machine; can be used to determine whether there has been a crash
  • bm-<component>.jvm: all JVM information for this component (current memory consumption, maximum, etc.)
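Because the hprof metric only counts the files present on the machine, one way to spot a new crash is to compare two successive readings. This is a minimal Python sketch assuming you already have two consecutive values of bm-<component>.hprof for each component; the function name and the sample readings are hypothetical.

```python
def new_heap_dumps(previous_count: int, current_count: int) -> int:
    """Return how many hprof files appeared between two readings.

    A positive result suggests the JVM wrote a new heap dump, which can
    indicate a crash. A drop (e.g. after old dumps were deleted) is not
    treated as a crash, so the result is never negative.
    """
    return max(0, current_count - previous_count)


# Hypothetical (previous, current) readings per component
readings = {"bm-core": (0, 1), "bm-eas": (2, 2)}
for component, (before, after) in readings.items():
    if new_heap_dumps(before, after):
        print(f"{component}: new hprof file(s) found, check for a crash")
```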

Heartbeat

Each component that interacts with the core has the following metrics, which are used to check that the component is receiving the core's health data:

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| heartbeat.receiver.age | Gauge | age of the last heartbeat received | The time between two heartbeats. The core is supposed to send its health information every 4 seconds: durations longer than this, and especially longer than 8 seconds, may indicate a problem. |
| heartbeat.receiver.failures | Counter | number of reception failures | |
| heartbeat.receiver.latency | Gauge | heartbeat reception time | Time between the heartbeat being sent by the core and its reception by the component. |
| heartbeat.receiver.latencyMax | Gauge | maximum heartbeat delivery time | |
| heartbeat.receiver.received | Counter | number of successful receptions | |
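The thresholds given for heartbeat.receiver.age translate into a simple three-level check. This is a minimal Python sketch assuming the gauge value has already been converted to seconds (the unit the metric is exported in is not stated here), that an age above 4 seconds deserves a warning and that an age above 8 seconds is treated as critical. The function name and level labels are hypothetical.

```python
EXPECTED_PERIOD_S = 4.0  # the core should send its health data every 4 seconds
CRITICAL_AGE_S = 8.0     # ages beyond this most likely indicate a problem


def heartbeat_status(age_s: float) -> str:
    """Classify the age (in seconds) of the last heartbeat received by a component."""
    if age_s > CRITICAL_AGE_S:
        return "critical"
    if age_s > EXPECTED_PERIOD_S:
        return "warning"
    return "ok"


for age in (3.2, 5.0, 12.0):
    print(age, heartbeat_status(age))
```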

Hazelcast

Servers that are members of the Hazelcast cluster expose the following metric:

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| cluster.members | Gauge | | The value of this metric must be 3. |
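Since every member should report the same cluster size, a monitoring script can flag any server whose view differs. This is a minimal Python sketch assuming you have collected the latest cluster.members value from each member; the server names and the function name are hypothetical.

```python
EXPECTED_MEMBERS = 3  # value cluster.members must have, per the Hazelcast section


def members_mismatch(readings: dict[str, int]) -> dict[str, int]:
    """Return the servers whose cluster.members value differs from the expected size."""
    return {server: count for server, count in readings.items() if count != EXPECTED_MEMBERS}


# Hypothetical latest readings collected from each cluster member
readings = {"server-a": 3, "server-b": 3, "server-c": 2}
print(members_mismatch(readings))  # {'server-c': 2}
```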

Metrics

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| agent.metricsGathered | Counter | number of metrics collected by the agent | This metric is mainly used to check that the agent is still working: the absence of data indicates that the agent is no longer collecting data, and therefore no longer working. |
| agent.vmware* | | agent host server data | The agent is enabled only if VMware Tools are detected on the BlueMind host servers. In this case, the "vSphere Guest SDK" metrics are extracted and stored over time. These metrics are used to diagnose issues with BlueMind virtualization on VMware. |
| bluemind.cluster | | | |
| bluemind.cluster.partitions | | | |
| bm-core | | Main BlueMind Engine | |
| callsCount | Counter | number of calls received by the core | |
| dirVersion | Gauge | | |
| directory.cluster.events | Counter | | |
| handlingDuration | Timer | request handling time | |
| heartbeat.broadcast | Counter | | |
| heartbeat.maxPeriod | Gauge | | |
| heartbeat.period | Gauge | | |
| bm-eas | | Mobile Connection Service | |
| executionTime | Timer | | |
| responseSize | DistSum | | |
| bm-lmtpd | | | |
| activeConnections | Gauge | number of active connections | |
| connectionCount | Counter | | |
| deliveries | Counter | | |
| emailRecipients | DistSum | number of recipients per email | |
| emailSize | DistSum | size of messages | |
| sessionDuration | Timer | session length | |
| traffic.transportLatency | Timer | | |
| bm-milter | | Analysis and Modification of Emails at SMTP Level | |
| connectionsCount | Counter | | |
| sessionDuration | Timer | | |
| traffic.class | Counter | | |
| traffic.size | Counter | | |
| bm-webserver | | Web Application Server | |
| appCache.requestTime | Timer | | |
| appCache.requests | Counter | | |
| ftlTemplates.requests | Counter | number of display requests generated by the web server | |
| staticFile.requests | Counter | number of static page display requests | |
| bm-ysnp | | Data Validation Service | |
| authCount | Counter | number of requests handled | ok status: confirmed validations (e.g. authentications accepted for a username/password entered by a user); failed status: rejected validations (e.g. failed authentications due to a wrong password) |
| Other | | | |
| cpu | | processor usage data | used to monitor processor usage and distribution |
| disk | | disk space usage | used to monitor disk space (used, free, total, etc.) by disk, partition, path, etc. |
| diskio | | number of bytes written/read in real time | used to check whether the disk is working normally or excessively |
| elasticsearch* | | Elasticsearch data | for more details about Elasticsearch metrics, refer to the dedicated documentation |
| influxdb* | | metrics storage database data | |
| kapacitor* | | data about Kapacitor itself | |
| kernel | | | |
| kernel_vmstat | | | |
| mem | | | |
| memcached | | | |
| net | | | |
| netstat | | | |
| nginx | | | |
| phpfpm | | | |
| postfix_queue | | | |
| postgresql | | BlueMind database information | |
| processes | | | |
| swap | | | |
| syslog | | | |
| system | | | |
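Two rows of this table lend themselves to simple automated checks: the absence of new agent.metricsGathered data means the agent has stopped, and the ok/failed statuses of bm-ysnp's authCount give a rejection ratio. The Python sketch below assumes you can fetch the timestamp of the latest agent.metricsGathered point and the ok/failed authCount totals over the same time window. The function names and the 5-minute staleness limit are hypothetical choices, not BlueMind defaults.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit: no agent.metricsGathered point for this long means the agent is down
STALENESS_LIMIT = timedelta(minutes=5)


def agent_is_alive(last_point: datetime, now: datetime) -> bool:
    """True if agent.metricsGathered has reported recently enough."""
    return now - last_point <= STALENESS_LIMIT


def auth_failure_ratio(ok_count: int, failed_count: int) -> float:
    """Share of authCount requests with the 'failed' status (0.0 when there were none)."""
    total = ok_count + failed_count
    return failed_count / total if total else 0.0


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(agent_is_alive(now - timedelta(minutes=2), now))   # True
print(agent_is_alive(now - timedelta(minutes=30), now))  # False
print(auth_failure_ratio(ok_count=95, failed_count=5))   # 0.05
```

A sudden rise in the failure ratio can point to wrong passwords being tried repeatedly, while a stale agent.metricsGathered means the other metrics in this table can no longer be trusted to be current.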