Observing systems has always meant two things: identifying what happened, and figuring out what caused it to happen. (A few people also used tracing, but as a niche tool for performance analysis.)

Crazed shrews that we are, we scurry around scrounging together logs to try to reconstruct the chain of events, while also digging through config files for anything surprising.

We have to infer this information, then go looking for it in separate tools that know nothing about the alerts and metrics we care about.

In OpenTelemetry, we call the standard names for these attributes semantic conventions, and we try to record them consistently everywhere.
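To make the idea concrete, here is a small Python sketch that uses plain dictionaries rather than the OpenTelemetry SDK. The target keys (`http.request.method`, `http.response.status_code`, `url.full`) are names from the OpenTelemetry HTTP semantic conventions. The ad hoc keys and the `normalize` helper are hypothetical. They stand in for what happens when every team names the same thing its own way.

```python
# A minimal sketch, not the OpenTelemetry SDK: plain dicts stand in for span attributes.
# Values on the right are OpenTelemetry HTTP semantic convention keys;
# the ad hoc keys on the left are hypothetical examples of inconsistent naming.
LEGACY_TO_SEMCONV = {
    "method": "http.request.method",
    "http_method": "http.request.method",
    "status": "http.response.status_code",
    "status_code": "http.response.status_code",
    "url": "url.full",
}


def normalize(attributes: dict) -> dict:
    """Rename known ad hoc keys to semantic convention keys; keep other keys unchanged."""
    return {LEGACY_TO_SEMCONV.get(key, key): value for key, value in attributes.items()}


# Two hypothetical services describing the same request with different attribute names.
service_a = {"method": "GET", "status": 200, "url": "https://example.com/cart"}
service_b = {"http_method": "GET", "status_code": 200, "url": "https://example.com/cart"}

print(normalize(service_a) == normalize(service_b))  # True
```

In practice, OpenTelemetry instrumentation is meant to emit the convention names directly, so a renaming step like this should not be needed. The point is that once every service uses the same keys, a single query on `http.response.status_code` works across all of them.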
