Datadog

Give your observability agents Datadog-specific understanding of infrastructure metrics

Datadog Observability

Datadog takes a unified approach to telemetry collection: metrics, traces, and logs live in a single platform built around agent-based data collection and pre-built integrations. The Datadog Agent is the primary collection mechanism. Deployed on each host, it discovers and monitors running services and gives operators control over metric collection intervals and cardinality. The core of Datadog's metrics architecture is its tagging infrastructure: metrics carry key:value tags, which enables dimensional querying (filtering and grouping by any tag) across complex, multi-cloud environments. The platform ingests custom metrics alongside out-of-the-box performance indicators and aggregates them to manage scale, while preserving the ability to drill down into individual time series when troubleshooting incidents.
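To make the tagging model concrete, here is a minimal Python sketch of how a tagged custom metric can be handed to a local Agent over DogStatsD, the Agent's StatsD-compatible UDP listener (port 8125 by default). The metric name, tag keys, and helper function names are hypothetical, and production code would normally use Datadog's official client library rather than raw sockets.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None, sample_rate=None):
    """Build a DogStatsD datagram: name:value|type[|@rate][|#key:value,...].

    metric_type is a StatsD type code such as "g" (gauge) or "c" (counter).
    """
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        # Sort tags so the same tag set always yields the same datagram.
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        parts.append(f"#{tag_text}")
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local Agent (assumed to be listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical metric and tags, for illustration only.
datagram = format_dogstatsd(
    "checkout.queue_depth", 42, "g", tags={"env": "prod", "region": "us-east-1"}
)
```

Each distinct combination of tag values becomes its own time series, which is what makes it possible to group or filter by `env` or `region` at query time.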

For SRE workflows, Datadog's main strength is its extensive integration ecosystem, which covers databases (MongoDB, PostgreSQL, MySQL, Redis), streaming platforms (Apache Kafka), data warehouses (Snowflake, Databricks), search engines (Elasticsearch), and cloud-native services (AWS Lambda, Google Cloud Run, Amazon RDS). These integrations provide immediate visibility into query performance, connection pool utilization, replication lag, and resource saturation, all of which are critical for maintaining service level objectives. Datadog's synthetic monitoring and alerting build on the same infrastructure metrics, so SREs can set up proactive monitoring that catches degradation before it reaches customers.

Within the observability ecosystem, Datadog positions itself as an all-in-one commercial platform. It competes with vendors like New Relic and Dynatrace and with open-source stacks built on Prometheus and Grafana. Unlike Prometheus's pull-based model, in which the server scrapes targets, Datadog's agent architecture is push-based: the Agent sends data to Datadog's backend. This simplifies deployment in dynamic environments but creates vendor lock-in through proprietary metric storage. The platform's strengths are its low barrier to entry and a feature set that spans APM, RUM, security monitoring, and infrastructure observability, which makes it particularly attractive to organizations seeking consolidated tooling. However, the cost model, based on host count, custom metrics volume, and data retention, can become prohibitive at scale compared to self-hosted solutions. Controlling expenses requires careful capacity planning and metric filtering.
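Because custom metrics volume grows with the number of distinct tag combinations, one rough way to plan metric filtering is to count the time series a workload would emit before and after dropping a high-cardinality tag. The sketch below assumes a sample of emitted points is available as (metric name, tags) pairs. The function name, metric name, and tag keys are hypothetical, and the counts only approximate series cardinality, not an actual bill.

```python
from collections import defaultdict


def count_series(points, drop_tags=()):
    """Count distinct tag combinations per metric name.

    points: iterable of (metric_name, tags_dict) pairs.
    drop_tags: tag keys to ignore, simulating a filtering rule.
    """
    series = defaultdict(set)
    for name, tags in points:
        kept = frozenset((k, v) for k, v in tags.items() if k not in drop_tags)
        series[name].add(kept)
    return {name: len(combos) for name, combos in series.items()}


# Hypothetical sample: 3 users on each of 2 hosts.
sample = [
    ("checkout.latency", {"host": h, "user_id": u})
    for h in ("web-1", "web-2")
    for u in ("u1", "u2", "u3")
]

before = count_series(sample)
after = count_series(sample, drop_tags=("user_id",))
```

In this sample, dropping the per-user tag reduces the series count from six to two, which is the kind of reduction metric filtering aims for.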

INFRASTRUCTURE PRODUCTS ON DATADOG

Temporal: Orchestration
Varnish: Cache
Istio: Service mesh
Milvus: Vector database
ArgoCD: CI/CD
Elasticsearch: Search/Analytics
HAProxy: Web/Proxy
Cilium: Networking
Grafana: Observability
MySQL: Database
NGINX: Web/Proxy
PostgreSQL: Database
Apache Pulsar: Message queue
Apache Spark: Data platform
etcd: Distributed computing
Ray: Distributed computing
CoreDNS: Networking
Apache Kafka: Message queue
Presto: Data platform
Weaviate: Vector database
Linkerd: Service mesh
Apache Flink: Data platform
Express: Web Framework
MariaDB: Database
OpenStack: Virtualization
Qdrant: Vector database
Apache Airflow: Orchestration
Redis: Cache
ClickHouse: Database
Docker: Container/K8s
Apache Tomcat: Web/Proxy
Supabase: Database
Apache HTTP Server: Web/Proxy
Apache Solr: Search/Analytics
Ceph: Storage
RabbitMQ: Message queue
Jenkins: CI/CD
Trino: Data platform
vLLM: LLM Ops
Hadoop HDFS: Storage
Cassandra: Database
Apache ZooKeeper: Orchestration
Traefik: Web/Proxy
Kubernetes: Container/K8s
Kong Gateway: Web/Proxy
Celery: Message queue
Django: Web Framework
Tailscale: Networking
BentoML: LLM Ops
Luigi: Data engineering
Prefect: Orchestration
LlamaIndex: LLM Ops
Flask: Web Framework
MLflow: LLM Ops
LangChain: LLM Ops
dbt: Data engineering
FastAPI: Web Framework
Gunicorn: Web Framework
Langfuse: LLM Ops
Envoy: Service mesh
OpenSearch: Search/Analytics
Dramatiq: Message queue
Chroma: Vector database
CrewAI: Orchestration