Update index.md #182

Open
wants to merge 2 commits into
base: main
8 changes: 6 additions & 2 deletions docusaurus/docs/guides/index.md
@@ -15,6 +15,8 @@ Start first with what your success criteria are. For example, if you run an e-co

Regardless of your application, you must start by identifying your key metrics. Then *work backwards[^1]* from them to see what affects them from an application or infrastructure perspective. For example, if high CPU on your web servers endangers customer satisfaction, and in turn your sales, then monitoring CPU utilization is important!

Note that cost is an often-overlooked metric that affects every business: if cost is higher than revenue, the business is at risk. If cost implications are considered early and continuously, systems can be designed to balance features, time-to-market, and efficiency. You also need to ensure that your costs have a measurable impact on your business and that they rise in proportion to your profit. So designate cost as one of your key metrics and track it continuously. You can read [The Frugal Architect](https://www.thefrugalarchitect.com/) for further information.
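
As a rough illustration of treating cost as a key metric, the sketch below computes cost as a fraction of revenue per reporting period and flags periods above a threshold. The `Period` type, the figures, and the `max_ratio` threshold are all hypothetical and exist only for this example; a real workload would read these values from its billing and sales data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """One reporting period; every figure here is hypothetical."""
    label: str
    revenue: float
    cost: float


def cost_ratio(period: Period) -> float:
    """Return cost as a fraction of revenue for the period."""
    if period.revenue <= 0:
        raise ValueError(f"revenue must be positive for {period.label}")
    return period.cost / period.revenue


def periods_over_budget(periods: list[Period], max_ratio: float) -> list[str]:
    """Return the labels of periods whose cost ratio exceeds max_ratio."""
    return [p.label for p in periods if cost_ratio(p) > max_ratio]


# Made-up sample data: cost grows faster than revenue in the last period.
periods = [
    Period("2024-01", revenue=100_000.0, cost=40_000.0),
    Period("2024-02", revenue=120_000.0, cost=45_000.0),
    Period("2024-03", revenue=125_000.0, cost=70_000.0),
]
flagged = periods_over_budget(periods, max_ratio=0.5)
print(flagged)
```

Tracking a ratio rather than raw spend is one possible design choice: it keeps the signal meaningful while the business grows, since rising cost is only a concern when it outpaces what it earns.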

#### Know your objectives, and measure them!

Having identified your important top-level KPIs, your next job is to have an automated way to track and measure them. A critical success factor is doing so in the same system that watches your workload's operations. For our e-commerce workload example this may mean:
@@ -30,6 +32,8 @@ Regardless of your metric data's original location or format, it must be maintai
![Example of a time series](../images/time_series.png)
*Figure 1: example of a time series*

Make the key metrics (especially cost) visible to engineers and relevant stakeholders, preferably on a screen in their office. This can foster sustainable practices such as tuning operations to trim costs. It can also foster healthy competition between teams, which in turn increases productivity. You can read [The Frugal Architect](https://www.thefrugalarchitect.com/) for further information.

## Context propagation and tool selection

Tool selection is important and makes a profound difference in how you operate and remediate problems. But worse than choosing a sub-optimal tool is failing to tool for all basic signal types. For example, collecting basic [logs](../signals/logs) from a workload, but missing transaction traces, leaves you with a gap. The result is an incohesive view of your entire application experience. All modern approaches to observability depend on "connecting the dots" with application traces.
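
To make "connecting the dots" concrete, here is a minimal sketch that stamps every log line emitted during one request with the same trace ID, using only the Python standard library. It is not a substitute for a real tracing library; the logger name `shop`, the functions `handle_checkout` and `charge_card`, and the ID `abc123` are hypothetical names invented for the example.

```python
import contextvars
import io
import logging
import uuid

# Holds the trace ID of the request being handled; "-" means no active trace.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Create a logger whose output lines start with the trace ID."""
    logger = logging.getLogger("shop")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(trace_id)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def charge_card(logger: logging.Logger) -> None:
    # A downstream step: it never receives the trace ID explicitly.
    logger.info("charging card")


def handle_checkout(logger: logging.Logger, incoming_trace_id=None) -> str:
    """Reuse the caller's trace ID if present, otherwise start a new one."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("checkout started")
        charge_card(logger)
    finally:
        current_trace_id.reset(token)
    return trace_id


output = io.StringIO()
logger = build_logger(output)
trace_id = handle_checkout(logger, incoming_trace_id="abc123")
log_lines = output.getvalue().splitlines()
print(log_lines)
```

Because the ID travels in a context variable rather than as a function argument, downstream code logs with the right ID without any change to its signature. Across service boundaries the same ID would need to be carried in the request itself, which is the part a tracing library typically handles.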
@@ -90,6 +94,6 @@ Depending on the size of your application, you may have a very large number of c

Like security, observability should not be an afterthought to your development or operations. The best practice is to put observability early in your planning, just like security, which creates a model for people to work with and reduces opaque corners of your application. Adding transaction tracing after major development work is done takes time, even with auto-instrumentation. The effort yields far greater returns! But doing so late in your development cycle may create some rework.

Rather than bolting observability onto your workload later on, use it to help *accelerate* your work. Proper [logging](../signals/logs), [metric](../signals/metrics), and [trace](../signals/traces) collection enables faster application development, fosters good practices, and lays the foundation for rapid problem solving going forward. Even though implementing observability requires investment, the return typically far outweighs the expense.

[^1]: Amazon uses the *working backwards* process extensively as a way to obsess over our customers and their outcomes, and we highly recommend that anyone working on observability solutions work backwards from their own objectives in the same way. You can read more about *working backwards* on [Werner Vogels's blog](https://www.allthingsdistributed.com/2006/11/working_backwards.html).