152 changes: 142 additions & 10 deletions docs/course/capstone.md
# Capstone: Operate a Small Linux Service

The capstone turns the lessons into one realistic workflow. Treat it like a
small production change: plan first, work in a disposable environment, capture
evidence as you go, and leave a handoff that another operator could use.

## Scenario

You joined a small team that needs a Linux VM to host an internal status page. Your job is to build it, secure it, monitor it, document it, and prove it can recover from common failures.

## Operating constraints

- Use a lab VM, cloud trial instance, or local container that you can delete.
  Later steps use `systemctl`, so a container must run systemd.
- Do not run the capstone on a workstation, shared server, or irreplaceable VM.
- Take a snapshot before the failure drill if your platform supports snapshots.
- Keep notes in a text file as you work; do not rely on shell history alone.
- Avoid storing real secrets, customer data, or personal credentials in evidence.

## Suggested stack

Pick one path and keep it simple:

- Ubuntu Server or Debian with Nginx or Apache.
- Rocky Linux, AlmaLinux, or Fedora Server with Nginx or Apache.
- Run it on a local VM (VirtualBox, VMware, UTM, or Multipass) or on a
  short-lived cloud VM.

The exact distribution matters less than the habits: inspect before changing,
make one change at a time, verify it, and record what happened.

## Requirements

- Provision a disposable Linux system and record how it was created.
- Create a non-root admin user with SSH access and sudo rights.
- Update package metadata and install a web server.
- Serve a static status page with a hostname, owner, and last-updated note.
- Configure a firewall so only SSH and HTTP or HTTPS are reachable.
- Capture service logs and SSH authentication logs.
- Add a basic monitoring check for the web service and disk space.
- Simulate one failure, recover from it, and show before/after evidence.
- Write an operator handoff with rollback and next maintenance steps.

## Work phases

### 1. Baseline the host

Capture enough information to prove what system you started with:

```bash
hostnamectl
uname -a
ip address
ip route
df -h
systemctl --failed
```

Also record the OS release and the package repositories the system uses:

```bash
cat /etc/os-release
apt policy 2>/dev/null || dnf repolist 2>/dev/null || yum repolist 2>/dev/null
```
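Because shell history is not evidence, it can help to wrap each command so its output lands in your notes automatically. The sketch below assumes a hypothetical `capture` function and a default file name of `capstone-evidence.txt`; neither comes from the course. Each call appends a UTC timestamp, the command line, its combined output, and its exit status.

```bash
# Hypothetical evidence file; override by setting EVIDENCE_FILE first.
EVIDENCE_FILE="${EVIDENCE_FILE:-capstone-evidence.txt}"

# capture CMD [ARGS...]: run a command and append a timestamped record of it.
# Returns the command's own exit status so callers can still react to failures.
capture() {
  local status
  {
    printf '\n=== %s | $ %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
    "$@" 2>&1          # stdout and stderr both go to the evidence file
    status=$?
    printf '=== exit status: %s\n' "$status"
  } >>"$EVIDENCE_FILE"
  return "$status"
}
```

For example, `capture hostnamectl` and `capture df -h` record the baseline. Commands that use pipes or `||` need a shell wrapper, such as `capture bash -c 'apt policy 2>/dev/null || dnf repolist'`.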

### 2. Build the service

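Install a web server with your distribution's package manager (run only the
line that matches your system), enable it, and verify it locally before
opening the firewall. If you chose Apache, install `apache2` (Debian/Ubuntu)
or `httpd` (RHEL family) instead and set `WEB_UNIT` to that same name:

```bash
sudo apt update && sudo apt install -y nginx   # Debian or Ubuntu
sudo dnf install -y nginx                      # Rocky Linux, AlmaLinux, or Fedora
WEB_UNIT=nginx   # Apache: apache2 on Debian/Ubuntu, httpd on the RHEL family
sudo systemctl enable --now "$WEB_UNIT"
systemctl status "$WEB_UNIT" --no-pager
curl -I http://localhost/
```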

Replace the default page with a small status page that shows the hostname, the
service purpose, the owner or operator contact, and a last-updated timestamp.
Keep the page boring and easy to inspect.
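Generating the page from a script keeps the hostname and timestamp accurate each time you update it. This is a minimal sketch: the function name `write_status_page`, the placeholder owner address, and the purpose text are hypothetical. The default web root of `/var/www/html` matches Nginx on Debian/Ubuntu and Apache on most distributions. Nginx on RHEL-family systems usually serves `/usr/share/nginx/html` instead.

```bash
# write_status_page [WEBROOT] [OWNER]: write index.html into WEBROOT.
# Values are inserted without HTML escaping, so keep them plain text.
write_status_page() {
  local webroot="${1:-/var/www/html}"        # assumed default web root
  local owner="${2:-ops@example.invalid}"    # hypothetical placeholder owner
  local host updated
  host="$(uname -n)"
  updated="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  cat >"$webroot/index.html" <<EOF
<!doctype html>
<html>
<head><meta charset="utf-8"><title>Status: $host</title></head>
<body>
  <h1>Internal status page</h1>
  <p>Host: $host</p>
  <p>Purpose: internal status page for the capstone lab</p>
  <p>Owner: $owner</p>
  <p>Last updated: $updated (UTC)</p>
</body>
</html>
EOF
}
```

The web root is owned by root, so one approach is to generate the page into a temporary directory and then copy it into place with `sudo install -m 0644 /tmp/status/index.html /var/www/html/index.html`. Adjust the destination for your server.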

### 3. Secure the access path

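Show that every access path is intentional by recording your identity, your
sudo rights, the listening sockets, and the firewall policy:

```bash
id
sudo -l
sudo ss -tulpn
sudo ufw status verbose 2>/dev/null || sudo firewall-cmd --list-all 2>/dev/null
```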
```

Only SSH and the web service should be reachable unless your lab platform
requires another management port. If you use a cloud VM, record both the Linux
firewall state and the cloud security group or firewall rule summary.
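The "only SSH and the web service" rule can be checked mechanically by comparing listening TCP ports against an allowlist. This sketch assumes a hypothetical `unexpected_ports` function and a default allowlist of 22, 80, and 443. It only considers TCP, and it reads one port number per line on standard input.

```bash
# unexpected_ports [ALLOWED...]: print ports from stdin that are not allowed.
unexpected_ports() {
  local allowed=" ${*:-22 80 443} "   # space-padded for whole-word matching
  local port
  while read -r port; do
    [ -n "$port" ] || continue
    case "$allowed" in
      *" $port "*) ;;                 # allowed port: ignore
      *) echo "$port" ;;              # anything else is reported
    esac
  done | sort -un                     # report each unexpected port once
}
```

On the VM, `ss -Htln | awk '{print $4}' | sed 's/.*://' | unexpected_ports 22 80 443` should print nothing. Loopback-only listeners, such as a local DNS stub on `127.0.0.53:53`, also appear in this list. Judge those by their address, because the firewall does not expose them.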

### 4. Add lightweight monitoring

Monitoring can be a simple script, cron job, systemd timer, or manual check
documented in the handoff. It should answer two questions:

- Is the web service responding?
- Is the host at risk from obvious resource pressure?

Example checks:

```bash
curl -fsS http://localhost/ >/dev/null
df -h /
systemctl is-active nginx 2>/dev/null || systemctl is-active httpd
```
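The example checks can be combined into one script whose exit status answers both monitoring questions. This is a sketch under stated assumptions: the function names, the `http://localhost/` URL, and the 85% disk threshold are hypothetical. `df --output` requires GNU coreutils.

```bash
# Succeeds unless the request fails or returns HTTP 400 or above (curl -f).
check_http() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Prints the used-space percentage (digits only) of the filesystem holding $1.
disk_used_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Succeeds if the filesystem holding $1 is at or below $2 percent used.
check_disk() {
  local used
  used="$(disk_used_pct "$1")"
  [ -n "$used" ] && [ "$used" -le "$2" ]
}

# run_checks [URL] [MOUNT] [LIMIT]: print OK/FAIL lines; non-zero if any fail.
run_checks() {
  local url="${1:-http://localhost/}" mount="${2:-/}" limit="${3:-85}"
  local rc=0
  if check_http "$url"; then echo "OK   http $url"; else echo "FAIL http $url"; rc=1; fi
  if check_disk "$mount" "$limit"; then
    echo "OK   disk $mount $(disk_used_pct "$mount")% <= $limit%"
  else
    echo "FAIL disk $mount $(disk_used_pct "$mount")% > $limit%"
    rc=1
  fi
  return "$rc"
}
```

If the script ends with a call to `run_checks`, cron can run it. The script path in this example entry is hypothetical: `*/5 * * * * /usr/local/bin/status-check.sh >>"$HOME/status-check.log" 2>&1`. A systemd timer would work equally well, and it would put the output in the journal instead of a log file.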

### 5. Run the failure drill

Choose one reversible failure:

- Stop the web service, detect the outage, and restart it.
- Move the status page aside, detect the bad response, and restore it.
- Write files into a temporary test directory until your disk-space check
triggers, then delete them.

Record the failure time, symptom, detection command, recovery command, and final
proof that the service is healthy again.
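Writing each drill step down as it happens, with a timestamp, makes the timeline easy to reconstruct. The sketch below uses two hypothetical helpers and an assumed log file name. `drill_note` appends one line per step. `wait_until` retries a health check so you can record the moment recovery is confirmed.

```bash
DRILL_LOG="${DRILL_LOG:-failure-drill.log}"   # hypothetical log file name

# drill_note FIELD DETAIL...: FIELD is e.g. failure, symptom, detection,
# recovery, or proof. Writes "UTC timestamp | field | detail".
drill_note() {
  local field="$1"; shift
  printf '%s | %-9s | %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$field" "$*" >>"$DRILL_LOG"
}

# wait_until TRIES DELAY CMD [ARGS...]: retry CMD up to TRIES times,
# sleeping DELAY seconds between attempts; succeed on the first success.
wait_until() {
  local tries="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  return 1
}
```

For a web-service drill, you might run `drill_note failure "stopped nginx"` just before stopping the service. Record what `curl -I http://localhost/` showed with `drill_note symptom ...`, and restart the service. Then run `wait_until 10 3 curl -fsS -o /dev/null http://localhost/ && drill_note proof "HTTP check passing"`.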

## Evidence to submit

- Log excerpts that prove the service worked.
- Handoff note with rollback and next maintenance step.

## Handoff template

Use this structure for the final operator note:

```text
Service:
Host:
Owner:
Purpose:
Build summary:
Access method:
Firewall policy:
Monitoring check:
Failure tested:
Recovery steps:
Rollback plan:
Known risks:
Next maintenance:
Evidence location:
```
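Before you submit, it is easy to miss a blank field in the note. This sketch assumes the note keeps the template's `Field: value` layout with each value on the same line as its field name, and it uses a hypothetical `empty_handoff_fields` function to list the fields that are still empty. A field whose answer continues on the following lines would be reported as empty, so a one-line summary per field works best with it.

```bash
# empty_handoff_fields FILE: print the name of each "Field:" line with no value.
empty_handoff_fields() {
  awk -F':' '
    /^[A-Za-z][A-Za-z ]*:/ {
      value = substr($0, index($0, ":") + 1)   # text after the first colon
      gsub(/^[ \t]+|[ \t]+$/, "", value)        # trim surrounding whitespace
      if (value == "") print $1                 # $1 is the field name
    }
  ' "$1"
}
```

Running `empty_handoff_fields handoff.txt` should print nothing once every field has a value.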

## Completion checks

Before you call the capstone complete, verify these are true:

- A fresh SSH session can log in with the non-root admin account.
- The web page responds locally and from the expected network path.
- The firewall state matches the intended exposure.
- Logs show the service start, access test, failure, and recovery.
- The handoff explains how to operate, recover, and safely retire the lab.
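The checks above can be run as a short pass/fail list, so the final evidence is one readable block. This is a minimal sketch. The `check` helper and the labels are hypothetical, and any command whose exit status means pass or fail can be used as a check.

```bash
CHECK_FAILURES=0   # count of failed checks in this shell session

# check LABEL CMD [ARGS...]: run CMD quietly and print PASS or FAIL with LABEL.
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    CHECK_FAILURES=$((CHECK_FAILURES + 1))
  fi
}
```

On the VM, local checks could be `check "web page (local)" curl -fsS -o /dev/null http://localhost/` and `check "web unit active" systemctl is-active --quiet nginx`. Run the fresh-login check from your workstation, for example `check "admin SSH login" ssh -o BatchMode=yes admin@HOST true`, where `admin` and `HOST` are placeholders. `BatchMode=yes` disables password prompts, so this check only passes with working key-based login. End with `[ "$CHECK_FAILURES" -eq 0 ]` so the exit status summarizes the run.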

## Review rubric

| Area | Good evidence |