diff --git a/docs/lessons/01_where_to_start/1.12_specialization_and_adjacent_disciplines.md b/docs/lessons/01_where_to_start/1.12_specialization_and_adjacent_disciplines.md
index 1bd24e3..422df38 100644
--- a/docs/lessons/01_where_to_start/1.12_specialization_and_adjacent_disciplines.md
+++ b/docs/lessons/01_where_to_start/1.12_specialization_and_adjacent_disciplines.md
@@ -1,48 +1,308 @@
 ## 1.12 Specialization and Adjacent Disciplines
-
+Linux administration is not one job. It is a foundation that connects many
+disciplines: infrastructure, networking, security, automation, databases,
+cloud platforms, containers, reliability engineering, and developer tooling.
+You do not need to master every specialty at once, but you do need to recognize
+where your work ends, where another specialist should join, and what evidence
+to hand them.

 !!! abstract "What you will learn"
-    - Explain where **1.12 Specialization and Adjacent Disciplines** fits in day-to-day Linux operations.
-    - Use current Linux tooling to inspect, change, and verify the relevant system behavior.
-    - Connect the concept to a real operational scenario: a first-week junior admin onboarding into a small cloud team.
-
-!!! example "Field story"
-    Imagine a first-week junior admin onboarding into a small cloud team. Your job is not to memorize a command; it is to build a short evidence trail, choose a low-risk change, and prove whether the system improved.
+    - Identify common Linux specialization paths.
+    - Explain how adjacent disciplines depend on Linux fundamentals.
+    - Decide when to solve a problem yourself and when to escalate.
+    - Create a useful handoff note for another technical specialty.

 !!! success "Operator principle"
-    Identify the OS, distribution, support window, and package source before changing anything.
+    A good Linux operator does not pretend every problem is "just Linux."
+    Build enough range to triage clearly, then involve the right specialist
+    with useful evidence.

-## Hands-on practice
+## Core Linux operations

-Run these on a disposable VM, container, or lab machine unless the lesson explicitly says otherwise.
+System administration is the center of gravity for this book. It includes:

-1. Inspect the current state with a read-only command related to this topic.
-2. Save the command and output in a short lab note.
-3. Make one reversible change or simulate the change in a sandbox.
-4. Re-run the inspection and explain what changed.
+- Installing and updating operating systems.
+- Managing users, groups, permissions, and access.
+- Operating services.
+- Reading logs.
+- Managing storage and filesystems.
+- Configuring networks.
+- Automating routine work.
+- Responding to incidents.

-## Check your understanding
+This path rewards careful habits: inspect before changing, document what you
+did, keep rollback options, and avoid treating production systems like a lab.
+
+## Networking
+
+Linux operators frequently troubleshoot network behavior before a dedicated
+network engineer gets involved.
+
+Useful Linux-side skills include:
+
+- Reading IP addresses and routes.
+- Checking DNS behavior.
+- Inspecting listening ports.
+- Testing connectivity.
+- Understanding firewalls and NAT.
+- Capturing packets when appropriate.
+
+Common tools:
+
+```bash
+ip addr
+ip route
+ss -ltnp
+resolvectl status
+dig example.com
+ping -c 4 8.8.8.8
+traceroute example.com
+```
+
+Escalate when the evidence points beyond the host: switch ports, VLANs, BGP,
+VPN gateways, upstream routing, load balancers, or provider networks.
+
+## Security
+
+Security work includes prevention, detection, response, and compliance. Linux
+operators support it by keeping systems patched, access controlled, logged, and
+reviewable.
+
+Useful Linux-side skills include:
+
+- Least-privilege account design.
+- SSH hardening.
+- Patch management.
+- Service exposure review.
+- Log review.
+- File integrity and permission review.
+- Basic incident preservation.
+
+Common tools:
+
+```bash
+sudo -l
+ss -ltnp
+find / -perm -4000 -type f 2>/dev/null
+journalctl --since "1 hour ago"
+last
+```
+
+Escalate quickly for suspected compromise, credential exposure, regulated data,
+legal hold, or anything that may require forensic preservation. Do not "clean
+up" evidence before the right people see it.
+
+## Cloud infrastructure
+
+Most cloud compute still runs Linux somewhere. Cloud work adds a control plane
+around the host.
+
+Useful Linux-side skills include:
+
+- Boot and initialization logs.
+- Package and service state.
+- Disk and filesystem growth.
+- Agent health.
+- Network interface and route inspection.
+- Understanding where local config ends and cloud config begins.
+
+Cloud-adjacent topics include:
+
+- IAM and instance roles.
+- VPCs, subnets, security groups, and routes.
+- Block storage and snapshots.
+- Load balancers.
+- Managed databases.
+- Metrics, logs, and alarms.
+- Cost controls.
+
+Escalate when the problem involves identity policies, provider limits, account
+billing, managed-service behavior, or shared cloud networking.
+
+## Containers and orchestration
+
+Containers package processes and dependencies, but the host still matters.
+Container orchestration adds scheduling, service discovery, networking, storage,
+and rollout behavior.
+
+Useful Linux-side skills include:
+
+- Process and resource inspection.
+- Filesystem and mount understanding.
+- Network namespaces and ports.
+- Log collection.
+- Service health checks.
+- Image and runtime trust.
+
+Common tools:
+
+```bash
+docker ps
+docker logs CONTAINER
+docker inspect CONTAINER
+kubectl get pods -A
+kubectl describe pod POD
+```
+
+Escalate when the issue involves cluster scheduling, admission policy, service
+mesh behavior, persistent-volume claims, or production rollout strategy.
+
+## Automation and configuration management
+
+Automation turns repeatable operations into code. It also turns bad assumptions
+into repeatable outages if used carelessly.
+
+Useful Linux-side skills include:

-- What evidence would tell you that this system is healthy?
-- What is the riskiest command in this lesson, and how would you make it safer?
-- How would you explain section 1.12 to a teammate during an incident handoff?
+- Knowing the desired state before writing automation.
+- Testing changes on disposable or lower-risk systems.
+- Making tasks idempotent.
+- Separating secrets from code.
+- Recording manual exceptions.

+Common tools and patterns:

-One of the most rewarding aspects of Linux is its versatility. Thanks to this, there are numerous ways to specialize and various disciplines adjacent to being a Linux professional.
+- Shell scripts for small local tasks.
+- Ansible, Salt, Puppet, or Chef for configuration management.
+- Terraform, Pulumi, or OpenTofu for infrastructure provisioning.
+- CI jobs for repeatable checks.

-Let's start to think about your possible development path. As you delve into the Linux landscape, it's almost like you are entering into a branching pathway, each leading to different areas of specialization.
+Escalate when automation affects many hosts, changes security posture, rotates
+secrets, rewrites network policy, or touches production deployment paths.

-Firstly, you could become an expert in Linux System Administration. Here, you would oversee the operation of systems, ensuring their health, upkeep, and defense against the pitfalls of the digital world. You would conquer the art of managing users, executing application installations, updates, and perhaps even system-wide upgrades. It's more than just operating the system; it's about ensuring its survival and productivity in a constantly evolving landscape.
+## Databases and stateful systems

-In contrast to this, you may lean towards becoming a Linux Kernel Developer—those who dive down to the very core of Linux. You'd be writing, debugging, and implementing code that fuels the very essence of Linux, the kernel. It's you who would be shaping the future direction of Linux and being part of the community creating the world’s largest open source project.
+Databases run on Linux, but database reliability is its own specialty.

-On another path, you could grasp onto the reigns of Network Administration within Linux domains. Your principles of USBs, ethernet, Wi-Fi, and even Bluetooth would harbor connectivity, information flow, and interactions that tether different touchpoints. You'd be the orchestrator of the symphony that is digital transmission.
+Linux operators should understand:

-Moreover, your new-found Linux skills could pave the way into Development Operations (DevOps), a field steadily rising in demand. This multidisciplinary role requires you to marry development and operations, smoothing out the process from code creation to deployment and ensuring that updates are smoothly carried out with the least disruption.
+- Disk capacity and latency.
+- Memory pressure.
+- Process limits.
+- Backup storage.
+- Service startup and logs.
+- Basic network listener checks.

-Further, a career path in Site Reliability Engineering (SRE) could be within your grasp. As an SRE, you would bridge the gap between development and operations, similar to DevOps. You would ensure the reliability and scalability of systems, working to maintain uptime and swiftly fix problems when they arise—a key role in the age of 24/7 service availability.
+Common host-side checks:

-Last but certainly not least, your Linux mastery could even see you shifting cloud-wards as a Cloud Expert. Being familiar with Linux is especially beneficial given that the majority of cloud infrastructure today is built on Linux. This specialization will have you managing, configuring, and troubleshooting this infrastructure, ensuring the smooth running of operations in the cloud.

+```bash
+df -h
+free -h
+iostat -xz 1 5
+systemctl status postgresql --no-pager
+ss -ltnp
+```
+
+Escalate to a database specialist for query plans, replication lag, schema
+changes, backup restore strategy, corruption, and data-retention risk.
+
+## Site reliability and production operations
+
+Site reliability engineering connects Linux operations to service promises.
+The focus is not only whether a host is running, but whether users can rely on
+the service.
+
+Useful Linux-side skills include:
+
+- Defining symptoms clearly.
+- Reading metrics and logs.
+- Understanding saturation, errors, latency, and traffic.
+- Making reversible changes.
+- Writing incident notes.
+- Separating mitigation from root-cause analysis.
+
+Escalate when a change affects error budgets, incident command, customer
+communications, release policy, or cross-team service ownership.
+
+## Developer platform and tooling
+
+Many developer-experience problems are Linux operations problems in disguise:
+slow builds, broken dependency installs, missing tools, permission problems,
+container issues, and CI runner failures.
+
+Useful Linux-side skills include:
+
+- Shell environment inspection.
+- Package and language runtime management.
+- Filesystem permissions.
+- Container and VM basics.
+- CI log reading.
+
+Escalate when the issue belongs to build-system ownership, language ecosystem
+policy, release engineering, or product delivery timelines.
+
+## A simple triage boundary
+
+Use this rule when you are unsure:
+
+```text
+Can I prove the problem is local to this Linux host?
+
+Yes:
+  Continue with careful local troubleshooting.
+
+No:
+  Gather host evidence and involve the adjacent owner.
+
+Not sure:
+  Do one more read-only check, then write a handoff note.
+```
+
+The goal is not to avoid responsibility. The goal is to bring the right
+evidence to the right person.
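The triage rule above can be sketched as a small shell function. This is a minimal illustration, not part of any standard tool: the function name `triage_next_step` and the answer words `yes`, `no`, and `unsure` are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch of the triage boundary rule. The function name and the accepted
# answer words (yes, no, unsure) are hypothetical choices for this example.

triage_next_step() {
    # $1: answer to "Can I prove the problem is local to this Linux host?"
    case "${1:-}" in
        yes)
            echo "Continue with careful local troubleshooting."
            ;;
        no)
            echo "Gather host evidence and involve the adjacent owner."
            ;;
        unsure)
            echo "Do one more read-only check, then write a handoff note."
            ;;
        *)
            # Anything else is treated as a usage error.
            echo "usage: triage_next_step yes|no|unsure" >&2
            return 2
            ;;
    esac
}

triage_next_step unsure
```

Writing the rule down this way is mostly useful as a runbook reminder. The judgment about whether the problem is local still has to come from the evidence you gathered.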
+
+## Handoff note template
+
+Use a short, concrete handoff when another specialty should join:
+
+```text
+Problem:
+  API latency increased after 14:20 UTC.
+
+Linux host evidence:
+  Load average: normal.
+  CPU steal: 0%.
+  Memory: 70% used, no swap pressure.
+  Disk: /var at 82%, no obvious I/O wait.
+  Network: service listens on 0.0.0.0:8443; DNS resolves correctly.
+
+What changed:
+  Deployment completed at 14:12 UTC.
+
+Why I am escalating:
+  Host checks do not show local saturation. Need application and database
+  investigation.
+
+Suggested next owner:
+  Application team plus database owner.
+```
+
+This is more useful than "server looks fine."
+
+## Lab: map one incident to specialties
+
+Pick this scenario:
+
+```text
+A web application is slow after a release.
+```
+
+Write a triage note with:
+
+1. Three Linux host checks you would run first.
+2. One networking question.
+3. One database question.
+4. One application or release question.
+5. The point where you would escalate and to whom.
+
+Run only read-only commands in your lab environment.
+
+## Check your understanding

-We have covered some of the specializations within the world of Linux, each unique and challenging in its own right. 😊 There's a whole world waiting for you out there! Which path will you pick? Whatever your choice, keep exploring, keep tinkering, and keep questioning, for that is the Linux way. 🚀🐧🌟.
+- Why is Linux administration a foundation rather than a single isolated job?
+- What evidence should a Linux operator collect before escalating a network
+  problem?
+- Why should suspected compromise be escalated before cleanup?
+- How do cloud control planes change Linux troubleshooting?
+- What makes a handoff note useful to another specialist?
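As a companion to the handoff note template in this lesson, the sketch below shows one way to render the same headings from a script. The function name `handoff_note` and its argument order are hypothetical, and the evidence is gathered only with read-only commands (`uptime` if it is installed, and `df`).

```bash
#!/usr/bin/env bash
# Sketch: render a handoff note using the headings from this lesson's
# template. The function name and argument order are hypothetical.

handoff_note() {
    # Args: problem, host evidence, what changed, reason, suggested owner.
    local problem=$1 evidence=$2 changed=$3 reason=$4 owner=$5
    printf 'Problem:\n  %s\n\n' "$problem"
    printf 'Linux host evidence:\n'
    # Indent every evidence line so multi-line input stays readable.
    printf '%s\n' "$evidence" | sed 's/^/  /'
    printf '\nWhat changed:\n  %s\n\n' "$changed"
    printf 'Why I am escalating:\n  %s\n\n' "$reason"
    printf 'Suggested next owner:\n  %s\n' "$owner"
}

# Read-only evidence collection: neither command changes system state.
# uptime is skipped quietly on minimal systems where it is not installed.
evidence="$(command -v uptime >/dev/null && uptime; df -h / | tail -n 1)"

handoff_note \
    "API latency increased after 14:20 UTC." \
    "$evidence" \
    "Deployment completed at 14:12 UTC." \
    "Host checks do not show local saturation." \
    "Application team plus database owner."
```

Keeping the formatting in one function means every note has the same shape, which makes notes easier to scan during an incident. The evidence commands here are only placeholders for whichever read-only checks fit the problem.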