DevOps Interview Questions

By: zigmoid
Posted on: 07/10/2025

2️⃣ What are your day-to-day activities as a DevOps Engineer?

  • Morning stand-ups — you know, pretending you know what you’re doing 😆
  • Monitoring infra: checking dashboards, alerts.
  • Managing CI/CD pipelines: debugging failed builds because someone pushed broken YAML at 2 AM.
  • Writing/maintaining IaC (Terraform, CloudFormation).
  • Patching servers, rotating secrets, yelling at Jenkins.
  • Automating repetitive tasks with Ansible/Bash/Python.
  • Reviewing logs, scaling clusters, keeping K8s happy.
  • Helping devs deploy code that worked on their machine.
  • Firefighting incidents — PagerDuty is your frenemy.

3️⃣ What are NAT Gateways?

  • NAT = Network Address Translation.
  • In AWS or cloud infra, a NAT Gateway lets instances in a private subnet access the internet (for updates, repo pulls, etc.) without exposing them directly to incoming traffic.
  • They translate private IP → public IP for outbound traffic.
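
To make that concrete, here is a rough AWS CLI sketch of wiring up a NAT Gateway. The helper name setup_nat_gateway and the IDs you pass in are placeholders (nothing here comes from a real project), and it assumes the AWS CLI is installed and configured.

```shell
#!/bin/sh
# Sketch: create a NAT Gateway in a public subnet and send a private
# route table's internet-bound traffic through it.
# setup_nat_gateway is a hypothetical helper; both IDs are placeholders.
setup_nat_gateway() {
  public_subnet_id=$1    # subnet that already routes to an Internet Gateway
  private_rtb_id=$2      # route table used by the private subnet(s)

  # A public NAT Gateway needs an Elastic IP to translate private IPs to.
  eip_alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text) || return 1

  nat_gw_id=$(aws ec2 create-nat-gateway \
    --subnet-id "$public_subnet_id" \
    --allocation-id "$eip_alloc_id" \
    --query 'NatGateway.NatGatewayId' --output text) || return 1

  # Block until the gateway is usable before routing traffic to it.
  aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_gw_id" || return 1

  # Default route for the private subnets: everything outbound goes via NAT.
  aws ec2 create-route \
    --route-table-id "$private_rtb_id" \
    --destination-cidr-block 0.0.0.0/0 \
    --nat-gateway-id "$nat_gw_id"
}

# Usage (placeholder IDs):
#   setup_nat_gateway subnet-0123456789abcdef0 rtb-0123456789abcdef0
```

Nothing inbound is opened here: the private instances only gain a default route out, which is the whole point of a NAT Gateway.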

4️⃣ What are pre-requisites to upgrade a K8s cluster?

  • Backup everything: etcd snapshots, manifests, secrets.
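  • Check version skew: upgrade one minor version at a time (e.g. 1.29 → 1.30); skipping minor versions is not supported.
  • Upgrade kubectl & client tools first, staying within one minor version of the API server.
  • Upgrade control plane (master) nodes before worker nodes.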
  • Test on a staging cluster.
  • Drain nodes properly.
  • Make sure all addons/CRDs are compatible.
  • Plan rollback.
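
A few of these checks can be scripted. The sketch below is illustrative only: the function names are made up, the version check assumes both versions share the same major number, and the etcd certificate paths are the kubeadm defaults, which may differ on your cluster.

```shell
#!/bin/sh
# minor_of prints the minor component of a version like "1.29.3" or "v1.29.3".
minor_of() {
  v=${1#v}          # drop an optional leading "v"
  v=${v#*.}         # drop the major component
  echo "${v%%.*}"   # keep only the minor component
}

# upgrade_step_ok CURRENT TARGET succeeds only when TARGET is the same
# or the next minor version (assumes the major version is unchanged).
upgrade_step_ok() {
  current_minor=$(minor_of "$1")
  target_minor=$(minor_of "$2")
  step=$((target_minor - current_minor))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

# backup_etcd FILE takes an etcd snapshot (paths are kubeadm defaults, an assumption).
backup_etcd() {
  ETCDCTL_API=3 etcdctl snapshot save "$1" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
}

# drain_node NODE evicts pods before the node is upgraded.
drain_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Usage:
#   upgrade_step_ok 1.29.3 1.30.0 && backup_etcd /backup/etcd-snapshot.db
#   drain_node worker-1    # upgrade the node, then: kubectl uncordon worker-1
```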

5️⃣ What is a Pod Disruption Budget (PDB) in K8s?

  • A PDB limits how many pods can be voluntarily disrupted at once.
  • Example: draining nodes, upgrades.
  • E.g. maxUnavailable: 1 → at most one pod can be down at a time; minAvailable: 1 → always keeps at least one pod running.
  • Helps maintain service availability during maintenance.
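
As a rough illustration, the helper below prints a PDB manifest that allows at most one pod of an app to be voluntarily disrupted at a time. The function name, the PDB name, and the app label are all hypothetical; adjust them to your own workload.

```shell
#!/bin/sh
# pdb_manifest NAME APP_LABEL MAX_UNAVAILABLE prints a PodDisruptionBudget.
# All three values are placeholders chosen by the caller.
pdb_manifest() {
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: $1
spec:
  maxUnavailable: $3
  selector:
    matchLabels:
      app: $2
EOF
}

# Usage (needs kubectl and a cluster):
#   pdb_manifest web-pdb web 1 | kubectl apply -f -
#   kubectl get pdb web-pdb
```

With this in place, kubectl drain waits instead of evicting a second pod of the app while one is already down.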

6️⃣ Shell script for factorial:

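#!/bin/bash
# Prints the factorial of a non-negative integer read from stdin.

read -r -p "Enter a number: " num

# Reject anything that is not a plain non-negative integer.
if ! [[ "$num" =~ ^[0-9]+$ ]]; then
  echo "Error: please enter a non-negative integer." >&2
  exit 1
fi
num=$((10#$num))   # force base 10 so input like "08" is not read as octal

fact=1
# Bash integers are 64-bit, so the result overflows for inputs above 20.
for (( i = 2; i <= num; i++ )); do
  fact=$(( fact * i ))
done

echo "Factorial of $num is $fact"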

7️⃣ Tell me about VPC structure setup in your project.

  • Multiple VPCs for dev, staging, prod.
  • Public & private subnets across multiple AZs.
  • Internet Gateway for public subnets.
  • NAT Gateway in a public subnet so private subnets get outbound-only internet access.
  • Route tables: public subnets → IGW, private subnets → NAT.
  • Security groups & NACLs.
  • Peering for cross-VPC communication.

8️⃣ CI/CD pipeline & security tools integrated?

  • Git → Jenkins/GitLab → Docker build → push to ECR/ACR → deploy to K8s.
  • Stages: lint, test, build, scan, deploy.
  • Security tools:
    • Snyk/Trivy for container image scanning.
    • SonarQube for code quality.
    • HashiCorp Vault for secrets.
    • Static code analysis.
    • Secrets detection (e.g. git-secrets).
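
For the scan stage, a minimal gate could look like the sketch below. It assumes Trivy is installed on the build agent; the function names and the image name are placeholders.

```shell
#!/bin/sh
# scan_image IMAGE exits non-zero if Trivy reports HIGH or CRITICAL findings.
scan_image() {
  trivy image --exit-code 1 --severity HIGH,CRITICAL "$1"
}

# scan_gate IMAGE stops the pipeline stage when the scan fails.
scan_gate() {
  if scan_image "$1"; then
    echo "scan passed: $1"
  else
    echo "scan failed: $1 (blocking deploy)" >&2
    return 1
  fi
}

# Usage inside a pipeline step (image name is a placeholder):
#   scan_gate repo/myapp:latest || exit 1
```

Putting the gate between build and deploy means a vulnerable image never reaches the cluster, at the cost of occasionally blocking a release on a finding you may want to triage by hand.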

9️⃣ How do you manage them?

  • Pipelines as code (Jenkinsfile/GitLab CI).
  • Version-controlled scripts.
  • RBAC for pipeline access.
  • Rotate creds, use Vault.
  • Automated rollback on failure.
  • Dashboards for status.
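
Automated rollback is the item that most benefits from an example. This is only a sketch: the function name, manifest path, and deployment name are assumptions, and it relies on kubectl being pointed at the right cluster.

```shell
#!/bin/sh
# deploy_with_rollback MANIFEST DEPLOYMENT [TIMEOUT]
# Applies a manifest, waits for the rollout, and undoes it on failure.
deploy_with_rollback() {
  manifest=$1
  deployment=$2
  timeout=${3:-120s}

  kubectl apply -f "$manifest" || return 1

  # rollout status exits non-zero if the rollout does not finish in time.
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout succeeded: $deployment"
  else
    echo "rollout failed, rolling back: $deployment" >&2
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}

# Usage (placeholder names):
#   deploy_with_rollback k8s/deployment.yaml myapp 180s
```

The function still returns non-zero after a rollback, so the pipeline stage is marked failed and someone gets notified.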

🔟 Rough pipeline script for microservices arch (pseudo Jenkinsfile):

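pipeline {
  agent any

  stages {
    // Pull the service source from version control.
    stage('Checkout') {
      steps {
        git 'https://repo.url'
      }
    }

    // Compile and package the service (a Maven project is assumed).
    stage('Build') {
      steps {
        sh 'mvn clean package'
      }
    }

    // Build the image, tag it for the registry, and push it.
    stage('Docker Build & Push') {
      steps {
        sh '''
        docker build -t myapp:latest .
        docker tag myapp:latest repo/myapp:latest
        docker push repo/myapp:latest
        '''
      }
    }

    // Apply the Deployment manifest to the cluster kubectl points at.
    stage('Deploy to K8s') {
      steps {
        sh 'kubectl apply -f k8s/deployment.yaml'
      }
    }
  }
}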

1️⃣1️⃣ What is multi-stage Docker build?

  • Multiple FROM instructions in a Dockerfile.
  • Compile in one stage, copy only the final build/artifact to the final image.
  • Reduces image size.
  • E.g. build a Go binary in golang:alpine → copy to scratch or alpine.
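
Here is what that Go example might look like. The helper just prints a Dockerfile so you can inspect it before building; the function name, paths, and binary name are illustrative, and it assumes a Go module at the repository root.

```shell
#!/bin/sh
# multistage_dockerfile prints a two-stage Dockerfile for a Go service.
multistage_dockerfile() {
  cat <<'EOF'
# Stage 1: compile with the full Go toolchain.
FROM golang:alpine AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 yields a static binary that can run on scratch.
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: ship only the binary, no compiler or source.
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
EOF
}

# Usage:
#   multistage_dockerfile > Dockerfile
#   docker build -t myapp:latest .
```

Everything in the build stage is thrown away; only what COPY --from=build pulls across ends up in the final image.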

1️⃣2️⃣ What are manifest files?

  • YAML or JSON files that define K8s resources: Deployments, Services, ConfigMaps, etc.
  • Describes desired state.
  • Example: deployment.yaml declares how many replicas, containers, env vars, volumes.
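
A minimal Deployment manifest, generated from a small helper so the moving parts are visible. The function name, the app name, the image, and the APP_ENV variable are all placeholders.

```shell
#!/bin/sh
# deployment_manifest NAME IMAGE REPLICAS prints a minimal Deployment.
deployment_manifest() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
spec:
  replicas: $3
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
    spec:
      containers:
        - name: $1
          image: $2
          env:
            - name: APP_ENV
              value: "production"
EOF
}

# Usage (validates locally without touching the cluster):
#   deployment_manifest myapp repo/myapp:latest 3 | kubectl apply --dry-run=client -f -
```

Note that the selector labels must match the pod template labels, otherwise the API server rejects the Deployment.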

1️⃣3️⃣ What is Ansible Vault?

  • Encrypts sensitive data: passwords, API keys.
  • ansible-vault encrypt secrets.yml
  • Decrypt at runtime (ansible-playbook site.yml --ask-vault-pass) or when editing (ansible-vault edit secrets.yml).
  • Keeps secrets out of plain text.

1️⃣4️⃣ How to make a K8s cluster highly available?

  • Multiple control plane (master) nodes, typically 3 or 5, spread across AZs.
  • External etcd cluster with odd number of members.
  • Load balancer in front of API servers.
  • Worker nodes spread across AZs.
  • Use anti-affinity rules for pods.
  • Backups for etcd.
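
Why an odd number of etcd members? Quorum arithmetic, sketched below with two hypothetical helper functions: a cluster of n members needs floor(n/2) + 1 of them alive, so an even member adds cost without adding fault tolerance.

```shell
#!/bin/sh
# quorum N prints how many members must be healthy for etcd to accept writes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N prints how many members can fail without losing quorum.
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

Three and four members both tolerate a single failure, which is why 3 or 5 are the usual choices.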

1️⃣5️⃣ Monitoring tools & common pod errors:

  • Tools: Prometheus, Grafana, ELK/EFK, Alertmanager, Datadog.
  • Alerts on CPU, memory, pod restarts.
  • Common pod headaches:
    • CrashLoopBackOff → container keeps crashing on start: bad configs, failed init containers.
    • ImagePullBackOff → wrong image tag, missing registry creds.
    • Pending → insufficient node resources, unmatched taints/selectors, or an unbound PVC.
    • OOMKilled → container exceeded its memory limit.
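
When one of these shows up, the same handful of kubectl commands usually gets you to the cause. A rough triage helper (the function name and pod name are made up):

```shell
#!/bin/sh
# triage_pod POD [NAMESPACE] runs the usual first-look commands for a sick pod.
triage_pod() {
  pod=$1
  ns=${2:-default}

  # The Events section at the bottom usually explains Pending and ImagePullBackOff.
  kubectl describe pod "$pod" -n "$ns"

  # For CrashLoopBackOff, the previous container's logs hold the crash output.
  kubectl logs "$pod" -n "$ns" --previous

  # Last termination reason per container, e.g. OOMKilled.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

  # Recent events in the namespace, sorted by timestamp.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Usage:
#   triage_pod web-7d9c5b6f4-abcde production
```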

1️⃣6️⃣ Terraform script for VPC (rough):

provider "aws" {
  region = "ap-south-1"
}

resource "aws_vpc" "prod_vpc" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.prod_vpc.id
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.prod_vpc.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.prod_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table_association" "a" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

1️⃣7️⃣ How many objects can an S3 bucket store?

  • Unlimited.
  • Seriously, AWS will gladly take your money for trillions of objects.

1️⃣8️⃣ IAM Roles and Policies?

  • Roles: identities with attached permissions that users or services assume to get temporary credentials.
  • Policies: JSON docs that define what actions are allowed or denied.
  • Policies attach to roles, users, or groups.
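
To show the two kinds of JSON involved, the sketch below prints a permissions policy (what is allowed) and a trust policy (who may assume the role). The function names, bucket name, and role name are hypothetical.

```shell
#!/bin/sh
# s3_read_policy BUCKET prints a permissions policy allowing object reads.
s3_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

# ec2_trust_policy prints a trust policy letting EC2 instances assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Usage (placeholder names, needs a configured AWS CLI):
#   aws iam create-role --role-name app-reader \
#     --assume-role-policy-document "$(ec2_trust_policy)"
#   aws iam put-role-policy --role-name app-reader --policy-name s3-read \
#     --policy-document "$(s3_read_policy my-bucket)"
```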

1️⃣9️⃣ What are artifacts?

  • Build outputs: binaries, Docker images, JARs, config packages.
  • Stored for deploy/reuse.
  • E.g. .jar file from Maven build → uploaded to Nexus/Artifactory.

2️⃣0️⃣ SATS and DATS?

  • Trick one — are you asking about ‘Stateful Application Tests’ and ‘Data Application Tests’?
  • Or SAT = System Acceptance Test, DAT = Data Acceptance Test?
  • Or you mean System Acceptance Testing (SAT) and Design Acceptance Testing (DAT) — both are QA/validation phases.

2️⃣1️⃣ How do you find errors in pipelines?

  • Logs. Lots of logs.
  • Jenkins/GitLab have logs for each stage.
  • Debug with echo or set -x.
  • Look at failed step’s stdout/stderr.
  • Use pipeline notifications & Slack/Webhooks.
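
The set -x trick is worth seeing once. The hypothetical wrapper below turns tracing on for a single command only, so the rest of the pipeline log stays readable.

```shell
#!/bin/sh
# trace_step CMD [ARGS...] runs one command with shell tracing enabled.
# The subshell keeps set -x from leaking into the rest of the script.
trace_step() {
  (
    set -x
    "$@"
  )
}

# Each traced command is echoed to stderr with a "+" prefix before it runs.
trace_step echo "deploying build"
```

In a CI log the "+" lines show exactly which command ran with which expanded arguments, which is usually enough to spot an empty variable or a wrong path.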

2️⃣2️⃣ What are Ansible Roles?

  • Pre-structured way to organize playbooks.
  • Roles = reusable units: tasks, vars, handlers, templates, files.
  • Example layout:
      roles/
        nginx/
          tasks/main.yml
          handlers/main.yml
          templates/nginx.conf.j2
  • Makes your playbooks modular, reusable, DRY.