The 10 Interview Questions That Separate a Junior from a Senior Engineer (Cloud, AI, & Backend)
The Problem: Interviews Are Broken
Here’s the brutal truth: Most technical interviews test your ability to solve algorithm puzzles that you’ll never use in production. They ask you to reverse a binary tree or implement a sorting algorithm from scratch.
But here’s what companies actually need: Engineers who can deploy.
When I hire for my team, I don’t care if you can solve LeetCode Hard problems. I care if you can:
- Deploy a scalable API to AWS without bringing down production
- Debug a race condition in a Node.js backend
- Build a RAG pipeline that actually works with real documents
- Ship a React component that doesn’t tank our Lighthouse score
This post is your blueprint. These are the 10 questions we actually ask when hiring for cloud, AI, and full-stack roles. Master these, and you’ll prove you’re in the top 1% of candidates who can ship real solutions.
Cloud Architecture Questions (AWS)
What is the real-world difference between AWS Lambda and ECS Fargate?
A junior’s answer: “Lambda is serverless, and ECS runs containers.”
A senior engineer’s answer: “They solve different problems. Lambda is for event-driven, short-lived tasks where you pay per-millisecond and want zero infrastructure management. ECS Fargate is for long-running, stateful applications that need consistent performance and resource guarantees.
For example, I would use Lambda for:
- An API endpoint that processes a single user request (< 15 minutes)
- S3 event handlers for file uploads
- Scheduled jobs that run on a cron
I would use ECS Fargate for:
- A backend API that needs to maintain WebSocket connections
- A data processing workload that runs for 8+ hours
- Applications where cold starts would hurt the user experience
The decision comes down to execution duration, cost predictability, and whether you need persistent state.”
You deployed a Lambda function, and it works locally but fails in AWS with a timeout. How do you debug this?
The wrong approach: “I would just increase the timeout limit.”
The right approach: “First, I need to understand why it’s timing out. Here’s my debugging checklist:
1. Check CloudWatch Logs: Look for the actual error message. Is it a network timeout, database connection issue, or memory limit?
2. VPC Configuration: If the Lambda is in a VPC, does it have a NAT Gateway for internet access? Many timeout issues happen because the function can’t reach external APIs.
3. Cold Start vs. Warm Start: Is this happening only on the first invocation? Cold starts can add 1-3 seconds for Python/Node, or 10+ seconds for Java.
4. Database Connections: Are you opening a new database connection on every invocation? You should reuse connections outside the handler.
5. Memory Allocation: Lambda CPU scales with memory. If you’re running heavy compute on 128MB, it will be slow. I’d test with 1024MB to see if it’s a resource issue.
Only after I’ve ruled out configuration issues would I increase the timeout. And if I do, I need to understand the cost implications, because every second of runtime costs money.”
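Point 4 of the checklist (connection reuse) is worth showing in code. Anything created at module scope survives across warm invocations, so the handler only pays the setup cost on a cold start. This is a minimal sketch; `create_connection` here is a stub standing in for your real database or HTTP client setup:

```python
# Module scope: runs once per cold start, then is reused on warm invocations
_connection = None

def create_connection():
    # Placeholder: stands in for e.g. a database driver or boto3 client setup
    print("opening connection (expensive, runs once per cold start)")
    return object()

def get_connection():
    """Lazily create the client once and reuse it across invocations."""
    global _connection
    if _connection is None:
        _connection = create_connection()
    return _connection

def lambda_handler(event, context):
    conn = get_connection()  # warm invocations skip the expensive setup
    return {"statusCode": 200}
```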
A quick way to instrument the handler and find the slow step:

```python
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    start = time.time()
    logger.info(f"Remaining time: {context.get_remaining_time_in_millis()}ms")

    # Your logic here
    # Log timing at each step to find the bottleneck
    step1_start = time.time()
    process_step_1()
    logger.info(f"Step 1 took: {time.time() - step1_start:.2f}s")

    step2_start = time.time()
    process_step_2()
    logger.info(f"Step 2 took: {time.time() - step2_start:.2f}s")

    logger.info(f"Total execution: {time.time() - start:.2f}s")
    return {"statusCode": 200}
```

How would you design a cost-optimized architecture for a service that gets 1000 requests/day, but 80% of those requests happen between 9 AM and 5 PM?
A junior’s answer: “I’d use Lambda because it’s cheap and serverless.”
A senior engineer’s answer: “This is a classic case for a hybrid architecture. Here’s my design:
Option 1: Pure Serverless (Best for this scale)
- API Gateway + Lambda for the backend
- DynamoDB for storage (pay-per-request)
- CloudFront for caching static assets
Cost breakdown:
- 1000 requests/day × 30 = 30,000 requests/month
- Lambda free tier covers 1M requests/month
- API Gateway: $3.50 per million requests = ~$0.10/month
- DynamoDB: Pay-per-request is cheaper than provisioned for this scale
Option 2: If This Scales to 100K requests/day
- Move to ECS Fargate with autoscaling (min 1 task, max 5)
- Use Application Load Balancer
- Schedule scaling: Scale up at 8:30 AM, scale down at 6 PM
- Use Aurora Serverless v2 for the database (scales to zero)
The key insight: For 1000 requests/day, serverless is a no-brainer. But if this grows to 100K+, a scheduled ECS deployment with predictable scaling will be cheaper than Lambda + API Gateway at high volume.
I’d also add:
- CloudWatch alarms for cost anomalies
- AWS Cost Explorer to track spend by service
- A budget alert at $50/month”
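The back-of-the-envelope arithmetic above can be sketched as a quick Python estimate. The prices are assumptions based on public list pricing ($3.50 per million API Gateway REST requests, $0.20 per million Lambda requests beyond the free tier) and will vary by region and configuration:

```python
# Rough monthly cost estimate for Option 1 (assumed list prices; verify for your region)
REQUESTS_PER_DAY = 1000
DAYS = 30
monthly_requests = REQUESTS_PER_DAY * DAYS  # 30,000

API_GW_PRICE_PER_MILLION = 3.50  # USD, REST API assumption
api_gateway_cost = monthly_requests / 1_000_000 * API_GW_PRICE_PER_MILLION

LAMBDA_FREE_TIER = 1_000_000  # request free tier per month
billable_lambda_requests = max(0, monthly_requests - LAMBDA_FREE_TIER)
lambda_request_cost = billable_lambda_requests / 1_000_000 * 0.20

print(f"Monthly requests: {monthly_requests:,}")
print(f"API Gateway: ~${api_gateway_cost:.2f}/month")
print(f"Lambda requests: ${lambda_request_cost:.2f}/month (within the 1M free tier)")
```

Walking an interviewer through numbers like these shows you think about cost as a design input, not an afterthought.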
GenAI & RAG Questions
Explain the difference between fine-tuning an LLM and using RAG. When would you use each?
A junior’s answer: “Fine-tuning trains the model on your data. RAG gives the model your data at runtime.”
A senior engineer’s answer: “Correct, but that’s not the full picture. Here’s when you’d use each:
Fine-tuning is for changing the model’s behavior or style:
- Teaching a model to respond in your company’s tone
- Training it to follow a specific output format (e.g., JSON responses)
- Making it better at domain-specific tasks (e.g., medical or legal language)
Downsides:
- Expensive (you’re training a model)
- Slow to update (re-training takes time)
- Can’t handle real-time data (you’d need to retrain constantly)
RAG is for giving the model access to knowledge:
- Answering questions from your documentation
- Providing up-to-date information (e.g., today’s stock prices)
- Grounding responses in specific source material
Downsides:
- Retrieval quality matters (bad search = bad answers)
- Token limits (you can only fit so much context)
- Latency (you’re doing a vector search before every LLM call)
In practice, I use both:
- Fine-tune a small model (e.g., GPT-3.5) to format responses correctly
- Use RAG to inject the relevant knowledge at runtime
For example, in a customer support bot, I’d fine-tune for tone and structure, and use RAG to pull in the latest help docs.”
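The RAG half of that hybrid can be sketched end to end: embed the query, retrieve the closest chunks, and build a grounded prompt. Everything here is a stand-in; the toy bag-of-words `embed` function and in-memory `retrieve` would be replaced by a real embedding model and vector store:

```python
# Minimal RAG flow sketch: embed -> retrieve -> prompt. All components are
# placeholders; swap in your real embedding model, vector store, and LLM client.

def embed(text):
    # Stand-in embedding: bag-of-words over a tiny vocabulary (illustration only)
    vocab = ["lambda", "timeout", "deploy", "vector"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def retrieve(query, docs, top_k=2):
    # Rank docs by dot-product similarity to the query embedding
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def build_prompt(query, context_docs):
    context = "\n".join(context_docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The default Lambda timeout is 3 seconds; the maximum is 15 minutes.",
    "Our release process uses blue/green deploys.",
]
query = "What is the Lambda timeout?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```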
You built a RAG chatbot, but it's giving wrong answers even though the correct information is in the knowledge base. What are the top 3 things you'd check?
The debugging process:
1. Retrieval Quality (The #1 Culprit)
- Are you retrieving the right documents? Print out what’s being sent to the LLM.
- Check your embedding model: Are you using the same model for ingestion and query?
- Test your vector search: Query for a known fact and see if the right chunk is in the top 3 results.
```python
# Debug script to test retrieval quality
# (knowledge_base is a placeholder for your vector store client)
query = "What is the Lambda timeout limit?"
results = knowledge_base.query(query, top_k=5)
for i, doc in enumerate(results):
    print(f"Result {i+1} (score: {doc.score:.3f}):")
    print(doc.content[:200])
    print("---")
```

2. Chunking Strategy
- Are your chunks too small? (You lose context)
- Are your chunks too large? (Irrelevant info dilutes the signal)
- Did you split mid-sentence? (Embeddings will be poor)
Best practice: 500-1000 tokens per chunk, with 50-100 token overlap.
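That chunking guidance can be sketched as a simple overlapping splitter. This version splits on words as a stand-in for tokens; a real pipeline would count tokens with the tokenizer matching your embedding model:

```python
def chunk_text(text, chunk_size=500, overlap=75):
    """Split text into overlapping chunks. Words stand in for tokens here."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance by chunk size minus overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final chunk reached the end of the text
    return chunks

chunks = chunk_text("word " * 1200, chunk_size=500, overlap=75)
print(len(chunks))  # 3 chunks for 1200 words
```

In practice you would also avoid splitting mid-sentence, for example by snapping chunk boundaries to the nearest sentence end.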
3. Prompt Engineering
- Is your prompt telling the LLM to only use the provided context?
- Are you handling cases where the answer isn’t in the docs?
```python
prompt = f"""You are a helpful assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't have that information."

Context:
{retrieved_docs}

Question: {user_question}

Answer:"""
```

What is the difference between a vector database and a regular SQL database? When would you use a vector DB for RAG?
A junior’s answer: “Vector databases store embeddings.”
A senior engineer’s answer: “Yes, but the key difference is how they search.
SQL databases search by exact matches or range queries:
```sql
SELECT * FROM products WHERE price > 100 AND category = 'electronics';
```

Vector databases search by semantic similarity. They find items that are close in meaning, not exact matches.
For example:
- Query: ‘How do I deploy a Lambda function?’
- A vector DB will find documents about ‘serverless deployment,’ ‘AWS Lambda setup,’ and ‘function deployment guides’—even if those exact words aren’t in the query.
When to use a vector DB for RAG:
- Use a vector DB when your queries are natural language (e.g., chatbots, Q&A systems)
- Use SQL when your queries are structured (e.g., ‘Show me all orders from last week’)
In a real RAG system, I use both:
- Vector DB (Pinecone, Weaviate, or Bedrock Knowledge Bases) for semantic search
- SQL (PostgreSQL with pgvector) for metadata filtering
For example:
- Filter by metadata in SQL: ‘Only search docs published after 2024’
- Then do semantic search in the vector DB on that subset
This hybrid approach is faster and more accurate than pure vector search.”
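The filter-then-rank pattern can be sketched in plain Python. In production the filter would be a SQL `WHERE` clause and the ranking a pgvector or vector-DB index; the tiny embeddings and documents here are purely illustrative:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: embeddings would come from your embedding model
docs = [
    {"id": 1, "emb": [0.9, 0.1], "year": 2023, "text": "old deploy guide"},
    {"id": 2, "emb": [0.8, 0.2], "year": 2024, "text": "new deploy guide"},
    {"id": 3, "emb": [0.1, 0.9], "year": 2024, "text": "billing FAQ"},
]

query_emb = [1.0, 0.0]

# Step 1: metadata filter (the SQL WHERE clause: only docs from 2024 onward)
candidates = [d for d in docs if d["year"] >= 2024]

# Step 2: semantic ranking on the filtered subset (the vector search)
candidates.sort(key=lambda d: cosine(query_emb, d["emb"]), reverse=True)
print(candidates[0]["text"])
```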
Backend Engineering Questions
What is a race condition, and how do you prevent it in a Node.js API?
A race condition occurs when two or more operations are supposed to happen in sequence, but they end up running in parallel, leading to unpredictable results. This is especially common in asynchronous code like Node.js.
```javascript
let balance = 100;

function withdraw(amount) {
  console.log(`Checking balance...`);
  const current = balance; // The read happens now...
  setTimeout(() => { // ...but the write happens after a simulated network delay
    if (current >= amount) {
      balance = current - amount; // Write based on a stale read
      console.log(`Withdrew $${amount}. New balance: $${balance}`);
    } else {
      console.log('Insufficient funds.');
    }
  }, 50);
}

// Two requests come in at the same time
withdraw(75);
withdraw(75);
```

The problem: Both functions read the balance before either one updated it, so both withdrawals succeed and $150 leaves a $100 account. This is a critical bug.
How to fix it:
Option 1: Use a queue (Best for Node.js)
```javascript
// balance is the shared state from the previous example
const queue = [];
let isProcessing = false;

async function withdraw(amount) {
  return new Promise((resolve) => {
    queue.push({ amount, resolve });
    processQueue();
  });
}

function processQueue() {
  if (isProcessing || queue.length === 0) return;
  isProcessing = true;
  const { amount, resolve } = queue.shift();
  if (balance >= amount) {
    balance -= amount;
    resolve({ success: true, newBalance: balance });
  } else {
    resolve({ success: false, reason: 'Insufficient funds' });
  }
  isProcessing = false;
  processQueue(); // Process next item
}
```

Option 2: Use database-level locking (Best for distributed systems)
```sql
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 123 FOR UPDATE; -- Locks the row
UPDATE accounts SET balance = balance - 75 WHERE id = 123;
COMMIT;
```

How would you design an API rate limiter to prevent abuse?
A junior’s answer: “I’d count requests in memory and block users who exceed the limit.”
A senior engineer’s answer: “That works for a single server, but it breaks in a distributed system. Here’s how I’d design it:
Option 1: Token Bucket (Most Common)
- Each user gets a ‘bucket’ with X tokens
- Each request consumes 1 token
- Tokens refill at a fixed rate (e.g., 10 per minute)
Option 2: Fixed Window
- Allow 100 requests per minute
- Reset counter every minute
- Downside: Burst traffic at the window edge (e.g., 100 requests at 0:59, then 100 more at 1:00)
Option 3: Sliding Window (Best)
- Track requests in the last 60 seconds
- More accurate than fixed window
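Option 1 (token bucket) can be sketched in a few lines of Python. This is a single-process illustration; a distributed version would keep the bucket state in Redis, like the sliding-window implementation below:

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity           # max tokens the bucket holds
        self.tokens = float(capacity)      # start full
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token for this request
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=10 / 60)  # ~10 tokens per minute
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed, then rejected until tokens refill
```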
Implementation with Redis:
```javascript
const redis = require('redis');
const client = redis.createClient();
// Note: with node-redis v4, call `await client.connect()` once at startup

async function checkRateLimit(userId, limit = 100, window = 60) {
  const key = `rate_limit:${userId}`;
  const now = Date.now();
  const windowStart = now - (window * 1000);

  // Remove old requests outside the window
  await client.zRemRangeByScore(key, 0, windowStart);

  // Count requests in the current window
  const requestCount = await client.zCard(key);
  if (requestCount >= limit) {
    return { allowed: false, retryAfter: window };
  }

  // Add current request (node-redis v4 takes a { score, value } object)
  await client.zAdd(key, { score: now, value: `${now}-${Math.random()}` });
  await client.expire(key, window);

  return { allowed: true, remaining: limit - requestCount - 1 };
}
```

Why Redis?
- It’s fast (in-memory)
- It’s shared across all servers (distributed rate limiting)
- Built-in expiration (auto-cleanup)
Where I’d use this:
- API Gateway: 1000 requests/minute per API key
- Login endpoint: 5 attempts per minute per IP
- Payment API: 10 requests/minute per user
I’d also add monitoring: CloudWatch alarms if > 10% of requests are rate-limited (might indicate a DDoS or a bug in a client).”
Frontend (React) Questions
Why is it bad to fetch data inside a map() in React?
The problem:
```jsx
function UserList({ userIds }) {
  return (
    <div>
      {userIds.map(id => {
        // BAD: Fetching inside the loop
        const [user, setUser] = useState(null);
        useEffect(() => {
          fetch(`/api/users/${id}`)
            .then(res => res.json())
            .then(setUser);
        }, [id]);
        return <div>{user?.name}</div>;
      })}
    </div>
  );
}
```

Why this breaks:
- Hooks can’t be called inside loops (React rule)
- Even if it worked, it would make N separate API calls (terrible performance)
The fix:
```jsx
function UserList({ userIds }) {
  const [users, setUsers] = useState({});

  useEffect(() => {
    // Fetch all users in a single request
    fetch(`/api/users?ids=${userIds.join(',')}`)
      .then(res => res.json())
      .then(data => {
        const userMap = {};
        data.forEach(user => { userMap[user.id] = user; });
        setUsers(userMap);
      });
  }, [userIds]);

  return (
    <div>
      {userIds.map(id => (
        <div key={id}>{users[id]?.name || 'Loading...'}</div>
      ))}
    </div>
  );
}
```

Better yet, use React Query:
```jsx
const { data: users } = useQuery(['users', userIds], () =>
  fetch(`/api/users?ids=${userIds.join(',')}`).then(r => r.json())
);
```

Why this matters in interviews:
- Shows you understand React’s rendering rules
- Shows you think about performance (1 request vs. N requests)
- Shows you know modern data-fetching patterns (React Query)
What causes unnecessary re-renders in React, and how do you prevent them?
Common causes:
1. Inline object/array creation in props
```jsx
function Parent() {
  const [count, setCount] = useState(0);
  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Count: {count}</button>
      {/* BAD: New object on every render */}
      <ChildComponent user={{ name: 'Alice' }} />
    </div>
  );
}

const ChildComponent = React.memo(({ user }) => {
  console.log('Child rendered!');
  return <div>{user.name}</div>;
});
```

The fix: useMemo
```jsx
function Parent() {
  const [count, setCount] = useState(0);
  // Memoize the user object so its reference is stable across renders
  const user = useMemo(() => ({ name: 'Alice' }), []);
  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Count: {count}</button>
      <ChildComponent user={user} />
    </div>
  );
}
```

Other common fixes:
- Use useCallback for functions passed as props
- Move static data outside the component
- Use React.memo to prevent re-renders when props haven’t changed
In an interview, I’d also mention:
- Use React DevTools Profiler to find unnecessary re-renders
- Don’t over-optimize—only fix re-renders that cause performance issues
Conclusion: These Aren’t Algorithm Riddles
Notice that none of these questions ask you to reverse a linked list or implement a binary search tree.
These are real-world problems. The kind you’ll face on day one of the job.
When I hire engineers, I want to know: “Can this person be trusted with our production environment?”
That’s what these questions test:
- Can you debug a Lambda timeout?
- Can you prevent race conditions?
- Can you design a cost-optimized architecture?
- Can you build a RAG pipeline that actually works?
If you can answer these questions with confidence, you’re in the top 1% of candidates.
Ready to Build Real Cloud AI Solutions?
Stop just reading tutorials. Build real, deployable AI cloud solutions on AWS, Azure, and GCP platforms. Get hands-on with production-grade projects that prove you can ship—not just study.
