The 10 Interview Questions That Separate a Junior from a Senior Engineer (Cloud, AI, & Backend)
The Problem: Interviews Are Broken
Here’s the brutal truth: Most technical interviews test your ability to solve algorithm puzzles that you’ll never use in production. They ask you to reverse a binary tree or implement a sorting algorithm from scratch.
But here’s what companies actually need: Engineers who can deploy.
When I hire for my team, I don’t care if you can solve LeetCode Hard problems. I care if you can:
- Deploy a scalable API to AWS without bringing down production
- Debug a race condition in a Node.js backend
- Build a RAG pipeline that actually works with real documents
- Ship a React component that doesn’t tank our Lighthouse score
This post is your blueprint. These are the 10 questions we actually ask when hiring for cloud, AI, and full-stack roles. Master these, and you’ll prove you’re in the top 1% of candidates who can ship real solutions.
Cloud Architecture Questions (AWS)
What is the real-world difference between AWS Lambda and ECS Fargate?
A junior’s answer: “Lambda is serverless, and ECS runs containers.”
A senior engineer’s answer: “They solve different problems. Lambda is for event-driven, short-lived tasks where you pay per-millisecond and want zero infrastructure management. ECS Fargate is for long-running, stateful applications that need consistent performance and resource guarantees.
For example, I would use Lambda for:
- An API endpoint that processes a single user request (< 15 minutes)
- S3 event handlers for file uploads
- Scheduled jobs that run on a cron
I would use ECS Fargate for:
- A backend API that needs to maintain WebSocket connections
- A data processing workload that runs for 8+ hours
- Applications where cold starts would hurt the user experience
The decision comes down to execution duration, cost predictability, and whether you need persistent state.”
You deployed a Lambda function, and it works locally but fails in AWS with a timeout. How do you debug this?
The wrong approach: “I would just increase the timeout limit.”
The right approach: “First, I need to understand why it’s timing out. Here’s my debugging checklist:
1. Check CloudWatch Logs: Look for the actual error message. Is it a network timeout, database connection issue, or memory limit?
2. VPC Configuration: If the Lambda is in a VPC, does it have a NAT Gateway for internet access? Many timeout issues happen because the function can’t reach external APIs.
3. Cold Start vs. Warm Start: Is this happening only on the first invocation? Cold starts can add 1-3 seconds for Python/Node, or 10+ seconds for Java.
4. Database Connections: Are you opening a new database connection on every invocation? You should reuse connections outside the handler.
5. Memory Allocation: Lambda CPU scales with memory. If you’re running heavy compute on 128MB, it will be slow. I’d test with 1024MB to see if it’s a resource issue.
Only after I’ve ruled out configuration issues would I increase the timeout. And if I do, I need to understand the cost implications, because every second of runtime costs money.”
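Point 4 of the checklist (connection reuse) is worth showing in code. Anything created at module scope survives across warm invocations, so the handler only pays the setup cost on a cold start. This is a minimal sketch; `create_connection` here is a stub standing in for your real database or HTTP client setup:

```python
# Module scope: runs once per cold start, then is reused on warm invocations
_connection = None

def create_connection():
    # Placeholder: stands in for e.g. a database driver or boto3 client setup
    print("opening connection (expensive, runs once per cold start)")
    return object()

def get_connection():
    """Lazily create the client once and reuse it across invocations."""
    global _connection
    if _connection is None:
        _connection = create_connection()
    return _connection

def lambda_handler(event, context):
    conn = get_connection()  # warm invocations skip the expensive setup
    return {"statusCode": 200}
```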
A quick way to instrument the handler and find the slow step:

```python
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    start = time.time()
    logger.info(f"Remaining time: {context.get_remaining_time_in_millis()}ms")

    # Your logic here
    # Log timing at each step to find the bottleneck
    step1_start = time.time()
    process_step_1()
    logger.info(f"Step 1 took: {time.time() - step1_start:.2f}s")

    step2_start = time.time()
    process_step_2()
    logger.info(f"Step 2 took: {time.time() - step2_start:.2f}s")

    logger.info(f"Total execution: {time.time() - start:.2f}s")
    return {"statusCode": 200}
```

How would you design a cost-optimized architecture for a service that gets 1000 requests/day, but 80% of those requests happen between 9 AM and 5 PM?
A junior’s answer: “I’d use Lambda because it’s cheap and serverless.”
A senior engineer’s answer: “This is a classic case for a hybrid architecture. Here’s my design:
Option 1: Pure Serverless (Best for this scale)
- API Gateway + Lambda for the backend
- DynamoDB for storage (pay-per-request)
- CloudFront for caching static assets
Cost breakdown:
- 1000 requests/day × 30 = 30,000 requests/month
- Lambda free tier covers 1M requests/month
- API Gateway: $3.50 per million requests = ~$0.10/month
- DynamoDB: Pay-per-request is cheaper than provisioned for this scale
Option 2: If This Scales to 100K requests/day
- Move to ECS Fargate with autoscaling (min 1 task, max 5)
- Use Application Load Balancer
- Schedule scaling: Scale up at 8:30 AM, scale down at 6 PM
- Use Aurora Serverless v2 for the database (scales to zero)
The key insight: For 1000 requests/day, serverless is a no-brainer. But if this grows to 100K+, a scheduled ECS deployment with predictable scaling will be cheaper than Lambda + API Gateway at high volume.
I’d also add:
- CloudWatch alarms for cost anomalies
- AWS Cost Explorer to track spend by service
- A budget alert at $50/month”
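The back-of-the-envelope arithmetic above can be sketched as a quick Python estimate. The prices are assumptions based on public list pricing ($3.50 per million API Gateway REST requests, $0.20 per million Lambda requests beyond the free tier) and will vary by region and configuration:

```python
# Rough monthly cost estimate for Option 1 (assumed list prices; verify for your region)
REQUESTS_PER_DAY = 1000
DAYS = 30
monthly_requests = REQUESTS_PER_DAY * DAYS  # 30,000

API_GW_PRICE_PER_MILLION = 3.50  # USD, REST API assumption
api_gateway_cost = monthly_requests / 1_000_000 * API_GW_PRICE_PER_MILLION

LAMBDA_FREE_TIER = 1_000_000  # request free tier per month
billable_lambda_requests = max(0, monthly_requests - LAMBDA_FREE_TIER)
lambda_request_cost = billable_lambda_requests / 1_000_000 * 0.20

print(f"Monthly requests: {monthly_requests:,}")
print(f"API Gateway: ~${api_gateway_cost:.2f}/month")
print(f"Lambda requests: ${lambda_request_cost:.2f}/month (within the 1M free tier)")
```

Walking an interviewer through numbers like these shows you think about cost as a design input, not an afterthought.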
GenAI & RAG Questions
Explain the difference between fine-tuning an LLM and using RAG. When would you use each?
A junior’s answer: “Fine-tuning trains the model on your data. RAG gives the model your data at runtime.”
A senior engineer’s answer: “Correct, but that’s not the full picture. Here’s when you’d use each:
Fine-tuning is for changing the model’s behavior or style:
- Teaching a model to respond in your company’s tone
- Training it to follow a specific output format (e.g., JSON responses)
- Making it better at domain-specific tasks (e.g., medical or legal language)
Downsides:
- Expensive (you’re training a model)
- Slow to update (re-training takes time)
- Can’t handle real-time data (you’d need to retrain constantly)
RAG is for giving the model access to knowledge:
- Answering questions from your documentation
- Providing up-to-date information (e.g., today’s stock prices)
- Grounding responses in specific source material
Downsides:
- Retrieval quality matters (bad search = bad answers)
- Token limits (you can only fit so much context)
- Latency (you’re doing a vector search before every LLM call)
In practice, I use both:
- Fine-tune a small model (e.g., GPT-3.5) to format responses correctly
- Use RAG to inject the relevant knowledge at runtime
For example, in a customer support bot, I’d fine-tune for tone and structure, and use RAG to pull in the latest help docs.”
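The RAG half of that hybrid can be sketched end to end: embed the query, retrieve the closest chunks, and build a grounded prompt. Everything here is a stand-in; the toy bag-of-words `embed` function and in-memory `retrieve` would be replaced by a real embedding model and vector store:

```python
# Minimal RAG flow sketch: embed -> retrieve -> prompt. All components are
# placeholders; swap in your real embedding model, vector store, and LLM client.

def embed(text):
    # Stand-in embedding: bag-of-words over a tiny vocabulary (illustration only)
    vocab = ["lambda", "timeout", "deploy", "vector"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def retrieve(query, docs, top_k=2):
    # Rank docs by dot-product similarity to the query embedding
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def build_prompt(query, context_docs):
    context = "\n".join(context_docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The default Lambda timeout is 3 seconds; the maximum is 15 minutes.",
    "Our release process uses blue/green deploys.",
]
query = "What is the Lambda timeout?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```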
You built a RAG chatbot, but it's giving wrong answers even though the correct information is in the knowledge base. What are the top 3 things you'd check?
The debugging process:
1. Retrieval Quality (The #1 Culprit)
- Are you retrieving the right documents? Print out what’s being sent to the LLM.
- Check your embedding model: Are you using the same model for ingestion and query?
- Test your vector search: Query for a known fact and see if the right chunk is in the top 3 results.
```python
# Debug script to test retrieval quality
# (knowledge_base is a placeholder for your vector store client)
query = "What is the Lambda timeout limit?"
results = knowledge_base.query(query, top_k=5)
for i, doc in enumerate(results):
    print(f"Result {i+1} (score: {doc.score:.3f}):")
    print(doc.content[:200])
    print("---")
```

2. Chunking Strategy
- Are your chunks too small? (You lose context)
- Are your chunks too large? (Irrelevant info dilutes the signal)
- Did you split mid-sentence? (Embeddings will be poor)
Best practice: 500-1000 tokens per chunk, with 50-100 token overlap.
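That chunking guidance can be sketched as a simple overlapping splitter. This version splits on words as a stand-in for tokens; a real pipeline would count tokens with the tokenizer matching your embedding model:

```python
def chunk_text(text, chunk_size=500, overlap=75):
    """Split text into overlapping chunks. Words stand in for tokens here."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance by chunk size minus overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final chunk reached the end of the text
    return chunks

chunks = chunk_text("word " * 1200, chunk_size=500, overlap=75)
print(len(chunks))  # 3 chunks for 1200 words
```

In practice you would also avoid splitting mid-sentence, for example by snapping chunk boundaries to the nearest sentence end.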
3. Prompt Engineering
- Is your prompt telling the LLM to only use the provided context?
- Are you handling cases where the answer isn’t in the docs?
```python
prompt = f"""You are a helpful assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't have that information."

Context:
{retrieved_docs}

Question: {user_question}

Answer:"""
```

What is the difference between a vector database and a regular SQL database? When would you use a vector DB for RAG?
A junior’s answer: “Vector databases store embeddings.”
A senior engineer’s answer: “Yes, but the key difference is how they search.
SQL databases search by exact matches or range queries:
```sql
SELECT * FROM products WHERE price > 100 AND category = 'electronics';
```

Vector databases search by semantic similarity. They find items that are close in meaning, not exact matches.
For example:
- Query: ‘How do I deploy a Lambda function?’
- A vector DB will find documents about ‘serverless deployment,’ ‘AWS Lambda setup,’ and ‘function deployment guides’—even if those exact words aren’t in the query.
When to use a vector DB for RAG:
- Use a vector DB when your queries are natural language (e.g., chatbots, Q&A systems)
- Use SQL when your queries are structured (e.g., ‘Show me all orders from last week’)
In a real RAG system, I use both:
- Vector DB (Pinecone, Weaviate, or Bedrock Knowledge Bases) for semantic search
- SQL (PostgreSQL with pgvector) for metadata filtering
For example:
- Filter by metadata in SQL: ‘Only search docs published after 2024’
- Then do semantic search in the vector DB on that subset
This hybrid approach is faster and more accurate than pure vector search.”
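The filter-then-rank pattern can be sketched in plain Python. In production the filter would be a SQL `WHERE` clause and the ranking a pgvector or vector-DB index; the tiny embeddings and documents here are purely illustrative:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: embeddings would come from your embedding model
docs = [
    {"id": 1, "emb": [0.9, 0.1], "year": 2023, "text": "old deploy guide"},
    {"id": 2, "emb": [0.8, 0.2], "year": 2024, "text": "new deploy guide"},
    {"id": 3, "emb": [0.1, 0.9], "year": 2024, "text": "billing FAQ"},
]

query_emb = [1.0, 0.0]

# Step 1: metadata filter (the SQL WHERE clause: only docs from 2024 onward)
candidates = [d for d in docs if d["year"] >= 2024]

# Step 2: semantic ranking on the filtered subset (the vector search)
candidates.sort(key=lambda d: cosine(query_emb, d["emb"]), reverse=True)
print(candidates[0]["text"])
```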
Backend Engineering Questions
What is a race condition, and how do you prevent it in a Node.js API?
A race condition occurs when two or more operations are supposed to happen in sequence, but they end up running in parallel, leading to unpredictable results. This is especially common in asynchronous code like Node.js.
```javascript
let balance = 100;

function withdraw(amount) {
  console.log(`Checking balance...`);
  const current = balance; // The read happens now...
  setTimeout(() => { // ...but the write happens after a simulated network delay
    if (current >= amount) {
      balance = current - amount; // Write based on a stale read
      console.log(`Withdrew $${amount}. New balance: $${balance}`);
    } else {
      console.log('Insufficient funds.');
    }
  }, 50);
}

// Two requests come in at the same time
withdraw(75);
withdraw(75);
```

The problem: Both functions read the balance before either one updated it, so both withdrawals succeed and $150 leaves a $100 account. This is a critical bug.
How to fix it:
Option 1: Use a queue (Best for Node.js)
```javascript
// balance is the shared state from the previous example
const queue = [];
let isProcessing = false;

async function withdraw(amount) {
  return new Promise((resolve) => {
    queue.push({ amount, resolve });
    processQueue();
  });
}

function processQueue() {
  if (isProcessing || queue.length === 0) return;
  isProcessing = true;
  const { amount, resolve } = queue.shift();
  if (balance >= amount) {
    balance -= amount;
    resolve({ success: true, newBalance: balance });
  } else {
    resolve({ success: false, reason: 'Insufficient funds' });
  }
  isProcessing = false;
  processQueue(); // Process next item
}
```

Option 2: Use database-level locking (Best for distributed systems)
```sql
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 123 FOR UPDATE; -- Locks the row
UPDATE accounts SET balance = balance - 75 WHERE id = 123;
COMMIT;
```

How would you design an API rate limiter to prevent abuse?
A junior’s answer: “I’d count requests in memory and block users who exceed the limit.”
A senior engineer’s answer: “That works for a single server, but it breaks in a distributed system. Here’s how I’d design it:
Option 1: Token Bucket (Most Common)
- Each user gets a ‘bucket’ with X tokens
- Each request consumes 1 token
- Tokens refill at a fixed rate (e.g., 10 per minute)
Option 2: Fixed Window
- Allow 100 requests per minute
- Reset counter every minute
- Downside: Burst traffic at the window edge (e.g., 100 requests at 0:59, then 100 more at 1:00)
Option 3: Sliding Window (Best)
- Track requests in the last 60 seconds
- More accurate than fixed window
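Option 1 (token bucket) can be sketched in a few lines of Python. This is a single-process illustration; a distributed version would keep the bucket state in Redis, like the sliding-window implementation below:

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity           # max tokens the bucket holds
        self.tokens = float(capacity)      # start full
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token for this request
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=10 / 60)  # ~10 tokens per minute
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed, then rejected until tokens refill
```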
Implementation with Redis:
```javascript
const redis = require('redis');
const client = redis.createClient();
// Note: with node-redis v4, call `await client.connect()` once at startup

async function checkRateLimit(userId, limit = 100, window = 60) {
  const key = `rate_limit:${userId}`;
  const now = Date.now();
  const windowStart = now - (window * 1000);

  // Remove old requests outside the window
  await client.zRemRangeByScore(key, 0, windowStart);

  // Count requests in the current window
  const requestCount = await client.zCard(key);
  if (requestCount >= limit) {
    return { allowed: false, retryAfter: window };
  }

  // Add current request (node-redis v4 takes a { score, value } object)
  await client.zAdd(key, { score: now, value: `${now}-${Math.random()}` });
  await client.expire(key, window);

  return { allowed: true, remaining: limit - requestCount - 1 };
}
```

Why Redis?
- It’s fast (in-memory)
- It’s shared across all servers (distributed rate limiting)
- Built-in expiration (auto-cleanup)
Where I’d use this:
- API Gateway: 1000 requests/minute per API key
- Login endpoint: 5 attempts per minute per IP
- Payment API: 10 requests/minute per user
I’d also add monitoring: CloudWatch alarms if > 10% of requests are rate-limited (might indicate a DDoS or a bug in a client).”
Frontend (React) Questions
Why is it bad to fetch data inside a map() in React?
The problem:
```jsx
function UserList({ userIds }) {
  return (
    <div>
      {userIds.map(id => {
        // BAD: Fetching inside the loop
        const [user, setUser] = useState(null);
        useEffect(() => {
          fetch(`/api/users/${id}`)
            .then(res => res.json())
            .then(setUser);
        }, [id]);
        return <div>{user?.name}</div>;
      })}
    </div>
  );
}
```

Why this breaks:
- Hooks can’t be called inside loops (React rule)
- Even if it worked, it would make N separate API calls (terrible performance)
The fix:
```jsx
function UserList({ userIds }) {
  const [users, setUsers] = useState({});

  useEffect(() => {
    // Fetch all users in a single request
    fetch(`/api/users?ids=${userIds.join(',')}`)
      .then(res => res.json())
      .then(data => {
        const userMap = {};
        data.forEach(user => { userMap[user.id] = user; });
        setUsers(userMap);
      });
  }, [userIds]);

  return (
    <div>
      {userIds.map(id => (
        <div key={id}>{users[id]?.name || 'Loading...'}</div>
      ))}
    </div>
  );
}
```

Better yet, use React Query:
```jsx
const { data: users } = useQuery(['users', userIds], () =>
  fetch(`/api/users?ids=${userIds.join(',')}`).then(r => r.json())
);
```

Why this matters in interviews:
- Shows you understand React’s rendering rules
- Shows you think about performance (1 request vs. N requests)
- Shows you know modern data-fetching patterns (React Query)
What causes unnecessary re-renders in React, and how do you prevent them?
Common causes:
1. Inline object/array creation in props
```jsx
function Parent() {
  const [count, setCount] = useState(0);
  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Count: {count}</button>
      {/* BAD: New object on every render */}
      <ChildComponent user={{ name: 'Alice' }} />
    </div>
  );
}

const ChildComponent = React.memo(({ user }) => {
  console.log('Child rendered!');
  return <div>{user.name}</div>;
});
```

The fix: useMemo
```jsx
function Parent() {
  const [count, setCount] = useState(0);
  // Memoize the user object so its reference is stable across renders
  const user = useMemo(() => ({ name: 'Alice' }), []);
  return (
    <div>
      <button onClick={() => setCount(count + 1)}>Count: {count}</button>
      <ChildComponent user={user} />
    </div>
  );
}
```

Other common fixes:
- Use useCallback for functions passed as props
- Move static data outside the component
- Use React.memo to prevent re-renders when props haven’t changed
In an interview, I’d also mention:
- Use React DevTools Profiler to find unnecessary re-renders
- Don’t over-optimize—only fix re-renders that cause performance issues
Conclusion: These Aren’t Algorithm Riddles
Notice that none of these questions ask you to reverse a linked list or implement a binary search tree.
These are real-world problems. The kind you’ll face on day one of the job.
When I hire engineers, I want to know: “Can this person be trusted with our production environment?”
That’s what these questions test:
- Can you debug a Lambda timeout?
- Can you prevent race conditions?
- Can you design a cost-optimized architecture?
- Can you build a RAG pipeline that actually works?
If you can answer these questions with confidence, you’re in the top 1% of candidates.
Ready to Build Real Cloud AI Solutions?
Stop just reading tutorials. Build real, deployable AI cloud solutions on AWS, Azure, and GCP platforms. Get hands-on with production-grade projects that prove you can ship—not just study.
