System Design
Database Sharding
Sharding — katta database’ni kichik qismlarga (shards) bo’lish.
Muammo: Single database cheklangan
1 Database server
- Max 100K writes/sec
- Max 1TB storage efficient
- Vertical scaling qimmat ($100K+/year)
Billion users: Bitta server yetmaydi
Yechim: Sharding
Ma’lumotlarni horizontal bo’lish:
┌─────────┐
│ Router │
└────┬────┘
│
┌─────────┼─────────┬─────────┐
▼ ▼ ▼ ▼
┌────────┐┌────────┐┌────────┐┌────────┐
│Shard 1 ││Shard 2 ││Shard 3 ││Shard 4 │
│Users ││Users ││Users ││Users │
│0-25M ││25-50M ││50-75M ││75-100M │
└────────┘└────────┘└────────┘└────────┘
Har bir shard:
- Mustaqil database
- 25M users
- Full replica set
Sharding Strategies
1. Hash-based Sharding
User ID’ni hash qilib shard tanlash:
function getShard(userId) {
return hash(userId) % shardCount;
}
// user_id=12345 → hash → shard 2
// user_id=67890 → hash → shard 1
Afzalliklari:
- Teng taqsimlash (uniform distribution)
- Oddiy implementation
Kamchiliklari:
- Range query qiyin:
WHERE id BETWEEN 1000 AND 2000 - Shard qo’shish qiyin (rehashing kerak)
Qachon: User profiles, key-value data
2. Range-based Sharding
ID range bo’yicha:
Shard 1: user_id 0 - 1,000,000
Shard 2: user_id 1,000,001 - 2,000,000
Shard 3: user_id 2,000,001 - 3,000,000
Afzalliklari:
- Range query oson
- Shard qo’shish oson
Kamchiliklari:
- Notekis taqsimlash (hotspots)
- Yangi userlar bitta shard’da
Qachon: Time-series data, logs
3. Geographic Sharding
Location bo’yicha:
Shard 1: US users
Shard 2: Europe users
Shard 3: Asia users
Afzalliklari:
- Low latency (local data)
- Data residency compliance (GDPR)
Kamchiliklari:
- Notekis taqsimlash (US > others)
- Cross-region queries qiyin
Qachon: Global applications
4. Directory-based Sharding
Lookup table:
Shard Lookup Table:
user_id_range | shard
0-1M | shard1
1M-2M | shard2
2M-3M | shard3
Afzalliklari:
- Flexible (shard qo’shish oson)
- Custom logic
Kamchiliklari:
- Lookup table = single point of failure
- Extra hop (latency)
Qachon: Complex sharding logic kerak
Shard Key tanlash
Shard key — qaysi columndan sharding qilish.
Yaxshi shard key
High cardinality: Ko'p unique values
Even distribution: Teng taqsimlash
Query pattern: Ko'p query qilinadigan column
Misollar:
user_id(ko’p unique, teng)country(kam unique, notekis)gender(2 ta qiymat, notekis)
Application sharding example
// Sharding logic
class ShardedDB {
constructor(shards) {
this.shards = shards; // [db1, db2, db3, db4]
}
getShard(userId) {
return this.shards[userId % this.shards.length];
}
async getUser(userId) {
const shard = this.getShard(userId);
return await shard.query('SELECT * FROM users WHERE id = ?', [userId]);
}
async createUser(user) {
const shard = this.getShard(user.id);
return await shard.query('INSERT INTO users VALUES (?)', [user]);
}
}
Cross-shard Challenges
1. Cross-shard Queries
Muammo:
-- Get all users with name='Ali'
SELECT * FROM users WHERE name = 'Ali';
Users barcha shard’larda → har bir shard’dan query kerak!
Yechim:
async function findUsersByName(name) {
const promises = shards.map(shard =>
shard.query('SELECT * FROM users WHERE name = ?', [name])
);
const results = await Promise.all(promises);
return results.flat(); // Combine
}
Performance: N shards = N queries (sekin)
Alternative: Secondary index (Elasticsearch).
2. Cross-shard JOINs
Muammo:
-- Users va posts ikkala shard'da
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id;
Yechim 1: Denormalization
// posts table'da user ma'lumotini saqla
{
post_id: 123,
title: '...',
user_id: 456,
user_name: 'Ali' // Denormalized
}
Yechim 2: Application-level JOIN
const posts = await getPostsFromShard(...);
const userIds = posts.map(p => p.user_id);
const users = await getUsersFromShards(userIds);
// Manually join in code
3. Distributed Transactions
Muammo:
Transfer pul: user1 (shard 1) → user2 (shard 2)
Shard 1: UPDATE users SET balance = balance - 100 WHERE id = 1;
Shard 2: UPDATE users SET balance = balance + 100 WHERE id = 2;
Agar shard 1 success, shard 2 fail?
Yechim 1: Two-Phase Commit (2PC)
Phase 1: Prepare
Shard 1: Can you commit? → Yes
Shard 2: Can you commit? → Yes
Phase 2: Commit
Coordinator: Commit all
Shard 1: Committed
Shard 2: Committed
Kamchiliklari:
- Sekin (multiple round-trips)
- Coordinator failure = blocked
Yechim 2: Saga Pattern
Step 1: Deduct from user1 (shard 1)
Step 2: Add to user2 (shard 2) Failed
Compensation: Add back to user1 (rollback)
Afzalliklari: No blocking
Eventually consistent
4. Unique ID Generation
Muammo:
-- Auto-increment shard'larda collision
Shard 1: user_id = 1, 2, 3, ...
Shard 2: user_id = 1, 2, 3, ... Duplicate!
Yechim 1: UUID
const uuid = require('uuid');
const userId = uuid.v4(); // 550e8400-e29b-41d4-a716-446655440000
Unique globally
36 characters (vs 8-byte integer)
Yechim 2: Twitter Snowflake
64-bit ID:
[timestamp:41 bits][datacenter:5 bits][machine:5 bits][sequence:12 bits]
Example: 1234567890123456789
Unique, sortable, compact
Yechim 3: Shard-specific ranges
Shard 1: 0 - 1,000,000,000
Shard 2: 1,000,000,001 - 2,000,000,000
Shard 3: 2,000,000,001 - 3,000,000,000
Resharding (Shard qo’shish)
Muammo: Traffic oshdi, yangi shard kerak.
2 shards → 4 shards
Rehash kerak: hash(id) % 2 → hash(id) % 4
50% ma’lumot ko’chishi kerak!
Consistent Hashing
Oddiy hashing:
Shard count o'zgarsa → barcha keys rehash
Consistent hashing:
Shard qo'shilsa → faqat 1/N keys ko'chadi
Virtual nodes:
Ring structure:
[Shard1-vnode1] [Shard2-vnode1] [Shard1-vnode2] [Shard2-vnode2] ...
Yangi shard qo’shilsa → faqat neighbor vnode’lardan ko’chirish.
Real-world Examples
Instagram (PostgreSQL)
Sharding strategy: Hash-based (user_id % 4096)
Shards: 4096 logical shards
Physical servers: ~100 machines
Each machine: 40-50 logical shards
Pinterest (MySQL)
Sharding: Range-based initially
Problem: Hotspots (new pins)
Solution: Switched to hash-based
Shards: 4096
Discord (Cassandra)
Sharding: Built-in (partition key)
Partition key: channel_id
Replication factor: 3
Automatic rebalancing
Uber (MySQL)
Sharding: Geographic (city-based)
Each city: Separate shard
Benefit: Data locality, compliance
Challenge: Cross-city trips
Monitoring Sharded Systems
// Metrics per shard
{
shard_id: 1,
queries_per_sec: 5000,
storage_used: '500GB',
cpu_usage: 60%,
connection_count: 200
}
Alerts:
- Shard CPU > 80%
- Storage > 80%
- Replication lag > 5s
- Uneven distribution (1 shard >> others)
Best Practices
1. Design for sharding early
Hatto 1 shard bilan boshla, lekin shard-aware code yoz
2. Choose shard key carefully
user_id, tenant_id (multi-tenant), etc.
3. Avoid cross-shard queries
Denormalize, use secondary indexes
4. Monitor shard health
Hotspots, lag, disk usage
5. Plan for resharding
Consistent hashing, migration strategy
6. Test failure scenarios
Shard down, network partition
Xulosa
Sharding:
- Horizontal scaling
- Cheksiz capacity
- High throughput
Strategies:
- Hash-based: Uniform distribution
- Range-based: Range queries
- Geographic: Low latency
- Directory: Flexibility
Challenges:
- Cross-shard queries
- Distributed transactions
- Resharding
Real-world:
- Instagram: 4096 shards
- Discord: Cassandra (built-in)
- Design carefully!
Keyingi dars: CAP teoremasi - distributed systems trade-offs.