O SQL é enganadoramente simples — as consultas básicas são fáceis, mas resolver problemas complexos de negócio exige padrões avançados. Desde auto-junções a window functions, dominar estas técnicas transforma a forma como aborda desafios de dados. Este guia cobre padrões avançados essenciais de SQL na perspetiva de um programador sénior.
Porque é que o SQL Avançado é Importante
Dominar SQL avançado permite:
- Desempenho: Mover a lógica para a base de dados, onde é mais rápido
- Simplicidade: Substituir código da aplicação por consultas únicas
- Fiabilidade: As operações ao nível da base de dados são atómicas
- Análise: Reporting complexo sem exportação de dados
- Eficiência: Reduzir idas e voltas à rede
Subconsultas Correlacionadas
Obter Top N por Grupo
Encontrar os 15% melhores em vendas por região e semana:
-- Padrão para MS Access/MySQL
SELECT *
FROM sales_by_shops_weekly AS sbs
LEFT JOIN shops_count_weekly AS sc
ON sbs.group_id = sc.group_id
AND sbs.week = sc.week
WHERE (
SELECT COUNT(*)
FROM sales_by_shops_weekly AS sbs2
WHERE sbs2.sold > sbs.sold
AND sbs2.week = sbs.week
AND sbs2.group_id = sbs.group_id
) < sc.shops_count * 15 / 100;
Este padrão: 1. Conta quantas linhas têm vendas superiores à linha atual 2. Mantém as linhas em que essa contagem é inferior a 15% do total de lojas 3. Na prática, devolve os 15% com melhor desempenho
Encontrar Linhas Sem Registos Relacionados Correspondentes
-- Encontrar clientes sem encomendas
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
);
-- Alternativa usando LEFT JOIN
SELECT c.*
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;
Window Functions
As window functions efetuam cálculos sobre linhas relacionadas sem as colapsar.
Totais Acumulados
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS customer_running_total
FROM orders;
Classificação Dentro de Grupos
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
Diferenças:
RANK(): Lacunas após empates (1, 1, 3)DENSE_RANK(): Sem lacunas (1, 1, 2)ROW_NUMBER(): Números únicos (1, 2, 3)
Comparar com Linhas Anteriores/Seguintes
SELECT
sale_date,
amount,
LAG(amount, 1) OVER (ORDER BY sale_date) AS prev_amount,
LEAD(amount, 1) OVER (ORDER BY sale_date) AS next_amount,
amount - LAG(amount, 1) OVER (ORDER BY sale_date) AS change_from_prev
FROM daily_sales;
Primeiro/Último no Grupo
SELECT DISTINCT
department,
FIRST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY hire_date
) AS first_hire,
LAST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_hire
FROM employees;
Common Table Expressions (CTEs)
As CTEs melhoram a legibilidade e permitem consultas recursivas.
CTE Básica
WITH monthly_totals AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
monthly_growth AS (
SELECT
month,
total,
LAG(total) OVER (ORDER BY month) AS prev_total,
(total - LAG(total) OVER (ORDER BY month)) /
LAG(total) OVER (ORDER BY month) * 100 AS growth_pct
FROM monthly_totals
)
SELECT * FROM monthly_growth
WHERE growth_pct > 10;
CTE Recursiva para Hierarquias
-- Hierarquia de colaboradores
WITH RECURSIVE employee_tree AS (
-- Caso base: gestores de topo
SELECT
id,
name,
manager_id,
1 AS level,
name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Caso recursivo: colaboradores com gestores
SELECT
e.id,
e.name,
e.manager_id,
et.level + 1,
et.path || ' > ' || e.name
FROM employees e
INNER JOIN employee_tree et ON e.manager_id = et.id
)
SELECT * FROM employee_tree ORDER BY path;
CTE Recursiva para Intervalos de Datas
-- Gerar série de datas
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' AS date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < DATE '2024-12-31'
)
SELECT
ds.date,
COALESCE(SUM(o.amount), 0) AS daily_total
FROM date_series ds
LEFT JOIN orders o ON DATE(o.order_date) = ds.date
GROUP BY ds.date
ORDER BY ds.date;
Recuperação de Codificação de Caracteres
Quando os dados têm problemas de codificação (comum com texto cirílico ou asiático):
Recuperar cp1251 a partir de Mojibake em UTF-8
-- MySQL: corrigir texto cp1251 corrompido armazenado como UTF-8
SELECT
id,
CAST(
CONVERT(
CAST(CONVERT(text_column USING cp1251) AS BINARY)
USING utf8
)
AS CHAR CHARACTER SET cp1251
) COLLATE cp1251_general_ci AS fixed_text
FROM broken_table;
-- Atualizar os dados
UPDATE broken_table
SET text_column = CAST(
CONVERT(
CAST(CONVERT(text_column USING cp1251) AS BINARY)
USING utf8
)
AS CHAR CHARACTER SET cp1251
) COLLATE cp1251_general_ci
WHERE text_column LIKE '%Ð%'; -- Padrão para mojibake
Pivotar Dados
Pivot Estático
-- PostgreSQL/MySQL
SELECT
product_id,
SUM(CASE WHEN month = 1 THEN amount ELSE 0 END) AS jan,
SUM(CASE WHEN month = 2 THEN amount ELSE 0 END) AS feb,
SUM(CASE WHEN month = 3 THEN amount ELSE 0 END) AS mar,
SUM(CASE WHEN month = 4 THEN amount ELSE 0 END) AS apr
FROM sales
GROUP BY product_id;
-- SQL Server com PIVOT
SELECT product_id, [1] AS jan, [2] AS feb, [3] AS mar, [4] AS apr
FROM (
SELECT product_id, month, amount
FROM sales
) AS source
PIVOT (
SUM(amount) FOR month IN ([1], [2], [3], [4])
) AS pivot_table;
Unpivot (Colunas para Linhas)
-- Transformar colunas em linhas
SELECT product_id, 'jan' AS month, jan AS amount FROM monthly_sales
UNION ALL
SELECT product_id, 'feb' AS month, feb AS amount FROM monthly_sales
UNION ALL
SELECT product_id, 'mar' AS month, mar AS amount FROM monthly_sales;
-- PostgreSQL com VALUES
SELECT
product_id,
month,
amount
FROM monthly_sales
CROSS JOIN LATERAL (
VALUES
('jan', jan),
('feb', feb),
('mar', mar)
) AS unpivot(month, amount);
Operações JSON
Consultas JSON em PostgreSQL
-- Consultar dados JSON
SELECT
id,
data->>'name' AS name,
data->>'email' AS email,
(data->>'age')::int AS age
FROM users
WHERE data->>'status' = 'active';
-- Consultar JSON aninhado
SELECT
data->'address'->>'city' AS city,
data->'address'->>'zip' AS zip
FROM users;
-- Agregar para JSON
SELECT
department,
jsonb_agg(jsonb_build_object(
'name', name,
'salary', salary
)) AS employees
FROM employees
GROUP BY department;
JSON em MySQL
SELECT
id,
JSON_EXTRACT(data, '$.name') AS name,
JSON_EXTRACT(data, '$.address.city') AS city
FROM users
WHERE JSON_EXTRACT(data, '$.active') = true;
Deteção de Gaps e Islands
Encontrar sequências consecutivas:
WITH numbered AS (
SELECT
sale_date,
amount,
sale_date - (ROW_NUMBER() OVER (ORDER BY sale_date))::int AS grp
FROM daily_sales
WHERE amount > 1000 -- Dias de vendas elevadas
),
islands AS (
SELECT
grp,
MIN(sale_date) AS start_date,
MAX(sale_date) AS end_date,
COUNT(*) AS streak_length
FROM numbered
GROUP BY grp
)
SELECT * FROM islands
WHERE streak_length >= 5 -- Pelo menos 5 dias consecutivos
ORDER BY start_date;
Cálculo da Mediana
-- PostgreSQL
SELECT
department,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
-- MySQL 8+
SELECT
department,
AVG(salary) AS avg_salary,
(
SELECT AVG(sub.salary)
FROM (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS cnt
FROM employees e2
WHERE e2.department = employees.department
) sub
WHERE sub.rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))
) AS median_salary
FROM employees
GROUP BY department;
Padrões de Deduplicação
Manter o Registo Mais Recente
-- Usando ROW_NUMBER
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (
SELECT id FROM ranked WHERE rn > 1
);
-- Usando auto-junção
DELETE u1 FROM users u1
INNER JOIN users u2
ON u1.email = u2.email
AND u1.created_at < u2.created_at;
Principais Conclusões
- Window functions: Poderosas para analytics sem subconsultas
- CTEs para legibilidade: Dividir consultas complexas em etapas
- CTEs recursivas: Lidar com hierarquias e geração de séries
- Subconsultas correlacionadas: Cálculos linha a linha quando necessário
- Consciência de codificação: O charset da base de dados importa
- Testar desempenho: EXPLAIN ANALYZE antes de produção
Os padrões avançados de SQL substituem frequentemente centenas de linhas de código da aplicação por uma única consulta — invista em dominá-los e a sua forma de tratar dados transformar-se-á.