以下列举常用的使用自定义查询满足的需求:

1. 根据用户的 distinct_id 查询

直接使用 distinct_id 查询这个用户在某天的具体行为:

SELECT * FROM events WHERE distinct_id = 'wahaha' AND date = '2015-09-10' LIMIT 100
SQL

2. 查询每天上午 10 点至 11 点的下单用户数

使用标准的 SQL 日期函数 EXTRACT 来取出小时信息。

SELECT date, COUNT(*) FROM events 
WHERE EXTRACT(HOUR FROM time) IN (10, 11) AND event = 'SubmitOrder'
GROUP BY 1
SQL

3. 查询一段时间内的用户下单次数分布情况

首先计算每个用户的下单次数,然后使用 CASE..WHEN 语法来分组。

SELECT 
CASE
WHEN c < 10 THEN '<10'
WHEN c < 20 THEN '<20'
WHEN c < 100 THEN '<100'
ELSE '>100'
END,
COUNT(*)
FROM (
SELECT user_id, COUNT(*) AS c FROM events
WHERE date BETWEEN '2015-09-01' AND '2015-09-20' AND event = 'SubmitOrder'
GROUP BY 1
)a 
GROUP BY 1
SQL

4. 查询做了行为 A 而没有做行为 B 的用户数

使用 LEFT OUTER JOIN 计算差集。

SELECT a.user_id FROM (
SELECT DISTINCT user_id FROM events WHERE date='2015-10-1' AND event = 'BuyGold'
) a
LEFT OUTER JOIN (
SELECT DISTINCT user_id FROM events WHERE date='2015-10-1' AND event = 'SaleGold'
) b
ON a.user_id = b.user_id
WHERE b.user_id IS NULL
SQL

5. 计算用户的使用时长

使用分析函数,根据每个用户相邻的两个事件的间隔估算累计使用时长,如果两次使用间隔超出 10 分钟则不计算。

SELECT
user_id,
SUM(
CASE WHEN
end_time - begin_time < 600
THEN
end_time - begin_time
ELSE
0
END
) FROM (
SELECT
user_id,
EXTRACT(EPOCH FROM time) AS end_time,
LAG(EXTRACT(EPOCH FROM time), 1, NULL) OVER (PARTITION BY user_id ORDER BY time asc) AS begin_time
FROM events
WHERE date='2015-5-1'
) a
GROUP BY 1
SQL