Script—Oracle Mysql 获取字段连续分段间隔

前言

日常工作中我们常常遇到需要查询表中某个字段值的连续信息,使用最多的场景就是我们在各个APP上看到的连续签到信息,是不是很熟悉?今天我们总结下实现类似需求的相关Sql脚本。

建立测试数据

模拟用户签到信息表,两个字段:userid 用户ID ,dt 签到日期,如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
/*CREATE TABLE T_TEST AS
SELECT '1' USERID,TO_DATE('2015-01-10','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-09','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-08','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-05','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-04','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-02','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2015-01-01','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '1' USERID,TO_DATE('2014-12-30','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-22','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-21','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-20','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-19','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-16','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-15','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-14','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-10','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-09','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '2' USERID,TO_DATE('2015-01-08','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-16','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-15','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-14','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-10','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-09','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-08','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '3' USERID,TO_DATE('2015-01-07','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '4' USERID,TO_DATE('2015-01-06','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '4' USERID,TO_DATE('2015-01-05','yyyy-mm-dd') DT FROM DUAL UNION ALL
SELECT '4' USERID,TO_DATE('2015-01-03','yyyy-mm-dd') DT FROM DUAL ;*/

目标

BE:开始日期

EN:截止日期

DU:连续天数

1
2
3
4
5
6
7
8
9
10
11
12
13
USERID BE          EN                  DU
------ ----------- ----------- ----------
1 2015-01-08 2015-01-10 3
1 2015-01-04 2015-01-05 2
1 2015-01-01 2015-01-02 2
1 2014-12-30 2014-12-30 1
2 2015-01-19 2015-01-22 4
2 2015-01-14 2015-01-16 3
2 2015-01-08 2015-01-10 3
3 2015-01-14 2015-01-16 3
3 2015-01-07 2015-01-10 4
4 2015-01-05 2015-01-06 2
4 2015-01-03 2015-01-03 1

方案一

原理

利用oracle分析函数lead,lag,取得每个用户签到日期降序后的前一个值和后一个值,在此基础上,分别取日期不等于后一日期+1(分段的最小值),前一日期-1的数据(分段的最小值),最后相同行号的数据即为一个连续段的起始值。

sql脚本

1
2
3
4
5
6
7
8
9
10
11
WITH T AS (SELECT USERID,DT
,NVL(LEAD(DT) OVER(PARTITION BY USERID ORDER BY DT DESC),DT-2) LEAD_DATE_REAL
,NVL(LAG(DT) OVER(PARTITION BY USERID ORDER BY DT DESC),DT+2) LAG_DATE_REAL
FROM T_TEST)
SELECT A.USERID
,A.DT BE
,B.DT EN
,B.DT - A.DT + 1 DU
FROM (SELECT ROWNUM RN,USERID,DT FROM T WHERE DT <> LEAD_DATE_REAL + 1) A
,(SELECT ROWNUM RN,USERID,DT FROM T WHERE DT <> LAG_DATE_REAL - 1) B
WHERE A.RN = B.RN;

方案二

原理

利用oracle分析函数lead,取得每个用户签到日期降序后的后一个值,first_value取每个用户日期降序后的第一个值,过滤掉日期连续的值后即为所有不连续日期,剩余日期即为分段的最小值,再利用lag取已经计算出的lead值的上一个值即为此分段的最大值(为空去first_value)。

sql脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT USERID
,BE
,EN
,EN - BE + 1 DU
FROM (SELECT USERID
,DT BE
,NVL(LAG(LEAD_DT_REAL) OVER(PARTITION BY USERID ORDER BY DT DESC),LAST_DT) EN
FROM (SELECT USERID
,DT
,NVL(LEAD(DT) OVER(PARTITION BY USERID ORDER BY DT DESC),DT - 2) LEAD_DT_REAL
,FIRST_VALUE(DT) OVER(PARTITION BY USERID ORDER BY DT DESC) LAST_DT
FROM T_TEST)
WHERE DT <> LEAD_DT_REAL + 1);

方案三

原理

此方案没有使用特殊函数,使用sql标准语法,因此受限范围小,大多数数据库都能使用。其原理既是使用两个自关联子查询获取到用户向前、向后不连续日期的集合,两个集合相同行数对应的日期即为连续日期段的起始值。表述有点绕,直接看脚本:

sql脚本

1
2
3
4
5
6
7
8
9
10
11
12
SELECT R1.USERID,R2.DT BE,R1.DT EN,R1.DT - R2.DT + 1 DU
FROM (SELECT ROWNUM RN ,C.USERID,C.DT
FROM (SELECT USERID,DT
FROM T_TEST A
WHERE NOT EXISTS (SELECT 1 FROM T_TEST B WHERE A.DT = B.DT - 1 AND A.USERID=B.USERID)
ORDER BY USERID,DT DESC) C ) R1
,(SELECT ROWNUM RN ,C.USERID,C.DT
FROM (SELECT USERID,DT
FROM T_TEST A
WHERE NOT EXISTS (SELECT 1 FROM T_TEST B WHERE A.DT = B.DT + 1 AND A.USERID=B.USERID)
ORDER BY USERID,DT DESC) C ) R2
WHERE R1.RN = R2.RN;

TIPS

如果你有有更多方案记得联系我!