题号 | 题目 | 难度 | 通过率 |
---|---|---|---|
SQL206 | 获取每个部门中当前员工薪水最高的相关信息 | 困难 | 20.23% |
SQL264 | 牛客每个人最近的登录日期(五) | 困难 | 20.64% |
SQL215 | 查找在职员工自入职以来的薪水涨幅情况 | 困难 | 22.48% |
SQL280 | 实习广场投递简历分析(三) | 困难 | 26.82% |
SQL220 | 汇总各个部门当前员工的title类型的分配数目 | 困难 | 27.95% |
SQL270 | 考试分数(五) | 困难 | 28.20% |
SQL275 | 牛客的课程订单分析(五) | 困难 | 28.98% |
SQL219 | 获取员工其当前的薪水比其manager当前薪水还高的相关信息 | 困难 | 31.20% |
SQL285 | 获得积分最多的人(三) | 困难 | 33.74% |
- 传统方法实现
- 窗口函数实现
不建议直接看相关的答案, 先手动过一遍, 然后再交叉对比. 特别是264, 270, 179, 189
, 这几道题, 其解法非常巧妙.
(注: 由于运行代码需要注册账号登录, 这里的代码并没有提交到平台上运行, 部分代码只解析出逻辑, 并未提取最后要求得结果)
SQL206 最高工资
获取每个部门中当前员工薪水最高的相关信息.
这里没有具体细节, 例如最高, 假如最高存在多个人, 是全部取出还是只要一个即可.
drop table if exists `dept_emp` ;
drop table if exists `salaries` ;
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
INSERT INTO dept_emp VALUES(10003,'d002','1996-08-03','9999-01-01');
INSERT INTO salaries VALUES(10001,88958,'2002-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'2001-08-02','9999-01-01');
INSERT INTO salaries VALUES(10003,92527,'2001-08-02','9999-01-01');
mysql> select * from salaries;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10001 | 88958 | 2002-06-22 | 9999-01-01 |
| 10002 | 72527 | 2001-08-02 | 9999-01-01 |
| 10003 | 92527 | 2001-08-02 | 9999-01-01 |
+--------+--------+------------+------------+
3 rows in set (0.00 sec)
mysql> select * from dept_emp;
+--------+---------+------------+------------+
| emp_no | dept_no | from_date | to_date |
+--------+---------+------------+------------+
| 10001 | d001 | 1986-06-26 | 9999-01-01 |
| 10002 | d001 | 1996-08-03 | 9999-01-01 |
| 10003 | d002 | 1996-08-03 | 9999-01-01 |
+--------+---------+------------+------------+
3 rows in set (0.00 sec)
描述
有一个员工表dept_emp
简况如下:
emp_no | dept_no | from_date | to_date |
---|---|---|---|
10001 | d001 | 1986-06-26 | 9999-01-01 |
10002 | d001 | 1996-08-03 | 9999-01-01 |
10003 | d002 | 1996-08-03 | 9999-01-01 |
有一个薪水表salaries
简况如下:
emp_no | salary | from_date | to_date |
---|---|---|---|
10001 | 88958 | 2002-06-22 | 9999-01-01 |
10002 | 72527 | 2001-08-02 | 9999-01-01 |
10003 | 92527 | 2001-08-02 | 9999-01-01 |
获取每个部门中当前员工薪水最高的相关信息,给出dept_no, emp_no
以及其对应的salary,按照部门编号dept_no
升序排列,以上例子输出如下:
dept_no | emp_no | maxSalary |
---|---|---|
d001 | 10001 | 88958 |
d002 | 10003 | 92527 |
解答W
WITH a AS (
SELECT
s.emp_no,
s.salary,
d.dept_no
FROM
salaries s
LEFT JOIN dept_emp d ON s.emp_no = d.emp_no
) SELECT
*,
NTH_VALUE( salary, 1 ) over w AS top_salary,
rank() over w AS top_rank
FROM
a window w AS ( PARTITION BY dept_no ORDER BY salary DESC );
+--------+--------+---------+------------+----------+
| emp_no | salary | dept_no | top_salary | top_rank |
+--------+--------+---------+------------+----------+
| 10001 | 88958 | d001 | 88958 | 1 |
| 10002 | 72527 | d001 | 88958 | 2 |
| 10003 | 92527 | d002 | 92527 | 1 |
+--------+--------+---------+------------+----------+
解答T
不使用窗口函数
WITH a AS (
SELECT
s.emp_no,
s.salary,
d.dept_no
FROM
salaries s
LEFT JOIN dept_emp d ON s.emp_no = d.emp_no
),
b AS ( SELECT dept_no, max( salary ) AS ms FROM a GROUP BY dept_no ) SELECT
a.emp_no,
a.salary,
a.dept_no,
b.ms
FROM
a,
b
WHERE
b.ms = a.salary;
+--------+--------+---------+-------+
| emp_no | salary | dept_no | ms |
+--------+--------+---------+-------+
| 10001 | 88958 | d001 | 88958 |
| 10003 | 92527 | d002 | 92527 |
+--------+--------+---------+-------+
2 rows in set (0.01 sec)
SQL215 入职薪水的涨幅
查找在职员工自入职以来的薪水涨幅情况
drop table if exists `employees` ;
drop table if exists `salaries` ;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` char(1) NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO employees VALUES(10001,'1953-09-02','Georgi','Facello','M','2001-06-22');
INSERT INTO employees VALUES(10002,'1964-06-02','Bezalel','Simmel','F','1999-08-03');
INSERT INTO salaries VALUES(10001,85097,'2001-06-22','2002-06-22');
# 插入多一行的数据
INSERT INTO salaries VALUES(10001,86097,'2002-06-22','2003-06-22');
INSERT INTO salaries VALUES(10001,88958,'2003-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'1999-08-03','2000-08-02');
INSERT INTO salaries VALUES(10002,72527,'2000-08-02','2001-08-02');
mysql> select * from employees;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 10001 | 1953-09-02 | Georgi | Facello | M | 2001-06-22 |
| 10002 | 1964-06-02 | Bezalel | Simmel | F | 1999-08-03 |
+--------+------------+------------+-----------+--------+------------+
2 rows in set (0.00 sec)
mysql> select * from salaries;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10001 | 85097 | 2001-06-22 | 2002-06-22 |
| 10001 | 88958 | 2002-06-22 | 9999-01-01 |
| 10002 | 72527 | 1999-08-03 | 2000-08-02 |
| 10002 | 72527 | 2000-08-02 | 2001-08-02 |
+--------+--------+------------+------------+
4 rows in set (0.00 sec)
描述
请你查找在职员工自入职以来的薪水涨幅情况,给出在职员工编号emp_no
以及其对应的薪水涨幅growth
,并按照growth
进行升序,以上例子输出为
(注: to_date为薪资调整某个结束日期,或者为离职日期,to_date='9999-01-01'时,表示依然在职,无后续调整记录)
emp_no |
growth |
---|---|
10001 | 3861 |
解答W
WITH a AS (
SELECT
e.emp_no,
e.first_name,
e.hire_date,
s.salary,
s.to_date
FROM
employees e
LEFT JOIN salaries s ON e.emp_no = s.emp_no
) SELECT
emp_no,
hire_date,
salary,
lag( to_date ) over w AS td,
lag( salary ) over w AS ts
FROM
a window w AS ( PARTITION BY emp_no ORDER BY to_date DESC );
+--------+------------+--------+------------+-------+
| emp_no | hire_date | salary | td | ts |
+--------+------------+--------+------------+-------+
| 10001 | 2001-06-22 | 88958 | NULL | NULL |
| 10001 | 2001-06-22 | 85097 | 9999-01-01 | 88958 |
| 10002 | 1999-08-03 | 72527 | NULL | NULL |
| 10002 | 1999-08-03 | 72527 | 2001-08-02 | 72527 |
+--------+------------+--------+------------+-------+
# 这里没注意假如还有更多的中间数据, 而不是两行呢?
# 对上面的数据进行修改
INSERT INTO salaries VALUES(10001,85097,'2001-06-22','2002-06-22');
INSERT INTO salaries VALUES(10001,86097,'2002-06-22','2003-06-22');
INSERT INTO salaries VALUES(10001,88958,'2003-06-22','9999-01-01');
+--------+------------+--------+------------+-------+
| emp_no | hire_date | salary | td | ts |
+--------+------------+--------+------------+-------+
| 10001 | 2001-06-22 | 88958 | NULL | NULL |
| 10001 | 2001-06-22 | 86097 | 9999-01-01 | 88958 |
| 10001 | 2001-06-22 | 85097 | 2003-06-22 | 86097 |
| 10002 | 1999-08-03 | 72527 | NULL | NULL |
| 10002 | 1999-08-03 | 72527 | 2001-08-02 | 72527 |
+--------+------------+--------+------------+-------+
5 rows in set (0.00 sec)
# 修正
WITH a AS (
SELECT
e.emp_no,
e.first_name,
e.hire_date,
s.salary,
s.to_date
FROM
employees e
LEFT JOIN salaries s ON e.emp_no = s.emp_no
WHERE
# 筛选出两行的数据
s.to_date = '9999-01-01'
OR s.from_date = e.hire_date
),
b AS ( SELECT emp_no, salary, lag( salary ) over ( PARTITION BY emp_no ORDER BY to_date ) AS ls FROM a ) SELECT
emp_no,
salary - ls AS growth
FROM
b
WHERE
ls IS NOT NULL;
+--------+--------+
| emp_no | growth |
+--------+--------+
| 10001 | 3861 |
+--------+--------+
1 row in set (0.00 sec)
解答T
不使用窗口函数
WITH a AS (
SELECT
e.emp_no,
e.first_name,
e.hire_date,
s.salary,
s.to_date
FROM
employees e
LEFT JOIN salaries s ON e.emp_no = s.emp_no
WHERE
s.to_date = '9999-01-01'
OR s.from_date = e.hire_date
),
b AS (
SELECT
a1.emp_no,
a1.salary,
a2.salary AS ts,
a1.to_date
FROM
a AS a1
LEFT JOIN a AS a2 ON a1.emp_no = a2.emp_no
WHERE
a1.to_date != a2.to_date
AND a1.to_date = '9999-01-01'
) SELECT
emp_no,
salary - ts AS growth
FROM
b;
+--------+--------+
| emp_no | growth |
+--------+--------+
| 10001 | 3861 |
+--------+--------+
1 row in set (0.00 sec)
SQL264 最近登录
牛客每个人最近的登录日期(五)
drop table if exists login;
CREATE TABLE `login` (
`id` int(4) NOT NULL,
`user_id` int(4) NOT NULL,
`client_id` int(4) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`id`));
INSERT INTO login VALUES
(1,2,1,'2020-10-12'),
(2,3,2,'2020-10-12'),
(3,1,2,'2020-10-12'),
(4,2,2,'2020-10-13'),
(5,1,2,'2020-10-13'),
(6,3,1,'2020-10-14'),
(7,4,1,'2020-10-14'),
(8,4,1,'2020-10-15');
描述
牛客每天有很多人登录,请你统计一下牛客每个日期新用户的次日留存率。
有一个登录(login)记录表,简况如下:
id | user_id | client_id | date |
---|---|---|---|
1 | 2 | 1 | 2020-10-12 |
2 | 3 | 2 | 2020-10-12 |
3 | 1 | 2 | 2020-10-12 |
4 | 2 | 2 | 2020-10-13 |
5 | 1 | 2 | 2020-10-13 |
6 | 3 | 1 | 2020-10-14 |
7 | 4 | 1 | 2020-10-14 |
8 | 4 | 1 | 2020-10-15 |
第1行表示user_id为2的用户在2020-10-12使用了客户端id为1的设备登录了牛客网,因为是第1次登录,所以是新用户
......
第4行表示user_id为2的用户在2020-10-13使用了客户端id为2的设备登录了牛客网,因为是第2次登录,所以是老用户
......
最后1行表示user_id为4的用户在2020-10-15使用了客户端id为1的设备登录了牛客网,因为是第2次登录,所以是老用户
请你写出一个sql语句查询每个日期新用户的次日留存率,结果保留小数点后面3位数(3位之后的四舍五入),并且查询结果按照日期升序排序,上面的例子查询结果如下:
date | p |
---|---|
2020-10-12 | 0.667 |
2020-10-13 | 0.000 |
2020-10-14 | 1.000 |
2020-10-15 | 0.000 |
查询结果表明:
2020-10-12登录了3个(user_id为2,3,1)新用户,2020-10-13,只有2个(id为2,1)登录,故2020-10-12新用户次日留存率为2/3=0.667;
2020-10-13没有新用户登录,输出0.000;
2020-10-14登录了1个(user_id为4)新用户,2020-10-15,user_id为4的用户登录,故2020-10-14新用户次日留存率为1/1=1.000;2020-10-15没有新用户登录,输出0.000;
解答W
# 得到同一天所有的登录次数 ltimes
# 得到每个用户的在活跃次数(注意是活跃) atimes
# 然后偏移处理
# 这个语句很粗糙有点的伪代码的感觉
# 处理主要是需要执行 向前, 向后
WITH a AS ( SELECT *, count( date ) over ( PARTITION BY date ) AS ltimes FROM login ),
b AS ( SELECT *, lag( 1, 1, 0 ) over ( PARTITION BY user_id ORDER BY date ) AS atime FROM a ),
c AS ( SELECT date, ltimes, sum( atime ) AS atimes FROM b GROUP BY date, ltimes ),
d AS ( SELECT *, LEAD( atimes, 1, 0 ) over () AS ptimes FROM c ) SELECT
date,
(
ptimes / ( ltimes - atimes )) AS r
FROM
d;
# ptimes
+------------+--------+
| date | r |
+------------+--------+
| 2020-10-12 | 0.6667 |
| 2020-10-13 | NULL |
| 2020-10-14 | 1.0000 |
| 2020-10-15 | NULL |
+------------+--------+
4 rows in set, 2 warnings (0.00 sec)(warning: / 0)
解答T
# 计算出每天的登录总数
# 计算出每个用户每天登录数
# 计算出天登录的次数中有多少次是老用户登录
# 汇总
WITH a AS ( SELECT date, count(*) AS ltimes FROM login GROUP BY date ),
b AS ( SELECT date, user_id, count(*) AS x FROM login GROUP BY date, user_id ),
d AS (
SELECT
b1.date,
b1.user_id,
b2.x AS atimes
FROM
b b1
LEFT JOIN b b2 ON b1.user_id = b2.user_id
AND DATEDIFF( b1.date, b2.date ) > 0
),
c AS (
SELECT
b1.*,
b2.x AS t_x,
a.ltimes
FROM
b AS b1
LEFT JOIN b AS b2 ON b1.user_id = b2.user_id
AND date_add( b1.date, INTERVAL 1 DAY ) = b2.date
LEFT JOIN a ON a.date = b1.date
),
e AS (
SELECT
c.date,
c.user_id,
c.t_x,
c.ltimes,
d.atimes
FROM
c
LEFT JOIN d ON d.user_id = c.user_id
AND d.date = c.date
),
f AS ( SELECT date, sum( t_x ) AS txs, sum( atimes ) AS ats, ltimes FROM e GROUP BY date ) SELECT
date,
(
txs / (
ltimes -
IF
( ats IS NULL, 0, ats ))) AS r
FROM
f;
+------------+--------+
| date | r |
+------------+--------+
| 2020-10-12 | 0.6667 |
| 2020-10-13 | NULL |
| 2020-10-14 | 1.0000 |
| 2020-10-15 | NULL |
+------------+--------+
4 rows in set (0.01 sec)
优化
WITH a AS ( SELECT date, count(*) AS ltimes FROM login GROUP BY date ),
b AS ( SELECT date, user_id, count(*) AS x FROM login GROUP BY date, user_id ),
d AS (
SELECT
b1.date,
b1.user_id,
b2.x AS atimes
FROM
b b1
LEFT JOIN b b2 ON b1.user_id = b2.user_id
AND DATEDIFF( b1.date, b2.date ) > 0
),
m AS (
SELECT
d.date,
sum(
IF
( atimes IS NULL, 0, atimes )) AS atimes,
a.ltimes
FROM
d
LEFT JOIN a ON a.date = d.date
GROUP BY
date
) SELECT
t1.*,
t2.atimes AS otimes,
t2.atimes / ( t1.ltimes - t1.atimes ) AS r
FROM
m t1
LEFT JOIN m t2 ON DATE_ADD( t1.date, INTERVAL 1 DAY ) = t2.date;
# atimes, 为当天登录的老用户
# ltimes, 为当天登录的总次数
# otimes, 为第二天的老用户登录次数
+------------+--------+--------+--------+--------+
| date | atimes | ltimes | otimes | r |
+------------+--------+--------+--------+--------+
| 2020-10-12 | 0 | 3 | 2 | 0.6667 |
| 2020-10-13 | 2 | 2 | 1 | NULL |
| 2020-10-14 | 1 | 2 | 1 | 1.0000 |
| 2020-10-15 | 1 | 1 | NULL | NULL |
+------------+--------+--------+--------+--------+
4 rows in set, 1 warning (0.00 sec)
大神解答
注意代码的可阅读性.....
mysql> SELECT IFNULL("a", "b");
+----------------------------------+
| IFNULL("a", "b") |
+----------------------------------+
| a |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT IFNULL(NULL, 500);
+-------------------+
| IFNULL(NULL, 500) |
+-------------------+
| 500 |
+-------------------+
1 row in set (0.01 sec)
# IFNULL(expression, alt_value)
# 代码经过navicat美化, 但是可阅读性还是很差
# 这种写法很酷炫, 类似于JavaScript, 使用三元表达式, 进行复杂的嵌套, 不将里面的内容, 拆出来, 难以阅读.
SELECT
date,
ifnull(
round((
sum(
CASE
WHEN ( user_id, date ) IN ( SELECT user_id, date_add( date, INTERVAL - 1 DAY ) FROM login GROUP BY user_id ) THEN
1 ELSE 0
END
))/ (
sum(
CASE
WHEN ( user_id, date ) IN ( SELECT user_id, min( date ) FROM login GROUP BY user_id ) THEN
1 ELSE 0
END
)),
3
),
0
) AS p
FROM
login
GROUP BY
date
ORDER BY
date;
# ----------------------------- 修改版本, 变更为with, if 取代 case when
WITH a AS ( SELECT user_id, date_add( date, INTERVAL - 1 DAY ) FROM login GROUP BY user_id, date ),
b AS ( SELECT user_id, min( date ) FROM login GROUP BY user_id ) SELECT
date,
ifnull(
round((
sum(
IF
(( user_id, date ) IN ( SELECT * FROM a ), 1, 0 ))) / (
sum(
IF
( ( user_id, date ) IN ( SELECT * FROM b ), 1, 0 ))),
3
),
0
) AS p
FROM
login
GROUP BY
date
ORDER BY
date;
# 注意部分代码是sqlite3的, 需要改成mysql
SELECT
a.date,
ROUND( COUNT( b.user_id ) * 1.0 / COUNT( a.user_id ), 3 ) AS p
FROM
( SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id ) a
LEFT JOIN login b ON a.user_id = b.user_id
AND b.date = DATE_ADD( a.date, INTERVAL 1 DAY )
GROUP BY
a.date UNION
SELECT
date,
0.000 AS p
FROM
login
WHERE
date NOT IN ( SELECT MIN( date ) FROM login GROUP BY user_id )
ORDER BY
date;
关键在于这个语句: SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id
这个语句非常巧妙地将用户的登录记录分拆开来, 分开新用户部分, 和老用户登录的两个部分
+---------+------------+
| user_id | date |
+---------+------------+
| 2 | 2020-10-12 |
| 3 | 2020-10-12 |
| 1 | 2020-10-12 |
| 4 | 2020-10-14 |
+---------+------------+
4 rows in set (0.00 sec)
# 1, 2, 3 是 10-12 新登录的用户
# 4 为10-14的新用户
# 将上面的语句改成with
# user_id +
WITH a AS ( SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id ),
c AS (
SELECT
a.date,
ROUND( COUNT( b.user_id ) * 1.0 / COUNT( a.user_id ), 3 ) AS p
FROM
a
LEFT JOIN login b ON a.user_id = b.user_id
AND b.date = DATE_ADD( a.date, INTERVAL 1 DAY )
GROUP BY
a.date
),
# 没有的数据表明当天没有新的用户登录
d AS ( SELECT date, 0.000 AS p FROM login WHERE date NOT IN ( SELECT date FROM a ) ORDER BY date )
SELECT * FROM c UNION SELECT * FROM d;
SQL270 考试分数
考试分数(五)
drop table if exists grade;
CREATE TABLE grade(
`id` int(4) NOT NULL,
`job` varchar(32) NOT NULL,
`score` int(10) NOT NULL,
PRIMARY KEY (`id`));
INSERT INTO grade VALUES
(1,'C++',11001),
(2,'C++',10000),
(3,'C++',9000),
(4,'Java',12000),
(5,'Java',13000),
(6,'B',12000),
(7,'B',11000),
(8,'B',9999);
# 增加数据测试
insert into grade values(9, "Java", 11000), (10, "Java", 15000);
# (11, "B", 13000), (12, "C", 14000)
描述
牛客每次考试完,都会有一个成绩表(grade),如下:
id | job | score |
---|---|---|
1 | C++ | 11001 |
2 | C++ | 11000 |
3 | C++ | 9000 |
4 | JAVA | 12000 |
5 | JAVA | 13000 |
6 | B | 12000 |
7 | B | 11000 |
8 | B | 9999 |
第1行表示用户id为1的用户选择了C++岗位并且考了11001分
。。。
第8行表示用户id为8的用户选择了B语言岗位并且考了9999分
请你写一个sql语句查询各个岗位分数的中位数位置上的所有grade信息,并且按id升序排序,结果如下:
id | job | score | t_rank |
---|---|---|---|
2 | C++ | 10000 | 2 |
4 | Java | 12000 | 2 |
5 | Java | 13000 | 1 |
7 | B | 11000 | 2 |
解释
第1行表示C++岗位的中位数位置上的为用户id为2,分数为10000,在C++岗位里面排名是第2
第2,3行表示Java岗位的中位数位置上的为用户id为4,5,分数为12000,13000,在Java岗位里面排名是第2,1
第4行表示B语言岗位的中位数位置上的为用户id为7,分数为11000,在前端岗位里面排名是第2
(注意: sqlite
1/2得到的不是0.5,得到的是0,只有1*1.0/2才会得到0.5,sqlite
四舍五入的函数为round,sqlite
不支持floor函数,支持cast(x as integer) 函数,不支持if函数,支持case when ...then ...else ..end函数,sqlite
不支持自定义变量)
@Lian ➜ ~ ( base 3.9.12) 153ms sqlite3
SQLite version 3.38.2 2022-03-26 13:51:10
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select (1 / 2);
0
sqlite> select (3 / 2);
1
sqlite> select (1.0 / 2.0);
0.5
sqlite> select (3.0 / 1.0);
3.0
sqlite> select (3.0 / 2.0);
1.5
sqlite> select ( 1/ 2.0);
0.5
解答W
# 对于奇数的中位很容易确定
# 先得到每个job出现的次数
# 假如是奇数的, 假如是偶数的
WITH a AS (
SELECT
*,
ROW_NUMBER() over w AS rk
FROM
grade window w AS ( PARTITION BY job ORDER BY score DESC )),
b AS ( SELECT *, FIRST_VALUE( rk ) over ( PARTITION BY job ORDER BY rk DESC ) AS mx FROM a ),
c AS (
SELECT
*,
IF
(
mx % 2 = 0,
ceil(( mx / 2 + rk ) / 2 ),
ceil( mx / 2 )) AS tc
FROM
b
) SELECT
id,
job,
score,
rk AS t_rank
FROM
c
WHERE
rk = tc
ORDER BY
id;
# 原始数据
+----+------+-------+--------+
| id | job | score | t_rank |
+----+------+-------+--------+
| 2 | C++ | 10000 | 2 |
| 4 | Java | 12000 | 2 |
| 5 | Java | 13000 | 1 |
| 7 | B | 11000 | 2 |
+----+------+-------+--------+
4 rows in set (0.00 sec)
# 增加测试数据
+----+------+-------+--------+
| id | job | score | t_rank |
+----+------+-------+--------+
| 2 | C++ | 10000 | 2 |
| 4 | Java | 12000 | 3 |
| 5 | Java | 13000 | 2 |
| 7 | B | 11000 | 2 |
+----+------+-------+--------+
4 rows in set (0.00 sec)
解答T
不使用窗口函数, 暂未发现不引入变量的情况下, 完全使用使用传统的方式来解决这个问题.
WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ) SELECT
g.job,
g.score,
a.ic
FROM
grade AS g
LEFT JOIN a ON g.job = a.job
ORDER BY
job,
score DESC;
+------+-------+------+
| job | score | ic |
+------+-------+------+
| B | 12000 | 3 |
| B | 11000 | 3 |
| B | 9999 | 3 |
| C++ | 11001 | 3 |
| C++ | 10000 | 3 |
| C++ | 9000 | 3 |
| Java | 13000 | 2 |
| Java | 12000 | 2 |
+------+-------+------+
8 rows in set (0.00 sec)
# 在不引入临时变量的情况下, 还有没有方法进一步处理下去?
WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ) SELECT
@rw := @rw + 1 as rno,
g.job,
g.score,
a.ic
FROM (select @rw := 0) as r,
grade AS g
LEFT JOIN a ON g.job = a.job
ORDER BY
job,
score DESC;
+------+------+-------+------+
| rno | job | score | ic |
+------+------+-------+------+
| 1 | B | 12000 | 3 |
| 2 | B | 11000 | 3 |
| 3 | B | 9999 | 3 |
| 4 | C++ | 11001 | 3 |
| 5 | C++ | 10000 | 3 |
| 6 | C++ | 9000 | 3 |
| 7 | Java | 13000 | 2 |
| 8 | Java | 12000 | 2 |
+------+------+-------+------+
8 rows in set, 2 warnings (0.00 sec)
# 最终结果
# 这里引入了变量
WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ),
b AS (
SELECT
g.id,
@rw := @rw + 1 AS rno,
g.job,
g.score,
a.ic
FROM
( SELECT @rw := 0 ) AS r,
grade AS g
LEFT JOIN a ON g.job = a.job
ORDER BY
job,
score DESC
),
c AS (
SELECT
job,
@INDEX := @INDEX + 1 AS rx,
ic
FROM
( SELECT @INDEX := 0 ) AS i,
b
GROUP BY
job
),
d AS (
SELECT
b.id,
b.rno,
b.job,
b.score,
c.rx,
c.ic
FROM
b
LEFT JOIN c ON c.job = b.job
) SELECT
id,
rno,
job,
score,
rx,
ic,
IF
(
ic % 2 = 0,
floor(( ic / 2 + rno + rx ) / 2 ),
floor( ic / 2 ) + rx
) AS rindex
FROM
d;
+----+------+------+-------+------+------+--------+
| id | rno | job | score | rx | ic | rindex |
+----+------+------+-------+------+------+--------+
| 11 | 1 | B | 13000 | 1 | 4 | 2 |
| 6 | 2 | B | 12000 | 1 | 4 | 2 |
| 7 | 3 | B | 11000 | 1 | 4 | 3 |
| 8 | 4 | B | 9999 | 1 | 4 | 3 |
| 12 | 5 | C | 14000 | 5 | 1 | 5 |
| 1 | 6 | C++ | 11001 | 6 | 3 | 7 |
| 2 | 7 | C++ | 10000 | 6 | 3 | 7 |
| 3 | 8 | C++ | 9000 | 6 | 3 | 7 |
| 10 | 9 | Java | 15000 | 9 | 4 | 10 |
| 5 | 10 | Java | 13000 | 9 | 4 | 10 |
| 4 | 11 | Java | 12000 | 9 | 4 | 11 |
| 9 | 12 | Java | 11000 | 9 | 4 | 11 |
+----+------+------+-------+------+------+--------+
12 rows in set, 6 warnings (0.00 sec)
SQL285 积分最多
获得积分最多的人(三)
drop table if exists user;
drop table if exists grade_info;
CREATE TABLE user (
id int(4) NOT NULL,
name varchar(32) NOT NULL
);
CREATE TABLE grade_info (
user_id int(4) NOT NULL,
grade_num int(4) NOT NULL,
type varchar(32) NOT NULL
);
INSERT INTO user VALUES
(1,'tm'),
(2,'wwy'),
(3,'zk'),
(4,'qq'),
(5,'lm');
INSERT INTO grade_info VALUES
(1,3,'add'),
(2,3,'add'),
(1,1,'reduce'),
(3,3,'add'),
(4,3,'add'),
(5,3,'add'),
(3,1,'reduce');
描述
牛客每天有很多用户刷题,发帖,点赞,点踩等等,这些都会记录相应的积分。
有一个用户表(user),简况如下:
id | name |
---|---|
1 | tm |
2 | wwy |
3 | zk |
4 | |
5 | lm |
还有一个积分表(grade_info),简况如下:
user_id | grade_num | type |
---|---|---|
1 | 3 | add |
2 | 3 | add |
1 | 1 | reduce |
3 | 3 | add |
4 | 3 | add |
5 | 3 | add |
3 | 1 | reduce |
第1
行表示,user_id
为1的用户积分增加了3
分。
第2
行表示,user_id
为2的用户积分增加了3
分。
第3
行表示,user_id
为1的用户积分减少了1
分。
.......
最后1行表示,user_id为3
的用户积分减少了1
分。
请你写一个SQL
查找积分最高的用户的id
,名字,以及他的总积分是多少(可能有多个),查询结果按照id升序排序,以上例子查询结果如下:
id | name | grade_num |
---|---|---|
2 | wwy | 3 |
4 | 3 | |
5 | lm | 3 |
解释:
user_id
为1和3的先加了3分,但是后面又减了1分,他们2个是2分,
其他3个都是3分,所以输出其他三个的数据
解答W
WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ),
c AS ( SELECT *, rank() over ( ORDER BY total DESC ) AS rk FROM b ) SELECT
c.*,
user.NAME
FROM
c
LEFT JOIN user ON c.user_id = user.id
WHERE
rk = 1;
+---------+-------+----+------+
| user_id | total | rk | NAME |
+---------+-------+----+------+
| 2 | 3 | 1 | wwy |
| 4 | 3 | 1 | qq |
| 5 | 3 | 1 | lm |
+---------+-------+----+------+
3 rows in set (0.00 sec)
解答T
不使用窗口函数
WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ) SELECT
*
FROM
b
ORDER BY
total DESC;
# 这里得到了数值之后的最大值的处理?
+---------+-------+
| user_id | total |
+---------+-------+
| 2 | 3 |
| 4 | 3 |
| 5 | 3 |
| 1 | 2 |
| 3 | 2 |
+---------+-------+
5 rows in set (0.00 sec)
# 注意这里navicat会将user改成大写, 导致在linux下的MySQL存在问题
WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ) SELECT
b.*,
user.name
FROM
b
LEFT JOIN user user.id = b.user_id
WHERE
b.total = ( SELECT max( total ) FROM b );
# 这里执行两次的select from b, 对效率影响几何?
+---------+-------+------+
| user_id | total | name |
+---------+-------+------+
| 2 | 3 | wwy |
| 4 | 3 | qq |
| 5 | 3 | lm |
+---------+-------+------
SQL280 简历投递
实习广场投递简历分析(三)
drop table if exists resume_info;
CREATE TABLE resume_info (
id int(4) NOT NULL,
job varchar(64) NOT NULL,
date date NOT NULL,
num int(11) NOT NULL,
PRIMARY KEY (id));
INSERT INTO resume_info VALUES
(1,'C++','2025-01-02',53),
(2,'Python','2025-01-02',23),
(3,'Java','2025-01-02',12),
(4,'C++','2025-01-03',54),
(5,'Python','2025-01-03',43),
(6,'Java','2025-01-03',41),
(7,'Java','2025-02-03',24),
(8,'C++','2025-02-03',23),
(9,'Python','2025-02-03',34),
(10,'Java','2025-02-04',42),
(11,'C++','2025-02-04',45),
(12,'Python','2025-02-04',59),
(13,'C++','2026-01-04',230),
(14,'Java','2026-01-04',764),
(15,'Python','2026-01-04',644),
(16,'C++','2026-01-06',240),
(17,'Java','2026-01-06',714),
(18,'Python','2026-01-06',624),
(19,'C++','2026-02-14',260),
(20,'Java','2026-02-14',721),
(21,'Python','2026-02-14',321),
(22,'C++','2026-02-24',134),
(23,'Java','2026-02-24',928),
(24,'Python','2026-02-24',525),
(25,'C++','2027-02-06',231);
描述
在牛客实习广场有很多公司开放职位给同学们投递,同学投递完就会把简历信息存到数据库里。
现在有简历信息表(resume_info),部分信息简况如下:
id | job | date | num |
---|---|---|---|
1 | C++ | 2025-01-02 | 53 |
2 | Python | 2025-01-02 | 23 |
3 | Java | 2025-01-02 | 12 |
4 | C++ | 2025-01-03 | 54 |
5 | Python | 2025-01-03 | 43 |
6 | Java | 2025-01-03 | 41 |
7 | Java | 2025-02-03 | 24 |
8 | C++ | 2025-02-03 | 23 |
9 | Python | 2025-02-03 | 34 |
10 | Java | 2025-02-04 | 42 |
11 | C++ | 2025-02-04 | 45 |
12 | Python | 2025-02-04 | 59 |
13 | C++ | 2026-01-04 | 230 |
14 | Java | 2026-01-04 | 764 |
15 | Python | 2026-01-04 | 644 |
16 | C++ | 2026-01-06 | 240 |
17 | Java | 2026-01-06 | 714 |
18 | Python | 2026-01-06 | 624 |
19 | C++ | 2026-01-04 | 260 |
20 | Java | 2026-02-14 | 721 |
21 | Python | 2026-02-14 | 321 |
22 | C++ | 2026-02-14 | 134 |
23 | Java | 2026-02-24 | 928 |
24 | Python | 2026-02-24 | 525 |
25 | C++ | 2027-02-06 | 231 |
第1行表示,在2025年1月2号,C++岗位收到了53封简历
......
最后1行表示,在2027年2月6号,C++岗位收到了231封简历
请你写出SQL语句查询在2025年投递简历的每个岗位,每一个月内收到简历的数目,和对应的2026年的同一个月同岗位,收到简历的数目,最后的结果先按first_year_mon月份降序,再按job降序排序显示,以上例子查询结果如下:
job | first_year_mon | first_year_cnt | second_year_mon | second_year_cnt |
---|---|---|---|---|
Python | 2025-02 | 93 | 2026-02 | 846 |
Java | 2025-02 | 66 | 2026-02 | 1649 |
C++ | 2025-02 | 68 | 2026-02 | 394 |
Python | 2025-01 | 66 | 2026-01 | 1268 |
Java | 2025-01 | 53 | 2026-01 | 1478 |
C++ | 2025-01 | 107 | 2026-01 | 470 |
解析:
第1行表示Python岗位在2025年2月收到了93份简历,在对应的2026年2月收到了846份简历
......
最后1行表示C++岗位在2025年1月收到了107份简历,在对应的2026年1月收到了470份简历
解答W
# a, 得到基础的数据
WITH a AS (
SELECT
job,
CONVERT ( concat( substring( date, 1, 7 ), '-01' ), date ) AS m_month,
sum( num ) AS all_c
FROM
resume_info
WHERE
YEAR ( date ) != 2027
GROUP BY
job,
m_month
),
# 汇合数据, 注意这里只有两年的数据, 可以这样合并, 假如三年的数据呢?
b AS (
SELECT
job,
m_month AS fm,
all_c,
lag( m_month ) over w AS sm,
lag( all_c ) over w AS sc
FROM
a window w AS ( PARTITION BY job, MONTH ( m_month ) ORDER BY m_month )) SELECT
job,
DATE_FORMAT( sm, '%Y-%m' ) AS first_year_mon,
sc AS first_year_cnt,
DATE_FORMAT( fm, '%Y-%m' ) AS second_year_mon,
all_c AS second_year_cnt
FROM
b
WHERE
sm IS NOT NULL
ORDER BY
month(sm) desc, job desc;
+--------+----------------+----------------+-----------------+-----------------+
| job | first_year_mon | first_year_cnt | second_year_mon | second_year_cnt |
+--------+----------------+----------------+-----------------+-----------------+
| Python | 2025-02 | 93 | 2026-02 | 846 |
| Java | 2025-02 | 66 | 2026-02 | 1649 |
| C++ | 2025-02 | 68 | 2026-02 | 394 |
| Python | 2025-01 | 66 | 2026-01 | 1268 |
| Java | 2025-01 | 53 | 2026-01 | 1478 |
| C++ | 2025-01 | 107 | 2026-01 | 470 |
+--------+----------------+----------------+-----------------+-----------------+
6 rows in set (0.01 sec)
解答T
# 等到基础数据
# 按照年份拆分开两组数据
# 按照month, job合并数据
WITH a AS (
SELECT
job,
CONVERT ( concat( substring( date, 1, 7 ), '-01' ), date ) AS m_month,
sum( num ) AS all_c
FROM
resume_info
WHERE
YEAR ( date ) != 2027
GROUP BY
job,
m_month
),
b AS ( SELECT * FROM a WHERE YEAR ( m_month ) = 2025 ),
c AS ( SELECT * FROM a WHERE YEAR ( m_month ) = 2026 ) SELECT
b.*,
c.*
FROM
b
JOIN c ON MONTH ( b.m_month ) = MONTH ( c.m_month )
AND b.job = c.job;
+--------+------------+-------+--------+------------+-------+
| job | m_month | all_c | job | m_month | all_c |
+--------+------------+-------+--------+------------+-------+
| C++ | 2025-01-01 | 107 | C++ | 2026-01-01 | 470 |
| Python | 2025-01-01 | 66 | Python | 2026-01-01 | 1268 |
| Java | 2025-01-01 | 53 | Java | 2026-01-01 | 1478 |
| Java | 2025-02-01 | 66 | Java | 2026-02-01 | 1649 |
| C++ | 2025-02-01 | 68 | C++ | 2026-02-01 | 394 |
| Python | 2025-02-01 | 93 | Python | 2026-02-01 | 846 |
+--------+------------+-------+--------+------------+-------+
6 rows in set (0.00 sec)
SQL220 部门职衔
汇总各个部门当前员工的title类型的分配数目
drop table if exists `departments` ;
drop table if exists `dept_emp` ;
drop table if exists titles ;
CREATE TABLE `departments` (
`dept_no` char(4) NOT NULL,
`dept_name` varchar(40) NOT NULL,
PRIMARY KEY (`dept_no`));
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE titles (
`emp_no` int(11) NOT NULL,
`title` varchar(50) NOT NULL,
`from_date` date NOT NULL,
`to_date` date DEFAULT NULL);
INSERT INTO departments VALUES('d001','Marketing');
INSERT INTO departments VALUES('d002','Finance');
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
INSERT INTO dept_emp VALUES(10003,'d002','1995-12-03','9999-01-01');
INSERT INTO titles VALUES(10001,'Senior Engineer','1986-06-26','9999-01-01');
INSERT INTO titles VALUES(10002,'Staff','1996-08-03','9999-01-01');
INSERT INTO titles VALUES(10003,'Senior Engineer','1995-12-03','9999-01-01');
描述
有一个部门表departments
简况如下:
dept_no | dept_name |
---|---|
d001 | Marketing |
d002 | Finance |
有一个,部门员工关系表dept_emp
简况如下:
emp_no | dept_no | from_date | to_date |
---|---|---|---|
10001 | d001 | 1986-06-26 | 9999-01-01 |
10002 | d001 | 1996-08-03 | 9999-01-01 |
10003 | d002 | 1995-12-03 | 9999-01-01 |
有一个职称表titles
简况如下:
emp_no | title | form_date | to_date |
---|---|---|---|
10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |
10002 | Staff | 1996-08-03 | 9999-01-01 |
10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |
汇总各个部门当前员工的title
类型的分配数目,即结果给出部门编号dept_no
、dept_name
、其部门下所有的员工的title以及该类型title对应的数目count
,结果按照dept_no
升序排序,dept_no
一样的再按title
升序排序
dept_no | dept_name | title | count |
---|---|---|---|
d001 | Marketing | Senior Engineer | 1 |
d001 | Marketing | Staff | 1 |
d002 | Finance | Senior Engineer | 1 |
解答W
这种需求, 窗口函数的实现属于入门级.
WITH a AS (
SELECT
de.emp_no,
de.dept_no,
dp.dept_name,
tl.title
FROM
dept_emp AS de
LEFT JOIN departments AS dp ON de.dept_no = dp.dept_no
LEFT JOIN titles tl ON de.emp_no = tl.emp_no
) SELECT
*,
# 根据部门 - title的使用 -> 统计title的使用次数
count( title ) over ( PARTITION BY dept_no, title ) AS count
FROM
a;
+--------+---------+-----------+-----------------+-------+
| emp_no | dept_no | dept_name | title | count |
+--------+---------+-----------+-----------------+-------+
| 10001 | d001 | Marketing | Senior Engineer | 1 |
| 10002 | d001 | Marketing | Staff | 1 |
| 10003 | d002 | Finance | Senior Engineer | 1 |
+--------+---------+-----------+-----------------+-------+
3 rows in set (0.00 sec)
解答T
WITH a AS (
SELECT
t.emp_no,
de.dept_no,
t.title
FROM
titles AS t
LEFT JOIN dept_emp de ON de.emp_no = t.emp_no
) SELECT
a.dept_no,
a.title,
dp.dept_name,
count(*) AS count
FROM
a
LEFT JOIN departments AS dp ON dp.dept_no = a.dept_no
GROUP BY
dept_no,
title;
+---------+-----------------+-----------+-------+
| dept_no | title | dept_name | count |
+---------+-----------------+-----------+-------+
| d001 | Senior Engineer | Marketing | 1 |
| d001 | Staff | Marketing | 1 |
| d002 | Senior Engineer | Finance | 1 |
+---------+-----------------+-----------+-------+
3 rows in set (0.00 sec)
SQL219 员工薪水
获取员工其当前的薪水比其manager
当前薪水还高的相关信息
drop table if exists `dept_emp` ;
drop table if exists `dept_manager` ;
drop table if exists `salaries` ;
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `dept_manager` (
`dept_no` char(4) NOT NULL,
`emp_no` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
# 增加测试数据
INSERT INTO dept_emp VALUES(10003,'d001','1997-08-03','9999-01-01');
INSERT INTO dept_manager VALUES('d001',10002,'1996-08-03','9999-01-01');
INSERT INTO salaries VALUES(10001,88958,'2002-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'1996-08-03','9999-01-01');
# 增加测试数据
INSERT INTO salaries VALUES(10003,62527,'1996-08-03','9999-01-01');
描述
有一个,部门关系表dept_emp
简况如下:
emp_no | dept_no | from_date | to_date |
---|---|---|---|
10001 | d001 | 1986-06-26 | 9999-01-01 |
10002 | d001 | 1996-08-03 | 9999-01-01 |
有一个部门经理表dept_manager
简况如下:
dept_no | emp_no | from_date | to_date |
---|---|---|---|
d001 | 10002 | 1996-08-03 | 9999-01-01 |
有一个薪水表salaries
简况如下:
emp_no | salary | from_date | to_date |
---|---|---|---|
10001 | 88958 | 2002-06-22 | 9999-01-01 |
10002 | 72527 | 1996-08-03 | 9999-01-01 |
获取员工其当前的薪水比其manager
当前薪水还高的相关信息,
第一列给出员工的emp_no
,
第二列给出其manager
的manager_no
,
第三列给出该员工当前的薪水emp_salary
,
第四列给该员工对应的manager当前的薪水manager_salary
以上例子输出如下:
emp_no |
manager_no |
emp_salary |
manager_salary |
---|---|---|---|
10001 | 10002 | 88958 | 72527 |
解答W
WITH temp AS (
SELECT
a.emp_no,
b.emp_no AS m_n,
c.salary
FROM
dept_emp a
LEFT JOIN dept_manager b ON a.dept_no = b.dept_no
LEFT JOIN salaries c ON a.emp_no = c.emp_no
),
# 加入一个辅助列, 用于将manager抽离出来
m AS ( SELECT *, IF ( emp_no = m_n, 1, 0 ) AS is_m FROM temp ),
# 将manager的数据逐个添加到常规的emp_no之下(新增列)
n AS (
SELECT
*,
FIRST_VALUE( salary ) over w AS ms,
FIRST_VALUE( emp_no ) over w AS m_m
FROM
m window w AS ( PARTITION BY m_n ORDER BY is_m DESC )) SELECT
emp_no,
m_m AS manager_no,
salary AS emp_salary,
ms AS manager_salary
FROM
n
WHERE
is_m = 0
AND salary > ms;
+--------+------------+------------+----------------+
| emp_no | manager_no | emp_salary | manager_salary |
+--------+------------+------------+----------------+
| 10001 | 10002 | 88958 | 72527 |
+--------+------------+------------+----------------+
1 row in set (0.00 sec)
解答T
不使用窗口函数
WITH temp AS (
SELECT
a.emp_no,
b.emp_no AS m_n,
c.salary
FROM
dept_emp a
LEFT JOIN dept_manager b ON a.dept_no = b.dept_no
LEFT JOIN salaries c ON a.emp_no = c.emp_no
),
# 将manger提取出来
m AS ( SELECT * FROM temp AS t1 WHERE t1.emp_no != t1.m_n ),
n AS ( SELECT * FROM temp AS t2 WHERE t2.emp_no = t2.m_n ) SELECT
m.emp_no,
n.emp_no AS manager_no,
m.salary AS emp_salary,
n.salary AS manager_salary
FROM
m
LEFT JOIN n ON m.m_n = n.m_n
WHERE
m.salary > n.salary;
+--------+------------+------------+----------------+
| emp_no | manager_no | emp_salary | manager_salary |
+--------+------------+------------+----------------+
| 10001 | 10002 | 88958 | 72527 |
+--------+------------+------------+----------------+
1 row in set (0.00 sec)
SQL275 课程订单
牛客的课程订单分析(五)
drop table if exists order_info;
CREATE TABLE order_info (
id int(4) NOT NULL,
user_id int(11) NOT NULL,
product_name varchar(256) NOT NULL,
status varchar(32) NOT NULL,
client_id int(4) NOT NULL,
date date NOT NULL,
PRIMARY KEY (id));
INSERT INTO order_info VALUES
(1,557336,'C++','no_completed',1,'2025-10-10'),
(2,230173543,'Python','completed',2,'2025-10-12'),
(3,57,'JS','completed',3,'2025-10-23'),
(4,57,'C++','completed',3,'2025-10-23'),
(5,557336,'Java','completed',1,'2025-10-23'),
(6,57,'Java','completed',1,'2025-10-24'),
(7,557336,'C++','completed',1,'2025-10-25'),
(8,557336,'Python','completed',1,'2025-10-26');
描述
有很多同学在牛客购买课程来学习,购买会产生订单存到数据库里。
有一个订单信息表(order_info),简况如下:
id | user_id | product_name | status | client_id | date |
---|---|---|---|---|---|
1 | 557336 | C++ | no_completed | 1 | 2025-10-10 |
2 | 230173543 | Python | completed | 2 | 2025-10-12 |
3 | 57 | JS | completed | 3 | 2025-10-23 |
4 | 57 | C++ | completed | 3 | 2025-10-23 |
5 | 557336 | Java | completed | 1 | 2025-10-23 |
6 | 57 | Java | completed | 1 | 2025-10-24 |
7 | 557336 | C++ | completed | 1 | 2025-10-25 |
8 | 557336 | Python | completed | 1 | 2025-10-26 |
第1行表示user_id为557336的用户在2025-10-10的时候使用了client_id为1的客户端下了C++课程的订单,但是状态为没有购买成功。
第2行表示user_id为230173543的用户在2025-10-12的时候使用了client_id为2的客户端下了Python课程的订单,状态为购买成功。
......
最后1行表示user_id为557336的用户在2025-10-26的时候使用了client_id为1的客户端下了Python课程的订单,状态为购买成功。
请你写出一个sql语句查询在2025-10-15以后,如果有一个用户下单2个以及2个以上状态为购买成功的C++课程或Java课程或Python课程,那么输出这个用户的user_id,以及满足前面条件的第一次购买成功的C++课程或Java课程或Python课程的日期first_buy_date,以及满足前面条件的第二次购买成功的C++课程或Java课程或Python课程的日期second_buy_date,以及购买成功的C++课程或Java课程或Python课程的次数cnt,并且输出结果按照user_id升序排序,以上例子查询结果如下:
user_id | first_buy_date | second_buy_date | cnt |
---|---|---|---|
57 | 2025-10-23 | 2025-10-24 | 2 |
557336 | 2025-10-23 | 2025-10-25 | 3 |
解析:
id为4,6的订单满足以上条件,输出57,id为4的订单为第一次购买成功,输出first_buy_date为2025-10-23,id为6的订单为第二次购买,输出second_buy_date为2025-10-24,总共成功购买了2次;
id为5,7,8的订单满足以上条件,输出557336,id为5的订单为第一次购买成功,输出first_buy_date为2025-10-23,id为7的订单为第二次购买,输出second_buy_date为2025-10-25,总共成功购买了3次;
解答W
WITH a AS (
SELECT
user_id,
FIRST_VALUE( date ) over w AS first_buy_date,
NTH_VALUE( date, 2 ) over w AS second_buy_date,
count( user_id ) over w AS cnt
FROM
order_info
WHERE
STATUS = 'completed'
AND date > '2025-10-15'
AND product_name IN ( 'C++', 'Python', 'Java' ) window w AS ( PARTITION BY user_id )
ORDER BY
user_id,
date
) SELECT DISTINCT
*
FROM
a
WHERE
cnt > 1;
+---------+-----+----------------+-----------------+
| user_id | cnt | first_buy_date | second_buy_date |
+---------+-----+----------------+-----------------+
| 57 | 2 | 2025-10-23 | 2025-10-24 |
| 557336 | 3 | 2025-10-23 | 2025-10-25 |
+---------+-----+----------------+-----------------+
2 rows in set (0.00 sec)
解答T
WITH a AS ( SELECT * FROM order_info WHERE STATUS = 'completed' AND date > '2025-10-15' AND product_name IN ( 'C++', 'Python', 'Java' ) ORDER BY date ),
b AS ( SELECT user_id, min( date ) AS first_buy_date, count( user_id ) AS cnt FROM a GROUP BY user_id ),
c AS ( SELECT user_id, min( date ) AS second_buy_date FROM a WHERE date NOT IN ( SELECT first_buy_date FROM b ) GROUP BY user_id ) SELECT
b.*,
c.second_buy_date
FROM
b
LEFT JOIN c ON b.user_id = c.user_id
WHERE
b.cnt > 1;
+---------+----------------+-----+-----------------+
| user_id | first_buy_date | cnt | second_buy_date |
+---------+----------------+-----+-----------------+
| 57 | 2025-10-23 | 2 | 2025-10-24 |
| 557336 | 2025-10-23 | 3 | 2025-10-25 |
+---------+----------------+-----+-----------------+
2 rows in set (0.00 sec)
小结
在窗口函数下, 大部分问题的解决都会变得更容易, 且通常情况下, 窗口函数的SQL
语句可阅读性(即第一眼阅读可以大概有一个轮廓, 这些语句的作用)更好.
由于传统的处理方式, 如: 行之间的运算, 一般需要一些巧妙的方式来实现其计算, 如果不是仔细的解析代码, 其中的部分的代码并不易于理解.
例如上面的SQL
语句
# window, 相当明确的表明取值的方式, 取值, 是date的第二个值
NTH_VALUE( date, 2 ) over w AS second_buy_date,
# 传统模式下
# 显然乍看之下, 不容易理解这个语句为什么能取出第二的数值
# 这是间接实现, 筛选出最小的, 然后再次筛选最小的, 即得到第二小的
SELECT user_id, min( date ) AS second_buy_date FROM a WHERE date NOT IN ( SELECT first_buy_date FROM b)
大厂真题
以下内容相对复杂, 将全部采用窗口函数来解决.
部分内容描述得十分含糊, 仅作参考, 解法可能有所差异.
SQL161 视频热度
近一个月发布的视频中热度最高的top3视频
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
video_id INT NOT NULL COMMENT '视频ID',
start_time datetime COMMENT '开始观看时间',
end_time datetime COMMENT '结束观看时间',
if_follow TINYINT COMMENT '是否关注',
if_like TINYINT COMMENT '是否点赞',
if_retweet TINYINT COMMENT '是否转发',
comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_video_info (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
video_id INT UNIQUE NOT NULL COMMENT '视频ID',
author INT NOT NULL COMMENT '创作者ID',
tag VARCHAR(16) NOT NULL COMMENT '类别标签',
duration INT NOT NULL COMMENT '视频时长(秒数)',
release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
(101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:30', 1, 1, 1, null)
,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:31', 1, 1, 0, null)
,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:35', 0, 0, 1, null)
,(103, 2001, '2021-10-03 11:00:50', '2021-10-03 11:01:35', 1, 1, 0, 1732526)
,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:04', 2, 0, 1, null)
,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:06', 1, 0, 0, null)
,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:01', 0, 1, 0, null)
,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 0, 1, null)
,(101, 2003, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 0, null)
,(101, 2003, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 0, null);
INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
(2001, 901, '旅游', 30, '2021-09-05 7:00:00')
,(2002, 901, '旅游', 60, '2021-09-05 7:00:00')
,(2003, 902, '影视', 90, '2021-09-05 7:00:00')
,(2004, 902, '影视', 90, '2021-09-05 8:00:00');
描述
现有用户-视频互动表tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
---|---|---|---|---|---|---|---|---|
1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:30 | 1 | 1 | 1 | NULL |
2 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:31 | 1 | 1 | 0 | NULL |
3 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:35 | 0 | 0 | 1 | NULL |
4 | 103 | 2001 | 2021-10-03 11:00:50 | 2021-10-03 10:00:35 | 1 | 1 | 0 | 1732526 |
5 | 106 | 2002 | 2021-10-02 11:00:05 | 2021-10-02 11:01:04 | 2 | 0 | 1 | NULL |
6 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:06 | 1 | 0 | 0 | NULL |
7 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
8 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:01 | 0 | 1 | 0 | NULL |
9 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 0 | 1 | NULL |
10 | 101 | 2003 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 0 | NULL |
11 | 101 | 2003 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 0 | NULL |
(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)
短视频信息表tb_video_info
id | video_id | author | tag | duration | release_time |
---|---|---|---|---|---|
1 | 2001 | 901 | 旅游 | 30 | 2021-09-05 07:00:00 |
2 | 2002 | 901 | 旅游 | 60 | 2021-09-05 07:00:00 |
3 | 2003 | 902 | 影视 | 90 | 2021-09-05 07:00:00 |
4 | 2004 | 902 | 影视 | 90 | 2021-09-05 08:00:00 |
(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)
问题:找出近一个月发布的视频中热度最高的top3视频。
注:
- 热度=(a视频完播率+b点赞数+c评论数+d转发数)*新鲜度;
- 新鲜度=1/(最近无播放天数+1);(关键在于此)
- 当前配置的参数a,b,c,d分别为100、5、3、2。
- 最近播放日期以end_time-结束观看时间为准,假设为T,则最近一个月按[T-29, T]闭区间统计。
- 结果中热度保留为整数,并按热度降序排序。
输出示例:
示例数据的输出结果如下
video_id | hot_index |
---|---|
2001 | 122 |
2002 | 56 |
2003 | 1 |
解释:
mysql> select date_sub("2021-10-03", interval 1 month);
+------------------------------------------+
| date_sub("2021-10-03", interval 1 month) |
+------------------------------------------+
| 2021-09-03 |
+------------------------------------------+
1 row in set (0.00 sec)
最近播放日期为2021-10-03,记作当天日期;近一个月(2021-09-04及之后)发布的视频有2001、2002、2003、2004,不过2004暂时还没有播放记录;
视频2001完播率1.0(被播放次数4次,完成播放4次),被点赞3次,评论1次,转发2次,最近无播放天数为0,因此热度为:(100*1.0+5*3+3*1+2*2)/(0+1)=122
同理,视频2003完播率0,被点赞数1,评论和转发均为0,最近无播放天数为3,因此热度为:(100*0+5*1+3*0+2*0)/(3+1)=1 (1.2保留为整数).
解答
这道题并不复杂, 也不难, 但是这里写的说明不是很清楚, 让这个题看起来很复杂.
WITH a AS (
SELECT
video_id,
if_like,
if_follow,
if_retweet,
IF
( comment_id IS NULL, 0, 1 ) AS comments,
TIMESTAMPDIFF( SECOND, start_time, end_time ) AS gap,
max( end_time ) over () AS md,
max( end_time ) over ( PARTITION BY video_id ) AS rd
FROM
tb_user_video_log
),
b AS (
SELECT
a.video_id,
a.if_like,
a.if_follow,
a.if_retweet,
a.comments,
DATEDIFF( a.md, a.rd ) AS gap,
IF
( a.gap > ( tv.duration - 1 ), 1, 0 ) AS ftimes
FROM
a
LEFT JOIN tb_video_info tv ON a.video_id = tv.video_id
WHERE
DATEDIFF(a.md, tv.release_time) <= 29
),
c AS (
SELECT
video_id,
count( video_id ) AS ptimes,
sum( if_like ) AS likes,
sum( if_follow ) AS follows,
sum( if_retweet ) AS retweets,
sum( comments ) AS comments,
avg( gap ) AS gap,
avg( ftimes ) AS finish_ratio
FROM
b
GROUP BY
video_id
) SELECT
video_id,
round((
100 * finish_ratio + 5 * likes + 3 * comments + 2 * retweets
) * (
1 / ( gap + 1 )), 0) AS hot_index
FROM
c order by hot_index desc limit 3;
SQL167 连续签到
连续签到领金币
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
artical_id INT NOT NULL COMMENT '视频ID',
in_time datetime COMMENT '进入时间',
out_time datetime COMMENT '离开时间',
sign_in TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
(101, 0, '2021-07-07 10:00:00', '2021-07-07 10:00:09', 1),
(101, 0, '2021-07-08 10:00:00', '2021-07-08 10:00:09', 1),
(101, 0, '2021-07-09 10:00:00', '2021-07-09 10:00:42', 1),
(101, 0, '2021-07-10 10:00:00', '2021-07-10 10:00:09', 1),
(101, 0, '2021-07-11 23:59:55', '2021-07-11 23:59:59', 1),
(101, 0, '2021-07-12 10:00:28', '2021-07-12 10:00:50', 1),
(101, 0, '2021-07-13 10:00:28', '2021-07-13 10:00:50', 1),
(102, 0, '2021-10-01 10:00:28', '2021-10-01 10:00:50', 1),
(102, 0, '2021-10-02 10:00:01', '2021-10-02 10:01:50', 1),
(102, 0, '2021-10-03 11:00:55', '2021-10-03 11:00:59', 1),
(102, 0, '2021-10-04 11:00:45', '2021-10-04 11:00:55', 0),
(102, 0, '2021-10-05 11:00:53', '2021-10-05 11:00:59', 1),
(102, 0, '2021-10-06 11:00:45', '2021-10-06 11:00:55', 1);
描述
用户行为日志表tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
---|---|---|---|---|---|
1 | 101 | 0 | 2021-07-07 10:00:00 | 2021-07-07 10:00:09 | 1 |
2 | 101 | 0 | 2021-07-08 10:00:00 | 2021-07-08 10:00:09 | 1 |
3 | 101 | 0 | 2021-07-09 10:00:00 | 2021-07-09 10:00:42 | 1 |
4 | 101 | 0 | 2021-07-10 10:00:00 | 2021-07-10 10:00:09 | 1 |
5 | 101 | 0 | 2021-07-11 23:59:55 | 2021-07-11 23:59:59 | 1 |
6 | 101 | 0 | 2021-07-12 10:00:28 | 2021-07-12 10:00:50 | 1 |
7 | 101 | 0 | 2021-07-13 10:00:28 | 2021-07-13 10:00:50 | 1 |
8 | 102 | 0 | 2021-10-01 10:00:28 | 2021-10-01 10:00:50 | 1 |
9 | 102 | 0 | 2021-10-02 10:00:01 | 2021-10-02 10:01:50 | 1 |
10 | 102 | 0 | 2021-10-03 10:00:55 | 2021-10-03 11:00:59 | 1 |
11 | 102 | 0 | 2021-10-04 10:00:45 | 2021-10-04 11:00:55 | 0 |
12 | 102 | 0 | 2021-10-05 10:00:53 | 2021-10-05 11:00:59 | 1 |
13 | 102 | 0 | 2021-10-06 10:00:45 | 2021-10-06 11:00:55 | 1 |
(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)
场景逻辑说明:
- artical_id-文章ID代表用户浏览的文章的ID,特殊情况artical_id-文章ID为0表示用户在非文章内容页(比如App内的列表页、活动页等)。注意:只有artical_id为0时sign_in值才有效。
- 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
- 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)
问题:计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。
注:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了。
输出****示例:
示例数据的输出结果如下:
uid | month | coin |
---|---|---|
101 | 202107 | 15 |
102 | 202110 | 7 |
解释:
101在活动期内连续签到了7天,因此获得1*7+2+6=15金币;
102在10.01~10.03连续签到3天获得5金币( 1* 3 + 2)
10.04断签(断签, 但是还是存在了,10.05~10.06连续签到2天获得2金币,共得到7金币。
解答W
# r - 1, 是因后续需要按照月份来归类数据
# a, 遴选出满足要求的基本数据
# b, 对每个uid组的日期进行排序, 来确定这个日期和最小日期的时间间隔, 来间接判断这个日期是否为间隔一天的日期
# c, 将日期以排名的间隔进行详见, 假如之间的间隔为1天, 那么就会得到相同的日期
# d, 按照uid, 日期将数据归档
# 最终的数据获取
WITH a AS ( SELECT uid, CONVERT ( in_time, date ) AS in_time, sign_in FROM tb_user_log WHERE in_time >= '2021-07-01' AND in_time < '2021-11-01' ),
b AS ( SELECT *, dense_rank() over ( PARTITION BY uid ORDER BY in_time ) AS r FROM a WHERE sign_in > 0 ),
c AS ( SELECT *, date_sub( in_time, INTERVAL r - 1 DAY ) AS sub FROM b ),
d AS ( SELECT uid, sum( sign_in ) AS sm, sub FROM c GROUP BY uid, sub ),
e AS (
SELECT
uid,
sub,
(
sm + ( CASE WHEN sm > 6 THEN 8 WHEN sm > 2 THEN 2 ELSE 0 END )) AS total
FROM
d
) SELECT
uid,
substring( sub, 1, 7 ) AS MONTH,
sum( total ) AS coin
FROM
e
GROUP BY
uid,
MONTH;
+-----+---------+------+
| uid | MONTH | coin |
+-----+---------+------+
| 101 | 2021-07 | 15 |
| 102 | 2021-10 | 7 |
+-----+---------+------+
2 rows in set (0.00 sec)
SQL173 商品动销率
店铺901国庆期间的7日动销率和滞销率
DROP TABLE IF EXISTS tb_order_overall;
CREATE TABLE tb_order_overall (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
order_id INT NOT NULL COMMENT '订单号',
uid INT NOT NULL COMMENT '用户ID',
event_time datetime COMMENT '下单时间',
total_amount DECIMAL NOT NULL COMMENT '订单总金额',
total_cnt INT NOT NULL COMMENT '订单商品总件数',
`status` TINYINT NOT NULL COMMENT '订单状态'
) CHARACTER SET utf8 COLLATE utf8_bin;
DROP TABLE IF EXISTS tb_product_info;
CREATE TABLE tb_product_info (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
product_id INT NOT NULL COMMENT '商品ID',
shop_id INT NOT NULL COMMENT '店铺ID',
tag VARCHAR(12) COMMENT '商品类别标签',
in_price DECIMAL NOT NULL COMMENT '进货价格',
quantity INT NOT NULL COMMENT '进货数量',
release_time datetime COMMENT '上架时间'
) CHARACTER SET utf8 COLLATE utf8_bin;
DROP TABLE IF EXISTS tb_order_detail;
CREATE TABLE tb_order_detail (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
order_id INT NOT NULL COMMENT '订单号',
product_id INT NOT NULL COMMENT '商品ID',
price DECIMAL NOT NULL COMMENT '商品单价',
cnt INT NOT NULL COMMENT '下单数量'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_product_info(product_id, shop_id, tag, in_price, quantity, release_time) VALUES
(8001, 901, '日用', 60, 1000, '2020-01-01 10:00:00'),
(8002, 901, '零食', 140, 500, '2020-01-01 10:00:00'),
(8003, 901, '零食', 160, 500, '2020-01-01 10:00:00');
INSERT INTO tb_order_overall(order_id, uid, event_time, total_amount, total_cnt, `status`) VALUES
(301004, 102, '2021-09-30 10:00:00', 170, 1, 1),
(301005, 104, '2021-10-01 10:00:00', 160, 1, 1),
(301003, 101, '2021-10-02 10:00:00', 300, 2, 1),
(301002, 102, '2021-10-03 11:00:00', 235, 2, 1);
INSERT INTO tb_order_detail(order_id, product_id, price, cnt) VALUES
(301004, 8002, 180, 1),
(301005, 8002, 170, 1),
(301002, 8001, 85, 1),
(301002, 8003, 180, 1),
(301003, 8002, 150, 1),
(301003, 8003, 180, 1);
描述
商品信息表tb_product_info
id | product_id | shop_id | tag | int_ | quantity | release_time |
---|---|---|---|---|---|---|
1 | 8001 | 901 | 日用 | 60 | 1000 | 2020-01-01 10:00:00 |
2 | 8002 | 901 | 零食 | 140 | 500 | 2020-01-01 10:00:00 |
3 | 8003 | 901 | 零食 | 160 | 500 | 2020-01-01 10:00:00 |
(product_id-商品ID, shop_id-店铺ID, tag-商品类别标签, in_price-进货价格, quantity-进货数量, release_time-上架时间)
订单总表tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
---|---|---|---|---|---|---|
1 | 301004 | 102 | 2021-09-30 10:00:00 | 170 | 1 | 1 |
2 | 301005 | 104 | 2021-10-01 10:00:00 | 160 | 1 | 1 |
3 | 301003 | 101 | 2021-10-02 10:00:00 | 300 | 2 | 1 |
4 | 301002 | 102 | 2021-10-03 11:00:00 | 235 | 2 | 1 |
(order_id-订单号, uid-用户ID, event_time-下单时间, total_amount-订单总金额, total_cnt-订单商品总件数, status-订单状态)
订单明细表tb_order_detail
id | order_id | product_id | price | cnt |
---|---|---|---|---|
1 | 301004 | 8002 | 180 | 1 |
2 | 301005 | 8002 | 170 | 1 |
3 | 301002 | 8001 | 85 | 1 |
4 | 301002 | 8003 | 180 | 1 |
5 | 301003 | 8002 | 150 | 1 |
6 | 301003 | 8003 | 180 | 1 |
(order_id-订单号, product_id-商品ID, price-商品单价, cnt-下单数量)
问题:请计算店铺901在2021年国庆头3天的7日动销率和滞销率,结果保留3位小数,按日期升序排序。
注:
- 动销率定义为店铺中一段时间内有销量的商品占当前已上架总商品数的比例(有销量的商品/已上架总商品数)。
- 滞销率定义为店铺中一段时间内没有销量的商品占当前已上架总商品数的比例。(没有销量的商品/已上架总商品数)。
- 只要当天任一店铺有任何商品的销量就输出该天的结果,即使店铺901当天的动销率为0。
当天是指? 假如10.1当天没有销量, 前面有销量, 这怎么处理?, 直接为 0 ?
输出示例:
示例数据的输出结果如下:
dt | sale_rate | unsale_rate |
---|---|---|
2021-10-01 | 0.333 | 0.667 |
2021-10-02 | 0.667 | 0.333 |
2021-10-03 | 1.000 | 0.000 |
解释:
10月1日的近7日(9月25日---10月1日)店铺901有销量的商品有8002,截止当天在售商品数为3,动销率为0.333,滞销率为0.667;
10月2日的近7日(9月26日---10月2日)店铺901有销量的商品有8002、8003,截止当天在售商品数为3,动销率为0.667,滞销率为0.333;
10月3日的近7日(9月27日---10月3日)店铺901有销量的商品有8002、8003、8001,截止当天店铺901在售商品数为3,动销率为1.000,滞销率为0.000;
解答W
这道题没有什么相对较好得解法, 暴力破解.
# 取自大神的答案
SELECT
dt1,
round(
count(
DISTINCT
IF
( timestampdiff( DAY, dt, dt1 ) BETWEEN 0 AND 6, tb1.product_id, NULL ))/ count(
DISTINCT
IF
( dt1 >= DATE ( release_time ), tb3.product_id, NULL )),
3
) sale_rate,
1-round (
count(
DISTINCT
IF
( timestampdiff( DAY, dt, dt1 ) BETWEEN 0 AND 6, tb1.product_id, NULL ))/ count(
DISTINCT
IF
( dt1 >= DATE ( release_time ), tb3.product_id, NULL )),
3
) unsale_rate
FROM
( SELECT DATE ( event_time ) dt1 FROM tb_order_overall HAVING dt1 BETWEEN '2021-10-01' AND '2021-10-03' ) tb2,
(
SELECT
b.product_id,
DATE ( event_time ) dt
FROM
tb_order_overall a
LEFT JOIN tb_order_detail b ON a.order_id = b.order_id
LEFT JOIN tb_product_info c ON b.product_id = c.product_id
WHERE
shop_id = 901
) tb1
LEFT JOIN tb_product_info tb3 ON tb1.product_id = tb3.product_id
WHERE
shop_id = 901
GROUP BY
dt1;
SQL194 连续问答
某乎问答最大连续回答问题天数大于等于3天的用户及其对应等级.
drop table if exists author_tb;
CREATE TABLE author_tb(
author_id int(10) NOT NULL,
author_level int(10) NOT NULL,
sex char(10) NOT NULL);
INSERT INTO author_tb VALUES(101 , 6, 'm');
INSERT INTO author_tb VALUES(102 , 1, 'f');
INSERT INTO author_tb VALUES(103 , 1, 'm');
INSERT INTO author_tb VALUES(104 , 3, 'm');
INSERT INTO author_tb VALUES(105 , 4, 'f');
INSERT INTO author_tb VALUES(106 , 2, 'f');
INSERT INTO author_tb VALUES(107 , 2, 'm');
INSERT INTO author_tb VALUES(108 , 5, 'f');
INSERT INTO author_tb VALUES(109 , 6, 'f');
INSERT INTO author_tb VALUES(110 , 5, 'm');
drop table if exists answer_tb;
CREATE TABLE answer_tb(
answer_date date NOT NULL,
author_id int(10) NOT NULL,
issue_id char(10) NOT NULL,
char_len int(10) NOT NULL);
INSERT INTO answer_tb VALUES('2021-11-1', 101, 'E001' ,150);
INSERT INTO answer_tb VALUES('2021-11-1', 101, 'E002', 200);
INSERT INTO answer_tb VALUES('2021-11-1',102, 'C003' ,50);
INSERT INTO answer_tb VALUES('2021-11-1' ,103, 'P001', 35);
INSERT INTO answer_tb VALUES('2021-11-1', 104, 'C003', 120);
INSERT INTO answer_tb VALUES('2021-11-1' ,105, 'P001', 125);
INSERT INTO answer_tb VALUES('2021-11-1' , 102, 'P002', 105);
INSERT INTO answer_tb VALUES('2021-11-2', 101, 'P001' ,201);
INSERT INTO answer_tb VALUES('2021-11-2', 110, 'C002', 200);
INSERT INTO answer_tb VALUES('2021-11-2', 110, 'C001', 225);
INSERT INTO answer_tb VALUES('2021-11-2' , 110, 'C002', 220);
INSERT INTO answer_tb VALUES('2021-11-3', 101, 'C002', 180);
INSERT INTO answer_tb VALUES('2021-11-4' ,109, 'E003', 130);
INSERT INTO answer_tb VALUES('2021-11-4', 109, 'E001',123);
INSERT INTO answer_tb VALUES('2021-11-5', 108, 'C001',160);
INSERT INTO answer_tb VALUES('2021-11-5', 108, 'C002', 120);
INSERT INTO answer_tb VALUES('2021-11-5', 110, 'P001', 180);
INSERT INTO answer_tb VALUES('2021-11-5' , 106, 'P002' , 45);
INSERT INTO answer_tb VALUES('2021-11-5' , 107, 'E003', 56);
描述
现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):
author_id | author_level | sex |
---|---|---|
101 | 6 | m |
102 | 1 | f |
103 | 1 | m |
104 | 3 | m |
105 | 4 | f |
106 | 2 | f |
107 | 2 | m |
108 | 5 | f |
109 | 6 | f |
110 | 5 | m |
创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):
answer_date | author_id | issue_id | char_len |
---|---|---|---|
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下:
author_id | author_level | days_cnt |
---|---|---|
101 | 6 | 3 |
解答W
# dense_rank
+-----------+-------------+---+
| author_id | answer_date | r |
+-----------+-------------+---+
| 101 | 2021-11-01 | 1 |
| 101 | 2021-11-02 | 2 |
| 101 | 2021-11-03 | 3 |
| 102 | 2021-11-01 | 1 |
| 103 | 2021-11-01 | 1 |
| 104 | 2021-11-01 | 1 |
| 105 | 2021-11-01 | 1 |
| 106 | 2021-11-05 | 1 |
| 107 | 2021-11-05 | 1 |
| 108 | 2021-11-05 | 1 |
| 109 | 2021-11-04 | 1 |
| 110 | 2021-11-02 | 1 |
| 110 | 2021-11-05 | 2 |
+-----------+-------------+---+
13 rows in set (0.00 sec)
---
WITH tmp AS ( SELECT DISTINCT author_id, answer_date FROM answer_tb ),
tmp_a AS ( SELECT *, dense_rank() over ( PARTITION BY author_id ORDER BY answer_date ) AS r FROM tmp ),
tmp_b AS ( SELECT *, DATE_SUB( answer_date, INTERVAL r DAY ) AS sub FROM tmp_a ),
temp_c AS ( SELECT tmp_b.author_id, count(*) AS ic FROM tmp_b GROUP BY tmp_b.author_id, tmp_b.sub HAVING ic > 2 ) SELECT
tc.author_id,
AT.author_level,
tc.ic
FROM
temp_c AS tc
JOIN author_tb AS AT ON AT.author_id = tc.author_id;
+-----------+--------------+----+
| author_id | author_level | ic |
+-----------+--------------+----+
| 101 | 6 | 3 |
+-----------+--------------+----+
1 row in set (0.00 sec)
SQL189 直播在线
牛客直播各科目同时在线人数
drop table if exists course_tb;
CREATE TABLE course_tb(
course_id int(10) NOT NULL,
course_name char(10) NOT NULL,
course_datetime char(30) NOT NULL);
INSERT INTO course_tb VALUES(1, 'Python', '2021-12-1 19:00-21:00');
INSERT INTO course_tb VALUES(2, 'SQL', '2021-12-2 19:00-21:00');
INSERT INTO course_tb VALUES(3, 'R', '2021-12-3 19:00-21:00');
drop table if exists attend_tb;
CREATE TABLE attend_tb(
user_id int(10) NOT NULL,
course_id int(10) NOT NULL,
in_datetime datetime NOT NULL,
out_datetime datetime NOT NULL
);
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:00:00', '2021-12-1 19:28:00');
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:30:00', '2021-12-1 19:53:00');
INSERT INTO attend_tb VALUES(101, 1, '2021-12-1 19:00:00', '2021-12-1 20:55:00');
INSERT INTO attend_tb VALUES(102, 1, '2021-12-1 19:00:00', '2021-12-1 19:05:00');
INSERT INTO attend_tb VALUES(104, 1, '2021-12-1 19:00:00', '2021-12-1 20:59:00');
INSERT INTO attend_tb VALUES(101, 2, '2021-12-2 19:05:00', '2021-12-2 20:58:00');
INSERT INTO attend_tb VALUES(102, 2, '2021-12-2 18:55:00', '2021-12-2 21:00:00');
INSERT INTO attend_tb VALUES(104, 2, '2021-12-2 18:57:00', '2021-12-2 20:56:00');
INSERT INTO attend_tb VALUES(107, 2, '2021-12-2 19:10:00', '2021-12-2 19:18:00');
INSERT INTO attend_tb VALUES(100, 3, '2021-12-3 19:01:00', '2021-12-3 21:00:00');
INSERT INTO attend_tb VALUES(102, 3, '2021-12-3 18:58:00', '2021-12-3 19:05:00');
INSERT INTO attend_tb VALUES(108, 3, '2021-12-3 19:01:00', '2021-12-3 19:56:00');
描述
牛客某页面推出了数据分析系列直播课程介绍。用户可以选择报名任意一场或多场直播课。
已知课程表course_tb
如下(其中course_id
代表课程编号,course_name表示课程名称,course_datetime代表上课时间):
course_id | course_name | course_datetime |
---|---|---|
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
上课情况表attend_tb如下(其中user_id表示用户编号、course_id代表课程编号、in_datetime表示进入直播间的时间、out_datetime表示离开直播间的时间):
user_id | course_id | in_datetime | out_datetime |
---|---|---|---|
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00 |
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00 |
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00 |
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00 |
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00 |
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00 |
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00 |
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00 |
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00 |
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00 |
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00 |
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00 |
请你统计每个科目最大同时在线人数(按course_id排序),以上数据的输出结果如下:
course_id | course_name | max_num |
---|---|---|
1 | Python | 4 |
2 | SQL | 4 |
3 | R | 3 |
解答
课程的时间是一个误导项, 并不需要考虑这个时间(字符串类型的数据)
-
将
attend
中的数据, 进入和离开整合在一起, 形成了一个时间的序列, 通过判断这个时间序列的的人数的出和入来判断每个直播间的人数的变化. 因为通过窗口函数对时间序列按照课程编号进行了拆分, 也就是说这个partition
中集合同一个course_id
的进入和离开直播间数据. -
通过对时间序列中的进, 标记为1, 离开, 则标记为 -1, 对每个
partition
中的flag
项进行求和操作, 进行二次排序(降序)(确保time_index相同的情况, 如何处理这个节点的数据问题, 是先加还是先减的问题, 这里采用降序, 是进行先加的操作).这里这道题没有具体指明如何处理相同节时间节点的先后.
WITH a AS (
SELECT
user_id,
course_id,
in_datetime AS time_index,
1 flag
FROM
attend_tb UNION ALL
SELECT
user_id,
course_id,
out_datetime AS time_index,
- 1 flag
FROM
attend_tb
),
b AS ( SELECT a.*, c.course_name FROM a LEFT JOIN course_tb c ON a.course_id = c.course_id ) SELECT
*,
sum( flag ) over ( PARTITION BY course_name ORDER BY time_index, flag DESC ) AS on_lines
FROM
b;
+---------+-----------+---------------------+------+-------------+----------+
| user_id | course_id | time_index | flag | course_name | on_lines |
+---------+-----------+---------------------+------+-------------+----------+
| 100 | 1 | 2021-12-01 19:00:00 | 1 | Python | 4 | # 每个课程的id区间, 就可以计算出人数的变化情况
| 101 | 1 | 2021-12-01 19:00:00 | 1 | Python | 4 |
| 102 | 1 | 2021-12-01 19:00:00 | 1 | Python | 4 |
| 104 | 1 | 2021-12-01 19:00:00 | 1 | Python | 4 |
| 102 | 1 | 2021-12-01 19:05:00 | -1 | Python | 3 |
| 100 | 1 | 2021-12-01 19:28:00 | -1 | Python | 2 |
| 100 | 1 | 2021-12-01 19:30:00 | 1 | Python | 3 |
| 100 | 1 | 2021-12-01 19:53:00 | -1 | Python | 2 |
| 101 | 1 | 2021-12-01 20:55:00 | -1 | Python | 1 |
| 104 | 1 | 2021-12-01 20:59:00 | -1 | Python | 0 | # 按照课程id 将数据聚合在一个区间
| 102 | 3 | 2021-12-03 18:58:00 | 1 | R | 1 |
| 100 | 3 | 2021-12-03 19:01:00 | 1 | R | 3 |
| 108 | 3 | 2021-12-03 19:01:00 | 1 | R | 3 |
| 102 | 3 | 2021-12-03 19:05:00 | -1 | R | 2 |
| 108 | 3 | 2021-12-03 19:56:00 | -1 | R | 1 |
| 100 | 3 | 2021-12-03 21:00:00 | -1 | R | 0 |
| 102 | 2 | 2021-12-02 18:55:00 | 1 | SQL | 1 |
| 104 | 2 | 2021-12-02 18:57:00 | 1 | SQL | 2 |
| 101 | 2 | 2021-12-02 19:05:00 | 1 | SQL | 3 |
| 107 | 2 | 2021-12-02 19:10:00 | 1 | SQL | 4 |
| 107 | 2 | 2021-12-02 19:18:00 | -1 | SQL | 3 |
| 104 | 2 | 2021-12-02 20:56:00 | -1 | SQL | 2 |
| 101 | 2 | 2021-12-02 20:58:00 | -1 | SQL | 1 |
| 102 | 2 | 2021-12-02 21:00:00 | -1 | SQL | 0 |
+---------+-----------+---------------------+------+-------------+----------+
24 rows in set (0.00 sec)
WITH a AS ( SELECT course_id, in_datetime AS time_index, 1 flag FROM attend_tb UNION ALL SELECT course_id, out_datetime AS time_index, - 1 flag FROM attend_tb ),
b AS ( SELECT a.*, c.course_name FROM a LEFT JOIN course_tb c ON a.course_id = c.course_id ),
c AS ( SELECT course_id, course_name, sum( flag ) over ( PARTITION BY course_name ORDER BY time_index, flag DESC ) AS on_lines FROM b ) SELECT
course_id,
course_name,
max( on_lines ) AS max_num
FROM
c
GROUP BY
course_id,
course_name
ORDER BY
course_id;
+-----------+-------------+---------+
| course_id | course_name | max_num |
+-----------+-------------+---------+
| 1 | Python | 4 |
| 2 | SQL | 4 |
| 3 | R | 3 |
+-----------+-------------+---------+
3 rows in set (0.00 sec)
SQL184 顾客连续购买
某宝店铺连续2天及以上购物的用户及其对应的天数
drop table if exists sales_tb;
CREATE TABLE sales_tb(
sales_date date NOT NULL,
user_id int(10) NOT NULL,
item_id char(10) NOT NULL,
sales_num int(10) NOT NULL,
sales_price int(10) NOT NULL
);
INSERT INTO sales_tb VALUES('2021-11-1', 1, 'A001', 1, 90);
INSERT INTO sales_tb VALUES('2021-11-1', 2, 'A002', 2, 220);
INSERT INTO sales_tb VALUES('2021-11-1', 2, 'B001', 1, 120);
INSERT INTO sales_tb VALUES('2021-11-2', 3, 'C001', 2, 500);
INSERT INTO sales_tb VALUES('2021-11-2', 4, 'B001', 1, 120);
INSERT INTO sales_tb VALUES('2021-11-3', 5, 'C001', 1, 240);
INSERT INTO sales_tb VALUES('2021-11-3', 6, 'C002', 1, 270);
INSERT INTO sales_tb VALUES('2021-11-4', 7, 'A003', 1, 180);
INSERT INTO sales_tb VALUES('2021-11-4', 8, 'B002', 1, 140);
INSERT INTO sales_tb VALUES('2021-11-4', 9, 'B001', 1, 125);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'B003', 1, 120);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'B004', 1, 150);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'A003', 1, 180);
INSERT INTO sales_tb VALUES('2021-11-6', 11, 'B003', 1, 120);
INSERT INTO sales_tb VALUES('2021-11-6', 10, 'B004', 1, 150);
# 插入多一行测试数据
INSERT INTO sales_tb VALUES('2021-11-8', 10, 'B004', 1, 150); # 8
INSERT INTO sales_tb VALUES('2021-11-9', 10, 'B004', 1, 150);
描述
11月结束后,小牛同学需要对其在某宝的网店就11月份用户交易情况和产品情况进行分析以更好的经营小店。
11月份销售数据表sales_tb如下(其中,sales_date表示销售日期,user_id指用户编号,item_id指货号,sales_num表示销售数量,sales_price表示结算金额):
sales_date | user_id | item_id | sales_num | sales_price |
---|---|---|---|---|
2021-11-01 | 1 | A001 | 1 | 90 |
2021-11-01 | 2 | A002 | 2 | 220 |
2021-11-01 | 2 | B001 | 1 | 120 |
2021-11-02 | 3 | C001 | 2 | 500 |
2021-11-02 | 4 | B001 | 1 | 120 |
2021-11-03 | 5 | C001 | 1 | 240 |
2021-11-03 | 6 | C002 | 1 | 270 |
2021-11-04 | 7 | A003 | 1 | 180 |
2021-11-04 | 8 | B002 | 1 | 140 |
2021-11-04 | 9 | B001 | 1 | 125 |
2021-11-05 | 10 | B003 | 1 | 120 |
2021-11-05 | 10 | B004 | 1 | 150 |
2021-11-05 | 10 | A003 | 1 | 180 |
2021-11-06 | 11 | B003 | 1 | 120 |
2021-11-06 | 10 | B004 | 1 | 150 |
请你统计连续2天及以上在该店铺购物的用户及其对应的次数(若有多个用户,按user_id升序排序),以上例子的输出结果如下:
user_id | days_count |
---|---|
10 | 2 |
解答W
# 用于处理这种非指定区间的跨行
WITH tmp AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
tmp_a AS ( SELECT *, dense_rank() over ( PARTITION BY user_id ORDER BY sales_date ) AS r FROM tmp ),
tmp_b AS ( SELECT *, DATE_SUB( sales_date, INTERVAL r DAY ) AS sub FROM tmp_a ) SELECT
tmp_b.user_id,
count(*) AS c
FROM
tmp_b
GROUP BY
tmp_b.user_id,
tmp_b.sub;
# 加入了上面两行作为测试
+---------+---+
| user_id | c |
+---------+---+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 2 |
| 10 | 2 |
| 11 | 1 |
+---------+---+
12 rows in set (0.00 sec)
滑动窗口
在处理这道题所遇到的各种问题问题.
# 这里只是简单列出发生过连续两天的数据
# 假如希望取到最长连续数据的用户?(不中断, 而1,2,3..; 1,3,5不行)
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
SELECT
user_id, # 受限于1 (N)
count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN INTERVAL 1 DAY preceding AND current ROW ) AS ic
FROM
a
) SELECT
user_id,
ic
FROM
b
WHERE
ic > 1;
#
+---------+----+
| user_id | ic |
+---------+----+
| 10 | 2 |
+---------+----+
2 rows in set (0.00 sec)
# 插入多一行的测试数据
+---------+----+
| user_id | ic |
+---------+----+
| 10 | 2 |
| 10 | 2 |
+---------+----+
2 rows in set (0.00 sec)
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
SELECT
user_id,
count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN unbounded preceding AND current ROW ) AS ic
FROM
a
) SELECT
user_id,
max( ic ) AS mic
FROM
b
GROUP BY
user_id
HAVING
mic > 1;
+---------+------+
| user_id | mic |
+---------+------+
| 10 | 3 |
+---------+------+
1 row in set (0.00 sec)
# 就无法控制间隔为 1
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
SELECT
user_id,
sales_date,
count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN unbounded preceding AND current ROW ) AS ic
FROM
a
)
select * from b;
+---------+------------+----+
| user_id | sales_date | ic |
+---------+------------+----+
| 1 | 2021-11-01 | 1 |
| 1 | 2021-11-04 | 2 | # 不连续
| 2 | 2021-11-01 | 1 |
| 3 | 2021-11-02 | 1 |
| 4 | 2021-11-02 | 1 |
| 5 | 2021-11-03 | 1 |
| 6 | 2021-11-03 | 1 |
| 7 | 2021-11-04 | 1 |
| 8 | 2021-11-04 | 1 |
| 9 | 2021-11-04 | 1 |
| 10 | 2021-11-05 | 1 |
| 10 | 2021-11-06 | 2 |
| 10 | 2021-11-07 | 3 |
| 11 | 2021-11-06 | 1 |
+---------+------------+----+
14 rows in set (0.00 sec)
# 对间隔进行判断是否是隔天
# 还是存在问题
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS ( SELECT user_id, sales_date, LEAD( sales_date ) over ( PARTITION BY user_id ORDER BY sales_date ) AS next_date FROM a ),
c AS ( SELECT *, next_date - sales_date AS gap FROM b having next_date IS NOT NULL and gap = 1 ) SELECT
user_id,
sum( gap ) + 1 AS ic
FROM
c
GROUP BY
user_id;
+---------+------+
| user_id | ic |
+---------+------+
| 10 | 3 |
+---------+------+
1 row in set (0.00 sec)
mysql> select * from sales_tb;
+------------+---------+---------+-----------+-------------+
| sales_date | user_id | item_id | sales_num | sales_price |
+------------+---------+---------+-----------+-------------+
| 2021-11-01 | 1 | A001 | 1 | 90 |
| 2021-11-01 | 2 | A002 | 2 | 220 |
| 2021-11-01 | 2 | B001 | 1 | 120 |
| 2021-11-02 | 3 | C001 | 2 | 500 |
| 2021-11-02 | 4 | B001 | 1 | 120 |
| 2021-11-03 | 5 | C001 | 1 | 240 |
| 2021-11-03 | 6 | C002 | 1 | 270 |
| 2021-11-04 | 7 | A003 | 1 | 180 |
| 2021-11-04 | 8 | B002 | 1 | 140 |
| 2021-11-04 | 9 | B001 | 1 | 125 |
| 2021-11-05 | 10 | B003 | 1 | 120 |
| 2021-11-05 | 10 | B004 | 1 | 150 | # 这里对数据进行修改 原来的 5 - 6
| 2021-11-05 | 10 | A003 | 1 | 180 | # 第一次改成 5 - 6- 7
| 2021-11-06 | 11 | B003 | 1 | 120 |
| 2021-11-06 | 10 | B004 | 1 | 150 |
| 2021-11-08 | 10 | B004 | 1 | 150 | # 5-6
| 2021-11-09 | 10 | B004 | 1 | 150 | # 这里对数据进行修改 5 - 6, 8-9
+------------+---------+---------+-----------+-------------+
SQL179 同时等车
各城市最大同时等车人数
DROP TABLE IF EXISTS tb_get_car_record,tb_get_car_order;
CREATE TABLE tb_get_car_record (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
city VARCHAR(10) NOT NULL COMMENT '城市',
event_time datetime COMMENT '打车时间',
end_time datetime COMMENT '打车结束时间',
order_id INT COMMENT '订单号'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_get_car_order (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
order_id INT NOT NULL COMMENT '订单号',
uid INT NOT NULL COMMENT '用户ID',
driver_id INT NOT NULL COMMENT '司机ID',
order_time datetime COMMENT '接单时间',
start_time datetime COMMENT '开始计费的上车时间',
finish_time datetime COMMENT '订单结束时间',
mileage FLOAT COMMENT '行驶里程数',
fare FLOAT COMMENT '费用',
grade TINYINT COMMENT '评分'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_get_car_record(uid, city, event_time, end_time, order_id) VALUES
(108, '北京', '2021-10-20 08:00:00', '2021-10-20 08:00:40', 9008),
(108, '北京', '2021-10-20 08:00:10', '2021-10-20 08:00:45', 9018),
(102, '北京', '2021-10-20 08:00:30', '2021-10-20 08:00:50', 9002),
(106, '北京', '2021-10-20 08:05:41', '2021-10-20 08:06:00', 9006),
(103, '北京', '2021-10-20 08:05:50', '2021-10-20 08:07:10', 9003),
(104, '北京', '2021-10-20 08:01:01', '2021-10-20 08:01:20', 9004),
(103, '北京', '2021-10-20 08:01:15', '2021-10-20 08:01:30', 9019),
(101, '北京', '2021-10-20 08:28:10', '2021-10-20 08:30:00', 9011);
INSERT INTO tb_get_car_order(order_id, uid, driver_id, order_time, start_time, finish_time, mileage, fare, grade) VALUES
(9008, 108, 204, '2021-10-20 08:00:40', '2021-10-20 08:03:00', '2021-10-20 08:31:00', 13.2, 38, 4),
(9018, 108, 214, '2021-10-20 08:00:45', '2021-10-20 08:04:50', '2021-10-20 08:21:00', 14, 38, 5),
(9002, 102, 202, '2021-10-20 08:00:50', '2021-10-20 08:06:00', '2021-10-20 08:31:00', 10.0, 41.5, 5),
(9006, 106, 203, '2021-10-20 08:06:00', '2021-10-20 08:09:00', '2021-10-20 08:31:00', 8.0, 25.5, 4),
(9003, 103, 202, '2021-10-20 08:07:10', '2021-10-20 08:15:00', '2021-10-20 08:31:00', 11.0, 41.5, 4),
(9004, 104, 202, '2021-10-20 08:01:20', '2021-10-20 08:13:00', '2021-10-20 08:31:00', 7.5, 22, 4),
(9019, 103, 202, '2021-10-20 08:01:30', '2021-10-20 08:11:00', '2021-10-20 08:51:00', 10, 39, 4),
(9011, 101, 211, '2021-10-20 08:30:00', '2021-10-20 08:31:00', '2021-10-20 08:54:00', 10, 35, 5);
描述
用户打车记录表tb_get_car_record
id | uid | city | event_time | end_time | order_id |
---|---|---|---|---|---|
1 | 108 | 北京 | 2021-10-20 08:00:00 | 2021-10-20 08:00:40 | 9008 |
2 | 118 | 北京 | 2021-10-20 08:00:10 | 2021-10-20 08:00:45 | 9018 |
3 | 102 | 北京 | 2021-10-20 08:00:30 | 2021-10-20 08:00:50 | 9002 |
4 | 106 | 北京 | 2021-10-20 08:05:41 | 2021-10-20 08:06:00 | 9006 |
5 | 103 | 北京 | 2021-10-20 08:05:50 | 2021-10-20 08:07:10 | 9003 |
6 | 104 | 北京 | 2021-10-20 08:01:01 | 2021-10-20 08:01:20 | 9004 |
7 | 105 | 北京 | 2021-10-20 08:01:15 | 2021-10-20 08:01:30 | 9019 |
8 | 101 | 北京 | 2021-10-20 08:28:10 | 2021-10-20 08:30:00 | 9011 |
(uid-用户ID, city-城市, event_time-打车时间, end_time-打车结束时间, order_id-订单号)
打车订单表tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
---|---|---|---|---|---|---|---|---|---|
1 | 9008 | 108 | 204 | 2021-10-20 08:00:40 | 2021-10-20 08:03:00 | 2021-10-20 08:31:00 | 13.2 | 38 | 4 |
2 | 9018 | 108 | 214 | 2021-10-20 08:00:45 | 2021-10-20 08:04:50 | 2021-10-20 08:21:00 | 14 | 38 | 5 |
3 | 9002 | 102 | 202 | 2021-10-20 08:00:50 | 2021-10-20 08:06:00 | 2021-10-20 08:31:00 | 10 | 41.5 | 5 |
4 | 9006 | 106 | 206 | 2021-10-20 08:06:00 | 2021-10-20 08:09:00 | 2021-10-20 08:31:00 | 8 | 25.5 | 4 |
5 | 9003 | 103 | 203 | 2021-10-20 08:07:10 | 2021-10-20 08:15:00 | 2021-10-20 08:31:00 | 11 | 41.5 | 4 |
6 | 9004 | 104 | 204 | 2021-10-20 08:01:20 | 2021-10-20 08:13:00 | 2021-10-20 08:31:00 | 7.5 | 22 | 4 |
7 | 9019 | 105 | 205 | 2021-10-20 08:01:30 | 2021-10-20 08:11:00 | 2021-10-20 08:51:00 | 10 | 39 | 4 |
8 | 9011 | 101 | 211 | 2021-10-20 08:30:00 | 2021-10-20 08:31:00 | 2021-10-20 08:54:00 | 10 | 35 | 5 |
(order_id-订单号, uid-用户ID, driver_id-司机ID, order_time-接单时间, start_time-开始计费的上车时间, finish_time-订单完成时间, mileage-行驶里程数, fare-费用, grade-评分)
场景逻辑说明:
- 用户提交打车请求后,在用户打车记录表生成一条打车记录,订单号-order_id设为null;
- 当有司机接单时,在打车订单表生成一条订单,填充接单时间-order_time及其左边的字段,上车时间及其右边的字段全部为null,并把订单号和接单时间(打车结束时间)写入打车记录表;若一直无司机接单、超时或中途用户主动取消打车,则记录打车结束时间。
- 若乘客上车前,乘客或司机点击取消订单,会将打车订单表对应订单的订单完成时间-finish_time填充为取消时间,其余字段设为null。
- 当司机接上乘客时,填充打车订单表中该订单的上车时间start_time。
- 当订单完成时填充订单完成时间、里程数、费用;评分设为null,在用户给司机打1~5星评价后填充。
问题:请统计各个城市在2021年10月期间,单日中最大的同时等车人数。
注: 等车指从开始打车起,直到取消打车、取消等待或上车前的这段时间里用户的状态。
如果同一时刻有人停止等车,有人开始等车,等车人数记作先增加后减少。
结果按各城市最大等车人数升序排序,相同时按城市升序排序。
输出示例:
示例结果如下
city | max_wait_uv |
---|---|
北京 | 5 |
解释:由打车订单表可以得知北京2021年10月20日有8条打车记录,108号乘客从08:00:00等到08:03:00,118号乘客从08:00:10等到08:04:50....,由此得知08:02:00秒时刻,共有5人在等车。
解答W
# 将打车开始的时间和订单执行时间合并在一起
# 分别标记 1, -1
# 按照时间进行排列
# 显然当所有的时间都聚集在一起, 利用sum() (order by) 可以得到各个阶段的累积情况
WITH a AS (
SELECT
city,
event_time as time_index,
1 as status
FROM
tb_get_car_record UNION ALL
SELECT
city,
# 这一步的目的在于保证数据存在, 订单表中的finish_time 优先于 记录表中的end_time
IFNULL( start_time, finish_time ) as time_index,
- 1 as status
FROM
tb_get_car_order
JOIN tb_get_car_record USING ( order_id )
),
b AS (
SELECT
city,
time_index,
SUM( status ) OVER ( PARTITION BY city, day(time_index) ORDER BY time_index, status DESC ) as sum_wait_num
FROM
a
) SELECT
city,
MAX( sum_wait_num ) AS max_wait_uv
FROM
b
WHERE
date_format( time_index, '%Y-%m' ) = '2021-10'
GROUP BY
city
ORDER BY
max_wait_uv,
city;
+--------+-------------+
| city | max_wait_uv |
+--------+-------------+
| 北京 | 5 |
+--------+-------------+
1 row in set (0.00 sec)
# 错误的思路, 错误进行最小值的比较, 这个只能得到区间的数据而不是全局(当天的最大值)
WITH a AS (
SELECT
t2.event_time,
date ( event_time ) AS s_day,
IF
( t2.end_time = t1.order_time, t1.start_time, t2.end_time ) AS start_time,
t2.city
FROM
tb_get_car_order AS t1
LEFT JOIN tb_get_car_record t2 ON t2.order_id = t1.order_id
WHERE
date_format( t2.event_time, '%Y-%m' ) = '2021-10'
),
b AS ( SELECT *, min( start_time ) over ( PARTITION BY city, s_day ) AS ms FROM a ),
e AS ( SELECT city, sum( IF ( event_time > ms, 0, 1 ) ) AS ic FROM b GROUP BY city, s_day ) SELECT
city,
max( ic ) AS max_wait_uv
FROM
e
GROUP BY
city
ORDER BY
max_wait_uv,
city;
相关函数
上述内容常用的函数.
日期相关
# 增加
mysql> select date_add("2021-01-01", interval 1 day);
+----------------------------------------+
| date_add("2021-01-01", interval 1 day) |
+----------------------------------------+
| 2021-01-02 |
+----------------------------------------+
1 row in set (0.00 sec)
# 减少
mysql> select date_sub("2021-01-01", interval 1 day);
+----------------------------------------+
| date_sub("2021-01-01", interval 1 day) |
+----------------------------------------+
| 2020-12-31 |
+----------------------------------------+
1 row in set (0.00 sec)
# 时间间隔的具体数值(单位, 年月日时分秒等)
mysql> select TIMESTAMPDIFF(second, '2022-01-01 19:00:52', '2022-01-01 19:00:57');
+---------------------------------------------------------------------+
| TIMESTAMPDIFF(second, '2022-01-01 19:00:52', '2022-01-01 19:00:57') |
+---------------------------------------------------------------------+
| 5 |
+---------------------------------------------------------------------+
# 间隔的天数(只限于day, 天数)
mysql> select datediff('2022-01-01 19:00:52', '2022-01-01 19:00:57');
+--------------------------------------------------------+
| datediff('2022-01-01 19:00:52', '2022-01-01 19:00:57') |
+--------------------------------------------------------+
| 0 |
+--------------------------------------------------------+
1 row in set (0.00 sec)
1 row in set (0.00 sec)
# 只保留日期, 去掉时间
mysql> select date("2021-01-01 12:01:01");
+-----------------------------+
| date("2021-01-01 12:01:01") |
+-----------------------------+
| 2021-01-01 |
+-----------------------------+
1 row in set (0.00 sec)
# 日期格式化
mysql> select date_format("2021-01-01 12:01:01", '%Y%m');
+--------------------------------------------+
| date_format("2021-01-01 12:01:01", '%Y%m') |
+--------------------------------------------+
| 202101 |
+--------------------------------------------+
1 row in set (0.00 sec)
# 字符串转日期/时间
mysql> SELECT STR_TO_DATE("August 10 2017", "%M %d %Y");
+-------------------------------------------+
| STR_TO_DATE("August 10 2017", "%M %d %Y") |
+-------------------------------------------+
| 2017-08-10 |
+-------------------------------------------+
1 row in set (0.00 sec)
取值相关
# 处理null值, ifnull()的强化版本
# coalesce, 返回第一个不是null的值
mysql> select coalesce(null, 2, 1);
+----------------------+
| coalesce(null, 2, 1) |
+----------------------+
| 2 |
+----------------------+
1 row in set (0.00 sec)
mysql> select coalesce(null, null, 1);
+-------------------------+
| coalesce(null, null, 1) |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
mysql> select coalesce(1, 1, null);
+----------------------+
| coalesce(1, 1, null) |
+----------------------+
| 1 |
+----------------------+
1 row in set (0.00 sec)
mysql> select coalesce(1, 2, null);
+----------------------+
| coalesce(1, 2, null) |
+----------------------+
| 1 |
+----------------------+
1 row in set (0.00 sec)
mysql> select ifnull(null, 1);
+-----------------+
| ifnull(null, 1) |
+-----------------+
| 1 |
+-----------------+
1 row in set (0.00 sec)
mysql> select ifnull(1, 2);
+--------------+
| ifnull(1, 2) |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)