牛客网-SQL测试(困难)

题号 题目 难度 通过率
SQL206 获取每个部门中当前员工薪水最高的相关信息 困难 20.23%
SQL264 牛客每个人最近的登录日期(五) 困难 20.64%
SQL215 查找在职员工自入职以来的薪水涨幅情况 困难 22.48%
SQL280 实习广场投递简历分析(三) 困难 26.82%
SQL220 汇总各个部门当前员工的title类型的分配数目 困难 27.95%
SQL270 考试分数(五) 困难 28.20%
SQL275 牛客的课程订单分析(五) 困难 28.98%
SQL219 获取员工其当前的薪水比其manager当前薪水还高的相关信息 困难 31.20%
SQL285 获得积分最多的人(三) 困难 33.74%

牛客网SQL测试

  • 传统方法实现
  • 窗口函数实现

不建议直接看相关的答案, 先手动过一遍, 然后再交叉对比. 特别是264, 270, 179, 189, 这几道题, 其解法非常巧妙.

(注: 由于运行代码需要注册账号登录, 这里的代码并没有提交到平台上运行, 部分代码只解析出逻辑, 并未提取最后要求得结果)

SQL206 最高工资

获取每个部门中当前员工薪水最高的相关信息.

这里没有具体细节, 例如最高, 假如最高存在多个人, 是全部取出还是只要一个即可.

drop table if exists  `dept_emp` ; 
drop table if exists  `salaries` ; 
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
INSERT INTO dept_emp VALUES(10003,'d002','1996-08-03','9999-01-01');

INSERT INTO salaries VALUES(10001,88958,'2002-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'2001-08-02','9999-01-01');
INSERT INTO salaries VALUES(10003,92527,'2001-08-02','9999-01-01');

mysql> select * from salaries;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10001 |  88958 | 2002-06-22 | 9999-01-01 |
|  10002 |  72527 | 2001-08-02 | 9999-01-01 |
|  10003 |  92527 | 2001-08-02 | 9999-01-01 |
+--------+--------+------------+------------+
3 rows in set (0.00 sec)

mysql> select * from dept_emp;
+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  10001 | d001    | 1986-06-26 | 9999-01-01 |
|  10002 | d001    | 1996-08-03 | 9999-01-01 |
|  10003 | d002    | 1996-08-03 | 9999-01-01 |
+--------+---------+------------+------------+
3 rows in set (0.00 sec)

描述

有一个员工表dept_emp简况如下:

emp_no dept_no from_date to_date
10001 d001 1986-06-26 9999-01-01
10002 d001 1996-08-03 9999-01-01
10003 d002 1996-08-03 9999-01-01

有一个薪水表salaries简况如下:

emp_no salary from_date to_date
10001 88958 2002-06-22 9999-01-01
10002 72527 2001-08-02 9999-01-01
10003 92527 2001-08-02 9999-01-01

获取每个部门中当前员工薪水最高的相关信息,给出dept_no, emp_no以及其对应的salary,按照部门编号dept_no升序排列,以上例子输出如下:

dept_no emp_no maxSalary
d001 10001 88958
d002 10003 92527

解答W

WITH a AS (
	SELECT
		s.emp_no,
		s.salary,
		d.dept_no 
	FROM
		salaries s
		LEFT JOIN dept_emp d ON s.emp_no = d.emp_no 
	) SELECT
	*,
	NTH_VALUE( salary, 1 ) over w AS top_salary,
	rank() over w AS top_rank 
FROM
	a window w AS ( PARTITION BY dept_no ORDER BY salary DESC );
	
+--------+--------+---------+------------+----------+
| emp_no | salary | dept_no | top_salary | top_rank |
+--------+--------+---------+------------+----------+
|  10001 |  88958 | d001    |      88958 |        1 |
|  10002 |  72527 | d001    |      88958 |        2 |
|  10003 |  92527 | d002    |      92527 |        1 |
+--------+--------+---------+------------+----------+

解答T

不使用窗口函数

WITH a AS (
	SELECT
		s.emp_no,
		s.salary,
		d.dept_no 
	FROM
		salaries s
		LEFT JOIN dept_emp d ON s.emp_no = d.emp_no 
	),
	b AS ( SELECT dept_no, max( salary ) AS ms FROM a GROUP BY dept_no ) SELECT
	a.emp_no,
	a.salary,
	a.dept_no,
	b.ms 
FROM
	a,
	b 
WHERE
	b.ms = a.salary;
	
+--------+--------+---------+-------+
| emp_no | salary | dept_no | ms    |
+--------+--------+---------+-------+
|  10001 |  88958 | d001    | 88958 |
|  10003 |  92527 | d002    | 92527 |
+--------+--------+---------+-------+
2 rows in set (0.01 sec)

SQL215 入职薪水的涨幅

查找在职员工自入职以来的薪水涨幅情况

drop table if exists  `employees` ; 
drop table if exists  `salaries` ;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` char(1) NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO employees VALUES(10001,'1953-09-02','Georgi','Facello','M','2001-06-22');
INSERT INTO employees VALUES(10002,'1964-06-02','Bezalel','Simmel','F','1999-08-03');

INSERT INTO salaries VALUES(10001,85097,'2001-06-22','2002-06-22');
# 插入多一行的数据
INSERT INTO salaries VALUES(10001,86097,'2002-06-22','2003-06-22');
INSERT INTO salaries VALUES(10001,88958,'2003-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'1999-08-03','2000-08-02');
INSERT INTO salaries VALUES(10002,72527,'2000-08-02','2001-08-02');

mysql> select * from employees;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
|  10001 | 1953-09-02 | Georgi     | Facello   | M      | 2001-06-22 |
|  10002 | 1964-06-02 | Bezalel    | Simmel    | F      | 1999-08-03 |
+--------+------------+------------+-----------+--------+------------+
2 rows in set (0.00 sec)

mysql> select * from salaries;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10001 |  85097 | 2001-06-22 | 2002-06-22 |
|  10001 |  88958 | 2002-06-22 | 9999-01-01 |
|  10002 |  72527 | 1999-08-03 | 2000-08-02 |
|  10002 |  72527 | 2000-08-02 | 2001-08-02 |
+--------+--------+------------+------------+
4 rows in set (0.00 sec)

描述

请你查找在职员工自入职以来的薪水涨幅情况,给出在职员工编号emp_no以及其对应的薪水涨幅growth,并按照growth进行升序,以上例子输出为

(注: to_date为薪资调整某个结束日期,或者为离职日期,to_date='9999-01-01'时,表示依然在职,无后续调整记录)

emp_no growth
10001 3861

解答W

WITH a AS (
	SELECT
		e.emp_no,
		e.first_name,
		e.hire_date,
		s.salary,
		s.to_date 
	FROM
		employees e
		LEFT JOIN salaries s ON e.emp_no = s.emp_no 
	) SELECT
	emp_no,
	hire_date,
	salary,
	lag( to_date ) over w AS td,
	lag( salary ) over w AS ts 
FROM
	a window w AS ( PARTITION BY emp_no ORDER BY to_date DESC );
	
+--------+------------+--------+------------+-------+
| emp_no | hire_date  | salary | td         | ts    |
+--------+------------+--------+------------+-------+
|  10001 | 2001-06-22 |  88958 | NULL       |  NULL |
|  10001 | 2001-06-22 |  85097 | 9999-01-01 | 88958 |
|  10002 | 1999-08-03 |  72527 | NULL       |  NULL |
|  10002 | 1999-08-03 |  72527 | 2001-08-02 | 72527 |
+--------+------------+--------+------------+-------+

# 这里没注意假如还有更多的中间数据, 而不是两行呢?

# 对上面的数据进行修改
INSERT INTO salaries VALUES(10001,85097,'2001-06-22','2002-06-22');
INSERT INTO salaries VALUES(10001,86097,'2002-06-22','2003-06-22');
INSERT INTO salaries VALUES(10001,88958,'2003-06-22','9999-01-01');

+--------+------------+--------+------------+-------+
| emp_no | hire_date  | salary | td         | ts    |
+--------+------------+--------+------------+-------+
|  10001 | 2001-06-22 |  88958 | NULL       |  NULL |
|  10001 | 2001-06-22 |  86097 | 9999-01-01 | 88958 |
|  10001 | 2001-06-22 |  85097 | 2003-06-22 | 86097 |
|  10002 | 1999-08-03 |  72527 | NULL       |  NULL |
|  10002 | 1999-08-03 |  72527 | 2001-08-02 | 72527 |
+--------+------------+--------+------------+-------+
5 rows in set (0.00 sec)
# 修正
WITH a AS (
	SELECT
		e.emp_no,
		e.first_name,
		e.hire_date,
		s.salary,
		s.to_date 
	FROM
		employees e
		LEFT JOIN salaries s ON e.emp_no = s.emp_no 
		WHERE
        # 筛选出两行的数据
		s.to_date = '9999-01-01' 
		OR s.from_date = e.hire_date 
	),
	b AS ( SELECT emp_no, salary, lag( salary ) over ( PARTITION BY emp_no ORDER BY to_date ) AS ls FROM a ) SELECT
	emp_no,
	salary - ls AS growth 
FROM
	b 
WHERE
	ls IS NOT NULL;
	
+--------+--------+
| emp_no | growth |
+--------+--------+
|  10001 |   3861 |
+--------+--------+
1 row in set (0.00 sec)

解答T

不使用窗口函数

WITH a AS (
	SELECT
		e.emp_no,
		e.first_name,
		e.hire_date,
		s.salary,
		s.to_date 
	FROM
		employees e
		LEFT JOIN salaries s ON e.emp_no = s.emp_no 
	WHERE
		s.to_date = '9999-01-01' 
		OR s.from_date = e.hire_date 
	),
	b AS (
	SELECT
		a1.emp_no,
		a1.salary,
		a2.salary AS ts,
		a1.to_date 
	FROM
		a AS a1
		LEFT JOIN a AS a2 ON a1.emp_no = a2.emp_no 
	WHERE
		a1.to_date != a2.to_date 
		AND a1.to_date = '9999-01-01' 
	) SELECT
	emp_no,
	salary - ts AS growth 
FROM
	b;

+--------+--------+
| emp_no | growth |
+--------+--------+
|  10001 |   3861 |
+--------+--------+
1 row in set (0.00 sec)

SQL264 最近登录

牛客每个人最近的登录日期(五)

drop table if exists login;
CREATE TABLE `login` (
`id` int(4) NOT NULL,
`user_id` int(4) NOT NULL,
`client_id` int(4) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`id`));

INSERT INTO login VALUES
(1,2,1,'2020-10-12'),
(2,3,2,'2020-10-12'),
(3,1,2,'2020-10-12'),
(4,2,2,'2020-10-13'),
(5,1,2,'2020-10-13'),
(6,3,1,'2020-10-14'),
(7,4,1,'2020-10-14'),
(8,4,1,'2020-10-15');

描述

牛客每天有很多人登录,请你统计一下牛客每个日期新用户的次日留存率。
有一个登录(login)记录表,简况如下:

id user_id client_id date
1 2 1 2020-10-12
2 3 2 2020-10-12
3 1 2 2020-10-12
4 2 2 2020-10-13
5 1 2 2020-10-13
6 3 1 2020-10-14
7 4 1 2020-10-14
8 4 1 2020-10-15

第1行表示user_id为2的用户在2020-10-12使用了客户端id为1的设备登录了牛客网,因为是第1次登录,所以是新用户
......
第4行表示user_id为2的用户在2020-10-13使用了客户端id为2的设备登录了牛客网,因为是第2次登录,所以是老用户
......
最后1行表示user_id为4的用户在2020-10-15使用了客户端id为1的设备登录了牛客网,因为是第2次登录,所以是老用户

请你写出一个sql语句查询每个日期新用户的次日留存率,结果保留小数点后面3位数(3位之后的四舍五入),并且查询结果按照日期升序排序,上面的例子查询结果如下:

date p
2020-10-12 0.667
2020-10-13 0.000
2020-10-14 1.000
2020-10-15 0.000

查询结果表明:
2020-10-12登录了3个(user_id为2,3,1)新用户,2020-10-13,只有2个(id为2,1)登录,故2020-10-12新用户次日留存率为2/3=0.667;
2020-10-13没有新用户登录,输出0.000;
2020-10-14登录了1个(user_id为4)新用户,2020-10-15,user_id为4的用户登录,故2020-10-14新用户次日留存率为1/1=1.000;2020-10-15没有新用户登录,输出0.000;

解答W

# 得到同一天所有的登录次数 ltimes
# 得到每个用户的在活跃次数(注意是活跃) atimes
# 然后偏移处理
# 这个语句很粗糙有点的伪代码的感觉

# 处理主要是需要执行 向前, 向后

WITH a AS ( SELECT *, count( date ) over ( PARTITION BY date ) AS ltimes FROM login ),
b AS ( SELECT *, lag( 1, 1, 0 ) over ( PARTITION BY user_id ORDER BY date ) AS atime FROM a ),
c AS ( SELECT date, ltimes, sum( atime ) AS atimes FROM b GROUP BY date, ltimes ),
d AS ( SELECT *, LEAD( atimes, 1, 0 ) over () AS ptimes FROM c ) SELECT
date,
(
ptimes / ( ltimes - atimes )) AS r 
FROM
	d;
# ptimes	
+------------+--------+
| date       | r      |
+------------+--------+
| 2020-10-12 | 0.6667 |
| 2020-10-13 |   NULL |
| 2020-10-14 | 1.0000 |
| 2020-10-15 |   NULL |
+------------+--------+
4 rows in set, 2 warnings (0.00 sec)(warning: / 0)

解答T

# 计算出每天的登录总数
# 计算出每个用户每天登录数
# 计算出天登录的次数中有多少次是老用户登录
# 汇总
WITH a AS ( SELECT date, count(*) AS ltimes FROM login GROUP BY date ),
b AS ( SELECT date, user_id, count(*) AS x FROM login GROUP BY date, user_id ),
d AS (
	SELECT
		b1.date,
		b1.user_id,
		b2.x AS atimes 
	FROM
		b b1
		LEFT JOIN b b2 ON b1.user_id = b2.user_id 
		AND DATEDIFF( b1.date, b2.date ) > 0 
	),
	c AS (
	SELECT
		b1.*,
		b2.x AS t_x,
		a.ltimes 
	FROM
		b AS b1
		LEFT JOIN b AS b2 ON b1.user_id = b2.user_id 
		AND date_add( b1.date, INTERVAL 1 DAY ) = b2.date
		LEFT JOIN a ON a.date = b1.date 
	),
	e AS (
	SELECT
		c.date,
		c.user_id,
		c.t_x,
		c.ltimes,
		d.atimes 
	FROM
		c
		LEFT JOIN d ON d.user_id = c.user_id 
		AND d.date = c.date 
	),
	f AS ( SELECT date, sum( t_x ) AS txs, sum( atimes ) AS ats, ltimes FROM e GROUP BY date ) SELECT
	date,
	(
		txs / (
			ltimes -
		IF
		( ats IS NULL, 0, ats ))) AS r 
FROM
	f;
	
+------------+--------+
| date       | r      |
+------------+--------+
| 2020-10-12 | 0.6667 |
| 2020-10-13 |   NULL |
| 2020-10-14 | 1.0000 |
| 2020-10-15 |   NULL |
+------------+--------+
4 rows in set (0.01 sec)

优化

WITH a AS ( SELECT date, count(*) AS ltimes FROM login GROUP BY date ),
b AS ( SELECT date, user_id, count(*) AS x FROM login GROUP BY date, user_id ),
d AS (
	SELECT
		b1.date,
		b1.user_id,
		b2.x AS atimes 
	FROM
		b b1
		LEFT JOIN b b2 ON b1.user_id = b2.user_id 
		AND DATEDIFF( b1.date, b2.date ) > 0 
	),
	m AS (
	SELECT
		d.date,
		sum(
		IF
		( atimes IS NULL, 0, atimes )) AS atimes,
		a.ltimes 
	FROM
		d
		LEFT JOIN a ON a.date = d.date 
	GROUP BY
		date 
	) SELECT
	t1.*,
	t2.atimes AS otimes,
	t2.atimes / ( t1.ltimes - t1.atimes ) AS r 
FROM
	m t1
	LEFT JOIN m t2 ON DATE_ADD( t1.date, INTERVAL 1 DAY ) = t2.date;
	
# atimes, 为当天登录的老用户
# ltimes, 为当天登录的总次数
# otimes, 为第二天的老用户登录次数
+------------+--------+--------+--------+--------+
| date       | atimes | ltimes | otimes | r      |
+------------+--------+--------+--------+--------+
| 2020-10-12 |      0 |      3 |      2 | 0.6667 |
| 2020-10-13 |      2 |      2 |      1 |   NULL |
| 2020-10-14 |      1 |      2 |      1 | 1.0000 |
| 2020-10-15 |      1 |      1 |   NULL |   NULL |
+------------+--------+--------+--------+--------+
4 rows in set, 1 warning (0.00 sec)

大神解答

注意代码的可阅读性.....

mysql> SELECT IFNULL("a", "b");
+----------------------------------+
| IFNULL("a", "b") |
+----------------------------------+
| a                                |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT IFNULL(NULL, 500);
+-------------------+
| IFNULL(NULL, 500) |
+-------------------+
|               500 |
+-------------------+
1 row in set (0.01 sec)
# IFNULL(expression, alt_value)
# 代码经过navicat美化, 但是可阅读性还是很差
# 这种写法很酷炫, 类似于JavaScript, 使用三元表达式, 进行复杂的嵌套, 不将里面的内容, 拆出来, 难以阅读.
SELECT
	date,
	ifnull(
		round((
				sum(
				CASE
						
						WHEN ( user_id, date ) IN ( SELECT user_id, date_add( date, INTERVAL - 1 DAY ) FROM login GROUP BY user_id ) THEN
						1 ELSE 0 
					END 
					))/ (
					sum(
					CASE
							
							WHEN ( user_id, date ) IN ( SELECT user_id, min( date ) FROM login GROUP BY user_id ) THEN
							1 ELSE 0 
						END 
						)),
					3 
				),
				0 
				) AS p 
		FROM
			login 
		GROUP BY
			date 
	ORDER BY
	date;
	
# ----------------------------- 修改版本, 变更为with, if 取代 case when

WITH a AS ( SELECT user_id, date_add( date, INTERVAL - 1 DAY ) FROM login GROUP BY user_id, date ),
b AS ( SELECT user_id, min( date ) FROM login GROUP BY user_id ) SELECT
date,
ifnull(
	round((
			sum(
				IF
				(( user_id, date ) IN ( SELECT * FROM a ), 1, 0 ))) / (
				sum(
				IF
				( ( user_id, date ) IN ( SELECT * FROM b ), 1, 0 ))),
			3 
		),
		0 
	) AS p 
FROM
	login 
GROUP BY
	date 
ORDER BY
	date;
# 注意部分代码是sqlite3的, 需要改成mysql
SELECT
	a.date,
	ROUND( COUNT( b.user_id ) * 1.0 / COUNT( a.user_id ), 3 ) AS p 
FROM
	( SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id ) a
	LEFT JOIN login b ON a.user_id = b.user_id 
	AND b.date = DATE_ADD( a.date, INTERVAL 1 DAY ) 
GROUP BY
	a.date UNION
SELECT
	date,
	0.000 AS p 
FROM
	login 
WHERE
	date NOT IN ( SELECT MIN( date ) FROM login GROUP BY user_id ) 
ORDER BY
	date;

关键在于这个语句: SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id

这个语句非常巧妙地将用户的登录记录分拆开来, 分开新用户部分, 和老用户登录的两个部分

+---------+------------+
| user_id | date       |
+---------+------------+
|       2 | 2020-10-12 |
|       3 | 2020-10-12 |
|       1 | 2020-10-12 |
|       4 | 2020-10-14 |
+---------+------------+
4 rows in set (0.00 sec)
# 1, 2, 3 是 10-12 新登录的用户
# 4 为10-14的新用户
# 将上面的语句改成with
# user_id + 
WITH a AS ( SELECT user_id, MIN( date ) AS date FROM login GROUP BY user_id ),
c AS (
	SELECT
		a.date,
		ROUND( COUNT( b.user_id ) * 1.0 / COUNT( a.user_id ), 3 ) AS p 
	FROM
		a
		LEFT JOIN login b ON a.user_id = b.user_id 
		AND b.date = DATE_ADD( a.date, INTERVAL 1 DAY ) 
	GROUP BY
		a.date 
	),
	# 没有的数据表明当天没有新的用户登录
	d AS ( SELECT date, 0.000 AS p FROM login WHERE date NOT IN ( SELECT date FROM a ) ORDER BY date )

SELECT * FROM c UNION SELECT * FROM d;

SQL270 考试分数

考试分数(五)

drop table if exists grade;
CREATE TABLE  grade(
`id` int(4) NOT NULL,
`job` varchar(32) NOT NULL,
`score` int(10) NOT NULL,
PRIMARY KEY (`id`));

INSERT INTO grade VALUES
(1,'C++',11001),
(2,'C++',10000),
(3,'C++',9000),
(4,'Java',12000),
(5,'Java',13000),
(6,'B',12000),
(7,'B',11000),
(8,'B',9999);

# 增加数据测试
insert into grade values(9, "Java", 11000), (10, "Java", 15000);

# (11, "B", 13000), (12, "C", 14000)

描述

牛客每次考试完,都会有一个成绩表(grade),如下:

id job score
1 C++ 11001
2 C++ 11000
3 C++ 9000
4 JAVA 12000
5 JAVA 13000
6 B 12000
7 B 11000
8 B 9999

第1行表示用户id为1的用户选择了C++岗位并且考了11001分

。。。

第8行表示用户id为8的用户选择了B语言岗位并且考了9999分

请你写一个sql语句查询各个岗位分数的中位数位置上的所有grade信息,并且按id升序排序,结果如下:

id job score t_rank
2 C++ 10000 2
4 Java 12000 2
5 Java 13000 1
7 B 11000 2

解释

第1行表示C++岗位的中位数位置上的为用户id为2,分数为10000,在C++岗位里面排名是第2

第2,3行表示Java岗位的中位数位置上的为用户id为4,5,分数为12000,13000,在Java岗位里面排名是第2,1

第4行表示B语言岗位的中位数位置上的为用户id为7,分数为11000,在前端岗位里面排名是第2

(注意: sqlite 1/2得到的不是0.5,得到的是0,只有1*1.0/2才会得到0.5,sqlite四舍五入的函数为round,sqlite不支持floor函数,支持cast(x as integer) 函数,不支持if函数,支持case when ...then ...else ..end函数,sqlite不支持自定义变量)

@Lian ➜ ~ ( base 3.9.12) 153ms sqlite3
SQLite version 3.38.2 2022-03-26 13:51:10
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select (1 / 2);
0
sqlite> select (3 / 2);
1
sqlite> select (1.0 / 2.0);
0.5
sqlite> select (3.0 / 1.0);
3.0
sqlite> select (3.0 / 2.0);
1.5
sqlite> select ( 1/ 2.0);
0.5

解答W

# 对于奇数的中位很容易确定
# 先得到每个job出现的次数
# 假如是奇数的, 假如是偶数的
WITH a AS (
	SELECT
		*,
		ROW_NUMBER() over w AS rk 
	FROM
	grade window w AS ( PARTITION BY job ORDER BY score DESC )),
	b AS ( SELECT *, FIRST_VALUE( rk ) over ( PARTITION BY job ORDER BY rk DESC ) AS mx FROM a ),
	c AS (
	SELECT
		*,
	IF
		(
			mx % 2 = 0,
			ceil(( mx / 2 + rk ) / 2 ),
		ceil( mx / 2 )) AS tc 
	FROM
		b 
	) SELECT
	id,
	job,
	score,
	rk AS t_rank 
FROM
	c 
WHERE
	rk = tc 
ORDER BY
	id;

# 原始数据
+----+------+-------+--------+
| id | job  | score | t_rank |
+----+------+-------+--------+
|  2 | C++  | 10000 |      2 |
|  4 | Java | 12000 |      2 |
|  5 | Java | 13000 |      1 |
|  7 | B    | 11000 |      2 |
+----+------+-------+--------+
4 rows in set (0.00 sec)

# 增加测试数据
+----+------+-------+--------+
| id | job  | score | t_rank |
+----+------+-------+--------+
|  2 | C++  | 10000 |      2 |
|  4 | Java | 12000 |      3 |
|  5 | Java | 13000 |      2 |
|  7 | B    | 11000 |      2 |
+----+------+-------+--------+
4 rows in set (0.00 sec)

解答T

不使用窗口函数, 暂未发现不引入变量的情况下, 完全使用使用传统的方式来解决这个问题.

WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ) SELECT
g.job,
g.score,
a.ic 
FROM
	grade AS g
	LEFT JOIN a ON g.job = a.job 
ORDER BY
	job,
	score DESC;
+------+-------+------+
| job  | score | ic   |
+------+-------+------+
| B    | 12000 |    3 |
| B    | 11000 |    3 |
| B    |  9999 |    3 |
| C++  | 11001 |    3 |
| C++  | 10000 |    3 |
| C++  |  9000 |    3 |
| Java | 13000 |    2 |
| Java | 12000 |    2 |
+------+-------+------+
8 rows in set (0.00 sec)
# 在不引入临时变量的情况下, 还有没有方法进一步处理下去?
WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ) SELECT
@rw := @rw + 1 as rno,
g.job,
g.score,
a.ic 
FROM (select @rw := 0) as r, 
	grade AS g
	LEFT JOIN a ON g.job = a.job 
ORDER BY
	job,
	score DESC;
+------+------+-------+------+
| rno  | job  | score | ic   |
+------+------+-------+------+
|    1 | B    | 12000 |    3 |
|    2 | B    | 11000 |    3 |
|    3 | B    |  9999 |    3 |
|    4 | C++  | 11001 |    3 |
|    5 | C++  | 10000 |    3 |
|    6 | C++  |  9000 |    3 |
|    7 | Java | 13000 |    2 |
|    8 | Java | 12000 |    2 |
+------+------+-------+------+
8 rows in set, 2 warnings (0.00 sec)
# 最终结果
# 这里引入了变量
WITH a AS ( SELECT job, count( job ) AS ic FROM grade GROUP BY job ),
b AS (
	SELECT
		g.id,
		@rw := @rw + 1 AS rno,
		g.job,
		g.score,
		a.ic 
	FROM
		( SELECT @rw := 0 ) AS r,
		grade AS g
		LEFT JOIN a ON g.job = a.job 
	ORDER BY
		job,
		score DESC 
	),
	c AS (
	SELECT
		job,
		@INDEX := @INDEX + 1 AS rx,
		ic 
	FROM
		( SELECT @INDEX := 0 ) AS i,
		b 
	GROUP BY
		job 
	),
	d AS (
	SELECT
		b.id,
		b.rno,
		b.job,
		b.score,
		c.rx,
		c.ic 
	FROM
		b
		LEFT JOIN c ON c.job = b.job 
	) SELECT
	id,
	rno,
	job,
	score,
	rx,
	ic,
IF
	(
		ic % 2 = 0,
		floor(( ic / 2 + rno + rx ) / 2 ),
		floor( ic / 2 ) + rx 
	) AS rindex 
FROM
	d;

+----+------+------+-------+------+------+--------+
| id | rno  | job  | score | rx   | ic   | rindex |
+----+------+------+-------+------+------+--------+
| 11 |    1 | B    | 13000 |    1 |    4 |      2 |
|  6 |    2 | B    | 12000 |    1 |    4 |      2 |
|  7 |    3 | B    | 11000 |    1 |    4 |      3 |
|  8 |    4 | B    |  9999 |    1 |    4 |      3 |
| 12 |    5 | C    | 14000 |    5 |    1 |      5 |
|  1 |    6 | C++  | 11001 |    6 |    3 |      7 |
|  2 |    7 | C++  | 10000 |    6 |    3 |      7 |
|  3 |    8 | C++  |  9000 |    6 |    3 |      7 |
| 10 |    9 | Java | 15000 |    9 |    4 |     10 |
|  5 |   10 | Java | 13000 |    9 |    4 |     10 |
|  4 |   11 | Java | 12000 |    9 |    4 |     11 |
|  9 |   12 | Java | 11000 |    9 |    4 |     11 |
+----+------+------+-------+------+------+--------+
12 rows in set, 6 warnings (0.00 sec)
pS2nJeI.png

SQL285 积分最多

获得积分最多的人(三)

drop table if exists user;
drop table if exists grade_info;

CREATE TABLE user (
id  int(4) NOT NULL,
name varchar(32) NOT NULL
);

CREATE TABLE grade_info (
user_id  int(4) NOT NULL,
grade_num int(4) NOT NULL,
type varchar(32) NOT NULL
);

INSERT INTO user VALUES
(1,'tm'),
(2,'wwy'),
(3,'zk'),
(4,'qq'),
(5,'lm');

INSERT INTO grade_info VALUES
(1,3,'add'),
(2,3,'add'),
(1,1,'reduce'),
(3,3,'add'),
(4,3,'add'),
(5,3,'add'),
(3,1,'reduce');

描述

牛客每天有很多用户刷题,发帖,点赞,点踩等等,这些都会记录相应的积分。

有一个用户表(user),简况如下:

id name
1 tm
2 wwy
3 zk
4 qq
5 lm

还有一个积分表(grade_info),简况如下:

user_id grade_num type
1 3 add
2 3 add
1 1 reduce
3 3 add
4 3 add
5 3 add
3 1 reduce

1行表示,user_id为1的用户积分增加了3分。

2行表示,user_id为2的用户积分增加了3分。

3行表示,user_id为1的用户积分减少了1分。

.......

最后1行表示,user_id为3的用户积分减少了1分。

请你写一个SQL查找积分最高的用户的id,名字,以及他的总积分是多少(可能有多个),查询结果按照id升序排序,以上例子查询结果如下:

id name grade_num
2 wwy 3
4 qq 3
5 lm 3

解释:

user_id为1和3的先加了3分,但是后面又减了1分,他们2个是2分,

其他3个都是3分,所以输出其他三个的数据

解答W

WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ),
c AS ( SELECT *, rank() over ( ORDER BY total DESC ) AS rk FROM b ) SELECT
c.*,
user.NAME 
FROM
	c
	LEFT JOIN user ON c.user_id = user.id 
WHERE
	rk = 1;
	
+---------+-------+----+------+
| user_id | total | rk | NAME |
+---------+-------+----+------+
|       2 |     3 |  1 | wwy  |
|       4 |     3 |  1 | qq   |
|       5 |     3 |  1 | lm   |
+---------+-------+----+------+
3 rows in set (0.00 sec)

解答T

不使用窗口函数

WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ) SELECT
* 
FROM
	b 
ORDER BY
	total DESC;

# 这里得到了数值之后的最大值的处理?
+---------+-------+
| user_id | total |
+---------+-------+
|       2 |     3 |
|       4 |     3 |
|       5 |     3 |
|       1 |     2 |
|       3 |     2 |
+---------+-------+
5 rows in set (0.00 sec)

# 注意这里navicat会将user改成大写, 导致在linux下的MySQL存在问题

WITH a AS ( SELECT *, IF ( type = 'add', 1, - 1 ) * grade_num AS rnum FROM grade_info ),
b AS ( SELECT user_id, sum( rnum ) AS total FROM a GROUP BY user_id ) SELECT
b.*,
user.name
FROM
	b
	LEFT JOIN user user.id = b.user_id 
WHERE
	b.total = ( SELECT max( total ) FROM b );

# 这里执行两次的select from b, 对效率影响几何?

+---------+-------+------+
| user_id | total | name |
+---------+-------+------+
|       2 |     3 | wwy  |
|       4 |     3 | qq   |
|       5 |     3 | lm   |
+---------+-------+------

SQL280 简历投递

实习广场投递简历分析(三)

drop table if exists resume_info;
CREATE TABLE resume_info (
id int(4) NOT NULL,
job varchar(64) NOT NULL,
date date NOT NULL,
num int(11) NOT NULL,
PRIMARY KEY (id));

INSERT INTO resume_info VALUES
(1,'C++','2025-01-02',53),
(2,'Python','2025-01-02',23),
(3,'Java','2025-01-02',12),
(4,'C++','2025-01-03',54),
(5,'Python','2025-01-03',43),
(6,'Java','2025-01-03',41),
(7,'Java','2025-02-03',24),
(8,'C++','2025-02-03',23),
(9,'Python','2025-02-03',34),
(10,'Java','2025-02-04',42),
(11,'C++','2025-02-04',45),
(12,'Python','2025-02-04',59),
(13,'C++','2026-01-04',230),
(14,'Java','2026-01-04',764),
(15,'Python','2026-01-04',644),
(16,'C++','2026-01-06',240),
(17,'Java','2026-01-06',714),
(18,'Python','2026-01-06',624),
(19,'C++','2026-02-14',260),
(20,'Java','2026-02-14',721),
(21,'Python','2026-02-14',321),
(22,'C++','2026-02-24',134),
(23,'Java','2026-02-24',928),
(24,'Python','2026-02-24',525),
(25,'C++','2027-02-06',231);

描述

在牛客实习广场有很多公司开放职位给同学们投递,同学投递完就会把简历信息存到数据库里。

现在有简历信息表(resume_info),部分信息简况如下:

id job date num
1 C++ 2025-01-02 53
2 Python 2025-01-02 23
3 Java 2025-01-02 12
4 C++ 2025-01-03 54
5 Python 2025-01-03 43
6 Java 2025-01-03 41
7 Java 2025-02-03 24
8 C++ 2025-02-03 23
9 Python 2025-02-03 34
10 Java 2025-02-04 42
11 C++ 2025-02-04 45
12 Python 2025-02-04 59
13 C++ 2026-01-04 230
14 Java 2026-01-04 764
15 Python 2026-01-04 644
16 C++ 2026-01-06 240
17 Java 2026-01-06 714
18 Python 2026-01-06 624
19 C++ 2026-01-04 260
20 Java 2026-02-14 721
21 Python 2026-02-14 321
22 C++ 2026-02-14 134
23 Java 2026-02-24 928
24 Python 2026-02-24 525
25 C++ 2027-02-06 231

第1行表示,在2025年1月2号,C++岗位收到了53封简历

......

最后1行表示,在2027年2月6号,C++岗位收到了231封简历

请你写出SQL语句查询在2025年投递简历的每个岗位,每一个月内收到简历的数目,和对应的2026年的同一个月同岗位,收到简历的数目,最后的结果先按first_year_mon月份降序,再按job降序排序显示,以上例子查询结果如下:

job first_year_mon first_year_cnt second_year_mon second_year_cnt
Python 2025-02 93 2026-02 846
Java 2025-02 66 2026-02 1649
C++ 2025-02 68 2026-02 394
Python 2025-01 66 2026-01 1268
Java 2025-01 53 2026-01 1478
C++ 2025-01 107 2026-01 470

解析:

第1行表示Python岗位在2025年2月收到了93份简历,在对应的2026年2月收到了846份简历

......

最后1行表示C++岗位在2025年1月收到了107份简历,在对应的2026年1月收到了470份简历

解答W

# a, 得到基础的数据
WITH a AS (
	SELECT
		job,
		CONVERT ( concat( substring( date, 1, 7 ), '-01' ), date ) AS m_month,
		sum( num ) AS all_c 
	FROM
		resume_info 
	WHERE
		YEAR ( date ) != 2027 
	GROUP BY
		job,
		m_month 
	),
	# 汇合数据, 注意这里只有两年的数据, 可以这样合并, 假如三年的数据呢?
	b AS (
	SELECT
		job,
		m_month AS fm,
		all_c,
		lag( m_month ) over w AS sm,
		lag( all_c ) over w AS sc 
	FROM
	a window w AS ( PARTITION BY job, MONTH ( m_month ) ORDER BY m_month )) SELECT
	job,
	DATE_FORMAT( sm, '%Y-%m' ) AS first_year_mon,
	sc AS first_year_cnt,
	DATE_FORMAT( fm, '%Y-%m' ) AS second_year_mon,
	all_c AS second_year_cnt 
FROM
	b 
WHERE
	sm IS NOT NULL 
ORDER BY
	month(sm) desc, job desc;
	
+--------+----------------+----------------+-----------------+-----------------+
| job    | first_year_mon | first_year_cnt | second_year_mon | second_year_cnt |
+--------+----------------+----------------+-----------------+-----------------+
| Python | 2025-02        |             93 | 2026-02         |             846 |
| Java   | 2025-02        |             66 | 2026-02         |            1649 |
| C++    | 2025-02        |             68 | 2026-02         |             394 |
| Python | 2025-01        |             66 | 2026-01         |            1268 |
| Java   | 2025-01        |             53 | 2026-01         |            1478 |
| C++    | 2025-01        |            107 | 2026-01         |             470 |
+--------+----------------+----------------+-----------------+-----------------+
6 rows in set (0.01 sec)

解答T

# 等到基础数据
# 按照年份拆分开两组数据
# 按照month, job合并数据
WITH a AS (
	SELECT
		job,
		CONVERT ( concat( substring( date, 1, 7 ), '-01' ), date ) AS m_month,
		sum( num ) AS all_c 
	FROM
		resume_info 
	WHERE
		YEAR ( date ) != 2027 
	GROUP BY
		job,
		m_month 
	),
	b AS ( SELECT * FROM a WHERE YEAR ( m_month ) = 2025 ),
	c AS ( SELECT * FROM a WHERE YEAR ( m_month ) = 2026 ) SELECT
	b.*,
	c.* 
FROM
	b
	JOIN c ON MONTH ( b.m_month ) = MONTH ( c.m_month ) 
	AND b.job = c.job;
	
+--------+------------+-------+--------+------------+-------+
| job    | m_month    | all_c | job    | m_month    | all_c |
+--------+------------+-------+--------+------------+-------+
| C++    | 2025-01-01 |   107 | C++    | 2026-01-01 |   470 |
| Python | 2025-01-01 |    66 | Python | 2026-01-01 |  1268 |
| Java   | 2025-01-01 |    53 | Java   | 2026-01-01 |  1478 |
| Java   | 2025-02-01 |    66 | Java   | 2026-02-01 |  1649 |
| C++    | 2025-02-01 |    68 | C++    | 2026-02-01 |   394 |
| Python | 2025-02-01 |    93 | Python | 2026-02-01 |   846 |
+--------+------------+-------+--------+------------+-------+
6 rows in set (0.00 sec)

SQL220 部门职衔

汇总各个部门当前员工的title类型的分配数目

drop table if exists  `departments` ; 
drop table if exists  `dept_emp` ; 
drop table if exists  titles ;
CREATE TABLE `departments` (
`dept_no` char(4) NOT NULL,
`dept_name` varchar(40) NOT NULL,
PRIMARY KEY (`dept_no`));
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE titles (
`emp_no` int(11) NOT NULL,
`title` varchar(50) NOT NULL,
`from_date` date NOT NULL,
`to_date` date DEFAULT NULL);
INSERT INTO departments VALUES('d001','Marketing');
INSERT INTO departments VALUES('d002','Finance');
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
INSERT INTO dept_emp VALUES(10003,'d002','1995-12-03','9999-01-01');
INSERT INTO titles VALUES(10001,'Senior Engineer','1986-06-26','9999-01-01');
INSERT INTO titles VALUES(10002,'Staff','1996-08-03','9999-01-01');
INSERT INTO titles VALUES(10003,'Senior Engineer','1995-12-03','9999-01-01');

描述

有一个部门表departments简况如下:

dept_no dept_name
d001 Marketing
d002 Finance

有一个,部门员工关系表dept_emp简况如下:

emp_no dept_no from_date to_date
10001 d001 1986-06-26 9999-01-01
10002 d001 1996-08-03 9999-01-01
10003 d002 1995-12-03 9999-01-01

有一个职称表titles简况如下:

emp_no title form_date to_date
10001 Senior Engineer 1986-06-26 9999-01-01
10002 Staff 1996-08-03 9999-01-01
10003 Senior Engineer 1995-12-03 9999-01-01

汇总各个部门当前员工的title类型的分配数目,即结果给出部门编号dept_nodept_name、其部门下所有的员工的title以及该类型title对应的数目count,结果按照dept_no升序排序,dept_no一样的再按title升序排序

dept_no dept_name title count
d001 Marketing Senior Engineer 1
d001 Marketing Staff 1
d002 Finance Senior Engineer 1

解答W

这种需求, 窗口函数的实现属于入门级.

WITH a AS (
	SELECT
		de.emp_no,
		de.dept_no,
		dp.dept_name,
		tl.title 
	FROM
		dept_emp AS de
		LEFT JOIN departments AS dp ON de.dept_no = dp.dept_no
		LEFT JOIN titles tl ON de.emp_no = tl.emp_no 
	) SELECT
	*,
	# 根据部门 - title的使用 -> 统计title的使用次数
	count( title ) over ( PARTITION BY dept_no, title ) AS count 
FROM
	a;

+--------+---------+-----------+-----------------+-------+
| emp_no | dept_no | dept_name | title           | count |
+--------+---------+-----------+-----------------+-------+
|  10001 | d001    | Marketing | Senior Engineer |     1 |
|  10002 | d001    | Marketing | Staff           |     1 |
|  10003 | d002    | Finance   | Senior Engineer |     1 |
+--------+---------+-----------+-----------------+-------+
3 rows in set (0.00 sec)

解答T

WITH a AS (
	SELECT
		t.emp_no,
		de.dept_no,
		t.title 
	FROM
		titles AS t
		LEFT JOIN dept_emp de ON de.emp_no = t.emp_no 
	) SELECT
	a.dept_no,
	a.title,
	dp.dept_name,
	count(*) AS count 
FROM
	a
	LEFT JOIN departments AS dp ON dp.dept_no = a.dept_no 
GROUP BY
	dept_no,
	title;

+---------+-----------------+-----------+-------+
| dept_no | title           | dept_name | count |
+---------+-----------------+-----------+-------+
| d001    | Senior Engineer | Marketing |     1 |
| d001    | Staff           | Marketing |     1 |
| d002    | Senior Engineer | Finance   |     1 |
+---------+-----------------+-----------+-------+
3 rows in set (0.00 sec)

SQL219 员工薪水

获取员工其当前的薪水比其manager当前薪水还高的相关信息

drop table if exists  `dept_emp` ; 
drop table if exists  `dept_manager` ; 
drop table if exists  `salaries` ; 
CREATE TABLE `dept_emp` (
`emp_no` int(11) NOT NULL,
`dept_no` char(4) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `dept_manager` (
`dept_no` char(4) NOT NULL,
`emp_no` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`dept_no`));
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`));
INSERT INTO dept_emp VALUES(10001,'d001','1986-06-26','9999-01-01');
INSERT INTO dept_emp VALUES(10002,'d001','1996-08-03','9999-01-01');
# 增加测试数据
INSERT INTO dept_emp VALUES(10003,'d001','1997-08-03','9999-01-01');

INSERT INTO dept_manager VALUES('d001',10002,'1996-08-03','9999-01-01');

INSERT INTO salaries VALUES(10001,88958,'2002-06-22','9999-01-01');
INSERT INTO salaries VALUES(10002,72527,'1996-08-03','9999-01-01');
# 增加测试数据
INSERT INTO salaries VALUES(10003,62527,'1996-08-03','9999-01-01');

描述

有一个,部门关系表dept_emp简况如下:

emp_no dept_no from_date to_date
10001 d001 1986-06-26 9999-01-01
10002 d001 1996-08-03 9999-01-01

有一个部门经理表dept_manager简况如下:

dept_no emp_no from_date to_date
d001 10002 1996-08-03 9999-01-01

有一个薪水表salaries简况如下:

emp_no salary from_date to_date
10001 88958 2002-06-22 9999-01-01
10002 72527 1996-08-03 9999-01-01

获取员工其当前的薪水比其manager当前薪水还高的相关信息,

第一列给出员工的emp_no
第二列给出其managermanager_no
第三列给出该员工当前的薪水emp_salary,
第四列给该员工对应的manager当前的薪水manager_salary

以上例子输出如下:

emp_no manager_no emp_salary manager_salary
10001 10002 88958 72527

解答W

WITH temp AS (
	SELECT
		a.emp_no,
		b.emp_no AS m_n,
		c.salary 
	FROM
		dept_emp a
		LEFT JOIN dept_manager b ON a.dept_no = b.dept_no
		LEFT JOIN salaries c ON a.emp_no = c.emp_no 
	),
	# 加入一个辅助列, 用于将manager抽离出来
	m AS ( SELECT *, IF ( emp_no = m_n, 1, 0 ) AS is_m FROM temp ),
	# 将manager的数据逐个添加到常规的emp_no之下(新增列)
	n AS (
	SELECT
		*,
		FIRST_VALUE( salary ) over w AS ms,
		FIRST_VALUE( emp_no ) over w AS m_m 
	FROM
	m window w AS ( PARTITION BY m_n ORDER BY is_m DESC )) SELECT
	emp_no,
	m_m AS manager_no,
	salary AS emp_salary,
	ms AS manager_salary 
FROM
	n 
WHERE
	is_m = 0 
	AND salary > ms;
	
+--------+------------+------------+----------------+
| emp_no | manager_no | emp_salary | manager_salary |
+--------+------------+------------+----------------+
|  10001 |      10002 |      88958 |          72527 |
+--------+------------+------------+----------------+
1 row in set (0.00 sec)

解答T

不使用窗口函数

WITH temp AS (
	SELECT
		a.emp_no,
		b.emp_no AS m_n,
		c.salary 
	FROM
		dept_emp a
		LEFT JOIN dept_manager b ON a.dept_no = b.dept_no
		LEFT JOIN salaries c ON a.emp_no = c.emp_no 
	),
	# 将manger提取出来
	m AS ( SELECT * FROM temp AS t1 WHERE t1.emp_no != t1.m_n ),
	n AS ( SELECT * FROM temp AS t2 WHERE t2.emp_no = t2.m_n ) SELECT
	m.emp_no,
	n.emp_no AS manager_no,
	m.salary AS emp_salary,
	n.salary AS manager_salary 
FROM
	m
	LEFT JOIN n ON m.m_n = n.m_n 
WHERE
	m.salary > n.salary;
	
+--------+------------+------------+----------------+
| emp_no | manager_no | emp_salary | manager_salary |
+--------+------------+------------+----------------+
|  10001 |      10002 |      88958 |          72527 |
+--------+------------+------------+----------------+
1 row in set (0.00 sec)

SQL275 课程订单

牛客的课程订单分析(五)

drop table if exists order_info;
CREATE TABLE order_info (
id int(4) NOT NULL,
user_id int(11) NOT NULL,
product_name varchar(256) NOT NULL,
status varchar(32) NOT NULL,
client_id int(4) NOT NULL,
date date NOT NULL,
PRIMARY KEY (id));

INSERT INTO order_info VALUES
(1,557336,'C++','no_completed',1,'2025-10-10'),
(2,230173543,'Python','completed',2,'2025-10-12'),
(3,57,'JS','completed',3,'2025-10-23'),
(4,57,'C++','completed',3,'2025-10-23'),
(5,557336,'Java','completed',1,'2025-10-23'),
(6,57,'Java','completed',1,'2025-10-24'),
(7,557336,'C++','completed',1,'2025-10-25'),
(8,557336,'Python','completed',1,'2025-10-26');

描述

有很多同学在牛客购买课程来学习,购买会产生订单存到数据库里。

有一个订单信息表(order_info),简况如下:

id user_id product_name status client_id date
1 557336 C++ no_completed 1 2025-10-10
2 230173543 Python completed 2 2025-10-12
3 57 JS completed 3 2025-10-23
4 57 C++ completed 3 2025-10-23
5 557336 Java completed 1 2025-10-23
6 57 Java completed 1 2025-10-24
7 557336 C++ completed 1 2025-10-25
8 557336 Python completed 1 2025-10-26

第1行表示user_id为557336的用户在2025-10-10的时候使用了client_id为1的客户端下了C++课程的订单,但是状态为没有购买成功。

第2行表示user_id为230173543的用户在2025-10-12的时候使用了client_id为2的客户端下了Python课程的订单,状态为购买成功。

......

最后1行表示user_id为557336的用户在2025-10-26的时候使用了client_id为1的客户端下了Python课程的订单,状态为购买成功。

请你写出一个sql语句查询在2025-10-15以后,如果有一个用户下单2个以及2个以上状态为购买成功的C++课程或Java课程或Python课程,那么输出这个用户的user_id,以及满足前面条件的第一次购买成功的C++课程或Java课程或Python课程的日期first_buy_date,以及满足前面条件的第二次购买成功的C++课程或Java课程或Python课程的日期second_buy_date,以及购买成功的C++课程或Java课程或Python课程的次数cnt,并且输出结果按照user_id升序排序,以上例子查询结果如下:

user_id first_buy_date second_buy_date cnt
57 2025-10-23 2025-10-24 2
557336 2025-10-23 2025-10-25 3

解析:

id为4,6的订单满足以上条件,输出57,id为4的订单为第一次购买成功,输出first_buy_date为2025-10-23,id为6的订单为第二次购买,输出second_buy_date为2025-10-24,总共成功购买了2次;

id为5,7,8的订单满足以上条件,输出557336,id为5的订单为第一次购买成功,输出first_buy_date为2025-10-23,id为7的订单为第二次购买,输出second_buy_date为2025-10-25,总共成功购买了3次;

解答W

WITH a AS (
	SELECT
		user_id,
		FIRST_VALUE( date ) over w AS first_buy_date,
		NTH_VALUE( date, 2 ) over w AS second_buy_date,
		count( user_id ) over w AS cnt 
	FROM
		order_info 
	WHERE
		STATUS = 'completed' 
		AND date > '2025-10-15' 
		AND product_name IN ( 'C++', 'Python', 'Java' ) window w AS ( PARTITION BY user_id ) 
	ORDER BY
		user_id,
		date 
	) SELECT DISTINCT
	* 
FROM
	a 
WHERE
	cnt > 1;
	
+---------+-----+----------------+-----------------+
| user_id | cnt | first_buy_date | second_buy_date |
+---------+-----+----------------+-----------------+
|      57 |   2 | 2025-10-23     | 2025-10-24      |
|  557336 |   3 | 2025-10-23     | 2025-10-25      |
+---------+-----+----------------+-----------------+
2 rows in set (0.00 sec)

解答T

WITH a AS ( SELECT * FROM order_info WHERE STATUS = 'completed' AND date > '2025-10-15' AND product_name IN ( 'C++', 'Python', 'Java' ) ORDER BY date ),
b AS ( SELECT user_id, min( date ) AS first_buy_date, count( user_id ) AS cnt FROM a GROUP BY user_id ),
c AS ( SELECT user_id, min( date ) AS second_buy_date FROM a WHERE date NOT IN ( SELECT first_buy_date FROM b ) GROUP BY user_id ) SELECT
b.*,
c.second_buy_date 
FROM
	b
	LEFT JOIN c ON b.user_id = c.user_id 
WHERE
	b.cnt > 1;
	
+---------+----------------+-----+-----------------+
| user_id | first_buy_date | cnt | second_buy_date |
+---------+----------------+-----+-----------------+
|      57 | 2025-10-23     |   2 | 2025-10-24      |
|  557336 | 2025-10-23     |   3 | 2025-10-25      |
+---------+----------------+-----+-----------------+
2 rows in set (0.00 sec)

小结

在窗口函数下, 大部分问题的解决都会变得更容易, 且通常情况下, 窗口函数的SQL语句可阅读性(即第一眼阅读可以大概有一个轮廓, 这些语句的作用)更好.

由于传统的处理方式, 如: 行之间的运算, 一般需要一些巧妙的方式来实现其计算, 如果不是仔细的解析代码, 其中的部分的代码并不易于理解.

例如上面的SQL语句

# window, 相当明确的表明取值的方式, 取值, 是date的第二个值
NTH_VALUE( date, 2 ) over w AS second_buy_date,

# 传统模式下
# 显然乍看之下, 不容易理解这个语句为什么能取出第二的数值
# 这是间接实现, 筛选出最小的, 然后再次筛选最小的,  即得到第二小的
SELECT user_id, min( date ) AS second_buy_date FROM a WHERE date NOT IN ( SELECT first_buy_date FROM b)

大厂真题

以下内容相对复杂, 将全部采用窗口函数来解决.

部分内容描述得十分含糊, 仅作参考, 解法可能有所差异.

SQL161 视频热度

近一个月发布的视频中热度最高的top3视频

DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:31', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:35', 0, 0, 1, null)
  ,(103, 2001, '2021-10-03 11:00:50', '2021-10-03 11:01:35', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:04', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:06', 1, 0, 0, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:01', 0, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 0, 1, null)
  ,(101, 2003, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 0, null)
  ,(101, 2003, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2021-09-05 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-09-05 7:00:00')
  ,(2003, 902, '影视', 90, '2021-09-05 7:00:00')
  ,(2004, 902, '影视', 90, '2021-09-05 8:00:00');

描述

现有用户-视频互动表tb_user_video_log

id uid video_id start_time end_time if_follow if_like if_retweet comment_id
1 101 2001 2021-09-24 10:00:00 2021-09-24 10:00:30 1 1 1 NULL
2 101 2001 2021-10-01 10:00:00 2021-10-01 10:00:31 1 1 0 NULL
3 102 2001 2021-10-01 10:00:00 2021-10-01 10:00:35 0 0 1 NULL
4 103 2001 2021-10-03 11:00:50 2021-10-03 10:00:35 1 1 0 1732526
5 106 2002 2021-10-02 11:00:05 2021-10-02 11:01:04 2 0 1 NULL
6 107 2002 2021-10-02 10:59:05 2021-10-02 11:00:06 1 0 0 NULL
7 108 2002 2021-10-02 10:59:05 2021-10-02 11:00:05 1 1 1 NULL
8 109 2002 2021-10-03 10:59:05 2021-10-03 11:00:01 0 1 0 NULL
9 105 2002 2021-09-25 11:00:00 2021-09-25 11:00:30 1 0 1 NULL
10 101 2003 2021-09-26 11:00:00 2021-09-26 11:00:30 1 0 0 NULL
11 101 2003 2021-09-30 11:00:00 2021-09-30 11:00:30 1 1 0 NULL

(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)

短视频信息表tb_video_info

id video_id author tag duration release_time
1 2001 901 旅游 30 2021-09-05 07:00:00
2 2002 901 旅游 60 2021-09-05 07:00:00
3 2003 902 影视 90 2021-09-05 07:00:00
4 2004 902 影视 90 2021-09-05 08:00:00

(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)

问题:找出近一个月发布的视频中热度最高的top3视频。

  • 热度=(a视频完播率+b点赞数+c评论数+d转发数)*新鲜度;
  • 新鲜度=1/(最近无播放天数+1);(关键在于此)
  • 当前配置的参数a,b,c,d分别为100、5、3、2。
  • 最近播放日期以end_time-结束观看时间为准,假设为T,则最近一个月按[T-29, T]闭区间统计。
  • 结果中热度保留为整数,并按热度降序排序。

输出示例

示例数据的输出结果如下

video_id hot_index
2001 122
2002 56
2003 1

解释:

mysql> select date_sub("2021-10-03", interval 1 month);
+------------------------------------------+
| date_sub("2021-10-03", interval 1 month) |
+------------------------------------------+
| 2021-09-03                               |
+------------------------------------------+
1 row in set (0.00 sec)

最近播放日期为2021-10-03,记作当天日期;近一个月(2021-09-04及之后)发布的视频有2001、2002、2003、2004,不过2004暂时还没有播放记录;

视频2001完播率1.0(被播放次数4次,完成播放4次),被点赞3次,评论1次,转发2次,最近无播放天数为0,因此热度为:(100*1.0+5*3+3*1+2*2)/(0+1)=122

同理,视频2003完播率0,被点赞数1,评论和转发均为0,最近无播放天数为3,因此热度为:(100*0+5*1+3*0+2*0)/(3+1)=1 (1.2保留为整数).

解答

这道题并不复杂, 也不难, 但是这里写的说明不是很清楚, 让这个题看起来很复杂.

WITH a AS (
	SELECT
		video_id,
		if_like,
		if_follow,
		if_retweet,
	IF
		( comment_id IS NULL, 0, 1 ) AS comments,
		TIMESTAMPDIFF( SECOND, start_time, end_time ) AS gap,
		max( end_time ) over () AS md,
		max( end_time ) over ( PARTITION BY video_id ) AS rd 
	FROM
		tb_user_video_log 
	),
	b AS (
	SELECT
		a.video_id,
		a.if_like,
		a.if_follow,
		a.if_retweet,
		a.comments,
		DATEDIFF( a.md, a.rd ) AS gap,
	IF
		( a.gap > ( tv.duration - 1 ), 1, 0 ) AS ftimes 
	FROM
		a
		LEFT JOIN tb_video_info tv ON a.video_id = tv.video_id 
	WHERE
		DATEDIFF(a.md, tv.release_time) <= 29
	),
	c AS (
	SELECT
		video_id,
		count( video_id ) AS ptimes,
		sum( if_like ) AS likes,
		sum( if_follow ) AS follows,
		sum( if_retweet ) AS retweets,
		sum( comments ) AS comments,
		avg( gap ) AS gap,
		avg( ftimes ) AS finish_ratio 
	FROM
		b 
	GROUP BY
		video_id 
	) SELECT
	video_id,
	round((
			100 * finish_ratio + 5 * likes + 3 * comments + 2 * retweets 
			) * (
		1 / ( gap + 1 )), 0) AS hot_index 
FROM
	c order by hot_index desc limit 3;

SQL167 连续签到

连续签到领金币

DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    artical_id INT NOT NULL COMMENT '视频ID',
    in_time datetime COMMENT '进入时间',
    out_time datetime COMMENT '离开时间',
    sign_in TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
  (101, 0, '2021-07-07 10:00:00', '2021-07-07 10:00:09', 1),
  (101, 0, '2021-07-08 10:00:00', '2021-07-08 10:00:09', 1),
  (101, 0, '2021-07-09 10:00:00', '2021-07-09 10:00:42', 1),
  (101, 0, '2021-07-10 10:00:00', '2021-07-10 10:00:09', 1),
  (101, 0, '2021-07-11 23:59:55', '2021-07-11 23:59:59', 1),
  (101, 0, '2021-07-12 10:00:28', '2021-07-12 10:00:50', 1),
  (101, 0, '2021-07-13 10:00:28', '2021-07-13 10:00:50', 1),
  (102, 0, '2021-10-01 10:00:28', '2021-10-01 10:00:50', 1),
  (102, 0, '2021-10-02 10:00:01', '2021-10-02 10:01:50', 1),
  (102, 0, '2021-10-03 11:00:55', '2021-10-03 11:00:59', 1),
  (102, 0, '2021-10-04 11:00:45', '2021-10-04 11:00:55', 0),
  (102, 0, '2021-10-05 11:00:53', '2021-10-05 11:00:59', 1),
  (102, 0, '2021-10-06 11:00:45', '2021-10-06 11:00:55', 1);

描述

用户行为日志表tb_user_log

id uid artical_id in_time out_time sign_in
1 101 0 2021-07-07 10:00:00 2021-07-07 10:00:09 1
2 101 0 2021-07-08 10:00:00 2021-07-08 10:00:09 1
3 101 0 2021-07-09 10:00:00 2021-07-09 10:00:42 1
4 101 0 2021-07-10 10:00:00 2021-07-10 10:00:09 1
5 101 0 2021-07-11 23:59:55 2021-07-11 23:59:59 1
6 101 0 2021-07-12 10:00:28 2021-07-12 10:00:50 1
7 101 0 2021-07-13 10:00:28 2021-07-13 10:00:50 1
8 102 0 2021-10-01 10:00:28 2021-10-01 10:00:50 1
9 102 0 2021-10-02 10:00:01 2021-10-02 10:01:50 1
10 102 0 2021-10-03 10:00:55 2021-10-03 11:00:59 1
11 102 0 2021-10-04 10:00:45 2021-10-04 11:00:55 0
12 102 0 2021-10-05 10:00:53 2021-10-05 11:00:59 1
13 102 0 2021-10-06 10:00:45 2021-10-06 11:00:55 1

(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)

场景逻辑说明

  • artical_id-文章ID代表用户浏览的文章的ID,特殊情况artical_id-文章ID0表示用户在非文章内容页(比如App内的列表页、活动页等)。注意:只有artical_id为0时sign_in值才有效。
  • 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
  • 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)

问题:计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。

:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了。

输出****示例:

示例数据的输出结果如下:

uid month coin
101 202107 15
102 202110 7

解释:

101在活动期内连续签到了7天,因此获得1*7+2+6=15金币;

102在10.01~10.03连续签到3天获得5金币( 1* 3 + 2)

10.04断签(断签, 但是还是存在了,10.05~10.06连续签到2天获得2金币,共得到7金币。

解答W

# r - 1, 是因后续需要按照月份来归类数据
# a, 遴选出满足要求的基本数据
# b, 对每个uid组的日期进行排序, 来确定这个日期和最小日期的时间间隔, 来间接判断这个日期是否为间隔一天的日期
# c, 将日期以排名的间隔进行详见, 假如之间的间隔为1天, 那么就会得到相同的日期
# d, 按照uid, 日期将数据归档
# 最终的数据获取
WITH a AS ( SELECT uid, CONVERT ( in_time, date ) AS in_time, sign_in FROM tb_user_log WHERE in_time >= '2021-07-01' AND in_time < '2021-11-01' ),
b AS ( SELECT *, dense_rank() over ( PARTITION BY uid ORDER BY in_time ) AS r FROM a WHERE sign_in > 0 ),
c AS ( SELECT *, date_sub( in_time, INTERVAL r - 1 DAY ) AS sub FROM b ),
d AS ( SELECT uid, sum( sign_in ) AS sm, sub FROM c GROUP BY uid, sub ),
e AS (
	SELECT
		uid,
		sub,
		(
		sm + ( CASE WHEN sm > 6 THEN 8 WHEN sm > 2 THEN 2 ELSE 0 END )) AS total 
	FROM
		d 
	) SELECT
	uid,
	substring( sub, 1, 7 ) AS MONTH,
	sum( total ) AS coin 
FROM
	e 
GROUP BY
	uid,
	MONTH;
	
+-----+---------+------+
| uid | MONTH   | coin |
+-----+---------+------+
| 101 | 2021-07 |   15 |
| 102 | 2021-10 |    7 |
+-----+---------+------+
2 rows in set (0.00 sec)

SQL173 商品动销率

店铺901国庆期间的7日动销率和滞销率

DROP TABLE IF EXISTS tb_order_overall;
CREATE TABLE tb_order_overall (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    order_id INT NOT NULL COMMENT '订单号',
    uid INT NOT NULL COMMENT '用户ID',
    event_time datetime COMMENT '下单时间',
    total_amount DECIMAL NOT NULL COMMENT '订单总金额',
    total_cnt INT NOT NULL COMMENT '订单商品总件数',
    `status` TINYINT NOT NULL COMMENT '订单状态'
) CHARACTER SET utf8 COLLATE utf8_bin;

DROP TABLE IF EXISTS tb_product_info;
CREATE TABLE tb_product_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    product_id INT NOT NULL COMMENT '商品ID',
    shop_id INT NOT NULL COMMENT '店铺ID',
    tag VARCHAR(12) COMMENT '商品类别标签',
    in_price DECIMAL NOT NULL COMMENT '进货价格',
    quantity INT NOT NULL COMMENT '进货数量',
    release_time datetime COMMENT '上架时间'
) CHARACTER SET utf8 COLLATE utf8_bin;

DROP TABLE IF EXISTS tb_order_detail;
CREATE TABLE tb_order_detail (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    order_id INT NOT NULL COMMENT '订单号',
    product_id INT NOT NULL COMMENT '商品ID',
    price DECIMAL NOT NULL COMMENT '商品单价',
    cnt INT NOT NULL COMMENT '下单数量'
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_product_info(product_id, shop_id, tag, in_price, quantity, release_time) VALUES
  (8001, 901, '日用', 60, 1000, '2020-01-01 10:00:00'),
  (8002, 901, '零食', 140, 500, '2020-01-01 10:00:00'),
  (8003, 901, '零食', 160, 500, '2020-01-01 10:00:00');

INSERT INTO tb_order_overall(order_id, uid, event_time, total_amount, total_cnt, `status`) VALUES
  (301004, 102, '2021-09-30 10:00:00', 170, 1, 1),
  (301005, 104, '2021-10-01 10:00:00', 160, 1, 1),
  (301003, 101, '2021-10-02 10:00:00', 300, 2, 1),
  (301002, 102, '2021-10-03 11:00:00', 235, 2, 1);

INSERT INTO tb_order_detail(order_id, product_id, price, cnt) VALUES
  (301004, 8002, 180, 1),
  (301005, 8002, 170, 1),
  (301002, 8001, 85, 1),
  (301002, 8003, 180, 1),
  (301003, 8002, 150, 1),
  (301003, 8003, 180, 1);

描述

商品信息表tb_product_info

id product_id shop_id tag int_ quantity release_time
1 8001 901 日用 60 1000 2020-01-01 10:00:00
2 8002 901 零食 140 500 2020-01-01 10:00:00
3 8003 901 零食 160 500 2020-01-01 10:00:00

(product_id-商品ID, shop_id-店铺ID, tag-商品类别标签, in_price-进货价格, quantity-进货数量, release_time-上架时间)

订单总表tb_order_overall

id order_id uid event_time total_amount total_cnt status
1 301004 102 2021-09-30 10:00:00 170 1 1
2 301005 104 2021-10-01 10:00:00 160 1 1
3 301003 101 2021-10-02 10:00:00 300 2 1
4 301002 102 2021-10-03 11:00:00 235 2 1

(order_id-订单号, uid-用户ID, event_time-下单时间, total_amount-订单总金额, total_cnt-订单商品总件数, status-订单状态)

订单明细表tb_order_detail

id order_id product_id price cnt
1 301004 8002 180 1
2 301005 8002 170 1
3 301002 8001 85 1
4 301002 8003 180 1
5 301003 8002 150 1
6 301003 8003 180 1

(order_id-订单号, product_id-商品ID, price-商品单价, cnt-下单数量)

问题:请计算店铺901在2021年国庆头3天的7日动销率和滞销率,结果保留3位小数,按日期升序排序。

  • 动销率定义为店铺中一段时间内有销量的商品当前已上架总商品数的比例(有销量的商品/已上架总商品数)。
  • 滞销率定义为店铺中一段时间内没有销量的商品占当前已上架总商品数的比例。(没有销量的商品/已上架总商品数)。
  • 只要当天任一店铺有任何商品的销量就输出该天的结果,即使店铺901当天的动销率为0。

当天是指? 假如10.1当天没有销量, 前面有销量, 这怎么处理?, 直接为 0 ?

输出示例

示例数据的输出结果如下:

dt sale_rate unsale_rate
2021-10-01 0.333 0.667
2021-10-02 0.667 0.333
2021-10-03 1.000 0.000

解释:

10月1日的近7日(9月25日---10月1日)店铺901有销量的商品有8002,截止当天在售商品数为3,动销率为0.333,滞销率为0.667;

10月2日的近7日(9月26日---10月2日)店铺901有销量的商品有8002、8003,截止当天在售商品数为3,动销率为0.667,滞销率为0.333;

10月3日的近7日(9月27日---10月3日)店铺901有销量的商品有8002、8003、8001,截止当天店铺901在售商品数为3,动销率为1.000,滞销率为0.000;

解答W

这道题没有什么相对较好得解法, 暴力破解.

# 取自大神的答案
SELECT
	dt1,
	round(
		count(
		DISTINCT
		IF
			( timestampdiff( DAY, dt, dt1 ) BETWEEN 0 AND 6, tb1.product_id, NULL ))/ count(
		DISTINCT
		IF
		( dt1 >= DATE ( release_time ), tb3.product_id, NULL )),
		3 
	) sale_rate,
	1-round (
		count(
		DISTINCT
		IF
			( timestampdiff( DAY, dt, dt1 ) BETWEEN 0 AND 6, tb1.product_id, NULL ))/ count(
		DISTINCT
		IF
		( dt1 >= DATE ( release_time ), tb3.product_id, NULL )),
		3 
	) unsale_rate 
FROM
	( SELECT DATE ( event_time ) dt1 FROM tb_order_overall HAVING dt1 BETWEEN '2021-10-01' AND '2021-10-03' ) tb2,
	(
	SELECT
		b.product_id,
		DATE ( event_time ) dt 
	FROM
		tb_order_overall a
		LEFT JOIN tb_order_detail b ON a.order_id = b.order_id
		LEFT JOIN tb_product_info c ON b.product_id = c.product_id 
	WHERE
		shop_id = 901 
	) tb1
	LEFT JOIN tb_product_info tb3 ON tb1.product_id = tb3.product_id 
WHERE
	shop_id = 901 
GROUP BY
	dt1;

SQL194 连续问答

某乎问答最大连续回答问题天数大于等于3天的用户及其对应等级.

drop table if exists author_tb;
CREATE TABLE author_tb(
author_id int(10) NOT NULL, 
author_level int(10) NOT NULL,
sex char(10) NOT NULL);
INSERT INTO author_tb VALUES(101 , 6, 'm');
INSERT INTO author_tb VALUES(102 , 1, 'f');
INSERT INTO author_tb VALUES(103 , 1, 'm');
INSERT INTO author_tb VALUES(104 , 3, 'm');
INSERT INTO author_tb VALUES(105 , 4, 'f');
INSERT INTO author_tb VALUES(106 , 2, 'f');
INSERT INTO author_tb VALUES(107 , 2, 'm');
INSERT INTO author_tb VALUES(108 , 5, 'f');
INSERT INTO author_tb VALUES(109 , 6, 'f');
INSERT INTO author_tb VALUES(110 , 5, 'm');

drop table if exists answer_tb;
CREATE TABLE answer_tb(
answer_date date NOT NULL, 
author_id int(10) NOT NULL,
issue_id char(10) NOT NULL,
char_len int(10) NOT NULL);
INSERT INTO answer_tb VALUES('2021-11-1', 101, 'E001' ,150);
INSERT INTO answer_tb VALUES('2021-11-1', 101, 'E002', 200);
INSERT INTO answer_tb VALUES('2021-11-1',102, 'C003' ,50);
INSERT INTO answer_tb VALUES('2021-11-1' ,103, 'P001', 35);
INSERT INTO answer_tb VALUES('2021-11-1', 104, 'C003', 120);
INSERT INTO answer_tb VALUES('2021-11-1' ,105, 'P001', 125);
INSERT INTO answer_tb VALUES('2021-11-1' , 102, 'P002', 105);
INSERT INTO answer_tb VALUES('2021-11-2',  101, 'P001' ,201);
INSERT INTO answer_tb VALUES('2021-11-2',  110, 'C002', 200);
INSERT INTO answer_tb VALUES('2021-11-2',  110, 'C001', 225);
INSERT INTO answer_tb VALUES('2021-11-2' , 110, 'C002', 220);
INSERT INTO answer_tb VALUES('2021-11-3', 101, 'C002', 180);
INSERT INTO answer_tb VALUES('2021-11-4' ,109, 'E003', 130);
INSERT INTO answer_tb VALUES('2021-11-4', 109, 'E001',123);
INSERT INTO answer_tb VALUES('2021-11-5', 108, 'C001',160);
INSERT INTO answer_tb VALUES('2021-11-5', 108, 'C002', 120);
INSERT INTO answer_tb VALUES('2021-11-5', 110, 'P001', 180);
INSERT INTO answer_tb VALUES('2021-11-5' , 106, 'P002' , 45);
INSERT INTO answer_tb VALUES('2021-11-5' , 107, 'E003', 56);

描述

现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):

author_id author_level sex
101 6 m
102 1 f
103 1 m
104 3 m
105 4 f
106 2 f
107 2 m
108 5 f
109 6 f
110 5 m

创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):

answer_date author_id issue_id char_len
2021-11-01 101 E001 150
2021-11-01 101 E002 200
2021-11-01 102 C003 50
2021-11-01 103 P001 35
2021-11-01 104 C003 120
2021-11-01 105 P001 125
2021-11-01 102 P002 105
2021-11-02 101 P001 201
2021-11-02 110 C002 200
2021-11-02 110 C001 225
2021-11-02 110 C002 220
2021-11-03 101 C002 180
2021-11-04 109 E003 130
2021-11-04 109 E001 123
2021-11-05 108 C001 160
2021-11-05 108 C002 120
2021-11-05 110 P001 180
2021-11-05 106 P002 45
2021-11-05 107 E003 56

请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下:

author_id author_level days_cnt
101 6 3

解答W

# dense_rank
+-----------+-------------+---+
| author_id | answer_date | r |
+-----------+-------------+---+
|       101 | 2021-11-01  | 1 |
|       101 | 2021-11-02  | 2 |
|       101 | 2021-11-03  | 3 |
|       102 | 2021-11-01  | 1 |
|       103 | 2021-11-01  | 1 |
|       104 | 2021-11-01  | 1 |
|       105 | 2021-11-01  | 1 |
|       106 | 2021-11-05  | 1 |
|       107 | 2021-11-05  | 1 |
|       108 | 2021-11-05  | 1 |
|       109 | 2021-11-04  | 1 |
|       110 | 2021-11-02  | 1 |
|       110 | 2021-11-05  | 2 |
+-----------+-------------+---+
13 rows in set (0.00 sec)
---

WITH tmp AS ( SELECT DISTINCT author_id, answer_date FROM answer_tb ),
tmp_a AS ( SELECT *, dense_rank() over ( PARTITION BY author_id ORDER BY answer_date ) AS r FROM tmp ),
tmp_b AS ( SELECT *, DATE_SUB( answer_date, INTERVAL r DAY ) AS sub FROM tmp_a ),
temp_c AS ( SELECT tmp_b.author_id, count(*) AS ic FROM tmp_b GROUP BY tmp_b.author_id, tmp_b.sub HAVING ic > 2 ) SELECT
tc.author_id,
AT.author_level,
tc.ic 
FROM
	temp_c AS tc
	JOIN author_tb AS AT ON AT.author_id = tc.author_id;
	
+-----------+--------------+----+
| author_id | author_level | ic |
+-----------+--------------+----+
|       101 |            6 |  3 |
+-----------+--------------+----+
1 row in set (0.00 sec)

SQL189 直播在线

牛客直播各科目同时在线人数

drop table if exists course_tb;
CREATE TABLE course_tb(
course_id int(10) NOT NULL, 
course_name char(10) NOT NULL,
course_datetime char(30) NOT NULL);
INSERT INTO course_tb VALUES(1, 'Python', '2021-12-1 19:00-21:00');
INSERT INTO course_tb VALUES(2, 'SQL', '2021-12-2 19:00-21:00');
INSERT INTO course_tb VALUES(3, 'R', '2021-12-3 19:00-21:00');

drop table if exists attend_tb;
CREATE TABLE attend_tb(
user_id int(10) NOT NULL, 
course_id int(10) NOT NULL,
in_datetime datetime NOT NULL,
out_datetime datetime NOT NULL
);
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:00:00', '2021-12-1 19:28:00');
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:30:00', '2021-12-1 19:53:00');
INSERT INTO attend_tb VALUES(101, 1, '2021-12-1 19:00:00', '2021-12-1 20:55:00');
INSERT INTO attend_tb VALUES(102, 1, '2021-12-1 19:00:00', '2021-12-1 19:05:00');
INSERT INTO attend_tb VALUES(104, 1, '2021-12-1 19:00:00', '2021-12-1 20:59:00');
INSERT INTO attend_tb VALUES(101, 2, '2021-12-2 19:05:00', '2021-12-2 20:58:00');
INSERT INTO attend_tb VALUES(102, 2, '2021-12-2 18:55:00', '2021-12-2 21:00:00');
INSERT INTO attend_tb VALUES(104, 2, '2021-12-2 18:57:00', '2021-12-2 20:56:00');
INSERT INTO attend_tb VALUES(107, 2, '2021-12-2 19:10:00', '2021-12-2 19:18:00');
INSERT INTO attend_tb VALUES(100, 3, '2021-12-3 19:01:00', '2021-12-3 21:00:00');
INSERT INTO attend_tb VALUES(102, 3, '2021-12-3 18:58:00', '2021-12-3 19:05:00');
INSERT INTO attend_tb VALUES(108, 3, '2021-12-3 19:01:00', '2021-12-3 19:56:00');

描述

牛客某页面推出了数据分析系列直播课程介绍。用户可以选择报名任意一场或多场直播课。

已知课程表course_tb如下(其中course_id代表课程编号,course_name表示课程名称,course_datetime代表上课时间):

course_id course_name course_datetime
1 Python 2021-12-1 19:00-21:00
2 SQL 2021-12-2 19:00-21:00
3 R 2021-12-3 19:00-21:00

上课情况表attend_tb如下(其中user_id表示用户编号、course_id代表课程编号、in_datetime表示进入直播间的时间、out_datetime表示离开直播间的时间):

user_id course_id in_datetime out_datetime
100 1 2021-12-01 19:00:00 2021-12-01 19:28:00
100 1 2021-12-01 19:30:00 2021-12-01 19:53:00
101 1 2021-12-01 19:00:00 2021-12-01 20:55:00
102 1 2021-12-01 19:00:00 2021-12-01 19:05:00
104 1 2021-12-01 19:00:00 2021-12-01 20:59:00
101 2 2021-12-02 19:05:00 2021-12-02 20:58:00
102 2 2021-12-02 18:55:00 2021-12-02 21:00:00
104 2 2021-12-02 18:57:00 2021-12-02 20:56:00
107 2 2021-12-02 19:10:00 2021-12-02 19:18:00
100 3 2021-12-03 19:01:00 2021-12-03 21:00:00
102 3 2021-12-03 18:58:00 2021-12-03 19:05:00
108 3 2021-12-03 19:01:00 2021-12-03 19:56:00

请你统计每个科目最大同时在线人数(按course_id排序),以上数据的输出结果如下:

course_id course_name max_num
1 Python 4
2 SQL 4
3 R 3

解答

课程的时间是一个误导项, 并不需要考虑这个时间(字符串类型的数据)

  1. attend中的数据, 进入和离开整合在一起, 形成了一个时间的序列, 通过判断这个时间序列的的人数的出和入来判断每个直播间的人数的变化. 因为通过窗口函数对时间序列按照课程编号进行了拆分, 也就是说这个partition中集合同一个course_id的进入和离开直播间数据.

  2. 通过对时间序列中的进, 标记为1, 离开, 则标记为 -1, 对每个partition中的flag项进行求和操作, 进行二次排序(降序)(确保time_index相同的情况, 如何处理这个节点的数据问题, 是先加还是先减的问题, 这里采用降序, 是进行先加的操作).

    这里这道题没有具体指明如何处理相同节时间节点的先后.

WITH a AS (
	SELECT
		user_id,
		course_id,
		in_datetime AS time_index,
		1 flag 
	FROM
		attend_tb UNION ALL
	SELECT
		user_id,
		course_id,
		out_datetime AS time_index,
		- 1 flag 
	FROM
		attend_tb 
	),
	b AS ( SELECT a.*, c.course_name FROM a LEFT JOIN course_tb c ON a.course_id = c.course_id ) SELECT
	*,
	sum( flag ) over ( PARTITION BY course_name ORDER BY time_index, flag DESC ) AS on_lines 
FROM
	b;
+---------+-----------+---------------------+------+-------------+----------+
| user_id | course_id | time_index          | flag | course_name | on_lines |
+---------+-----------+---------------------+------+-------------+----------+
|     100 |         1 | 2021-12-01 19:00:00 |    1 | Python      |        4 | # 每个课程的id区间, 就可以计算出人数的变化情况
|     101 |         1 | 2021-12-01 19:00:00 |    1 | Python      |        4 |
|     102 |         1 | 2021-12-01 19:00:00 |    1 | Python      |        4 |
|     104 |         1 | 2021-12-01 19:00:00 |    1 | Python      |        4 |
|     102 |         1 | 2021-12-01 19:05:00 |   -1 | Python      |        3 |
|     100 |         1 | 2021-12-01 19:28:00 |   -1 | Python      |        2 |
|     100 |         1 | 2021-12-01 19:30:00 |    1 | Python      |        3 |
|     100 |         1 | 2021-12-01 19:53:00 |   -1 | Python      |        2 |
|     101 |         1 | 2021-12-01 20:55:00 |   -1 | Python      |        1 |
|     104 |         1 | 2021-12-01 20:59:00 |   -1 | Python      |        0 | # 按照课程id 将数据聚合在一个区间
|     102 |         3 | 2021-12-03 18:58:00 |    1 | R           |        1 |
|     100 |         3 | 2021-12-03 19:01:00 |    1 | R           |        3 |
|     108 |         3 | 2021-12-03 19:01:00 |    1 | R           |        3 |
|     102 |         3 | 2021-12-03 19:05:00 |   -1 | R           |        2 |
|     108 |         3 | 2021-12-03 19:56:00 |   -1 | R           |        1 |
|     100 |         3 | 2021-12-03 21:00:00 |   -1 | R           |        0 |
|     102 |         2 | 2021-12-02 18:55:00 |    1 | SQL         |        1 |
|     104 |         2 | 2021-12-02 18:57:00 |    1 | SQL         |        2 |
|     101 |         2 | 2021-12-02 19:05:00 |    1 | SQL         |        3 |
|     107 |         2 | 2021-12-02 19:10:00 |    1 | SQL         |        4 |
|     107 |         2 | 2021-12-02 19:18:00 |   -1 | SQL         |        3 |
|     104 |         2 | 2021-12-02 20:56:00 |   -1 | SQL         |        2 |
|     101 |         2 | 2021-12-02 20:58:00 |   -1 | SQL         |        1 |
|     102 |         2 | 2021-12-02 21:00:00 |   -1 | SQL         |        0 |
+---------+-----------+---------------------+------+-------------+----------+
24 rows in set (0.00 sec)
WITH a AS ( SELECT course_id, in_datetime AS time_index, 1 flag FROM attend_tb UNION ALL SELECT course_id, out_datetime AS time_index, - 1 flag FROM attend_tb ),
b AS ( SELECT a.*, c.course_name FROM a LEFT JOIN course_tb c ON a.course_id = c.course_id ),
c AS ( SELECT course_id, course_name, sum( flag ) over ( PARTITION BY course_name ORDER BY time_index, flag DESC ) AS on_lines FROM b ) SELECT
course_id,
course_name,
max( on_lines ) AS max_num 
FROM
	c 
GROUP BY
	course_id,
	course_name 
ORDER BY
	course_id;

+-----------+-------------+---------+
| course_id | course_name | max_num |
+-----------+-------------+---------+
|         1 | Python      |       4 |
|         2 | SQL         |       4 |
|         3 | R           |       3 |
+-----------+-------------+---------+
3 rows in set (0.00 sec)

SQL184 顾客连续购买

某宝店铺连续2天及以上购物的用户及其对应的天数

drop table if exists sales_tb;
CREATE TABLE sales_tb(
sales_date date NOT NULL,
user_id int(10) NOT NULL,
item_id char(10) NOT NULL,
sales_num int(10) NOT NULL,
sales_price int(10) NOT NULL
);

INSERT INTO sales_tb VALUES('2021-11-1', 1, 'A001',  1, 90);
INSERT INTO sales_tb VALUES('2021-11-1', 2, 'A002',  2, 220);
INSERT INTO sales_tb VALUES('2021-11-1', 2, 'B001',  1, 120);
INSERT INTO sales_tb VALUES('2021-11-2', 3, 'C001',  2, 500);
INSERT INTO sales_tb VALUES('2021-11-2', 4, 'B001',  1, 120);
INSERT INTO sales_tb VALUES('2021-11-3', 5, 'C001',  1, 240);
INSERT INTO sales_tb VALUES('2021-11-3', 6, 'C002',  1, 270);
INSERT INTO sales_tb VALUES('2021-11-4', 7, 'A003',  1, 180);
INSERT INTO sales_tb VALUES('2021-11-4', 8, 'B002',  1, 140);
INSERT INTO sales_tb VALUES('2021-11-4', 9, 'B001',  1, 125);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'B003',  1, 120);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'B004',  1, 150);
INSERT INTO sales_tb VALUES('2021-11-5', 10, 'A003',  1, 180);
INSERT INTO sales_tb VALUES('2021-11-6', 11, 'B003',  1, 120);
INSERT INTO sales_tb VALUES('2021-11-6', 10, 'B004',  1, 150);

# 插入多一行测试数据
INSERT INTO sales_tb VALUES('2021-11-8', 10, 'B004',  1, 150); # 8

INSERT INTO sales_tb VALUES('2021-11-9', 10, 'B004',  1, 150);

描述

11月结束后,小牛同学需要对其在某宝的网店就11月份用户交易情况和产品情况进行分析以更好的经营小店。

11月份销售数据表sales_tb如下(其中,sales_date表示销售日期,user_id指用户编号,item_id指货号,sales_num表示销售数量,sales_price表示结算金额):

sales_date user_id item_id sales_num sales_price
2021-11-01 1 A001 1 90
2021-11-01 2 A002 2 220
2021-11-01 2 B001 1 120
2021-11-02 3 C001 2 500
2021-11-02 4 B001 1 120
2021-11-03 5 C001 1 240
2021-11-03 6 C002 1 270
2021-11-04 7 A003 1 180
2021-11-04 8 B002 1 140
2021-11-04 9 B001 1 125
2021-11-05 10 B003 1 120
2021-11-05 10 B004 1 150
2021-11-05 10 A003 1 180
2021-11-06 11 B003 1 120
2021-11-06 10 B004 1 150

请你统计连续2天以上在该店铺购物的用户及其对应的次数(若有多个用户,按user_id升序排序),以上例子的输出结果如下:

user_id days_count
10 2

解答W

# 用于处理这种非指定区间的跨行
WITH tmp AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
tmp_a AS ( SELECT *, dense_rank() over ( PARTITION BY user_id ORDER BY sales_date ) AS r FROM tmp ),
tmp_b AS ( SELECT *, DATE_SUB( sales_date, INTERVAL r DAY ) AS sub FROM tmp_a ) SELECT
tmp_b.user_id,
count(*) AS c 
FROM
	tmp_b 
GROUP BY
	tmp_b.user_id,
	tmp_b.sub;

# 加入了上面两行作为测试
+---------+---+
| user_id | c |
+---------+---+
|       1 | 1 |
|       2 | 1 |
|       3 | 1 |
|       4 | 1 |
|       5 | 1 |
|       6 | 1 |
|       7 | 1 |
|       8 | 1 |
|       9 | 1 |
|      10 | 2 |
|      10 | 2 |
|      11 | 1 |
+---------+---+
12 rows in set (0.00 sec)

滑动窗口在处理这道题所遇到的各种问题问题.

# 这里只是简单列出发生过连续两天的数据
# 假如希望取到最长连续数据的用户?(不中断, 而1,2,3..; 1,3,5不行)
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
	SELECT
		user_id,                                                                     # 受限于1 (N) 
		count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN INTERVAL 1 DAY preceding AND current ROW ) AS ic 
	FROM
		a 
	) SELECT
	user_id,
	ic 
FROM
	b 
WHERE
	ic > 1;
# 
+---------+----+
| user_id | ic |
+---------+----+
|      10 |  2 |
+---------+----+
2 rows in set (0.00 sec)

# 插入多一行的测试数据
+---------+----+
| user_id | ic |
+---------+----+
|      10 |  2 |
|      10 |  2 |
+---------+----+
2 rows in set (0.00 sec)
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
	SELECT
		user_id,
		count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN unbounded preceding AND current ROW ) AS ic 
	FROM
		a 
	) SELECT
	user_id,
	max( ic ) AS mic 
FROM
	b 
GROUP BY
	user_id 
HAVING
	mic > 1;
	
+---------+------+
| user_id | mic  |
+---------+------+
|      10 |    3 |
+---------+------+
1 row in set (0.00 sec)
# 就无法控制间隔为 1
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS (
	SELECT
		user_id,
		sales_date,
		count( user_id ) over ( PARTITION BY user_id ORDER BY sales_date RANGE BETWEEN unbounded preceding AND current ROW ) AS ic 
	FROM
		a 
	) 
	select * from b;
	
+---------+------------+----+
| user_id | sales_date | ic |
+---------+------------+----+
|       1 | 2021-11-01 |  1 |
|       1 | 2021-11-04 |  2 | # 不连续
|       2 | 2021-11-01 |  1 |
|       3 | 2021-11-02 |  1 |
|       4 | 2021-11-02 |  1 |
|       5 | 2021-11-03 |  1 |
|       6 | 2021-11-03 |  1 |
|       7 | 2021-11-04 |  1 |
|       8 | 2021-11-04 |  1 |
|       9 | 2021-11-04 |  1 |
|      10 | 2021-11-05 |  1 |
|      10 | 2021-11-06 |  2 |
|      10 | 2021-11-07 |  3 |
|      11 | 2021-11-06 |  1 |
+---------+------------+----+
14 rows in set (0.00 sec)
# 对间隔进行判断是否是隔天
# 还是存在问题
WITH a AS ( SELECT DISTINCT user_id, sales_date FROM sales_tb ),
b AS ( SELECT user_id, sales_date, LEAD( sales_date ) over ( PARTITION BY user_id ORDER BY sales_date ) AS next_date FROM a ),
c AS ( SELECT *, next_date - sales_date AS gap FROM b having next_date IS NOT NULL and gap = 1 ) SELECT
user_id,
sum( gap ) + 1 AS ic 
FROM
	c 
GROUP BY
	user_id;
+---------+------+
| user_id | ic   |
+---------+------+
|      10 |    3 |
+---------+------+
1 row in set (0.00 sec)

mysql> select * from sales_tb;
+------------+---------+---------+-----------+-------------+
| sales_date | user_id | item_id | sales_num | sales_price |
+------------+---------+---------+-----------+-------------+
| 2021-11-01 |       1 | A001    |         1 |          90 |
| 2021-11-01 |       2 | A002    |         2 |         220 |
| 2021-11-01 |       2 | B001    |         1 |         120 |
| 2021-11-02 |       3 | C001    |         2 |         500 |
| 2021-11-02 |       4 | B001    |         1 |         120 |
| 2021-11-03 |       5 | C001    |         1 |         240 |
| 2021-11-03 |       6 | C002    |         1 |         270 |
| 2021-11-04 |       7 | A003    |         1 |         180 |
| 2021-11-04 |       8 | B002    |         1 |         140 |
| 2021-11-04 |       9 | B001    |         1 |         125 |
| 2021-11-05 |      10 | B003    |         1 |         120 |
| 2021-11-05 |      10 | B004    |         1 |         150 | # 这里对数据进行修改 原来的 5 - 6
| 2021-11-05 |      10 | A003    |         1 |         180 | # 第一次改成 5 - 6- 7
| 2021-11-06 |      11 | B003    |         1 |         120 |
| 2021-11-06 |      10 | B004    |         1 |         150 |
| 2021-11-08 |      10 | B004    |         1 |         150 | # 5-6
| 2021-11-09 |      10 | B004    |         1 |         150 | # 这里对数据进行修改 5 - 6, 8-9
+------------+---------+---------+-----------+-------------+

SQL179 同时等车

各城市最大同时等车人数

DROP TABLE IF EXISTS tb_get_car_record,tb_get_car_order;
CREATE TABLE tb_get_car_record (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    city VARCHAR(10) NOT NULL COMMENT '城市',
    event_time datetime COMMENT '打车时间',
    end_time datetime COMMENT '打车结束时间',
    order_id INT COMMENT '订单号'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_get_car_order (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    order_id INT NOT NULL COMMENT '订单号',
    uid INT NOT NULL COMMENT '用户ID',
    driver_id INT NOT NULL COMMENT '司机ID',
    order_time datetime COMMENT '接单时间',
    start_time datetime COMMENT '开始计费的上车时间',
    finish_time datetime COMMENT '订单结束时间',
    mileage FLOAT COMMENT '行驶里程数',
    fare FLOAT COMMENT '费用',
    grade TINYINT COMMENT '评分'
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_get_car_record(uid, city, event_time, end_time, order_id) VALUES
 (108, '北京', '2021-10-20 08:00:00', '2021-10-20 08:00:40', 9008),
 (108, '北京', '2021-10-20 08:00:10', '2021-10-20 08:00:45', 9018),
 (102, '北京', '2021-10-20 08:00:30', '2021-10-20 08:00:50', 9002),
 (106, '北京', '2021-10-20 08:05:41', '2021-10-20 08:06:00', 9006),
 (103, '北京', '2021-10-20 08:05:50', '2021-10-20 08:07:10', 9003),
 (104, '北京', '2021-10-20 08:01:01', '2021-10-20 08:01:20', 9004),
 (103, '北京', '2021-10-20 08:01:15', '2021-10-20 08:01:30', 9019),
 (101, '北京', '2021-10-20 08:28:10', '2021-10-20 08:30:00', 9011);

INSERT INTO tb_get_car_order(order_id, uid, driver_id, order_time, start_time, finish_time, mileage, fare, grade) VALUES
 (9008, 108, 204, '2021-10-20 08:00:40', '2021-10-20 08:03:00', '2021-10-20 08:31:00', 13.2, 38, 4),
 (9018, 108, 214, '2021-10-20 08:00:45', '2021-10-20 08:04:50', '2021-10-20 08:21:00', 14, 38, 5),
 (9002, 102, 202, '2021-10-20 08:00:50', '2021-10-20 08:06:00', '2021-10-20 08:31:00', 10.0, 41.5, 5),
 (9006, 106, 203, '2021-10-20 08:06:00', '2021-10-20 08:09:00', '2021-10-20 08:31:00', 8.0, 25.5, 4),
 (9003, 103, 202, '2021-10-20 08:07:10', '2021-10-20 08:15:00', '2021-10-20 08:31:00', 11.0, 41.5, 4),
 (9004, 104, 202, '2021-10-20 08:01:20', '2021-10-20 08:13:00', '2021-10-20 08:31:00', 7.5, 22, 4),
 (9019, 103, 202, '2021-10-20 08:01:30', '2021-10-20 08:11:00', '2021-10-20 08:51:00', 10, 39, 4),
 (9011, 101, 211, '2021-10-20 08:30:00', '2021-10-20 08:31:00', '2021-10-20 08:54:00', 10, 35, 5);

描述

用户打车记录表tb_get_car_record

id uid city event_time end_time order_id
1 108 北京 2021-10-20 08:00:00 2021-10-20 08:00:40 9008
2 118 北京 2021-10-20 08:00:10 2021-10-20 08:00:45 9018
3 102 北京 2021-10-20 08:00:30 2021-10-20 08:00:50 9002
4 106 北京 2021-10-20 08:05:41 2021-10-20 08:06:00 9006
5 103 北京 2021-10-20 08:05:50 2021-10-20 08:07:10 9003
6 104 北京 2021-10-20 08:01:01 2021-10-20 08:01:20 9004
7 105 北京 2021-10-20 08:01:15 2021-10-20 08:01:30 9019
8 101 北京 2021-10-20 08:28:10 2021-10-20 08:30:00 9011

(uid-用户ID, city-城市, event_time-打车时间, end_time-打车结束时间, order_id-订单号)

打车订单表tb_get_car_order

id order_id uid driver_id order_time start_time finish_time mileage fare grade
1 9008 108 204 2021-10-20 08:00:40 2021-10-20 08:03:00 2021-10-20 08:31:00 13.2 38 4
2 9018 108 214 2021-10-20 08:00:45 2021-10-20 08:04:50 2021-10-20 08:21:00 14 38 5
3 9002 102 202 2021-10-20 08:00:50 2021-10-20 08:06:00 2021-10-20 08:31:00 10 41.5 5
4 9006 106 206 2021-10-20 08:06:00 2021-10-20 08:09:00 2021-10-20 08:31:00 8 25.5 4
5 9003 103 203 2021-10-20 08:07:10 2021-10-20 08:15:00 2021-10-20 08:31:00 11 41.5 4
6 9004 104 204 2021-10-20 08:01:20 2021-10-20 08:13:00 2021-10-20 08:31:00 7.5 22 4
7 9019 105 205 2021-10-20 08:01:30 2021-10-20 08:11:00 2021-10-20 08:51:00 10 39 4
8 9011 101 211 2021-10-20 08:30:00 2021-10-20 08:31:00 2021-10-20 08:54:00 10 35 5

(order_id-订单号, uid-用户ID, driver_id-司机ID, order_time-接单时间, start_time-开始计费的上车时间, finish_time-订单完成时间, mileage-行驶里程数, fare-费用, grade-评分)

场景逻辑说明

  • 用户提交打车请求后,在用户打车记录表生成一条打车记录,订单号-order_id设为null
  • 当有司机接单时,在打车订单表生成一条订单,填充接单时间-order_time及其左边的字段,上车时间及其右边的字段全部为null,并把订单号和接单时间(打车结束时间)写入打车记录表;若一直无司机接单、超时或中途用户主动取消打车,则记录打车结束时间。
  • 若乘客上车前,乘客或司机点击取消订单,会将打车订单表对应订单的订单完成时间-finish_time填充为取消时间,其余字段设为null
  • 当司机接上乘客时,填充打车订单表中该订单的上车时间start_time
  • 当订单完成时填充订单完成时间、里程数、费用;评分设为null,在用户给司机打1~5星评价后填充。

问题:请统计各个城市在2021年10月期间,单日中最大的同时等车人数。

: 等车指从开始打车起,直到取消打车、取消等待或上车前的这段时间里用户的状态。

如果同一时刻有人停止等车,有人开始等车,等车人数记作先增加后减少。

结果按各城市最大等车人数升序排序,相同时按城市升序排序。

输出示例

示例结果如下

city max_wait_uv
北京 5

解释:由打车订单表可以得知北京2021年10月20日有8条打车记录,108号乘客从08:00:00等到08:03:00,118号乘客从08:00:10等到08:04:50....,由此得知08:02:00秒时刻,共有5人在等车。

解答W

# 将打车开始的时间和订单执行时间合并在一起
# 分别标记 1, -1
# 按照时间进行排列
# 显然当所有的时间都聚集在一起, 利用sum() (order by) 可以得到各个阶段的累积情况
WITH a AS (
	SELECT
		city,
		event_time as time_index,
		1 as status
	FROM
		tb_get_car_record UNION ALL
	SELECT
		city,
    	# 这一步的目的在于保证数据存在, 订单表中的finish_time 优先于 记录表中的end_time
		IFNULL( start_time, finish_time ) as time_index,
		- 1 as status
	FROM
		tb_get_car_order
		JOIN tb_get_car_record USING ( order_id ) 
	),
	b AS (
	SELECT
		city,
		time_index,
		SUM( status ) OVER ( PARTITION BY city, day(time_index) ORDER BY time_index, status DESC ) as sum_wait_num 
	FROM
		a 
	) SELECT
	city,
	MAX( sum_wait_num ) AS max_wait_uv 
FROM
	b 
WHERE
	date_format( time_index, '%Y-%m' ) = '2021-10'
GROUP BY
	city 
ORDER BY
	max_wait_uv,
	city;
+--------+-------------+
| city   | max_wait_uv |
+--------+-------------+
| 北京   |           5 |
+--------+-------------+
1 row in set (0.00 sec)
# 错误的思路, 错误进行最小值的比较, 这个只能得到区间的数据而不是全局(当天的最大值)
WITH a AS (
	SELECT
		t2.event_time,
		date ( event_time ) AS s_day,
	IF
		( t2.end_time = t1.order_time, t1.start_time, t2.end_time ) AS start_time,
		t2.city 
	FROM
		tb_get_car_order AS t1
		LEFT JOIN tb_get_car_record t2 ON t2.order_id = t1.order_id 
	WHERE
		date_format( t2.event_time, '%Y-%m' ) = '2021-10' 
	),
	b AS ( SELECT *, min( start_time ) over ( PARTITION BY city, s_day ) AS ms FROM a ),
	e AS ( SELECT city, sum( IF ( event_time > ms, 0, 1 ) ) AS ic FROM b GROUP BY city, s_day ) SELECT
	city,
	max( ic ) AS max_wait_uv 
FROM
	e 
GROUP BY
	city 
ORDER BY
	max_wait_uv,
	city;

相关函数

上述内容常用的函数.

日期相关

# 增加
mysql> select date_add("2021-01-01", interval 1 day);
+----------------------------------------+
| date_add("2021-01-01", interval 1 day) |
+----------------------------------------+
| 2021-01-02                             |
+----------------------------------------+
1 row in set (0.00 sec)

# 减少
mysql> select date_sub("2021-01-01", interval 1 day);
+----------------------------------------+
| date_sub("2021-01-01", interval 1 day) |
+----------------------------------------+
| 2020-12-31                             |
+----------------------------------------+
1 row in set (0.00 sec)

# 时间间隔的具体数值(单位, 年月日时分秒等)
mysql> select TIMESTAMPDIFF(second, '2022-01-01 19:00:52', '2022-01-01 19:00:57');
+---------------------------------------------------------------------+
| TIMESTAMPDIFF(second, '2022-01-01 19:00:52', '2022-01-01 19:00:57') |
+---------------------------------------------------------------------+
|                                                                   5 |
+---------------------------------------------------------------------+

# 间隔的天数(只限于day, 天数)
mysql> select datediff('2022-01-01 19:00:52', '2022-01-01 19:00:57');
+--------------------------------------------------------+
| datediff('2022-01-01 19:00:52', '2022-01-01 19:00:57') |
+--------------------------------------------------------+
|                                                      0 |
+--------------------------------------------------------+
1 row in set (0.00 sec)
1 row in set (0.00 sec)

# 只保留日期, 去掉时间
mysql> select date("2021-01-01 12:01:01");
+-----------------------------+
| date("2021-01-01 12:01:01") |
+-----------------------------+
| 2021-01-01                  |
+-----------------------------+
1 row in set (0.00 sec)

# 日期格式化
mysql> select date_format("2021-01-01 12:01:01", '%Y%m');
+--------------------------------------------+
| date_format("2021-01-01 12:01:01", '%Y%m') |
+--------------------------------------------+
| 202101                                     |
+--------------------------------------------+
1 row in set (0.00 sec)

# 字符串转日期/时间
mysql> SELECT STR_TO_DATE("August 10 2017", "%M %d %Y");
+-------------------------------------------+
| STR_TO_DATE("August 10 2017", "%M %d %Y") |
+-------------------------------------------+
| 2017-08-10                                |
+-------------------------------------------+
1 row in set (0.00 sec)

取值相关

# 处理null值, ifnull()的强化版本
# coalesce, 返回第一个不是null的值
mysql> select coalesce(null, 2, 1);
+----------------------+
| coalesce(null, 2, 1) |
+----------------------+
|                    2 |
+----------------------+
1 row in set (0.00 sec)

mysql> select coalesce(null, null, 1);
+-------------------------+
| coalesce(null, null, 1) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select coalesce(1, 1, null);
+----------------------+
| coalesce(1, 1, null) |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

mysql> select coalesce(1, 2, null);
+----------------------+
| coalesce(1, 2, null) |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)
mysql> select ifnull(null, 1);
+-----------------+
| ifnull(null, 1) |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.00 sec)

mysql> select ifnull(1, 2);
+--------------+
| ifnull(1, 2) |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)