SQL几道题回忆一下
各个视频的平均完播率
两张表:
用户-视频互动表tb_user_video_log:记录视频的观看开始,结束事件,视频id号。
短视频信息表tb_video_info:视频的id,时长,发布时间。
问题:找2021年里有播放记录的每个视频的完播率,保留三位小数,并按完播率降序排序。
步骤:
首先,要join表
SELECT a.video_id, ==xxx 1==
AS avg_comp_play_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
==xxx 2==
ORDER BY avg_comp_play_rate DESC;
然后问题就是1和2处该怎么写。由于要统计每个视频的完播率,所以要将每个视频的数据汇总在一起。
GROUP BY语句将一个或者多个列对结果集进行分组,在分组的列上可以使用COUNT,SUM,AVG等函数。
SELECT a.video_id, round(sum(视频看完的个数)/count(end_time),3)
AS avg_comp_play_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY a.video_id
ORDER BY avg_comp_play_rate DESC;
其中,count(end_time)
和count(start_time)
都无所谓。
round(xxx,3)
保留三位小数。
IF (expr1,expr2,expr3),如果expr1的值为true,则返回expr2的值,如果expr1的值为false, 则返回expr3的值。
所以:
SELECT a.video_id, round(sum(if(timestampdiff(second,start_time,end_time)>=duration,1,0))/count(end_time),3)
AS avg_comp_play_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY a.video_id
ORDER BY avg_comp_play_rate DESC;
平均播放进度大于60%的视频类别
原来时间相减一直结果不对,要用timestampdiff。
SELECT tag,concat(round(sum(timestampdiff(second,start_time,end_time)/duration)/count(start_time)*100,2),'%')
AS avg_play_progress
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY tag
ORDER BY avg_play_progress DESC;
但是还有两个问题,一个是播放时长超过duration问题,一个是60%问题,按照这样的拼接形式,无法比较大小。
HAVING WHERE区别:where在分组前过滤不符合条件的行,HAVING过滤分组后的组。
结果:
SELECT tag,concat(round(avg_play_progress,2),'%') AS avg_play_progress
FROM
(
SELECT tag, sum(
if(timestampdiff(second,start_time,end_time)>duration,1,
timestampdiff(second,start_time,end_time)/duration
))/count(start_time)*100
AS avg_play_progress
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY tag
HAVING avg_play_progress > 60
ORDER BY avg_play_progress DESC
) AS c;
每类视频近一个月的转发量
这个是当前时间的:
SELECT tag,
SUM(if_retweet) as retweet_cut,
round(SUM(if_retweet)/count(if_retweet),3) as retweet_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
WHERE timestampdiff(day, now(),start_time)<=30
GROUP BY tag
ORDER BY retweet_rate DESC;
题目要求是有用户互动的一个月内。要用max,需要用到子查询,不然没法用。
SELECT tag,
SUM(if_retweet) as retweet_cut,
round(SUM(if_retweet)/count(if_retweet),3) as retweet_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
WHERE start_time > (
SELECT DATE_SUB(MAX(start_time),INTERVAL 30 DAY) FROM tb_user_video_log
)
GROUP BY tag
ORDER BY retweet_rate DESC
每个创作者每月的涨粉率及截止当前的总粉丝量
如果只是每个月新增的粉丝量:
SELECT author,
DATE_FORMAT(end_time,"%Y-%m") as month,
round(SUM(if(if_follow=2,-1,if_follow))/count(*),3) as fans_growth_rate,
SUM(if(if_follow=2,-1,if_follow)) as total_fans
FROM tb_user_video_log a LEFT JOIN tb_video_info b
ON a.video_id = b.video_id WHERE year(a.start_time)=2021
GROUP BY author,month
ORDER BY author,total_fans;
需要用到累计求和:sum over。over(partition by)只是单纯的将同一个分类里的数值相加,但是如果后面跟上order by或者order by desc的话,前面的sum()值便会在同一个组内按照排名进行数值累加,
SELECT author,
DATE_FORMAT(end_time,"%Y-%m") as month,
round(SUM(if(if_follow=2,-1,if_follow))/count(*),3) as fans_growth_rate,
sum(SUM(if(if_follow=2,-1,if_follow))) over (partition by author order by DATE_FORMAT(end_time,"%Y-%m")) as total_fans
FROM tb_user_video_log a LEFT JOIN tb_video_info b
ON a.video_id = b.video_id WHERE year(a.start_time)=2021
GROUP BY author,month
ORDER BY author,total_fans;
==几道面试题==
查找student表里重名的学生。
SELECT id,name
FROM student
WHERE name in (
SELECT name FROM student group by name HAVING COUNT(*)>1
) as c ORDER BY name;
查找course表中平均分不及格的学生,列出id和平均分
SELECT id,AVG(score) as avg_score
FROM course
GROUP BY id
HAVING avg_score<60;
在course表中查询每门课成绩都不低于80的学生id
SELECT id
NOT IN (
SELECT DISTINCT(id)
FROM course
WHERE score<80
) AS tmp;