SQL几道题回忆一下


各个视频的平均完播率

两张表

用户-视频互动表tb_user_video_log:记录视频的观看开始,结束事件,视频id号。

image-20220329151940127

短视频信息表tb_video_info:视频的id,时长,发布时间。

image-20220329151959924

问题:找2021年里有播放记录的每个视频的完播率,保留三位小数,并按完播率降序排序。

image-20220329152009748

步骤:

首先,要join表

SELECT a.video_id, ==xxx 1==
    AS avg_comp_play_rate
FROM tb_user_video_log a 
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
==xxx 2==
ORDER BY avg_comp_play_rate DESC;

然后问题就是1和2处该怎么写。由于要统计每个视频的完播率,所以要将每个视频的数据汇总在一起。

GROUP BY语句将一个或者多个列对结果集进行分组,在分组的列上可以使用COUNT,SUM,AVG等函数。

SELECT a.video_id, round(sum(视频看完的个数)/count(end_time),3)
    AS avg_comp_play_rate
FROM tb_user_video_log a 
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY a.video_id
ORDER BY avg_comp_play_rate DESC;

其中,count(end_time)count(start_time)都无所谓。

round(xxx,3)保留三位小数。

IF (expr1,expr2,expr3),如果expr1的值为true,则返回expr2的值,如果expr1的值为false, 则返回expr3的值。

所以:

SELECT a.video_id, round(sum(if(timestampdiff(second,start_time,end_time)>=duration,1,0))/count(end_time),3)
    AS avg_comp_play_rate
FROM tb_user_video_log a 
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY a.video_id
ORDER BY avg_comp_play_rate DESC;

平均播放进度大于60%的视频类别

原来时间相减一直结果不对,要用timestampdiff。

SELECT tag,concat(round(sum(timestampdiff(second,start_time,end_time)/duration)/count(start_time)*100,2),'%')
    AS avg_play_progress
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
GROUP BY tag
ORDER BY avg_play_progress DESC;

但是还有两个问题,一个是播放时长超过duration问题,一个是60%问题,按照这样的拼接形式,无法比较大小。

HAVING WHERE区别:where在分组前过滤不符合条件的行,HAVING过滤分组后的组。

结果:

SELECT tag,concat(round(avg_play_progress,2),'%') AS avg_play_progress
FROM 
    (
    SELECT tag, sum(
        if(timestampdiff(second,start_time,end_time)>duration,1,
           timestampdiff(second,start_time,end_time)/duration
            ))/count(start_time)*100
    AS avg_play_progress
    FROM tb_user_video_log a
    LEFT JOIN tb_video_info b
    ON a.video_id = b.video_id
    GROUP BY tag
    HAVING avg_play_progress > 60
    ORDER BY avg_play_progress DESC
) AS c;

每类视频近一个月的转发量

这个是当前时间的:

SELECT tag, 
    SUM(if_retweet) as retweet_cut, 
    round(SUM(if_retweet)/count(if_retweet),3) as retweet_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id
WHERE timestampdiff(day, now(),start_time)<=30
GROUP BY tag
ORDER BY retweet_rate DESC;

题目要求是有用户互动的一个月内。要用max,需要用到子查询,不然没法用。

SELECT tag, 
    SUM(if_retweet) as retweet_cut, 
round(SUM(if_retweet)/count(if_retweet),3) as retweet_rate
FROM tb_user_video_log a
LEFT JOIN tb_video_info b
ON a.video_id = b.video_id

WHERE start_time > (
    SELECT DATE_SUB(MAX(start_time),INTERVAL 30 DAY) FROM tb_user_video_log
)


GROUP BY tag
ORDER BY retweet_rate DESC

每个创作者每月的涨粉率及截止当前的总粉丝量

如果只是每个月新增的粉丝量:

SELECT author,
    DATE_FORMAT(end_time,"%Y-%m") as month,
    round(SUM(if(if_follow=2,-1,if_follow))/count(*),3) as fans_growth_rate,
    SUM(if(if_follow=2,-1,if_follow)) as total_fans
FROM tb_user_video_log a LEFT JOIN tb_video_info b
ON a.video_id = b.video_id WHERE year(a.start_time)=2021
GROUP BY author,month
ORDER BY author,total_fans;

需要用到累计求和:sum over。over(partition by)只是单纯的将同一个分类里的数值相加,但是如果后面跟上order by或者order by desc的话,前面的sum()值便会在同一个组内按照排名进行数值累加,

SELECT author,
    DATE_FORMAT(end_time,"%Y-%m") as month,
    round(SUM(if(if_follow=2,-1,if_follow))/count(*),3) as fans_growth_rate,
    sum(SUM(if(if_follow=2,-1,if_follow))) over (partition by author order by DATE_FORMAT(end_time,"%Y-%m")) as total_fans
FROM tb_user_video_log a LEFT JOIN tb_video_info b
ON a.video_id = b.video_id WHERE year(a.start_time)=2021
GROUP BY author,month
ORDER BY author,total_fans;

==几道面试题==

查找student表里重名的学生。

SELECT id,name 
FROM student
WHERE name in (
    SELECT name FROM student group by name HAVING COUNT(*)>1
) as c ORDER BY name;

查找course表中平均分不及格的学生,列出id和平均分

SELECT id,AVG(score) as avg_score
FROM course 
GROUP BY id
HAVING avg_score<60;

在course表中查询每门课成绩都不低于80的学生id

SELECT id 
NOT IN (
    SELECT DISTINCT(id)
    FROM course
    WHERE score<80
) AS tmp;