Analysis of Douban Movies Based on Regression and Clustering
First published: 2023-01-12
Abstract: Audience size, viewing habits, and movie genre are closely related to movie quality, so this paper applies big-data techniques to data crawled from the official Douban website. Linear regression and heat maps are used to examine the relationships among movie rating, release year, and number of raters in the Douban movie dataset, which helps identify the factors that influence movie quality. A collaborative filtering algorithm processes the viewing records of multiple users and recommends a best-suited movie list for each of them. In addition, the k-means clustering algorithm merges the many movie genres into a few clusters, which greatly simplifies the collaborative filtering step.
Keywords: data mining; movie quality; recommendation list; linear regression; heat map analysis; collaborative filtering algorithm
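The abstract does not give implementation details, so the following is only a minimal sketch of how genre clusters could simplify user-based collaborative filtering. It assumes the genre-to-cluster mapping has already been produced (for example by k-means); the movie titles, genres, ratings, and the function names `cluster_profile`, `cosine`, and `recommend` are all hypothetical and are not taken from the paper.

```python
from math import sqrt

# Hypothetical genre -> cluster assignment, standing in for a k-means result.
GENRE_CLUSTER = {
    "action": 0, "thriller": 0,
    "romance": 1, "drama": 1,
    "animation": 2, "family": 2,
}

# Hypothetical catalogue: movie title -> genre.
MOVIE_GENRE = {
    "A1": "action", "A2": "thriller",
    "R1": "romance", "R2": "drama",
    "F1": "animation", "F2": "family",
}

# Hypothetical viewing records: user -> {movie title: rating}.
RATINGS = {
    "alice": {"A1": 5, "R1": 2},
    "bob": {"A1": 5, "A2": 5, "R2": 1},
    "carol": {"R1": 5, "R2": 5, "A2": 1, "F1": 3},
}


def cluster_profile(ratings, n_clusters=3):
    """Return a user's mean rating per genre cluster (0.0 where unrated)."""
    totals = [0.0] * n_clusters
    counts = [0] * n_clusters
    for title, rating in ratings.items():
        cluster = GENRE_CLUSTER[MOVIE_GENRE[title]]
        totals[cluster] += rating
        counts[cluster] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, k=2):
    """Rank movies the user has not rated by similarity-weighted mean rating."""
    seen = RATINGS[user]
    me = cluster_profile(seen)
    weighted_sum, weight_total = {}, {}
    for other, their_ratings in RATINGS.items():
        if other == user:
            continue
        sim = cosine(me, cluster_profile(their_ratings))
        if sim <= 0:
            continue
        for title, rating in their_ratings.items():
            if title in seen:
                continue
            weighted_sum[title] = weighted_sum.get(title, 0.0) + sim * rating
            weight_total[title] = weight_total.get(title, 0.0) + sim
    ranked = sorted(
        weighted_sum,
        key=lambda t: weighted_sum[t] / weight_total[t],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(recommend("alice"))
```

Comparing users on a short per-cluster vector rather than on every individual genre keeps the similarity computation small, which is one plausible reading of how clustering "simplifies" the collaborative filtering step.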