Research on a Data-Mining-Based Algorithm for Identifying and Analyzing Malicious Network Traffic (paper, 12,000 characters)
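Abstract: As Internet applications become more widespread, Internet technology is developing ever faster. The Internet has made daily life more convenient, and people benefit greatly from it. At the same time, network security problems keep emerging and new attacks are on the rise, posing a serious threat to people's lives. In the era of big data in particular, network security is drawing more and more attention.
Network security has been studied for a long time, and approaches to handling security problems have been continually updated as technology has advanced. Building on earlier work, researchers have achieved a great deal, but problems such as low efficiency and low detection rates remain. Today, large volumes of data must be processed, and the data arrive at high speed, so malicious network traffic detection needs further in-depth study.
To improve the performance of malicious network traffic detection algorithms, this paper takes the detection rate and the false alarm rate as its starting point and looks for an algorithm that raises the former while lowering the latter. First, background on malicious network traffic is reviewed. Next, the application of data mining to malicious network traffic detection is analyzed. Three K-means algorithms are then analyzed and compared; on that basis, the bisecting K-means algorithm is improved to reduce algorithmic complexity, and the improved algorithm is used to build a model of normal network behavior. Finally, the TCM-KNN algorithm is studied, and its approach is applied to detect malicious traffic in the data under test.
The KDDCup99 data set is used to validate the algorithm experimentally. The data set is first preprocessed, and detection experiments are then run on it. The results show that, compared with traditional detection algorithms, TCM-KNN maintains a high detection rate while keeping the false alarm rate low when performing anomaly detection, which substantially improves performance.
Keywords: data mining; malicious network traffic detection; improved bisecting K-means algorithm; TCM-KNN algorithm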
Research on a Data-Mining-Based Algorithm for Identifying and Analyzing Malicious Network Traffic
Abstract: As Internet use grows, Internet technology is developing rapidly. The Internet has made daily life more convenient and brought people many benefits. At the same time, network security problems keep emerging and new attacks are steadily increasing, posing a serious threat to people's lives. In the era of big data in particular, network security is attracting more and more attention.
Network security has been studied for a long time, and methods for handling security problems have been continually updated as technology has advanced. Building on earlier results, researchers have made great progress, but problems such as low efficiency and low detection rates remain. Today the data to be processed are both large in volume and fast-moving, so network anomaly detection needs further study.
This paper aims to build the normal behavior model more efficiently, raise the detection rate, and reduce the false positive rate. First, background on malicious network traffic is reviewed. Next, the application of data mining to malicious network traffic detection is analyzed. Three K-means algorithms are then analyzed and compared, and a bisecting K-means algorithm modified with a feature attribute index is used to establish the normal behavior characteristics of the training set. Finally, the TCM-KNN algorithm is studied and used to check the data under test against these normal behavior characteristics, completing the anomaly detection.
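As a rough illustration of the TCM-KNN detection step mentioned above, the following Python sketch scores a sample by the sum of distances to its k nearest neighbors in a normal-behavior training set and converts that score into a p-value. It assumes NumPy, Euclidean distance, and a simplified scheme in which the training scores are computed once rather than recomputed for every test sample. The function names, `k`, and the threshold `tau` are illustrative choices and are not taken from the paper.

```python
import numpy as np


def knn_strangeness(point, reference, k, skip_self=False):
    """Sum of distances from `point` to its k nearest neighbours in `reference`."""
    dists = np.sort(np.linalg.norm(reference - point, axis=1))
    if skip_self:
        # `point` is itself a row of `reference`: drop its zero self-distance.
        dists = dists[1:]
    return dists[:k].sum()


def tcm_knn_p_value(x, normal, train_alpha, k):
    """Fraction of scores (training plus the new one) at least as strange as x."""
    alpha_new = knn_strangeness(x, normal, k)
    # The +1 terms count the new sample itself.
    return (np.sum(train_alpha >= alpha_new) + 1) / (len(train_alpha) + 1)


def detect(samples, normal, k=3, tau=0.05):
    """Return p-values and anomaly flags (p-value below tau) for each sample."""
    train_alpha = np.array(
        [knn_strangeness(p, normal, k, skip_self=True) for p in normal]
    )
    p_values = np.array(
        [tcm_knn_p_value(x, normal, train_alpha, k) for x in samples]
    )
    return p_values, p_values < tau


if __name__ == "__main__":
    # Hypothetical data: a cluster of "normal" points and two test samples.
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(200, 2))
    samples = np.array([[0.0, 0.0], [10.0, 10.0]])
    p_values, flags = detect(samples, normal)
    print(p_values, flags)
```

A small p-value means that few training points are as far from their neighbors as the test sample is, so the sample is flagged as anomalous. Lowering `tau` makes the detector more conservative, which trades detection rate against false alarm rate.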
To verify the algorithm, this paper uses the KDDCup99 data set. The data set is first preprocessed in two steps: numerical standardization and numerical normalization. The detection experiments are then run on the preprocessed data. The results show that the TCM-KNN algorithm strikes a much better balance between detection rate and false alarm rate than traditional detection algorithms.
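The two preprocessing steps named above can be sketched as follows. This is a minimal Python example that assumes z-score standardization followed by min-max normalization on already-numeric features. The exact formulas used in the paper are not given in this section, so both choices, along with the function names, are assumptions made for illustration.

```python
import numpy as np


def standardize(X):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X - mean) / std


def min_max_normalize(X):
    """Rescale each column to the range [0, 1]."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0
    return (X - lo) / span


def preprocess(X):
    """Standardize, then normalize, a 2-D array of numeric features."""
    X = np.asarray(X, dtype=float)
    return min_max_normalize(standardize(X))
```

Symbolic KDDCup99 fields such as the protocol type would need to be encoded as numbers before a step like this could be applied.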
Keywords: data mining; malicious network traffic; anomaly detection; TCM-KNN algorithm
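Contents
1 Introduction
1.1 Research background
1.2 Research significance
1.3 Current state of research on malicious network traffic detection
2 Malicious network traffic
2.1 Definition of malicious network traffic
2.2 Origins of malicious network traffic
2.3 Analysis of malicious traffic that endangers networks
2.4 Classification of malicious traffic detection techniques
3 Malicious network traffic detection based on data mining
3.1 Data mining and malicious traffic detection
3.2 Application of data mining in malicious network traffic detection
3.3 Malicious network traffic detection based on K-means
3.3.1 K-means clustering algorithm
3.3.2 Bisecting K-means clustering algorithm
3.3.3 Improved bisecting K-means clustering algorithm
3.4 Malicious network traffic detection based on TCM-KNN
3.4.1 Principle of the TCM-KNN malicious network traffic detection algorithm
3.4.2 TCM-KNN-based malicious network traffic detection algorithm
3.5 Overall design of the malicious network traffic detection algorithm
4 Experimental validation and result analysis
4.1 Source of experimental data
4.2 Experimental data preprocessing
4.2.1 Preprocessing
4.2.2 Numerical standardization
4.2.3 Numerical normalization
4.2.4 Preprocessing code implementation
4.3 Experimental validation and result analysis
4.3.1 Experimental steps
4.3.2 Experimental results and analysis
5 Conclusion
References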