Research on Big Data Storage Strategies Based on Hadoop

Source: 56doc.com, document no. 5D21006
(Package contents: assignment brief, proposal report, and a 13,000-character thesis.)
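Abstract
We are now in the era of big data: with the continued development and spread of network communication and information technology, the volume of data worldwide has grown explosively. Distributed storage systems emerged in response to this unprecedented storage pressure, integrating and making effective use of storage resources that sit in different locations and are connected over a network. Replacing expensive traditional hardware with inexpensive commodity machines has become an inevitable trend, and research on distributed storage systems has continued steadily in recent years. Since its beginnings around 2004, Hadoop has evolved continuously: its internal architecture has matured, the Hive subproject has been developed, Taobao built its Yunti cluster on it, and its speed at processing large volumes of data has kept improving. Several large companies, including Yahoo and Dell, have released Hadoop solutions, and successive versions have given Hadoop an increasingly important role in practice. Distributed storage continues to advance both in China and abroad, and because Hadoop is one of its main open-source platforms, studying Hadoop's storage strategy is of real significance.
Although the concept of distributed storage is now widely known, our understanding of Hadoop, and of how big data is actually stored, is still shallow. This thesis therefore analyzes the current state of data storage, builds a well-structured Hadoop platform, and uses that platform to analyze and simulate big data storage strategies.
The thesis first examines how distributed storage emerged and reviews the state of research in China and abroad. It then introduces the history and development of storage systems and compares traditional storage, common distributed storage systems such as GFS, and Hadoop-based distributed storage, identifying what they share and where they differ. Next, it sets up a Hadoop runtime environment and presents the principles and techniques of the Hadoop Distributed File System (HDFS), emphasizing its three-way replication, under which each data block is stored as three copies by default. It then analyzes how HDFS reads and writes files and how it recovers from failures. Because data held in a distributed store must later be analyzed and processed, which in turn leads to cloud computing, the thesis introduces the MapReduce programming model, tests it with the basic WordCount program, and uses it to count word occurrences. The work is intended to help readers understand how big data is stored and to support learning and using Hadoop for cloud storage and cloud computing.
Keywords: Hadoop; distributed storage; HDFS; cloud storage; big data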
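The WordCount test summarized above follows the MapReduce pattern of map, shuffle, and reduce. The sketch below is a single-process Python simulation written only to illustrate that data flow; it is not Hadoop code and not the thesis's own implementation, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. In a real Hadoop job the framework splits the input stored in HDFS, runs mappers in parallel (ideally on nodes that hold the corresponding blocks), and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key and order the keys,
    mimicking what the framework does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process (illustration only)."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["hello hadoop", "hello hdfs", "hadoop mapreduce hello"]
    for word, count in word_count(sample).items():
        print(word, count)
    # prints:
    # hadoop 2
    # hdfs 1
    # hello 3
    # mapreduce 1
```

Hadoop's bundled WordCount example has the same shape, a mapper that emits (word, 1) and a reducer that sums the values for each word, but it runs as a distributed job over files in HDFS rather than over an in-memory list.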



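Contents
Chapter 1 Introduction
1.1 Purpose and Significance
1.2 Analysis of the Current State of Research in China and Abroad
1.3 Basic Content and Objectives of the Research (Design)
Chapter 2 Storage Strategy Research
2.1 Traditional Storage Methods
2.2 Distributed Storage Systems
2.2.1 Overview of Distributed Storage Systems
2.2.2 The Hadoop Distributed Storage System
2.3 Choosing a Hadoop Version
2.4 Building a Fully Distributed Hadoop Cluster
Chapter 3 HDFS Technology
3.1 Overall HDFS Architecture
3.2 Basic HDFS Concepts
3.2.1 Data Blocks
3.2.2 NameNode (Master Node)
3.2.3 DataNode
3.3 HDFS Characteristics
3.4 HDFS Communication Protocols
3.4.1 RPC Calls
3.4.2 Client Protocol
3.4.3 DataNode Protocol
3.5 HDFS File Operations
3.5.1 Writing Files
3.5.2 Reading Files
3.5.3 Heartbeat Detection
3.6 File Access Exceptions and Recovery
3.6.1 Read Exceptions
3.6.2 Write Exceptions
3.7 Extending HDFS
Chapter 4 Introduction to MapReduce
4.1 The MapReduce Programming Model
4.2 Analysis of the MapReduce Workflow
4.3 MapReduce Task Execution
Chapter 5 WordCount Example Test
5.1 Execution Flowchart
5.2 Creating Local Files
5.4 Test Run
5.5 Results
5.6 Analysis of Results
Chapter 6 Conclusion
References
Acknowledgments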

推荐资料