Independent degree project - first cycle
Computer Engineering
Big Data och Hadoop
Nästa generation av lagring
(Big Data and Hadoop: The Next Generation of Storage)
Johan Lindberg
Mittuniversitetet (Mid Sweden University)
Department of Information Technology and Media (ITM)
Examiner: Ulf Jennehag, ulf.jennehag@miun.se
Supervisor: Martin Kjellqvist, martin.kjellqvist@miun.se
Author: Johan Lindberg, joli1400@student.miun.se
Degree programme: Computer Engineering, 180 credits
Main field of study: Computer Engineering
Semester, year: Spring, 2017
2017-06-12
Summary
The aim of this report and study is to examine, at a theoretical level, whether Försäkringskassan IT could change the platform it uses to store the data and information needed in its daily work. Försäkringskassan accumulates enormous amounts of data every day, covering everything from personal data and program code to payments and customer service cases. Today all of this is stored in large relational databases, which leads to problems with scalability and performance. The new platform under study is built on a storage technology called Hadoop. Hadoop is designed to both store and process data in a distributed fashion across so-called clusters made up of inexpensive server hardware. The platform promises near-linear scalability, the ability to store all data with high fault tolerance, and the capacity to handle enormous volumes of data.
The study is carried out through theoretical studies and a proof of concept. The theoretical studies focus on the background of Hadoop, its architecture and structure, and its future outlook. Försäkringskassan's current storage setup is specified and compared with the new platform. A proof of concept is carried out in a test environment at Försäkringskassan, where a Hadoop platform from Hortonworks is used to demonstrate how storage can work and that so-called unstructured data can be stored.
The study finds no theoretical obstacles to changing to the new platform. It does, however, identify a need to move the handling of data from the time of writing to the time of reading. This is because today's relational database solution requires well-structured data in order to store it, whereas Hadoop can store anything without any structure. In return, Hadoop requires more manual work when it comes to retrieving the data and working with it.
Keywords: Big Data, Hadoop, Hortonworks
Abstract
The goal of this report and study is to determine, at a theoretical level, whether Försäkringskassan IT could change the platform used to store the data needed in its daily activities. Försäkringskassan collects immense amounts of data every day, including personal information, program code, payments and customer service tickets. Today, everything is stored in large relational databases, which leads to problems with scalability and performance. The new platform studied in this report is built on a storage technology named Hadoop. Hadoop is designed to store and process data distributed across so-called clusters, which consist of commodity server hardware. The platform promises near-linear scalability, the ability to store all data with high fault tolerance, and the capacity to handle massive amounts of data.
The study is carried out through theoretical studies and a proof of concept. The theoretical studies focus on the background of Hadoop, its structure and what to expect in the future. The platform used at Försäkringskassan today is specified and compared to the new platform. The proof of concept is conducted in a test environment at Försäkringskassan running a Hadoop platform from Hortonworks. Its purpose is to show how data is stored and that unstructured data can be stored.
The study finds no theoretical problems, and a move to the new platform should be possible. The move does, however, shift the handling of data from before storage to after retrieval. This is because today's platform relies on relational databases, which require data to be well structured before it can be stored. Hadoop, by contrast, stores all data regardless of structure but requires more work and knowledge to retrieve it.
Keywords: Big Data, Hadoop, Hortonworks.
Preface
Many thanks to Andreas Henningsson, who was my supervisor at Försäkringskassan IT during this project.
Thanks to everyone else at Försäkringskassan who was helpful and answered my persistent questions.
Thanks to Martin Kjellqvist and Ulf Jennehag at Mittuniversitetet.