SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Chaiken, R., Jenkins, B., Larson, P., Ramsey, B., Shakib, D., Weaver, S., and Zhou, J. (Microsoft Corporation). PVLDB, 2008
Presented by: Güneş Aluç
08 March 2010
Problem
(a) Accumulation of massive data sets: search logs, web content collected by crawlers, ad-click streams, etc.
This necessitates the development of cost-efficient distributed storage solutions that exploit large clusters of commodity hardware: GFS, BigTable, ...
(b) Business value in analyzing massive data sets: better ad placement, improved service (e.g. web search), data-mining opportunities, fraudulent activity detection, etc.
This necessitates the development of distributed computing frameworks: MapReduce, Hadoop, ...
(c) The need to describe and execute ad-hoc large-scale data analysis tasks: in-house experiments.
This necessitates the development of high-level distributed dataflow languages: PigLatin, Dryad, SCOPE
Focus
A declarative and extensible scripting language: SCOPE, "(S)tructured (C)omputations (O)ptimized for (P)arallel (E)xecution"
- Declarative: users describe large-scale data analysis tasks as a flow of data transformations, without worrying about how they are parallelized on the underlying platform
- Extensible: user-defined functions and operators
- Structured Computations: data transformations consume and produce "rowsets" that conform to a schema
- Optimized for Parallel Execution: ??? (plan optimization is not explicitly discussed in this paper)
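To make the "rowsets that conform to a schema" idea concrete, here is a minimal Python sketch. It is not SCOPE syntax and not how SCOPE is implemented; the `Rowset` class, its `transform` method, and the sample columns are hypothetical names chosen only for illustration. It shows a transformation that consumes one rowset and produces another, with each rowset checked against its declared schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Rowset:
    """Hypothetical stand-in for a rowset: rows plus a schema (column -> type)."""
    schema: Dict[str, type]
    rows: List[Row]

    def __post_init__(self) -> None:
        # Every row must have exactly the schema's columns, with matching types.
        for row in self.rows:
            if set(row) != set(self.schema):
                raise ValueError(
                    f"columns {sorted(row)} do not match schema {sorted(self.schema)}"
                )
            for name, typ in self.schema.items():
                if not isinstance(row[name], typ):
                    raise TypeError(f"column {name!r} expected {typ.__name__}")

    def transform(self, out_schema: Dict[str, type],
                  fn: Callable[[Row], Row]) -> "Rowset":
        # A user-defined function maps each input row to one output row;
        # the result is validated against the declared output schema.
        return Rowset(out_schema, [fn(r) for r in self.rows])


# Illustrative data only (not taken from the paper).
queries = Rowset(
    {"query": str, "latency_ms": int},
    [{"query": "scope", "latency_ms": 120}, {"query": "cosmos", "latency_ms": 80}],
)
slow = queries.transform(
    {"query": str, "is_slow": bool},
    lambda r: {"query": r["query"], "is_slow": r["latency_ms"] > 100},
)
```

The user states only what each step consumes and produces; how the rows would be partitioned and processed in parallel is left to the platform, which is the point of the declarative bullet above.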
Yet Another High-Level Language for Large-Scale Data Analysis?
A hybrid scripting language that supports not only user-defined map-reduce-merge operations, but also SQL-flavored constructs for defining large-scale data analysis tasks
How about PigLatin?
- Somewhere in between SQL and MapReduce
- Has support for a nested data model
Overview
[Figure: dataflow of a SCOPE script running on the COSMOS Execution Environment]
- Data source(s): COSMOS files, regular files, external storage
- EXTRACT: built-in/custom extractors turn the data sources into rowset(s) with schemas IN_1, ..., IN_K
- Column types: {int/long/double/float/dateTime/string/bool/...}
- Dataflow: PROCESS, REDUCE, COMBINE, using user-defined functions and user-defined operators
- OUTPUT: built-in/custom outputters write rowsets (schema OUT_1, ...) to the data sink(s)
- Data sink(s): COSMOS files, regular files, external storage
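The stages named in the overview (EXTRACT, PROCESS, REDUCE, COMBINE, OUTPUT) can be mimicked on a single machine in plain Python. This is only a sketch under stated assumptions: the function names, signatures, tab-separated input format, and sample data below are hypothetical, and the semantics assumed for each stage (PROCESS works row by row, REDUCE works on groups of rows sharing a key, COMBINE joins two rowsets) follow my reading of the paper rather than anything shown on the slide. None of this reflects how COSMOS actually executes a script.

```python
from collections import defaultdict


def extract(text, schema):
    """EXTRACT stand-in: parse tab-separated lines into rows.

    `schema` is a list of (column name, type) pairs; this format is an assumption.
    """
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        yield {name: typ(value) for (name, typ), value in zip(schema, fields)}


def process(rows, fn):
    """PROCESS stand-in: fn maps each row to zero or more output rows."""
    for row in rows:
        yield from fn(row)


def reduce_by(rows, key, fn):
    """REDUCE stand-in: fn is called once per group of rows sharing `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for k in sorted(groups):
        yield from fn(k, groups[k])


def combine(left, right, key, fn):
    """COMBINE stand-in: fn merges each pair of rows that agree on `key`."""
    right_by_key = defaultdict(list)
    for r in right:
        right_by_key[r[key]].append(r)
    for l in left:
        for r in right_by_key.get(l[key], []):
            yield fn(l, r)


def output(rows, columns):
    """OUTPUT stand-in: render rows as tab-separated text."""
    return "\n".join("\t".join(str(row[c]) for c in columns) for row in rows)


# Illustrative click log and label table (made-up data).
log = "A\t10\nb\t5\na\t7\n"
labels = [{"query": "a", "category": "nav"}]

rows = extract(log, [("query", str), ("clicks", int)])
normalized = process(rows, lambda r: [{**r, "query": r["query"].lower()}])
totals = list(reduce_by(
    normalized, "query",
    lambda k, group: [{"query": k, "clicks": sum(r["clicks"] for r in group)}],
))
joined = combine(totals, labels, "query", lambda l, r: {**l, **r})
result = output(joined, ["query", "clicks", "category"])
```

Each stage takes rowsets in and hands a rowset on, so the script reads as a flow of transformations from source to sink, matching the left-to-right layout of the overview figure.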