Glow Introduction

A MapReduce system for Go

Architecture: Resource Management

1. Agents run on each server.

2. Agents report resources to the master via heartbeats.

[Diagram: one Master and four Agents]
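The heartbeat reporting described above can be sketched as follows. This is a minimal illustration, not Glow's actual code: the Resource, Heartbeat, and Master types, their fields, and the 10-second staleness threshold are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Resource is a hypothetical description of what an agent can offer.
type Resource struct {
	CPUCores int
	MemoryMB int
}

// Heartbeat is a hypothetical message an agent sends to the master.
type Heartbeat struct {
	AgentID string
	Free    Resource
	SentAt  time.Time
}

// Master remembers the most recent heartbeat from each agent.
type Master struct {
	mu     sync.Mutex
	agents map[string]Heartbeat
}

func NewMaster() *Master {
	return &Master{agents: make(map[string]Heartbeat)}
}

// Receive records the latest report from an agent.
func (m *Master) Receive(hb Heartbeat) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.agents[hb.AgentID] = hb
}

// Alive counts agents whose last heartbeat is no older than maxAge.
func (m *Master) Alive(now time.Time, maxAge time.Duration) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for _, hb := range m.agents {
		if now.Sub(hb.SentAt) <= maxAge {
			n++
		}
	}
	return n
}

// exampleAliveCount feeds the master one fresh and one stale heartbeat.
func exampleAliveCount() int {
	m := NewMaster()
	now := time.Now()
	m.Receive(Heartbeat{AgentID: "agent-1", Free: Resource{4, 8192}, SentAt: now})
	m.Receive(Heartbeat{AgentID: "agent-2", Free: Resource{2, 4096}, SentAt: now.Add(-time.Minute)})
	return m.Alive(now, 10*time.Second)
}

func main() {
	fmt.Println(exampleAliveCount()) // only agent-1 reported recently
}
```

In a real deployment the heartbeat would travel over the network; the in-process call here only shows the bookkeeping on the master side.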

Architecture: Resource Allocation

1. Driver asks Master for agents with resources

2. Driver asks assigned agents to run tasks

[Diagram: Driver, Master, and four Agents]
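As a rough sketch of the allocation step, the master could pick agents whose free resources cover a request and reserve them for the driver. The Allocate method, the Resource fields, and the first-fit policy over sorted agent names are assumptions for illustration; the slides do not say how Glow chooses agents.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical per-agent resource description.
type Resource struct {
	CPUCores int
	MemoryMB int
}

// Master tracks free resources per agent (as learned from heartbeats).
type Master struct {
	free map[string]Resource
}

// Allocate returns up to n agents that can each satisfy need and
// reserves the requested resources on them (first fit, by agent name).
func (m *Master) Allocate(need Resource, n int) []string {
	ids := make([]string, 0, len(m.free))
	for id := range m.free {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for the example

	var picked []string
	for _, id := range ids {
		if len(picked) == n {
			break
		}
		r := m.free[id]
		if r.CPUCores >= need.CPUCores && r.MemoryMB >= need.MemoryMB {
			m.free[id] = Resource{r.CPUCores - need.CPUCores, r.MemoryMB - need.MemoryMB}
			picked = append(picked, id)
		}
	}
	return picked
}

// exampleAllocation plays the driver: it asks for two agents.
func exampleAllocation() []string {
	m := &Master{free: map[string]Resource{
		"agent-a": {4, 8192},
		"agent-b": {1, 1024}, // too small for the request below
		"agent-c": {2, 4096},
	}}
	return m.Allocate(Resource{CPUCores: 2, MemoryMB: 2048}, 2)
}

func main() {
	for _, id := range exampleAllocation() {
		fmt.Println("driver would now ask", id, "to run tasks")
	}
}
```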

Architecture: DAG execution

1. Driver divides tasks into a DAG.

2. One group of tasks is assigned to one agent.

[Diagram: Driver and four Agents, each running a group of tasks]
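The "one group of tasks per agent" rule might look like the sketch below. The TaskGroup type and the round-robin policy are assumptions; the slides only state that each group goes to a single agent, not how the agent is chosen.

```go
package main

import "fmt"

// TaskGroup is a hypothetical unit of scheduling: a set of tasks
// that the driver wants to run together on one agent.
type TaskGroup struct {
	ID    int
	Tasks []string
}

// assignGroups maps each group ID to exactly one agent, round-robin.
// It returns an empty map when no agents are available.
func assignGroups(groups []TaskGroup, agents []string) map[int]string {
	assignment := make(map[int]string, len(groups))
	if len(agents) == 0 {
		return assignment
	}
	for i, g := range groups {
		assignment[g.ID] = agents[i%len(agents)]
	}
	return assignment
}

func main() {
	groups := []TaskGroup{
		{ID: 0, Tasks: []string{"read", "map"}},
		{ID: 1, Tasks: []string{"read", "map"}},
		{ID: 2, Tasks: []string{"reduce"}},
	}
	assignment := assignGroups(groups, []string{"agent-a", "agent-b"})
	for _, g := range groups {
		fmt.Printf("group %d %v -> %s\n", g.ID, g.Tasks, assignment[g.ID])
	}
}
```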

Architecture: Data Flow

1. Outputs of tasks are saved by local agents

2. Driver remembers all data locations

3. Inputs of the next group of tasks are pulled from the recorded locations

[Diagram: Driver and four Agents, each with its tasks and their data]
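One way to picture the driver's bookkeeping is a small registry that maps each task output to the agent holding it. The Driver and Location types, the shard naming scheme, and the addresses are hypothetical; the sketch only mirrors the three steps listed above.

```go
package main

import "fmt"

// Location says which agent holds a piece of task output.
// The address format is an assumption made for this example.
type Location struct {
	AgentAddr string
}

// Driver remembers where every output shard was saved.
type Driver struct {
	locations map[string]Location
}

func NewDriver() *Driver {
	return &Driver{locations: make(map[string]Location)}
}

// Record is called after an agent reports that it saved a shard locally.
func (d *Driver) Record(shard string, loc Location) {
	d.locations[shard] = loc
}

// InputsFor resolves the shards a downstream task group needs into the
// locations it should pull from. Unknown shards are reported via ok=false.
func (d *Driver) InputsFor(shards []string) (locs []Location, ok bool) {
	for _, s := range shards {
		loc, found := d.locations[s]
		if !found {
			return nil, false
		}
		locs = append(locs, loc)
	}
	return locs, true
}

func main() {
	d := NewDriver()
	d.Record("step1/shard0", Location{AgentAddr: "agent-a:9000"})
	d.Record("step1/shard1", Location{AgentAddr: "agent-b:9000"})

	locs, ok := d.InputsFor([]string{"step1/shard0", "step1/shard1"})
	fmt.Println(locs, ok)
}
```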

Architecture: DAG Optimization

Data are streamed to disk only when necessary:

1. when one task produces data for 2 or more tasks

2. when one task consumes data from 2 or more tasks
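The two conditions above amount to a degree check on the task graph: an edge needs disk when its producer has two or more consumers, or its consumer has two or more producers. The sketch below applies that rule to a small example graph. The Edge type and the task names are made up for illustration, and the code assumes each producer/consumer pair appears at most once.

```go
package main

import "fmt"

// Edge is a hypothetical producer -> consumer link between two tasks.
type Edge struct {
	From, To string
}

// edgesToPersist returns the edges whose data should go to disk under
// the rule: fan-out of 2+ at the producer, or fan-in of 2+ at the consumer.
// All other edges can stay in memory.
func edgesToPersist(edges []Edge) []Edge {
	outDegree := make(map[string]int)
	inDegree := make(map[string]int)
	for _, e := range edges {
		outDegree[e.From]++
		inDegree[e.To]++
	}

	var persist []Edge
	for _, e := range edges {
		if outDegree[e.From] >= 2 || inDegree[e.To] >= 2 {
			persist = append(persist, e)
		}
	}
	return persist
}

func main() {
	edges := []Edge{
		{"read", "map"},     // one-to-one: in memory
		{"map", "reduceA"},  // map fans out: disk
		{"map", "reduceB"},  // map fans out: disk
		{"reduceA", "join"}, // join fans in: disk
		{"reduceB", "join"}, // join fans in: disk
		{"join", "sink"},    // one-to-one: in memory
	}
	for _, e := range edgesToPersist(edges) {
		fmt.Printf("persist %s -> %s\n", e.From, e.To)
	}
}
```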

Internal: A lot of channels

Data flows between tasks via Go channels. Remote data is read via Go channels, and results are written to Go channels.
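A plain-Go sketch of tasks wired together by channels is shown below. It uses only the standard library and is not Glow's API: the source, splitWords, and count functions are hypothetical stand-ins for a reader task, a map task, and a reduce task.

```go
package main

import (
	"fmt"
	"strings"
)

// source emits each input line on a channel and closes it when done.
func source(lines []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, l := range lines {
			out <- l
		}
	}()
	return out
}

// splitWords reads lines from one channel and writes words to another.
func splitWords(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for l := range in {
			for _, w := range strings.Fields(l) {
				out <- w
			}
		}
	}()
	return out
}

// count drains the channel and returns how many items it saw.
func count(in <-chan string) int {
	n := 0
	for range in {
		n++
	}
	return n
}

func main() {
	lines := []string{"glow runs on go", "channels carry data"}
	fmt.Println(count(splitWords(source(lines)))) // 7 words in total
}
```

Each stage closes its output channel when its input is exhausted, so the final consumer's range loop terminates without any extra signalling.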

Distributed Mode vs Standalone mode

1. Standalone mode is efficient because it involves no disk IO.

Tasks are parallelized via goroutines, with no need to hand-write idiomatic but verbose sync/wait code.

2. Use distributed mode when you need to scale up.
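For comparison, this is roughly what the hand-written sync/wait style looks like in plain Go when a job is split across goroutines. It is standard-library code, not Glow code; parallelSum and its sharding scheme are assumptions chosen to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum splits data across `shards` goroutines and adds up
// the partial results. shards must be at least 1.
func parallelSum(data []int, shards int) int {
	partial := make([]int, shards)

	var wg sync.WaitGroup
	for s := 0; s < shards; s++ {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			for i := s; i < len(data); i += shards {
				partial[s] += data[i]
			}
		}(s)
	}
	wg.Wait() // block until every shard has finished

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 100)
	for i := range data {
		data[i] = i + 1
	}
	fmt.Println(parallelSum(data, 4)) // 5050
}
```

The WaitGroup bookkeeping is the kind of boilerplate the slide refers to; a framework can hide it behind higher-level map and reduce steps.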
