THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY




MSBD5005: Data Visualization

Final Examination

(Key)

Fall 2016

Instructor: Huamin Qu

Wednesday, 7 Dec 2016, 7:30pm - 10:30pm

Student Name: ___________________________________________

Student ID: ______________________________________________

Instructions:

1. This is a closed-book, closed-notes examination.

2. Write your name and student ID on this page. There are 12 pages and 7 questions in total.

Answer all questions in the space provided. The last two pages can be detached.

3. Please sign the following declaration:

Declaration of Academic Integrity

I confirm that I have answered the questions using only materials specifically approved for use in this examination, that all the answers are my own work, and that I have not received any assistance during the examination.

Signature: ____________________________________________

For Grading Purposes Only:

Problem 1 _______________/20

Problem 2 _______________/10

Problem 3 _______________/10

Problem 4 _______________/10

Problem 5 _______________/15

Problem 6 _______________/15

Problem 7 ________________/20

Total: ______________/ 100

1. Multiple Choice Questions [20 marks]

Some questions may have multiple correct answers. 2 marks for each question.

1. Regarding using color in visualization, which of the following is (are) NOT recommended?

A. Using red font over a blue background

B. Using rainbow color to encode temperature change

C. Using 26 colors to encode the 26 different departments in HKUST

D. Using complementary color combination to achieve color harmonization

Answer: _____ABC______ (1 correct choice: 1 mark; 2 correct choices: 1.5 marks; 3 correct choices: 2 marks; -0.5 mark for each wrong choice)

2. Which of the following is (are) a divergent color scheme?

A. Rainbow

B. Red-black

C. Orange-blue

D. Green-blue

Answer: ____BCD_______ (1 correct choice: 1 mark; 2 correct choices: 1.5 marks; 3 correct choices: 2 marks; -0.5 mark for each wrong choice)

3. Which of the following is (are) among the principles suggested by Edward Tufte?

A. Resolution beats immersion

B. Consistent, linear scale

C. Avoid area, volume encoding

D. Function first, form next

Answer: ____BC______ (1 correct choice: 1 mark; 2 correct choices: 2 marks; -0.5 mark for each wrong choice)

4. Which of the following visual channels is (are) fully separable?

A. Size and Orientation

B. Color and Shape

C. Color and Motion

D. Color and Position

Answer: _____D_____ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

5. Which of the following visual channels is the LEAST precise way to encode scalar values?

A. Length

B. Position on common scale

C. Depth

D. Area

Answer: ____C_______ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

6. Which of the following statements is called the visualization mantra?

A. No unjustified 3D

B. No unjustified 2D

C. Eyes beat memory

D. Overview first, zoom and filter, details on demand

Answer: _____D______ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

7. Which of the following visualizations is (are) often used to reduce visual clutter in graphs?

A. Edge bundling

B. Node clustering

C. Graph Splatting

D. Volume rendering

Answer: _____ABC______ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

8. Which of the following visualizations is (are) often used to show spatio-temporal data?

A. Chernoff faces

B. Scatter plot matrices

C. ThemeRiver

D. Geo-time

Answer: ____D______ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

9. Which of the following is (are) a genre of narrative visualization?

A. Slide show

B. Flow chart

C. Animation

D. Comic strip

Answer: ____ABCD______ (0.5 for each correct choice)

10. Which of the following visualization techniques is (are) often used to show the relations of words in text data?

A. Wordle

B. SparkCloud

C. PhraseNet

D. WordTree

Answer: _____CD_____ (1 correct choice: 2 marks; -0.5 mark for each wrong choice)

2. Problem Solving [10 marks]

There are problems with the following visualization designs. Please write down what you think is wrong with these designs.

(a) [5 marks]

[pic]

The y-axis is upside down; the axis needs to show the full scale (Tufte design principle)

(b) [5 marks]

[pic]

Unjustified 3D design

3. Problem Solving [10 marks]

In class, the professor mentioned that visualization is also a kind of translation, i.e., translating data to visual forms, and that it should obey the same rules as language translation (e.g., translating English to Chinese). Do you agree or disagree with this metaphor? Specifically, please write down the similarities and differences between language translation (e.g., English -> Chinese) and visual encoding (i.e., Data -> Visual Form).

Similarity: it is necessary to truthfully represent the data and to avoid fake patterns. (5 marks)

Difference: in visualization, it is common to use interaction techniques such as filtering and clustering to represent a subset of the data in order to amplify human cognition. (5 marks)

4. Problem Solving [10 marks]

Multi-dimensional data is very common in many applications. Here is a sample dataset with five attributes.

[pic]

(a) Please design a visualization to show the data. [5 marks]

Multi-dimensional data visualization techniques such as parallel coordinates or a scatter plot matrix

(b) From your visualization, what kind of tasks can be performed? [5 marks]

1. Clustering

2. Outlier detection

3. Positive / negative relation between different attributes

...

(5 marks for two or above reasonable tasks)
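As an illustration of how parallel coordinates support these tasks, the sketch below min-max scales each attribute to [0, 1] and turns every record into a polyline across the axes. The attribute names and values are hypothetical (the exam's table is not reproduced here), and the helper names `normalize_columns` and `polylines` are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def normalize_columns(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Min-max scale every attribute to [0, 1] so all parallel axes share one range."""
    attrs = list(rows[0].keys())
    lo = {a: min(r[a] for r in rows) for a in attrs}
    hi = {a: max(r[a] for r in rows) for a in attrs}
    scaled = []
    for r in rows:
        scaled.append({
            # A constant column has no spread; place it at mid-axis.
            a: 0.5 if hi[a] == lo[a] else (r[a] - lo[a]) / (hi[a] - lo[a])
            for a in attrs
        })
    return scaled


def polylines(rows: List[Dict[str, float]]) -> List[List[Tuple[int, float]]]:
    """Return one polyline per record as (axis_index, height) points."""
    return [list(enumerate(r.values())) for r in normalize_columns(rows)]


# Hypothetical records with three of the attributes, for brevity.
records = [
    {"price": 10.0, "mpg": 30.0, "weight": 1.0},
    {"price": 20.0, "mpg": 20.0, "weight": 2.0},
    {"price": 30.0, "mpg": 10.0, "weight": 3.0},
]
lines = polylines(records)
```

In this toy data the polylines cross between the first two axes, which is how a negative relation appears in parallel coordinates; lines that stay roughly parallel between two axes suggest a positive relation.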

5. Visualization Design [15 marks]

Your parents travelled in Hong Kong a few days ago and they would like you to help them “record” their pleasant trips. Suppose you have access to the GPS data from their mobile phone and you also know the exact time they visited each location. Their travel data look like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2016/12/4

8 a.m. the ICC Tower

12.30 p.m. Victoria Harbor

3 p.m. Victoria Peak

5 p.m. Temple Street Night Market

7.30 p.m. Night lights spectacle: the Symphony of Lights

2016/12/5

9 a.m. Hong Kong Museum of History

1 p.m. Ocean Park

2016/12/6

8.30 a.m. the Golden Mile of Nathan Road

2 p.m.  Zoological and Botanical Garden

….

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(a) Please design a spatio-temporal visualization scheme to display their trips [8 marks]

E.g.: flowmap + color (encoding time)

Ineffective design (e.g. 3D): -2 marks

Violates design principles (e.g. categorical color for ordinal attributes): -4 marks
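One way to encode time with color without violating the principle above is a sequential ramp, where later visits get darker shades of a single hue. The sketch below is only an illustration: the function name `time_to_color` and the light and dark RGB endpoints are assumptions, and only a few of the listed stops are used.

```python
from datetime import datetime
from typing import Tuple

RGB = Tuple[int, int, int]


def time_to_color(t: datetime, t_start: datetime, t_end: datetime,
                  light: RGB = (222, 235, 247), dark: RGB = (8, 48, 107)) -> RGB:
    """Linearly interpolate from a light to a dark shade as time advances."""
    frac = (t - t_start) / (t_end - t_start)  # timedelta / timedelta gives a float
    return tuple(round(l + frac * (d - l)) for l, d in zip(light, dark))


# A few of the stops listed in the question, in visiting order.
stops = [
    ("the ICC Tower", datetime(2016, 12, 4, 8, 0)),
    ("Victoria Peak", datetime(2016, 12, 4, 15, 0)),
    ("Ocean Park", datetime(2016, 12, 5, 13, 0)),
    ("Zoological and Botanical Garden", datetime(2016, 12, 6, 14, 0)),
]
t0, t1 = stops[0][1], stops[-1][1]
colors = [(name, time_to_color(t, t0, t1)) for name, t in stops]
```

Each flow-map segment can then be drawn in the color of its arrival stop, so the ordering of the trip reads directly from light to dark.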

(b) Please design some narrative visualization schemes to make your visualization more engaging [7 marks]

E.g. using video or slides to represent the visualization

-3 marks if no description is provided for narrative / storytelling

-3 marks if no description is provided for the engagement

6. Visualization Design [15 marks]

The proliferation of online social media has enabled people to spread opinions and ideas at an unprecedented speed. Such opinions and ideas can be expressed by individuals, and divergences often occur when people oppose each other and want to achieve incompatible goals. For example, in a political campaign, people supporting different parties may debate through social media for their own political perspectives. Another example is marketing, where the makers of competing products can launch persuasion campaigns on social media to gain the attention of social media users. Some topics or events may get many people involved and they could have diverging views on the same topic. For example, the US presidential election debates in 2016 triggered a series of arguments because of the different political views of the two parties. They attracted great public attention, leading to heated discussions on Twitter. On social media sites, some people show their support by expressing positive comments on certain persons or events, while others may attack their opponents with negative words.

Suppose you have the following data and also attributes computed from the data: 1) The original Twitter data containing all the tweets about a topic. Each tweet has a time stamp, some text, and is associated with a user; 2) Two sets of people who have different opinions on this topic. To simplify the design, only user names will be used in your system; 3) A sentiment score is already computed for every tweet about this topic from these two sets of people.

You are asked to design a visualization system to answer the following questions: 1) When does a divergence start and end? How does it evolve? Your visualization should be able to reveal temporal data patterns that represent the process of a social divergence. 2) Who is involved on each side of the divergence? Your visualization should highlight the people who were involved in the divergence. 3) Why does a divergence occur? The analysts may need to read the tweets to find out the reasons.

Please first describe how you want to design the system, especially the principles to follow, and then sketch some key visualization schemes in your system.

[pic]

Principle: The overall design is based on a metaphor of a DNA molecule that consists of two twisting helices. Likewise, our design visually represents two conflicting sides.

Community strands: By default, we polarize the community strand between two sentiment poles (e.g., positive and negative) and interpolate the data samples at different timestamps along a horizontal timeline. Thus, a smooth curving strand is created. In our design, we encode the sentiment information on each community strand to enhance the visual patterns driven by sentiments. First, the sentiment information is quantitatively represented as the screen distance between a community strand and a sentiment pole. Leaning toward one sentiment pole indicates that most of the people in the community share that sentiment. Second, the sentiment information is also represented by a color gradient from green to red, pertaining to two sentiment poles. The community size implies its influence and is encoded as the thickness of the strand. With this encoding scheme, we are able to identify whether the sentiment change of a community is caused by thousands of people or only by tens of people.
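The community strand encoding described above can be sketched as a mapping from one time sample to three visual attributes. This is an illustration under stated assumptions, not the system's actual implementation: sentiment is assumed to lie in [-1, 1], green is assumed to mark the positive pole, thickness is assumed to scale linearly with community size, and the name `strand_sample` and the pixel defaults are hypothetical.

```python
from typing import Tuple


def strand_sample(mean_sentiment: float, size: int, max_size: int,
                  height: float = 200.0,
                  max_thickness: float = 20.0) -> Tuple[float, Tuple[int, int, int], float]:
    """Map one time sample of a community to (y position, RGB color, thickness)."""
    s = max(-1.0, min(1.0, mean_sentiment))  # clamp to the assumed [-1, 1] range
    frac = (s + 1.0) / 2.0                   # 0 at the negative pole, 1 at the positive pole
    y = (1.0 - frac) * height                # positive pole drawn at the top (y = 0)
    color = (round(255 * (1.0 - frac)), round(255 * frac), 0)  # red to green gradient
    thickness = max_thickness * size / max_size  # linear size encoding (assumption)
    return y, color, thickness


positive_small = strand_sample(1.0, 50, 100)
negative_large = strand_sample(-1.0, 100, 100)
```

Evaluating the function at successive timestamps and interpolating between the results would give the smooth strand; a thin strand swinging toward a pole then reads as a shift driven by few people.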

Event box: Inside the event box, keywords are represented as small bars. The size and color of the bar encode the normalized frequency and the sentiment of the keyword, respectively. Within different time windows, users may discuss the topics using different keywords, thus resulting in various events under the same topic. By default, an event box’s size varies based on the distance between two community strands. This design naturally assigns more space and forms a multi-focus view for displaying the details when divergence is large.

User group: A user group is visualized as a circle embedded within the community strand. The users within the group are represented as dots whose sizes and colors represent the users’ normalized activeness and their sentiments, respectively.

The design clearly supports tasks such as comparison between two groups of opinions: 5 marks

Follow design principles & intuitive design: 5 marks

Good scalability: 3 marks

Design rationale: 2 marks

7. Visualization Evaluation [20 marks]

Edge-bundling is widely used to reduce visual clutter in graphs. The following figures show the original graph and three different edge-bundling algorithms. You do not need to know the details of these algorithms for answering this question.

[pic]

(a) The original graph with 1715 nodes and 9780 edges showing the immigration among different states in the USA; (b) The edges are bundled using FDEB with inverse-linear model; (c) The edges are bundled with GBEB; (d) The edges are bundled using FDEB with inverse-quadratic model.

A. Based on your comparison of Fig. (a) with Fig. (b), (c) and (d), in your opinion, what are the advantages and disadvantages of edge bundling? [5 marks]

Advantage: reduce visual clutter and show the overall pattern (2.5 marks)

Disadvantage: lose detailed information (2.5 marks)

B. You are asked to design a controlled experiment to evaluate the three edge-bundling algorithms. The evaluation should be quantitative and as rigorous as possible. Please write down your detailed plan to conduct the evaluation. [15 marks]

Basic Info: within-subject design, recruit a group of participants (say 20) (3 marks)

Task: Track how many destination points an edge-bundle can split into (5 marks)

Dataset: 5 different real-world graphs (as synthetic graphs may have a uniform trend for each edge direction)

Technique: 3 different edge bundling algorithms with the default parameters

Independent variables: Edge bundling techniques & datasets

Controlled variables: 1920*1080 resolution display, same keyboard & mouse

(5 marks for above design setting)

Dependent variables: the number of destination points & response time

Analysis: use the original graph as the baseline, and use statistical methods such as ANOVA to check whether there are significant differences between the techniques

(2 marks)
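For a within-subject design, one matching variant of the analysis is a one-way repeated-measures ANOVA, which removes between-participant variability before comparing techniques. The key above only says "ANOVA", so this choice is an assumption, and the scores below are made-up values used purely to show the arithmetic. The function name `rm_anova_f` is hypothetical.

```python
from typing import List, Tuple


def rm_anova_f(scores: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    scores[i][j] is the measurement (e.g. response time) of participant i
    under technique j. Returns (F, df_technique, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    tech_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_tech = n * sum((m - grand) ** 2 for m in tech_means)  # between techniques
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)  # between participants
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_tech - ss_subj                  # residual

    df_tech, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_tech / df_tech) / (ss_error / df_error)
    return f_stat, df_tech, df_error


# Made-up data: 3 participants (rows) x 3 bundling techniques (columns).
toy_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 3.0],
    [3.0, 3.0, 6.0],
]
f_stat, df_tech, df_error = rm_anova_f(toy_scores)
```

The F value is then compared against the F distribution with (df_tech, df_error) degrees of freedom to obtain a p-value, for example with `scipy.stats.f.sf`. A real study with 20 participants would also need to check the sphericity assumption before trusting the result.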
