CSCI 453

Large-Scale Data Analytics and Visualization

Coordinator: Jingnan Xie

Credits: 4.0

Description

A practical introduction to data analytics and visualization that blends theory with practice. Students will learn about and apply various clustering algorithms and techniques for dealing with noisy data, use a distributed data analytics framework, complete laboratory assignments using version control, and ensure reproducibility by making all of their work easily shareable. Students will become familiar with modern data analytics methods and explore real-world datasets. Visualization of results, using both interactive and static frameworks, will be a large component of the course. Offered periodically.

Prerequisites

CSCI 366 AND (MATH 235 OR MATH 333 OR MATH 335).

Course Outcomes

At the end of this course, a student will:

Create reproducible, explainable data science workflows
Use a modern distributed MapReduce-style framework, such as Apache Spark, to analyze data
Implement parallel clustering methods
Develop strategies for overcoming common imperfections in real-world datasets
Apply visualization techniques to multi-dimensional data
Apply acquired skills to extract insights from multi-dimensional, real-world datasets

These goals will be accomplished through the content of the lectures and textbook, as well as through hands-on experience. This hands-on experience includes writing programs, both in the lab and in project assignments. There will also be a significant course project in which you identify an analysis topic, discover data, model the data using data mining techniques, analyze the results, and report outcomes. Achievement of these goals will be measured by your performance on approximately seven lab assignments, the project, and two exams (a midterm and a final).

Tentative Semester Schedule

Week 1: Introductory materials on experimental design and data

Week 2: Data operations: filtering, transforming, reducing

Week 3: Distributed computing

Week 4: Distributed regression

Week 5: Visualization of one-dimensional data

Week 6: Visualization of two-dimensional data

Week 7: Exam

Week 8: Case Study: K-Means Clustering

Week 9: Distributed Graph Algorithms

Week 10: Case Study: PageRank

Week 11: Distributed Regression

Week 12: Distributed Machine Learning + Cross Validation

Week 13: Distributed SQL

Week 14: Presentations
