CSCI 452
Data Mining
Coordinator: Stephanie Schwartz
Credits: 4.0
Description
An introduction to data mining, including data cleaning, the application of statistical and machine learning techniques to discover patterns in data, and the analysis of the quality and meaning of results. Machine learning topics may include algorithms for discovering association rules, classification, prediction, and clustering. Lab assignments provide practice applying specific techniques and analyzing results. An independent project provides students with the opportunity to guide a project from data selection and cleaning through to presentation of results.
Prerequisites
CSCI 366 AND (MATH 235 OR MATH 333 OR MATH 335).
Course Outcomes
At the end of this course, a student will:
- Understand the fundamentals of data mining, including what kinds of data can be mined, what kinds of patterns can be mined, and what kinds of applications are targeted
- Understand and apply the underlying mathematical and statistical methods used in data mining
- Apply machine learning techniques and statistical techniques in data mining applications
- Analyze data in both an exploratory and targeted manner
- Evaluate the appropriateness of various algorithms and techniques for different domains and problems
- Evaluate results in terms of significance, reliability and meaning
These goals will be accomplished through the lectures and textbook, as well as through hands-on experience, which includes writing programs both in the lab and in project assignments. There will also be a significant course project in which you identify an analysis topic, find data, model the data using data mining techniques, analyze the results, and report outcomes. Achievement of these goals will be measured through your performance on approximately seven lab assignments, the project, and two exams (midterm and final).
Tentative Semester Schedule
Week 1: Introduction to experimental design and data
Week 2: Data and Linear/Logistic Regression
Week 3: Decision Trees
Week 4: Evaluation
Week 5: Naïve Bayes
Week 6: k-Nearest Neighbors (KNN)
Week 7: Support Vector Machines (SVMs)
Week 8: Ensemble Classification Methods
Week 9: Class Imbalance
Week 10: Association Analysis
Week 11: Cluster Analysis
Week 12: Outlier Analysis
Week 13: Time Series, Graph, and Spatial Data
Week 14: Presentations