DS 552  (3 credits)





Course policies 


Top of Page | Course Materials | Course Policies | Grading & Objectives | Lecture Outline

Page last updated 08/18/2021

Course dates:
On-line Aug. 26 - Dec. 17, 2021
Bootcamp Saturday, Aug. 28, 2021 9:30 am - 3:00 pm


Loren Rhodes
E-mail:  (preferred first point of contact)
Office location: Brumbaugh Academic Center, C203
Office: 814-641-3620

Cell: 814-644-3309
(texts are fine)

Office hours are regularly held on Monday and Wednesday evenings. See Moodle for exact times and the Zoom link.


Moodle is the course management system for this course and will be used for access to materials, assignment and project submissions (with their timing and deadlines), and grade posting.

Make sure you log in to Moodle several times each week.

Required text and resources:

Alternative data mining and machine learning tools downloads or links:

Course description:

This course considers the use of machine learning (ML) and data mining (DM) algorithms for the data scientist to discover information embedded in wide-ranging datasets, from simple tables to complex data sets and big data situations. Topics include ML and DM techniques such as classification, clustering, and predictive and statistical modeling, using tools such as R, Python, Matlab, Weka and others.
Prerequisite: DS 500, DS 510 or by permission


This course considers the organization of data, current techniques, and an overview of the algorithms and tools used in mining information from these sources.

Students will build skills and/or gain understanding in:





10% Weekly quizzes, on-line in Moodle

25% Assignments

25% Late Midterm (10th week)

Below are links to example paper exams from the undergraduate DS 352 (Fall '18) course. Use these to get a sense of testing form and style, but not necessarily as a study guide.

40% Data mining project

Identify an existing, substantial data set that can be used to demonstrate the data mining techniques covered in class. The data set must meet the size criteria outlined in the detailed description. You will apply the data mining tools and techniques covered in class to the data set for knowledge discovery and classification, present the results of the project during the last two weeks of the semester, and turn in a written project in lieu of a final exam.

Detailed Project Description
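
As an illustration of how the grading weights above combine, here is a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100 and that the weights are applied linearly; the syllabus does not state rounding rules or letter-grade cutoffs, and the example scores are hypothetical.

```python
# Grade weights taken from the breakdown above (percent of final grade).
WEIGHTS = {
    "quizzes": 10,      # weekly quizzes in Moodle
    "assignments": 25,
    "midterm": 25,      # late midterm (10th week)
    "project": 40,      # data mining project
}


def final_grade(scores):
    """Return the weighted course percentage from per-component scores (0-100).

    Assumption: plain linear weighting with no rounding or curve.
    """
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {"quizzes": 90, "assignments": 85, "midterm": 80, "project": 88}
print(final_grade(example_scores))
```

Because the project carries 40% of the grade, a change of ten points on the project moves the final percentage by four points, more than any other single component.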



Course Policies

My standard policies across all of my courses on attendance, late assignments, academic integrity, etc., are described on my Course Policies web page. Please read them carefully.

Accessibility Policy

Juniata is committed to providing equitable access for student learning. To arrange for an accommodation based on a documented medical condition, mental health condition or learning disability (or if you suspect you have one), please contact Patty Klug, Director of Student Accessibility Services, by emailing her at or calling 814-641-5840. I encourage you to confirm that I have received a copy of your accommodation letter and to schedule a time for us to meet to discuss your needs. It is best to submit accommodation requests before the semester begins, although requests can be made at any time during the semester.


Further details are found on Rhodes's course policies page.


FALL 2021

The topic sequence and readings are based on Introduction to Data Mining, 2nd ed., by Tan et al.

Please note that videos of notes and lectures are found on the Moodle page.

Topics and Readings | Links for Study and Lectures | Exercises and Project Notes

Week 1

Course Introduction

Course overview

Ch 1


Review Ch 1 ex: 1,3

See Getting Started with Python


Data Exploration Review
(1st edition book chapter)

Data types:
Ch 2.1


Data types

Exercises for this review are linked here

Python tutorial for data exploration: understand the code you are using. You will apply these techniques to your chosen dataset. The Python script for Data Exploration is here.

Data types exercises: Ch2 ex: 2,3,4,5,7,9

Week 2




Data quality:
Ch 2.2

Data preprocessing:
Ch 2.3

Data: measures of similarity:
Ch 2.4

Data preprocessing overview





Similarities and Dissimilarities



Data quality exercises: Ch2 ex: 11,12

Identification of a dataset for analysis; preprocessing of the data

Python tutorial for data preprocessing: understand the options and techniques. Script is here. (The pic folder is here as a zip file.) Apply these to prepare your dataset as needed. Write up what you can do or still need to do.

Application of tutorial4 to a dataset

Apply the Python techniques for data exploration to your dataset. Write up your results.

Ch2 ex: 13,14,16,18,19 (reviewed in class)

What similarities and dissimilarities can you find with your dataset? Write up your observations into your project.


Week 3




Decision trees

Ch 3.1-3.3



Python implementation of the similarity measures. The exercise is linked here to load into Jupyter.  Upload your solution to Moodle.

Ch3 ex: 2, 3, 7, 8abc

Python tutorial on classification (sections 6.1, 3.2, 3.4.1). Script is here. Apply techniques to your project dataset. Write up your observations into your project.



Week 4


Model overfitting

Model selection and evaluation

Ch 3.4-3.8


Python tutorial on overfitting (section 3.3 from above tutorial)

Classification Homework



Week 5


Alternative classification techniques

Ch 4.1-4.3

Rule-based Classifier

Linear Regression (simple review)

Nearest Neighbor Classifier

Ch4 ex: 1, 3, 4ab, 17

Apply techniques to project dataset in Weka. Write up your observations into your project.

Optional: Python tutorial for regression. Script is here. Your dataset may lend itself to regression modeling. You should attempt some simple regression models. Write up your observations into your project, or why it doesn't apply.


Week 6


Ch 5.1-5.4 (maybe not 5.4)

Association analysis

Association analysis advanced concepts

5.7.3 Simpson's paradox

Ch 6.1-6.4.2, 6.5

Association Rule generation

Additional Assoc. Rule topics

Association Rule Advanced Topics

Ch5 ex: 2a-d, 6a-d, 7a, 8, 9ab, 10, 11a, 13a, 14 (you will do variations on #2 and #8 for the homework below)

Ch6 ex: 1, 2, 5, 6, 9, 10, 11, 12, 13ab, 14 (you will do variations on #1, #2 and #5)

Association Rules Homework

Apply techniques to your project dataset in Weka. Write up your observations into your project.

Week 7

Clustering Analysis


Ch 7

Sequence Pattern Mining

Subgraph mining (not considered)


Ch7 ex: 2, 5, 6ab, 7


Week 8


Clustering and Hierarchical clustering

Ch 7

Clustering Analysis

Hierarchical clustering

Python tutorial on cluster analysis. Script is here. Apply techniques to your project dataset. Write up your observations into your project.

Week 9  


DBSCAN Clustering

Ch 7.4 and Ch 8.3



Clustering homework

Ch7 ex: 13, 16, 20

Week 10



Week 11


Cluster Validity

Ch 7.5

Anomaly Detection

Ch 9

Cluster Validity

Anomaly/Outlier Detection

Anomaly homework (adapted from tutorial below)

Ch9 ex: 1,2,3

Week 12


Neural Nets

Ch 4.5-4.9

Artificial Neural Networks



See Python tutorial for Anomaly Detection. Script is here. Apply techniques to your project dataset. Write up your observations into your project.

Week 13



Ensemble methods:
Ch 4.10

Class imbalance problem:
Ch 4.11

Avoiding false discoveries

Ch 10

Ensemble Methods

Class Imbalance

Neural Net project



Week 14


Final project submission, presentation and reactions





