STAT570 Data Handling and Visualization

Course Code: 2460570
METU Credit (Theoretical-Laboratory hours/week): 3 (3.00 - 0.00)
ECTS Credit: 8.0
Department: Statistics
Language of Instruction: English
Level of Study: Graduate
Course Coordinator:
Offered Semester: Fall and Spring Semesters.

Course Objectives

This course aims to equip students with the essential skills to proficiently manage, manipulate, and visually represent data. By the end of this course, students will be adept at handling diverse data formats, cleaning and preparing data for downstream tasks, and creating informative data visualizations. They will also develop an understanding of data ethics, privacy, and the significance of effective data communication. Through hands-on experience and practical projects, students will become confident in their ability to work with data across various domains and industries.


Course Content

Structured, semi-structured, and unstructured data types. Data manipulation and preprocessing. Dimension reduction. Sampling, oversampling, undersampling. Data scraping and wrangling. Visualization of multivariate data. Panel displays, surface plots, 3D scatterplots, contour plots. 2D representation of multivariate data. Interactive graphics revealing any structure in data: Asimov's grand tour, projection pursuit exploratory data analysis (PPEDA). Visualization of categorical data. Dynamic graphics.


Course Learning Outcomes

In this course, students will develop the skills to handle and visualize data using Unix and R. They will learn to collect data from various sources while ensuring data quality and integrity, with an emphasis on Unix-based data processing. Through hands-on experience, they will gain expertise in preprocessing and cleaning data in both the Unix environment and R. Students will become adept at using R for data analysis and visualization, creating visual representations and interpreting them to extract insights. Ethical considerations surrounding data privacy and security will also be emphasized. Students will strengthen their communication skills by conveying data-driven findings through visualizations and reports created in R. Collaborative project work will promote teamwork, and critical evaluation of visualizations will foster a discerning approach. Students will also be expected to stay adaptable to emerging technologies and to apply these skills in domain-specific contexts. Finally, students will use GitHub for version control, collaboration, and sharing of data-related projects within the Unix and R environments, reinforcing their ability to work in data-related teams and to showcase their work to a broader audience.