STAT570 Data Handling and Visualization
Course Code: | 2460570 |
METU Credit (Theoretical-Laboratory hours/week): | 3 (3.00 - 0.00) |
ECTS Credit: | 8.0 |
Department: | Statistics |
Language of Instruction: | English |
Level of Study: | Graduate |
Course Coordinator: | |
Offered Semester: | Fall and Spring Semesters. |
Course Objectives
This course aims to equip students with the essential skills to proficiently manage, manipulate, and visually represent data. By the end of this course, students will be adept at handling diverse data formats, cleaning and preparing data for downstream tasks, and creating informative data visualizations. They will also develop an understanding of data ethics, privacy, and the significance of effective data communication. Through hands-on experience and practical projects, students will become confident in their ability to work with data across various domains and industries.
Course Content
Structured, semi-structured, and unstructured data types. Data manipulation and preprocessing. Dimension reduction. Sampling, oversampling, undersampling. Data scraping and wrangling. Visualization of multivariate data. Panel displays, surface plots, 3D scatterplots, contour plots. 2D representation of multivariate data. Interactive graphics revealing any structure in data: Asimov's grand tour, projection pursuit exploratory data analysis (PPEDA). Visualization of categorical data. Dynamic graphics.
Course Learning Outcomes
In this course, students develop the skills needed to handle and visualize data using Unix and R. They learn to collect data from various sources while ensuring data quality and integrity, with an emphasis on Unix-based data processing, and gain hands-on experience preprocessing and cleaning data in both the Unix environment and R. Students become adept at using R for data analysis and visualization: creating visual representations and interpreting them to draw conclusions from the data. Ethical considerations surrounding data privacy and security are emphasized. Students strengthen their communication skills by conveying data-driven findings through visualizations and reports created in R. Collaborative project work promotes teamwork, and critical evaluation of visualizations fosters a discerning approach. Students are also expected to adapt to emerging technologies and to apply these skills in domain-specific contexts. Finally, students use GitHub for version control, collaboration, and sharing of data-related projects within the Unix and R environments, which reinforces their ability to work in data-related teams and to present their work to a broader audience.