Summer School Syllabus


Angra do Heroísmo, Terceira Island, Azores, Portugal

Introduction to Data Science

4-8 August 2025
Instructor
Gábor Pozsgai
Postdoctoral Research Fellow at the University of the Azores
Course description and aims
While most postgraduate curricula include basic statistics courses, these typically demonstrate methods on carefully selected, tidy datasets. Students and researchers, however, face considerable challenges when they have to work with real-life datasets, which are often poorly structured, riddled with errors, or contain special characters. This summer school provides a hands-on, applied introduction to data science, with a particular focus on working with messy, complex datasets in R.

Participants will learn:
  • How to identify and handle issues in datasets.
  • How to structure and store data effectively.
  • Best practices for creating robust and reusable datasets.
  • Key analytical workflows using R.
The course combines morning theoretical lectures with hands-on R-based practicals in the afternoon. Each session is 3 hours. Full attendance and participation are required for successful completion. Evaluation will be based on active participation and a data wrangling task at the end of the course.

About the instructor
Dr Gábor Pozsgai is an insect ecologist and data scientist with nearly two decades of research experience. He holds a PhD in Ecology from the University of Aberdeen (UK) and is currently a Postdoctoral Research Fellow at the University of the Azores. His research explores ecological patterns, ecological networks, and, more recently, spatial interaction models in regional science. He is proficient in a range of modelling techniques, multivariate statistics, spatial analysis, and machine learning, with a particular interest in AI-based biodiversity monitoring. Dr Pozsgai is an expert in R and Python programming and regularly publishes in scientific journals, for which he also serves as a reviewer.

Program schedule
Day 1: Introduction to data and data science
Morning:
  • What is data?
  • Overview of data science and its applications in ecology and beyond
  • Data types, formats, and structures
  • Introduction to metadata
Afternoon Practical:
  • RStudio setup and essentials
  • Introduction to R: syntax, variables, data structures
  • Loading and exploring basic datasets
Day 2: Data collection and input
Morning:
  • Designing a data collection plan
  • Common pitfalls in data entry and formatting
  • Reading and importing data from various sources (CSV, Excel, web, MySQL)
  • Dealing with character encodings and locale issues
Afternoon Practical:
  • Hands-on importing and inspecting real-life datasets
  • Spotting and correcting format issues
  • Intro to cleaning data with tidyverse
Day 3: Data wrangling in R
Morning:
  • Cleaning and transforming data with dplyr and tidyr
  • Handling missing values and outliers
  • Understanding dataset properties: factors, ranges, summaries
Afternoon Practical:
  • Step-by-step wrangling tasks
  • Creating new variables, filtering and summarizing
  • Visualizing data with ggplot2 (histograms, scatterplots, boxplots)
Day 4: Working with special data formats
Morning:
  • Introduction to spatial data
  • Working with image and video data
  • Basics of network data and ecological interaction networks
  • Brief look at Python as a tool for data analysis
Afternoon Practical:
  • Mapping with sf and ggmap
  • Network visualisation using igraph
  • Optional: data analysis experiment with Python
Day 5: Open data and project evaluation
Morning:
  • Finding and using open data repositories
  • Data ethics and reproducibility
  • Wrap-up discussion: integrating data science into your research
Afternoon Practical:
  • Final project: wrangling and visualising a messy dataset
  • Peer feedback and group discussion
  • Summary of techniques and future learning paths

Evaluation
Participants will be assessed based on:
  • Active engagement in practical sessions
  • A final-day data wrangling and visualisation mini-project

Suggested readings & resources
  • Wickham, H. & Grolemund, G. (2017). "R for Data Science." https://r4ds.had.co.nz
  • Ellison, A.M. (2010). "Repeatability and transparency in ecological research." Ecology 91(9): 2536–2539.
  • Marwick, B., Boettiger, C. & Mullen, L. (2018). "Packaging Data Analytical Work Reproducibly Using R (and Friends)." The American Statistician 72(1): 80–88.
  • https://datacarpentry.org – Free hands-on lessons for data science
  • https://ropensci.org – R tools for open science
  • https://www.tidyverse.org – Core tools for modern R data science