
The preliminary schedule for the Joint Workshop and Summer School can
be consulted here. Please treat this program as preliminary during
March and the beginning of April.
You can find a brief list of introductory concepts, with links to
useful Wikipedia articles, here.
Models: Specification, complexity and choice (David Hogg)
What is a model? What freedoms does a model have and how can we
capture that? Are qualitatively different models comparable? What is
the difference between a likelihood and a probability for a model or
for model parameters? How do we decide among models that are
qualitatively similar but quantitatively different? How do we decide
among models that are qualitatively different? The most important
content will be conveyed through a lab session in which participants
pair-code solutions to some model selection problems.
Table of contents
 Lecture 0: (to be provided in advance as links or bibliography if needed)
 Lecture 1: Model specification and likelihood formulation
 Lecture 2: Model complexity and choice
 Lecture 3: (pair-coding) Model selection workshop
 Lecture 4: (pair-coding) workshop continued
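
As a flavour of the kind of problem the model selection workshop might pose, the sketch below compares two models that differ in complexity (a constant and a straight line) on synthetic data. It is only an illustration under stated assumptions: the data, the known noise level `SIGMA`, and the choice of the Bayesian Information Criterion (BIC) as the selection rule are all hypothetical and are not taken from the lecture material.

```python
import math
import random


def fit_constant(ys):
    """Least-squares constant model; returns the predicted values."""
    mean = sum(ys) / len(ys)
    return [mean] * len(ys)


def fit_line(xs, ys):
    """Least-squares straight line; returns the predicted values."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in xs]


def gaussian_log_likelihood(ys, predictions, sigma):
    """Log-likelihood of the data for independent Gaussian noise of known sigma."""
    chi2 = sum(((y - p) / sigma) ** 2 for y, p in zip(ys, predictions))
    return -0.5 * chi2 - len(ys) * math.log(sigma * math.sqrt(2.0 * math.pi))


def bic(log_likelihood, n_params, n_data):
    """Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_data) - 2.0 * log_likelihood


# Synthetic data (an assumption for illustration): a line plus Gaussian noise.
rng = random.Random(0)
SIGMA = 0.2
xs = [float(i) for i in range(20)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, SIGMA) for x in xs]

bic_constant = bic(gaussian_log_likelihood(ys, fit_constant(ys), SIGMA), 1, len(ys))
bic_line = bic(gaussian_log_likelihood(ys, fit_line(xs, ys), SIGMA), 2, len(ys))
print("BIC constant:", bic_constant)
print("BIC line:", bic_line)
```

Because the least-squares fit is also the maximum-likelihood fit for Gaussian noise of known variance, the BIC here penalises only the extra parameter of the line. Other criteria (for example cross-validation or Bayesian evidence) could be swapped in for `bic`.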
Knowledge Discovery and Data Mining (Giuseppe Longo)
Feature selection: filter approach, wrapper approach, PCA, Diffusion
Maps. Supervised classification: the curse of dimensionality,
bias-variance tradeoff, the kernel trick, support vector machines,
cross-validation, evaluation of classifiers. Unsupervised
classification: taxonomy, evaluation measures.
Table of contents
 Lecture 0: (to be provided in advance as links or bibliography if needed)
 Lecture 1: what is data mining
 Lecture 2: feature selection and dimensionality reduction
 Lecture 3: classification tasks and supervised methods
 Lecture 4: clustering methods
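
To make the idea of cross-validation and classifier evaluation concrete, here is a minimal sketch of k-fold cross-validation around a 1-nearest-neighbour classifier. Everything in it is an assumption made for illustration: the two synthetic Gaussian classes, the helper names (`make_blobs`, `k_fold_indices`, `predict_1nn`, `cross_validate`), and the choice of classifier are hypothetical and do not come from the lecture notes.

```python
import random


def make_blobs(n_per_class, rng):
    """Two synthetic 2-D Gaussian classes (illustrative data only)."""
    data = []
    for label, centre in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    return data


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def predict_1nn(train, point):
    """Return the label of the training point closest to `point`."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    nearest = min(train, key=lambda item: sq_dist(item[0], point))
    return nearest[1]


def cross_validate(data, k, rng):
    """Mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k, rng):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        correct = sum(predict_1nn(train, p) == label for p, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


rng = random.Random(1)
data = make_blobs(50, rng)
mean_accuracy = cross_validate(data, 5, rng)
print("mean held-out accuracy:", mean_accuracy)
```

Each point is used for testing exactly once and never appears in the training set of its own fold, which is what keeps the accuracy estimate from being optimistically biased.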
Statistical Image Analysis (Robert Lupton)
The source detection problem, source modelling, catalogue
cross-correlations, combination of images...
Table of contents
 Lecture 0: (to be provided in advance as links or bibliography if needed)
 Lecture 1: The Sampling Theorem and Image Resampling
 Lecture 2: Object Detection and Measurement as Statistical Estimation
 Lecture 3: Workshop: object detection and measurement
 Lecture 4: (workshop continued, if needed)
Technical aspects of the analysis of petabyte-size databases (Matthew Graham)
It would take over 33 years to watch a 1 PB MP3 movie, yet within the
decade data sets of this size will be as everyday a feature of
astronomical life as astro-ph or APOD. This section will cover the
practical aspects of handling petascale (and larger) data sets and
streams, including the new computational approaches needed to work
with them, from an astronomer's perspective.
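
The 33-year figure can be turned into an implied playback rate with a few lines of arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes), a 365.25-day year, and a constant bitrate; none of these assumptions are stated in the text, and the function name is hypothetical.

```python
PETABYTE_BYTES = 10 ** 15                  # assumption: decimal petabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumption: Julian year


def implied_bitrate_mbps(size_bytes, years):
    """Constant bitrate (in Mbit/s) needed to play size_bytes over `years`."""
    bits = size_bytes * 8
    seconds = years * SECONDS_PER_YEAR
    return bits / seconds / 1e6


rate = implied_bitrate_mbps(PETABYTE_BYTES, 33)
print(f"implied playback rate: {rate:.2f} Mbit/s")
```

Under these assumptions the rate comes out at roughly 7.7 Mbit/s, so 1 PB played at a lower rate would last correspondingly longer.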
Table of contents
 Lecture 0: (to be provided in advance as links or bibliography if needed)
   How big is a petabyte?
   Big data sets en route: astronomy, other sciences
 Lecture 1: How to store a petabyte
   What do you store?
   Cost and performance of storage
   Databases: relational vs non-relational, indexing
 Lecture 2: How to work with a petabyte
   Distribution
   Divide and conquer: MapReduce, Hadoop (how to sort 1 PB)
   Putting things together: PIG
 Lecture 3: How to analyze a petabyte
   Random access
   Characterizing data
   Streaming statistics
 Ideas for pair-coding examples (to be discussed with SOC / other lecturers).
   Coding up a simple analysis routine using Hadoop
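
The divide-and-conquer and streaming-statistics items above can be illustrated without a cluster. The sketch below is plain Python and does not use the Hadoop API; it only mimics the map and reduce steps. Each chunk is summarised independently (the "map"), and the summaries are merged pairwise (the "reduce") with the standard parallel formula for combining means and sums of squared deviations. The function names and the toy data are assumptions made for illustration.

```python
from functools import reduce


def chunked(values, size):
    """Split a list into consecutive, non-empty chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def summarize(chunk):
    """Map step: reduce one chunk to (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((v - mean) ** 2 for v in chunk)
    return (n, mean, m2)


def merge(a, b):
    """Reduce step: combine two summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)


# Toy data standing in for a data set too large to hold in memory at once.
values = [float(v) for v in range(1, 101)]
count, mean, m2 = reduce(merge, map(summarize, chunked(values, 7)))
variance = m2 / count  # population variance
print(count, mean, variance)
```

Because `merge` is associative, the summaries can be combined in any grouping, which is what lets the work be spread over many machines or applied to a stream one chunk at a time.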
Time series analysis (Suzanne Aigrain)
This section will cover common tools for exploring and characterising
time series and ensembles thereof. The first two lectures are devoted
to time- and frequency-domain techniques respectively, and cover some
frequently used exploratory methods. Particular attention will be
devoted to the treatment of stochastic processes and mixtures of
stochastic and periodic processes.
Table of contents
 Lecture 0: (to be provided in advance as links or bibliography if needed)
   stationarity, autocorrelation function, (discrete) Fourier transform, window function
   properties of the Gaussian distribution
 Lecture 1: Time-domain analysis
   autocorrelation techniques
   common time-domain filters
   stochastic processes: ARIMA models, Gaussian processes
 Lecture 2: Frequency analysis
   noise properties in the frequency domain
   periodic signal detection
   time-frequency analysis, wavelet transforms
 Lecture 3: Ensembles of time series
   principal component analysis in the time and frequency domains
   classification and clustering
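
As a small example of the autocorrelation techniques listed above, the sketch below estimates the sample autocorrelation function of an evenly sampled series and uses it to recover the period of a sinusoid. The noise-free test signal, the 20-sample period, and the lag search window are assumptions chosen for illustration; real light curves are noisy and often unevenly sampled, which this sketch does not handle.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation (biased estimator) for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [v - mean for v in series]
    variance_sum = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        cross = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
        acf.append(cross / variance_sum)
    return acf


# Illustrative signal: a sinusoid with a period of 20 samples.
PERIOD = 20
series = [math.sin(2.0 * math.pi * t / PERIOD) for t in range(200)]

acf = autocorrelation(series, 40)
# Search for the first positive peak away from lag 0.
best_lag = max(range(10, 31), key=lambda lag: acf[lag])
print("estimated period (samples):", best_lag)
```

The autocorrelation at lag 0 is 1 by construction, and for a periodic signal it peaks again near integer multiples of the period, which is why the search window excludes the smallest lags.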
