Trigonometric Regression

Earl Bellinger (Yale University)

Description: A series of exercises at the advanced undergraduate level that build intuition for, and practice with, using trigonometric regression to fit periodically varying data (in this case, light curves of classically pulsating Cepheid stars).

Intended Audience: Advanced Undergraduate / Early Graduate

tags: linear-regression, penalized-regression, feature-selection, machine-learning, scikit-learn, Fourier-analysis, data-visualization, variable-stars

Requirements: requirements.txt

Last Updated: July 23, 2024

Learning Objectives

  1. Create linear regression models

  2. Understand how to use a different basis (in this case, a trigonometric basis)

  3. Apply a penalty to the regression model and learn about cross-validation

  4. Learn about classically pulsating stars: a fitted model lets you predict how bright a given pulsating star will be at any time in the past or future. In addition, the statistical models we fit are a compact description of the lightcurve, which enables us to compare observations of a star with theoretical simulations. That comparison can in turn constrain properties of the star, such as its mass, radius, metallicity, and age.

Introduction

Classically pulsating stars such as Cepheids and RR Lyrae stars brighten and dim periodically.

Here is an example of a phased lightcurve obtained by the OGLE project of a Cepheid star in the Large Magellanic Cloud:

In this series of exercises, we will use linear regression to fit a trigonometric model to this phased lightcurve.

Lastly, we will use penalized regression for feature selection.

The Hubble Constant

In this exercise, we will use linear regression to derive the Hubble constant \(H_0\). Below are Edwin Hubble's 1929 data for Cepheid-host galaxies, which led him to conclude that the Universe is expanding:

import numpy as np

distances = np.array([0.21, 0.26, 0.27, 0.27, 0.45, 0.5, 0.8, 1.1, 1.4, 2.0])  # Mpc
velocities = np.array([130, 70, 185, 220, 200, 270, 300, 450, 500, 800])  # km/s
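One possible way to carry out this fit is sketched below. It assumes the Hubble law has the form \(v = H_0 d\) with no intercept, so the line is forced through the origin; that choice, and the variable names `model` and `H0`, are assumptions of this sketch rather than part of the original exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hubble's 1929 data, repeated here so the block is self-contained
distances = np.array([0.21, 0.26, 0.27, 0.27, 0.45, 0.5, 0.8, 1.1, 1.4, 2.0])  # Mpc
velocities = np.array([130, 70, 185, 220, 200, 270, 300, 450, 500, 800])  # km/s

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
X = distances.reshape(-1, 1)

# Assumption: fit v = H0 * d, i.e. a line through the origin (no intercept)
model = LinearRegression(fit_intercept=False)
model.fit(X, velocities)

H0 = model.coef_[0]  # slope of the fit, in km/s/Mpc
print(f"H0 = {H0:.1f} km/s/Mpc")
```

The slope obtained this way is several times larger than modern estimates of roughly 70 km/s/Mpc. Setting `fit_intercept=True` instead would let you check how much the fitted line departs from the origin.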

3D Linear Model

In this exercise, we will practice generating data from a linear model (with noise) and visualizing it. We’ll then use the sklearn library to fit a linear model to the data.
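As a rough sketch of what this exercise might look like, the block below draws noisy points from a plane \(z = a x + b y + c\) and recovers the parameters with `LinearRegression`. The parameter values, noise level, sample size, and the helper `plot_fit` are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Hypothetical "true" parameters of the plane z = a*x + b*y + c
a_true, b_true, c_true = 2.0, -1.0, 0.5
n_points = 200

# Generate the two predictors and the noisy response
x = rng.uniform(-5, 5, n_points)
y = rng.uniform(-5, 5, n_points)
noise = rng.normal(0.0, 0.5, n_points)
z = a_true * x + b_true * y + c_true + noise

# Fit a linear model with two features and an intercept
X = np.column_stack([x, y])
model = LinearRegression().fit(X, z)
a_fit, b_fit = model.coef_
c_fit = model.intercept_
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}, c = {c_fit:.2f}")


def plot_fit(x, y, z, model):
    """Scatter the data in 3D and overlay the fitted plane (illustrative helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(x, y, z, s=5)
    gx, gy = np.meshgrid(np.linspace(-5, 5, 20), np.linspace(-5, 5, 20))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    gz = model.predict(grid).reshape(gx.shape)
    ax.plot_surface(gx, gy, gz, alpha=0.3)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    return fig
```

Calling `plot_fit(x, y, z, model)` should show the point cloud scattered about the fitted plane; the fitted coefficients should land close to the values used to generate the data.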

Cepheid Lightcurves

We can now use the techniques we’ve practiced thus far to begin an investigation of the light curves (magnitude vs. time) of classically pulsating Cepheid stars.
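A common first step with such data is to fold the observation times on the pulsation period to obtain a phased lightcurve. The sketch below illustrates this on synthetic data; the period, magnitudes, and the helper name `phase_fold` are hypothetical and do not come from the OGLE data used in the exercises.

```python
import numpy as np


def phase_fold(times, period, t0=0.0):
    """Map observation times to pulsation phases in [0, 1)."""
    return ((times - t0) / period) % 1.0


rng = np.random.default_rng(1)

# Synthetic stand-in for a real lightcurve (all values are made up)
period = 5.3  # days
times = np.sort(rng.uniform(0, 500, 150))  # irregularly sampled epochs, in days
mags = 15.0 + 0.3 * np.sin(2 * np.pi * times / period)
mags += rng.normal(0, 0.02, times.size)  # photometric noise

# Fold on the period and sort by phase for plotting or fitting
phases = phase_fold(times, period)
order = np.argsort(phases)
phases_sorted, mags_sorted = phases[order], mags[order]
```

Plotting `mags_sorted` against `phases_sorted` (with the magnitude axis inverted, as is conventional) shows a single pulsation cycle even though the observations span many cycles.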

Trigonometric Regression

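Finally, we can apply trigonometric (a.k.a. sinusoidal) regression to investigate the relevant parameters of our Cepheids. In brief, the technique models the magnitude as a constant plus a sum of sines and cosines of the phase at integer harmonics, \(m(\phi) = A_0 + \sum_{k=1}^{K} [a_k \sin(2\pi k \phi) + b_k \cos(2\pi k \phi)]\); because this model is linear in the coefficients, it can be fit with ordinary linear regression.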
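A minimal sketch of trigonometric regression follows, assuming the phases are already known: build one sine and one cosine column per harmonic and pass the resulting matrix to `LinearRegression`. The synthetic lightcurve, the number of harmonics, and the helper `trig_features` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def trig_features(phases, n_harmonics):
    """Design matrix with columns sin(2*pi*k*phi), cos(2*pi*k*phi) for k = 1..K."""
    columns = []
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(2 * np.pi * k * phases))
        columns.append(np.cos(2 * np.pi * k * phases))
    return np.column_stack(columns)


rng = np.random.default_rng(2)

# Synthetic phased lightcurve with two known components (values are made up)
phases = rng.uniform(0, 1, 200)
mags = 15.0 + 0.3 * np.sin(2 * np.pi * phases) + 0.1 * np.cos(4 * np.pi * phases)
mags += rng.normal(0, 0.02, phases.size)

# The model is linear in its coefficients, so ordinary least squares applies
X = trig_features(phases, n_harmonics=3)
model = LinearRegression().fit(X, mags)

mean_mag = model.intercept_  # constant term of the series
sin_coefs = model.coef_[0::2]  # a_k for k = 1..K
cos_coefs = model.coef_[1::2]  # b_k for k = 1..K
amplitudes = np.hypot(sin_coefs, cos_coefs)  # amplitude of each harmonic
```

The per-harmonic amplitudes computed in the last line are one compact way to summarize the shape of the lightcurve.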

Penalized Regressions

It can be beneficial to regularize (penalize) the fit, for example to avoid overfitting when many harmonics are included. Here we will use two common methods, LASSO (L1) and Ridge (L2) regularization, and use cross-validation to choose the penalty strength. (For a review of cross-validation, see https://scikit-learn.org/stable/modules/cross_validation.html.)
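The sketch below shows one way to combine these ideas using scikit-learn's `LassoCV` and `RidgeCV`, which select the penalty strength by cross-validation. The synthetic lightcurve, the choice of 10 harmonics, the grid of Ridge penalties, and the helper `trig_features` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV


def trig_features(phases, n_harmonics):
    """Design matrix with columns sin(2*pi*k*phi), cos(2*pi*k*phi) for k = 1..K."""
    columns = []
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(2 * np.pi * k * phases))
        columns.append(np.cos(2 * np.pi * k * phases))
    return np.column_stack(columns)


rng = np.random.default_rng(3)

# Synthetic phased lightcurve with only two real components (values are made up)
phases = rng.uniform(0, 1, 300)
mags = 15.0 + 0.3 * np.sin(2 * np.pi * phases) + 0.1 * np.cos(4 * np.pi * phases)
mags += rng.normal(0, 0.02, phases.size)

# Deliberately include more harmonics than the signal contains
X = trig_features(phases, n_harmonics=10)

# LASSO (L1): penalty chosen by 5-fold cross-validation over an automatic grid
lasso = LassoCV(cv=5).fit(X, mags)

# Ridge (L2): penalty chosen by cross-validation from an explicit grid
ridge_alphas = np.logspace(-4, 2, 25)
ridge = RidgeCV(alphas=ridge_alphas).fit(X, mags)

# The L1 penalty can set coefficients exactly to zero, which acts as feature selection
n_kept = np.count_nonzero(lasso.coef_)
print(f"LASSO alpha = {lasso.alpha_:.2e}, nonzero coefficients: {n_kept} of {X.shape[1]}")
print(f"Ridge alpha = {ridge.alpha_:.2e}")
```

Ridge shrinks all coefficients toward zero without eliminating any, whereas LASSO can drop terms entirely, so comparing `lasso.coef_` with `ridge.coef_` is a quick way to see which harmonics the data actually support.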