Analyze the Messier Catalog


Yasmeen Asali, William Cerny, Pratik Gandhi (Yale University)

Description: Use the Messier catalog to practice using for loops and logic

Intended Audience: Beginner Undergraduate

tags: libraries:numpy, loops, program-flow, logic

Requirements: requirements.txt

Last Updated: July 18, 2025

Learning Objectives

Understand the structure of 2D arrays and how to access data.
Practice using logic to downselect rows from an array and practice using for loops to iterate through objects.

In this assignment, you will practice logic and loops with a dataset of objects from the Messier Catalog. The dataset includes basic information about each object, such as its type, magnitude, distance, constellation, and best viewing season. Your tasks will involve reading in the data, analyzing it, and using NumPy, logic, and loops to answer questions about it!

Opening the Dataset

The data is stored in a .npy file, which you can load with np.load(). You can download it here (right click and save as a .npy file). You can use the following line of code to open up the file in your code and store its contents in a variable called data.

import numpy as np
data = np.load('/Users/username/Downloads/messier_data.npy')

Caution

Will the above line run on your computer? No! Do you remember what you need to change about it?

Hint

You need to update the PATH to match where the data lives on your computer! You can put the data anywhere on your computer, and specify the absolute PATH in your code as in the example above. Alternatively, if you put the data in the same directory (folder) as the Python script you are writing, you can open it like this: np.load('messier_data.npy'). Note that the file name must be in quotes, because it is a string.
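To see the difference between an absolute and a relative PATH, here is a minimal sketch. It does not use the Messier file: it saves a small made-up array called toy_data.npy (a hypothetical name) into a temporary folder and then loads it back both ways.

```python
import os
import tempfile
import numpy as np

# Made-up stand-in data, saved to a temporary folder so this sketch runs anywhere.
toy = np.array([1.0, 2.0, 3.0])
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'toy_data.npy')  # hypothetical file name
np.save(path, toy)

# Absolute PATH: works no matter which folder you run your script from.
loaded_abs = np.load(path)

# Relative PATH: works only when the file sits in the current working directory.
start_dir = os.getcwd()
os.chdir(folder)                      # pretend our script lives next to the file
loaded_rel = np.load('toy_data.npy')
os.chdir(start_dir)                   # go back to where we started

print(loaded_abs, loaded_rel)
```

If np.load raises a FileNotFoundError, the PATH you gave does not point at the file from where your code is running.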

Each row in the dataset corresponds to a single Messier object, with the following fields:

Messier: Name of the Messier object as a string (e.g., 'M107', 'M108')
RA and DEC: the Right Ascension and Declination of the object (coordinates in the sky)
Type: Type of object (e.g., 'Gc' for Globular Cluster, 'Sp' for Spiral, 'Ba' for Barred Spiral)
Mag: Magnitude (brightness) of the object. Magnitudes are a unitless system, and lower numbers mean brighter objects.
Distance: Distance from Earth in units of light-years
Constellation: The constellation in which the object resides
Season: The best viewing season (spring, summer, autumn, winter)

Here are some tips and reminders for using large 2D datasets:

You can access each row of the dataset using indexing. For instance, data[0] will return the first row of the array (aka all of the above column values for a single Messier object).
You can access each column of the dataset by name, for example data['Messier']. Rather than indexing by an integer, we are now indexing by a column key (a column name). This returns a numpy array of the Messier names of all the objects.
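The two kinds of indexing above can be tried out on a tiny array first. This sketch assumes the real file behaves like a numpy structured array with named columns; the three rows and their values here are made up for illustration and are not real Messier objects.

```python
import numpy as np

# Made-up three-row stand-in with a few of the column names listed above.
dtype = [('Messier', 'U4'), ('Mag', 'f8'), ('Season', 'U6')]
toy = np.array(
    [('X1', 5.0, 'spring'), ('X2', 7.5, 'winter'), ('X3', 6.0, 'spring')],
    dtype=dtype,
)

first_row = toy[0]          # one object: all of its column values
names = toy['Messier']      # one column: the name of every object
first_mag = toy[0]['Mag']   # combine both: one value for one object

print(toy.dtype.names)      # lists the column names that are available
print(first_row, names, first_mag)
```

Printing data.dtype.names on the real dataset is a quick way to check the exact spelling and capitalization of each column key.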

Exercise 1

We begin with an overview analysis of the Messier Catalog.

How many objects are in the dataset? You can either write a loop to count the number of objects or use a built-in Python function.
Investigate the brightness of Messier objects: Recall that magnitude is a measure of brightness, and brighter objects have lower magnitude values!
- Calculate the average magnitude of all the Messier objects.
- What magnitude is the brightest object in the catalog, and what magnitude is the faintest?
- Once you have that, calculate the difference in magnitude between the brightest and the faintest object.
How many objects are there in each viewing season (spring, summer, autumn, winter)? Before looking at the hint, see if you can approach this problem on your own!
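For Problems 1 and 2, it can help to try the relevant functions on a few numbers you can check by hand. The sketch below uses three made-up magnitudes, not values from the catalog.

```python
import numpy as np

# Three made-up magnitudes (not from the catalog).
mags = np.array([5.0, 7.5, 6.0])

n_objects = len(mags)           # built-in function that counts entries
mean_mag = np.mean(mags)        # average magnitude
brightest = np.min(mags)        # lowest magnitude = brightest object
faintest = np.max(mags)         # highest magnitude = faintest object
spread = faintest - brightest   # difference between the two extremes

print(n_objects, mean_mag, brightest, faintest, spread)
```

Note that the brightest object comes from the minimum, because lower magnitudes mean brighter objects.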

Hint

Hint for Problem 3: You may want to define some variables as “counters” for the number of objects in each season, and then loop through every row of the dataset and add 1 to the corresponding counter if an object is in that season. Recall that you can use the syntax a += 1 to add 1 to an existing variable (equivalent to a = a + 1).
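The counter pattern described in this hint might look like the following sketch. It loops over a short made-up list of seasons rather than the real dataset, and only tracks two of the four seasons to keep it brief.

```python
# Made-up list of seasons standing in for the Season column.
seasons = ['spring', 'winter', 'spring', 'summer']

# Start each counter at zero before the loop begins.
n_spring = 0
n_winter = 0

for season in seasons:
    if season == 'spring':
        n_spring += 1      # same as n_spring = n_spring + 1
    elif season == 'winter':
        n_winter += 1

print(n_spring, n_winter)
```

String comparisons are case-sensitive, so check how the seasons are actually written in the dataset before comparing against them.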

Exercise 2

Let’s get more practice working with the data.

Calculate the average distance of spiral galaxies (Type: 'Sp'), globular clusters (Type: 'Gc'), and open clusters (Type: 'Oc'). Which type of object is typically farthest away? Does this make sense?

Hint

Hint for Problem 1: You can do this (at least) two ways! First, try using a condition to index the array. Your condition should check the object types, and you can use that condition directly as the index to select only the rows of the array where the condition is True. Once you have the subset of the array that meets your condition, you can compute the average (as in Exercise 1, Problem 2).
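Here is a minimal sketch of indexing with a condition. The array, the type codes 'Aa' and 'Bb', and the distances are all made up, so you will still need to adapt the idea to the real column values.

```python
import numpy as np

# Made-up stand-in with invented type codes and distances.
dtype = [('Type', 'U2'), ('Distance', 'f8')]
toy = np.array([('Aa', 10.0), ('Bb', 40.0), ('Aa', 30.0)], dtype=dtype)

is_aa = toy['Type'] == 'Aa'   # array of True/False, one entry per row
subset = toy[is_aa]           # keeps only the rows where the condition is True
mean_aa = np.mean(subset['Distance'])

print(is_aa, len(subset), mean_aa)
```

The condition produces one True or False per row, which is why it can be used directly as an index.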

Another way you could approach this problem is by using a for loop to iterate over every row, and then using if statements to select based on type. How would you compute the average? Try to think creatively about how you can perform math on values (recall Exercise 1, Problem 3) iteratively in a loop. Maybe you will need to define two variables that you operate on each iteration of the loop, then do something to the two variables at the end after the loop finishes…
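If you get stuck on the loop version, here is one possible sketch of the "two variables" idea, again on a made-up array with invented type codes. Try it yourself before reading on!

```python
import numpy as np

# Made-up stand-in with invented type codes and distances.
dtype = [('Type', 'U2'), ('Distance', 'f8')]
toy = np.array([('Aa', 10.0), ('Bb', 40.0), ('Aa', 30.0)], dtype=dtype)

total = 0.0   # running sum of the distances we have selected
count = 0     # how many rows we have selected

for row in toy:
    if row['Type'] == 'Aa':
        total += row['Distance']
        count += 1

mean_aa = total / count   # average = sum divided by number of values
print(mean_aa)
```

If no rows match the condition, count stays at zero and the division fails, so it is worth checking count before dividing.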

How many constellations are there in the dataset, and how many objects belong to each constellation?
- First, you need to find how many unique constellations exist in the dataset.
- Then, count how many Messier objects belong to each constellation.

Hint

Hint for Problem 2: You can try using a set to find the unique constellations in the dataset. A set is a data structure that automatically removes duplicates, so it could be useful for counting how many distinct constellations there are.

If you feel comfortable, try using a set to collect the constellations and a dict (dictionary) to count the number of objects in each one (recall Exercise 1, Problem 3 for counting in loops). Remember, if you’re unsure about how to create or use a set, think about how a list works—sets work similarly but don’t allow duplicate values!
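As a reminder of how a set and a dict behave, here is a small sketch using made-up labels 'A', 'B', and 'C' in place of constellation names.

```python
# Made-up labels standing in for the Constellation column.
labels = ['A', 'B', 'A', 'C', 'A']

unique_labels = set(labels)      # duplicates are dropped automatically
n_unique = len(unique_labels)

counts = {}                      # empty dict: label -> number of times seen
for label in labels:
    if label in counts:
        counts[label] += 1       # seen before: add 1 to its counter
    else:
        counts[label] = 1        # first time: start its counter at 1

print(n_unique, counts)
```

A dict works like a collection of named counters, so you do not need to know the labels in advance.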

How many Messier objects are in the northern vs. southern sky? You could try to do this using the constellation information from the previous problem, but the hemisphere can more easily be determined by the declination.
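One way to think about this: objects with positive declination are north of the celestial equator, and objects with negative declination are south of it. The sketch below assumes the declinations are stored as plain numbers in degrees, which may not match the real file (if they are stored as strings, you would need to convert them first). The four values are made up.

```python
import numpy as np

# Made-up declinations in degrees (assumed numeric, not from the catalog).
dec = np.array([22.0, -24.5, 41.3, -5.0])

is_north = dec > 0              # True where the object is in the northern sky
n_north = np.sum(is_north)      # True counts as 1, False as 0
n_south = np.sum(dec < 0)

print(n_north, n_south)
```

Summing an array of True/False values counts how many are True, which saves writing a loop with a counter.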
