My God, It’s Full of Stars (4/7) — Plotting a Color-Magnitude Diagram in Python from Stellar Evolution Code Written in C

11 min readDec 11, 2023

Having imported my structures and preprocessor macros from C into Python, it was time to get started plotting this stuff.

The goal is to plot a color-magnitude diagram (CMD) for a given set of cluster parameters (number of stars, model set, age, distance, etc.) and a series of interactive buttons and sliders so that we can dynamically change the parameters and update the CMD in real time.

This is what the final product will look like:

My first step was to create a class to store all of the relevant parameters:

class Params:
    def __init__(self):
        # RNG and screen output parameters
        self.seed = 1
        self.verbose = 0

        # Cluster parameters. Each cluster can have more 
        # than one population of stars. The parameters
        # for each population are entered as lists
        self.nPops = 1
        self.nSystems = [200]
        self.WDMassUp = [8.0]   
        self.percentBinary = [50.]
        self.percentDB = [0.]
        self.logClusAge = [9.1]
        self.Fe_H = [0.1]
        self.distMod = [10.0]
        self.Av = [0.1]
        self.Y = [0.29]

        # Model set parameters
        self.msRgbModels = 0    # 0 = Girardi, 1 = Chaboyer-Dotter w/helium sampling, 2 = Yale-Yonsei, 3 = Dartmouth (DSED)
        self.filterSet = 0      # 0 = UBVRIJHK, 1 = ACS, 2 = SDSS + JHK
        self.IFMR =  0          # 0 = Weidemann, 1 = Williams, 2 = Salaris Linear, 3 = Salaris piecewise linear
        self.BDmodel = 0        # 0 = None, 1 = Baraffe

        # Scattering parameters
        self.brightLimit =  [13,22,2]  # brightLimit, faintLimit, filter index;
        self.limitS2N =  10
        self.exposures =  [.03,.02,.02,.02,0.1,0,0,0,0,0,0,0,0,0] # Exposure times for each filter band
        self.nFieldStars = 50
        self.nBrownDwarfs = 0

        # Names for files to output final results to
        self.suffix = 'fe' + str(self.Fe_H[0])\
                            + '.a' + "{:.2f}".format(self.logClusAge[0]) \
                            + '.u' + str(self.WDMassUp[0])  \
                            + '.n' + str(self.nSystems[0]) \
                            + '.fs' + str(self.nFieldStars) #filename setting more compact
        self.path = "../outputs/"
        self.updateFileStem()
        self.modelPath = "../models"

    # Updates the filename when parameters change
    def updateFileStem(self):
        self.suffix = 'fe' + "{:.2f}".format(self.Fe_H[0])\
                    + '.a' + "{:.2f}".format(self.logClusAge[0]) \
                    + '.u' + "{:.1f}".format(self.WDMassUp[0])  \
                    + '.n' + str(self.nSystems[0]) \
                    + '.fs' + str(self.nFieldStars) #filename setting more compact
        self.file_stem  = self.path + 'sim.' + self.suffix

In addition to everything the C code needs to create a simulated cluster, this class also stores the names of the files that the data will ultimately be outputted to, and tells the C code where to find the stellar evolution models. Next I needed a method to create the clusters and tell it which models to use:

### Creates a cluster, allocates memory for it (in C), initializes
### memory for the cluster (which in turn initiailzes the memory for the
### array of stars. Returns a pointer to the cluster
### Does not evolve the cluster, just creates the structure in memory
def createCluster(params: Params) -> clusterPtr:
    pCluster = c.allocCluster(params.nSystems[0])
    pCluster.contents.evoModels.mainSequenceEvol = params.msRgbModels
    pCluster.contents.evoModels.filterSet = params.filterSet
    pCluster.contents.evoModels.IFMR = params.IFMR
    pCluster.contents.evoModels.brownDwarfEvol = params.BDmodel
    pCluster.contents.evoModels.minMass = getMinMass(pCluster.contents.evoModels.mainSequenceEvol)
    return pCluster

And another two functions to set and change the parameters of the cluster:

### Set initial cluster parameters. 
### Cluster needs to be evolved to get photometry
def setClusterParams(pCluster: clusterPtr, params: Params, pop: int):
    pCluster.contents.nStars = params.nSystems[pop]
    c.reallocStarsArray(pCluster)
    pCluster.contents.M_wd_up = params.WDMassUp[pop]
    pCluster.contents.parameter[AGE] = params.logClusAge[pop]
    pCluster.contents.parameter[YYY] = params.Y[pop]
    pCluster.contents.parameter[FEH] = params.Fe_H[pop]
    pCluster.contents.parameter[MOD] = params.distMod[pop]
    pCluster.contents.parameter[ABS] = params.Av[pop]

### Changes the values of cluster parameters without reallocating stars array
### Cluster needs to be re-evolved to get new photometry
def changeClusterParams(pCluster: clusterPtr, params: Params, pop: int):
    pCluster.contents.M_wd_up = params.WDMassUp[pop]
    pCluster.contents.parameter[AGE] = params.logClusAge[pop]
    pCluster.contents.parameter[YYY] = params.Y[pop]
    pCluster.contents.parameter[FEH] = params.Fe_H[pop]
    pCluster.contents.parameter[MOD] = params.distMod[pop]
    pCluster.contents.parameter[ABS] = params.Av[pop]

A word on nomenclature: the cluster-wide parameters that we need in order to evolve the cluster (i.e. to determine the colors and brightnesses of each star) are age, chemical composition, distance from Earth, and the amount of gas and dust between us and the cluster (which blocks some of the light). Instead of age, astronmers generally deal with the log₁₀ of the age. As for chemical composition, stars are made mostly of hydrogen, a little helium, and trace amounts of everything else, which astronomers collectively call “metals” (yes, even stuff like carbon and oxygen is a “metal” to an astronomer). The fraction of helium in the models is denoted by “Y” (though it is only used by one particular model set and not very important to our analysis). The fraction of “metals” (also known as the “metallicity”), is denoted by “[Fe/H]” (pronounced “eff ee on H” or “eff ee over H”). The distance to the cluster is parameterized in something called the “distance modulus,” a logarithmic scale designed to make it easier to calculate the apparent brightness of the object at a given distance. Finally, line-of-sight absorption, denoted Av, determines how much light from the star is absorbed or scattered by gas and dust on its way through the galaxy before it reaches us. Absorption makes stars seem fainter, but also tends to make them redder, and hence is also often referred to as “reddening”.

Now it was time to do the actual astrophysics, or rather, call my C code to do it for me.

### Randomly draws from the IMF to create a cluster
def populateCluster(pCluster: clusterPtr, fractionBinary: float = 0.0) -> None:
    for j in range(pCluster.contents.nStars):   # for all systems in the cluster
        star = c.getStarPtr(pCluster,j)
        star.contents.id = j
        # create single stars in arbitrary mass order 
        while True:
            star.contents.U = drawFromIMF()
            if star.contents.U > pCluster.contents.evoModels.minMass:
                break
        ## Need to evolve primaries to see if they are neutron stars or black holes
        c.evolve(pCluster, j)
        # create binaries among appropriate fraction 
        if star.contents.status[0] != NSBH \
            and star.contents.status[0] != WD:
            if np.random.random() < fractionBinary:
                massRatio = np.random.normal(1,0.5)
                while massRatio < 0 or massRatio > 1:
                    massRatio = np.random.normal(1,0.5)
                star.contents.massRatio = massRatio
            else:
                star.contents.massRatio = 0.0
        else:
            star.contents.massRatio = 0.0

    c.evolve(pCluster, -1)                          # Evolve stars

Working through this function one bit at a time: For each star that I want it to generate, the first thing I do is tell our C code to give me a pointer to that star’s structure in memory. Now I can access (and set) the parameters of that specific star. The code picks a mass at random from a function called the Initial Mass Function (IMF). The IMF (not to be confused with the Tom Cruise spy organization) is an empirically derived estimate of the distribution of the masses of stars at the birth of any star cluster. It is a Gaussian in the log₁₀ of the mass, centered at a mass of 0.1, giving us a distribution strongly weighted toward less massive stars.

def drawFromIMF() -> float:
    mf_sigma = 0.67729
    mf_mu = -1.02

    logMass = np.random.normal(mf_mu, mf_sigma)
    while (logMass < -4) or (logMass > 2.0):    # keep within mass (0.15 + EPS to 100 Mo) limits
        logMass = np.random.normal(mf_mu, mf_sigma)

    zamsMass = 10.0 ** logMass
    return zamsMass

In log space, it looks like this:

In linear space:

Python then calls the C function evolve(), which takes all of my cluster and stellar parameters and determines the brightness and color of the star.

Another note on nomenclature: I’ve been using “color” and “brightness” because those are the easiest terms for a non-astronomical audience to understand. What astronomers actually measure — and what our code is simulating — is the star’s “photometry” in different filter bands. The way that we measure a star’s brightness is by pointing a telescope at it to focus the light. We pass that focussed light through filter that only lets a narrow band of wavelengths through. We then count the number of photons that hit a detector. We repeat the process with various different filters. The standard set of filters are labelled “U”, “B”, “V”, “R” and “I” and cover the spectrum of visible light. Other filter sets are also available (these are set in in the “param” class).

Filter profiles for different filter sets showing what fraction of light gets through at each wavelength for each filter. Source: http://astroweb.case.edu/ssm/ASTR620/mags.html

The “color” of the star is determined by subtracting the brightness in one band from the brightness in another. So for instance, the color in the CMD at the top of this post is derived from subtracting the V band from the B band. Bluer stars tend to be relatively brighter in B, so they are on the left. Redder stars are brighter in V, so they are on the right. The Y axis is the brightness (or “magnitude”) in V. (Since magnitude is an inverted scale, the lower numerical values are at the top).

The evolve method takes the cluster parameters of age, metallicity, distance, and absorption, together with the mass of the star, and runs it through theoretical models to determine the photometry in each band in the filter set. Some of these randomly created stars have ages and masses such that their stellar lifetime are already over. A handful of those stars are now black holes or neutron stars, which are of no interest here. Those stars are thrown out. The rest are white dwarfs. A different set of models in the C code (called “white dwarf cooling models”) determines their photometry.

For a cluster of a given age, etc., it what I had simulated so far yielded something like:

Of the tens of thousands of stars in this image, only a few hundred are actually part of the Hyades cluster. Source: https://www.newscientist.com/article/mg25634110-800-how-to-spot-the-hyades-the-closest-star-cluster-to-earth/

I now needed to handle what are called “unresolved binaries”. These are stellar systems where two stars orbit around one another, but they are so close together that from here on Earth we do not see them as separate points of light. Instead, they appear as a single point which contains contributions from both stars, and so “its” color and brightness will look different than a single star.

To simulate these unresolved binaries, I chose a fraction of the stars to turn into binaries. For those binaries, I chose the ratio of the masses of the two stars. The “primary” is always the larger of the two stars, so the mass ratio is a number between zero and one, which is chosen at random. When these binary stars are evolved, the C code figures out the photometry of each star separately then adds the contributions together, giving the final photometry of that single point of light. That got me a CMD that looks like:

I have simulated a star cluster! A perfect star cluster, unsullied by such things as noise and contamination and observational error. Let’s muck it up to make it more realistic!

From here on Earth, a cluster is just a patch of stars in the sky. The sky, unlike space, is two dimensional, so in addition to the stars in the cluster, some of the stars that we see in the same patch of sky are in front of or behind the cluster and not part of the cluster at all. We call these “field stars”.

There are ways for astronomers to remove field stars from their data, but some contamination is inevitable. So my code also simulates field stars via a separate function. I wanted to make sure that the field stars look like actual stars (i.e. that their colors in the different bands make sense for a star in front of or behind the cluster), so I picked an age, distance, and metallicity at random, and then used those to evolve the star.

Finally, I needed to simulate observational errors. After converting the relevant data in the cluster structure into a pandas DataFrame, I scattered the photometry randomly based on a simulated data collection process.

# The signal 2 noise coefficients for each band of photometry
# Used by scatterPhot to add noise to simulated cluster star photometry
# Originally defined in scatterCuster.c
s2nCoeffs = [
    [9.33989,  0.3375778],  # U 
    [10.0478,  0.3462758],  # B 
    [10.48098, 0.368201 ],  # V 
    [10.71151, 0.3837847],  # R 
    [10.61035, 0.3930941],  # I 
    [9.282385, 0.386258 ],  # J 
    [9.197463, 0.3970419],  # H 
    [9.024068, 0.3985604],  # K
    [9.024068, 0.3985604],  # IRAC Blue
    [9.024068, 0.3985604],  # IRAC Red
    [9.024068, 0.3985604],  # Band 1
    [9.024068, 0.3985604],  # Band 2
    [9.024068, 0.3985604],  # Band 3
    [9.024068, 0.3985604]   # Band 4
]

### Scatter photometry and return
def scatterPhot(simdf: pdDataFrame, params: Params) -> pdDataFrame:
    #For each filter that we are outputting
    scatterdf = simdf.copy()
    for filt in range(FILTS):
        if c.getUseFilt(filt):
            if params.exposures[filt] > 0.0:
                #Check if the star's photometry is above our desired limiting S/N
                logS2N  = s2nCoeffs[filt][0] - s2nCoeffs[filt][1] * scatterdf[c.getFilterName(filt).decode()]
                s2n = 10. ** logS2N
                s2n *= np.sqrt(params.exposures[filt])
                sigma = 1./(s2n)
                sigma[sigma < 0.005] = 0.005
                scatterdf["sig" + c.getFilterName(filt).decode()] = sigma
                scatterdf[c.getFilterName(filt).decode()] += np.random.normal(0.,sigma)
    return scatterdf

This function pretends that the cluster is being observed via a certain telescope at Keck observatory in Hawaii. I tell the function how long to expose for, and the function determines how much noise I would expect to see for that band for that exposure time. It adds Gaussian noise to the photometry accordingly.

The final result with field stars and scattering looks something like this:

The stars — especially the fainter stars — have scattered and now have error bars. (The brighter stars have error bars, too, but they are too small to see).

The code I was using to plot this makes use of another piece of code that I originally wrote in R way back in 2006 and adapted into Python for this project:

def HR(data:PandasDataFrame,
        band1: str="B",
        band2: str="V",
        marker: str=".",
        color="black",
        size=20,
        type: str="p",
        overplot: bool = False,
        ebars: bool = True):
    """
    Takes a data frame that contains multiband stellar photometry 
    and plots a Hertzsprung-Russel diagram (aka color-magnitude diagram)
    with or without error bars.Returns the points and error bars as a tuple of points objects (so
    they can be hidden or removed in the parent program as needed)
    """
    data=data[data[band2]<90]
    points = []
    if not overplot:
        plt.ion()
        plt.figure()
        plt.show()
        plt.gca().invert_yaxis()
        plt.gca().set_ylabel(band2)
        plt.gca().set_xlabel(band1 + " - " + band2)
    if type == "l":
        if 'stage2' in data.columns:
            data = data[data['stage2'] == 9]
        pops = data.value_counts(["pop"])
        groups = data.groupby("pop")
        popnames = [pop[0] for pop in pops.index if pop[0] != 9]
        for popname in popnames:
            group = groups.get_group(popname)
            group = group.sort_values(["mass1"])
            WDs = group[group["stage1"]==3]
            MS = group[group["stage1"]==1]
            points.append(plt.plot(WDs[band1]-WDs[band2],WDs[band2],c=color))
            points.append(plt.plot(MS[band1]-MS[band2],MS[band2],c=color))
    
    if type == "p":
        point = plt.scatter(data[band1]-data[band2],data[band2],marker=marker,c=color,s=size)
        points.append(point)
        if ebars:
            #first check to see if you have error bars in the file
            sig1 = "sig" + band1
            sig2 = "sig" + band2
            if sig1 in data.columns and sig2 in data.columns:
                sig_color = np.sqrt(data[sig1]**2 + data[sig2]**2)
                errors = plt.errorbar(data[band1]-data[band2],
                                        data[band2],
                                        yerr = data[sig2],
                                        xerr=sig_color,
                                        c=color,
                                        ls="none")
                points.append(errors)
    return points

I now had a simulated cluster and a function with which to plot it. In my next installment, I will start adding buttons to our interface.

Complete code available at https://github.com/stevendegennaro/mcmc

My God, It’s Full of Stars (4/7) — Plotting a Color-Magnitude Diagram in Python from Stellar Evolution Code Written in C

Written by Data Science Filmmaker