Peak finding in the XDF

8.3.2. Peak finding in the XDF#

8.3.2.1. Data for this notebook#

We will be manipulating Hubble eXtreme Deep Field (XDF) data, which was collected using the Advanced Camera for Surveys (ACS) on Hubble between 2002 and 2012. The image we use here is the result of 1.8 million seconds (500 hours!) of exposure time, and includes some of the faintest and most distant galaxies that had ever been observed.

The methods demonstrated here are also available in the photutils.detection documentation.

The original authors of this notebook were Lauren Chambers, Erik Tollerud and Tom Wilson.

We begin with imports.

import warnings

from astropy.nddata import CCDData
from astropy.stats import sigma_clipped_stats, SigmaClip
import astropy.units as u
from astropy.visualization import ImageNormalize, LogStretch

import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator

import numpy as np

from photutils.background import Background2D, MeanBackground
from photutils.centroids import centroid_2dg
from photutils.detection import find_peaks, DAOStarFinder, IRAFStarFinder

# Show plots in the notebook
%matplotlib inline

Next, we load the image and calculate the background statistics.

xdf_image = CCDData.read('hlsp_xdf_hst_acswfc-60mas_hudf_f435w_v1_sci.fits')
# Define the mask
mask = xdf_image.data == 0
xdf_image.mask = mask

mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, maxiters=5, mask=xdf_image.mask)

plt.style.use('../photutils_notebook_style.mplstyle')

8.3.2.2. Source Detection with peak finding#

For more general source detection cases that do not require comparison with models, photutils offers the find_peaks function.

This function simply finds sources by identifying local maxima above a given threshold and separated by a given distance, rather than trying to fit data to a given model. Unlike the previous detection algorithms, find_peaks does not necessarily calculate objects’ centroids. Unless the centroid_func argument is passed a function like photutils.centroids.centroid_2dg that can handle source position centroiding, find_peaks will return just the integer value of the peak pixel for each source.

The centroid-finding, which we do here only so that we can compare positions found by this method to positions found by the star-finding methods, can generate many warnings when an attempt is made to fit a 2D Gaussian to the sources. The warnings are suppressed below to avoid cluttering the output.

Let’s see how it does:

with warnings.catch_warnings(action="ignore"):
    sources_findpeaks = find_peaks(xdf_image.data - median, mask=xdf_image.mask,
                                   threshold=20. * std, box_size=29,
                                   centroid_func=centroid_2dg)
print(sources_findpeaks)

 id  x_peak y_peak  peak_value     x_centroid         y_centroid    
---- ------ ------ ----------- ------------------ ------------------
 2525    393  0.13548362 2525.7825499490064 393.88055503239474
 2466    400 0.029366337 2465.7798479347243 398.05573502870715
 2552    459 0.018309232  2553.144113059114  458.5071866239441
 2614    463 0.018226365 2612.9628856780873 463.90212782823716
 2635    471 0.021132588 2635.6120154601685 473.07924452934066
 2635    493  0.08027035 2634.6718726918175  492.2803943395769
 2678    496 0.026410304 2677.4804921300647  496.9715217867422
 2586    507 0.037171733  2585.462527787139  507.3101665735999
 2326    554 0.018130405  2325.528858931348  553.6418835727972
 2616    560 0.021488318  2615.938885064026  559.9740345848103
 ...    ...    ...         ...                ...                ...
 2836   4662  0.01867783  2828.840372102822  4661.979368247575
 2956   4663 0.030241651   2957.43798265238  4664.380775142155
 2814   4664 0.016782979  2818.781489666815  4662.160254025871
 2452   4711  0.04217213 2449.2187375121634  4711.273878402247
 2631   4712 0.015904581  2630.494634031711  4712.276375599891
 2676   4733 0.014843852 2676.3994552743634  4733.100046176483
 2549   4736 0.015206217 2551.5624408418103  4733.959834146592
 2878   4736  0.03463787 2879.7315243346816  4737.664616308867
 2607   4758 0.019089993  2603.851675357389   4761.25199341925
 2515   4764  0.02098297 2499.6013132419002  4716.863606119463
Length = 1033 rows

There are roughly 1,000 sources identified in the image. Recall from the notebook on star-finding detection methods that roughly 1,500 sources were found with DAOStarFinder.

# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
# Set up the normalization and colormap
norm_image = ImageNormalize(vmin=1e-4, vmax=5e-2, stretch=LogStretch(), clip=False)
cmap = plt.get_cmap('viridis')
cmap.set_bad('white') # Show masked data as white
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image.data),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_findpeaks['x_peak'], sources_findpeaks['y_peak'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Define the colorbar
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())

def format_colorbar(bar):
    # Add minor tickmarks
    bar.ax.yaxis.set_minor_locator(LogLocator(subs=range(1, 10)))

    # Force the labels to be displayed as powers of ten and only at exact powers of ten
    bar.ax.set_yticks([1e-4, 1e-3, 1e-2])
    labels = [f'$10^{{{pow:.0f}}}$' for pow in np.log10(bar.ax.get_yticks())]
    bar.ax.set_yticklabels(labels)

format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('find_peaks Sources');

../../_images/af42b0504f042afa83b6d10ee688705b5648dce7138be121059a40ccb3f062aa.png

We once again look at region around the large galaxy. A comparison of this approach with the star-finding approaches appears later in this section.

fig, ax = plt.subplots(figsize=(8, 8))
top = 1125
left = 2350
cutout_size = 500
inset_display_size = 5 * cutout_size

fitsplot = ax.imshow(xdf_image, cmap=cmap, norm=norm_image, alpha=0.4)
ax.scatter(sources_findpeaks['x_peak'], sources_findpeaks['y_peak'], s=30, marker='o',
           lw=1, alpha=0.1, facecolor='None', edgecolor='r')

ax.set_xlabel('X (pixels)')
ax.set_ylabel('Y (pixels)')
ax.set_title('Peak Finding Sources')

ax2 = ax.inset_axes([2600 - inset_display_size // 2, 2300, inset_display_size, inset_display_size], 
                    xlim=(left, left + cutout_size), ylim=(top, top + cutout_size),
                    transform=ax.transData)
ax.indicate_inset_zoom(ax2, edgecolor="black")

ax2.imshow(xdf_image, cmap=cmap, norm=norm_image)

in_region = (
    (left < sources_findpeaks['x_peak']) &
    (sources_findpeaks['x_peak'] < (left + cutout_size)) & 
    (top < sources_findpeaks['y_peak']) &
    (sources_findpeaks['y_peak'] < (top + cutout_size))
)
sources_findpeaks_to_plot = sources_findpeaks[in_region]
ax2.scatter(sources_findpeaks_to_plot['x_peak'], sources_findpeaks_to_plot['y_peak'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r');

../../_images/1caa1574b562cc739a437e55b72210efc41e88eaf21dfcad7403bcfecc19f77a.png

8.3.2.3. Comparing Detection Methods#

Let’s compare how each of these different strategies did.

First, we repeat our earlier detection using DAOStarFinder and IRAFStarFinder.

daofind = DAOStarFinder(fwhm=5.0, threshold=20. * std)
sources_dao = daofind(np.ma.masked_where(xdf_image.mask, xdf_image))

iraffind_match = IRAFStarFinder(fwhm=5.0, threshold=20. * std,
                                sharplo=0.2, sharphi=1.0,
                                roundlo=-1.0, roundhi=1.0,
                                minsep_fwhm=0.0)
sources_iraf_match = iraffind_match(np.ma.masked_where(xdf_image.mask, xdf_image))

print(f'''DAOStarFinder: {len(sources_dao)} sources
IRAFStarFinder: {len(sources_iraf_match)} sources
find_peaks: {len(sources_findpeaks)} sources''') 

DAOStarFinder: 1470 sources
IRAFStarFinder: 1415 sources
find_peaks: 1033 sources

Next, how many of these sources match? We can answer this question by using sets to compare the centroids of the different sources (rounding to the nearest integer).

# Make lists of centroid coordinates
centroids_dao = [(x, y) for x, y in sources_dao['xcentroid', 'ycentroid']]
centroids_iraf = [(x, y) for x, y in sources_iraf_match['xcentroid', 'ycentroid']]
centroids_findpeaks = [(x, y) for x, y in sources_findpeaks['x_centroid', 'y_centroid']]

# Round those coordinates to the ones place and convert them to be sets
rounded_centroids_dao = set([(round(x, 0), round(y, 0)) for x, y in centroids_dao])
rounded_centroids_iraf = set([(round(x, 0), round(y, 0)) for x, y in centroids_iraf])
rounded_centroids_findpeaks = set([(round(x, 0), round(y, 0)) for x, y in centroids_findpeaks])

# Examine the intersections of different sets to determine which sources are shared
all_match = rounded_centroids_dao.intersection(rounded_centroids_iraf).intersection(rounded_centroids_findpeaks)
dao_iraf_match = rounded_centroids_dao.intersection(rounded_centroids_iraf)
dao_findpeaks_match = rounded_centroids_dao.intersection(rounded_centroids_findpeaks)
iraf_findpeaks_match = rounded_centroids_iraf.intersection(rounded_centroids_findpeaks)

print(f'''Matching sources found by:
    All methods: {len(all_match)}
    DAOStarFinder & IRAFStarFinder: {len(dao_iraf_match)}
    DAOStarFinder & find_peaks: {len(dao_findpeaks_match)}
    IRAFStarFinder & find_peaks: {len(iraf_findpeaks_match)}''')

Matching sources found by:
    All methods: 172
    DAOStarFinder & IRAFStarFinder: 975
    DAOStarFinder & find_peaks: 191
    IRAFStarFinder & find_peaks: 213

And just for fun, let’s plot these matching sources. (The colors chosen to represent different sets are from Paul Tol’s guide for accessible color schemes.)

# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image), norm=norm_image)
ax1.scatter([x for x, y in list(dao_findpeaks_match)], [y for x, y in list(dao_findpeaks_match)],
            s=30, marker='s', lw=1,  facecolor='None', edgecolor='#EE7733',
            label='Found by DAO and find_peaks')
ax1.scatter([x for x, y in list(dao_iraf_match)], [y for x, y in list(dao_iraf_match)],
            s=30, marker='D', lw=1, facecolor='None', edgecolor='#EE3377',
            label='Found by DAO and IRAF')
ax1.scatter([x for x, y in list(iraf_findpeaks_match)], [y for x, y in list(iraf_findpeaks_match)],
            s=30, marker='o', lw=1, facecolor='None', edgecolor='#0077BB',
            label='Found by IRAF and find_peaks')
ax1.scatter([x for x, y in list(all_match)], [y for x, y in list(all_match)],
            s=30, marker='o', lw=1.2, linestyle=':',facecolor='None', edgecolor='#BBBBBB',
            label='Found by all methods')

# Add legend
ax1.legend(ncol=2, loc='lower center', bbox_to_anchor=(0.5, -0.25))

# Define labels
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('Sources Found by Different Methods');

../../_images/145ddf78319aab406171c233f6c63dd4c4147db4de8e212232786955524c1ae9.png

This is pretty cluttered, so we once again look at the region around the large galaxy in the image.

fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image), norm=norm_image, alpha=0.9)

marker_size = 60
ax1.scatter(sources_findpeaks['x_peak'], sources_findpeaks['y_peak'], s=marker_size, marker='s',
            lw=1, alpha=1, facecolor='None', edgecolor='r', label='Found by find_peaks')
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=2 * marker_size, marker='D',
            lw=1, alpha=1, facecolor='None', edgecolor='#EE7733', label='Found by DAOfind')
ax1.scatter(sources_iraf_match['xcentroid'], sources_iraf_match['ycentroid'],
            s=4 * marker_size, marker='o', lw=1, alpha=1, facecolor='None', edgecolor='#BBBBBB',
            label="Found by IRAF starfind")

# Add legend
ax1.legend(ncol=2, loc='lower center', bbox_to_anchor=(0.5, -0.25))

top = 1125
left = 2350
cutout_size = 500

ax1.set_xlim(left, left + cutout_size)
ax1.set_ylim(top, top + cutout_size)
# Define labels
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('Sources Found by Different Methods');

../../_images/044c376268ea2c19e1bc8a1b6dfae70f9f06b1abdf93dc8c26cb01b577081e3c.png

There is only one actual source in this cutout that was found by one of the star-finding techniques that were not also found by peak finding. Several sources ideintified as separate by star-finding sources in the large galaxy at the center are not actually separate. Peak finding suffers from this too but to a lesser extent.

8.3.2.4. Custom Detection Algorithms#

If none of the algorithms we’ve reviewed above do exactly what you need, photutils also provides infrastructure for you to generate and use your own source detection algorithm: the StarFinderBase object can be inherited and used to develop new star-finding classes. Take a look at the documentation for more information.

If you do go that route, remember that photutils is open source; you would be very welcome to open a pull request and incorporate your new star finder into the photutils source code – for everyone to use!

8.3.2.5. Summary#

The sources found by find_peaks are roughly the same as those found by the star-finding algorithms. In XX BROKEN LINK SEGMENTATION we will look at a much different approach that is especially well suited to finding extended objects.