8.2.2. Detecting sources in the XDF#

8.2.2.1. Data for this notebook#

We will again be manipulating Hubble eXtreme Deep Field (XDF) data, which was collected using the Advanced Camera for Surveys (ACS) on Hubble between 2002 and 2012. The image we use here is the result of 1.8 million seconds (500 hours!) of exposure time, and includes some of the faintest and most distant galaxies that had ever been observed.

The methods demonstrated here are also available in narrative from within the photutils.detection documentation.

The original authors of this notebook were Lauren Chambers, Erik Tollerud and Tom Wilson.

8.2.2.2. Import necessary packages#

First, let’s import packages that we will use to perform arithmetic functions and visualize data:

from astropy.io import fits
import astropy.units as u
from astropy.nddata import CCDData
from astropy.stats import sigma_clipped_stats, SigmaClip
from astropy.visualization import ImageNormalize, LogStretch
import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator
import numpy as np

from photutils.background import Background2D, MeanBackground
from photutils.detection import DAOStarFinder, IRAFStarFinder

# Show plots in the notebook
%matplotlib inline

Let’s also define some matplotlib parameters, such as title font size and the dpi, to make sure our plots look nice. To make it quick, we’ll do this by loading a style file shared with the other photutils tutorials into pyplot. We will use this style file for all the notebook tutorials. (See here to learn more about customizing matplotlib.)

plt.style.use('../photutils_notebook_style.mplstyle')

8.2.2.3. Data representation#

Throughout this notebook, we are going to store our images in Python using a CCDData object (see Astropy documentation), which contains a numpy array in addition to metadata such as uncertainty, masks, or units. In this case, each image has units electrons (counts) per second.

Note that you could create the CCDData object directly from the URL where this image is taken from:

url = 'https://archive.stsci.edu/pub/hlsp/xdf hlsp_xdf_hst_acswfc-60mas_hudf_f435w_v1_sci.fits'
xdf_image = CCDData.read(url)

Since the data for the guide is meant to be downloaded in bulk before going through the guide, we read the image from disk here.

xdf_image = CCDData.read('hlsp_xdf_hst_acswfc-60mas_hudf_f435w_v1_sci.fits')

As explained in a previous notebook on background estimation, it is important to mask these data, as a large portion of the values are equal to zero. We will mask out the non-data portions of the image array, so all of those pixels that have a value of zero don’t interfere with our statistics and analyses of the data.

# Define the mask
mask = xdf_image.data == 0
xdf_image.mask = mask

Let’s look at the data:

# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Set up the normalization and colormap
norm_image = ImageNormalize(vmin=1e-4, vmax=5e-2, stretch=LogStretch(), clip=False)
cmap = plt.get_cmap('viridis')
cmap.set_bad('white') # Show masked data as white

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)

# Define the colorbar
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())

def format_colorbar(bar):
    # Add minor tickmarks
    bar.ax.yaxis.set_minor_locator(LogLocator(subs=range(1, 10)))

    # Force the labels to be displayed as powers of ten and only at exact powers of ten
    bar.ax.set_yticks([1e-4, 1e-3, 1e-2])
    labels = [f'$10^{{{pow:.0f}}}$' for pow in np.log10(bar.ax.get_yticks())]
    bar.ax.set_yticklabels(labels)

format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.grid()
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)');
../../_images/5f65e2198d3650d33a95fc116f6a8fd884c0d262c00bff951f1e5cab72b267bd.png

Tip: Double-click on any inline plot to zoom in.

With the DAOStarFinder class, photutils provides users with an easy application of the popular DAOFIND algorithm (Stetson 1987, PASP 99, 191), originally developed at the Dominion Astrophysical Observatory.

This algorithm detects sources by:

  • Searching for local maxima

  • Selecting only sources with peak amplitude above a defined threshold

  • Selecting sources with sizes and shapes that match a 2-D Gaussian kernel (circular or elliptical)

It returns:

  • Location of the source centroid

  • Parameters reflecting the source’s sharpness and roundness

Generally, the threshold that source detection algorithms use is defined as a multiple of the standard deviation. So first, we need to calculate statistics for the data:

mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, maxiters=5, mask=xdf_image.mask)

Now, let’s run the DAOStarFinder algorithm on our data and see what it finds.

daofind = DAOStarFinder(fwhm=5.0, threshold=20. * std)
sources_dao = daofind(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_dao)
 id      xcentroid      ...        flux                 mag         
---- ------------------ ... ------------------ ---------------------
   1  2466.854970610095 ... 1.0288840532302856  -0.03091609026012894
   2 2509.7383250920184 ... 1.5627723932266235    -0.484739326743976
   3   2465.51286379057 ... 1.9257498979568481   -0.7114997088012577
   4 2525.2552128830876 ... 11.329010963439941    -2.635479992692634
   5 2465.2203887755345 ... 1.1178830862045288  -0.12099096309942062
   6 2551.7160997488236 ... 1.2884687185287476  -0.27518469801035816
   7   2613.70434435853 ... 1.8613871335983276   -0.6745917690713554
   8  2677.991255036429 ... 2.1094906330108643   -0.8104440031988435
   9 2585.5858708243118 ... 3.5341360569000244    -1.370708162320959
  10 2325.6447511340125 ...  1.514251947402954  -0.45049535214974534
 ...                ... ...                ...                   ...
1461 2814.0691684793956 ... 1.4545104503631592  -0.40679211478057165
1462 2452.1151453100597 ...  3.013660192489624   -1.1977357038496481
1463 2630.6208922286264 ...  1.357964277267456  -0.33222086376091386
1464 2676.3738336329197 ... 1.2891004085540771   -0.2757168651109099
1465  2549.531985289118 ... 1.3755886554718018  -0.34622146380705604
1466  2878.448681768306 ...   2.99257755279541   -1.1901135349682006
1467 2516.4156892210945 ... 1.0459871292114258  -0.04881585151254138
1468 2514.8031440763284 ...  1.731210470199585   -0.5958746750174272
1469 2718.2891658337066 ... 1.1487256288528442  -0.15054077645439984
1470  2783.582260375173 ... 1.0245482921600342 -0.026331084306021015
Length = 1470 rows

There are almost 1,500 sources detected in the image.

# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Define the colorbar
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())

# format the colorbar
format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('DAOFind Sources');
../../_images/2ceb7d29331d5531ee899fcb3dcac03425df85ab6c9e2d0839bf5894c917a59c.png

Looking at a cutout around the relatively large galaxy in the top center of the image is instructive. As you can see below, the large galaxy is identified by DAOStarFinder as several overlapping sources. Some of the smaller galaxies in the image are identified as sources, but many are not. The parameters could be tuned to detect more of the sources, but it turns out there are source detection methods better suited to extended sources that will be discussed in XX BROKEN LINK XX.

fig, ax = plt.subplots(figsize=(8, 8))
top = 1125
left = 2350
cutout_size = 500
inset_display_size = 5 * cutout_size

fitsplot = ax.imshow(xdf_image, cmap=cmap, norm=norm_image, alpha=0.4)
ax.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
           lw=1, alpha=0.1, facecolor='None', edgecolor='r')

ax.set_xlabel('X (pixels)')
ax.set_ylabel('Y (pixels)')
ax.set_title('DAOFind Sources')

ax2 = ax.inset_axes([2600 - inset_display_size // 2, 2300, inset_display_size, inset_display_size], 
                    xlim=(left, left + cutout_size), ylim=(top, top + cutout_size),
                    transform=ax.transData)
ax.indicate_inset_zoom(ax2, edgecolor="black")

ax2.imshow(xdf_image, cmap=cmap, norm=norm_image)

in_region = (
    (left < sources_dao['xcentroid']) &
    (sources_dao['xcentroid'] < (left + cutout_size)) & 
    (top < sources_dao['ycentroid']) &
    (sources_dao['ycentroid'] < (top + cutout_size))
)
sources_dao_to_plot = sources_dao[in_region]
ax2.scatter(sources_dao_to_plot['xcentroid'], sources_dao_to_plot['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r');
../../_images/72df53f28c8d721b5b99ec761ab41d5f8d68941d2bb0ba491354daf2d16f7f03.png

8.2.2.4. Source Detection with IRAFStarFinder#

Similarly to DAOStarFinder, IRAFStarFinder is a class that implements a pre-existing algorithm that is widely used within the astronomical community. This class uses the starfind algorithm that was originally part of IRAF.

IRAFStarFinder is fundamentally similar to DAOStarFinder in that it detects sources by finding local maxima above a certain threshold that match a Gaussian kernel. However, IRAFStarFinder differs in the following ways:

  • Does not allow users to specify an elliptical Gaussian kernel

  • Uses image moments to calculate the centroids, roundness, and sharpness of objects

Let’s run the IRAFStarFinder algorithm on our data, with the same FWHM and threshold, and see how its results differ from DAOStarFinder:

iraffind = IRAFStarFinder(fwhm=5.0, threshold=20. * std)
sources_iraf = iraffind(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_iraf)
 id     xcentroid      ...         flux                mag         
--- ------------------ ... ------------------- --------------------
  1 2509.7190284925837 ... 0.14511650687200017    2.095707960365292
  2  2613.694107545662 ... 0.19381765997968614    1.781516635262892
  3  2635.000214262164 ... 0.10974624977097847   2.3990257786621534
  4  2585.635146488398 ...  0.2907299152575433   1.3412756957473753
  5 2297.1310169615317 ... 0.16760808671824634   1.9392625786380697
  6 2806.3919192581743 ... 0.12026738276472315   2.2996303498965296
  7  2506.104118124536 ...  0.2897530668415129   1.3449298965619323
  8 2326.4076700341316 ... 0.22564100101590157   1.6164549556816468
  9   2516.95367865115 ... 0.12704244733322412   2.2401278715754076
 10 2702.4927638179342 ... 0.11358246637973934    2.361721762720908
...                ... ...                 ...                  ...
202 2677.4075490936834 ...   1.142329734750092 -0.14447870397939355
203   2568.16614546712 ...  0.1637192026246339   1.9647509480061578
204 2060.2947886661673 ... 0.24417303601512685   1.5307557420138815
205 3163.7916559714276 ...   0.240644340403378   1.5465608695654134
206  2349.238879210583 ... 0.13923865760443732   2.1406004309678597
207 2348.7062258479177 ... 0.27708356245420873   1.3934730932237431
208 2878.2765247763396 ... 0.14217845676466823   2.1179155100181206
209  2475.923527971974 ... 0.17424126411788166    1.897122466941524
210  2676.314814458915 ... 0.10654794920992572   2.4311372624667555
211  2514.790677543236 ... 0.13224901584908366    2.196518877870933
Length = 211 rows

There are many fewer sources detected this time – only 211 compared to the almost 1500 detected by DAOStarFinder.

# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_iraf['xcentroid'], sources_iraf['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Define the colorbar
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())

# format the colorbar
format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('IRAFFind Sources');
../../_images/1817d54efd19f4e3101ea832c813e419b63862db6c5c17abd50ade80b625edcd.png

Looking at a cutout around the relatively large galaxy in the top center of the image is again instructive and gives a hint as to some of the differences between DAOStarFinder and IRAFStarFinder.

There are several differences. The large galaxy is now identified as a single source, but many fewer sources are detected overall. The sources aside from the large galaxy that are detected are ones that are point-like.

fig, ax = plt.subplots(figsize=(8, 8))
top = 1125
left = 2350
cutout_size = 500
inset_display_size = 5 * cutout_size

fitsplot = ax.imshow(xdf_image, cmap=cmap, norm=norm_image, alpha=0.4)
ax.scatter(sources_iraf['xcentroid'], sources_iraf['ycentroid'], s=30, marker='o',
           lw=1, alpha=0.1, facecolor='None', edgecolor='r')

ax.set_xlabel('X (pixels)')
ax.set_ylabel('Y (pixels)')
ax.set_title('IRAF Starfind Sources')

ax2 = ax.inset_axes([2600 - inset_display_size // 2, 2300, inset_display_size, inset_display_size], 
                    xlim=(left, left + cutout_size), ylim=(top, top + cutout_size),
                    transform=ax.transData)
ax.indicate_inset_zoom(ax2, edgecolor="black")

ax2.imshow(xdf_image, cmap=cmap, norm=norm_image)

in_region = (
    (left < sources_iraf['xcentroid']) &
    (sources_iraf['xcentroid'] < (left + cutout_size)) & 
    (top < sources_iraf['ycentroid']) &
    (sources_iraf['ycentroid'] < (top + cutout_size))
)
sources_iraf_to_plot = sources_iraf[in_region]
ax2.scatter(sources_iraf_to_plot['xcentroid'], sources_iraf_to_plot['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r');
../../_images/12f038b10cc35f2dcc0ff92a6026e679e28c8a94d1408cbff84026f13456a4e2.png

8.2.2.5. Comparing DAOStarFinder and IRAFStarFinder#

You might have noticed that the IRAFStarFinder algorithm only found 211 sources in our data – 14% of what DAOStarFinder found. Why is this?

The answer comes down to the default settings for the two algorithms: (1) there are differences in the upper and lower bounds on the requirements for source roundness and sharpness, and (2) IRAFStarFinder includes a minimum separation between sources that DAOStarFinder does not have:

 

IRAFStarFinder

DAOStarFinder

sharplo

0.5

0.2

sharphi

2.0

1.0

roundlo

0.0

-1.0

roundhi

0.2

1.0

minsep_fwhm

1.5 * FWHM

N/A

Thinking about this, it then makes sense that IRAFStarFinder would find fewer sources. It has tighter restrictions on source roundness and sharplo, meaning that it eliminates more elliptical galactic sources (this is the eXtreme Deep Field, after all!), and the minimum separation requirement further rules out sources that are too close to one another.

If we set all these parameters to be equivalent, though, we should find much better agreement between the two methods:

iraffind_match = IRAFStarFinder(fwhm=5.0, threshold=20. * std,
                                sharplo=0.2, sharphi=1.0,
                                roundlo=-1.0, roundhi=1.0,
                                minsep_fwhm=0.0)
sources_iraf_match = iraffind_match(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_iraf_match)
 id      xcentroid      ...         flux               mag        
---- ------------------ ... ------------------- ------------------
   1 2509.7190284925837 ... 0.14511650687200017  2.095707960365292
   2  2465.397953856956 ... 0.28363753808662295 1.3680907321616993
   3 2525.2784415133956 ...  0.8550870837643743 0.1699741340204049
   4 2465.9310085123957 ... 0.25824849400669336 1.4699055060454422
   5 2551.3794078445167 ... 0.15369679452851415 2.0333379749249216
   6  2613.694107545662 ... 0.19381765997968614  1.781516635262892
   7  2635.000214262164 ... 0.10974624977097847 2.3990257786621534
   8 2634.7190390648143 ... 0.38626047875732183 1.0327993138708345
   9  2678.027532333543 ...  0.1929994365782477  1.786109897057187
  10  2585.635146488398 ...  0.2907299152575433 1.3412756957473753
 ...                ... ...                 ...                ...
1406  2834.398111328477 ... 0.17723897472023964  1.878601927274589
1407 2955.8080014098355 ...  0.1022638701251708 2.4756944391009212
1408 2814.1086320053128 ...  0.1684430947061628 1.9338669703518374
1409 2452.1215084611995 ... 0.25968281202949584 1.4638919866099898
1410 2630.6803848992295 ... 0.16099105565808713  1.982995629582518
1411  2676.314814458915 ... 0.10654794920992572 2.4311372624667555
1412 2549.5763711845707 ... 0.12295090791303664  2.275670649823001
1413  2878.609142842793 ...  0.3203331842087209  1.235995173070129
1414 2606.8588280290196 ... 0.11837350379209965 2.3168637432861736
1415  2514.790677543236 ... 0.13224901584908366  2.196518877870933
Length = 1415 rows

The number of detected sources are in much better agreement now – 1415 versus 1470 – but the improved agreement can also be seen by plotting the location of these sources:

# Set up the figure with subplots
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(12, 6))
plt.tight_layout()

# Plot the DAOStarFinder data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('DAOStarFinder Sources')

# Plot the IRAFStarFinder data
fitsplot = ax2.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax2.scatter(sources_iraf_match['xcentroid'], sources_iraf_match['ycentroid'],
            s=30, marker='o', lw=1, alpha=0.7, facecolor='None', edgecolor='r')
ax2.set_xlabel('X (pixels)')
ax2.set_title('IRAFStarFinder Sources')

# Define the colorbar
cbar_ax = fig.add_axes([1, 0.09, 0.03, 0.87])
cbar = plt.colorbar(fitsplot, cbar_ax, fraction=0.046, pad=0.04, ticks=LogLocator())

# format the colorbar
format_colorbar(cbar)

cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30);
../../_images/b1efcffc50dd0a2e923dac514b97345c10a87a1bd9c6014a5099438cb8e47e5a.png

Take this example as reminder to be mindful when selecting a source detection algorithm, and when defining algorithm parameters! Don’t be afraid to play around with the parameters and investigate how that affects your results.

The detailed view near the large galaxy confirms that the results are now nearly identical.

# Set up the figure with subplots
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(12, 6))
plt.tight_layout()

# Plot the DAOStarFinder data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
            lw=1, alpha=1, facecolor='None', edgecolor='r')
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('DAOStarFinder Sources')
ax1.set_xlim(left, left + cutout_size)
ax1.set_ylim(top, top + cutout_size)

# Plot the IRAFStarFinder data
fitsplot = ax2.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax2.scatter(sources_iraf_match['xcentroid'], sources_iraf_match['ycentroid'],
            s=30, marker='o', lw=1, alpha=1, facecolor='None', edgecolor='r')
ax2.set_xlabel('X (pixels)')
ax2.set_title('IRAFStarFinder Sources')
ax2.set_xlim(left, left + cutout_size)
ax2.set_ylim(top, top + cutout_size)

# Define the colorbar
cbar_ax = fig.add_axes([1, 0.09, 0.03, 0.87])
cbar = plt.colorbar(fitsplot, cbar_ax, fraction=0.046, pad=0.04, ticks=LogLocator())
fig.suptitle("Cutout near large galaxy", fontsize=20, y=1.1)
# format the colorbar
format_colorbar(cbar)

cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30);
../../_images/9981562abd6f289f55b824676a962b1e2e34f176a2694481f7fce6c6789ee767.png

8.2.2.6. Summary#

Both DAOStarFinder and IRAFStarFinder are capable of producing similar results if the input parameters are matched. One noticable difference between them is that IRAFStarFinder includes a parameter for setting a minimum separation of sources. This is particularly useful to exclude overlapping sources if the intent is to perform aperture photometry.

In the section XX BROKEN LINK XX the detection methods discussed here will be compared with local peak detection, which performs better on extended sources. Section XX BROKEN LINK SEGMENTATIONXX will demonstrate using image segmentation to detect sources. That also performs better on extended sources than the star finders described in this notebook.