astromodule.table.crossmatch_cds

crossmatch_cds(table: Table | DataFrame | str | Path | BufferedIOBase | RawIOBase | TextIOBase, cds_table: str = 'simbad', radius: float | Quantity = 1.0, ra: str | None = None, dec: str | None = None, find: Literal['all', 'best', 'best-remote', 'each', 'each-dist'] = 'all', fixcols: Literal['none', 'dups', 'all'] = 'dups', suffix_in: str = '_in', suffix_remote: str = '_cds', block_size: int = 50000, use_moc: bool = False, pre_sort: bool = False, service_url: str = None, fmt: Literal['fits', 'csv', 'parquet'] = 'parquet')

Uses the CDS X-Match service to join a local table to one of the tables hosted by the Centre de Données astronomiques de Strasbourg. This includes all of the VizieR tables and the SIMBAD database. The service is very fast, and in most cases it is the best way to match a local table against a large external table hosted by CDS.

The local table is uploaded to the X-Match service in chunks, and the matches for each chunk are retrieved in turn and finally stitched together to form the result. Only the sky position and an identifier for each row of the input table are uploaded, but all columns of the input table are reinstated in the result for reference.
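The upload-then-reinstate flow described above can be pictured with plain Python dicts. This is a hypothetical sketch, not astromodule's or the backend's implementation: upload_payload, fake_remote_match and reinstate are illustrative names, and fake_remote_match stands in for the real CDS service.

```python
def upload_payload(rows, ra, dec):
    """Keep only a row identifier and the sky position of each input row."""
    return [{"row_id": i, "ra": row[ra], "dec": row[dec]} for i, row in enumerate(rows)]

def fake_remote_match(payload):
    """Hypothetical stand-in for the service: every position matches one source."""
    return [{"row_id": p["row_id"], "main_id": f"SRC {p['row_id']}"} for p in payload]

def reinstate(rows, matches):
    """Join each remote match back onto the full input row via its row_id."""
    result = []
    for match in matches:
        remote_cols = {k: v for k, v in match.items() if k != "row_id"}
        result.append({**rows[match["row_id"]], **remote_cols})
    return result

rows = [
    {"name": "a", "ra": 10.0, "dec": -5.0, "mag": 12.3},
    {"name": "b", "ra": 20.0, "dec": 15.0, "mag": 14.1},
]
payload = upload_payload(rows, "ra", "dec")  # only row_id, ra and dec are sent
result = reinstate(rows, fake_remote_match(payload))
```

The columns that never leave the machine (here name and mag) reappear in result because the join is keyed on the row identifier.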

Parameters:
table : TableLike | PathOrFile

The table to be crossmatched. This parameter accepts a table-like object (pandas DataFrame, astropy Table), a path to a file given as a str or pathlib.Path object, or a file object (BinaryIO, StringIO, file descriptor, etc.).
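The input kinds listed above could be told apart roughly as follows. This is a hypothetical helper (classify_input is not part of astromodule): it checks str and Path first so that a string is treated as a file path rather than as data, relies on all standard file objects deriving from io.IOBase, and does not cover raw integer file descriptors.

```python
import io
from pathlib import Path

def classify_input(table) -> str:
    """Return 'path', 'file' or 'table' for the input kinds described above."""
    if isinstance(table, (str, Path)):
        return "path"
    # BufferedIOBase, RawIOBase and TextIOBase all subclass io.IOBase.
    if isinstance(table, io.IOBase):
        return "file"
    # Anything else is assumed to be table-like (DataFrame, astropy Table, ...).
    return "table"
```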

cds_table : str

Identifier of the table from the CDS crossmatch service that is to be matched against the local table. This identifier may be the standard VizieR identifier (e.g. “II/246/out” for the 2MASS Point Source Catalogue) or “simbad” to indicate SIMBAD data.

See for instance the TAPVizieR table searching facility at http://tapvizier.u-strasbg.fr/adql/ to find VizieR catalogue identifiers.

radius : float | u.Quantity

The maximum crossmatch error radius. Accepts either a float, interpreted in arcseconds, or an astropy.units.Quantity.
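To make the radius concrete: a pair is a candidate match when its angular separation is at most the radius. The sketch below computes that separation with the haversine formula in plain Python, taking a float radius in arcseconds; it illustrates the criterion only and is not how the CDS service computes matches internally.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance in arcsec between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the small separations of a crossmatch.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def within_radius(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """True if the two positions are no further apart than radius_arcsec."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec
```

Because the formula works on sines of half-differences, positions on either side of RA = 0 (for example 359.9999° and 0.0001°) come out 0.72 arcsec apart without any special wrap-around handling.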

ra : str | None

The name of the Right Ascension (RA) column. If None is passed, this function tries to guess the RA column name from predefined patterns using guess_coords_columns; see that function's documentation for details.

dec : str | None

The name of the Declination (Dec) column. If None is passed, this function tries to guess the Dec column name from predefined patterns using guess_coords_columns; see that function's documentation for details.
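The kind of pattern-based guessing described for ra and dec can be sketched with regular expressions. The patterns and the helpers guess_column and guess_ra_dec below are illustrative assumptions only; the patterns actually used by guess_coords_columns are documented with that function and may differ.

```python
import re

# Hypothetical patterns, tried in order of preference (case-insensitive).
RA_PATTERNS = [r"^ra$", r"^ra_?j2000$", r"^ra_?deg$"]
DEC_PATTERNS = [r"^dec$", r"^dec?_?j2000$", r"^dec_?deg$"]

def guess_column(columns, patterns):
    """Return the first column matching the earliest pattern, or None."""
    for pattern in patterns:
        for col in columns:
            if re.match(pattern, col, flags=re.IGNORECASE):
                return col
    return None

def guess_ra_dec(columns):
    """Guess (ra, dec) column names, failing loudly if either is missing."""
    ra = guess_column(columns, RA_PATTERNS)
    dec = guess_column(columns, DEC_PATTERNS)
    if ra is None or dec is None:
        raise ValueError("could not guess RA/Dec columns; pass ra= and dec= explicitly")
    return ra, dec
```

Anchoring the patterns (^ and $) keeps a column such as "radius" from being mistaken for RA.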

find : “all” or “best” or “best-remote” or “each” or “each-dist”

Determines which pair matches are included in the result.

  • all: All matches

  • best: Matched rows, best remote row for each input row

  • best-remote: Matched rows, best input row for each remote row

  • each: One row per input row, contains best remote match or blank

  • each-dist: One row per input row, column giving distance only for best match

Note that only the all mode is symmetric between the two tables.
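The selection rules of the find modes can be illustrated on a list of candidate pairs (input row, remote row, distance). This is a simplified, hypothetical model of the semantics listed above, not the service's code; each-dist is left out because it differs from each mainly in which columns are filled.

```python
def _closest(pairs, key):
    """For each value of pair[key], keep the pair with the smallest distance."""
    best = {}
    for pair in pairs:
        k = pair[key]
        if k not in best or pair[2] < best[k][2]:
            best[k] = pair
    return best

def select(pairs, find, n_input):
    """pairs are (input_row, remote_row, distance_arcsec) tuples."""
    if find == "all":
        return sorted(pairs)
    if find == "best":          # best remote row for each matched input row
        return sorted(_closest(pairs, 0).values())
    if find == "best-remote":   # best input row for each matched remote row
        return sorted(_closest(pairs, 1).values())
    if find == "each":          # one row per input row; unmatched rows get blanks
        best = _closest(pairs, 0)
        return [best.get(i, (i, None, None)) for i in range(n_input)]
    raise ValueError(f"unsupported find mode: {find!r}")

pairs = [(0, 10, 0.4), (0, 11, 0.2), (1, 11, 0.1)]
```

On these pairs, best keeps (0, 11, 0.2) and (1, 11, 0.1), while best-remote keeps (0, 10, 0.4) and (1, 11, 0.1): the two modes disagree on the same input, which is the asymmetry noted above.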

Note also that there is a bug in best-remote matching. If the match is done in multiple blocks, a remote table row may appear matched against one local table row per uploaded block, rather than just once for the whole result. If this is a concern, set block_size to at least the number of rows in the input table. This may be fixed in a future release.
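A small, hypothetical model of why the best-remote caveat arises: if best-remote selection is applied separately to each uploaded block and the block results are simply concatenated, a remote row that is close to local rows in two different blocks is kept once per block. The helper best_remote and the two-block split are assumptions for illustration.

```python
def best_remote(pairs):
    """Keep, for each remote row, the (input_row, remote_row, dist) with smallest dist."""
    best = {}
    for i, r, d in pairs:
        if r not in best or d < best[r][2]:
            best[r] = (i, r, d)
    return sorted(best.values())

# Remote row 7 lies close to both input row 0 and input row 3.
pairs = [(0, 7, 0.3), (3, 7, 0.1)]

whole = best_remote(pairs)                      # a single request for all rows
blocks = [[p for p in pairs if p[0] < 2],       # block 1: input rows 0-1
          [p for p in pairs if p[0] >= 2]]      # block 2: input rows 2-3
stitched = [p for block in blocks for p in best_remote(block)]
```

Here whole contains only (3, 7, 0.1), whereas stitched also keeps (0, 7, 0.3); sending everything in one block avoids the duplicate.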

fixcols : “none” or “dups” or “all”

Determines how input columns are renamed before use in the output table. The choices are:

  • none: columns are not renamed

  • dups: columns which would otherwise have duplicate names in the output will be renamed to indicate which table they came from

  • all: all columns will be renamed to indicate which table they came from

If columns are renamed, the new names are determined by the suffix_in and suffix_remote parameters.

suffix_in : str

If the fixcols parameter is set so that input columns are renamed for insertion into the output table, this parameter determines how the renaming is done. It gives a suffix which is appended to all renamed columns from the input table. Default: “_in”

suffix_remote : str

If the fixcols parameter is set so that input columns are renamed for insertion into the output table, this parameter determines how the renaming is done. It gives a suffix which is appended to all renamed columns from the CDS result table. Default: “_cds”
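The renaming rules of fixcols, suffix_in and suffix_remote can be sketched as a function that computes the output column names. output_columns is a hypothetical helper that models the behaviour described above; it is not astromodule's code.

```python
def output_columns(in_cols, remote_cols, fixcols="dups",
                   suffix_in="_in", suffix_remote="_cds"):
    """Return (input_names, remote_names) as they would appear in the output."""
    if fixcols == "none":
        return list(in_cols), list(remote_cols)
    if fixcols == "all":
        return ([c + suffix_in for c in in_cols],
                [c + suffix_remote for c in remote_cols])
    if fixcols == "dups":
        # Only names present in both tables get a suffix.
        dups = set(in_cols) & set(remote_cols)
        return ([c + suffix_in if c in dups else c for c in in_cols],
                [c + suffix_remote if c in dups else c for c in remote_cols])
    raise ValueError(f"unsupported fixcols value: {fixcols!r}")
```

For example, with input columns id, ra, dec, mag and remote columns main_id, ra, dec, the default dups setting yields ra_in, dec_in and ra_cds, dec_cds while leaving id, mag and main_id unchanged.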

block_size : int

The CDS X-Match service imposes limits on the maximum number of rows that can be uploaded and the maximum number of rows returned by a single query. Large input tables are therefore broken into smaller blocks, and one request is sent to the external service for each block. This parameter sets the number of rows in each block. An input table with fewer rows than this value is processed in a single request.
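A minimal sketch of the blocking arithmetic, assuming rows are split into consecutive ranges (the real splitting happens inside the backend, and split_into_blocks is a hypothetical name). With the default block_size of 50000, an input of 120000 rows needs three requests, while a table with fewer rows than block_size is sent in one.

```python
import math

def split_into_blocks(n_rows, block_size=50_000):
    """Yield (start, stop) row ranges, one per request sent to the service."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)

n_requests = math.ceil(120_000 / 50_000)  # 3 requests for 120000 rows
```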

At the time of writing, the maximum upload size is 100 MB (about 3 million rows; this does not depend on the width of your table), and the maximum return size is 2 million rows.

Large block sizes tend to reduce the total time a large crossmatch takes (up to a point), but they make it harder to follow the job's progress. There is also a danger, particularly for all-type find modes, of exceeding the return-size limit, which truncates the returned result.
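To judge the truncation risk, one can estimate the worst-case number of returned rows per block. The helper below is a hypothetical back-of-the-envelope check: the 2,000,000-row limit is the figure quoted above and may change, and the expected number of matches per input row is an assumption you have to supply for your own data and radius.

```python
MAX_RETURN_ROWS = 2_000_000  # return limit quoted above; may change over time

def max_safe_block_size(matches_per_row: float, limit: int = MAX_RETURN_ROWS) -> int:
    """Largest block size whose expected result stays within the return limit."""
    if matches_per_row <= 0:
        raise ValueError("matches_per_row must be positive")
    return int(limit // matches_per_row)
```

For instance, if each input row is expected to have about 4 matches within the radius under find="all", blocks larger than 500000 rows would risk truncation.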

use_moc : bool

If True, first acquire a MOC (Multi-Order Coverage) map from CDS and use it to pre-filter rows before uploading them for matching. This should improve efficiency without changing the result.
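The idea behind use_moc can be illustrated with a coarse coverage map. Real MOCs are built from HEALPix cells; the sketch below swaps in a simple equal-angle RA/Dec grid purely for illustration. A row is kept if its own cell or any neighbouring cell is covered, so sources just outside a coverage edge are not discarded; this assumes the cell size exceeds the match radius and ignores the distortion of such a grid near the poles. cell and prefilter are hypothetical names.

```python
import math

CELL_DEG = 1.0  # toy cell size in degrees; a stand-in for a HEALPix order

def cell(ra, dec):
    """Map a position in degrees to an (ra_bin, dec_bin) grid cell."""
    return int(ra % 360 // CELL_DEG), math.floor(dec / CELL_DEG)

def prefilter(rows, coverage):
    """Keep (ra, dec) rows whose cell, or a neighbouring cell, is in coverage."""
    n_ra = int(360 // CELL_DEG)
    kept = []
    for ra, dec in rows:
        i, j = cell(ra, dec)
        # RA bins wrap around at 360 degrees; Dec bins do not.
        neighbours = {((i + di) % n_ra, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)}
        if neighbours & coverage:
            kept.append((ra, dec))
    return kept
```

Only the rows that survive the filter would then be uploaded, which is where the efficiency gain comes from.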

pre_sort : bool

If True, the rows are sorted by HEALPix index before they are uploaded to the CDS X-Match service. If the match is done in multiple blocks, this may improve efficiency, since when matching against a large remote catalogue the X-Match service works best on requests whose sources are grouped in a small region rather than scattered over the whole sky.

Note that this has two other side effects that may be undesirable: all input rows are read in at once, which may make it harder to assess progress, and the order of the rows in the output table changes.

It is probably only worth setting this to True for rather large (multi-million-row?) multi-block matches in which both the local and remote catalogues cover a significant fraction of the sky. Feel free to experiment, though.
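Sorting rows so that nearby sources land in the same block can be sketched with any spatial key. HEALPix indices need a library such as healpy, so the sketch below substitutes a simple grid-cell key (Dec band, then RA bin) to illustrate the idea; it is not the ordering the backend actually uses, and spatial_key is a hypothetical name.

```python
import math

def spatial_key(ra, dec, cell_deg=10.0):
    """Coarse sort key: Dec band first, then RA bin within the band."""
    return (math.floor(dec / cell_deg), math.floor((ra % 360) / cell_deg))

rows = [(350.0, 45.0), (5.0, -60.0), (352.0, 44.0), (7.0, -61.0)]
sorted_rows = sorted(rows, key=lambda p: spatial_key(*p))
blocks = [sorted_rows[i:i + 2] for i in range(0, len(sorted_rows), 2)]
```

With a block size of 2, the two southern sources and the two northern sources each end up in a single request, whereas the original order would put one of each in every block.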

service_url : str

The URL of the CDS X-Match service. Normally this should not be changed from the default, but if other implementations of the same service are known, this parameter can be used to access them.

fmt : “fits” or “csv” or “parquet”

This function writes the input table to an intermediate file before passing it to the STILTS backend; this parameter sets the format of that file. Default: “parquet”
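A sketch of the intermediate-file step for the CSV case, using only the standard library: rows are written to a temporary file whose path could then be handed to a backend. The helper write_temp_csv and the restriction to CSV are assumptions for illustration; writing FITS or parquet would need a library such as astropy or pyarrow.

```python
import csv
import os
import tempfile

def write_temp_csv(rows, columns):
    """Write rows (dicts) to a temporary CSV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return path  # the caller is responsible for deleting the file
```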