astromodule.table.crossmatch_cds
- crossmatch_cds(table: Table | DataFrame | str | Path | BufferedIOBase | RawIOBase | TextIOBase, cds_table: str = 'simbad', radius: float | Quantity = 1.0, ra: str | None = None, dec: str | None = None, find: Literal['all', 'best', 'best-remote', 'each', 'each-dist'] = 'all', fixcols: Literal['none', 'dups', 'all'] = 'dups', suffix_in: str = '_in', suffix_remote: str = '_cds', block_size: int = 50000, use_moc: bool = False, pre_sort: bool = False, service_url: str = None, fmt: Literal['fits', 'csv', 'parquet'] = 'parquet')
Uses the CDS X-Match service to join a local table to one of the tables hosted by the Centre de Données astronomiques de Strasbourg. This includes all of the VizieR tables and the SIMBAD database. The service is very fast, and in most cases it is the best way to match a local table against a large external table hosted by a service.
The local table is uploaded to the X-Match service in chunks, and the matches for each chunk are retrieved in turn and eventually stitched together to form the final result. The tool only uploads sky position and an identifier for each row of the input table, but all columns of the input table are reinstated in the result for reference.
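The chunk-and-stitch behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `iter_blocks`, `fake_remote_match`, and `chunked_crossmatch` are hypothetical names, and `fake_remote_match` is a stand-in for the remote service that simply "matches" rows with even identifiers.

```python
from itertools import islice


def iter_blocks(rows, block_size):
    """Yield successive lists of at most block_size rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def fake_remote_match(payload):
    """Hypothetical stand-in for the remote service: matches even ids only."""
    return [
        {"id": p["id"], "remote_name": f"obj{p['id']}"}
        for p in payload
        if p["id"] % 2 == 0
    ]


def chunked_crossmatch(rows, block_size, ra="ra", dec="dec"):
    """Match rows block by block and stitch the results together."""
    results = []
    for block in iter_blocks(rows, block_size):
        # Only the identifier and sky position are sent for each row.
        payload = [{"id": r["id"], "ra": r[ra], "dec": r[dec]} for r in block]
        by_id = {r["id"]: r for r in block}
        for match in fake_remote_match(payload):
            # All input columns are reinstated next to the remote columns.
            results.append({**by_id[match["id"]], **match})
    return results


rows = [{"id": i, "ra": 10.0 + i, "dec": -5.0, "mag": 15.0 + i} for i in range(5)]
matched = chunked_crossmatch(rows, block_size=2)
```

Here the `mag` column is never part of the uploaded payload, yet it is present in every row of `matched` because each result is joined back to its input row by identifier.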
- Parameters:
- table
TableLike | PathOrFile
The table to be crossmatched. This parameter accepts a table-like object (pandas DataFrame or astropy Table), a path to a file given as a str or pathlib.Path object, or a file object (BinaryIO, StringIO, file descriptor, etc.).
- cds_table
str
Identifier of the table from the CDS crossmatch service that is to be matched against the local table. This identifier may be the standard VizieR identifier (e.g. “II/246/out” for the 2MASS Point Source Catalogue) or “simbad” to indicate SIMBAD data.
See for instance the TAPVizieR table searching facility at http://tapvizier.u-strasbg.fr/adql/ to find VizieR catalogue identifiers.
- radius
float | u.Quantity
The maximum crossmatch error radius. This function accepts a float value, which is interpreted in arcsec, or an astropy.units.Quantity.
- ra
str
The name of the Right Ascension (RA) column. If None is passed, this function tries to guess the RA column name from predefined patterns using the function guess_coords_columns; see that function’s documentation for more details.
- dec
str
The name of the Declination (Dec) column. If None is passed, this function tries to guess the Dec column name from predefined patterns using the function guess_coords_columns; see that function’s documentation for more details.
- find
“all” or “best” or “best-remote” or “each” or “each-dist”
Determines which pair matches are included in the result.
all: All matches
best: Matched rows, best remote row for each input row
best-remote: Matched rows, best input row for each remote row
each: One row per input row, contains best remote match or blank
each-dist: One row per input row, column giving distance only for best match
Note that only the all mode is symmetric between the two tables.
Note also that there is a bug in best-remote matching. If the match is done in multiple blocks, a remote table row may appear matched against one local table row per uploaded block, rather than just once for the whole result. To avoid this, set block_size to at least the number of rows in the input table. This may be fixed in a future release.
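The difference between the find modes can be illustrated on a toy list of candidate pairs. This is a minimal sketch, assuming that "best" means the smallest separation; `select` and `best_per_key` are hypothetical helpers, the data are made up, and the each-dist mode is left out for brevity.

```python
# Candidate pairs: (input_id, remote_id, separation in arcsec). Toy data.
pairs = [
    ("a", "r1", 0.3),
    ("a", "r2", 0.8),
    ("b", "r1", 0.5),
]
input_ids = ["a", "b", "c"]  # "c" has no candidate within the radius


def best_per_key(pairs, key_index):
    """Keep the closest pair for each distinct value at key_index."""
    best = {}
    for pair in pairs:
        key = pair[key_index]
        if key not in best or pair[2] < best[key][2]:
            best[key] = pair
    return list(best.values())


def select(pairs, input_ids, find):
    """Return the pairs kept under a given find mode (assumed semantics)."""
    if find == "all":
        return list(pairs)
    if find == "best":
        return best_per_key(pairs, 0)  # best remote row per input row
    if find == "best-remote":
        return best_per_key(pairs, 1)  # best input row per remote row
    if find == "each":
        best = {p[0]: p for p in best_per_key(pairs, 0)}
        # One row per input row; unmatched rows get blank remote values.
        return [best.get(i, (i, None, None)) for i in input_ids]
    raise ValueError(f"unsupported find mode: {find!r}")
```

With these pairs, "best" keeps one row each for inputs a and b, "best-remote" keeps one row each for r1 and r2 (both matched to a), and "each" returns three rows, the last one blank for input c.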
- fixcols
“none” or “dups” or “all”
Determines how input columns are renamed before use in the output table. The choices are:
none: columns are not renamed
dups: columns which would otherwise have duplicate names in the output will be renamed to indicate which table they came from
all: all columns will be renamed to indicate which table they came from
If columns are renamed, the new names are determined by the suffix* parameters.
- suffix_in
str
If the fixcols parameter is set so that input columns are renamed for insertion into the output table, this parameter determines how the renaming is done. It gives a suffix which is appended to all renamed columns from the input table. Default: “_in”
- suffix_remote
str
If the fixcols parameter is set so that input columns are renamed for insertion into the output table, this parameter determines how the renaming is done. It gives a suffix which is appended to all renamed columns from the CDS result table. Default: “_cds”
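The interaction between fixcols and the two suffix parameters can be sketched as follows. `rename_columns` is a hypothetical helper written for illustration; the actual renaming is done by the backend and may differ in detail (for example in how name clashes are detected).

```python
def rename_columns(input_cols, remote_cols, fixcols="dups",
                   suffix_in="_in", suffix_remote="_cds"):
    """Return (input, remote) column names after applying a fixcols policy."""
    if fixcols == "none":
        return list(input_cols), list(remote_cols)
    if fixcols == "all":
        # Every column is renamed.
        to_rename = set(input_cols) | set(remote_cols)
    elif fixcols == "dups":
        # Only names present in both tables are renamed.
        to_rename = set(input_cols) & set(remote_cols)
    else:
        raise ValueError(f"unknown fixcols value: {fixcols!r}")
    new_in = [c + suffix_in if c in to_rename else c for c in input_cols]
    new_remote = [c + suffix_remote if c in to_rename else c for c in remote_cols]
    return new_in, new_remote


dups_in, dups_remote = rename_columns(["id", "ra", "dec"], ["ra", "dec", "main_id"])
all_in, all_remote = rename_columns(["id", "ra", "dec"], ["ra", "dec", "main_id"],
                                    fixcols="all")
```

With the default "dups" policy only the clashing ra and dec columns receive suffixes, so id and main_id keep their original names.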
- block_size
int
The CDS Xmatch service imposes limits on the maximum number of rows that can be uploaded and on the maximum number of rows that can be returned from a single query. Large input tables are therefore broken down into smaller blocks, and one request is sent to the external service for each block. This parameter controls the number of rows in each block. For an input table with fewer rows than this value, the whole match is done as a single request.
At time of writing, the maximum upload size is 100Mb (about 3Mrow; this does not depend on the width of your table), and the maximum return size is 2Mrow.
Large block sizes tend to be good (up to a point) for reducing the total time a large xmatch operation takes, but they can make it harder to see the job progressing. There is also the danger (for ALL-type find modes) of exceeding the return size limit, which will result in truncation of the returned result.
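As a rough guide to how many requests a given block size implies, the arithmetic can be sketched as below. `plan_blocks` is a hypothetical helper; its default of 50000 mirrors the default in the signature above.

```python
import math


def plan_blocks(n_rows, block_size=50000):
    """Return the number of rows sent in each request, in upload order."""
    if n_rows < 0 or block_size <= 0:
        raise ValueError("n_rows must be >= 0 and block_size must be > 0")
    n_requests = math.ceil(n_rows / block_size)
    # Every block is full except possibly the last one.
    return [min(block_size, n_rows - i * block_size) for i in range(n_requests)]


three_blocks = plan_blocks(120000)   # table larger than one block
single_block = plan_blocks(30000)    # table smaller than one block
```

A table of 120000 rows with the default block size needs three requests, whereas a table of 30000 rows is handled in a single request.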
- use_moc
bool
If true, first acquire a MOC coverage map from CDS, and use that to pre-filter rows before uploading them for matching. This should improve efficiency, but have no effect on the result.
- pre_sort
bool
If true, the rows are sorted by HEALPix index before they are uploaded to the CDS X-Match service. If the match is done in multiple blocks, this may improve efficiency, since when matching against a large remote catalogue the X-Match service likes to process requests in which sources are grouped into a small region rather than scattered all over the sky.
Note this will have a couple of other side effects that may be undesirable: it will read all the input rows into the task at once, which may make it harder to assess progress, and it will affect the order of the rows in the output table.
It is probably only worth setting true for rather large (multi-million-row?) multi-block matches, where both local and remote catalogues are spread over a significant fraction of the sky. But feel free to experiment.
- service_url
str
The URL at which the CDS Xmatch service can be found. Normally this should not be altered from the default, but if other implementations of the same service are known, this parameter can be used to access them.
- fmt
“fits” or “csv” or “parquet”
This function writes the input table to an intermediate file before passing it to the STILTS backend. This parameter sets the format of that intermediate file. FITS is the faster option; the default given in the signature is “parquet”.
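A call might look like the sketch below. It assumes that the function can be imported as `astromodule.table.crossmatch_cds`, as the page title suggests; the column names "ra" and "dec" and the helper `run_crossmatch` are hypothetical, and the return value is not described in this section, so it is passed through unchanged. The import is placed inside the helper so that the snippet can be loaded without astromodule installed and without network access.

```python
# Keyword arguments taken from the parameter list above.
CROSSMATCH_KWARGS = {
    "cds_table": "II/246/out",  # 2MASS Point Source Catalogue
    "radius": 2.0,              # a float is interpreted as arcsec
    "ra": "ra",                 # hypothetical column name in the local table
    "dec": "dec",               # hypothetical column name in the local table
    "find": "best",             # best remote row for each input row
    "fixcols": "dups",          # rename only clashing column names
    "block_size": 50000,
}


def run_crossmatch(table):
    """Run the crossmatch on a table-like object, path, or file object."""
    # Assumed import path; adjust if the package layout differs.
    from astromodule.table import crossmatch_cds

    return crossmatch_cds(table, **CROSSMATCH_KWARGS)


# Example (requires astromodule and network access; the path is hypothetical):
# result = run_crossmatch("my_catalogue.csv")
```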