astromodule.io.read_table

read_table(path: Table | DataFrame | str | Path | BufferedIOBase | RawIOBase | TextIOBase, fmt: str | None = None, columns: Sequence[str] | None = None, low_memory: bool = False, comment: str | None = None, na_values: Sequence[str] | Dict[str, Sequence[str]] = None, keep_default_na: bool = True, na_filter: bool = True, header: Literal['infer'] | int | Sequence[int] = 'infer', col_names: Sequence[str] | None = None) -> DataFrame

This function tries to detect the table type from the file extension and returns the loaded table as a pandas DataFrame.

Supported table types:

Table Type    Extensions
Fits          .fit, .fits, .fz
Votable       .vo, .vot, .votable, .xml
ASCII         .csv, .tsv, .dat
Heasarc       .tdat
Arrow         .parquet, .feather
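The extension lookup described above can be pictured as a small dictionary. This is only a sketch under stated assumptions: `guess_table_format`, `EXTENSION_TO_FORMAT`, the format labels, and the case-insensitive matching are hypothetical and are not taken from the astromodule source.

```python
from pathlib import Path

# Hypothetical mapping built from the supported-types table; the labels the
# library uses internally for each format may differ.
EXTENSION_TO_FORMAT = {
    ".fit": "fits", ".fits": "fits", ".fz": "fits",
    ".vo": "votable", ".vot": "votable", ".votable": "votable", ".xml": "votable",
    ".csv": "ascii", ".tsv": "ascii", ".dat": "ascii",
    ".tdat": "heasarc",
    ".parquet": "arrow", ".feather": "arrow",
}


def guess_table_format(path, fmt=None):
    """Return a format label for ``path``; an explicit ``fmt`` skips inference."""
    if fmt is not None:
        return fmt
    # Assumption: matching is case-insensitive and uses only the last suffix.
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Cannot detect table type from extension {suffix!r}") from None


print(guess_table_format("catalog.fits"))
print(guess_table_format("catalog.txt", fmt="ascii"))
```

Passing `fmt` is what lets a file with an unusual extension be read without raising `ValueError`.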

Parameters:
path: str or Path

Path to the table to be read.

fmt: str | None

Specify the file format manually to avoid inference by file extension. This parameter can be used to force a specific parser for the given file.

columns: Sequence[str] | None

If specified, only the listed columns are loaded. Can be used to reduce memory usage.

low_memory: bool

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set low_memory=False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with the C parser.)

Note

Used only for ASCII tables, ignored by other types of tables.

comment: str | None

Character indicating that the remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.

Note

Used only for ASCII tables, ignored by other types of tables.
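The interaction between comment and header can be illustrated with a standard-library sketch. The helper `parse_with_comments` is hypothetical and deliberately simplified: it does not honor quoting when looking for the comment character, and it is not the parser this function actually uses.

```python
import csv


def parse_with_comments(text, comment="#", header=0):
    """Split CSV text into (column names, data rows), skipping comments."""
    kept = []
    for line in text.splitlines():
        # Drop everything from the comment character to the end of the line.
        if comment is not None:
            line = line.split(comment, 1)[0]
        # Fully commented and blank lines vanish before the header is counted.
        if line.strip():
            kept.append(line)
    rows = list(csv.reader(kept))
    return rows[header], rows[header + 1:]


names, data = parse_with_comments("#empty\na,b,c\n1,2,3", comment="#", header=0)
print(names)  # ['a', 'b', 'c']
print(data)   # [['1', '2', '3']]
```

Because the commented line is removed first, header=0 points at `a,b,c` rather than at `#empty`.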

na_values: Hashable, Iterable of Hashable or dict of {Hashable: Iterable}

Additional strings to recognize as NA/NaN. If a dict is passed, it gives specific per-column NA values. By default the following values are interpreted as NaN: “”, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”, “n/a”, “nan”, “null”.

Note

Used only for ASCII tables, ignored by other types of tables.

keep_default_na: bool

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

  • If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.

  • If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.

  • If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.

  • If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.

Note

Used only for ASCII tables, ignored by other types of tables.
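The four keep_default_na cases and the na_filter override reduce to a small set computation. The following is a sketch only: `effective_na_values` and `DEFAULT_NA` are hypothetical names, the default set is transcribed from the na_values description (taking the empty string and "null" without padding, which is an assumption), and per-column dicts are not handled.

```python
# Default markers transcribed from the na_values description (assumed unpadded).
DEFAULT_NA = frozenset({
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
})


def effective_na_values(na_values=None, keep_default_na=True, na_filter=True):
    """Return the set of strings that would be treated as missing."""
    if not na_filter:
        # na_filter=False disables detection; the other two arguments are ignored.
        return frozenset()
    extra = frozenset(na_values or ())
    return (DEFAULT_NA | extra) if keep_default_na else extra


print("-999" in effective_na_values(["-999"]))                       # True
print("NA" in effective_na_values(["-999"], keep_default_na=False))  # False
print(len(effective_na_values(["-999"], na_filter=False)))           # 0
```

Survey catalogs often use sentinel values such as "-999" for missing measurements, which is the typical reason to pass na_values.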

na_filter: bool

Detect missing value markers (empty strings and the value of na_values). In data without any NA values, passing na_filter=False can improve the performance of reading a large file.

Note

Used only for ASCII tables, ignored by other types of tables.

header: 'infer' or int or Sequence[int]

Row number(s) containing column labels and marking the start of the data (zero-indexed). The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly to col_names, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a pandas.MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

Note

Used only for ASCII tables, ignored by other types of tables.

col_names: Sequence[str]

Sequence of column labels to apply. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

Note

Used only for ASCII tables, ignored by other types of tables.
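The interplay between header='infer' and col_names is easy to get wrong, so here is a standard-library sketch of the rule as described above. `resolve_header` and `label_rows` are hypothetical helpers that work on already-split rows; multi-row headers are not covered.

```python
def resolve_header(header="infer", col_names=None):
    """Turn 'infer' into a concrete header row, or None if there is no header row."""
    if header == "infer":
        # No names given: the first row is the header. Names given: no header row.
        return 0 if col_names is None else None
    return header


def label_rows(rows, header="infer", col_names=None):
    """Return (column names, data rows) for a list of already-split rows."""
    row = resolve_header(header, col_names)
    if row is None:
        # Every row is data, including a header line if the file has one.
        return list(col_names), list(rows)
    names = list(col_names) if col_names is not None else list(rows[row])
    return names, list(rows[row + 1:])


rows = [["ra", "dec"], ["10.5", "-3.2"]]
print(label_rows(rows))
print(label_rows(rows, col_names=["RA", "DEC"]))            # header line kept as data
print(label_rows(rows, header=0, col_names=["RA", "DEC"]))  # header line replaced
```

The second call shows why header=0 should be passed together with col_names when the file already has a header line: otherwise the old labels end up as the first data row.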

Returns:
pd.DataFrame

The table as a pandas DataFrame.

Raises:
ValueError

Raised if the table type cannot be detected from the file extension.

Notes

The Transportable Database Aggregate Table (TDAT) type is a data structure created by NASA’s Heasarc project. Because packages like pandas and astropy do not support it, this function implements a very simple parser for it. For more information, see [1].

References