astromodule.io.read_table
- read_table(path: Table | DataFrame | str | Path | BufferedIOBase | RawIOBase | TextIOBase, fmt: str | None = None, columns: Sequence[str] | None = None, low_memory: bool = False, comment: str | None = None, na_values: Sequence[str] | Dict[str, Sequence[str]] = None, keep_default_na: bool = True, na_filter: bool = True, header: Literal['infer'] | int | Sequence[int] = 'infer', col_names: Sequence[str] | None = None) -> DataFrame
This function tries to detect the table type from the file extension and returns the loaded table as a pandas DataFrame.
Supported table types:

| Table Type | Extensions |
| --- | --- |
| Fits | .fit, .fits, .fz |
| Votable | .vo, .vot, .votable, .xml |
| ASCII | .csv, .tsv, .dat |
| Heasarc | .tdat |
| Arrow | .parquet, .feather |
- Parameters:
- path
str or Path: Path to the table to be read.
- fmt
str | None: Specify the file format manually instead of inferring it from the file extension. This parameter can be used to force a specific parser for the given file.
- columns
Sequence[str] | None: If specified, only the listed columns are loaded. Can be used to reduce memory usage.
- low_memory: bool
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set this to False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with the C parser.)
Note
Used only for ASCII tables, ignored by other types of tables.
- comment
str | None: Character indicating that the remainder of the line should not be parsed. If found at the beginning of a line, the line is ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.
Note
Used only for ASCII tables, ignored by other types of tables.
- na_values: Hashable, Iterable of Hashable or dict of {Hashable: Iterable}
Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: “”, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”, “n/a”, “nan”, “null”.
Note
Used only for ASCII tables, ignored by other types of tables.
- keep_default_na: bool
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
If keep_default_na is True and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False and na_values are specified, only the NaN values specified in na_values are used for parsing.
If keep_default_na is False and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
Note
Used only for ASCII tables, ignored by other types of tables.
- na_filter: bool
Detect missing value markers (empty strings and the value of na_values). In data without any NA values, passing na_filter=False can improve the performance of reading a large file.
Note
Used only for ASCII tables, ignored by other types of tables.
- header: 'infer' or int or Sequence[int]
Row number(s) containing column labels and marking the start of the data (zero-indexed). The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly to names, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a pandas.MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
Note
Used only for ASCII tables, ignored by other types of tables.
- col_names
Sequence[str]: Sequence of column labels to apply. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
Note
Used only for ASCII tables, ignored by other types of tables.
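The interaction between `comment` and `header` described in the parameter list can be sketched with the standard library alone. This is an illustration of the documented rule, not the parser that `read_table` uses: the helper `split_header` is hypothetical, it assumes comma-delimited text, and it only drops lines that start with the comment character (inline trailing comments are not handled).

```python
import csv


def split_header(text, comment="#", header=0):
    """Return (column_names, data_rows) for comma-delimited text."""
    # Drop blank lines and fully commented lines before counting rows,
    # so header=0 means the first remaining line, not the first file line.
    lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith(comment)
    ]
    rows = list(csv.reader(lines))
    return rows[header], rows[header + 1:]


names, data = split_header("#empty\na,b,c\n1,2,3", comment="#", header=0)
print(names)  # ['a', 'b', 'c']
print(data)   # [['1', '2', '3']]
```

The commented first line is skipped, so row 0 is `a,b,c`, matching the example given for the `comment` parameter.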
- Returns:
pd.DataFrame: The table as a pandas DataFrame.
- Raises:
ValueError: Raised if the file extension cannot be detected.
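The rules for `na_values`, `keep_default_na` and `na_filter` given in the parameter list reduce to a small decision function. The sketch below is a hypothetical summary of those rules, not code from astromodule or pandas: `effective_na_values` and `DEFAULT_NA` are invented names, the default set is deliberately a short subset, and only the sequence form of `na_values` is handled (not the per-column dict).

```python
# A short subset of the default NaN markers, kept small for the example.
DEFAULT_NA = {"N/A", "NA", "NULL", "NaN", "nan"}


def effective_na_values(na_values=None, keep_default_na=True, na_filter=True):
    """Return the set of strings that would be parsed as NaN."""
    if not na_filter:
        # na_filter=False disables detection; the other two are ignored.
        return set()
    extra = set(na_values) if na_values is not None else set()
    if keep_default_na:
        # Defaults are kept and any user-supplied markers are appended.
        return DEFAULT_NA | extra
    # Defaults are dropped: only user-supplied markers remain (possibly none).
    return extra
```

For instance, `effective_na_values(["-99"], keep_default_na=False)` yields only `{"-99"}`, while the same call with `na_filter=False` yields an empty set.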
Notes
The Transportable Database Aggregate Table (TDAT) type is a data structure created by NASA’s HEASARC project. Because packages like pandas and astropy lack support for it, this function implements a very simple parser of its own. For more information, see [1].
References