astromodule.io.read_table
- read_table(path: Table | DataFrame | str | Path | BufferedIOBase | RawIOBase | TextIOBase, fmt: str | None = None, columns: Sequence[str] | None = None, low_memory: bool = False, comment: str | None = None, na_values: Sequence[str] | Dict[str, Sequence[str]] = None, keep_default_na: bool = True, na_filter: bool = True, header: Literal['infer'] | int | Sequence[int] = 'infer', col_names: Sequence[str] | None = None) -> DataFrame
This function infers the table type from the file extension and returns the loaded table as a pandas DataFrame.
Supported table types:
Table type    Extensions
FITS          .fit, .fits, .fz
VOTable       .vo, .vot, .votable, .xml
ASCII         .csv, .tsv, .dat
HEASARC       .tdat
Arrow         .parquet, .feather
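The extension lookup described by the table above can be sketched in a few lines. This is a hypothetical illustration, not astromodule's actual implementation: the EXTENSION_TO_TYPE mapping, the guess_table_type helper, the lowercase type labels and the case-insensitive match are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical mapping built from the table of supported extensions.
# The lowercase type labels are illustrative, not astromodule's fmt values.
EXTENSION_TO_TYPE = {
    ".fit": "fits", ".fits": "fits", ".fz": "fits",
    ".vo": "votable", ".vot": "votable", ".votable": "votable", ".xml": "votable",
    ".csv": "ascii", ".tsv": "ascii", ".dat": "ascii",
    ".tdat": "heasarc",
    ".parquet": "arrow", ".feather": "arrow",
}


def guess_table_type(path):
    """Return the table type for a path, based only on its last extension."""
    # Assumption: matching is case-insensitive.
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(
            f"cannot detect table type from extension {suffix!r}"
        ) from None


print(guess_table_type("catalog.fits.fz"))  # fits
print(guess_table_type("objects.tdat"))     # heasarc
```

When a file has an unusual or missing extension, passing fmt explicitly avoids this kind of lookup altogether.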
- Parameters:
- path: str or Path
Path to the table to be read. The signature above also accepts Table, DataFrame, and buffered, raw or text I/O objects.
- fmt: str | None
Specify the file format manually to skip inference from the file extension. This parameter can be used to force a specific parser for the given file.
- columns: Sequence[str] | None
If specified, only the columns named in this list are loaded. Can be used to reduce memory usage.
- low_memory: bool
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, set this to False (the dtype option that pandas.read_csv offers for the same purpose is not part of this function's signature). Note that the entire file is read into a single DataFrame regardless; the chunksize and iterator options of pandas.read_csv are likewise not exposed here. (Only valid with the C parser.)
Note: Used only for ASCII tables, ignored by other types of tables.
- comment: str | None
Character indicating that the remainder of the line should not be parsed. If found at the beginning of a line, the line is ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.
Note: Used only for ASCII tables, ignored by other types of tables.
- na_values: Hashable, Iterable of Hashable or dict of {Hashable: Iterable}
Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: "" (empty string), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None", "n/a", "nan", "null".
Note: Used only for ASCII tables, ignored by other types of tables.
- keep_default_na: bool
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
  - If keep_default_na is True and na_values are specified, na_values is appended to the default NaN values used for parsing.
  - If keep_default_na is True and na_values are not specified, only the default NaN values are used for parsing.
  - If keep_default_na is False and na_values are specified, only the NaN values specified in na_values are used for parsing.
  - If keep_default_na is False and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
Note: Used only for ASCII tables, ignored by other types of tables.
- na_filter: bool
Detect missing value markers (empty strings and the value of na_values). In data without any NA values, passing na_filter=False can improve the performance of reading a large file.
Note: Used only for ASCII tables, ignored by other types of tables.
- header: 'infer' or int or Sequence[int]
Row number(s) containing column labels and marking the start of the data (zero-indexed). The default behavior is to infer the column names: if no col_names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly to col_names, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a pandas.MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
Note: Used only for ASCII tables, ignored by other types of tables.
- col_names: Sequence[str]
Sequence of column labels to apply. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
Note: Used only for ASCII tables, ignored by other types of tables.
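The ASCII-only options described above read like the corresponding pandas.read_csv options, so their behavior can be sketched with pandas directly. This is an illustration under an assumption the source does not state outright: that read_table forwards comment, header, na_values, keep_default_na and na_filter to pandas.read_csv and passes col_names as names. The sample data and the count_missing helper are made up for the example.

```python
import io

import pandas as pd

# 1) comment + header: the fully commented first line is skipped,
#    so header=0 selects "a,b,c" as the header row.
text = "#empty\na,b,c\n1,2,3"
df = pd.read_csv(io.StringIO(text), comment="#", header=0)
columns = list(df.columns)  # ['a', 'b', 'c']

# 2) Explicit names together with header=0 replace the header found in the file.
#    (Assumption: col_names in read_table plays the role of names here.)
renamed = pd.read_csv(
    io.StringIO(text), comment="#", header=0, names=["x", "y", "z"]
)

# 3) Missing-value handling: count the cells parsed as NaN for each combination.
raw = "name,flux\nNA,-99\nm31,1.5\n"


def count_missing(**kwargs):
    """Hypothetical helper: number of NaN cells after parsing `raw`."""
    return int(pd.read_csv(io.StringIO(raw), **kwargs).isna().sum().sum())


defaults_only = count_missing()                          # only "NA" is missing
extended = count_missing(na_values=["-99"])              # "NA" and "-99"
custom_only = count_missing(na_values=["-99"], keep_default_na=False)  # only "-99"
nothing = count_missing(keep_default_na=False)           # no string becomes NaN
unfiltered = count_missing(na_values=["-99"], na_filter=False)  # na_values ignored

print(columns, list(renamed.columns))
print(defaults_only, extended, custom_only, nothing, unfiltered)
```

The five counts follow the four keep_default_na cases listed above, plus the na_filter=False case in which both na_values and keep_default_na are ignored.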
- Returns:
pd.DataFrame
The table as a pandas DataFrame.
- Raises:
ValueError
Raised if the table type cannot be detected from the file extension.
Notes
The Transportable Database Aggregate Table (TDAT) type is a data structure created by NASA's HEASARC project. Because packages like pandas and astropy do not support it, this function implements a very simple parser for it. For more information, see [1].
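To give a sense of what a very simple TDAT parser involves, here is a minimal sketch. It is not the parser shipped with this function. It assumes the simplest TDAT layout: a header section whose line[1] entry lists the column order, a <DATA> marker, pipe-delimited rows that end with a trailing "|", and an <END> marker. The parse_tdat name and the sample content are hypothetical, and all values are kept as strings.

```python
import re

# Matches the header entry that lists the column order, e.g. "line[1] = name ra dec".
LINE_SPEC = re.compile(r"^line\[1\]\s*=\s*(.+)$")


def parse_tdat(text):
    """Parse a minimal TDAT document into (column_names, rows_of_strings)."""
    columns, rows, in_data = [], [], False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if in_data:
            if line == "<END>":
                break
            # Drop only the single trailing delimiter, then split the fields.
            if line.endswith("|"):
                line = line[:-1]
            rows.append(line.split("|"))
        elif line == "<DATA>":
            in_data = True
        else:
            match = LINE_SPEC.match(line)
            if match:
                columns = match.group(1).split()
    return columns, rows


# Made-up sample following the assumed layout.
sample = """<HEADER>
table_name = demo
field[name] = char8
field[ra] = float8
field[dec] = float8
line[1] = name ra dec
<DATA>
m31|10.6847|41.2688|
m33|23.4621|30.6599|
<END>
"""

columns, rows = parse_tdat(sample)
print(columns)  # ['name', 'ra', 'dec']
print(rows[0])  # ['m31', '10.6847', '41.2688']
```

A fuller parser would also use the field[...] declarations to convert each column to its declared type; the sketch leaves that out.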
References