Dataset
- class mergernet.data.dataset.Dataset
Bases: object
High-level representation of the dataset. This class abstracts all I/O operations on the dataset (e.g., downloading, preparing, splitting).
- Parameters:
config (DatasetConfig) – The dataset configuration object, obtained from the Dataset.registry attribute.
Attributes
registry – A registry containing all dataset configurations.
- _create_dataset_table()
Scan the images and create a CSV table listing their filenames if the dataset config has no table.
- _discretize_label(y: ndarray) → ndarray
Find all occurrences in y that match a key of DatasetConfig.label_map and replace them with the respective value.
- Parameters:
y (np.ndarray) – Array of labels to discretize.
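To illustrate the label mapping described above, here is a minimal sketch written as a standalone function. It assumes y is a 1-D array and label_map is a plain dict; `discretize_label` is a hypothetical name, not the library's method, and values with no matching key are passed through unchanged by assumption.

```python
import numpy as np


def discretize_label(y: np.ndarray, label_map: dict) -> np.ndarray:
    """Hypothetical stand-in for Dataset._discretize_label.

    Each element of the 1-D array ``y`` that is a key of ``label_map`` is
    replaced by the mapped value; elements without a key pass through.
    """
    return np.array([label_map.get(value, value) for value in y.tolist()])


labels = np.array(["a", "b", "a", "b"])
print(discretize_label(labels, {"a": 0, "b": 1}))  # [0 1 0 1]
```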
- clear()
Remove all downloaded files from disk. This includes:
  - the table file
  - the image archive
  - the extracted images folder
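A minimal sketch of the cleanup performed by clear(), assuming the three artifacts live at already-known paths. The function name `clear_downloads` and its path arguments are hypothetical; the real method presumably derives its paths from the dataset configuration.

```python
import shutil
from pathlib import Path


def clear_downloads(table_path: Path, archive_path: Path, images_dir: Path) -> None:
    """Hypothetical helper mirroring Dataset.clear: delete the table file,
    the image archive and the extracted images folder if they exist."""
    for file_path in (table_path, archive_path):
        # missing_ok=True (Python 3.8+) makes repeated calls harmless.
        file_path.unlink(missing_ok=True)
    if images_dir.is_dir():
        # rmtree removes the folder together with every extracted image.
        shutil.rmtree(images_dir)
```

Ignoring missing files keeps the operation idempotent, so calling it on a partially downloaded dataset does not raise.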
- static concat_fold_column(df: DataFrame, fname_column: str | None = None, class_column: str | None = None, r_column: str | None = None, n_splits: int = 5, bins: int = 3) → DataFrame
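This method has no description. Judging only from its signature, it appears to add a fold-assignment column to df with n_splits folds, stratified by class_column and by r_column discretized into `bins` bins. The sketch below illustrates that reading with pandas and NumPy; the output column name `fold`, the equal-width binning via `pd.cut`, the extra `seed` parameter and the round-robin assignment within each stratum are all assumptions, and fname_column is left out because its role is not documented.

```python
from typing import Optional

import numpy as np
import pandas as pd


def concat_fold_column(
    df: pd.DataFrame,
    class_column: Optional[str] = None,
    r_column: Optional[str] = None,
    n_splits: int = 5,
    bins: int = 3,
    seed: int = 0,  # hypothetical: not in the documented signature
) -> pd.DataFrame:
    """Hypothetical sketch: return a copy of df with a stratified 'fold' column."""
    out = df.copy()
    # Build one string key per row identifying its stratum.
    strata = pd.Series("", index=out.index)
    if class_column is not None:
        strata = strata + out[class_column].astype(str)
    if r_column is not None:
        # Equal-width binning of r_column into `bins` intervals (assumption).
        r_bin = pd.cut(out[r_column], bins=bins, labels=False)
        strata = strata + "|" + r_bin.astype(str)
    codes, _ = pd.factorize(strata)

    rng = np.random.default_rng(seed)
    fold = np.empty(len(out), dtype=int)
    for code in np.unique(codes):
        rows = np.flatnonzero(codes == code)
        rng.shuffle(rows)
        # Deal the shuffled rows of this stratum round-robin into the folds.
        fold[rows] = np.arange(len(rows)) % n_splits
    out["fold"] = fold
    return out
```

Dealing each stratum round-robin keeps every class and every r bin spread as evenly as possible across folds; `pd.qcut` (equal-frequency bins) would be a reasonable alternative to `pd.cut`.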
- download()
Check whether the destination path exists, create any missing folders, and download the dataset files from the web resource for the specified dataset type.
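A minimal sketch of the download-if-missing pattern described above, using only the standard library. The `resources` mapping (local filename to URL) and the helper name are hypothetical; the real method takes its file list and destination from the dataset configuration.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Dict, List


def download_files(resources: Dict[str, str], dest_dir: Path) -> List[Path]:
    """Hypothetical helper: fetch each URL in `resources` (local filename -> URL)
    into dest_dir, skipping files that already exist. Returns new files."""
    # Create the destination and any missing parent folders.
    dest_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for filename, url in resources.items():
        target = dest_dir / filename
        if target.exists():
            continue  # already present locally, nothing to do
        # Stream the response body to disk without loading it all in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)
        downloaded.append(target)
    return downloaded
```

Skipping existing files makes repeated calls cheap; a production version would also want to handle interrupted downloads, which this sketch does not.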
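- get_X_by_fold(fold: int, kind='test') → ndarray
Get the X values of the given fold.
- Parameters:
fold (int) – Index of the fold.
kind (str) – Which part of the fold to return; defaults to 'test'.
- Returns:
X values
- Return type:
ndarray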
- get_fold(fold: int) → Tuple[DatasetV2, DatasetV2]
Generate the train and test datasets for the selected fold.
- get_n_folds() → int
Get the number of folds in the dataset.
- Returns:
The number of folds
- Return type:
int
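The fold-related methods above (get_X_by_fold, get_fold and get_n_folds) can be illustrated with a per-row array of fold ids. This is a sketch under assumptions: the helper names are hypothetical, `kind` is assumed to mean "rows in the fold" for 'test' and the remaining rows otherwise, and the sketch returns NumPy arrays where get_fold returns DatasetV2 objects (presumably TensorFlow's class behind tf.data.Dataset).

```python
import numpy as np


def n_folds(folds: np.ndarray) -> int:
    """Hypothetical analogue of get_n_folds: count distinct fold ids."""
    return int(np.unique(folds).size)


def X_by_fold(X: np.ndarray, folds: np.ndarray, fold: int, kind: str = "test") -> np.ndarray:
    """Hypothetical analogue of get_X_by_fold. kind='test' returns the rows
    assigned to `fold`; any other value returns the remaining (train) rows."""
    in_fold = folds == fold
    return X[in_fold] if kind == "test" else X[~in_fold]


def split_fold(X: np.ndarray, y: np.ndarray, folds: np.ndarray, fold: int):
    """Hypothetical analogue of get_fold: ((X_train, y_train), (X_test, y_test))."""
    in_fold = folds == fold
    return (X[~in_fold], y[~in_fold]), (X[in_fold], y[in_fold])


X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
folds = np.array([0, 1, 2, 0, 1, 2])  # one fold id per row
(X_train, y_train), (X_test, y_test) = split_fold(X, y, folds, fold=0)
print(n_folds(folds))   # 3
print(X_test.tolist())  # [[0, 1], [6, 7]]
```

Passing each (X, y) pair to tf.data.Dataset.from_tensor_slices would yield TensorFlow datasets comparable to get_fold's return type.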
- is_dataset_downloaded() → bool
Check whether the dataset files have been downloaded locally to Experiment.local_shared_path.
- Returns:
True if the images directory and the table are found, False otherwise.
- Return type:
bool
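A minimal sketch of this check, assuming the images directory and table path have already been resolved under Experiment.local_shared_path; the function name and its arguments are hypothetical.

```python
from pathlib import Path


def is_downloaded(images_dir: Path, table_path: Path) -> bool:
    """Hypothetical analogue of is_dataset_downloaded: True only when both the
    images directory and the table file exist."""
    return images_dir.is_dir() and table_path.is_file()
```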
- registry: DatasetRegistry = <mergernet.data.dataset_config.DatasetRegistry object>
A registry containing all dataset configurations.