Datasets
class indico.queries.datasets.CreateDataset(name, files, wait=True, dataset_type='TEXT', from_local_images=False, image_filename_col='filename', batch_size=20, ocr_engine=None, omnipage_ocr_options=None, read_api_ocr_options=None)
Create a dataset and upload the associated files.
- Parameters:
- name (str) – Name of the dataset
- files (List[str]) – List of pathnames to the dataset files
Options:
- dataset_type (str, default='TEXT'): Type of dataset to create [TEXT, DOCUMENT, IMAGE]
- wait (bool, default=True): Wait for the dataset to upload and finish
- Returns:
Dataset object
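A minimal sketch of building the `files` list and issuing CreateDataset. It assumes an already-configured client object that exposes a `call(query)` method; the client is not described in this section. `collect_files` and `create_document_dataset` are hypothetical helper names, and the choice of PDF files is only for illustration. The query class is imported inside the function so the path-collecting helper can be used on its own.

```python
from pathlib import Path


def collect_files(folder, suffixes=(".pdf",)):
    """Return sorted path strings for the files in `folder` with a matching suffix."""
    root = Path(folder)
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def create_document_dataset(client, name, folder):
    """Create a DOCUMENT dataset from every PDF found in `folder`.

    Assumption: `client` exposes `call(query)`; it is not documented on this page.
    """
    from indico.queries.datasets import CreateDataset  # class documented above

    files = collect_files(folder)
    if not files:
        raise ValueError(f"no matching files found in {folder!r}")
    query = CreateDataset(name=name, files=files, dataset_type="DOCUMENT", wait=True)
    return client.call(query)
```

With `wait=True` the call blocks until the upload finishes, so the returned Dataset object can be used right away.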
class indico.queries.datasets.GetDataset(id)
Retrieve a dataset description object
- Parameters:
id (int) – ID of the dataset to query
- Returns:
Dataset object
class indico.queries.datasets.GetDatasetStatus(id)
Get the status of a dataset
- Parameters:
id (int) – ID of the dataset to query
- Returns:
COMPLETE or FAILED
- Return type:
status (str)
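When a dataset is created with `wait=False`, its status can be polled until one of the two values listed above appears. The sketch below assumes the query may also report other, non-terminal values while work is still in progress; this page does not list any. `wait_for_terminal_status` is a hypothetical helper, and `fetch_status` is any zero-argument callable that returns the status string.

```python
import time


def wait_for_terminal_status(fetch_status, interval=2.0, timeout=600.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until it returns COMPLETE or FAILED.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    terminal = {"COMPLETE", "FAILED"}
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"dataset still {status!r} after {timeout} seconds")
        sleep(interval)
```

Given a configured client exposing `call(query)` (an assumption, since the client is not covered here), `fetch_status` could be `lambda: client.call(GetDatasetStatus(id=dataset_id))`.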
class indico.queries.datasets.GetDatasetFileStatus(id)
Get the status of dataset file upload
- Parameters:
id (int) – ID of the dataset to query
- Returns:
DOWNLOADED or FAILED
- Return type:
status (str)
class indico.queries.datasets.ListDatasets(*, limit=100)
List all of your datasets
Options:
- limit (int, default=100): Maximum number of datasets to retrieve
- Returns:
List[Dataset]
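ListDatasets returns every dataset up to `limit`, so finding one by name means filtering the result. The sketch below assumes each Dataset object exposes a `name` attribute, which this page does not state. Both helper names are hypothetical.

```python
def find_datasets_by_name(datasets, name):
    """Return the datasets whose `name` attribute equals `name`."""
    return [d for d in datasets if getattr(d, "name", None) == name]


def get_unique_dataset(datasets, name):
    """Return the single dataset called `name`, or raise LookupError."""
    matches = find_datasets_by_name(datasets, name)
    if len(matches) != 1:
        raise LookupError(f"expected exactly one dataset named {name!r}, found {len(matches)}")
    return matches[0]
```

If more than `limit` datasets exist, the one being searched for may not appear in the returned list, so a larger `limit` may be needed.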
class indico.queries.datasets.DeleteDataset(id)
Delete a dataset
- Parameters:
id (int) – ID of the dataset
- Returns:
The success of the operation
- Return type:
success (bool)
class indico.queries.datasets.AddDatasetFiles(dataset_id, files, autoprocess=False, wait=True, batch_size=20)
Add files to a dataset.
- Parameters:
- dataset_id (int) – ID of the dataset
- files (List[str]) – List of pathnames to the dataset files
Options:
- autoprocess (bool, default=False): Automatically process new dataset files
- wait (bool, default=True): Block while polling for status of files
- batch_size (int, default=20): Batch size for uploading files
- Returns:
Dataset
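The `batch_size` option suggests that files are uploaded in groups. How the client groups them internally is not stated here. The sketch below only illustrates the arithmetic, under the assumption that files are sent in consecutive groups of at most `batch_size`. `batches` is a hypothetical helper, not part of the library.

```python
def batches(items, batch_size=20):
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Under that assumption, 45 files with the default `batch_size=20` would go up as three groups of 20, 20, and 5.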
class indico.queries.datasets.RemoveDatasetFile(dataset_id, file_id)
Remove a file from a dataset by ID. To retrieve a list of files in a dataset,
see GetDatasetFileStatus.
- Parameters:
- dataset_id (int) – Dataset ID
- file_id (int) – Datafile ID (returned by GetDatasetFileStatus)
- Returns:
Dataset object
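A sketch of removing every failed file from a dataset. It assumes the object obtained via GetDatasetFileStatus carries a `files` collection whose entries expose `id` and `status` attributes, and that the client exposes `call(query)`. None of these are spelled out on this page. Both helper names are hypothetical.

```python
def failed_file_ids(dataset):
    """Return the IDs of files whose status is FAILED.

    Assumption: `dataset.files` is an iterable of objects with `id` and `status`.
    """
    return [f.id for f in dataset.files if f.status == "FAILED"]


def remove_failed_files(client, dataset_id, dataset):
    """Issue one RemoveDatasetFile query per failed file; return the removed IDs."""
    from indico.queries.datasets import RemoveDatasetFile  # class documented above

    removed = []
    for file_id in failed_file_ids(dataset):
        client.call(RemoveDatasetFile(dataset_id=dataset_id, file_id=file_id))
        removed.append(file_id)
    return removed
```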