Commit 191d14e

Datasets csv (#279)

* feat: add csv output
* add PR template
* update csv without --details
* update basename
* docs: update changes, readme and version

1 parent e24a855, commit 191d14e

5 files changed: 238 additions & 78 deletions

Pull request template (new file)

Lines changed: 53 additions & 0 deletions

@@ -0,0 +1,53 @@
+# Overview
+
+> Briefly describe what this PR does.
+
+# JIRA
+
+> If this PR covers more than one ticket, list each related task here with a brief description.
+
+- https://lifebit.atlassian.net/browse/LP-XXXX - adds CSV, JSON outputs
+- https://lifebit.atlassian.net/browse/LP-XXXX - pytests
+- https://lifebit.atlassian.net/browse/LP-XXXX - documentation
+
+# Changes
+
+- Implements X
+- Refactors Y
+- Adds/Removes Z
+
+# Acceptance Criteria
+
+> Add as many scenarios here as there are in the Story.
+
+> Normally these acceptance criteria are tested in an ADAPT workspace in PROD.
+
+<details>
+<summary>Scenario 1 - proof this scenario passes</summary>
+</details>
+
+<details>
+<summary>Scenario 2 - proof this scenario passes</summary>
+</details>
+
+<details>
+<summary>Scenario X - proof this scenario passes</summary>
+</details>
+
+# DEV
+
+> This environment is interchangeable with PROD, for example if the acceptance criteria can only be tested in DEV. If that is the case, please rename this section PROD (or any new environment).
+
+<details>
+<summary>Proof this feature/patch works in this environment</summary>
+</details>
+
+# AZURE
+
+<details>
+<summary>Proof this feature/patch works in this environment</summary>
+</details>
+
+# Interactive Analysis
+
+<details>
+<summary>Proof this feature/patch works in this environment</summary>
+</details>

CHANGELOG.md

Lines changed: 7 additions & 0 deletions

@@ -1,5 +1,12 @@
 ## lifebit-ai/cloudos-cli: changelog

+## v2.73.0 (2025-12-02)
+
+### Feat
+
+- Adds CSV output format for `datasets ls` with or without `--details`
+- Adds PR template
+
 ## v2.72.0 (2025-12-02)

 ### Feat

README.md

Lines changed: 33 additions & 0 deletions

@@ -1634,6 +1634,39 @@ If you require more information on the files and folder listed, you can use the
 - Virtual Name (the file or folder name)
 - Storage Path

+**Output Format Options**
+
+The `datasets ls` command supports different output formats via the `--output-format` option:
+
+- **`stdout` (default)**: displays results in the console with Rich formatting.
+  - Without `--details`: a simple list of file/folder names with color coding (blue underlined for folders).
+  - With `--details`: a Rich-formatted table with all file information.
+- **`csv`**: saves results to a CSV file.
+  - Without `--details`: a CSV with two columns, "Name,Storage Path".
+  - With `--details`: a CSV with the columns "Type, Owner, Size, Size (bytes), Last Updated, Virtual Name, Storage Path".
+
+Examples:
+
+```bash
+# Simple list to console (default)
+cloudos datasets ls Data --profile my_profile
+
+# Detailed table in console
+cloudos datasets ls Data --details --profile my_profile
+
+# Simple CSV output
+cloudos datasets ls Data --profile my_profile --output-format csv
+
+# Detailed CSV output
+cloudos datasets ls Data --details --output-format csv --profile my_profile
+
+# Custom output filename
+cloudos datasets ls Data --details --output-format csv --output-basename my_files --profile my_profile
+```
+
+When using `--output-format csv`, you can optionally set a custom base filename with `--output-basename`. If it is not provided, the default base name `datasets_ls` is used, producing `datasets_ls.csv`.
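For reference, the simple-mode file contains just the two documented columns. A hypothetical example (bucket and file names invented for illustration):

```bash
cat datasets_ls.csv
# Name,Storage Path
# results,s3://my-bucket/results
# counts.csv,s3://my-bucket/data/counts.csv
```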
 #### Move Files

 Relocate files and folders within the same project or across different projects. This is useful for reorganizing data and moving results to appropriate locations.

cloudos_cli/__main__.py

Lines changed: 144 additions & 77 deletions

@@ -2997,6 +2997,16 @@ def run_bash_array_job(ctx,
                     'Details contains "Type", "Owner", "Size", "Last Updated", ' +
                     '"Virtual Name", "Storage Path".'),
               is_flag=True)
+@click.option('--output-format',
+              help=('The desired display for the output, either directly to standard output or saved as a file. ' +
+                    'Default=stdout.'),
+              type=click.Choice(['stdout', 'csv'], case_sensitive=False),
+              default='stdout')
+@click.option('--output-basename',
+              help=('Output file base name to save dataset details. ' +
+                    'Default=datasets_ls'),
+              default='datasets_ls',
+              required=False)
 @click.pass_context
 @with_profile_config(required_params=['apikey', 'workspace_id'])
 def list_files(ctx,

@@ -3008,7 +3018,9 @@ def list_files(ctx,
                project_name,
                profile,
                path,
-               details):
+               details,
+               output_format,
+               output_basename):
     """List contents of a path within a CloudOS workspace dataset."""
     verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert)
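Since `--output-format` is declared with `case_sensitive=False`, click matches the choice case-insensitively; for example (hypothetical invocation):

```bash
# Accepted the same as --output-format csv
cloudos datasets ls Data --output-format CSV --profile my_profile
```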

@@ -3024,89 +3036,144 @@
     try:
         result = datasets.list_folder_content(path)
         contents = result.get("contents") or result.get("datasets", [])
+
         if not contents:
             contents = result.get("files", []) + result.get("folders", [])

-        if details:
-            console = Console(width=None)
-            table = Table(show_header=True, header_style="bold white")
-            table.add_column("Type", style="cyan", no_wrap=True)
-            table.add_column("Owner", style="white")
-            table.add_column("Size", style="magenta")
-            table.add_column("Last Updated", style="green")
-            table.add_column("Virtual Name", style="bold", overflow="fold")
-            table.add_column("Storage Path", style="dim", no_wrap=False, overflow="fold", ratio=2)
-
-            for item in contents:
-                is_folder = "folderType" in item or item.get("isDir", False)
-                type_ = "folder" if is_folder else "file"
-
-                # Enhanced type information
-                if is_folder:
-                    folder_type = item.get("folderType")
-                    if folder_type == "VirtualFolder":
-                        type_ = "virtual folder"
-                    elif folder_type == "S3Folder":
-                        type_ = "s3 folder"
-                    elif folder_type == "AzureBlobFolder":
-                        type_ = "azure folder"
-                    else:
-                        type_ = "folder"
-                else:
-                    # Check if file is managed by Lifebit (user uploaded)
-                    is_managed_by_lifebit = item.get("isManagedByLifebit", False)
-                    if is_managed_by_lifebit:
-                        type_ = "file (user uploaded)"
-                    else:
-                        type_ = "file (virtual copy)"
-
-                user = item.get("user", {})
-                if isinstance(user, dict):
-                    name = user.get("name", "").strip()
-                    surname = user.get("surname", "").strip()
-                else:
-                    name = surname = ""
-                if name and surname:
-                    owner = f"{name} {surname}"
-                elif name:
-                    owner = name
-                elif surname:
-                    owner = surname
-                else:
-                    owner = "-"
-
-                raw_size = item.get("sizeInBytes", item.get("size"))
-                size = format_bytes(raw_size) if not is_folder and raw_size is not None else "-"
-
-                updated = item.get("updatedAt") or item.get("lastModified", "-")
-                filepath = item.get("name", "-")
-
-                if item.get("fileType") == "S3File" or item.get("folderType") == "S3Folder":
-                    bucket = item.get("s3BucketName")
-                    key = item.get("s3ObjectKey") or item.get("s3Prefix")
-                    s3_path = f"s3://{bucket}/{key}" if bucket and key else "-"
-                elif item.get("fileType") == "AzureBlobFile" or item.get("folderType") == "AzureBlobFolder":
-                    account = item.get("blobStorageAccountName")
-                    container = item.get("blobContainerName")
-                    key = item.get("blobName") if item.get("fileType") == "AzureBlobFile" else item.get("blobPrefix")
-                    s3_path = f"az://{account}.blob.core.windows.net/{container}/{key}" if account and container and key else "-"
-                else:
-                    s3_path = "-"
-
-                style = Style(color="blue", underline=True) if is_folder else None
-                table.add_row(type_, owner, size, updated, filepath, s3_path, style=style)
-
-            console.print(table)
-
-        else:
-            console = Console()
-            for item in contents:
-                name = item.get("name", "")
-                is_folder = item.get("folderType") or item.get("isDir")
-                if is_folder:
-                    console.print(f"[blue underline]{name}[/]")
-                else:
-                    console.print(name)
+        # Process items to extract data
+        processed_items = []
+        for item in contents:
+            is_folder = "folderType" in item or item.get("isDir", False)
+            type_ = "folder" if is_folder else "file"
+
+            # Enhanced type information
+            if is_folder:
+                folder_type = item.get("folderType")
+                if folder_type == "VirtualFolder":
+                    type_ = "virtual folder"
+                elif folder_type == "S3Folder":
+                    type_ = "s3 folder"
+                elif folder_type == "AzureBlobFolder":
+                    type_ = "azure folder"
+                else:
+                    type_ = "folder"
+            else:
+                # Check if file is managed by Lifebit (user uploaded)
+                is_managed_by_lifebit = item.get("isManagedByLifebit", False)
+                if is_managed_by_lifebit:
+                    type_ = "file (user uploaded)"
+                else:
+                    type_ = "file (virtual copy)"
+
+            user = item.get("user", {})
+            if isinstance(user, dict):
+                name = user.get("name", "").strip()
+                surname = user.get("surname", "").strip()
+            else:
+                name = surname = ""
+            if name and surname:
+                owner = f"{name} {surname}"
+            elif name:
+                owner = name
+            elif surname:
+                owner = surname
+            else:
+                owner = "-"
+
+            raw_size = item.get("sizeInBytes", item.get("size"))
+            size = format_bytes(raw_size) if not is_folder and raw_size is not None else "-"
+
+            updated = item.get("updatedAt") or item.get("lastModified", "-")
+            filepath = item.get("name", "-")
+
+            if item.get("fileType") == "S3File" or item.get("folderType") == "S3Folder":
+                bucket = item.get("s3BucketName")
+                key = item.get("s3ObjectKey") or item.get("s3Prefix")
+                storage_path = f"s3://{bucket}/{key}" if bucket and key else "-"
+            elif item.get("fileType") == "AzureBlobFile" or item.get("folderType") == "AzureBlobFolder":
+                account = item.get("blobStorageAccountName")
+                container = item.get("blobContainerName")
+                key = item.get("blobName") if item.get("fileType") == "AzureBlobFile" else item.get("blobPrefix")
+                storage_path = f"az://{account}.blob.core.windows.net/{container}/{key}" if account and container and key else "-"
+            else:
+                storage_path = "-"
+
+            processed_items.append({
+                'type': type_,
+                'owner': owner,
+                'size': size,
+                'raw_size': raw_size,
+                'updated': updated,
+                'name': filepath,
+                'storage_path': storage_path,
+                'is_folder': is_folder
+            })
+
+        # Output handling
+        if output_format == 'csv':
+            import csv
+
+            csv_filename = f'{output_basename}.csv'
+
+            if details:
+                # CSV with all details
+                with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
+                    fieldnames = ['Type', 'Owner', 'Size', 'Size (bytes)', 'Last Updated', 'Virtual Name', 'Storage Path']
+                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+                    writer.writeheader()
+
+                    for item in processed_items:
+                        writer.writerow({
+                            'Type': item['type'],
+                            'Owner': item['owner'],
+                            'Size': item['size'],
+                            'Size (bytes)': item['raw_size'] if item['raw_size'] is not None else '',
+                            'Last Updated': item['updated'],
+                            'Virtual Name': item['name'],
+                            'Storage Path': item['storage_path']
+                        })
+            else:
+                # CSV with just names
+                with open(csv_filename, 'w', newline='', encoding='utf-8') as csvfile:
+                    writer = csv.writer(csvfile)
+                    writer.writerow(['Name', 'Storage Path'])
+                    for item in processed_items:
+                        writer.writerow([item['name'], item['storage_path']])
+
+            click.secho(f'\nDatasets list saved to: {csv_filename}', fg='green', bold=True)
+
+        else:  # stdout
+            if details:
+                console = Console(width=None)
+                table = Table(show_header=True, header_style="bold white")
+                table.add_column("Type", style="cyan", no_wrap=True)
+                table.add_column("Owner", style="white")
+                table.add_column("Size", style="magenta")
+                table.add_column("Last Updated", style="green")
+                table.add_column("Virtual Name", style="bold", overflow="fold")
+                table.add_column("Storage Path", style="dim", no_wrap=False, overflow="fold", ratio=2)
+
+                for item in processed_items:
+                    style = Style(color="blue", underline=True) if item['is_folder'] else None
+                    table.add_row(
+                        item['type'],
+                        item['owner'],
+                        item['size'],
+                        item['updated'],
+                        item['name'],
+                        item['storage_path'],
+                        style=style
+                    )
+
+                console.print(table)
+
+            else:
+                console = Console()
+                for item in processed_items:
+                    if item['is_folder']:
+                        console.print(f"[blue underline]{item['name']}[/]")
+                    else:
+                        console.print(item['name'])

     except Exception as e:
         raise ValueError(f"Failed to list files for project '{project_name}': {str(e)}")
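On a successful CSV export, the command confirms the written file via the `click.secho` call above; a hypothetical session using the default `--output-basename`:

```bash
cloudos datasets ls Data --details --output-format csv --profile my_profile
# Prints in green:
#
# Datasets list saved to: datasets_ls.csv
```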

cloudos_cli/_version.py

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-__version__ = '2.72.0'
+__version__ = '2.73.0'
