Merged
32 commits
3cfb5a6
[fix] Bad typing for S3ArtifactStorage_clientconfig args (#3276)
sbatchelder Jan 15, 2025
4321b07
[docs] Fix pages/bookmarks section links (#3274)
guspan-tanadi Jan 15, 2025
7a98238
[feat] Add `py.typed` marker to allow users to benefit from existing …
bluenote10 Jan 20, 2025
eba27a9
[fix] Decrease client resources keep-alive time (#3279)
mihran113 Jan 20, 2025
3ae2363
[fix] Resolve issues on data points connection on epoch alignment (#3…
mihran113 Feb 11, 2025
c6e0c7f
[fix] Correct indentation on query proxy object return statement (#3287)
alberttorosyan Feb 12, 2025
2d9f3b8
[feat] Skip metrics check when run is known to yield false result (#3…
alberttorosyan Feb 12, 2025
1766020
[chore] Bump ruff version from 0.3.3 to 0.9.2 and fix some invalid/de…
bluenote10 Feb 13, 2025
db6fcc1
[fix] Move performance tests to local mac mini (#3290)
mihran113 Feb 18, 2025
07338ca
[fix] Resolve session refresh issues when db file is replaced (#3294)
mihran113 Feb 24, 2025
fe316dd
[fix] Resolve issue with adding duplicate tags (#3296)
mihran113 Mar 4, 2025
b6c0b1f
[fix] Message stream parsing (#3298)
qzed Mar 11, 2025
3c40c83
[fix] Handle empty queries (#3299)
alberttorosyan Mar 13, 2025
86deb77
[chore] Remove legacy (`aim 2.x.x`) sdk (#3305)
mihran113 Mar 13, 2025
795067c
[fix] Improve error messages for remote tracking (#3303)
mihran113 Mar 13, 2025
5bafebb
[feat] Add AimCallback for distributed runs using the hugging face AP…
VassilisVassiliadis Mar 13, 2025
c57f4a8
[fix] Increase session pool size for sqlite engine (#3306)
mihran113 Mar 14, 2025
f731d3e
[feat] Remove metric version check to improve metric retrieval perfor…
mihran113 Mar 18, 2025
51b8435
[fix] Improve RT exception handling (#3309)
mihran113 Mar 20, 2025
fba908f
[feat] Move indexing thread to `aim up` main process (#3311)
alberttorosyan Mar 20, 2025
e02b98b
Bump up Aim to v3.28.0
alberttorosyan Mar 21, 2025
897459a
[feat] Constant indexing of in-progress Runs (#3310)
alberttorosyan Apr 1, 2025
e206b50
[fix] Resolve issue of min/max calculation for single point metrics (…
mihran113 Apr 2, 2025
943942c
[fix] Use polling observer to make sure new file modifications are de…
alberttorosyan Apr 3, 2025
02bdcdd
[feat] Mark stalled runs as finished (#3314)
alberttorosyan Apr 4, 2025
9ee40a2
[fix] Aim web ui integration in jupyter/colab (#3319)
larissapoghosyan Apr 8, 2025
6a559f3
[fix] Fallback to union db if index is missing (#3317)
alberttorosyan Apr 30, 2025
a1a233b
Bump up Aim to v3.29.0
alberttorosyan May 8, 2025
753f4b1
Bump up Aim to v3.29.1
alberttorosyan May 8, 2025
d67e766
[fix] Resolve issues with false tag reassignment (#3344)
mihran113 Jun 26, 2025
63c343a
Merge in upstream changes
collijk Jun 27, 2025
e91326b
Merge branch 'main' into chore/pull-in-upstream
Jul 3, 2025
2 changes: 1 addition & 1 deletion .github/workflows/nightly-release.yml
@@ -40,7 +40,7 @@ jobs:
- name: setup python
uses: actions/setup-python@v2
with:
python-version: '3.7'
python-version: '3.8'
architecture: x64

- name: install deps
20 changes: 10 additions & 10 deletions .github/workflows/pull-request.yml
@@ -24,14 +24,14 @@ on:
- reopened
- edited
jobs:
validate-naming-convention:
name: Pull Request's title matches naming convention
runs-on: ubuntu-latest
steps:
- uses: deepakputhraya/action-pr-title@master
with:
regex: '^\[(?:feat|fix|doc|refactor|deprecation)\]\s[A-Z].*(?<!\.)$'
github_token: ${{ github.token }}
# validate-naming-convention:
# name: Pull Request's title matches naming convention
# runs-on: ubuntu-latest
# steps:
# - uses: deepakputhraya/action-pr-title@master
# with:
# regex: '^\[(?:feat|fix|doc|refactor|deprecation)\]\s[A-Z].*(?<!\.)$'
# github_token: ${{ github.token }}
run-checks:
if: github.event.pull_request.draft == false && github.event.action != 'edited'
runs-on: ubuntu-latest
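
For reference, the title pattern from the commented-out `validate-naming-convention` job can be exercised locally. The following is a minimal sketch in Python, assuming the pattern behaves the same under Python's `re` module as it does inside the action; `title_matches` is a hypothetical helper and not part of the workflow.

```python
import re

# Pattern copied verbatim from the workflow's `regex:` input.
PR_TITLE_PATTERN = re.compile(r'^\[(?:feat|fix|doc|refactor|deprecation)\]\s[A-Z].*(?<!\.)$')


def title_matches(title: str) -> bool:
    """Return True if the title satisfies the naming convention (hypothetical helper)."""
    return PR_TITLE_PATTERN.match(title) is not None


if __name__ == '__main__':
    for title in (
        '[fix] Handle empty queries (#3299)',
        '[chore] Remove legacy sdk',
        '[fix] Handle empty queries.',
    ):
        print(f'{title!r}: {title_matches(title)}')
```

Under this pattern, titles prefixed with `[chore]` or `[docs]`, both of which appear in the commit list, do not match, and neither do untagged titles such as `Merge in upstream changes`.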
@@ -68,8 +68,8 @@ jobs:

storage-performance-checks:
needs: run-checks
concurrency: perf-tests
runs-on: [self-hosted, performance-tests]
concurrency: storage-performance-checks
runs-on: [self-hosted, perf-tests]
name: Performance tests
steps:
- name: checkout
1 change: 1 addition & 0 deletions .github/workflows/python-package.yml
@@ -96,6 +96,7 @@ jobs:
python -m pip install -r requirements.txt

- name: Build bdist wheels for 'cp37-cp37m'
if: matrix.manylinux-version == 'manylinux_2_24_x86_64'
uses: nick-fields/retry@v2
with:
max_attempts: 3
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,44 @@
# Changelog

## Unreleased:

### Fixes:
- Fix issues with tag false reassignment (mihran113)

## 3.29.1 May 8, 2025:

### Enhancements:
- Constant indexing of in-progress runs (alberttorosyan)
- Fallback to union view if index db is missing (alberttorosyan, mihran113)


### Fixes:
- Fix min/max calculation for single point metrics (mihran113)
- Aim web ui integration in jupyter/colab (larissapoghosyan)
- Package publishing for Linux/Python 3.7 (alberttorosyan)

## 3.29.0 May 8, 2025 (Yanked)

## 3.28.0 Mar 21, 2025

### Enhancements:
- Skip metrics check when run is known to yield false result (alberttorosyan)
- Remove metric version check to improve performance of metric retrieval (mihran113)
- Move indexing thread to main process of `aim up` (alberttorosyan)
- Add AimCallback implementation for hugging face distributed runs (VassilisVassiliadis)
- Add py.typed marker to allow usage of existing type annotations (bluenote10)


### Fixes:
- Decrease client resources keep-alive time (mihran113)
- Fix connection of data points on epoch alignment (mihran113)
- Resolve issue with adding duplicate tags to the same run (mihran113)
- Improve error messages for remote tracking server (mihran113)
- Fix spurious assertion error in message stream parsing (qzed)
- Correct indentation on query proxy object return statement (alberttorosyan)
- Fix typing issues in S3ArtifactStorage implementation (sbatchelder)


## 3.27.0 Dec 18, 2024

### Enhancements:
2 changes: 1 addition & 1 deletion aim/VERSION
@@ -1 +1 @@
3.27.0
3.29.1
3 changes: 2 additions & 1 deletion aim/acme.py
@@ -1,2 +1,3 @@
# Alias to SDK acme interface
from aim.sdk.adapters.acme import AimCallback, AimWriter # noqa F401
from aim.sdk.adapters.acme import AimCallback as AimCallback
from aim.sdk.adapters.acme import AimWriter as AimWriter
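
The `X as X` form that replaces `# noqa F401` here (and in the other alias modules in this diff) is the redundant-alias convention that linters and type checkers read as an intentional re-export. A minimal sketch of the idiom, using a standard-library name as a stand-in for the adapter classes (the stand-in is an assumption for illustration only):

```python
# Redundant alias: marks OrderedDict as deliberately re-exported from this module,
# so the unused-import warning does not have to be suppressed with `# noqa`.
from collections import OrderedDict as OrderedDict

import collections

# The alias does not change runtime behaviour: the name is bound to the same
# object a plain `from collections import OrderedDict` would bind.
assert OrderedDict is collections.OrderedDict
```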
2 changes: 1 addition & 1 deletion aim/cli/convert/commands.py
@@ -40,7 +40,7 @@ def convert_tensorboard(ctx, logdir, flat, no_cache):
@click.option('--flat', '-f', required=False, is_flag=True, default=False)
def convert_tensorflow(ctx, logdir, flat):
click.secho(
"WARN: Command 'tf' is deprecated and will be removed in future releases," " please use 'tensorboard' instead.",
"WARN: Command 'tf' is deprecated and will be removed in future releases, please use 'tensorboard' instead.",
fg='red',
)
repo_inst = ctx.obj['repo_inst']
4 changes: 2 additions & 2 deletions aim/cli/init/commands.py
@@ -20,15 +20,15 @@ def init(repo, yes, skip_if_exists):
re_init = False
if Repo.exists(repo_path):
if yes and skip_if_exists:
raise click.BadParameter('Conflicting init options.' 'Either specify -y/--yes or -s/--skip-if-exists')
raise click.BadParameter('Conflicting init options.Either specify -y/--yes or -s/--skip-if-exists')
elif yes:
re_init = True
elif skip_if_exists:
click.echo('Repo exists at {}. Skipped initialization.'.format(repo_path))
return
else:
re_init = click.confirm(
'Aim repository is already initialized. ' 'Do you want to re-initialize to empty Aim repository?'
'Aim repository is already initialized. Do you want to re-initialize to empty Aim repository?'
)
if not re_init:
return
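
Several hunks in this diff merge adjacent string literals into a single literal. Python joins adjacent literals at compile time, so the merge should leave the resulting messages unchanged. A quick check, using the two `init` messages from this hunk:

```python
# Adjacent string literals are joined by the parser with nothing inserted between them.
old_conflict = 'Conflicting init options.' 'Either specify -y/--yes or -s/--skip-if-exists'
new_conflict = 'Conflicting init options.Either specify -y/--yes or -s/--skip-if-exists'

old_confirm = 'Aim repository is already initialized. ' 'Do you want to re-initialize to empty Aim repository?'
new_confirm = 'Aim repository is already initialized. Do you want to re-initialize to empty Aim repository?'

# Both pairs compare equal, so the rewrite is behaviour-preserving.
assert old_conflict == new_conflict
assert old_confirm == new_confirm
```

The first message has no space after `options.` in either version; the merge carries that over unchanged.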
10 changes: 5 additions & 5 deletions aim/cli/manager/manager.py
@@ -33,18 +33,18 @@ def check_startup_success():
import requests

server_path = 'http://{}:{}{}'.format(args['--host'], args['--port'], args['--base-path'])
status_api = f'{server_path}/api/projects/status'
retry_count = 5
sleep_interval = 1
status_api = f'{server_path}/api/projects/'
retry_count = 10
sleep_interval = 0.1
for _ in range(retry_count):
time.sleep(sleep_interval)
sleep_interval *= 2
try:
response = requests.get(status_api)
if response.status_code == 200:
return True
except Exception:
pass
sleep_interval += 1
time.sleep(sleep_interval)

return False
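
The reworked loop sleeps before each probe and doubles the delay afterwards. A minimal sketch of that pattern follows, with the probe and the sleep function injected so it can run without a server; `wait_until_ready`, `probe`, and `sleep` are hypothetical names and not part of `aim`.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    retries: int = 10,
    initial_delay: float = 0.1,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` with exponentially growing pauses (hypothetical helper)."""
    delay = initial_delay
    for _ in range(retries):
        # Sleep first, then grow the delay for the next attempt.
        sleep(delay)
        delay *= factor
        try:
            if probe():
                return True
        except Exception:
            # A failing probe counts as "not ready yet".
            pass
    return False


if __name__ == '__main__':
    attempts = iter([False, False, True])
    print(wait_until_ready(lambda: next(attempts), sleep=lambda seconds: None))
```

With the values from the diff (10 retries, 0.1 s initial delay, doubling), the pauses add up to about 102.3 seconds if every probe fails.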

2 changes: 1 addition & 1 deletion aim/cli/runs/commands.py
@@ -195,7 +195,7 @@ def update_metrics(ctx, yes):
index_manager = RepoIndexManager.get_index_manager(repo)
hashes = repo.list_all_runs()
for run_hash in tqdm.tqdm(hashes, desc='Updating runs', total=len(hashes)):
meta_tree = repo.request_tree('meta', run_hash, read_only=False, from_union=False)
meta_tree = repo.request_tree('meta', run_hash, read_only=False)
meta_run_tree = meta_tree.subtree(('meta', 'chunks', run_hash))
try:
# check if the Run has already been updated.
2 changes: 1 addition & 1 deletion aim/cli/runs/utils.py
@@ -48,7 +48,7 @@ def upload_repo_runs(buffer: io.BytesIO, bucket_name: str) -> Tuple[bool, str]:
import boto3
except ImportError:
raise RuntimeError(
"This command requires 'boto3' to be installed. " 'Please install it with command: \n pip install boto3'
"This command requires 'boto3' to be installed. Please install it with command: \n pip install boto3"
)

try:
2 changes: 1 addition & 1 deletion aim/cli/server/commands.py
@@ -95,5 +95,5 @@ def server(host, port, repo, ssl_keyfile, ssl_certfile, base_path, log_level, de
)
exec_cmd(cmd, stream_output=True)
except ShellCommandException:
click.echo('Failed to run Aim Tracking Server. ' 'Please see the logs above for details.')
click.echo('Failed to run Aim Tracking Server. Please see the logs above for details.')
exit(1)
2 changes: 1 addition & 1 deletion aim/cli/storage/commands.py
@@ -56,7 +56,7 @@ def to_3_11(ctx, hashes, yes):
try:
run = Run(run_hash, repo=repo)
if run.check_metrics_version():
backup_run(run)
backup_run(repo, run.hash)
run.update_metrics()
index_manager.index(run_hash)
else:
11 changes: 9 additions & 2 deletions aim/cli/up/commands.py
@@ -11,7 +11,9 @@
get_repo_instance,
set_log_level,
)
from aim.sdk.index_manager import RepoIndexManager
from aim.sdk.repo import Repo
from aim.sdk.run_status_manager import RunStatusManager
from aim.sdk.utils import clean_repo_path
from aim.web.configs import (
AIM_ENV_MODE_KEY,
@@ -29,7 +31,7 @@
@click.command()
@click.option('-h', '--host', default=AIM_UI_DEFAULT_HOST, type=str)
@click.option('-p', '--port', default=AIM_UI_DEFAULT_PORT, type=int)
@click.option('-w', '--workers', default=1, type=int)
@click.option('-w', '--workers', default=2, type=int)
@click.option('--uds', required=False, type=click.Path(exists=False, file_okay=True, dir_okay=False, readable=True))
@click.option('--repo', required=False, type=click.Path(exists=True, file_okay=False, dir_okay=True, writable=True))
@click.option('--tf_logs', type=click.Path(exists=True, readable=True))
@@ -96,7 +98,7 @@ def up(
db_cmd = build_db_upgrade_command()
exec_cmd(db_cmd, stream_output=True)
except ShellCommandException:
click.echo('Failed to initialize Aim DB. ' 'Please see the logs above for details.')
click.echo('Failed to initialize Aim DB. Please see the logs above for details.')
return

if port == 0:
@@ -122,6 +124,11 @@ def up(
if profiler:
os.environ[AIM_PROFILER_KEY] = '1'

index_mng = RepoIndexManager.get_index_manager(repo_inst)
index_mng.start()

run_status_mng = RunStatusManager(repo_inst)
run_status_mng.start()
try:
server_cmd = build_uvicorn_command(
'aim.web.run:app',
2 changes: 2 additions & 0 deletions aim/distributed_hugging_face.py
@@ -0,0 +1,2 @@
# Alias to SDK distributed hugging face interface
from aim.sdk.adapters.distributed_hugging_face import AimCallback # noqa: F401
2 changes: 1 addition & 1 deletion aim/ext/notifier/notifier.py
@@ -34,7 +34,7 @@ def notify(self, message: Optional[str] = None, **kwargs):
except Exception as e:
attempt += 1
if attempt == self.MAX_RETRIES:
logger.error(f'Notifier {sub} failed to send message "{message}". ' f'No retries left.')
logger.error(f'Notifier {sub} failed to send message "{message}". No retries left.')
raise NotificationSendError(e)
else:
logger.error(
2 changes: 1 addition & 1 deletion aim/ext/sshfs/utils.py
@@ -197,7 +197,7 @@ def unmount_remote_repo(mount_point: str, mount_root: str):
if exit_code != 0:
# in case of failure log warning so the user can unmount manually if needed
logger.warning(
f'Could not unmount path: {mount_point}.\n' f'Please unmount manually using command:\n' f'{" ".join(cmd)}'
f'Could not unmount path: {mount_point}.\nPlease unmount manually using command:\n{" ".join(cmd)}'
)
else:
shutil.rmtree(mount_root)
2 changes: 1 addition & 1 deletion aim/ext/tensorboard_tracker/tracker.py
@@ -31,7 +31,7 @@ def _decode_histogram(value):

# This is a bit weird but it seems the histogram counts is usually padded by 0 as tensorboard
# only stores the right limits?
# See https://github.com/pytorch/pytorch/blob/7d2a18da0b3427fcbe44b461a0aa508194535885/torch/utils/tensorboard/summary.py#L390 # noqa
# See https://github.com/pytorch/pytorch/blob/7d2a18da0b3427fcbe44b461a0aa508194535885/torch/utils/tensorboard/summary.py#L390
bin_counts = bin_counts[1:]

bin_range = (bucket_limits[0], bucket_limits[-1])
4 changes: 1 addition & 3 deletions aim/ext/transport/handlers.py
@@ -51,14 +51,12 @@ def get_tree(**kwargs):
name = kwargs['name']
sub = kwargs['sub']
read_only = kwargs['read_only']
from_union = kwargs['from_union']
index = kwargs['index']
timeout = kwargs['timeout']
no_cache = kwargs.get('no_cache', False)
if index:
return ResourceRef(repo._get_index_tree(name, timeout))
else:
return ResourceRef(repo.request_tree(name, sub, read_only=read_only, from_union=from_union, no_cache=no_cache))
return ResourceRef(repo.request_tree(name, sub, read_only=read_only))


def get_structured_run(hash_, read_only, created_at, **kwargs):
8 changes: 3 additions & 5 deletions aim/ext/transport/heartbeat.py
@@ -15,10 +15,8 @@ class HeartbeatSender(object):
HEARTBEAT_INTERVAL_DEFAULT = 10
NETWORK_CHECK_INTERVAL = 180

NETWORK_UNSTABLE_WARNING_TEMPLATE = (
'Network connection between client `{}` ' 'and server `{}` appears to be unstable.'
)
NETWORK_ABSENT_WARNING_TEMPLATE = 'Network connection between client `{}` ' 'and server `{}` appears to be absent.'
NETWORK_UNSTABLE_WARNING_TEMPLATE = 'Network connection between client `{}` and server `{}` appears to be unstable.'
NETWORK_ABSENT_WARNING_TEMPLATE = 'Network connection between client `{}` and server `{}` appears to be absent.'

def __init__(
self,
@@ -118,7 +116,7 @@ def reset_responses():


class HeartbeatWatcher:
CLIENT_KEEP_ALIVE_TIME_DEFAULT = 30 * 60 # 30 minutes
CLIENT_KEEP_ALIVE_TIME_DEFAULT = 5 * 60 # 5 minutes

def __init__(self, heartbeat_pool, keep_alive_time: Union[int, float] = CLIENT_KEEP_ALIVE_TIME_DEFAULT):
self._heartbeat_pool = heartbeat_pool
27 changes: 12 additions & 15 deletions aim/ext/transport/message_utils.py
@@ -5,7 +5,8 @@
from typing import Iterator, Tuple

from aim.storage.object import CustomObject
from aim.storage.treeutils import decode_tree, encode_tree # noqa
from aim.storage.treeutils import decode_tree as decode_tree
from aim.storage.treeutils import encode_tree as encode_tree
from aim.storage.types import BLOB


@@ -45,28 +46,23 @@ def pack_stream(tree: Iterator[Tuple[bytes, bytes]]) -> bytes:
yield struct.pack('I', len(key)) + key + struct.pack('?', True) + struct.pack('I', len(val)) + val


def unpack_helper(msg: bytes) -> Tuple[bytes, bytes]:
(key_size,), tail = struct.unpack('I', msg[:4]), msg[4:]
key, tail = tail[:key_size], tail[key_size:]
(is_blob,), tail = struct.unpack('?', tail[:1]), tail[1:]
(value_size,), tail = struct.unpack('I', tail[:4]), tail[4:]
value, tail = tail[:value_size], tail[value_size:]
assert len(tail) == 0
if is_blob:
yield key, BLOB(data=value)
else:
yield key, value


def unpack_stream(stream) -> Tuple[bytes, bytes]:
for msg in stream:
yield from unpack_helper(msg)
yield from unpack_args(msg)


def raise_exception(server_exception):
from filelock import Timeout

module = importlib.import_module(server_exception.get('module_name'))
exception = getattr(module, server_exception.get('class_name'))
args = json.loads(server_exception.get('args') or [])
message = server_exception.get('message')

# special handling for lock timeouts as they require lock argument which can't be passed over the network
if exception == Timeout:
raise Exception(message)

raise exception(*args) if args else exception()
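
The record layout written by `pack_stream` in this file (key length, key, blob flag, value length, value) can be decoded by walking the buffer with an offset, without assuming that a message ends after its first record. The body of `unpack_args` is not shown in this diff, so the following is only a sketch of one way to do it; `pack_record` and `unpack_records` are hypothetical names.

```python
import struct
from typing import Iterator, Tuple

UINT_SIZE = struct.calcsize('I')  # native unsigned int, as used by pack_stream
FLAG_SIZE = struct.calcsize('?')


def pack_record(key: bytes, value: bytes, is_blob: bool = False) -> bytes:
    """Encode one record using the same field order as pack_stream."""
    return struct.pack('I', len(key)) + key + struct.pack('?', is_blob) + struct.pack('I', len(value)) + value


def unpack_records(msg: bytes) -> Iterator[Tuple[bytes, bool, bytes]]:
    """Yield (key, is_blob, value) for every record in msg."""
    offset = 0
    while offset < len(msg):
        (key_size,) = struct.unpack_from('I', msg, offset)
        offset += UINT_SIZE
        key = msg[offset:offset + key_size]
        offset += key_size
        (is_blob,) = struct.unpack_from('?', msg, offset)
        offset += FLAG_SIZE
        (value_size,) = struct.unpack_from('I', msg, offset)
        offset += UINT_SIZE
        value = msg[offset:offset + value_size]
        offset += value_size
        yield key, is_blob, value


if __name__ == '__main__':
    message = pack_record(b'a', b'1') + pack_record(b'bc', b'xyz', is_blob=True)
    for record in unpack_records(message):
        print(record)
```

Unlike the removed `unpack_helper`, this sketch does not assert that the buffer is empty after one record; it simply keeps reading until the offset reaches the end.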


@@ -75,6 +71,7 @@ def build_exception(exception: Exception):
'module_name': exception.__class__.__module__,
'class_name': exception.__class__.__name__,
'args': json.dumps(exception.args),
'message': str(exception),
}
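
The `build_exception` / `raise_exception` pair serializes an exception on the server and reconstructs it on the client. Below is a minimal round-trip sketch under stated assumptions: `rebuild_exception` is a hypothetical name, and rather than special-casing `filelock.Timeout` as the diff does, it falls back to a plain `Exception` carrying the message whenever the original class cannot be constructed from the transmitted arguments. It also uses the string `'[]'` as the fallback for missing args, since `json.loads` expects text.

```python
import importlib
import json
from typing import Any, Dict


def build_exception(exception: Exception) -> Dict[str, Any]:
    """Serialize an exception into a JSON-friendly dict (mirrors the diff)."""
    return {
        'module_name': exception.__class__.__module__,
        'class_name': exception.__class__.__name__,
        'args': json.dumps(exception.args),
        'message': str(exception),
    }


def rebuild_exception(payload: Dict[str, Any]) -> Exception:
    """Recreate the exception, or a generic one if the class rejects the args (hypothetical)."""
    module = importlib.import_module(payload['module_name'])
    exc_type = getattr(module, payload['class_name'])
    args = json.loads(payload.get('args') or '[]')
    try:
        return exc_type(*args)
    except TypeError:
        # The constructor needs arguments that did not survive the trip.
        return Exception(payload.get('message'))


if __name__ == '__main__':
    restored = rebuild_exception(build_exception(ValueError('bad value', 3)))
    print(type(restored).__name__, restored.args)
```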


2 changes: 1 addition & 1 deletion aim/ext/transport/utils.py
@@ -13,7 +13,7 @@ def inner(func):
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except exc_type as e: # noqa
except exc_type:
if error_message is not None:
logger.error(error_message)
raise RuntimeError(error_message)
2 changes: 1 addition & 1 deletion aim/fastai.py
@@ -1,2 +1,2 @@
# Alias to SDK fast.ai interface
from aim.sdk.adapters.fastai import AimCallback # noqa F401
from aim.sdk.adapters.fastai import AimCallback as AimCallback
2 changes: 1 addition & 1 deletion aim/hf_dataset.py
@@ -1,2 +1,2 @@
# Alias to SDK Hugging Face Datasets interface
from aim.sdk.objects.plugins.hf_datasets_metadata import HFDataset # noqa F401
from aim.sdk.objects.plugins.hf_datasets_metadata import HFDataset as HFDataset
2 changes: 1 addition & 1 deletion aim/hugging_face.py
@@ -1,2 +1,2 @@
# Alias to SDK Hugging Face interface
from aim.sdk.adapters.hugging_face import AimCallback # noqa F401
from aim.sdk.adapters.hugging_face import AimCallback as AimCallback
3 changes: 2 additions & 1 deletion aim/keras.py
@@ -1,2 +1,3 @@
# Alias to SDK Keras interface
from aim.sdk.adapters.keras import AimCallback, AimTracker # noqa F401
from aim.sdk.adapters.keras import AimCallback as AimCallback
from aim.sdk.adapters.keras import AimTracker as AimTracker
2 changes: 1 addition & 1 deletion aim/keras_tuner.py
@@ -1,2 +1,2 @@
# Alias to SDK Keras-Tuner interface
from aim.sdk.adapters.keras_tuner import AimCallback # noqa F401
from aim.sdk.adapters.keras_tuner import AimCallback as AimCallback
2 changes: 1 addition & 1 deletion aim/mxnet.py
@@ -1,2 +1,2 @@
# Alias to SDK mxnet interface
from aim.sdk.adapters.mxnet import AimLoggingHandler # noqa F401
from aim.sdk.adapters.mxnet import AimLoggingHandler as AimLoggingHandler
2 changes: 1 addition & 1 deletion aim/optuna.py
@@ -1,2 +1,2 @@
# Alias to SDK Optuna interface
from aim.sdk.adapters.optuna import AimCallback # noqa F401
from aim.sdk.adapters.optuna import AimCallback as AimCallback
2 changes: 1 addition & 1 deletion aim/paddle.py
@@ -1,2 +1,2 @@
# Alias to SDK PaddlePaddle interface
from aim.sdk.adapters.paddle import AimCallback # noqa F401
from aim.sdk.adapters.paddle import AimCallback as AimCallback
2 changes: 1 addition & 1 deletion aim/prophet.py
@@ -1,2 +1,2 @@
# Alias to SDK Prophet interface
from aim.sdk.adapters.prophet import AimLogger # noqa F401
from aim.sdk.adapters.prophet import AimLogger as AimLogger
File renamed without changes.