diff --git a/ChangeLog.md b/ChangeLog.md index ac9346d..381c2cf 100644 --- a/ChangeLog.md +++ b/ChangeLog.md @@ -1,3 +1,8 @@ +# v0.3 (05/09/2023) +* Added value range checks when the metric is running on HDR data (to avoid passing relative values) +* Added SSIM as an alternative metric +* Better handling of paths to configuration files + # v0.2 * Updated ColourVideoVDP model with cross-channel masking and more advanced pooling, different calibration and better prediction accuracy. * Added distograms diff --git a/README.md b/README.md index fa56d28..f567214 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,18 @@ -# ColourVideoVDP: A visible difference predictor for colour images and videos +# ColorVideoVDP: A visible difference predictor for colour images and videos **[BETA RELEASE]** -> **[in the Meta version we have some beta disclaimers]** -ColourVideoVDP is a full-reference visual quality metric that predicts the perceptual difference between pairs of images or videos. Similar to popular metrics like PSNR, SSIM, and DeltaE 2000 it is aimed at comparing a ground truth reference against a distorted (e.g. blurry, noisy, color-shifted) version. +ColorVideoVDP is a full-reference visual quality metric that predicts the perceptual difference between pairs of images or videos. Similar to popular metrics like PSNR, SSIM, and DeltaE 2000 it is aimed at comparing a ground truth reference against a distorted (e.g. blurry, noisy, color-shifted) version. This metric is unique because it is the first color-aware metric that accounts for spatial and temporal aspects of vision. The main features: -* models chromatic and achromatic contrast sensitivity using a novel contrast sensitivity model, allowing us to predict distortions in colour and luminance; +* models chromatic and achromatic contrast sensitivity using a novel contrast sensitivity model (castleCSF), allowing us to predict distortions in color and luminance; * models spatio-temporal sensitivity so that it can predict visibility of artifacts like waveguide nonuniformity and other temporally varying artifacts; * works with colorimetrically calibrated content, both SDR and HDR (any colour space); -* can predict a single number quality correlate or a distortion map. +* can predict a single number quality correlate, a distortion map or a visualization of an error over time and visual channels (distogram). -ColourVideoVDP is implemented in PyTorch and can be run efficiently on a CUDA-enabled GPU. It can also run on a CPU, but the processing times will be much longer, especially for video. Its usage is described [below](#example-usage). +ColorVideoVDP is implemented in PyTorch and can be run efficiently on a CUDA-enabled GPU. It can also run on a CPU, but the processing times will be much longer, especially for video. Its usage is described [below](#example-usage). The details of the metric will be available in our upcoming paper: @@ -21,12 +20,10 @@ The details of the metric will be available in our upcoming paper: > Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro. > Paper in preparation. -**[TODO:]** Link to project page - If you use the metric in your research, please cite the paper above. ## PyTorch quickstart -1. Start by installing [anaconda](https://docs.anaconda.com/anaconda/install/index.html) or [miniconda](https://docs.conda.io/en/latest/miniconda.html). Then, create a new environment for ColourVideoVDP and activate it: +1. 
Start by installing [anaconda](https://docs.anaconda.com/anaconda/install/index.html) or [miniconda](https://docs.conda.io/en/latest/miniconda.html). Then, create a new environment for ColorVideoVDP and activate it: ```bash conda create -n cvvdp python=3.10 conda activate cvvdp @@ -41,15 +38,15 @@ conda install ffmpeg 4. Obtain the ColourVDP codebase, by extracting a `.zip` file provided or cloning from Github: ```bash -git clone git@github.com:mantiuk/ColourVideoVDP.git # skip if a .zip is provided or you use Github GUI +git clone git@github.com:gfxdisp/ColorVideoVDP.git # skip if a .zip is provided or you use Github GUI ``` -5. Finally, install ColourVideoVDP with PyPI: +5. Finally, install ColorVideoVDP with PyPI: ```bash -cd ColourVideoVDP +cd ColorVideoVDP pip install -e . ``` -*Note:* The "-e/--editable" option to `pip` is optional and should be used only if you intend to change the ColourVideoVDP code. +*Note:* The "-e/--editable" option to `pip` is optional and should be used only if you intend to change the ColorVideoVDP code. After installation, run `cvvdp` directly from the command line: @@ -58,14 +55,14 @@ cvvdp --test test_file --ref ref_file --display standard_fhd ``` The test and reference files can be images or videos. The option `--display` specifies a display on which the content is viewed. See [vvdp_data/display_models.json](pycvvdp/vvdp_data/display_models.json) for the available displays. -See [Command line interface](#command-line-interface) for further details. ColourVideoVDP can be also run directly from Python - see [Low-level Python interface](#low-level-python-interface). +See [Command line interface](#command-line-interface) for further details. ColorVideoVDP can be also run directly from Python - see [Low-level Python interface](#low-level-python-interface). **Table of contents** - [Display specification](#display-specification) - [Custom specification](#custom-display-specification) - - [HDR content](#hdr-content) - - [Reporting metric results](#reporting-metric-results) - - [Predicting quality scores](#predicted-quality-scores) +- [HDR content](#hdr-content) +- [Reporting metric results](#reporting-metric-results) +- [Predicting quality scores](#predicted-quality-scores) - [Usage](#example-usage) - [Command line interface](#command-line-interface) - [Visualization](#visualization) @@ -75,23 +72,21 @@ See [Command line interface](#command-line-interface) for further details. Colou ## Display specification -Unlike most image quality metrics, ColourVideoVDP needs physical specification of the display (e.g. its size, resolution, peak brightness) and viewing conditions (viewing distance, ambient light) to compute accurate predictions. The specifications of the displays are stored in [vvdp_data/display_models.json](pycvvdp/vvdp_data/display_models.json). You can add the exact specification of your display to this file, or create a new JSON file and pass the directory it is located in as `--config-paths` parameter (see [Configuration files](#configuration-files)). If the display specification is unknown to you, you are encouraged to use one of the standard display specifications listed on the top of that file, for example `standard_4k`, or `standard_fhd`. If you use one of the standard displays, there is a better chance that your results will be comparable with other studies. +Unlike most image quality metrics, ColorVideoVDP needs physical specification of the display (e.g. 
its size, resolution, peak brightness) and viewing conditions (viewing distance, ambient light) to compute accurate predictions. The specifications of the displays are stored in [vvdp_data/display_models.json](pycvvdp/vvdp_data/display_models.json). You can add the exact specification of your display to this file, or create a new JSON file and pass the directory it is located in as the `--config-paths` parameter (see [Configuration files](#configuration-files)). If the display specification is unknown to you, you are encouraged to use one of the standard display specifications listed at the top of that file, for example `standard_4k` or `standard_fhd`. If you use one of the standard displays, there is a better chance that your results will be comparable with other studies.

You specify the display by passing the `--display` argument to `cvvdp`. Note that the specification in `display_models.json` is for the display and not the image. If you select `standard_4k` (resolution 3840x2160) for your display and pass a 1920x1080 image, the metric will assume that the image occupies one quarter of that display (the central portion). If you want to enlarge the image to the full resolution of the display, pass the `--full-screen-resize {fast_bilinear,bilinear,bicubic,lanczos}` option (for now it works with video only).

-The command line version of ColourVideoVDP can take as input HDR video streams encoded using the PQ transfer function. To correctly model HDR content, it is necessary to pass a display model with correct color space and transfer function (with the field `colorspace="BT.2020-PQ"`), for example `standard_hdr_pq`.
-
-The display specification is documented in [vvdp_data/README.md](pycvvdp/vvdp_data/README.md).
+The command line version of ColorVideoVDP can take as input HDR video streams encoded using the PQ transfer function. To correctly model HDR content, it is necessary to pass a display model with the correct color space and transfer function (with the field `colorspace="BT.2020-PQ"`), for example `standard_hdr_pq`.

### Custom display specification

-If you run the metric from the command line, we recommend that you create a directory with a copy of `display_models.json`, add a new display specification in that file and then add to the command line `--config-paths --display `.
+If you run the metric from the command line, we recommend that you create a directory with a copy of `display_models.json`, add a new display specification in that file and then add to the command line `--config-paths <path-to-that-directory> --display <new-display-name>`. The format of `display_models.json` is explained [here](tree/main/pycvvdp/vvdp_data#readme).

If you run the metric from Python code, the display photometry and geometry can be specified by passing the `display_name` parameter to the metric. Alternatively, if you need more flexibility in specifying display geometry (size, viewing distance) and its colorimetry, you can instead pass objects of the classes `vvdp_display_geometry`, `vvdp_display_photo_gog` for most SDR displays, and `vvdp_display_photo_absolute` for HDR displays. You can also create your own subclasses of those classes for custom display specification.
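For reference, a minimal sketch of the Python route is shown below. The class names come from the paragraph above, but the import path and the constructor arguments (`diagonal_size_inches`, `distance_m`, `Y_peak`, `contrast`, `E_ambient`) are assumptions and should be checked against `pycvvdp/display_model.py`:

```python
import pycvvdp
from pycvvdp.display_model import vvdp_display_geometry, vvdp_display_photo_gog

# A hypothetical 31-inch 4K SDR monitor viewed from 0.75 m.
# Argument names are illustrative; check the class definitions for the exact signatures.
geometry   = vvdp_display_geometry( [3840, 2160], diagonal_size_inches=31, distance_m=0.75 )
photometry = vvdp_display_photo_gog( Y_peak=300, contrast=1000, E_ambient=100 )

# The display objects are passed in place of a display name.
metric = pycvvdp.cvvdp( display_photometry=photometry, display_geometry=geometry )
```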
-### HDR content +## HDR content (Python command line only) You can use the metric to compare: @@ -107,15 +102,15 @@ pip install pyexr export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/conda/miniconda3/lib ``` -### Reporting metric results +## Reporting metric results When reporting the results of the metric, please include the string returned by the metric, such as: -`"ColourVideoVDP v0.1, 75.4 [pix/deg], Lpeak=200, Lblack=0.5979 [cd/m^2], (standard_4k)"` +`"ColorVideoVDP v0.3, 75.4 [pix/deg], Lpeak=200, Lblack=0.5979 [cd/m^2], (standard_4k)"` This is to ensure that you provide enough details to reproduce your results. -### Predicted quality scores +## Predicted quality scores -ColourVideoVDP reports image/video quality in the JOD (Just-Objectionable-Difference) units. The highest quality (no difference) is reported as 10 and lower values are reported for distorted content. In case of very strong distortion, or when comparing two unrelated images, the quality value can drop below 0. +ColorVideoVDP reports image/video quality in the JOD (Just-Objectionable-Difference) units. The highest quality (no difference) is reported as 10 and lower values are reported for distorted content. In case of very strong distortion, or when comparing two unrelated images, the quality value can drop below 0. The main advantage of JODs is that they (a) should be linearly related to the perceived magnitude of the distortion and (b) the difference of JODs can be interpreted as the preference prediction across the population. For example, if method A produces a video with the quality score of 8 JOD and method B gives the quality score of 9 JOD, it means that 75% of the population will choose method B over A. The plots below show the mapping from the difference between two conditions in JOD units to the probability of selecting the condition with the higher JOD score (black numbers on the left) and the percentage increase in preference (blue numbers on the right). @@ -130,19 +125,19 @@ The main advantage of JODs is that they (a) should be linearly related to the pe -## Example usage +# Example usage -### Command line interface -The main script to run the model on a set of images or videos is [run_cvvdp.py](pycvvdp/run_cvvdp.py), from which the binary `cvvdp` is created . Run `cvvdp --help` for detailed usage information. +## Command line interface +The main script to run the model on a set of images or videos is [run_cvvdp.py](pycvvdp/run_cvvdp.py), from which the binary `cvvdp` is created. Run `cvvdp --help` for detailed usage information. -For the first example, we will compare the performance of ColourVideoVDP to the popular Delta E 2000 color metric. A video was degraded with 3 different color artifacts using ffmpeg: +For the first example, we will compare the performance of ColorVideoVDP to the popular Delta E 2000 color metric. A video was degraded with 3 different color artifacts using ffmpeg: 1. Gaussian noise: `ffmpeg ... -vf noise=alls=28.55:allf=t+u ...` 1. Increased saturation: `ffmpeg ... -vf eq=saturation=2.418 ...` 1. Colorbalance: `ffmpeg ... -vf colorbalance=rm=0.3:gm=0.3:bm=0.3:rs=0.15:gs=0.2:bs=0.2` The magnitude of each degradation was adjusted so that the predicted maximum **Delta E 2000 = 33.5**. As Delta E 2000 is not sensitive to spatial or temporal components of the content, it is unable to distinguish between these cases. 
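For illustration, a degraded test clip of this kind could be produced with a command along the following lines; the input/output file names and encoder settings are assumptions, not the exact commands used to create the bundled `example_media` files:

```bash
# Apply the noise filter listed above to the reference clip (encoder settings illustrative).
ffmpeg -i example_media/structure/ferris-ref.mp4 -vf "noise=alls=28.55:allf=t+u" \
       -c:v libx264 -crf 10 ferris-test-noise.mp4
```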
-To predict quality with ColourVideoVDP (shown below), run: +To predict quality with ColorVideoVDP (shown below), run: ```bash cvvdp --test example_media/structure/ferris-test-*.mp4 --ref example_media/structure/ferris-ref.mp4 --display standard_fhd --heatmap supra-threshold --distogram @@ -150,13 +145,13 @@ cvvdp --test example_media/structure/ferris-test-*.mp4 --ref example_media/struc |Original | ![ferris wheel](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-ref.gif) | Quality | **TODO:** adjust heatmaps | | :---: | :---: | :---: | :---: | -| Gaussian noise | ![noise](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-noise.gif) | DE00 = 33.5
CVVDP = 8.6842 | ![noise-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-noise-supra.gif) | -| Saturation | ![saturation](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-sat.gif) | DE00 = 33.5
CVVDP = 6.8960 | ![sat-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-sat-supra.gif) | -| Colorbalance | ![wb](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-wb.gif) | DE00 = 33.5
CVVDP = 7.1682 | ![wb-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-wb-supra.gif) | +| Gaussian noise | ![noise](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-noise.gif) | DE00 = 33.5
CVVDP = 9.2140 | ![noise-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-noise-supra.gif) | +| Saturation | ![saturation](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-sat.gif) | DE00 = 33.5
CVVDP = 5.4020 | ![sat-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-sat-supra.gif) | +| Colorbalance | ![wb](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/ferris-wb.gif) | DE00 = 33.5
CVVDP = 5.6020 | ![wb-heatmap](https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/html_reports/github_examples/cvvdp/heatmaps/ferris-wb-supra.gif) | -### Visualization +## Visualization -In addition to the single-valued quality scored in the JOD units, ColourVideoVDP can generate a heatmap (video or image) and a distogram. The heatmap is generated when `--heatmap` command argument is passed with one of the following options: +In addition to the single-valued quality scored in the JOD units, ColorVideoVDP can generate a heatmap (video or image) and a distogram. The heatmap is generated when `--heatmap` command argument is passed with one of the following options: * `supra-threshold` - the difference values between 0 and 3 will be mapped to blue to yellow colors (visualizes large differences) * `threshold` - the difference values between 0 and 1 will be mapped to green to red colors (visualizes small differences) * `raw` - the difference values between 0 and 10 will be mapped to back to white @@ -165,7 +160,7 @@ The `--distogram` command line argument can be followed by a floating point valu Both distogram and heatmap will be saved in the current directory and the filename will contain the name of the test image/video. To change the directory in which those files are saved, pass `--output-dir` option. -### Configuration files +## Configuration files Configuration files contain a list of available display models (`display_models.json`), colour spaces (`color_spaces.json`) and the parameters of the metric (`cvvdp_parameters.json`). Those are located by default in `pycvvdp/vvdp_data` folder. Different locations could be specified with the `--config-paths` argument or the `CVVDP_PATH` environment variable. @@ -181,8 +176,8 @@ To check which `display_models.json` file is used, run `cvvdp` with `--display ? To check which `cvvdp_parameters.json` file is used, run `cvvdp` with `--verbose`. -### Low-level Python interface -ColourVideoVDP can also be run through the Python interface by instatiating the `pycvvdp.cvvdp` class. +## Low-level Python interface +ColorVideoVDP can also be run through the Python interface by instatiating the `pycvvdp.cvvdp` class. ```python import pycvvdp @@ -194,13 +189,15 @@ cvvdp = pycvvdp.cvvdp(display_name='standard_4k', heatmap='threshold') JOD, m_stats = cvvdp.predict( I_test, I_ref, dim_order="HWC" ) ``` -Below, we show an example comparing ColourVideoVDP to the popular SSIM metric. While SSIM is aware of the structure of the content, it operates on luminance only, and is unable to accurately represent the degradation of a color-based artifact like chroma subsampling. +Below, we show an example comparing ColorVideoVDP to the popular SSIM metric. While SSIM is aware of the structure of the content, it operates on luminance only, and is unable to accurately represent the degradation of a color-based artifact like chroma subsampling. ![chroma_ss](imgs/chroma_ss.png) More examples can be found in these [example scripts](examples). -## Release notes +# Release notes + +* v0.3.0 (04/Sep/2023) - First public release * v0.1.0 - Initial internal release. 
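Expanding on the low-level Python interface above, the sketch below shows one way the elided image-loading step might look. `load_image_as_array` is exported from `pycvvdp` (see the `pycvvdp/__init__.py` hunk in this diff), but its exact signature and the file names used here are assumptions:

```python
import pycvvdp

# Hypothetical file names; any pair of equally sized SDR images will do.
I_ref  = pycvvdp.load_image_as_array( 'ferris-ref.png' )
I_test = pycvvdp.load_image_as_array( 'ferris-test-noise.png' )

cvvdp = pycvvdp.cvvdp( display_name='standard_4k', heatmap='threshold' )

# dim_order describes the array layout: height, width, colour channel.
JOD, m_stats = cvvdp.predict( I_test, I_ref, dim_order="HWC" )
print( 'Quality:', JOD, 'JOD' )
```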
diff --git a/example_media/structure/ferris-test-wb.mp4 b/example_media/structure/ferris-test-wb.mp4 index f1f73e4..803bc63 100644 Binary files a/example_media/structure/ferris-test-wb.mp4 and b/example_media/structure/ferris-test-wb.mp4 differ diff --git a/generate_timings.py b/generate_timings.py new file mode 100644 index 0000000..c268ae1 --- /dev/null +++ b/generate_timings.py @@ -0,0 +1,97 @@ +import numpy as np +import pandas as pd +import pycvvdp +from time import time +import torch +from tqdm import tqdm, trange +import logging + +class DummyVS: + def __init__(self, num_frames, height, width, fps=30, device='cuda'): + self.n = num_frames + self.h = height + self.w = width + self.fps = fps + self.frame = torch.randn(1,3,1,height,width, device=device) + + def __len__(self): + return self.n + + def get_video_size(self): + return self.h, self.w, self.n + + def get_frames_per_second(self): + return self.fps + + def get_reference_frame(self, i, device=None, colorspace=None): + return self.frame[:,:1] if colorspace == 'Y' else self.frame + + def get_test_frame(self, i, device=None, colorspace=None): + return self.get_reference_frame(i, device, colorspace) + + def to(self, device): + self.frame = self.frame.to(device) + +debug = True +metrics = ['PU21-VMAF', 'cvvdp', 'fvvdp'] +device = torch.device('cuda') +dims = ((720, 1280), (1080, 1920), (1440, 2560), (2160, 3840)) +# h, w = 1080, 1920 +# n = (30, 60, 120, 240, 480) +num_frames = 50 +num_samples = 5 + +# VMAF paths set for Param's PC +ffmpeg_path = '../vmaf/ffmpeg-6.0-amd64-static/ffmpeg' +vmaf_cache = '/local/scratch/pmh64/tmp' + +logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.DEBUG if debug else logging.INFO) + +timings = [] +# for num_frames in tqdm(n): +for h, w in tqdm(dims): + vs = DummyVS(num_frames, h, w, device=device) + pbar = tqdm(metrics, leave=False) + for metric_type in pbar: + pbar.set_description(f'N={num_frames}, h={h}, w={w}, metric={metric_type}') + + if metric_type == 'cvvdp': + metric = pycvvdp.cvvdp(quiet=True, device=device, temp_padding='replicate', heatmap=None) + if debug: + metric.debug = True + elif metric_type == 'fvvdp': + from pyfvvdp import fvvdp + # Add argument "colorspace='Y'" while retriving frames to run on images + # Lines 233 and 244 in commit 5bf67f92341604d238ebe72fdeeb4ad825db5485 + metric = fvvdp(quiet=True, device=device, temp_padding='replicate', heatmap=None) + elif metric_type == 'PSNR-RGB': + metric = pycvvdp.psnr_rgb(device=device) + elif metric_type == 'FLIP': + metric = pycvvdp.flip(device=device) + elif metric_type == 'PU21-VMAF': + metric = pycvvdp.pu_vmaf(ffmpeg_path, vmaf_cache, device=device) + else: + raise RuntimeError( f'Unknown metric {metric_type}' ) + + with torch.no_grad(): + metric.predict_video_source(vs) # dummy run + + times = [] + for _ in trange(num_samples, leave=False): + if metric_type == 'PU21-VMAF': + times.append(metric.predict_video_source(vs, record_time=True)) + else: + start = time() + metric.predict_video_source(vs) + times.append(time() - start) + + timings.append({'name': metric_type, + 'num_samples': num_samples, + 'num_frames': num_frames, + 'height': h, + 'width': w, + 'time_mean': np.mean(times), + 'time_std': np.std(times)}) + +df = pd.DataFrame(timings) +df.to_csv('timings.csv', index=False) diff --git a/pycvvdp/__init__.py b/pycvvdp/__init__.py index 9354f1a..fb6c1b6 100644 --- a/pycvvdp/__init__.py +++ b/pycvvdp/__init__.py @@ -1,6 +1,15 @@ -from pycvvdp.cvvdp_metric import cvvdp +from pycvvdp.cvvdp_metric import cvvdp, 
cvvdp_image from pycvvdp.cvvdp_nn_metric import cvvdp_nn from pycvvdp.pupsnr import pu_psnr_y, pu_psnr_rgb2020, psnr_rgb +from pycvvdp.e_itp import e_itp +from pycvvdp.e_sitp import e_sitp +from pycvvdp.de2000 import de2000 +from pycvvdp.de2000_spatial import s_de2000 +from pycvvdp.flip import flip +from pycvvdp.pu_lpips import pu_lpips +from pycvvdp.dolby_ictcp import ictcp +from pycvvdp.pu_vmaf import pu_vmaf +from pycvvdp.hyab import hyab from pycvvdp.video_source_file import video_source_file, video_source_video_file, load_image_as_array from pycvvdp.display_model import vvdp_display_photometry, vvdp_display_photo_eotf, vvdp_display_geometry from pycvvdp.video_source_yuv import video_source_yuv_file diff --git a/pycvvdp/cvvdp_metric.py b/pycvvdp/cvvdp_metric.py index 807d59f..eb57e2a 100644 --- a/pycvvdp/cvvdp_metric.py +++ b/pycvvdp/cvvdp_metric.py @@ -838,4 +838,40 @@ def export_distogram(self, stats, fname, jod_max=None, base_size=6): # plt.waitforbuttonpress() +class cvvdp_image(vq_metric): + def __init__(self, display_name="standard_4k", display_photometry=None, display_geometry=None, config_paths=[], heatmap=None, quiet=False, device=None, temp_padding="replicate", use_checkpoints=False, calibrated_ckpt=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + # Create a dummy display photometry object that does not change input frame + self.cvvdp_metric = cvvdp(display_name, None, None, config_paths, heatmap, quiet, self.device, temp_padding, use_checkpoints, calibrated_ckpt) + + def set_display_model(self, display_name="standard_4k", display_photometry=None, display_geometry=None, config_paths=[]): + self.linear_dm = vvdp_display_photo_eotf(display_photometry.Y_peak, contrast=display_photometry.contrast, source_colorspace='BT.2020-linear', E_ambient=display_photometry.E_ambient, k_refl=display_photometry.k_refl) + self.cvvdp_metric.set_display_model(display_photometry=self.linear_dm, display_geometry=display_geometry) + + def predict_video_source(self, vid_source): + + _, _, N_frames = vid_source.get_video_size() + + avg = 0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace='RGB2020') + R = vid_source.get_reference_frame(ff, device=self.device, colorspace='RGB2020') + test_vs = video_source_array( T, R, fps=0, display_photometry=self.linear_dm ) + Q, _ = self.cvvdp_metric.predict_video_source(test_vs) + avg += Q / N_frames + + return avg, None + + def short_name(self): + return "cvvdp-image" + + def quality_unit(self): + return "JOD" diff --git a/pycvvdp/cvvdp_nn_metric.py b/pycvvdp/cvvdp_nn_metric.py index 930b459..1769218 100644 --- a/pycvvdp/cvvdp_nn_metric.py +++ b/pycvvdp/cvvdp_nn_metric.py @@ -105,7 +105,7 @@ def do_pooling_and_jods(self, Q_per_ch, base_rho_band, fps): # Q_per_ch[channel,frame,sp_band] feat_in = Q_per_ch.permute(1, 0, 2).flatten(start_dim=1) feat_intermediate, _ = self.pooling_net[0](feat_in) - feat_intermediate = torch.cat((feat_intermediate[-1], base_rho_band.unsqueeze(0))) + feat_intermediate = torch.cat((feat_intermediate[-1], torch.as_tensor(base_rho_band, device=self.device, dtype=torch.float32).unsqueeze(0))) Q = self.pooling_net[1](feat_intermediate).squeeze() * 10 return Q diff --git a/pycvvdp/de2000.py b/pycvvdp/de2000.py new file mode 100644 index 0000000..ddafeda --- /dev/null +++ b/pycvvdp/de2000.py @@ -0,0 +1,92 @@ +import 
torch +import numpy as np + +from pycvvdp.utils import deltaE00 +from pycvvdp.vq_metric import vq_metric + +""" +DE2000 metric. Usage is same as the FovVideoVDP metric (see pytorch_examples). +""" +class de2000(vq_metric): + def __init__(self, device=None,display_name=None,display_geometry=None,display_photometry=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + # D65 White point + self.w = (0.9505, 1.0000, 1.0888) + self.colorspace = 'XYZ' + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. + ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + e00 = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # XYZ to Lab + w = self.max_L*self.w + T_lab = self.xyz_to_lab(T, w) + R_lab = self.xyz_to_lab(R, w) + + # Meancdm of Per-pixel DE2000 + e00 = e00 + self.e00_fn(T_lab, R_lab) / N_frames + return e00, None + + def xyz_to_lab(self, img, W): + Lab = torch.empty_like(img) + Lab[...,0:,:,:] = 116*self.lab_fn(img[...,1,:,:,:]/W[1])-16 + Lab[...,1:,:,:] = 500*(self.lab_fn(img[...,0,:,:,:]/W[0]) - self.lab_fn(img[...,1,:,:,:]/W[1])) + Lab[...,2:,:,:] = 200*(self.lab_fn(img[...,1,:,:,:]/W[1]) - self.lab_fn(img[...,2,:,:,:]/W[2])) + return Lab + + def lab_fn(self, x): + y = torch.empty_like(x) + sigma = (6/29) + y_1 = x**(1/3) + y_2 = (x/(3*(sigma**2)))+(4/29) + condition = torch.less(x, sigma**3) + y = torch.where(condition, y_2, y_1) + return y + + def e00_fn(self, img1, img2): + sz = torch.numel(img1[...,0,:,:,:]) + img1_row = torch.cat((torch.reshape(img1[...,0,:,:,:], (1,sz)), torch.reshape(img1[...,1,:,:,:], (1,sz)), torch.reshape(img1[...,2,:,:,:], (1,sz))), 0) + img2_row = torch.cat((torch.reshape(img2[...,0,:,:,:], (1,sz)), torch.reshape(img2[...,1,:,:,:], (1,sz)), torch.reshape(img2[...,2,:,:,:], (1,sz))), 0) + e00 = deltaE00(img1_row, img2_row) + # e00_mean = torch.empty_like(torch.reshape(img1[...,0,:,:,:], (1,sz))) + # e00_mean = torch.mean(torch.from_numpy(e00).to(e00_mean)) + e00_mean = torch.mean(e00) + return e00_mean + + def short_name(self): + return "DE-2000" + + def quality_unit(self): + return "Delta E2000" + + def get_info_string(self): + return None + + def set_display_model(self, display_photometry, display_geometry): + self.max_L = display_photometry.get_peak_luminance() + self.max_L = np.where( self.max_L < 300, self.max_L, 300) diff --git a/pycvvdp/de2000_spatial.py b/pycvvdp/de2000_spatial.py new file mode 100644 index 0000000..9b75e2f --- /dev/null +++ b/pycvvdp/de2000_spatial.py @@ -0,0 +1,125 @@ +import torch +import numpy as np + +from pycvvdp.utils import SCIELAB_filter, deltaE00 +from pycvvdp.vq_metric import vq_metric + +""" +Spatial DE2000 metric. Usage is same as the FovVideoVDP metric (see pytorch_examples). 
+""" +class s_de2000(vq_metric): + def __init__(self, device=None,display_name=None,display_geometry=None,display_photometry=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.slab = SCIELAB_filter(device=device) + # D65 White point + self.w = (0.9505, 1.0000, 1.0888) + self.colorspace = 'XYZ' + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. + ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + e_s00 = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # XYZ to Opp + T_opp = self.slab.xyz_to_opp(T) + R_opp = self.slab.xyz_to_opp(R) + + # Spatially filtered opponent colour images + T_s_opp = self.opp_to_sopp(T_opp, self.ppd) + R_s_opp = self.opp_to_sopp(R_opp, self.ppd) + + # S-OPP to S-XYZ + T_s_xyz = self.slab.opp_to_xyz(T_s_opp) + R_s_xyz = self.slab.opp_to_xyz(R_s_opp) + + # S-XYZ to S-Lab + w = self.max_L*self.w + T_s_lab = self.xyz_to_lab(T_s_xyz, w) + R_s_lab = self.xyz_to_lab(R_s_xyz, w) + + # Meancdm of Per-pixel DE2000 + e_s00 = e_s00 + self.e00_fn(T_s_lab, R_s_lab) / N_frames + return e_s00, None + + def opp_to_sopp(self, img, ppd): + S_OPP = torch.empty_like(img) + ## Filters are low-dimensional; construct using np + #[k1, k2, k3] = [torch.as_tensor(filter).to(img) for filter in self.slab.separableFilters_torch(ppd)] + #S_OPP[...,0:,:,:] = self.slab.separableConv_torch(torch.squeeze(img[...,0,:,:,:]), k1, torch.abs(k1)) + #S_OPP[...,1:,:,:] = self.slab.separableConv_torch(torch.squeeze(img[...,1,:,:,:]), k2, torch.abs(k2)) + #S_OPP[...,2:,:,:] = self.slab.separableConv_torch(torch.squeeze(img[...,2,:,:,:]), k3, torch.abs(k3)) + + # Simpler SCIELAB filters implementation + [k1, k2, k3] = self.slab.generateSCIELABfiltersParams(ppd) + # Limit filter width to 1-degree visual angle, and odd number of sampling points + # (so that the gaussians generated from Rick's gauss routine are symmetric). 
+ width = int(np.ceil(ppd / 2) * 2 - 1) + S_OPP[...,0:,:,:] = self.slab.applyGaussFilter(img[...,0,:,:,:], width, k1) + S_OPP[...,1:,:,:] = self.slab.applyGaussFilter(img[...,1,:,:,:], width, k2) + S_OPP[...,2:,:,:] = self.slab.applyGaussFilter(img[...,2,:,:,:], width, k3) + + return S_OPP + + def xyz_to_lab(self, img, W): + Lab = torch.empty_like(img) + Lab[...,0:,:,:] = 116*self.lab_fn(img[...,1,:,:,:]/W[1])-16 + Lab[...,1:,:,:] = 500*(self.lab_fn(img[...,0,:,:,:]/W[0]) - self.lab_fn(img[...,1,:,:,:]/W[1])) + Lab[...,2:,:,:] = 200*(self.lab_fn(img[...,1,:,:,:]/W[1]) - self.lab_fn(img[...,2,:,:,:]/W[2])) + return Lab + + def lab_fn(self, x): + # y = torch.empty_like(x) + sigma = (6/29) + y_1 = x**(1/3) + y_2 = (x/(3*(sigma**2)))+(4/29) + condition = torch.less(x, sigma**3) + y = torch.where(condition, y_2, y_1) + return y + + def e00_fn(self, img1, img2): + sz = torch.numel(img1[...,0,:,:,:]) + img1_row = torch.cat((torch.reshape(img1[...,0,:,:,:], (1,sz)), torch.reshape(img1[...,1,:,:,:], (1,sz)), torch.reshape(img1[...,2,:,:,:], (1,sz))), 0) + img2_row = torch.cat((torch.reshape(img2[...,0,:,:,:], (1,sz)), torch.reshape(img2[...,1,:,:,:], (1,sz)), torch.reshape(img2[...,2,:,:,:], (1,sz))), 0) + e00 = deltaE00(img1_row, img2_row) + # e00_mean = torch.empty_like(torch.reshape(img1[...,0,:,:,:], (1,sz))) + # e00_mean = torch.mean(torch.from_numpy(e00).to(e00_mean)) + e00_mean = torch.mean(e00) + return e00_mean + + def short_name(self): + return "S-DE-2000" + + def quality_unit(self): + return "Delta E2000" + + def get_info_string(self): + return None + + def set_display_model(self, display_photometry, display_geometry): + self.ppd = display_geometry.get_ppd() + self.max_L = display_photometry.get_peak_luminance() + self.max_L = np.where( self.max_L < 300, self.max_L, 300) diff --git a/pycvvdp/dolby_ictcp.py b/pycvvdp/dolby_ictcp.py new file mode 100644 index 0000000..bd30df4 --- /dev/null +++ b/pycvvdp/dolby_ictcp.py @@ -0,0 +1,79 @@ +import numpy as np +import torch +import torch.nn as nn + +from pycvvdp.vq_metric import vq_metric + +""" +Dolby color metric +Reference: https://professional.dolby.com/siteassets/pdfs/dolby-vision-measuring-perceptual-color-volume-v7.1.pdf +""" +class ictcp(vq_metric): + def __init__(self, device=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.colorspace = 'XYZ' + self.XYZ2LMS = torch.tensor(((0.3593, -0.1921, 0.0071), + (0.6976, 1.1005, 0.0748), + (-0.0359, 0.0754, 0.8433)), device=self.device).T + self.LMS2ICTCP = torch.tensor(((2048, 2048, 0), + (6610, -13613, 7003), + (17933, -17390, -543)), device=self.device)/4096 + self.jnd_scaling = torch.tensor((720, 360, 720), device=self.device).view(1,3,1,1,1) + # ICTCP = bsxfun(@times, invEOTF(XYZ * XYZ2LMSmat) * LMS2ICTCPmat, [720, 360, 720]); + + def predict_video_source(self, vid_source, frame_padding="replicate"): + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + quality = 0.0 + for ff in range(N_frames): + # TODO: Batched processing + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = 
vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + T_lms_prime = self.invEOTF(self.colorspace_conversion(T, self.XYZ2LMS)) + R_lms_prime = self.invEOTF(self.colorspace_conversion(R, self.XYZ2LMS)) + + T_ictcp = self.colorspace_conversion(T_lms_prime, self.LMS2ICTCP) + R_ictcp = self.colorspace_conversion(R_lms_prime, self.LMS2ICTCP) + + quality += self.delta_itp(T_ictcp, R_ictcp) / N_frames + return quality, None + + def invEOTF(self, lin): + return (((3424/4096)+(2413/128)*(lin.clip(min=0)/10000)**(2610/16384)) / \ + (1+(2392/128)*(lin.clip(min=0)/10000)**(2610/16384)))**(2523/32) + + def colorspace_conversion(self, img, M): + ABC = torch.empty_like(img) # ABC represents any linear colour space + # To avoid permute (slow), perform separate dot products + for cc in range(3): + ABC[...,cc,:,:,:] = torch.sum(img*(M[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return ABC + + """ + Reference: https://kb.portrait.com/help/ictcp-color-difference-metric + """ + def delta_itp(self, img1, img2): + return 720 * torch.sqrt((img1[...,0,:,:,:] - img2[...,0,:,:,:])**2 + + 0.5 * (img1[...,1,:,:,:] - img2[...,1,:,:,:])**2 + + (img1[...,2,:,:,:] - img2[...,2,:,:,:])**2).mean() + + def short_name(self): + return 'Dolby-ICTCP' diff --git a/pycvvdp/e_itp.py b/pycvvdp/e_itp.py new file mode 100644 index 0000000..7f51724 --- /dev/null +++ b/pycvvdp/e_itp.py @@ -0,0 +1,84 @@ +import torch + +from pycvvdp.utils import PU +from pycvvdp.video_source import * +from pycvvdp.vq_metric import * + +""" +E-ITP metric. Usage is same as the FovVideoVDP metric (see pytorch_examples). +""" +class e_itp(vq_metric): + def __init__(self, device=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.colorspace = 'LMShpe' + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. 
+ ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + eitp = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # LMS_HPE to LMS_HPE_Lin + T_lms_lin = self.lmshpe_to_lmshpelin(T) + R_lms_lin = self.lmshpe_to_lmshpelin(R) + + # LMS_HPE_Lin to ITP + T_itp = self.lmshpelin_to_itp(T_lms_lin) + R_itp = self.lmshpelin_to_itp(R_lms_lin) + + eitp = eitp + self.eitp_fn(T_itp, R_itp) / N_frames + return eitp, None + + def lmshpe_to_lmshpelin(self, img): + lms_lin_pos = img**0.43 + lms_lin_neg = -1*(-img)**0.43 + condition = torch.less(img, 0) + lms_lin = torch.where(condition, lms_lin_neg, lms_lin_pos) + return lms_lin + + def lmshpelin_to_itp(self, img): + LMShpelin_to_itp = ( + (0.4, 0.4, 0.2), + (4.455, -4.851, 0.396), + (0.8056, 0.3572, -1.1628) ) + mat = torch.as_tensor( LMShpelin_to_itp, dtype=img.dtype, device=img.device) + ITP = torch.empty_like(img) + for cc in range(3): + ITP[...,cc,:,:,:] = torch.sum(img*(mat[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return ITP + + def eitp_fn(self, img1, img2): + mse = torch.mean(torch.sum( (img1 - img2)**2 )) + return torch.sqrt(mse) + + def short_name(self): + return "E-ITP" + + def quality_unit(self): + return "RMSE" + + def get_info_string(self): + return None \ No newline at end of file diff --git a/pycvvdp/e_sitp.py b/pycvvdp/e_sitp.py new file mode 100644 index 0000000..7cc7b19 --- /dev/null +++ b/pycvvdp/e_sitp.py @@ -0,0 +1,137 @@ +import torch +import numpy as np + +from pycvvdp.utils import SCIELAB_filter +from pycvvdp.video_source import * +from pycvvdp.vq_metric import * + +""" +E-SITP metric. Usage is same as the FovVideoVDP metric (see pytorch_examples). +""" +class e_sitp(vq_metric): + def __init__(self, device=None): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.sitp = SCIELAB_filter(device=device) + #self.colorspace = 'LMShpe' + self.colorspace = 'XYZ' + self.XYZ2LMS = torch.tensor(((0.3593, -0.1921, 0.0071), + (0.6976, 1.1005, 0.0748), + (-0.0359, 0.0754, 0.8433)), device=self.device).T + self.LMS2ICTCP = torch.tensor(((2048, 2048, 0), + (6610, -13613, 7003), + (17933, -17390, -543)), device=self.device)/4096 + self.jnd_scaling = torch.tensor((720, 360, 720), device=self.device).view(1,3,1,1,1) + # ICTCP = bsxfun(@times, invEOTF(XYZ * XYZ2LMSmat) * LMS2ICTCPmat, [720, 360, 720]); + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. 
+ ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + esitp = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + ## LMS_HPE to LMS_HPE_Lin + #T_lms_lin = self.lmshpe_to_lmshpelin(T) + #R_lms_lin = self.lmshpe_to_lmshpelin(R) + + ## LMS_HPE_Lin to ITP + #T_itp = self.lmshpelin_to_itp(T_lms_lin) + #R_itp = self.lmshpelin_to_itp(R_lms_lin) + + ## ITP to Spatial ITP + #T_sitp = self.itp_to_sitp(T_itp, self.ppd) + #R_sitp = self.itp_to_sitp(R_itp, self.ppd) + + T_lms_prime = self.invEOTF(self.colorspace_conversion(T, self.XYZ2LMS)) + R_lms_prime = self.invEOTF(self.colorspace_conversion(R, self.XYZ2LMS)) + + T_ictcp = self.colorspace_conversion(T_lms_prime, self.LMS2ICTCP) + R_ictcp = self.colorspace_conversion(R_lms_prime, self.LMS2ICTCP) + + # ICTCP to Spatial ICTCP + T_sitp = self.itp_to_sitp(T_ictcp, self.ppd) + R_sitp = self.itp_to_sitp(R_ictcp, self.ppd) + + esitp = esitp + self.delta_itp(T_sitp, R_sitp) / N_frames + return esitp, None + + def invEOTF(self, lin): + return (((3424/4096)+(2413/128)*(lin.clip(min=0)/10000)**(2610/16384)) / \ + (1+(2392/128)*(lin.clip(min=0)/10000)**(2610/16384)))**(2523/32) + + def colorspace_conversion(self, img, M): + ABC = torch.empty_like(img) # ABC represents any linear colour space + # To avoid permute (slow), perform separate dot products + for cc in range(3): + ABC[...,cc,:,:,:] = torch.sum(img*(M[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return ABC + + def lmshpe_to_lmshpelin(self, img): + lms_lin_pos = img**0.43 + lms_lin_neg = -1*(-img)**0.43 + condition = torch.less(img, 0) + lms_lin = torch.where(condition, lms_lin_neg, lms_lin_pos) + return lms_lin + + def lmshpelin_to_itp(self, img): + LMShpelin_to_itp = ( + (0.4, 0.4, 0.2), + (4.455, -4.851, 0.396), + (0.8056, 0.3572, -1.1628) ) + mat = torch.as_tensor( LMShpelin_to_itp, dtype=img.dtype, device=img.device) + ITP = torch.empty_like(img) + for cc in range(3): + ITP[...,cc,:,:,:] = torch.sum(img*(mat[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return ITP + + def itp_to_sitp(self, img, ppd): + S_ITP = torch.empty_like(img) + # Filters are low-dimensional; construct using np + [k1, k2, k3] = [torch.as_tensor(filter).to(img) for filter in self.sitp.separableFilters(ppd)] + S_ITP[...,0:,:,:] = self.sitp.separableConv_torch(torch.squeeze(img[...,0,:,:,:]), k1, torch.abs(k1)) + S_ITP[...,1:,:,:] = self.sitp.separableConv_torch(torch.squeeze(img[...,1,:,:,:]), k2, torch.abs(k2)) + S_ITP[...,2:,:,:] = self.sitp.separableConv_torch(torch.squeeze(img[...,2,:,:,:]), k3, torch.abs(k3)) + return S_ITP + + def delta_itp(self, img1, img2): + return 720 * torch.sqrt((img1[...,0,:,:,:] - img2[...,0,:,:,:])**2 + + 0.5 * (img1[...,1,:,:,:] - img2[...,1,:,:,:])**2 + + (img1[...,2,:,:,:] - img2[...,2,:,:,:])**2).mean() + + def eitp_fn(self, img1, img2): + mse = torch.mean(torch.sum( (img1 - img2)**2 )) + return torch.sqrt(mse) + + def short_name(self): + return "E-SITP" + + def quality_unit(self): + return "RMSE" + + def get_info_string(self): + return None + + def set_display_model(self, 
display_photometry, display_geometry): + self.ppd = display_geometry.get_ppd() diff --git a/pycvvdp/flip.py b/pycvvdp/flip.py new file mode 100644 index 0000000..97ed6b3 --- /dev/null +++ b/pycvvdp/flip.py @@ -0,0 +1,681 @@ +import numpy as np +import torch +import torch.nn as nn + +from pycvvdp.vq_metric import vq_metric + +""" +HDR-FLIP +""" +class flip(vq_metric): + def __init__(self, device=None, block_size=32): + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.ppd = None + self.flip = HDRFLIPLoss(device=device) + self.colorspace = 'RGB2020' + self.block_size = block_size + + def predict_video_source(self, vid_source, frame_padding="replicate"): + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + quality = 0.0 + for ff in range(N_frames//self.block_size + 1): + # TODO: Batched processing + T = [] + R = [] + for bb in range(self.block_size): + if ff*self.block_size + bb == N_frames: + break + T.append(vid_source.get_test_frame(ff*self.block_size + bb, device=self.device, colorspace=self.colorspace)) + R.append(vid_source.get_reference_frame(ff*self.block_size + bb, device=self.device, colorspace=self.colorspace)) + + if len(R) != 0: + T = torch.cat(T, dim=0) + R = torch.cat(R, dim=0) + + # Input required in shape (B,C,H,W) + quality += self.flip(T.squeeze(2), R.squeeze(2), self.ppd) / N_frames * T.shape[0] + return quality, None + + def set_display_model(self, display_photometry, display_geometry): + self.ppd = display_geometry.get_ppd() + + def short_name(self): + return 'HDR-FLIP' + + +""" +FLIP metric +Reference: https://github.com/NVlabs/flip/tree/main/python +""" +class HDRFLIPLoss(nn.Module): + """ Class for computing HDR-FLIP """ + + def __init__(self, device): + """ Init """ + super().__init__() + self.qc = 0.7 + self.qf = 0.5 + self.pc = 0.4 + self.pt = 0.95 + self.tmax = 0.85 + self.tmin = 0.85 + self.eps = 1e-15 + self.device = device + + def forward(self, test, reference, pixels_per_degree, tone_mapper="aces", start_exposure=None, stop_exposure=None): + """ + Computes the HDR-FLIP error map between two HDR images, + assuming the images are observed at a certain number of + pixels per degree of visual angle + :param test: test tensor (with NxCxHxW layout with nonnegative values) + :param reference: reference tensor (with NxCxHxW layout with nonnegative values) + :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer, + default corresponds to viewing the images on a 0.7 meters wide 4K monitor at 0.7 meters from the display + :param tone_mapper: (optional) string describing what tone mapper HDR-FLIP should assume + :param start_exposure: (optional tensor (with Nx1x1x1 layout) with start exposures corresponding to each HDR reference/test pair + :param stop_exposure: (optional) tensor (with Nx1x1x1 layout) with stop exposures corresponding to each HDR reference/test pair + :return: float containing the mean FLIP error (in the range [0,1]) between the HDR reference and test images in the batch + """ + if pixels_per_degree is None: + pixels_per_degree = (0.7 * 3840 / 0.7) * np.pi / 
180 + + # HDR-FLIP expects nonnegative and non-NaN values in the input + reference = torch.clamp(reference, 0, 65536.0) + test = torch.clamp(test, 0, 65536.0) + + # Compute start and stop exposures, if they are not given + if start_exposure is None or stop_exposure is None: + c_start, c_stop = compute_start_stop_exposures(reference, tone_mapper, self.tmax, self.tmin) + if start_exposure is None: + start_exposure = c_start + if stop_exposure is None: + stop_exposure = c_stop + + # Compute number of exposures + num_exposures = torch.max(torch.tensor([2.0], requires_grad=False).to(self.device), torch.ceil(stop_exposure - start_exposure)) + most_exposures = int(torch.amax(num_exposures, dim=0).item()) + + # Compute exposure step size + step_size = (stop_exposure - start_exposure) / torch.max(num_exposures - 1, torch.tensor([1.0], requires_grad=False).to(self.device)) + + # Set the depth of the error tensor to the number of exposures given by the largest exposure range any reference image yielded. + # This allows us to do one loop for each image in our batch, while not affecting the HDR-FLIP error, as we fill up the error tensor with 0s. + # Note that the step size still depends on num_exposures and is therefore independent of most_exposures + dim = reference.size() + all_errors = torch.zeros(size=(dim[0], most_exposures, dim[2], dim[3])).to(self.device) + + # Loop over exposures and compute LDR-FLIP for each pair of LDR reference and test + for i in range(0, most_exposures): + exposure = start_exposure + i * step_size + + reference_tone_mapped = tone_map(reference, tone_mapper, exposure) + test_tone_mapped = tone_map(test, tone_mapper, exposure) + + reference_opponent = color_space_transform(reference_tone_mapped, 'linrgb2ycxcz') + test_opponent = color_space_transform(test_tone_mapped, 'linrgb2ycxcz') + + all_errors[:, i, :, :] = compute_ldrflip( + test_opponent, reference_opponent, pixels_per_degree, + self.qc, self.qf, self.pc, self.pt, self.eps + ).squeeze(1) + + # Take per-pixel maximum over all LDR-FLIP errors to get HDR-FLIP + hdrflip_error = torch.amax(all_errors, dim=1, keepdim=True) + return torch.mean(hdrflip_error) + + +def tone_map(img, tone_mapper, exposure): + """ + Applies exposure compensation and tone mapping. + Refer to the Visualizing Errors in Rendered High Dynamic Range Images + paper for details about the formulas. 
+ :param img: float tensor (with NxCxHxW layout) containing nonnegative values + :param tone_mapper: string describing the tone mapper to apply + :param exposure: float tensor (with Nx1x1x1 layout) describing the exposure compensation factor + """ + # Exposure compensation + x = (2 ** exposure) * img + + # Set tone mapping coefficients depending on tone_mapper + if tone_mapper == "reinhard": + lum_coeff_r = 0.2126 + lum_coeff_g = 0.7152 + lum_coeff_b = 0.0722 + + Y = x[:, 0:1, :, :] * lum_coeff_r + x[:, 1:2, :, :] * lum_coeff_g + x[:, 2:3, :, :] * lum_coeff_b + return torch.clamp(torch.div(x, 1 + Y), 0.0, 1.0) + + if tone_mapper == "hable": + # Source: https://64.github.io/tonemapping/ + A = 0.15 + B = 0.50 + C = 0.10 + D = 0.20 + E = 0.02 + F = 0.30 + k0 = A * F - A * E + k1 = C * B * F - B * E + k2 = 0 + k3 = A * F + k4 = B * F + k5 = D * F * F + + W = 11.2 + nom = k0 * torch.pow(W, torch.tensor([2.0]).to(img.device)) + k1 * W + k2 + denom = k3 * torch.pow(W, torch.tensor([2.0]).to(img.device)) + k4 * W + k5 + white_scale = torch.div(denom, nom) # = 1 / (nom / denom) + + # Include white scale and exposure bias in rational polynomial coefficients + k0 = 4 * k0 * white_scale + k1 = 2 * k1 * white_scale + k2 = k2 * white_scale + k3 = 4 * k3 + k4 = 2 * k4 + # k5 = k5 # k5 is not changed + else: + # Source: ACES approximation: https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/ + # Include pre-exposure cancelation in constants + k0 = 0.6 * 0.6 * 2.51 + k1 = 0.6 * 0.03 + k2 = 0 + k3 = 0.6 * 0.6 * 2.43 + k4 = 0.6 * 0.59 + k5 = 0.14 + + x2 = torch.pow(x, 2) + nom = k0 * x2 + k1 * x + k2 + denom = k3 * x2 + k4 * x + k5 + denom = torch.where(torch.isinf(denom), torch.Tensor([1.0]).to(img.device), denom) # if denom is inf, then so is nom => nan. Pixel is very bright. It becomes inf here, but 1 after clamp below + y = torch.div(nom, denom) + return torch.clamp(y, 0.0, 1.0) + + +def compute_start_stop_exposures(reference, tone_mapper, tmax, tmin): + """ + Computes start and stop exposure for HDR-FLIP based on given tone mapper and reference image. 
+ Refer to the Visualizing Errors in Rendered High Dynamic Range Images + paper for details about the formulas + :param reference: float tensor (with NxCxHxW layout) containing reference images (nonnegative values) + :param tone_mapper: string describing which tone mapper should be assumed + :param tmax: float describing the t value used to find the start exposure + :param tmin: float describing the t value used to find the stop exposure + :return: two float tensors (with Nx1x1x1 layout) containing start and stop exposures, respectively, to use for HDR-FLIP + """ + if tone_mapper == "reinhard": + k0 = 0 + k1 = 1 + k2 = 0 + k3 = 0 + k4 = 1 + k5 = 1 + + x_max = tmax * k5 / (k1 - tmax * k4) + x_min = tmin * k5 / (k1 - tmin * k4) + elif tone_mapper == "hable": + # Source: https://64.github.io/tonemapping/ + A = 0.15 + B = 0.50 + C = 0.10 + D = 0.20 + E = 0.02 + F = 0.30 + k0 = A * F - A * E + k1 = C * B * F - B * E + k2 = 0 + k3 = A * F + k4 = B * F + k5 = D * F * F + + W = 11.2 + nom = k0 * torch.pow(W, torch.tensor([2.0]).to(reference.device)) + k1 * W + k2 + denom = k3 * torch.pow(W, torch.tensor([2.0]).to(reference.device)) + k4 * W + k5 + white_scale = torch.div(denom, nom) # = 1 / (nom / denom) + + # Include white scale and exposure bias in rational polynomial coefficients + k0 = 4 * k0 * white_scale + k1 = 2 * k1 * white_scale + k2 = k2 * white_scale + k3 = 4 * k3 + k4 = 2 * k4 + # k5 = k5 # k5 is not changed + + c0 = (k1 - k4 * tmax) / (k0 - k3 * tmax) + c1 = (k2 - k5 * tmax) / (k0 - k3 * tmax) + x_max = - 0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).to(reference.device) * c0) ** 2) - c1) + + c0 = (k1 - k4 * tmin) / (k0 - k3 * tmin) + c1 = (k2 - k5 * tmin) / (k0 - k3 * tmin) + x_min = - 0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).to(reference.device) * c0) ** 2) - c1) + else: + # Source: ACES approximation: https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/ + # Include pre-exposure cancelation in constants + k0 = 0.6 * 0.6 * 2.51 + k1 = 0.6 * 0.03 + k2 = 0 + k3 = 0.6 * 0.6 * 2.43 + k4 = 0.6 * 0.59 + k5 = 0.14 + + c0 = (k1 - k4 * tmax) / (k0 - k3 * tmax) + c1 = (k2 - k5 * tmax) / (k0 - k3 * tmax) + x_max = - 0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).to(reference.device) * c0) ** 2) - c1) + + c0 = (k1 - k4 * tmin) / (k0 - k3 * tmin) + c1 = (k2 - k5 * tmin) / (k0 - k3 * tmin) + x_min = - 0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).to(reference.device) * c0) ** 2) - c1) + + # Convert reference to luminance + lum_coeff_r = 0.2126 + lum_coeff_g = 0.7152 + lum_coeff_b = 0.0722 + Y_reference = reference[:, 0:1, :, :] * lum_coeff_r + reference[:, 1:2, :, :] * lum_coeff_g + reference[:, 2:3, :, :] * lum_coeff_b + + # Compute start exposure + Y_hi = torch.amax(Y_reference, dim=(2, 3), keepdim=True) + start_exposure = torch.log2(x_max / Y_hi) + + # Compute stop exposure + dim = Y_reference.size() + Y_ref = Y_reference.view(dim[0], dim[1], dim[2]*dim[3]) + Y_lo = torch.median(Y_ref, dim=2).values.unsqueeze(2).unsqueeze(3) + stop_exposure = torch.log2(x_min / Y_lo) + + return start_exposure, stop_exposure + + +def compute_ldrflip(test, reference, pixels_per_degree, qc, qf, pc, pt, eps): + """ + Computes the LDR-FLIP error map between two LDR images, + assuming the images are observed at a certain number of + pixels per degree of visual angle + :param reference: reference tensor (with NxCxHxW layout with values in the YCxCz color space) + :param test: test tensor (with NxCxHxW layout with values in the YCxCz color space) + :param pixels_per_degree: float describing the 
number of pixels per degree of visual angle of the observer, + default corresponds to viewing the images on a 0.7 meters wide 4K monitor at 0.7 meters from the display + :param qc: float describing the q_c exponent in the LDR-FLIP color pipeline (see FLIP paper for details) + :param qf: float describing the q_f exponent in the LDR-FLIP feature pipeline (see FLIP paper for details) + :param pc: float describing the p_c exponent in the LDR-FLIP color pipeline (see FLIP paper for details) + :param pt: float describing the p_t exponent in the LDR-FLIP color pipeline (see FLIP paper for details) + :param eps: float containing a small value used to improve training stability + :return: tensor containing the per-pixel FLIP errors (with Nx1xHxW layout and values in the range [0, 1]) between LDR reference and test images + """ + # --- Color pipeline --- + # Spatial filtering + s_a, radius_a = generate_spatial_filter(pixels_per_degree, 'A', reference.device) + s_rg, radius_rg = generate_spatial_filter(pixels_per_degree, 'RG', reference.device) + s_by, radius_by = generate_spatial_filter(pixels_per_degree, 'BY', reference.device) + radius = max(radius_a, radius_rg, radius_by) + filtered_reference = spatial_filter(reference, s_a, s_rg, s_by, radius) + filtered_test = spatial_filter(test, s_a, s_rg, s_by, radius) + + # Perceptually Uniform Color Space + preprocessed_reference = hunt_adjustment(color_space_transform(filtered_reference, 'linrgb2lab')) + preprocessed_test = hunt_adjustment(color_space_transform(filtered_test, 'linrgb2lab')) + + # Color metric + deltaE_hyab = hyab(preprocessed_reference, preprocessed_test, eps) + power_deltaE_hyab = torch.pow(deltaE_hyab, qc) + hunt_adjusted_green = hunt_adjustment(color_space_transform(torch.tensor([[[0.0]], [[1.0]], [[0.0]]]).unsqueeze(0), 'linrgb2lab')) + hunt_adjusted_blue = hunt_adjustment(color_space_transform(torch.tensor([[[0.0]], [[0.0]], [[1.0]]]).unsqueeze(0), 'linrgb2lab')) + cmax = torch.pow(hyab(hunt_adjusted_green, hunt_adjusted_blue, eps), qc).item() + deltaE_c = redistribute_errors(power_deltaE_hyab, cmax, pc, pt) + + # --- Feature pipeline --- + # Extract and normalize Yy component + ref_y = (reference[:, 0:1, :, :] + 16) / 116 + test_y = (test[:, 0:1, :, :] + 16) / 116 + + # Edge and point detection + edges_reference = feature_detection(ref_y, pixels_per_degree, 'edge') + points_reference = feature_detection(ref_y, pixels_per_degree, 'point') + edges_test = feature_detection(test_y, pixels_per_degree, 'edge') + points_test = feature_detection(test_y, pixels_per_degree, 'point') + + # Feature metric + deltaE_f = torch.max( + torch.abs(torch.norm(edges_reference, dim=1, keepdim=True) - torch.norm(edges_test, dim=1, keepdim=True)), + torch.abs(torch.norm(points_test, dim=1, keepdim=True) - torch.norm(points_reference, dim=1, keepdim=True)) + ) + deltaE_f = torch.clamp(deltaE_f, min=eps) # clamp to stabilize training + deltaE_f = torch.pow(((1 / np.sqrt(2)) * deltaE_f), qf) + + # --- Final error --- + return torch.pow(deltaE_c, 1 - deltaE_f) + + +def generate_spatial_filter(pixels_per_degree, channel, device): + """ + Generates spatial contrast sensitivity filters with width depending on + the number of pixels per degree of visual angle of the observer + :param pixels_per_degree: float indicating number of pixels per degree of visual angle + :param channel: string describing what filter should be generated + :yield: Filter kernel corresponding to the spatial contrast sensitivity function of the given channel and kernel's radius + """ + a1_A 
= 1 + b1_A = 0.0047 + a2_A = 0 + b2_A = 1e-5 # avoid division by 0 + a1_rg = 1 + b1_rg = 0.0053 + a2_rg = 0 + b2_rg = 1e-5 # avoid division by 0 + a1_by = 34.1 + b1_by = 0.04 + a2_by = 13.5 + b2_by = 0.025 + if channel == "A": # Achromatic CSF + a1 = a1_A + b1 = b1_A + a2 = a2_A + b2 = b2_A + elif channel == "RG": # Red-Green CSF + a1 = a1_rg + b1 = b1_rg + a2 = a2_rg + b2 = b2_rg + elif channel == "BY": # Blue-Yellow CSF + a1 = a1_by + b1 = b1_by + a2 = a2_by + b2 = b2_by + + # Determine evaluation domain + max_scale_parameter = max([b1_A, b2_A, b1_rg, b2_rg, b1_by, b2_by]) + r = np.ceil(3 * np.sqrt(max_scale_parameter / (2 * np.pi**2)) * pixels_per_degree) + r = int(r) + deltaX = 1.0 / pixels_per_degree + x, y = np.meshgrid(range(-r, r + 1), range(-r, r + 1)) + z = (x * deltaX)**2 + (y * deltaX)**2 + + # Generate weights + g = a1 * np.sqrt(np.pi / b1) * np.exp(-np.pi**2 * z / b1) + a2 * np.sqrt(np.pi / b2) * np.exp(-np.pi**2 * z / b2) + g = g / np.sum(g) + g = torch.Tensor(g).unsqueeze(0).unsqueeze(0).to(device) + + return g, r + + +def spatial_filter(img, s_a, s_rg, s_by, radius): + """ + Filters an image with channel specific spatial contrast sensitivity functions + and clips result to the unit cube in linear RGB + :param img: image tensor to filter (with NxCxHxW layout in the YCxCz color space) + :param s_a: spatial filter matrix for the achromatic channel + :param s_rg: spatial filter matrix for the red-green channel + :param s_by: spatial filter matrix for the blue-yellow channel + :return: input image (with NxCxHxW layout) transformed to linear RGB after filtering with spatial contrast sensitivity functions + """ + dim = img.size() + # Prepare image for convolution + img_pad = torch.zeros((dim[0], dim[1], dim[2] + 2 * radius, dim[3] + 2 * radius), device=img.device) + img_pad[:, 0:1, :, :] = nn.functional.pad(img[:, 0:1, :, :], (radius, radius, radius, radius), mode='replicate') + img_pad[:, 1:2, :, :] = nn.functional.pad(img[:, 1:2, :, :], (radius, radius, radius, radius), mode='replicate') + img_pad[:, 2:3, :, :] = nn.functional.pad(img[:, 2:3, :, :], (radius, radius, radius, radius), mode='replicate') + + # Apply Gaussian filters + img_tilde_opponent = torch.zeros((dim[0], dim[1], dim[2], dim[3]), device=img.device) + img_tilde_opponent[:, 0:1, :, :] = nn.functional.conv2d(img_pad[:, 0:1, :, :], s_a.to(img.device), padding=0) + img_tilde_opponent[:, 1:2, :, :] = nn.functional.conv2d(img_pad[:, 1:2, :, :], s_rg.to(img.device), padding=0) + img_tilde_opponent[:, 2:3, :, :] = nn.functional.conv2d(img_pad[:, 2:3, :, :], s_by.to(img.device), padding=0) + + # Transform to linear RGB for clamp + img_tilde_linear_rgb = color_space_transform(img_tilde_opponent, 'ycxcz2linrgb') + + # Clamp to RGB box + return torch.clamp(img_tilde_linear_rgb, 0.0, 1.0) + + +def hunt_adjustment(img): + """ + Applies Hunt-adjustment to an image + :param img: image tensor to adjust (with NxCxHxW layout in the L*a*b* color space) + :return: Hunt-adjusted image tensor (with NxCxHxW layout in the Hunt-adjusted L*A*B* color space) + """ + # Extract luminance component + L = img[:, 0:1, :, :] + + # Apply Hunt adjustment + img_h = torch.zeros(img.size(), device=img.device) + img_h[:, 0:1, :, :] = L + img_h[:, 1:2, :, :] = torch.mul((0.01 * L), img[:, 1:2, :, :]) + img_h[:, 2:3, :, :] = torch.mul((0.01 * L), img[:, 2:3, :, :]) + + return img_h + + +def hyab(reference, test, eps): + """ + Computes the HyAB distance between reference and test images + :param reference: reference image tensor (with NxCxHxW layout in 
the standard or Hunt-adjusted L*A*B* color space) + :param test: test image tensor (with NxCxHxW layout in the standard or Hunt-adjusted L*a*b* color space) + :param eps: float containing a small value used to improve training stability + :return: image tensor (with Nx1xHxW layout) containing the per-pixel HyAB distances between reference and test images + """ + delta = reference - test + root = torch.sqrt(torch.clamp(torch.pow(delta[:, 0:1, :, :], 2), min=eps)) + delta_norm = torch.norm(delta[:, 1:3, :, :], dim=1, keepdim=True) + return root + delta_norm # alternative abs to stabilize training + + +def redistribute_errors(power_deltaE_hyab, cmax, pc, pt): + """ + Redistributes exponentiated HyAB errors to the [0,1] range + :param power_deltaE_hyab: float tensor (with Nx1xHxW layout) containing the exponentiated HyAb distance + :param cmax: float containing the exponentiated, maximum HyAB difference between two colors in Hunt-adjusted L*A*B* space + :param pc: float containing the cmax multiplier p_c (see FLIP paper) + :param pt: float containing the target value, p_t, for p_c * cmax (see FLIP paper) + :return: image tensor (with Nx1xHxW layout) containing redistributed per-pixel HyAB distances (in range [0,1]) + """ + # Re-map error to 0-1 range. Values between 0 and + # pccmax are mapped to the range [0, pt], + # while the rest are mapped to the range (pt, 1] + deltaE_c = torch.zeros(power_deltaE_hyab.size(), device=power_deltaE_hyab.device) + pccmax = pc * cmax + deltaE_c = torch.where(power_deltaE_hyab < pccmax, (pt / pccmax) * power_deltaE_hyab, pt + ((power_deltaE_hyab - pccmax) / (cmax - pccmax)) * (1.0 - pt)) + + return deltaE_c + + +def feature_detection(img_y, pixels_per_degree, feature_type): + """ + Detects edges and points (features) in the achromatic image + :param imgy: achromatic image tensor (with Nx1xHxW layout, containing normalized Y-values from YCxCz) + :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer + :param feature_type: string indicating the type of feature to detect + :return: image tensor (with Nx2xHxW layout, with values in range [0,1]) containing large values where features were detected + """ + # Set peak to trough value (2x standard deviations) of human edge + # detection filter + w = 0.082 + + # Compute filter radius + sd = 0.5 * w * pixels_per_degree + radius = int(np.ceil(3 * sd)) + + # Compute 2D Gaussian + [x, y] = np.meshgrid(range(-radius, radius+1), range(-radius, radius+1)) + g = np.exp(-(x ** 2 + y ** 2) / (2 * sd * sd)) + + if feature_type == 'edge': # Edge detector + # Compute partial derivative in x-direction + Gx = np.multiply(-x, g) + else: # Point detector + # Compute second partial derivative in x-direction + Gx = np.multiply(x ** 2 / (sd * sd) - 1, g) + + # Normalize positive weights to sum to 1 and negative weights to sum to -1 + negative_weights_sum = -np.sum(Gx[Gx < 0]) + positive_weights_sum = np.sum(Gx[Gx > 0]) + Gx = torch.Tensor(Gx) + Gx = torch.where(Gx < 0, Gx / negative_weights_sum, Gx / positive_weights_sum) + Gx = Gx.unsqueeze(0).unsqueeze(0).to(img_y.device) + + # Detect features + featuresX = nn.functional.conv2d(nn.functional.pad(img_y, (radius, radius, radius, radius), mode='replicate'), Gx, padding=0) + featuresY = nn.functional.conv2d(nn.functional.pad(img_y, (radius, radius, radius, radius), mode='replicate'), torch.transpose(Gx, 2, 3), padding=0) + return torch.cat((featuresX, featuresY), dim=1) + + +def color_space_transform(input_color, fromSpace2toSpace): + """ 
+ Transforms inputs between different color spaces + :param input_color: tensor of colors to transform (with NxCxHxW layout) + :param fromSpace2toSpace: string describing transform + :return: transformed tensor (with NxCxHxW layout) + """ + dim = input_color.size() + + # Assume D65 standard illuminant + reference_illuminant = torch.tensor([[[0.950428545]], [[1.000000000]], [[1.088900371]]]).to(input_color.device) + inv_reference_illuminant = torch.tensor([[[1.052156925]], [[1.000000000]], [[0.918357670]]]).to(input_color.device) + + if fromSpace2toSpace == "srgb2linrgb": + limit = 0.04045 + transformed_color = torch.where( + input_color > limit, + torch.pow((torch.clamp(input_color, min=limit) + 0.055) / 1.055, 2.4), + input_color / 12.92 + ) # clamp to stabilize training + + elif fromSpace2toSpace == "linrgb2srgb": + limit = 0.0031308 + transformed_color = torch.where( + input_color > limit, + 1.055 * torch.pow(torch.clamp(input_color, min=limit), (1.0 / 2.4)) - 0.055, + 12.92 * input_color + ) + + elif fromSpace2toSpace in ["linrgb2xyz", "xyz2linrgb"]: + # Source: https://www.image-engineering.de/library/technotes/958-how-to-convert-between-srgb-and-ciexyz + # Assumes D65 standard illuminant + if fromSpace2toSpace == "linrgb2xyz": + a11 = 10135552 / 24577794 + a12 = 8788810 / 24577794 + a13 = 4435075 / 24577794 + a21 = 2613072 / 12288897 + a22 = 8788810 / 12288897 + a23 = 887015 / 12288897 + a31 = 1425312 / 73733382 + a32 = 8788810 / 73733382 + a33 = 70074185 / 73733382 + else: + # Constants found by taking the inverse of the matrix + # defined by the constants for linrgb2xyz + a11 = 3.241003275 + a12 = -1.537398934 + a13 = -0.498615861 + a21 = -0.969224334 + a22 = 1.875930071 + a23 = 0.041554224 + a31 = 0.055639423 + a32 = -0.204011202 + a33 = 1.057148933 + A = torch.Tensor([[a11, a12, a13], + [a21, a22, a23], + [a31, a32, a33]]) + + input_color = input_color.view(dim[0], dim[1], dim[2]*dim[3]).to(input_color.device) # NC(HW) + + transformed_color = torch.matmul(A.to(input_color.device), input_color) + transformed_color = transformed_color.view(dim[0], dim[1], dim[2], dim[3]) + + elif fromSpace2toSpace == "xyz2ycxcz": + input_color = torch.mul(input_color, inv_reference_illuminant) + y = 116 * input_color[:, 1:2, :, :] - 16 + cx = 500 * (input_color[:, 0:1, :, :] - input_color[:, 1:2, :, :]) + cz = 200 * (input_color[:, 1:2, :, :] - input_color[:, 2:3, :, :]) + transformed_color = torch.cat((y, cx, cz), 1) + + elif fromSpace2toSpace == "ycxcz2xyz": + y = (input_color[:, 0:1, :, :] + 16) / 116 + cx = input_color[:, 1:2, :, :] / 500 + cz = input_color[:, 2:3, :, :] / 200 + + x = y + cx + z = y - cz + transformed_color = torch.cat((x, y, z), 1) + + transformed_color = torch.mul(transformed_color, reference_illuminant) + + elif fromSpace2toSpace == "xyz2lab": + input_color = torch.mul(input_color, inv_reference_illuminant) + delta = 6 / 29 + delta_square = delta * delta + delta_cube = delta * delta_square + factor = 1 / (3 * delta_square) + + clamped_term = torch.pow(torch.clamp(input_color, min=delta_cube), 1.0 / 3.0).to(dtype=input_color.dtype) + div = (factor * input_color + (4 / 29)).to(dtype=input_color.dtype) + input_color = torch.where(input_color > delta_cube, clamped_term, div) # clamp to stabilize training + + L = 116 * input_color[:, 1:2, :, :] - 16 + a = 500 * (input_color[:, 0:1, :, :] - input_color[:, 1:2, :, :]) + b = 200 * (input_color[:, 1:2, :, :] - input_color[:, 2:3, :, :]) + + transformed_color = torch.cat((L, a, b), 1) + + elif fromSpace2toSpace == "lab2xyz": + y = 
(input_color[:, 0:1, :, :] + 16) / 116
+        a = input_color[:, 1:2, :, :] / 500
+        b = input_color[:, 2:3, :, :] / 200
+
+        x = y + a
+        z = y - b
+
+        xyz = torch.cat((x, y, z), 1)
+        delta = 6 / 29
+        delta_square = delta * delta
+        factor = 3 * delta_square
+        xyz = torch.where(xyz > delta, torch.pow(xyz, 3), factor * (xyz - 4 / 29))
+
+        transformed_color = torch.mul(xyz, reference_illuminant)
+
+    elif fromSpace2toSpace == "srgb2xyz":
+        transformed_color = color_space_transform(input_color, 'srgb2linrgb')
+        transformed_color = color_space_transform(transformed_color, 'linrgb2xyz')
+    elif fromSpace2toSpace == "srgb2ycxcz":
+        transformed_color = color_space_transform(input_color, 'srgb2linrgb')
+        transformed_color = color_space_transform(transformed_color, 'linrgb2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2ycxcz')
+    elif fromSpace2toSpace == "linrgb2ycxcz":
+        transformed_color = color_space_transform(input_color, 'linrgb2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2ycxcz')
+    elif fromSpace2toSpace == "srgb2lab":
+        transformed_color = color_space_transform(input_color, 'srgb2linrgb')
+        transformed_color = color_space_transform(transformed_color, 'linrgb2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2lab')
+    elif fromSpace2toSpace == "linrgb2lab":
+        transformed_color = color_space_transform(input_color, 'linrgb2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2lab')
+    elif fromSpace2toSpace == "ycxcz2linrgb":
+        transformed_color = color_space_transform(input_color, 'ycxcz2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2linrgb')
+    elif fromSpace2toSpace == "lab2srgb":
+        transformed_color = color_space_transform(input_color, 'lab2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2linrgb')
+        transformed_color = color_space_transform(transformed_color, 'linrgb2srgb')
+    elif fromSpace2toSpace == "ycxcz2lab":
+        transformed_color = color_space_transform(input_color, 'ycxcz2xyz')
+        transformed_color = color_space_transform(transformed_color, 'xyz2lab')
+    else:
+        raise NotImplementedError('Error: The color transform %s is not defined!' % fromSpace2toSpace)
+
+    return transformed_color
diff --git a/pycvvdp/hyab.py b/pycvvdp/hyab.py
new file mode 100644
index 0000000..1617a9b
--- /dev/null
+++ b/pycvvdp/hyab.py
@@ -0,0 +1,89 @@
+import torch
+import numpy as np
+
+from pycvvdp.vq_metric import vq_metric
+
+"""
+HyAB metric. Usage is same as the FovVideoVDP metric (see pytorch_examples).
+Abasi, S., Amani Tehran, M., & Fairchild, M. D. (2020). Distance metrics for very large color differences.
+Color Research & Application, 45(2), 208–223. https://doi.org/10.1002/col.22451
+"""
+class hyab(vq_metric):
+    def __init__(self, device=None,display_name=None,display_geometry=None,display_photometry=None):
+        # Use GPU if available
+        if device is None:
+            if torch.cuda.is_available() and torch.cuda.device_count()>0:
+                self.device = torch.device('cuda:0')
+            else:
+                self.device = torch.device('cpu')
+        else:
+            self.device = device
+
+        # D65 White point
+        self.w = (0.9505, 1.0000, 1.0888)
+        self.colorspace = 'XYZ'
+
+    '''
+    The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays.
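+    Returns the per-pixel HyAB distance averaged over all pixels and frames, and None in place of per-frame statistics.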
+ ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + ehyab = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # XYZ to Lab + w = self.max_L*self.w + T_lab = self.xyz_to_lab(T, w) + R_lab = self.xyz_to_lab(R, w) + + # Meancdm of Per-pixel DE2000 + ehyab = ehyab + self.hyab_fn(T_lab, R_lab) / N_frames + return ehyab, None + + def xyz_to_lab(self, img, W): + Lab = torch.empty_like(img) + Lab[...,0:,:,:] = 116*self.lab_fn(img[...,1,:,:,:]/W[1])-16 + Lab[...,1:,:,:] = 500*(self.lab_fn(img[...,0,:,:,:]/W[0]) - self.lab_fn(img[...,1,:,:,:]/W[1])) + Lab[...,2:,:,:] = 200*(self.lab_fn(img[...,1,:,:,:]/W[1]) - self.lab_fn(img[...,2,:,:,:]/W[2])) + return Lab + + def lab_fn(self, x): + y = torch.empty_like(x) + sigma = (6/29) + y_1 = x**(1/3) + y_2 = (x/(3*(sigma**2)))+(4/29) + condition = torch.less(x, sigma**3) + y = torch.where(condition, y_2, y_1) + return y + + def hyab_fn(self, img1, img2): + del_L = torch.absolute(img1[...,0,:,:,:]-img2[...,0,:,:,:]) + del_ab = torch.sqrt((img1[...,1,:,:,:]-img2[...,1,:,:,:])**2 + (img1[...,2,:,:,:]-img2[...,2,:,:,:])**2) + hyab_mean = torch.mean(del_L+del_ab) + return hyab_mean + + def short_name(self): + return "HyAB" + + def quality_unit(self): + return "Hybrid Delta E Lab" + + def get_info_string(self): + return None + + def set_display_model(self, display_photometry, display_geometry): + self.max_L = display_photometry.get_peak_luminance() + self.max_L = np.where( self.max_L < 300, self.max_L, 300) diff --git a/pycvvdp/pu_lpips.py b/pycvvdp/pu_lpips.py new file mode 100644 index 0000000..f4fa260 --- /dev/null +++ b/pycvvdp/pu_lpips.py @@ -0,0 +1,59 @@ +import torch + +from pycvvdp.utils import PU +from pycvvdp.vq_metric import vq_metric + +""" +PU21 + LPIPS +First install LPIPS with "pip install lpips" +""" +class pu_lpips(vq_metric): + def __init__(self, device=None, net='vgg'): + from lpips import LPIPS + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + self.pu = PU() + # Default versions uses VGG net + self.lpips = LPIPS(net=net) + self.lpips.to(self.device) + self.colorspace = 'display_encoded_100nit' + + def predict_video_source(self, vid_source, frame_padding="replicate"): + # T_vid and R_vid are the tensors of the size (1,3,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + + quality = 0.0 + for ff in range(N_frames): + # TODO: Batched processing + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # Apply PU and reshape to (1,C,H,W) + # Input pixels shoulb lie in [-1,1] + T_enc = self.pu.encode(T.clip(0, self.max_L)).squeeze(2) / 
255. * 2 - 1 + R_enc = self.pu.encode(R.clip(0, self.max_L)).squeeze(2) / 255. * 2 - 1 + + quality += self.lpips(T_enc, R_enc) / N_frames + return quality, None + + def short_name(self): + return 'PU21-LPIPS' + + def set_display_model(self, display_photometry, display_geometry): + self.max_L = display_photometry.get_peak_luminance() + self.max_L = min(self.max_L, 300) diff --git a/pycvvdp/pu_vmaf.py b/pycvvdp/pu_vmaf.py new file mode 100644 index 0000000..1ee6b55 --- /dev/null +++ b/pycvvdp/pu_vmaf.py @@ -0,0 +1,140 @@ +import json +import os, os.path as osp +from time import time +from tqdm import trange +import torch + +from pycvvdp.utils import PU +from pycvvdp.video_source import * +from pycvvdp.vq_metric import * + +""" +PU21-VMAF metric. Usage is same as the FovVideoVDP metric (see pytorch_examples). +Required: ffmpeg compiled with libvmaf (https://github.com/Netflix/vmaf/blob/master/resource/doc/ffmpeg.md) +""" +class pu_vmaf(vq_metric): + def __init__(self, ffmpeg_bin=None, cache_ref_loc='.', device=None): + if ffmpeg_bin is None: + # Empty constructor to retrieve name + return + + # Use GPU if available + if device is None: + if torch.cuda.is_available() and torch.cuda.device_count()>0: + self.device = torch.device('cuda:0') + else: + self.device = torch.device('cpu') + else: + self.device = device + + if not osp.isdir(cache_ref_loc): + os.makedirs(cache_ref_loc) + + self.T_enc_path = osp.join(cache_ref_loc, 'temp_test.yuv') + self.R_enc_path = osp.join(cache_ref_loc, 'temp_ref.yuv') + self.output_file = osp.join(cache_ref_loc, 'vmaf_output.json') + + self.ffmpeg_bin = ffmpeg_bin + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. + ''' + def predict_video_source(self, vid_source, frame_padding="replicate", record_time=False): + + # T_vid and R_vid are the tensors of the size (1,1,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + h, w, N_frames = vid_source.get_video_size() + + if osp.isfile(self.T_enc_path): os.remove(self.T_enc_path) + if osp.isfile(self.R_enc_path): os.remove(self.R_enc_path) + if osp.isfile(self.output_file): os.remove(self.output_file) + + self.T_enc_file = open(self.T_enc_path,'w') + self.R_enc_file = open(self.R_enc_path,'w') + + for ff in trange(N_frames, leave=False): + T = vid_source.get_test_frame(ff, device=self.device, colorspace='display_encoded_01').squeeze().permute(1,2,0).cpu().numpy() + R = vid_source.get_reference_frame(ff, device=self.device, colorspace='display_encoded_01').squeeze().permute(1,2,0).cpu().numpy() + + # Save the output as yuv file + self.write_yuv_frame(T, bit_depth=10, type='T') + self.write_yuv_frame(R, bit_depth=10, type='R') + + self.T_enc_file.close() + self.R_enc_file.close() + + pix_fmt = 'yuv444p10le' + ffmpeg_cmd = f'{self.ffmpeg_bin} -hide_banner -loglevel error ' \ + f'-s {w}x{h} -pix_fmt {pix_fmt} -i {self.T_enc_path} ' \ + f'-s {w}x{h} -pix_fmt {pix_fmt} -i {self.R_enc_path} ' \ + f'-lavfi libvmaf=\"log_fmt=json:log_path={self.output_file}:n_threads=4\" -f null -' + + if record_time: + start = time() + os.system(ffmpeg_cmd) + time_taken = time() - start + else: + os.system(ffmpeg_cmd) + with open(self.output_file) as f: + results = json.load(f) + quality = results['pooled_metrics']['vmaf']['mean'] + + os.remove(self.T_enc_path) + os.remove(self.R_enc_path) + os.remove(self.output_file) + 
return time_taken if record_time else (torch.tensor(quality), None) + + def short_name(self): + return 'PU21-VMAF' + + # This function takes into input an encoded RGB709 frame and saves it as a yuv file (it operates only on numpy arrays) + def write_yuv_frame( self, RGB ,bit_depth=10,type = 'T'): + _rgb2ycbcr_rec709 = np.array([[0.2126 , 0.7152 , 0.0722],\ + [-0.114572 , -0.385428 , 0.5],\ + [0.5 , -0.454153 , -0.045847]], dtype=np.float32) + + YUV = (np.reshape( RGB, (-1, 3), order='F' ) @ _rgb2ycbcr_rec709.transpose()).reshape( (RGB.shape), order='F' ) + + YUV_fixed = self.float2fixed( YUV, bit_depth ) + + Y = YUV_fixed[:,:,0] + u = YUV_fixed[:,:,1] + v = YUV_fixed[:,:,2] + + if type == 'T': + Y.tofile(self.T_enc_file) + u.tofile(self.T_enc_file) + v.tofile(self.T_enc_file) + elif type == 'R': + Y.tofile(self.R_enc_file) + u.tofile(self.R_enc_file) + v.tofile(self.R_enc_file) + + # For now this code operates only on array vectors (Because there is no available torch.uint16) + def float2fixed(self,YCbCr,nbit): + + offset = (2**(nbit-8))*16 + weight = (2**(nbit-8))*219 + max_lum = (2**nbit)-1 + + if nbit<=8: + dtype = np.uint8 + else: + dtype = np.uint16 + + Y = np.round(weight*YCbCr[:,:,0]+offset).clip(0,max_lum).astype(dtype) + + offset = (2**(nbit-8)) * 128 + weight = (2**(nbit-8)) * 224 + + U = np.round(weight*YCbCr[:,:,1]+offset).clip(0,max_lum).astype(dtype) + V = np.round(weight*YCbCr[:,:,2]+offset).clip(0,max_lum).astype(dtype) + + return np.concatenate( (Y[:,:,np.newaxis], U[:,:,np.newaxis], V[:,:,np.newaxis]), axis=2 ) \ No newline at end of file diff --git a/pycvvdp/pupsnr.py b/pycvvdp/pupsnr.py index 7155b37..923ba1a 100644 --- a/pycvvdp/pupsnr.py +++ b/pycvvdp/pupsnr.py @@ -97,11 +97,11 @@ def predict_video_source(self, vid_source, frame_padding="replicate"): psnr = 20*torch.log10( self.max_I/torch.sqrt(mse/N_frames) ) return psnr, None - + def psnr_fn(self, img1, img2): mse = torch.mean( (img1 - img2)**2 ) return 20*torch.log10( self.pu.peak/torch.sqrt(mse) ) - + def short_name(self): return "PU21-PSNR-Y" @@ -116,4 +116,52 @@ def __init__(self, display_name="standard_4k", display_photometry=None, color_sp def short_name(self): return "PU21-PSNR-RGB2020" + +class pu_psnr_mse_y(pu_psnr_y): + def __init__(self, device=None): + super().__init__(device) + + ''' + The same as `predict` but takes as input fvvdp_video_source_* object instead of Numpy/Pytorch arrays. 
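+    The PU21-encoded mean squared error is pooled over all frames before being converted to PSNR.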
+ ''' + def predict_video_source(self, vid_source, frame_padding="replicate"): + + # T_vid and R_vid are the tensors of the size (1,1,N,H,W) + # where: + # N - the number of frames + # H - height in pixels + # W - width in pixels + # Both images must contain linear absolute luminance values in cd/m^2 + # + # We assume the pytorch default NCDHW layout + + _, _, N_frames = vid_source.get_video_size() + mse = 0.0 + for ff in range(N_frames): + T = vid_source.get_test_frame(ff, device=self.device, colorspace=self.colorspace) + R = vid_source.get_reference_frame(ff, device=self.device, colorspace=self.colorspace) + + # Apply PU + T_enc = self.pu.encode(T) + R_enc = self.pu.encode(R) + + mse = mse + self.mse_fn(T_enc, R_enc) / N_frames + + psnr = 20*torch.log10( self.pu.peak/torch.sqrt(mse) ) + + return psnr, None + + def mse_fn(self, img1, img2): + return torch.mean( (img1 - img2)**2 ) + + def short_name(self): + return "PU21-PSNR-MSE-Y" + +class pu_psnr_mse_rgb2020(pu_psnr_mse_y): + def __init__(self, device=None): + super().__init__(device) + self.colorspace = 'RGB2020' + + def short_name(self): + return "PU21-PSNR-MSE-RGB2020" diff --git a/pycvvdp/utils.py b/pycvvdp/utils.py index f2f4840..30fd66a 100644 --- a/pycvvdp/utils.py +++ b/pycvvdp/utils.py @@ -3,7 +3,10 @@ import numpy as np import json import torch.nn.functional as Func -#from PIL import Image +import torchvision.transforms.functional as gaussFilter + +from PIL import Image +from scipy.signal import convolve2d from pycvvdp.third_party.loadmat import loadmat @@ -96,6 +99,44 @@ def fovdots_load_condition(content_id, condition_id, device, data_res="full"): ncdhw_tensor = torch.unsqueeze(torch.unsqueeze(dhw_tensor, 0), 0) return ncdhw_tensor +class YUV_encoding: + def write_yuv_frame_444( self, RGB ,bit_depth=10): + _rgb2ycbcr_rec709 = np.array([[0.2126 , 0.7152 , 0.0722],\ + [-0.114572 , -0.385428 , 0.5],\ + [0.5 , -0.454153 , -0.045847]], dtype=np.float32) + + YUV = (np.reshape( RGB, (-1, 3), order='F' ) @ _rgb2ycbcr_rec709.transpose()).reshape( (RGB.shape), order='F' ) + + YUV_fixed = self.float2fixed( YUV, bit_depth ) + + Y = YUV_fixed[:,:,0] + u = YUV_fixed[:,:,1] + v = YUV_fixed[:,:,2] + + return Y, u, v + + +# For now this code operates only on array vectors (Because there is no available torch.uint16) + def float2fixed(self,YCbCr,nbit): + + offset = (2**(nbit-8))*16 + weight = (2**(nbit-8))*219 + max_lum = (2**nbit)-1 + + if nbit<=8: + dtype = np.uint8 + else: + dtype = np.uint16 + + Y = np.round(weight*YCbCr[:,:,0]+offset).clip(0,max_lum).astype(dtype) + + offset = (2**(nbit-8)) * 128 + weight = (2**(nbit-8)) * 224 + + U = np.round(weight*YCbCr[:,:,1]+offset).clip(0,max_lum).astype(dtype) + V = np.round(weight*YCbCr[:,:,2]+offset).clip(0,max_lum).astype(dtype) + + return np.concatenate( (Y[:,:,np.newaxis], U[:,:,np.newaxis], V[:,:,np.newaxis]), axis=2 ) class ImGaussFilt(): def __init__(self, sigma, device): @@ -210,3 +251,539 @@ def decode(self, V): V_p = ((V/self.p[6] + self.p[5]).clip(min=0))**(1/self.p[4]) Y = ((V_p - self.p[0]).clip(min=0)/(self.p[1] - self.p[2]*V_p))**(1/self.p[3]) return Y + +class SCIELAB_filter: + # apply S-CIELAB filtering + + XYZ_to_opp_mat = ( + (0.2787,0.7218,-0.1066), + (-0.4488,0.2898,0.0772), + (0.0860,-0.5900,0.5011) ) + + def __init__(self, device): + # Since the matrices are needed for multiple calls, move it to device only once + self.XYZ_to_opp_mat = torch.as_tensor(self.XYZ_to_opp_mat, device=device) + self.opp_to_XYZ_mat = torch.inverse(self.XYZ_to_opp_mat) + self.device = device + + 
def xyz_to_opp(self, img): + OPP = torch.empty_like(img) + for cc in range(3): + OPP[...,cc,:,:,:] = torch.sum(img*(self.XYZ_to_opp_mat[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return OPP + + def opp_to_xyz(self, img): + XYZ = torch.empty_like(img) + for cc in range(3): + XYZ[...,cc,:,:,:] = torch.sum(img*(self.opp_to_XYZ_mat[cc,:].view(1,3,1,1,1)), dim=-4, keepdim=True) + return XYZ + + def gauss(self, halfWidth, width): + # Returns a 1D Gaussian vector. The gaussian sums to one. + # The halfWidth must be greater than one. + # The halfwidth specifies the width of the gaussian between the points + # where it obtains half of its maximum value. The width indicates the gaussians width in pixels. + + alpha = 2 * np.sqrt(np.log(2)) / (halfWidth - 1) + x = np.linspace(1, width, width) + x = x - np.round(width / 2) + + t = x ** 2 + g = np.exp(-alpha ** 2 * t) + g = g / np.sum(g) + return g + + def gauss_torch(self, halfWidth, width): + # Returns a 1D Gaussian vector. The gaussian sums to one. + # The halfWidth must be greater than one. + # The halfwidth specifies the width of the gaussian between the points + # where it obtains half of its maximum value. The width indicates the gaussians width in pixels. + + alpha = 2 * np.sqrt(np.log(2)) / (halfWidth - 1) + x = torch.linspace(1, width, width) + x = x - torch.round(width / 2) + + t = x ** 2 + g = torch.exp(-alpha ** 2 * t) + g = g / torch.sum(g) + return g + + def resize(self, orig, newSize, align=[0, 0], padding=0): + # result = resize(orig, newSize, align, padding) + # + # if newSize is larger than orig size, pad with padding + # if newSize is smaller than orig size, truncate to fit. + # align specifies alignment. 0=centered + # -1=left (up) aligned + # 1=right (down) aligned + # For example, align=[0 -1] centers on rows (y) and left align on columns (x). + # align=1 aligns left on columns and top on rows. + + if len(newSize) == 1: + newSize = [newSize, newSize] + if len(align) == 1: + align = [align, align] + + if len(orig.shape) == 1: # 1D array + orig = orig.reshape(-1, 1) # make it to (mx1) array + + [m1, n1] = orig.shape + m2 = newSize[0] + n2 = newSize[1] + m = np.minimum(m1, m2) + n = np.minimum(n1, n2) + + result = np.ones((m2, n2)) * padding + + start1 = np.array([np.floor((m1 - m) / 2 * (1 + align[0])), np.floor((n1 - n) / 2 * (1 + align[1]))]) + 1 + start2 = np.array([np.floor((m2 - m) / 2 * (1 + align[0])), np.floor((n2 - n) / 2 * (1 + align[1]))]) + 1 + + t1 = np.int_(np.linspace(start2[0], start2[0] + m - 1, m) - 1) + t2 = np.int_(np.linspace(start2[1], start2[1] + n - 1, n) - 1) + t3 = np.int_(np.linspace(start1[0], start1[0] + m - 1, m) - 1) + t4 = np.int_(np.linspace(start1[1], start1[1] + n - 1, n) - 1) + + result[t1[:,np.newaxis],t2] = orig[t3[:,np.newaxis],t4] + # trickly to broadcast 2D matrix + # https://towardsdatascience.com/numpy-indexing-explained-c376abb2440d + + # result[np.int_(np.linspace(start2[0], start2[0] + m - 1, m) - 1), np.int_( + # np.linspace(start2[1], start2[1] + n - 1, n) - 1)] = \ + # orig[np.int_(np.linspace(start1[0], start1[0] + m - 1, m) - 1), np.int_( + # np.linspace(start1[1], start1[1] + n - 1, n) - 1)] + + return result + + def resize_torch(self, orig, newSize, align=[0, 0], padding=0): + # result = resize(orig, newSize, align, padding) + # + # if newSize is larger than orig size, pad with padding + # if newSize is smaller than orig size, truncate to fit. + # align specifies alignment. 
0=centered + # -1=left (up) aligned + # 1=right (down) aligned + # For example, align=[0 -1] centers on rows (y) and left align on columns (x). + # align=1 aligns left on columns and top on rows. + + if len(newSize) == 1: + newSize = [newSize, newSize] + if len(align) == 1: + align = [align, align] + + if len(orig.shape) == 1: # 1D array + orig = torch.reshape(orig,(-1, 1)) # make it to (mx1) array + + [m1, n1] = orig.shape + m2 = newSize[0] + n2 = newSize[1] + m = np.minimum(m1, m2) + n = np.minimum(n1, n2) + + result = torch.ones((m2, n2), device=orig.device) * padding + + start1 = np.array([np.floor((m1 - m) / 2 * (1 + align[0])), np.floor((n1 - n) / 2 * (1 + align[1]))]) + 1 + start2 = np.array([np.floor((m2 - m) / 2 * (1 + align[0])), np.floor((n2 - n) / 2 * (1 + align[1]))]) + 1 + + t1 = np.int_(np.linspace(start2[0], start2[0] + m - 1, m) - 1) + t2 = np.int_(np.linspace(start2[1], start2[1] + n - 1, n) - 1) + t3 = np.int_(np.linspace(start1[0], start1[0] + m - 1, m) - 1) + t4 = np.int_(np.linspace(start1[1], start1[1] + n - 1, n) - 1) + + result[t1[:,np.newaxis],t2] = orig[t3[:,np.newaxis],t4] + # trickly to broadcast 2D matrix + # https://towardsdatascience.com/numpy-indexing-explained-c376abb2440d + + # result[np.int_(np.linspace(start2[0], start2[0] + m - 1, m) - 1), np.int_( + # np.linspace(start2[1], start2[1] + n - 1, n) - 1)] = \ + # orig[np.int_(np.linspace(start1[0], start1[0] + m - 1, m) - 1), np.int_( + # np.linspace(start1[1], start1[1] + n - 1, n) - 1)] + + return result + + def conv2(self, x, y, mode='full'): + # While Matlab's conv2 results in artifacts on the bottom and right of an image, + # scipy.signal.convolve2d has the same artifacts on the top and left of an image. + return np.rot90(convolve2d(np.rot90(x, 2), np.rot90(y, 2), mode=mode), 2) + + def conv2_torch(self, x, y, mode='full'): + # While Matlab's conv2 results in artifacts on the bottom and right of an image, + # scipy.signal.convolve2d has the same artifacts on the top and left of an image. + # Ignore these differences, they are much smaller than inaccuracy between torch and scipy + + # torch.nn.functional.conv2d actually computes cross-correlation + # flip the kernel to get convolution + # Order of errors (w.r.t scipy) is ~5e-7 when the input values are in range(0, 170). + # TODO: Investigate further if precision is important + x = x.view(1,1,*x.shape) + y = y.flip(dims=(0,1)).view(1,1,*y.shape) + return torch.nn.functional.conv2d(x, y, padding=[dim-1 for dim in y.shape[-2:]]).squeeze() + + def separableFilters(self, sampPerDeg): + # not full implementation but correct for S-CIELAB usage. + # Please refer to original Matlab version + + # if sampPerDeg is smaller than minSAMPPERDEG, need to upsample image data before filtering. + # This can be done equivalently by convolving filters with the upsampling matrix, then downsample it. + minSAMPPERDEG = 224 + dimension=3 + + if sampPerDeg < minSAMPPERDEG: + uprate = int(np.ceil(minSAMPPERDEG / sampPerDeg)) + sampPerDeg = sampPerDeg * uprate + else: + uprate = 1 + + # these are the same filter parameters, except that the weights are normalized to sum to 1 + # This eliminates the need to normalize after the filters are generated + x1 = np.array([0.05, 1.00327, 0.225, 0.114416, 7.0, -0.117686]) + x2 = np.array([0.0685, 0.616725, 0.826, 0.383275]) + x3 = np.array([0.0920, 0.567885, 0.6451, 0.432115]) + + # Convert the unit of halfwidths from visual angle to pixels. 
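+        # Editor's aside (illustrative, not part of the original patch): at this point
+        # sampPerDeg is at least minSAMPPERDEG = 224 (it is upsampled above if needed),
+        # so e.g. the first achromatic half-width of 0.05 deg maps to 0.05 * 224 = 11.2 pixels.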
+ x1[[0, 2, 4]] = x1[[0, 2, 4]] * sampPerDeg + x2[[0, 2]] = x2[[0, 2]] * sampPerDeg + x3[[0, 2]] = x3[[0, 2]] * sampPerDeg + + # Limit filter width to 1-degree visual angle, and odd number of sampling points + # (so that the gaussians generated from Rick's gauss routine are symmetric). + width = int(np.ceil(sampPerDeg / 2) * 2 - 1) + + # Generate the filters + # These Gaussians are used in the row and col separable convolutions. + k1 = np.array([self.gauss(x1[0], width) * np.sqrt(np.abs(x1[1])) * np.sign(x1[1]), + self.gauss(x1[2], width) * np.sqrt(np.abs(x1[3])) * np.sign(x1[3]), + self.gauss(x1[4], width) * np.sqrt(np.abs(x1[5])) * np.sign(x1[5])]) + + # These are the two 1-d kernels used by red/green + k2 = np.array([self.gauss(x2[0], width) * np.sqrt(np.abs(x2[1])) * np.sign(x2[1]), + self.gauss(x2[2], width) * np.sqrt(np.abs(x2[3])) * np.sign(x2[3])]) + + # These are the two 1-d kernels used by blue/yellow + k3 = np.array([self.gauss(x3[0], width) * np.sqrt(np.abs(x3[1])) * np.sign(x3[1]), + self.gauss(x3[2], width) * np.sqrt(np.abs(x3[3])) * np.sign(x3[3])]) + + # upsample and downsample + if uprate > 1: + upcol = np.concatenate((np.linspace(1, uprate, uprate), np.linspace(uprate - 1, 1, uprate - 1))) / uprate + s = len(upcol) + upcol = upcol.reshape(1, -1) # 1xm matrix + upcol = self.resize(upcol, [1, s + width - 1]) + # upcol = resize(upcol, [1 s + width - 1]); + + up1 = self.conv2(k1, upcol, 'same') + up2 = self.conv2(k2, upcol, 'same') + up3 = self.conv2(k3, upcol, 'same') + + mid = np.ceil(up1.shape[1] / 2) + downs = np.int_(np.concatenate( + (np.flip(np.arange(mid, 0, -uprate)), np.arange(mid + uprate, up1.shape[1] + 1, uprate)) + ) - 1) + + k1 = up1[:, downs] + k2 = up2[:, downs] + k3 = up3[:, downs] + + return [k1, k2, k3] + + def separableFilters_torch(self, sampPerDeg): + # not full implementation but correct for S-CIELAB usage. + # Please refer to original Matlab version + + # if sampPerDeg is smaller than minSAMPPERDEG, need to upsample image data before filtering. + # This can be done equivalently by convolving filters with the upsampling matrix, then downsample it. + minSAMPPERDEG = 224 + dimension=3 + + if sampPerDeg < minSAMPPERDEG: + uprate = int(np.ceil(minSAMPPERDEG / sampPerDeg)) + sampPerDeg = sampPerDeg * uprate + else: + uprate = 1 + + # these are the same filter parameters, except that the weights are normalized to sum to 1 + # This eliminates the need to normalize after the filters are generated + x1 = torch.as_tensor([0.05, 1.00327, 0.225, 0.114416, 7.0, -0.117686]) + x2 = torch.as_tensor([0.0685, 0.616725, 0.826, 0.383275]) + x3 = torch.as_tensor([0.0920, 0.567885, 0.6451, 0.432115]) + + # Convert the unit of halfwidths from visual angle to pixels. + x1[[0, 2, 4]] = x1[[0, 2, 4]] * sampPerDeg + x2[[0, 2]] = x2[[0, 2]] * sampPerDeg + x3[[0, 2]] = x3[[0, 2]] * sampPerDeg + + # Limit filter width to 1-degree visual angle, and odd number of sampling points + # (so that the gaussians generated from Rick's gauss routine are symmetric). + width = torch.as_tensor(int(np.ceil(sampPerDeg / 2) * 2 - 1)) + + # Generate the filters + # These Gaussians are used in the row and col separable convolutions. 
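+        # Editor's aside (illustrative, not part of the original patch): for sampPerDeg = 224
+        # the 1-D kernels stacked below have ceil(224 / 2) * 2 - 1 = 223 taps, i.e. they span
+        # roughly one degree of visual angle.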
+ + k1 = torch.stack([self.gauss_torch(x1[0], width) * torch.sqrt(torch.abs(x1[1])) * torch.sign(x1[1]), + self.gauss_torch(x1[2], width) * torch.sqrt(torch.abs(x1[3])) * torch.sign(x1[3]), + self.gauss_torch(x1[4], width) * torch.sqrt(torch.abs(x1[5])) * torch.sign(x1[5])]) + + # These are the two 1-d kernels used by red/green + k2 = torch.stack([self.gauss_torch(x2[0], width) * torch.sqrt(np.abs(x2[1])) * torch.sign(x2[1]), + self.gauss_torch(x2[2], width) * torch.sqrt(np.abs(x2[3])) * torch.sign(x2[3])]) + + # These are the two 1-d kernels used by blue/yellow + k3 = torch.stack([self.gauss_torch(x3[0], width) * torch.sqrt(torch.abs(x3[1])) * torch.sign(x3[1]), + self.gauss_torch(x3[2], width) * torch.sqrt(torch.abs(x3[3])) * torch.sign(x3[3])]) + + # upsample and downsample + if uprate > 1: + upcol = torch.concatenate((torch.linspace(1, uprate, uprate), torch.linspace(uprate - 1, 1, uprate - 1))) / uprate + s = len(upcol) + upcol = torch.reshape(upcol, (1, -1)) # 1xm matrix + upcol = torch.as_tensor(self.resize(upcol, [1, s + width - 1])) + # upcol = resize(upcol, [1 s + width - 1]); + + up1 = self.conv2_torch(k1, upcol, 'same') + up2 = self.conv2_torch(k2, upcol, 'same') + up3 = self.conv2_torch(k3, upcol, 'same') + + mid = np.ceil(up1.shape[1] / 2) + downs = np.int_(np.concatenate( + (np.flip(np.arange(mid, 0, -uprate)), np.arange(mid + uprate, up1.shape[1] + 1, uprate)) + ) - 1) + + k1 = up1[:, downs] + k2 = up2[:, downs] + k3 = up3[:, downs] + + return [k1, k2, k3] + + def generateSCIELABfiltersParams(self, sampPerDeg): + # not full implementation but correct for S-CIELAB usage. + # Please refer to original Matlab version + + # if sampPerDeg is smaller than minSAMPPERDEG, need to upsample image data before filtering. + # This can be done equivalently by convolving filters with the upsampling matrix, then downsample it. + dimension=3 + + # these are the same filter parameters, except that the weights are normalized to sum to 1 + # This eliminates the need to normalize after the filters are generated + x1 = np.array([0.05, 1.00327, 0.225, 0.114416, 7.0, -0.117686]) + x2 = np.array([0.0685, 0.616725, 0.826, 0.383275]) + x3 = np.array([0.0920, 0.567885, 0.6451, 0.432115]) + + # Convert the unit of halfwidths from visual angle to pixels. + x1[[0, 2, 4]] = x1[[0, 2, 4]] * sampPerDeg + x2[[0, 2]] = x2[[0, 2]] * sampPerDeg + x3[[0, 2]] = x3[[0, 2]] * sampPerDeg + + return [x1, x2, x3] + + def applyGaussFilter(self, im, width, kernels): + # Apply pytorch Gaussian blur and sum for each channel + + result = torch.zeros_like(im) + for j in range(int((kernels.shape[0])/2)): + p = kernels[j*2 + 1]*gaussFilter.gaussian_blur(im, width, kernels[j*2]) + + # result is sum of several separable convolutions + result = result + p + + return result + + def separableConv(self, im, xkernels, ykernels): + # Two-dimensional convolution with X-Y separable kernels. + # + # im is the input matric. im is reflected on all sides before convolution. + # xkernels and ykernels are both row vectors. + # If xkernels and ykernels are matrices, each row is taken as + # one convolution kernel and convolved with the image, and the + # sum of the results is returned. 
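+        # Editor's aside (illustrative, not part of the original patch): convolving with a
+        # 1xN row kernel and then with an Nx1 column kernel is equivalent to a single 2-D
+        # convolution with their outer product, so the loop below evaluates a sum of 2-D
+        # kernels using only 1-D convolutions.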
+
+        w1 = self.pad4conv(im, xkernels.shape[1], 2)
+
+        result = np.zeros_like(im)
+        for j in range(xkernels.shape[0]):
+            # first convolve in the horizontal direction
+            p = self.conv2(w1, xkernels[j,:].reshape(1,-1))
+            p = self.resize(p, im.shape)
+
+            # then the vertical direction
+            w2 = self.pad4conv(p, ykernels.shape[1], 1)
+            p = self.conv2(w2, ykernels[j,:].reshape(-1,1))
+            p = self.resize(p, im.shape)
+
+            # result is sum of several separable convolutions
+            result = result + p
+        return result
+
+    def separableConv_torch(self, im, xkernels, ykernels):
+        # Two-dimensional convolution with X-Y separable kernels.
+        #
+        # im is the input matrix. im is reflected on all sides before convolution.
+        # xkernels and ykernels are both row vectors.
+        # If xkernels and ykernels are matrices, each row is taken as
+        # one convolution kernel and convolved with the image, and the
+        # sum of the results is returned.
+
+        w1 = self.pad4conv_torch(im, xkernels.shape[1], 2)
+
+        result = torch.zeros_like(im)
+        for j in range(xkernels.shape[0]):
+            # first convolve in the horizontal direction
+            p = self.conv2_torch(w1, xkernels[j,:].reshape(1,-1))
+            p = self.resize_torch(p, im.shape)
+
+            # then the vertical direction
+            w2 = self.pad4conv_torch(p, ykernels.shape[1], 1)
+            p = self.conv2_torch(w2, ykernels[j,:].reshape(-1,1))
+            p = self.resize_torch(p, im.shape)
+
+            # result is sum of several separable convolutions
+            result = result + p
+        return result
+
+    def pad4conv(self, im, kernelsize, dim):
+        # Pad the input image ready for convolution. The edges of the image are reflected on all sides.
+        # kernelsize -- size of the convolution kernel in the format
+        #    [numRows numCol]. If one number is given, assume numRows=numCols.
+ # dim -- when set at 1, pad extra rows, but leave number of columns unchanged; + # when set at 2, pad extra columns, leave number of rows unchanged; + + newim = torch.clone(im) + [m, n] = np.int_(im.shape) + #print('is list or not') + #print(isinstance(kernelsize, list)) + #print('kernel type before') + #print(type(kernelsize)) + if not isinstance(kernelsize, list): + kernelsize = [kernelsize, kernelsize] + #print('kernel type after') + #print(type(kernelsize)) + + # If kernel is larger than image, than just pad all side with half + # the image size, otherwise pad with half the kernel size + if kernelsize[0] >= m: + h = int(np.floor(m / 2)) + else: + h = int(np.floor(kernelsize[0] / 2)) + + if kernelsize[1] >= n: + w = int(np.floor(n / 2)) + else: + w = int(np.floor(kernelsize[1] / 2)) + + # first reflect the upper and lower edges + if h != 0 and dim != 2: + im1 = torch.flipud(newim[0:h, :]) + im2 = torch.flipud(newim[m - h:m, :]) + newim = torch.cat((im1, newim, im2), axis=0) + + # then reflect the left and right sides + if w != 0 and dim != 1: + im1 = torch.fliplr(newim[:, 0:w]) + im2 = torch.fliplr(newim[:, n - w:n]) + newim = torch.cat((im1, newim, im2), axis=1) + + return newim + +def deltaE00(Lab1, Lab2, paramFctr = [1,1,1]): + + kL = paramFctr[0]; kC = paramFctr[1]; kH = paramFctr[2] + + #a1 = np.power(Lab1[1,:],2) + #b1 = np.power(Lab1[2,:],2) + #c1 = a1 + b1 + + # CIELAB Chroma + C1 = torch.sqrt( torch.pow(Lab1[1,:],2) + torch.pow(Lab1[2,:],2) ) + C2 = torch.sqrt( torch.pow(Lab2[1,:],2) + torch.pow(Lab2[2,:],2) ) + + # Lab Prime + mC = torch.add(C1,C2)/2 + G = 0.5*( 1 - torch.sqrt( torch.divide( torch.pow(mC,7) , torch.pow(mC,7)+25**7 ) )) + LabP1 = torch.vstack( (Lab1[0,:], Lab1[1,:]*(1+G), Lab1[2,:]) ) + LabP2 = torch.vstack( (Lab2[0,:], Lab2[1,:]*(1+G), Lab2[2,:]) ) + + # Chroma + CP1 = torch.sqrt( torch.pow(LabP1[1,:],2) + torch.pow(LabP1[2,:],2) ) + CP2 = torch.sqrt( torch.pow(LabP2[1,:],2) + torch.pow(LabP2[2,:],2) ) + + # Hue Angle + hP1 = torch.arctan2( LabP1[2,:],LabP1[1,:] ) * 180/torch.pi # varies from -180 to +180 degree + hP2 = torch.arctan2( LabP2[2,:],LabP2[1,:] ) * 180/torch.pi # varies from -180 to +180 degree + hP1[hP1<0] = hP1[hP1<0] + 360 # varies from 0 to +360 degree + hP2[hP2<0] = hP2[hP2<0] + 360 # varies from 0 to +360 degree + + + # Delta Values + DLP = LabP1[0,:] - LabP2[0,:] + DCP = CP1 - CP2 + DhP = hP1 - hP2; DhP[DhP>180] = DhP[DhP>180]-360; DhP[DhP<-180] = DhP[DhP<-180]+360 + DHP = torch.multiply( 2*torch.sqrt(torch.multiply(CP1,CP2)), torch.sin( DhP/2.*torch.pi/180. 
) ) + + # Arithmetic mean of LCh' values + mLP = ( LabP1[0,:]+LabP2[0,:] )/2 + mCP = (CP1+CP2)/2 + mhP = torch.zeros_like(mCP) + # for k in range(0,mhP.numel()): + # if abs(hP1[k]-hP2[k])<=180: + # mhP[k] = (hP1[k]+hP2[k])/2 + # elif abs(hP1[k]-hP2[k])>180 and hP1[k]+hP2[k]<360: + # mhP[k] = (hP1[k]+hP2[k]+360)/2 + # elif abs(hP1[k]-hP2[k])>180 and hP1[k]+hP2[k]>=360: + # mhP[k] = (hP1[k]+hP2[k]-360)/2 + mask1 = torch.abs(hP1-hP2) <= 180 + mhP[mask1] = (hP1+hP2)[mask1]/2 + mask2 = (hP1+hP2) < 360 + mhP[torch.logical_and(torch.logical_not(mask1), mask2)] = ((hP1+hP2)[torch.logical_and(torch.logical_not(mask1), mask2)]+360)/2 + mhP[torch.logical_and(torch.logical_not(mask1), torch.logical_not(mask2))] = ((hP1+hP2)[torch.logical_and(torch.logical_not(mask1), torch.logical_not(mask2))]-360)/2 + + # Weighting Functions + SL = 1 + torch.divide( 0.015*torch.pow(mLP-50,2), torch.sqrt( 20+torch.pow(mLP-50,2) ) ) + SC = 1+0.045*mCP + T = 1-0.17*torch.cos((mhP-30)*torch.pi/180.)+0.24*torch.cos((2*mhP)*torch.pi/180.)+0.32*torch.cos((3*mhP+6)*torch.pi/180.)-0.2*torch.cos((4*mhP-63)*torch.pi/180.) + SH = 1+0.015*torch.multiply(mCP,T) + + # Rotation function + RC = 2 * torch.sqrt(torch.divide( torch.pow(mCP,7), torch.pow(mCP,7)+25**7 )) + # DTheta = 30.*exp(-((mhP-275)./25).^2) + DTheta = 30 * torch.exp(-torch.pow( (mhP-275)/25,2 )) + RT = torch.multiply( -torch.sin(2*DTheta*torch.pi/180.), RC ) + + dE00 = torch.sqrt( torch.pow( torch.divide(DLP,kL*SL) ,2) + torch.pow( torch.divide(DCP,kC*SC) ,2) + torch.pow( torch.divide(DHP,kH*SH) ,2) + + torch.multiply(RT, torch.multiply( torch.divide(DCP,kC*SC), torch.divide(DHP,kH*SH) ) )) + return dE00 diff --git a/pycvvdp/video_source.py b/pycvvdp/video_source.py index a1a34fc..16225ba 100644 --- a/pycvvdp/video_source.py +++ b/pycvvdp/video_source.py @@ -373,4 +373,4 @@ def _get_frame(self, from_array, idx, device): # Convert to grayscale L = L[:,0:1,:,:,:]*self.color_to_luminance[0] + L[:,1:2,:,:,:]*self.color_to_luminance[1] + L[:,2:3,:,:,:]*self.color_to_luminance[2] - return L + return L \ No newline at end of file diff --git a/pycvvdp/video_source_file.py b/pycvvdp/video_source_file.py index c08ca1c..7db14a8 100644 --- a/pycvvdp/video_source_file.py +++ b/pycvvdp/video_source_file.py @@ -7,6 +7,7 @@ import torch import ffmpeg import re +import functools import logging from video_source import * @@ -59,7 +60,7 @@ def load_image_as_array(imgfile): class video_reader: - def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_width=-1, verbose=False): + def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_width=-1, verbose=False, fps=-1): try: probe = ffmpeg.probe(vidfile) except: @@ -68,6 +69,7 @@ def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_ # select the first video stream video_stream = next((stream for stream in probe['streams'] if stream['codec_type'] == 'video'), None) + self.file_name = vidfile self.width = int(video_stream['width']) self.src_width = self.width self.height = int(video_stream['height']) @@ -75,9 +77,18 @@ def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_ self.color_space = video_stream['color_space'] if ('color_space' in video_stream) else 'unknown' self.color_transfer = video_stream['color_transfer'] if ('color_transfer' in video_stream) else 'unknown' self.in_pix_fmt = video_stream['pix_fmt'] - num_frames = int(video_stream['nb_frames']) - avg_fps_num, avg_fps_denom = [float(x) for x in 
video_stream['r_frame_rate'].split("/")] - self.avg_fps = avg_fps_num/avg_fps_denom + + avg_fps_num, avg_fps_denom = map(float, video_stream['r_frame_rate'].split('/')) + self.avg_fps = avg_fps_num/avg_fps_denom if fps == -1 else fps + try: + num_frames = int(video_stream['nb_frames']) + except KeyError: + # Metadata may not contain total number of frames + duration_text = video_stream['tags']['DURATION'] + hrs, mins, secs = map(float, duration_text.split(':')) + duration = (hrs * 60 + mins) * 60 + secs + + num_frames = int(np.floor(duration * self.avg_fps)) if frames==-1: self.frames = num_frames @@ -101,6 +112,7 @@ def _setup_ffmpeg(self, vidfile, resize_fn, resize_height, resize_width, verbose self.dtype = np.uint8 stream = ffmpeg.input(vidfile) + stream = ffmpeg.filter(stream, 'fps', fps=self.avg_fps) if (resize_fn is not None) and (resize_width!=self.width or resize_height!=self.height): resize_mode = resize_fn if resize_fn != 'nearest' else 'neighbor' stream = ffmpeg.filter(stream, 'scale', resize_width, resize_height, flags=resize_mode) @@ -168,8 +180,8 @@ def __exit__(self, type, value, tb): Decode frames to Yuv, perform upsampling and colour conversion with pytorch (on the GPU) ''' class video_reader_yuv_pytorch(video_reader): - def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_width=-1, verbose=False): - super().__init__(vidfile, frames, resize_fn, resize_height, resize_width, verbose) + def __init__(self, vidfile, frames=-1, resize_fn=None, resize_height=-1, resize_width=-1, verbose=False, fps=-1): + super().__init__(vidfile, frames, resize_fn, resize_height, resize_width, verbose, fps) y_channel_pixels = int(self.width*self.height) self.y_pixels = y_channel_pixels @@ -216,6 +228,7 @@ def _setup_ffmpeg(self, vidfile, resize_fn, resize_height, resize_width, verbose self.resize_width = resize_width stream = ffmpeg.input(vidfile) + stream = ffmpeg.filter(stream, 'fps', fps=self.avg_fps) log_level = 'info' if verbose else 'quiet' stream = ffmpeg.output(stream, 'pipe:', format='rawvideo', pix_fmt=out_pix_fmt).global_args( '-loglevel', log_level ) self.process = ffmpeg.run_async(stream, pipe_stdout=True) @@ -286,15 +299,17 @@ def _fixed2float_upscale(self, Y, u, v, device): ''' class video_source_video_file(video_source_dm): - def __init__( self, test_fname, reference_fname, display_photometry='sdr_4k_30', config_paths=[], frames=-1, full_screen_resize=None, resize_resolution=None, ffmpeg_cc=False, verbose=False ): + def __init__( self, test_fname, reference_fname, display_photometry='sdr_4k_30', config_paths=[], frames=-1, full_screen_resize=None, resize_resolution=None, ffmpeg_cc=False, verbose=False, match_fps=False ): fs_width = -1 if full_screen_resize is None else resize_resolution[0] fs_height = -1 if full_screen_resize is None else resize_resolution[1] - self.reader = video_reader if ffmpeg_cc else video_reader_yuv_pytorch - self.reference_vidr = self.reader(reference_fname, frames, resize_fn=full_screen_resize, resize_width=fs_width, resize_height=fs_height, verbose=verbose) - self.test_vidr = self.reader(test_fname, frames, resize_fn=full_screen_resize, resize_width=fs_width, resize_height=fs_height, verbose=verbose) + reader_class = video_reader if ffmpeg_cc else video_reader_yuv_pytorch + reader = functools.partial(reader_class, frames=frames, resize_fn=full_screen_resize, resize_width=fs_width, resize_height=fs_height, verbose=verbose) + self.reference_vidr = reader(vidfile=reference_fname) + fps = self.reference_vidr.avg_fps if match_fps 
else -1 + self.test_vidr = reader(vidfile=test_fname, fps=fps) - self.frames = self.test_vidr.frames if frames==-1 else frames + self.frames = self.reference_vidr.frames if frames==-1 else frames for vr in [self.test_vidr, self.reference_vidr]: if vr == self.test_vidr: @@ -341,7 +356,7 @@ def get_video_size(self): # Return the frame rate of the video def get_frames_per_second(self) -> int: return self.test_vidr.avg_fps - + # Get a pair of test and reference video frames as a single-precision luminance map # scaled in absolute inits of cd/m^2. 'frame' is the frame index, # starting from 0. diff --git a/pycvvdp/video_source_yuv.py b/pycvvdp/video_source_yuv.py index 837f22a..662f680 100644 --- a/pycvvdp/video_source_yuv.py +++ b/pycvvdp/video_source_yuv.py @@ -237,11 +237,14 @@ def __exit__(self, type, value, tb): class video_source_yuv_file(video_source_dm): - def __init__( self, test_fname, reference_fname, display_photometry='standard_4k', frames=-1, full_screen_resize=None, resize_resolution=None, retain_aspect_ratio=False, verbose=False ): + def __init__( self, test_fname, reference_fname, display_photometry='standard_4k', frames=-1, full_screen_resize=None, resize_resolution=None, retain_aspect_ratio=False, match_fps=False, verbose=False ): self.reference_vidr = YUVReader(reference_fname) self.test_vidr = YUVReader(test_fname) - self.total_frames = self.test_vidr.frame_count + self.match_fps = match_fps + if not match_fps: + assert self.test_vidr.frame_count == self.reference_vidr.frame_count + self.total_frames = self.reference_vidr.frame_count self.frames = self.total_frames if frames==-1 else min(self.total_frames, frames) self.offset = 0 # Offset for random access of a shorter subsequence @@ -293,6 +296,8 @@ def get_frames_per_second(self) -> int: # scaled in absolute inits of cd/m^2. 'frame' is the frame index, # starting from 0. 
def get_test_frame( self, frame, device, colorspace="Y" ) -> Tensor: + if self.match_fps: + frame = int(self.test_vidr.get_frame_count() / self.reference_vidr.get_frame_count() * frame) L = self._get_frame( self.test_vidr, frame, device, colorspace ) return L diff --git a/pycvvdp/vvdp_data/cvvdp_parameters.json b/pycvvdp/vvdp_data/cvvdp_parameters.json index 3769623..50e8e1d 100644 --- a/pycvvdp/vvdp_data/cvvdp_parameters.json +++ b/pycvvdp/vvdp_data/cvvdp_parameters.json @@ -1,6 +1,6 @@ { - "version": "0.2", - "__comment": "The base model prepared for SIGGRAPH Asia 2023 submission.", + "version": "0.3", + "__comment": "The base model before the final publication.", "calibration_date": "18/05/2023", "internal_model_name": "cvvdp_0_2", "mask_p": 2.316437244415283, diff --git a/requirements.txt b/requirements.txt index d9fd5a0..39c7d78 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,7 +1,7 @@ numpy==1.23.1 -scipy==1.8.1 +scipy==1.10.0 ffmpeg-python==0.2.0 -torch==1.12.0 +torch==1.13.1 ffmpeg==1.4 imageio==2.19.5 matplotlib==3.5.3 diff --git a/setup.py b/setup.py index 45dc232..0eb97e0 100644 --- a/setup.py +++ b/setup.py @@ -19,10 +19,10 @@ package_data={'pycvvdp': ['csf_cache/*.mat', 'vvdp_data/*.json']}, include_package_data=True, install_requires=['numpy>=1.17.3', - 'scipy>=1.7.0', + 'scipy>=1.10.0', 'ffmpeg-python>=0.2.0', - 'torch>=1.8.2', - 'torchvision>=0.9.2' + 'torch>=1.13.1', + 'torchvision>=0.9.2', 'ffmpeg>=1.4', 'imageio>=2.19.5', ], diff --git a/test_on_local_file.py b/test_on_local_file.py new file mode 100644 index 0000000..232adef --- /dev/null +++ b/test_on_local_file.py @@ -0,0 +1,114 @@ +# This example shows how to use python interface to run FovVideoVDP directly on video files +import os +import glob +import time + +import pycvvdp +import logging + +display_name = 'standard_4k' + +# media_folder = 'S:\\Datasets\\XR-DAVID\\cache' +# ref_file = os.path.join(media_folder, 'Bonfire_reference_1920x1080_10b_444_709_30fps.yuv') +# TST_FILEs = glob.glob(os.path.join(media_folder, 'Bonfire_Blur_*.yuv')) + +media_folder = '../color_metric_videos/' +# ref_file = os.path.join(media_folder, 'Business_reference_Level001.mp4') +# TST_FILEs = glob.glob(os.path.join(media_folder, 'Business_WGNU_Level003.mp4')) + +ref_file = os.path.join(media_folder, 'Bonfire_reference_Level001.mp4') +TST_FILEs = glob.glob(os.path.join(media_folder, 'Bonfire_Blur_Level003.mp4')) + +# media_folder = 'S:\\Datasets\\LIVEHDR\\train' +# ref_file = os.path.join(media_folder, '4k_ref_CenterPanorama.mp4') +# TST_FILEs = glob.glob(os.path.join(media_folder, '4k_3M_CenterPanorama.mp4')) + + +logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.DEBUG) + +""" + +print('CVVDP test') +cvvdp = pycvvdp.cvvdp(display_name=display_name) +cvvdp.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = cvvdp.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to compute)'.format(tst_fname, Q_JOD_static, end-start) ) + + +print('PU PSNR test') +psnr = pycvvdp.pu_psnr_y() +psnr.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = psnr.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to 
compute)'.format(tst_fname, Q_JOD_static, end-start) ) + + +print('E_ITP test') +eitp = pycvvdp.e_itp() +eitp.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = eitp.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to compute)'.format(tst_fname, Q_JOD_static, end-start) ) + + +print('E_Spatial ITP test') +esitp = pycvvdp.e_sitp(display_name=display_name) +esitp.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = esitp.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to compute)'.format(tst_fname, Q_JOD_static, end-start) ) + + +print('DE2000 test') +de00 = pycvvdp.de2000(display_name=display_name) +de00.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = de00.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to compute)'.format(tst_fname, Q_JOD_static, end-start) ) + + """ + +print('Spatial DE2000 test') +s_de00 = pycvvdp.s_de2000(display_name=display_name) +s_de00.debug = False +for tst_fname in TST_FILEs: + + vs = pycvvdp.video_source_file( tst_fname, ref_file, display_photometry=display_name, frames=10 ) + + start = time.time() + Q_JOD_static, stats_static = s_de00.predict_video_source( vs ) + end = time.time() + + print( 'Quality for {}: {:.3f} JOD (took {:.4f} secs to compute)'.format(tst_fname, Q_JOD_static, end-start) ) + \ No newline at end of file
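
Editor's note: the snippet below is a minimal usage sketch, not part of the patch, showing how one of the metrics added above (`pu_psnr_mse_y` from `pycvvdp/pupsnr.py`) can be driven through the same `video_source_file` pattern used in `test_on_local_file.py`. The file names `test.mp4` and `ref.mp4` are placeholders, and the import path assumes the new classes are not re-exported from the package root.

```python
# Editor's sketch; assumes this patch is installed (pip install -e .) and that
# test.mp4 / ref.mp4 are placeholder video files of equal length.
import time
import pycvvdp
from pycvvdp.pupsnr import pu_psnr_mse_y  # added in this patch

display_name = 'standard_4k'
metric = pu_psnr_mse_y()

# Pair a test and a reference video, as in test_on_local_file.py above
vs = pycvvdp.video_source_file('test.mp4', 'ref.mp4',
                               display_photometry=display_name, frames=10)

start = time.time()
quality, _ = metric.predict_video_source(vs)
print('{}: {:.3f} (took {:.4f} secs to compute)'.format(metric.short_name(), quality, time.time() - start))

# Metrics that clamp content to the display peak luminance (e.g. hyab and pu_lpips in
# this patch) additionally expect set_display_model(display_photometry, display_geometry)
# to be called before predict_video_source().
```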