Dear Phhofm,

Hello, and thanks for your incredible work! May I ask something and make a proposal?
Pipeline
Which training pipeline do you use exactly? A self-written one, or traiNNer, neosr (possibly modified)?
Do you intend to publish it? That would be very helpful for a lot of people. After reading a lot and trying things out myself, e.g. the Real-ESRGAN training/fine-tuning script by @xinntao, there are certain inconsistencies (mentioned in the issues on that repo) that were never fixed. Most users cannot simply write their own code; they depend on a working example at first. A complete, worked-out role model that runs out of the box and can be adjusted to one's own needs would therefore give a lot of people confidence. If you regularly work with the same workflow, a single worked example like the one published on the Real-ESRGAN GitHub repo would be completely sufficient (for me it worked, but many of the reported issues are understandable).
Proposal: configs + possible research
As you have trained so many models, I suggest the following, if it does not interfere with your personal (publishing) policies:

- Publish the config file of each model under the same name as the resulting .pth/.safetensors/... file.
- One could then extract the parameters via script and create a huge table: model x config parameters | model characteristics | ...
- This table would be invaluable, because AFAIK nobody has done anything like it so far.
- Even if the parameters do not match exactly (e.g. due to different workflows), it would allow EDA sensu Tukey or even some QCA research on it (EDA = exploratory data analysis, QCA = qualitative comparative analysis). With output criteria like image quality etc., a lot of different questions could be answered with such a table.
- If hardware parameters (type of GPU, number of GPUs) plus overall GPU time can be added, that would enhance the configs and add some definite information about reality.
- No need to extract all of that yourself: if you still have the configs and logs, someone could probably extract it for you from those files. It is still a manageable number of training sessions, the config parameter names will be mostly constant, etc.
- Additionally, one could do more research on the models, like categorization (main purpose of a model, e.g. web upscaling, or characteristics of the degradation process, like JPEG compression artifacts, blurriness, etc.), so one gets an extensive and sufficiently theory-driven part of the model x config | characteristics table.
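The extraction step above can be sketched roughly as follows. This is a minimal sketch, assuming flat `key: value` config lines; the model names and parameters are invented purely for illustration, and a real implementation would read the actual config files from disk and use a proper YAML parser (e.g. PyYAML) to handle nesting:

```python
import csv
import io

# Hypothetical: a few config files as {model_name: yaml_text}.
# In practice these would be read from disk next to the .pth files.
configs = {
    "4xModelA_v1": "scale: 4\nnetwork: span\nbatch_size: 8\n",
    "4xModelB_v2": "scale: 4\nnetwork: dat2\nbatch_size: 4\n",
}

def flatten(yaml_text):
    """Minimal parser for flat 'key: value' lines.
    A real implementation would use PyYAML to handle nested configs."""
    params = {}
    for line in yaml_text.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            params[key.strip()] = value.strip()
    return params

rows = {name: flatten(text) for name, text in configs.items()}
# Union of all parameter names becomes the table's columns.
columns = sorted({k for row in rows.values() for k in row})

# Write the model x parameter table as CSV.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["model"] + columns)
for name, row in rows.items():
    writer.writerow([name] + [row.get(c, "") for c in columns])
table_csv = buf.getvalue()
print(table_csv)
```

Parameters that do not occur in a given config simply stay empty in that row, which is exactly the missing-data situation EDA/QCA methods can work with.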
Using those values, and beyond the EDA/QCA research on them, one could e.g. take a sample of purpose-driven HR images NOT contained in any of the model training datasets and apply all models to it: create an image-based objective measurement (using multiple criteria, not just one) to assess how accurately the downgraded images (LR), re-upscaled to HR by the fine-tuned/trained models, match the original HR.
This would allow a huge and precise comparison of all the models, because there are four simple cases:

- the model's purpose matches, on average, the intended purpose (upscaling downgraded images)
- the model's purpose does not match, on average, the intended purpose (e.g. anime)
- the model's purpose matches, on average, a different purpose (e.g. anime instead of web photos)
- the model's purpose does not match, on average, the intended and/or a different purpose

One should also look at the variation and extremes of the four cases above (i.e. the distribution), because empirical results are seldom consistent and narrow; getting an idea of the worst and best cases is always good for formulating a best-practice approach.
Of course, at first one should perform this with images NOT being part of any model's dataset. This would offer empirical proof of the training concept being superior for each model (on average). One has to think about an appropriate sample size and apply each model to the same sample of images (of different categories, i.e. purposes). Thinking big, one could also collect subjective preferences by asking people what they prefer, or even create an order of the top 5 preferred models for each image category (using sophisticated sampling techniques to avoid habituation, eye fatigue, etc.), but that would probably go too far.
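The degrade-then-reconstruct evaluation loop can be sketched like this. It is only a sketch under stated assumptions: the "model" is replaced by a trivial nearest-neighbour upscaler, the HR image is random noise standing in for a real held-out photo, and PSNR stands in for the multi-criteria measurement suggested above:

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def downscale(img, factor=4):
    """Naive box downscale by averaging factor x factor blocks."""
    h, w = img.shape[0] // factor * factor, img.shape[1] // factor * factor
    img = img[:h, :w].astype(np.float64)
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale_nearest(img, factor=4):
    """Stand-in for a super-resolution model: nearest-neighbour upscale."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
hr = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # held-out HR image
lr = downscale(hr, 4)                                        # degraded LR input
sr = upscale_nearest(lr, 4)                                  # "model" output
score = psnr(hr, sr)
print(f"PSNR of reconstruction vs. original HR: {score:.2f} dB")
```

Running every model over the same held-out sample and collecting such scores per image category would fill the outcome side of the proposed table.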
Thinking big, one could also add images from the training datasets as another category; that is a different and interesting research question which can be combined with the thoughts above.
Thinking in terms of creation time, one could also do research on the variation of quality over time; a lot of research questions are possible, far more than a single person can work on.
So if you intend to write an extensive MA thesis or even more, your work is worth even more :-)! Actually, with this potential, it is definitely worth a serious journal article about fine-tuning super-resolution models and their resulting accuracy.
Such a huge "table" would (see above) serve a lot of different purposes, and you have already done the work, so the effort on your part would "just" be to compile the config files under proper names (can it be scripted?), add the hardware environment and the GPU time or some comparable, GPU-independent time measurement, and that's it.
You already did this with e.g. 4xRealWebPhoto_v3.txt as a short descriptor (or add the original config.yml, config.json, or whatever).
No idea whether this meets your interests in any way, but it should be possible to outsource the creation of such a table if all the required single config files are available. Is your hardware setup constant? Can the GPU time be taken from existing logs?
There may (hopefully!) be people interested in the research part as well and willing to collaborate with you, as this still requires quite some serious effort, although a lot of things can be scripted as soon as the table is present. The rest is more or less straightforward, i.e. it takes time but is not a problem in a narrower sense.
So most of what is outlined above is an extension of your visual comparison page with additional "objective measurements", especially to learn how (subjective) visual comparison relates to different (more objective) comparison criteria (which in most cases are distance measurements like cosine similarity, Euclidean distance, maybe KL divergence/entropy, etc.). All of this could be embedded into your existing webpage framework as another entry: people could vote while doing comparisons (e.g. using a 5-point Likert scale), etc. (with a note that a comparison vote is stored without collecting personal data).
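The distance measurements mentioned above can be sketched briefly; the images here are random arrays standing in for real outputs, and KL divergence is computed between intensity histograms since it needs probability distributions rather than raw pixels:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened images (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    """Euclidean distance between two flattened images."""
    return float(np.linalg.norm(a - b))

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two normalised intensity histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, size=(32, 32)).ravel().astype(np.float64)
img_b = img_a + rng.normal(0, 5, size=img_a.shape)  # slightly perturbed copy

hist_a, _ = np.histogram(img_a, bins=32, range=(0, 256))
hist_b, _ = np.histogram(img_b, bins=32, range=(0, 256))

print("cosine:", cosine_similarity(img_a, img_b))
print("euclidean:", euclidean(img_a, img_b))
print("KL(hist_a || hist_b):", kl_divergence(hist_a.astype(float), hist_b.astype(float)))
```

Correlating such objective distances with the subjective Likert votes would answer exactly the "how does visual comparison relate to objective criteria" question.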
At the research level it would be possible to compare models or people's preferences even with smaller samples, based on Bayesian t-tests (there is an analytic solution by G. L. Bretthorst from the early 90s), which circumvent the problems of classical statistics. So statements can be extended from EDA/descriptive statistics into the field of inferential statistics.
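This is not Bretthorst's analytic solution, but the general idea can be sketched with Monte Carlo draws from the posterior of each group mean (normal model with a Jeffreys prior, under which the posterior of the mean is a Student-t centred on the sample mean); the per-image scores are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-image quality scores for two models (e.g. PSNR in dB).
scores_a = rng.normal(28.0, 2.0, size=12)
scores_b = rng.normal(26.5, 2.0, size=12)

def posterior_mean_draws(x, n_draws=100_000, rng=rng):
    """Draws from the posterior of the mean under a normal model with a
    Jeffreys prior: a Student-t centred on the sample mean, scaled by the
    standard error. A Monte Carlo sketch, not Bretthorst's analytic result."""
    n = len(x)
    return x.mean() + x.std(ddof=1) / np.sqrt(n) * rng.standard_t(n - 1, size=n_draws)

mu_a = posterior_mean_draws(scores_a)
mu_b = posterior_mean_draws(scores_b)
p_a_better = float(np.mean(mu_a > mu_b))
print(f"P(mean score of model A > model B | data) ~= {p_a_better:.3f}")
```

The output is a direct posterior probability that one model outperforms the other, which remains interpretable even at small sample sizes where classical p-values become fragile.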
Last comment: from my own experience in other fields (science, voluntary work), the documentation part is crucial and essential (here: knowing how a model concretely evolved, for replication) and is often neglected due to time constraints, concerns, and other reasons. I have seen a lot of cases where the original information was already lost; digital information is one of the areas with the fastest (more or less absolute) decay (just drop an 8 TB HDD on the floor...). It is not comparable to the Egyptian scrolls that still exist and can be read using sophisticated technology.
What do you think?