TODOs
=====
WIP
---
merged in mm
- Handle non-power-of-two min_nr_regions aligning.
- ALIGN() is only for power of two alignment.
- merged in mm.git
- per-filtered-address-range prioritization
- do the temperature-prioritization for address filter-passed regions,
separately
- merged in mm.git
- respect min_nr_regions from beginning
- merged in mm.git
- fix sampling intervals overflow
- add CONFIG_DAMON_HARDENED for integrity checking
- allow stop-after-achieving DAMOS goal
- add per-goal quota tuning strategy
- consist: the current one, default.
- temporal: the new one.
- bpf: for future?
- fix walk_system_ram() type violation
- https://lore.kernel.org/20260129161029.48991-1-sj@kernel.org
- mark DAMOS filters/ dir deprecated on doc
- cover all physical address space from DAMON modules
- add kdamond_pid to damon_stat.
- let DAMON_RECLAIM auto-tune intervals.
- kdamond pause/resume.
- Use it for drgn tests.
- Let DAMOS action failed regions be charged at a different rate
- e.g., charge only 1 byte for 1 MiB of failed region.
- Documentation cleanup: Make link from design to API
- deprecate sysfs filters/ in favor of {core,ops}_filters/
- warn on use of filters/, by 2027
- rename filters/ to filters_DEPRECATED/, by 2028
- remove filters/ code and documentation, by 2029
- counters of interests
- for production level lightweight page-level monitoring
- let users register counters of interests
- when doing access check, if the sampled page is of interest,
increase the counter.
- users know how many of each region is of the type, proportionally.
- nr_accesses will be the default counter
- Sashiko bug findings
- damon_..._enabled_store() NULL deref on init failure
- https://lore.kernel.org/20260419014800.877-1-sj@kernel.org
- repeat_call_control leak on premature DAMON stop
- https://lore.kernel.org/20260319155742.186627-5-objecting@objecting.org
- merged in mm
- damos_walk_cancel() deadlock.
- https://lore.kernel.org/20260322170700.83123-1-sj@kernel.org
- patch posted
- damon_call() cleanup deadlock.
- https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org
- patch posted
- validate quota goal nid
- https://lore.kernel.org/20260325155221.202700-1-objecting@objecting.org
- patch posted
- charged_from wraparound during DAMOS deactivation/ctx pause
- https://lore.kernel.org/20260324040722.57944-1-sj@kernel.org
- patch posted
- race between module params and commit_inputs
- divide by zero of damon_reclaim_new_scheme()
- https://lore.kernel.org/20260319161620.189392-3-objecting@objecting.org
- cache addr_unit using READ_ONCE(), on DAMON_RECLAIM and DAMON_LRU_SORT.
- https://lore.kernel.org/20260319161620.189392-2-objecting@objecting.org
- patch posted
- damos_stat damon_call() failure memory leak
- https://lore.kernel.org/20260401012428.86694-1-sj@kernel.org
- patch posted
- https://lore.kernel.org/20260402134418.74121-1-sj@kernel.org
- zero throughput-causing permanent damos deactivation on time quota use case
- https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org
- patch posted
- https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
- min_region_sz power_of_2() validation for damon_sysfs_turn_on()
- https://lore.kernel.org/20260403155530.64647-1-sj@kernel.org
- patch posted
- https://lore.kernel.org/20260411213638.77768-1-sj@kernel.org
- make damon_sysfs_next_update_jiffies per-context
- https://lore.kernel.org/20260319155742.186627-5-objecting@objecting.org
- pid leak on damon_commit_ctx()
- https://lore.kernel.org/20260319155742.186627-2-objecting@objecting.org
- think about age wrap around to 0.
- https://lore.kernel.org/20260319161620.189392-3-objecting@objecting.org
- avoid divide-by-zero in damos_get_node_mem_bp() due to zero i.totalram
- https://lore.kernel.org/20260328133216.9697-1-sj@kernel.org
- underflow of numerator in damos_get_node_memcg_used_bp()
- https://lore.kernel.org/20260329154813.47382-1-sj@kernel.org
- avoid divide-by-zero in damos_get_node_memcg_used_bp()
- https://lore.kernel.org/20260329154813.47382-1-sj@kernel.org
- mult_frac() in damos_get_node_memcg_used_bp() overflow on 32bit
- https://lore.kernel.org/20260329154813.47382-1-sj@kernel.org
- allow force-restart of DAMON_RECLAIM/LRU_SORT
- https://lore.kernel.org/20260410135500.81989-1-sj@kernel.org
- Liew is working
- https://lore.kernel.org/20260411000458.11479-1-aethernet65535@gmail.com
- cleanup !sz case in damon_do_apply_schemes()
- https://lore.kernel.org/20260410233223.88212-1-sj@kernel.org
- Think about damon_start() partial failures
- https://lore.kernel.org/20260411233431.78220-1-sj@kernel.org
- damon_stat stale kdamond_pid
- https://lore.kernel.org/20260414053742.90296-1-sj@kernel.org
- module init and parameter set race causing NULL dereference
- https://lore.kernel.org/20260418153656.834-1-sj@kernel.org
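On the non-power-of-two min_nr_regions aligning item above: ALIGN() rounds up
with a bitmask, so its result is only a multiple of the alignment when the
alignment is a power of two. The following is a minimal userspace sketch of
the difference, not DAMON code. The helper names align_pow2() and align_any()
are hypothetical; align_pow2() mirrors the mask technique, and align_any()
uses the division-based round-up that works for any non-zero alignment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask-based round-up, the same technique ALIGN() uses.
 * The result is a multiple of 'align' only if 'align' is a power of two.
 */
static uint64_t align_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Division-based round-up (hypothetical helper).
 * Works for any non-zero 'align', at the cost of a division.
 */
static uint64_t align_any(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}
```

For example, align_pow2(10, 6) returns 10, which is not a multiple of 6, while
align_any(10, 6) returns 12. For a power-of-two alignment such as 8, both
return 16.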
Planning / Considering
----------------------
- Be aware of 1 GiB hugetlb pages when auto-tuning intervals.
- Subtract 1 GiB hugetlb pages from the total size of memory.
- Let DAMON_STAT and DAMON_RECLAIM run in parallel.
- Let DAMON API callers share a kdamond
- What kdamond to share?
- A single one that is always running, say, kdamond*<N>?
- Just the kdamond of the first caller that started (could be any
kdamond<N>)?
- add drgn selftests for DAMON modules.
- Support tlb flushing
- https://lore.kernel.org/all/a2fb10bd-b44a-350e-f693-82ecfa6f54a8@huawei.com/
- automatic paddr regions detection
(handle hot(un)plugged memory regions)
- similar to vaddr, detect and cover all online memory as much as possible,
with a few holes.
- make DAMON_STAT cover multiple hot-[un]pluggable NUMA nodes
- implement and use kind of three big regions detection mechanism
- distinction of uncheckable memory
- do not just say it is not accessed, but disclose the fact that the access
could not be checked, and apply merge-split with the fact.
- add kunit test for all parameters commit functions
- add damos_stat for auto-tuned zero size applying
- DAMOS_KILL: DAMON-based OOM killing
- If DAMON_RECLAIM doesn't increase free memory, kill processes of hot
pages.
- implement pause/resume of kdamonds
- Or, let user feed nr_accesses and age of regions
- User can snapshot the last monitoring results and continue monitoring
from the status.
- support (multiple) kdamosd
- per-node memory bandwidth utilization DAMOS quota goal metric
- clarify possible monitoring results loss on usage doc.
- implement addr_unit commit kunit test
- implement addr_unit to/from core address conversion kunit test
- handle charged_ns overflow on 32bit machines
- cleanup damon_set_attrs() documentation
- add kunit test for damon_call() and damos_walk()
- DAMON-based khugepaged
- Make khugepaged listen to DAMON's voice when collapsing
- Support multi contexts per kdamond
- Add sharable kdamond
- Run with auto-tuned intervals
- Let API callers report access information, read monitoring results and
add/remove DAMOS schemes
- Major callers would be DAMON modules
- damon_get_handle(), damon_report_access(), damos_add_folios()
- Let ABI users read monitoring results and DAMOS stat
- /sys/kernel/mm/damon/???/
- DAMON_NUMA_MIGRATE
- Aim for not only traditional NUMA but also tiering
- Just extend mtier to every node, but CPU-aware.
- sysfs command for updating params (for restoring old params)
- write a selftest for sz_filter_passed
- contigurized-vaddr
- treat end of a discrete region as start of next region
- connect regions before and after huge unmapped area
- connect regions of different virtual addresses
- Write API documentation
- not just kernel-doc, but a more structured document is needed
- Let users decide regions split factor
- https://lore.kernel.org/20241026215311.148363-1-sj@kernel.org
- Let users periodically split regions without per-region subregions limit
- https://lore.kernel.org/20241026215311.148363-1-sj@kernel.org
- Sampling based page level properties based monitoring
- For DAMOS_STAT, do sampling for sz_filter_passed calculation
- Let users set the number of samples per region for this
- Support reserved uninstall of DAMOS
- Allow running DAMOS scheme for only specific apply intervals
- Extend for memory bandwidth monitoring
- Extend for AMD IBS-based monitoring
- Extend for cache-set space monitoring
- Extend for cache-line space monitoring
- Require sub-page level monitoring (IBS?)
- Access/Contiguity-aware Memory Auto-scaling
- https://lore.kernel.org/damon/20240512193657.79298-1-sj@kernel.org
- Support cleaning sysfs input files up to committed values
- holistic heterogeneous memory management
- address CPU-numa, CXL-numa, and device(e.g., GPU)-numa nodes
- DAMON_LRU_SORT auto-tuning
- Let auto-tuning use the active/inactive memory ratio
- Selftests: Test DAMON online tuning
- Selftests: Test DAMOS online tuning
- Selftests: Test DAMOS filter
- Make 'age' counted by sample_interval rather than aggregation interval
- CPU time quota for DAMON monitoring
- monitoring part CPU usage statistics
- Setting resolution of damos tried_regions
- Should be able to control the directories population overhead when the
number of regions is big
- More DAMON modules
- DAMON-based THP hinting module
- rename nr_accesses/moving_accesses_bp
- mark nr_accesses as private
- use a dedicated struct for access rate
- let kdamond name be user-defined
- let DAMON modules share one kdamond
- unify DAMON modules
- support multiple contexts per kdamond
- DAMON-based VMA split/merge
- Help big VMA contention issue?
- We can further expose the monitoring results via vma name
- Reading results becomes very easy
- Contig memory access util monitoring
- WSS/RSS based processes sorting
- LRU-based monitoring ops
- Fixed granularity idleness monitoring
- Must be useful for further DAMON overhead/accuracy evaluation
- Improve regions-based monitoring quality
- Support cgroups
- Add __counted_by() annotation when ready
(https://lore.kernel.org/r/CAKwvOdkvGTGiWzqEFq=kzqvxSYP5vUj3g9Z-=MZSQROzzSa_dg@mail.gmail.com)
- Ideas from LSFMM
- Add operations driven access check
- Let it call DAMON functions for noticed accesses and then let DAMON record
the accesses
- Take care of fairness on ACMA (e.g., NUMA)
- Consider hugetlb handling optimization
- DAMON_RECLAIM will meaninglessly try reclaiming hugetlb pages, consuming
CPU. Find a way to optimize.
- DAMON in process context
- Do the monitoring for each process in task_work context, like NUMA
balancing installs prot_none.
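On the charged_ns overflow item above: a minimal userspace sketch of why a
nanosecond counter is fragile on 32bit machines, assuming the counter is
stored in a type that is 32 bits wide there (for example unsigned long). A
32-bit value wraps once it passes 4294967295 ns, which is about 4.29 seconds
of charged time. The helper names below are hypothetical and this is not
DAMON code.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit accumulator: the sum is truncated to 32 bits,
 * so it wraps around modulo 2^32 nanoseconds (about 4.29 seconds).
 */
static uint32_t charge_ns32(uint32_t total_ns, uint32_t delta_ns)
{
	return total_ns + delta_ns;
}

/*
 * Hypothetical 64-bit accumulator: the same sum fits, since 2^64 ns
 * is several hundred years.
 */
static uint64_t charge_ns64(uint64_t total_ns, uint64_t delta_ns)
{
	return total_ns + delta_ns;
}
```

For example, adding 0.5 seconds to 4 seconds of charged time gives 205032704
with charge_ns32() (that is, 4500000000 minus 2^32), while charge_ns64()
returns 4500000000.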
Frozen
------
Recently Done
-------------
- DAMOS stat/control improvement
- Show how many snapshots the scheme has processed
- Let DAMOS be deactivated based on the number of processed snapshots
- Provide a tracepoint for DAMOS stat
- merged into 7.0-rc1
- DAMON_LRU_SORT modernization
- merged into 7.0-rc1
- hide kdamond and kdamond_lock from API callers
- merged into 7.0-rc1
- modernize wss_estimation kselftest
- merged into 7.0-rc1
- document modules usage on usage doc
- merged into 7.0-rc1
Non-DAMON issues
----------------