347. Top K Frequent Elements #15
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # step1 | ||
|
|
||
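| Based on the problem's meta information, I planned to use a priority queue. | ||
| However, I could not come up with a good way to do it, so I copied out the code from the solutions by hand as a reference. | ||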
|
|
||
| # step2 | ||
|
|
||
| ## After reading the collection of typical comments | ||
|
|
||
| - https://discord.com/channels/1084280443945353267/1235829049511903273/1245555256360697949 | ||
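| - Using Counter is fine, but the interviewer probably wants to know whether you understand how it is implemented. | ||
| - That is, whether you can initialize a dict, then build a list, sort it, and take the top few entries. | ||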
| - https://discord.com/channels/1084280443945353267/1183683738635346001/1185972070165782688 | ||
| - There is an algorithm called QuickSelect. | ||
| - https://www.geeksforgeeks.org/dsa/quickselect-algorithm/ | ||
|
|
||
|
|
||
|
|
||
| ## Bucket Sort | ||
|
|
||
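| There is an approach called bucket sort. | ||
| It uses the index of a 2D array (a list of lists) as the frequency, and appends each number to the list at the index that matches its frequency. | ||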
|
|
||
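| Time complexity: O(N) | ||
| - The loops over the elements of nums run only a constant number of times, so it should be O(N). | ||
| Space complexity: O(N) | ||
| - A list of buckets is allocated, so the space complexity is O(N). | ||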
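The bucket idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `top_k_by_buckets` and the use of `collections.Counter` are assumptions made for this example.

```python
from collections import Counter
from typing import List


def top_k_by_buckets(nums: List[int], k: int) -> List[int]:
    # Count occurrences of each value in one pass: O(N).
    num_to_frequency = Counter(nums)

    # buckets[f] holds every value that occurs exactly f times.
    # A frequency can never exceed len(nums), so len(nums) + 1 slots suffice.
    buckets: List[List[int]] = [[] for _ in range(len(nums) + 1)]
    for num, frequency in num_to_frequency.items():
        buckets[frequency].append(num)

    # Walk from the highest frequency down, collecting values until k are found.
    result: List[int] = []
    for frequency in range(len(buckets) - 1, 0, -1):
        for num in buckets[frequency]:
            result.append(num)
            if len(result) == k:
                return result
    # Fewer than k distinct values: return whatever was collected.
    return result
```

Each of the three loops touches at most N items, which is where the O(N) time and O(N) space estimates come from.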
|
|
||
| ## Heap | ||
|
|
||
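| An approach that uses a min-heap. | ||
| Push (frequency, number) tuples onto the min-heap, and call heappop whenever the heap holds more than k elements. In other words, after each step the heap contains only the top k frequent elements seen so far. | ||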
|
|
||
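| Time complexity: O(N log K) | ||
| - The heap holds at most K elements after each pop, so each push or pop costs O(log K) and the total should be O(N log K). | ||
| Space complexity: O(N + K) | ||
| - Up to N entries in the frequency dict, plus up to K entries in the heap. | ||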
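As a rough sketch of the bounded min-heap described above (the name `top_k_by_min_heap` and the use of `collections.Counter` are assumptions for this example, not taken from the PR):

```python
import heapq
from collections import Counter
from typing import List, Tuple


def top_k_by_min_heap(nums: List[int], k: int) -> List[int]:
    num_to_frequency = Counter(nums)

    heap: List[Tuple[int, int]] = []
    for num, frequency in num_to_frequency.items():
        heapq.heappush(heap, (frequency, num))
        # Once the heap grows past k entries, evict the least frequent one.
        if len(heap) > k:
            heapq.heappop(heap)

    # Popping the survivors yields them from least to most frequent.
    result: List[int] = []
    while heap:
        result.append(heapq.heappop(heap)[1])
    return result
```

Because the smallest frequency always sits at the root, the eviction step only ever discards an element that cannot be in the top k.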
|
|
||
| ## Sort | ||
|
|
||
| The straightforward approach of sorting. | ||
| Append [frequency, number] lists to a list and sort it. (Later I changed the appended values from lists to tuples.) | ||
|
|
||
| Time complexity: O(N log N) | ||
| - A list with N elements is sorted, so it should be O(N log N). | ||
| Space complexity: O(N) | ||
| - Because a list with N elements is created. | ||
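The sorting approach above might look like the following minimal sketch. The function name `top_k_by_sorting` is hypothetical, and `collections.Counter` is used here only to keep the example short.

```python
from collections import Counter
from typing import List


def top_k_by_sorting(nums: List[int], k: int) -> List[int]:
    num_to_frequency = Counter(nums)
    # Sort (frequency, num) tuples in descending order.
    # Tuples compare by frequency first, then by num to break ties.
    frequency_num_pairs = sorted(
        ((frequency, num) for num, frequency in num_to_frequency.items()),
        reverse=True,
    )
    return [num for _, num in frequency_num_pairs[:k]]
```

Only the distinct values are sorted, so the O(N log N) bound is reached when every element of nums is unique.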
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         count = {} | ||
|         for num in nums: | ||
|             count[num] = 1 + count.get(num, 0) | ||
|
|
||
|         heap = [] | ||
|         for num in count.keys(): | ||
|             heapq.heappush(heap, (count[num], num)) | ||
|             if len(heap) > k: | ||
|                 heapq.heappop(heap) | ||
|
|
||
|         res = [] | ||
|         for i in range(k): | ||
|             res.append(heapq.heappop(heap)[1]) | ||
|
|
||
|         return res | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         num_to_frequency = {} | ||
|         frequency_to_nums = [ [] for i in range(len(nums) + 1)] | ||
|
Review comment: If elements that occur freq times are stored in frequency_to_nums[freq - 1], this array only needs length len(nums).
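One way to read the suggestion above is the sketch below. It is an illustration only: the function name `top_k_with_shifted_buckets` is hypothetical, and `collections.Counter` stands in for the PR's hand-built dict.

```python
from collections import Counter
from typing import List


def top_k_with_shifted_buckets(nums: List[int], k: int) -> List[int]:
    num_to_frequency = Counter(nums)

    # Values occurring `frequency` times live at index frequency - 1,
    # so exactly len(nums) buckets are needed and no slot is wasted on 0.
    frequency_to_nums: List[List[int]] = [[] for _ in range(len(nums))]
    for num, frequency in num_to_frequency.items():
        frequency_to_nums[frequency - 1].append(num)

    result: List[int] = []
    # Visit buckets from the highest frequency down.
    for bucket in reversed(frequency_to_nums):
        for num in bucket:
            if len(result) == k:
                return result
            result.append(num)
    return result
```

The trade-off is a one-off index shift in exchange for one fewer bucket, so whether it reads better is a matter of taste.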
|
||
|
|
||
|         for num in nums: | ||
|             num_to_frequency[num] = 1 + num_to_frequency.get(num, 0) | ||
|         for num, frequency in num_to_frequency.items(): | ||
|             frequency_to_nums[frequency].append(num) | ||
|
Review comment on lines +3 to +9: I think the order "initialize num_to_frequency -> build num_to_frequency -> initialize frequency_to_nums -> build frequency_to_nums" would be easier on the reader's working memory and easier to read. |
||
|
|
||
|         res = [] | ||
|         for i in range(len(frequency_to_nums) - 1, 0, -1): | ||
|
Review comment: I would rename i to freq myself, but either is fine. |
||
|             for num in frequency_to_nums[i]: | ||
|                 res.append(num) | ||
|                 if len(res) == k: | ||
|                     return res | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         count = {} | ||
|         for num in nums: | ||
|             count[num] = 1 + count.get(num, 0) | ||
|
|
||
|         heap = [] | ||
|         for num in count.keys(): | ||
|             heapq.heappush(heap, (count[num], num)) | ||
|             if len(heap) > k: | ||
|                 heapq.heappop(heap) | ||
|
|
||
|         res = [] | ||
|         for _ in range(k): | ||
|             res.append(heapq.heappop(heap)[1]) | ||
|         return res |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         counts = {} | ||
|         for num in nums: | ||
|             counts[num] = 1 + counts.get(num, 0) | ||
|         items = list(counts.items()) | ||
|         n = len(items) | ||
|
|
||
|         if k >= n: | ||
|             return [num for num, _ in items] | ||
|
|
||
|         target = n - k | ||
|
|
||
|         def partition(left: int, right: int, pivot_index: int) -> int: | ||
|             pivot_freq = items[pivot_index][1] | ||
|
|
||
|             items[pivot_index], items[right] = items[right], items[pivot_index] | ||
|             store_index = left | ||
|
|
||
|             for i in range(left, right): | ||
|                 if items[i][1] < pivot_freq: | ||
|                     items[store_index], items[i] = items[i], items[store_index] | ||
|                     store_index += 1 | ||
|             items[store_index], items[right] = items[right], items[store_index] | ||
|             return store_index | ||
|
|
||
|         def quickselect(left: int, right: int, k_smallest: int) -> None: | ||
|             if left == right: | ||
|                 return | ||
|             pivot_index = random.randint(left, right) | ||
|             pivot_index = partition(left, right, pivot_index) | ||
|
|
||
|             if k_smallest == pivot_index: | ||
|                 return | ||
|             elif k_smallest < pivot_index: | ||
|                 quickselect(left, pivot_index - 1, k_smallest) | ||
|             else: | ||
|                 quickselect(pivot_index + 1, right, k_smallest) | ||
|
|
||
|         quickselect(0, n - 1, target) | ||
|         top_k_nums = [num for num, _ in items[target:]] | ||
|         return top_k_nums | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         num_to_frequency = {} | ||
|         for num in nums: | ||
|             num_to_frequency[num] = 1 + num_to_frequency.get(num, 1) | ||
|
Review comment: Shouldn't the default value be 0? |
||
|
|
||
|         frequency_to_num_array = [] | ||
|
Review comment: This can also be written more cleanly using sorted() and a lambda. |
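A minimal sketch of what the sorted() and lambda version could look like is shown below. The function name `top_k_with_sorted` is hypothetical, and this is only one possible reading of the suggestion.

```python
from typing import Dict, List


def top_k_with_sorted(nums: List[int], k: int) -> List[int]:
    num_to_frequency: Dict[int, int] = {}
    for num in nums:
        num_to_frequency[num] = 1 + num_to_frequency.get(num, 0)

    # Iterating a dict yields its keys, so this sorts the distinct values
    # by their frequency, highest first.
    nums_by_frequency = sorted(
        num_to_frequency,
        key=lambda num: num_to_frequency[num],
        reverse=True,
    )
    return nums_by_frequency[:k]
```

Sorting the keys directly removes the need for the intermediate [frequency, num] list and the pop() loop.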
||
|         for num, frequency in num_to_frequency.items(): | ||
|             frequency_to_num_array.append([frequency, num]) | ||
|         frequency_to_num_array.sort() | ||
|
|
||
|         res = [] | ||
|         while len(res) < k: | ||
|             res.append(frequency_to_num_array.pop()[1]) | ||
|         return res | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         counts = {} | ||
|         frequency_buckets = [[] for _ in range(len(nums) + 1)] | ||
|
|
||
|         for num in nums: | ||
|             counts[num] = 1 + counts.get(num, 0) | ||
|         for num, cnt in counts.items(): | ||
|             frequency_buckets[cnt].append(num) | ||
|         result = [] | ||
|         for frequency_value in range(len(frequency_buckets) - 1, 0, -1): | ||
|             for num in frequency_buckets[frequency_value]: | ||
|                 result.append(num) | ||
|                 if len(result) == k: | ||
|
Review comment: Beyond just solving the problem, it is worth thinking one step further. For example, if this function came to be widely used and nums had fewer than k unique elements, the if len(result) == k branch would never be reached and the function would implicitly return None. Is that desirable? Robustness against unexpected input is also useful. In the future nums might be [], or occasionally even None; writing if not nums: return [] at the top at least avoids raising an error.
https://discord.com/channels/1084280443945353267/1367399154200088626/1371325723612151918 |
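The two concerns raised above could be addressed along these lines. This is a sketch under assumptions: the function name `top_k_frequent_robust` is hypothetical, and returning the partial result when there are fewer than k unique values is one possible policy, not something the PR or the reviewer prescribes.

```python
from typing import Dict, List, Optional


def top_k_frequent_robust(nums: Optional[List[int]], k: int) -> List[int]:
    # Treat None and [] alike: there is nothing to count.
    if not nums:
        return []

    counts: Dict[int, int] = {}
    for num in nums:
        counts[num] = 1 + counts.get(num, 0)

    frequency_buckets: List[List[int]] = [[] for _ in range(len(nums) + 1)]
    for num, count in counts.items():
        frequency_buckets[count].append(num)

    result: List[int] = []
    for frequency in range(len(frequency_buckets) - 1, 0, -1):
        for num in frequency_buckets[frequency]:
            if len(result) == k:
                return result
            result.append(num)
    # Reached when nums has k or fewer unique values:
    # return the list explicitly instead of falling through to None.
    return result
```

Raising an exception for invalid input would be an equally defensible choice; the point is that the behavior is decided explicitly.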
||
|                     return result | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         counts = {} | ||
|
Review comment: Given how it is used here, counts feels like it carries too little information, so I would name it something like num_to_frequency. |
||
|         for num in nums: | ||
|             counts[num] = 1 + counts.get(num, 0) | ||
|         heap = [] | ||
|         for num, freq in counts.items(): | ||
|             heapq.heappush(heap, (freq, num)) | ||
|             if len(heap) > k: | ||
|                 heapq.heappop(heap) | ||
|         result = [] | ||
|         while heap: | ||
|             result.append(heapq.heappop(heap)[1]) | ||
|         return result | ||
|
Review comment on lines +11 to +14: The problem statement says the answer may be returned in any order, so return heap would also be fine.
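One caveat when applying the suggestion above: in this file the heap stores (freq, num) tuples, so a literal return heap would hand back pairs rather than numbers. A hedged sketch of the "skip the ordered pops" idea follows; the helper name `nums_from_heap` is hypothetical.

```python
import heapq
from typing import List, Tuple


def nums_from_heap(heap: List[Tuple[int, int]]) -> List[int]:
    # The heap stores (frequency, num) tuples, so keep only the num part.
    # Iterating the underlying list visits entries in heap order,
    # which is not fully sorted; that is fine when any order is accepted.
    return [num for _, num in heap]


# Usage: a heap that already holds the two surviving entries.
heap: List[Tuple[int, int]] = []
for entry in [(3, 1), (2, 2)]:
    heapq.heappush(heap, entry)
print(sorted(nums_from_heap(heap)))  # [1, 2]
```

This replaces k pops at O(log k) each with a single O(k) pass over the list.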
Review comment on lines +6 to +14: heapq.nlargest is equivalent to this and more concise.
return heapq.nlargest(k, counts, key=counts.get)
https://docs.python.org/ja/3/library/heapq.html#heapq.nlargest
Reply from the author: I did not know about this. That helps, thank you! |
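For reference, the reviewer's one-liner can be placed in a complete function roughly as follows. The wrapper name `top_k_with_nlargest` is hypothetical; the `heapq.nlargest(k, counts, key=counts.get)` call is the one quoted in the comment.

```python
import heapq
from typing import Dict, List


def top_k_with_nlargest(nums: List[int], k: int) -> List[int]:
    counts: Dict[int, int] = {}
    for num in nums:
        counts[num] = 1 + counts.get(num, 0)
    # Iterating a dict yields its keys; key=counts.get ranks each key
    # by its count. The result is ordered from most to least frequent.
    return heapq.nlargest(k, counts, key=counts.get)
```

Unlike the manual min-heap loop, which pops from least to most frequent, this returns the values with the most frequent first.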
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         counts = {} | ||
|         for num in nums: | ||
|             counts[num] = 1 + counts.get(num, 0) | ||
|
Review comment on lines +3 to +5: When doing this in Python, the Counter class from the standard collections module is convenient.
import collections
counts = collections.Counter(nums)
https://docs.python.org/ja/3.13/library/collections.html#collections.Counter
Reply from the author: Thank you!
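A short sketch of how Counter could replace the counting loop is given below. The function name `top_k_with_counter` is hypothetical, and the use of `most_common` goes one step beyond what the reviewer suggested, so treat it as an optional extra.

```python
import collections
from typing import List


def top_k_with_counter(nums: List[int], k: int) -> List[int]:
    # Counter builds the same value -> count mapping as the manual loop.
    counts = collections.Counter(nums)
    # most_common(k) returns the k (value, count) pairs with the highest counts,
    # ordered from most to least common.
    return [num for num, _ in counts.most_common(k)]
```

As the study notes point out, it still pays to understand the dict-based counting that Counter performs underneath.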
|
||
|         freq_num_pairs = [(cnt, num) for num, cnt in counts.items()] | ||
|
Review comment: When a variable name is an abbreviated English word, like freq, the reader has to guess the original word, which can increase cognitive load. As a rule, I recommend spelling names out in full. As for num and cnt, they are throwaway temporaries, and the key and value of counts can be read from the code just above, so I personally find them acceptable. |
||
|         freq_num_pairs.sort(reverse=True) | ||
|         result = [num for _, num in freq_num_pairs[:k]] | ||
|
Review comment: [no text captured]
Review comment: Sorry, the comment above is wrong... I had not understood the original code correctly.
Review comment: Dropping result is a reasonable option. It packs things into one line, but this is about the limit of what I find acceptable. |
||
|         return result | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| class Solution: | ||
|     def topKFrequent(self, nums: List[int], k: int) -> List[int]: | ||
|         counts = {} | ||
|         for num in nums: | ||
|             counts[num] = 1 + counts.get(num, 0) | ||
|         frequency_num_pairs = [] | ||
|         for key, cnt in counts.items(): | ||
|             frequency_num_pairs.append((cnt, key)) | ||
|         frequency_num_pairs.sort(reverse=True) | ||
|         return [num for _, num in frequency_num_pairs[:k]] |
Review comment: In the standard workflow for this study group, I believe Step 2 involves looking at other participants' code and at the section of the comment collection for the problem you are solving.
Listing the URLs of what you looked at, each paired with your impressions of it, would help reviewers.
Reply: I have added notes on the parts of other people's solutions that I looked at.
I also tried a solution using QuickSelect!
https://github.com/t-ooka/leetcode/blob/question/Top-K-Frequent-Elements/Top%20K%20Frequent%20Elements%20(retry)/step2-using-quickselect.py