569 changes: 569 additions & 0 deletions Source/DigitViewer2/DigitScanner/DigitScanner.cpp

Large diffs are not rendered by default.

28 changes: 28 additions & 0 deletions Source/DigitViewer2/DigitScanner/DigitScanner.h
@@ -0,0 +1,28 @@
/* DigitScanner.h
*
* Author : Michael Kleber
* Date Created : 01/15/2026
* Last Modified : 01/15/2026
* Copyright 2026 Google LLC
*
*/

#pragma once
#include "PublicLibs/Types.h"

namespace DigitViewer2 {
using namespace ymp;

class BasicDigitReader;

class DigitScanner {
public:
DigitScanner(BasicDigitReader& reader, upL_t d);
void search();

private:
BasicDigitReader& m_reader;
upL_t m_d;
};

}
62 changes: 62 additions & 0 deletions Source/DigitViewer2/DigitScanner/README.md
@@ -0,0 +1,62 @@
Scanning for All Strings of Digits
========
by Michael Kleber

Code in this directory implements a way to scan through a large file of digits until _every_ sequence of $d$ digits has appeared.

Are you wondering "Does my 10-digit phone number appear in the digits of pi?"
Yes, it does: somewhere in the first 241,641,121,048 digits.
What about your 16-digit credit card number?
I don't know — we haven't calculated enough digits of pi to see every 16-digit number.
(Yet.)

## Background

Pi, and many other numbers you can compute with y-cruncher, are believed to be [normal numbers](https://en.wikipedia.org/wiki/Normal_number).
This would mean that every sequence of $d$ decimal digits should appear in it, in approximately $1/(10^d)$ of the possible locations.
(That's what you would expect if the digits were random... and we have every reason to believe that pi's digits behave like random ones _from this particular point of view_.)

That leads to a very natural question:
"Out of the $10^d$ sequences of $d$ digits, which one takes the longest to appear, and how many digits does it take?"

* For $d=1$, the digit 0 is the last one to show up in pi, all the way out at the 32nd place after the decimal point: 3.1415926535897932384626433832795**0**2...
* For $d=2$, you need to go out to 606 places before you finally see the two-digit sequence 68.
* For $d=3$ through $d=11$, here is the last $d$-digit sequence to appear, and how many digits of pi you need before you finally see it:
  * $d=3$: 483, after 8,555 digits
  * $d=4$: 6716, after 99,849 digits
  * $d=5$: 33394, after 1,369,564 digits
  * $d=6$: 569540, after 14,118,312 digits
  * $d=7$: 1075656, after 166,100,506 digits
  * $d=8$: 36432643, after 1,816,743,912 digits
  * $d=9$: 172484538, after 22,445,207,406 digits
  * $d=10$: 5918289042, after 241,641,121,048 digits
  * $d=11$: 56377726040, after 2,512,258,603,207 digits
* These are recorded in the [On-line Encyclopedia of Integer Sequences](https://oeis.org/) as entries [A036903](https://oeis.org/A036903) and [A032510](https://oeis.org/A032510).

With 314 trillion random digits, there is around a
[79% chance](https://www.wolframalpha.com/input?i=N%5Bexp%28-n+exp%28-w%2Fn%29%29%5D+where+n+%3D+10%5E13+and+w+%3D+314+trillion)
of seeing all strings of length 13.
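
The 79% figure follows from a standard back-of-the-envelope model, sketched here under the assumption that the digits behave like independent, uniformly random digits. With $N = 10^d$ possible strings and $w$ digits, a particular string is absent from every window with probability roughly $(1 - 1/N)^w \approx e^{-w/N}$ (ignoring the correlation between overlapping windows). The number of missing strings is then approximately Poisson with mean $N e^{-w/N}$, so the chance that none are missing is about $\exp(-N e^{-w/N})$, which is the expression in the linked calculation. The function name below is hypothetical, not part of DigitViewer2:

```cpp
#include <cmath>

// Hypothetical helper (not part of DigitViewer2): approximate probability that
// w independent, uniformly random decimal digits contain every d-digit string,
// using the Poisson heuristic P = exp(-N * exp(-w / N)) with N = 10^d.
double prob_all_d_strings_seen(double w, int d) {
    const double n = std::pow(10.0, d);                   // number of distinct d-digit strings
    const double expected_missing = n * std::exp(-w / n); // mean number of strings never seen
    return std::exp(-expected_missing);                   // Poisson chance that none are missing
}
```

For example, `prob_all_d_strings_seen(314e12, 13)` comes out to about 0.79. For $d=16$ the same 314 trillion digits would leave roughly $9.7 \times 10^{15}$ strings expected to be missing, so the function returns 0, consistent with the credit-card answer above.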

## Algorithm

### Basic idea
To search for every string of $d$ digits:
* Make a bitvector of $10^d$ zeros.
* Look at the strings of $d$ consecutive digits one at a time (each one starting a digit later than the previous one), considered as a $d$-digit number $n$.
* If the $n$'th bit in the bitvector is a $0$, then you've found a new string!
  * Go you! Set that bit to $1$, and add one to the variable "how many strings I've found so far."
  * If that variable equals $10^d$, you've seen them all! Have a party.
* If the $n$'th bit in the bitvector is already a $1$, nothing to see here, move along.

If you have a lot of digits, a lot of memory, and a lot of time, this will do the job.
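
Here is a minimal single-threaded sketch of the basic idea, assuming the digits after the decimal point are already in memory as a `std::string` of `'0'` to `'9'` characters. `find_last_string` and `LastString` are hypothetical names, not the DigitScanner API, and reporting the 1-based position of a string's *final* digit is an assumed counting convention:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// The last d-digit string to appear, and the 1-based position of its final digit.
struct LastString {
    std::uint64_t value;
    std::uint64_t end_position;
};

// Hypothetical single-threaded sketch of the bitvector scan (not the DigitScanner API).
std::optional<LastString> find_last_string(const std::string& digits, int d) {
    std::uint64_t total = 1;                    // 10^d possible d-digit strings
    for (int i = 0; i < d; i++) total *= 10;

    std::vector<bool> seen(total, false);       // the bitvector of 10^d zeros
    std::uint64_t found = 0;                    // how many strings found so far
    std::uint64_t window = 0;                   // current d-digit number n

    for (std::size_t i = 0; i < digits.size(); i++) {
        // Slide the window one digit: append the new digit, drop the oldest one.
        window = (window * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
        if (i + 1 < static_cast<std::size_t>(d)) continue;  // fewer than d digits so far

        if (!seen[window]) {                    // a new string!
            seen[window] = true;
            if (++found == total)               // seen them all
                return LastString{window, static_cast<std::uint64_t>(i + 1)};
        }
    }
    return std::nullopt;                        // some d-digit string never appeared
}
```

On the first 40 digits of pi after the decimal point, `find_last_string("1415926535897932384626433832795028841971", 1)` returns the string 0 at position 32, matching the first bullet in the Background section.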

If you don't have $10^d$ bits of memory, then you could scan the digits more than once —
"Okay _this_ time I'm going to only pay attention to $d$-digit strings that start with a 7."
This multi-scan idea is not implemented here. Call a friend with more RAM.
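
For scale: $d=13$ needs $10^{13}$ bits, about 1.25 TB, while a pass restricted to one leading digit needs $10^{12}$ bits, about 125 GB. Below is a rough sketch of how the multi-scan idea could work; it is hypothetical (again, not implemented in DigitScanner), and `scan_one_pass` / `scan_multi_pass` are made-up names. Pass $k$ finds the last string starting with $k$ to appear. Since the overall last string is the one whose first appearance comes latest, it is whichever of the ten per-pass answers ends latest:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct LastString {
    std::uint64_t value;         // the d-digit string, as a number
    std::uint64_t end_position;  // 1-based position of its final digit
};

// Hypothetical sketch of one pass: only d-digit strings whose leading digit is
// `lead` are tracked, using a bitvector of 10^(d-1) bits.
std::optional<LastString> scan_one_pass(const std::string& digits, int d, int lead) {
    std::uint64_t total = 1;
    for (int i = 0; i < d; i++) total *= 10;
    const std::uint64_t per_pass = total / 10;  // strings with a given leading digit

    std::vector<bool> seen(per_pass, false);
    std::uint64_t found = 0, window = 0;
    for (std::size_t i = 0; i < digits.size(); i++) {
        window = (window * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
        if (i + 1 < static_cast<std::size_t>(d)) continue;
        if (window / per_pass != static_cast<std::uint64_t>(lead)) continue;  // not this pass's string

        const std::uint64_t rest = window % per_pass;  // the remaining d-1 digits
        if (!seen[rest]) {
            seen[rest] = true;
            if (++found == per_pass)
                return LastString{window, static_cast<std::uint64_t>(i + 1)};
        }
    }
    return std::nullopt;
}

// Ten passes; the overall last string is the per-pass result that ends latest.
std::optional<LastString> scan_multi_pass(const std::string& digits, int d) {
    std::optional<LastString> latest;
    for (int lead = 0; lead <= 9; lead++) {
        std::optional<LastString> r = scan_one_pass(digits, d, lead);
        if (!r) return std::nullopt;  // some string with this leading digit never appeared
        if (!latest || r->end_position > latest->end_position) latest = r;
    }
    return latest;
}
```

The trade-off is time for memory: each pass rereads every digit, so ten passes cost roughly ten times the I/O of a single scan.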

### Parallelization and efficiency
To run this search faster, we use many threads, and also a bitvector built on top of atomic values so that the threads
don't corrupt one another's work or fight about which of them should increment the found-strings counter.
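
One way to build such a bitvector is sketched below. This illustrates the general technique and is not a copy of the code in DigitScanner.cpp. The bits are stored in 64-bit `std::atomic` words and set with `fetch_or`, which returns the word's previous value. Exactly one thread observes any given bit changing from 0 to 1, so only that thread increments the counter. `AtomicBitvector` and `mark_all_concurrently` are hypothetical names:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sketch: a bitvector of atomic 64-bit words.
class AtomicBitvector {
public:
    explicit AtomicBitvector(std::uint64_t bits) : m_words((bits + 63) / 64) {}  // all zeros

    // Sets bit n; returns true only for the one caller that changed it from 0 to 1.
    bool test_and_set(std::uint64_t n) {
        const std::uint64_t mask = std::uint64_t{1} << (n % 64);
        const std::uint64_t old = m_words[n / 64].fetch_or(mask, std::memory_order_relaxed);
        return (old & mask) == 0;
    }

private:
    std::vector<std::atomic<std::uint64_t>> m_words;
};

// Every thread tries to mark all n in [0, count). The counter still ends at
// exactly count, because only the thread that flips a bit increments it.
std::uint64_t mark_all_concurrently(std::uint64_t count, unsigned threads) {
    AtomicBitvector bits(count);
    std::atomic<std::uint64_t> found{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; t++) {
        pool.emplace_back([&bits, &found, count] {
            for (std::uint64_t n = 0; n < count; n++)
                if (bits.test_and_set(n)) found.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (std::thread& th : pool) th.join();
    return found.load();
}
```

Relaxed ordering is enough for this sketch: each bit's 0-to-1 transition is decided by a single atomic read-modify-write, and the final count is read after `join()`, which synchronizes with the finished threads.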

We stop that approach when the bitvector is getting close to all 1's, and switch to a new phase where we track the arrival
of the last few thousand strings in a (mutex-guarded) hash map that remembers the position at which each of those strings first appears.
This lets us keep using many threads and still find out which string took the longest to first show up.
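
A minimal sketch of this second phase, under a few assumptions: the strings still missing at the cutover are handed over as a list of numbers, the digits are in memory, and each thread scans one contiguous slice (for simplicity, from the start of `digits`). Because slices finish in no particular order, `record` keeps the *earliest* position seen for each string, and the answer is the string whose earliest position is latest. A tiny one-hash Bloom filter (a plain bit array) sits in front of the hash set; the README notes that the real hash map phase also uses a Bloom filter, but the size and hash function here are arbitrary. `LateArrivals` and `scan_late_phase` are hypothetical names, not the DigitScanner API:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical sketch of the hash-map phase. `missing` lists the d-digit strings,
// as numbers, that were still unseen when the bitvector phase stopped.
class LateArrivals {
public:
    explicit LateArrivals(const std::vector<std::uint64_t>& missing)
        : m_missing(missing.begin(), missing.end()), m_filter(kFilterBits, false) {
        for (std::uint64_t n : missing) m_filter[bucket(n)] = true;
    }

    // Read-only after construction, so many threads may call it at once.
    // The Bloom filter rejects most strings before the hash-set lookup.
    bool is_missing(std::uint64_t n) const {
        return m_filter[bucket(n)] && m_missing.count(n) != 0;
    }

    // Remember the earliest position at which string n was seen.
    void record(std::uint64_t n, std::uint64_t pos) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto [it, inserted] = m_first_seen.emplace(n, pos);
        if (!inserted) it->second = std::min(it->second, pos);
    }

    // {string, position} of the string whose first appearance is latest,
    // or nothing if some missing string never showed up.
    std::optional<std::pair<std::uint64_t, std::uint64_t>> last() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_missing.empty() || m_first_seen.size() != m_missing.size()) return std::nullopt;
        std::pair<std::uint64_t, std::uint64_t> best{0, 0};
        for (const auto& [n, pos] : m_first_seen)
            if (pos > best.second) best = {n, pos};
        return best;
    }

private:
    static constexpr std::size_t kFilterBits = std::size_t{1} << 16;
    static std::size_t bucket(std::uint64_t n) {  // multiplicative hash, top 16 bits
        return static_cast<std::size_t>((n * 0x9E3779B97F4A7C15ull) >> 48);
    }
    std::unordered_set<std::uint64_t> m_missing;
    std::vector<bool> m_filter;
    mutable std::mutex m_mutex;
    std::unordered_map<std::uint64_t, std::uint64_t> m_first_seen;
};

// Split the windows across `threads` workers; a position is the 1-based index
// of a window's final digit.
void scan_late_phase(const std::string& digits, int d, LateArrivals& late, unsigned threads) {
    const std::size_t dd = static_cast<std::size_t>(d);
    if (digits.size() < dd) return;
    std::uint64_t total = 1;
    for (int i = 0; i < d; i++) total *= 10;
    const std::size_t first = dd - 1, count = digits.size() - first;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; t++) {
        const std::size_t lo = first + count * t / threads;
        const std::size_t hi = first + count * (t + 1) / threads;
        pool.emplace_back([&digits, &late, dd, total, lo, hi] {
            std::uint64_t n = 0;  // preload the d-1 digits before this slice
            for (std::size_t j = lo + 1 - dd; j < lo; j++)
                n = n * 10 + static_cast<std::uint64_t>(digits[j] - '0');
            for (std::size_t i = lo; i < hi; i++) {
                n = (n * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
                if (late.is_missing(n)) late.record(n, i + 1);
            }
        });
    }
    for (std::thread& th : pool) th.join();
}
```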

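The bitvector phase of the search is sped up by issuing memory prefetch hints. Without them, the CPU spends nearly all its time
waiting on randomly placed individual bits scattered across a very large span of memory, which is about the worst possible access
pattern for memory latency; prefetching lets many of those requests be in flight at once.
The hash map phase uses a quick little Bloom filter, so that most strings can be rejected without consulting the hash map.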
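
A rough, single-threaded sketch of software prefetching in a bitvector scan, assuming GCC or Clang (`__builtin_prefetch` is a compiler builtin; other compilers get a no-op here). Window values are computed a block at a time, and while processing entry `k` the code prefetches the bitvector word that entry `k + kLookahead` will touch, so upcoming cache misses overlap with current work. `count_distinct_prefetched`, the block size, and the lookahead distance are hypothetical, untuned choices:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hint that the cache line holding p will soon be written (GCC/Clang only).
inline void prefetch_for_write(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 1 /* prepare for write */, 0 /* no temporal locality */);
#else
    (void)p;  // no portable prefetch: do nothing
#endif
}

// Hypothetical sketch: count distinct d-digit strings with a plain bitvector of
// 64-bit words, prefetching a fixed distance ahead.
std::uint64_t count_distinct_prefetched(const std::string& digits, int d) {
    constexpr std::size_t kBlock = 4096;    // window values computed per block
    constexpr std::size_t kLookahead = 16;  // how far ahead to prefetch (untuned)

    std::uint64_t total = 1;
    for (int i = 0; i < d; i++) total *= 10;
    std::vector<std::uint64_t> words((total + 63) / 64, 0);
    std::vector<std::uint64_t> block;
    block.reserve(kBlock);

    std::uint64_t found = 0, window = 0;
    std::size_t i = 0;
    while (i < digits.size()) {
        // 1. Compute the next block of window values (cheap, sequential reads).
        block.clear();
        for (; i < digits.size() && block.size() < kBlock; i++) {
            window = (window * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
            if (i + 1 >= static_cast<std::size_t>(d)) block.push_back(window);
        }
        // 2. Test-and-set bits, prefetching the word needed kLookahead entries later.
        for (std::size_t k = 0; k < block.size(); k++) {
            if (k + kLookahead < block.size()) prefetch_for_write(&words[block[k + kLookahead] / 64]);
            const std::uint64_t n = block[k];
            const std::uint64_t mask = std::uint64_t{1} << (n % 64);
            if ((words[n / 64] & mask) == 0) {
                words[n / 64] |= mask;
                found++;
            }
        }
    }
    return found;
}
```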

The cutover point between the two search phases and the details of the memory prefetch hints are sensitive to the exact
hardware you're running on. If you plan to use this for large $d$ (say, 10 or more), you may benefit from tuning
these details for your setup.
14 changes: 14 additions & 0 deletions Source/DigitViewer2/DigitViewer/DigitViewerTasks.cpp
@@ -30,6 +30,7 @@
#include "DigitViewer2/DigitWriters/BasicDigitWriter.h"
#include "DigitViewer2/DigitWriters/BasicTextWriter.h"
#include "DigitViewer2/DigitWriters/BasicYcdSetWriter.h"
#include "DigitViewer2/DigitScanner/DigitScanner.h"
#include "DigitViewerTasks.h"
namespace DigitViewer2{
////////////////////////////////////////////////////////////////////////////////
@@ -479,8 +480,21 @@ void to_ycd_file_partial(BasicDigitReader& reader){
);
process_write(reader, start_pos, end_pos - start_pos, writer, start_pos);
}
void find_last_d_string(BasicDigitReader& reader){
Console::println("\n\nFind Last d-Digit String");
Console::println();

// Get d from the user.
upL_t d = Console::scan_label_upL_range("Enter d (1-13): ", 1, 13);
Console::println();

DigitScanner scanner(reader, d);
scanner.search();
}
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}


2 changes: 2 additions & 0 deletions Source/DigitViewer2/DigitViewer/DigitViewerTasks.h
@@ -25,9 +25,11 @@ void compute_stats(BasicDigitReader& reader);
void to_text_file(BasicDigitReader& reader);
void to_ycd_file_all(BasicDigitReader& reader);
void to_ycd_file_partial(BasicDigitReader& reader);
void find_last_d_string(BasicDigitReader& reader);
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}
#endif

19 changes: 16 additions & 3 deletions Source/DigitViewer2/DigitViewer/DigitViewerUI2.cpp
@@ -52,9 +52,11 @@ void Menu_TextFile(BasicTextReader& reader){
Console::println("Compress digits 1 - N into one or more .ycd files.", 'G');
Console::print(" 4 ", 'w');
Console::println("Compress a subset of digits into .ycd files.", 'G');
Console::print(" 5 ", 'w');
Console::println("Search for all d-digit strings.", 'G');

Console::println("\nEnter your choice:", 'w');
-upL_t c = Console::scan_label_upL_range("option: ", 0, 4);
+upL_t c = Console::scan_label_upL_range("option: ", 0, 5);
Console::println();

switch (c){
@@ -73,6 +75,9 @@ void Menu_TextFile(BasicTextReader& reader){
case 4:
to_ycd_file_partial(reader);
return;
case 5:
find_last_d_string(reader);
return;
default:;
}
}
@@ -115,14 +120,16 @@ void Menu_YcdFile(BasicYcdSetReader& reader){
Console::println("Compress digits 1 - N into one or more .ycd files.", 'G');
Console::print(" 4 ", 'w');
Console::println("Compress a subset of digits into .ycd files.", 'G');
Console::print(" 5 ", 'w');
Console::println("Search for all d-digit strings.", 'G');
Console::println();

-Console::print(" 5 ", 'w');
+Console::print(" 6 ", 'w');
Console::print("Add search directory.", 'G');
Console::println(" (if .ycd files are in multiple paths)", 'Y');

Console::println("\nEnter your choice:", 'w');
-upL_t c = Console::scan_label_upL_range("option: ", 0, 5);
+upL_t c = Console::scan_label_upL_range("option: ", 0, 6);
Console::println();

switch (c){
@@ -142,6 +149,10 @@ void Menu_YcdFile(BasicYcdSetReader& reader){
to_ycd_file_partial(reader);
return;
case 5:
find_last_d_string(reader);
return;

case 6:
Console::println("\nEnter directory:");
reader.add_search_path(Console::scan_utf8());
break;
@@ -200,3 +211,5 @@ void Menu_Main(){
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}


3 changes: 3 additions & 0 deletions Source/DigitViewer2/Objects.mk
@@ -23,9 +23,12 @@ CURRENT += DigitWriters/BasicTextWriter.cpp
CURRENT += DigitWriters/BasicYcdFileWriter.cpp
CURRENT += DigitWriters/BasicYcdSetWriter.cpp

CURRENT += DigitScanner/DigitScanner.cpp

CURRENT += DigitViewer/DigitViewerTasks.cpp
CURRENT += DigitViewer/DigitViewerUI2.cpp


SOURCES := $(SOURCES) $(addprefix $(CURRENT_DIR)/, $(CURRENT))
endif

2 changes: 2 additions & 0 deletions Source/DigitViewer2/SMC_DigitViewer2.cpp
@@ -26,3 +26,5 @@

#include "DigitViewer/DigitViewerTasks.cpp"
#include "DigitViewer/DigitViewerUI2.cpp"

#include "DigitScanner/DigitScanner.cpp"
7 changes: 7 additions & 0 deletions Source/PublicLibs/BasicLibs/StringTools/ToString.cpp
@@ -216,6 +216,13 @@ YM_NO_INLINE std::string tostrln(uiL_t x, NumberFormat format){
YM_NO_INLINE std::string tostrln(siL_t x, NumberFormat format){
return tostr(x, format) += "\r\n";
}
YM_NO_INLINE std::string tostr_width(uiL_t x, int width){
std::ostringstream out;
out << std::setfill('0');
out << std::setw(width);
out << x;
return out.str();
}
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
1 change: 1 addition & 0 deletions Source/PublicLibs/BasicLibs/StringTools/ToString.h
@@ -40,6 +40,7 @@ YM_NO_INLINE std::string tostrln (uiL_t x, NumberFormat format = NORMAL);
YM_NO_INLINE std::string tostrln (siL_t x, NumberFormat format = NORMAL);
static std::string tostrln (u32_t x, NumberFormat format = NORMAL){ return tostrln((uiL_t)x, format); }
static std::string tostrln (s32_t x, NumberFormat format = NORMAL){ return tostrln((siL_t)x, format); }
YM_NO_INLINE std::string tostr_width (uiL_t x, int width);
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
// Float
1 change: 1 addition & 0 deletions TinyTestData/README.md
@@ -0,0 +1 @@
Minimal .ycd file of 1 million decimal digits, just to have for testing purposes.
Binary file added TinyTestData/pi1m - 0.ycd
Binary file not shown.
32 changes: 17 additions & 15 deletions VSS - DigitViewer2/DigitViewer2/DigitViewer2.vcxproj
@@ -62,102 +62,102 @@
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{78460907-F11F-45DF-A8B3-BCF1D8E54EC5}</ProjectGuid>
<RootNamespace>DigitViewer2</RootNamespace>
-<WindowsTargetPlatformVersion>10.0.17763.0</WindowsTargetPlatformVersion>
+<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='04-SSE3|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='07-Penryn|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='13-Haswell|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='17-Skylake|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='00-x86|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='04-SSE3|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='07-Penryn|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='13-Haswell|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='17-Skylake|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='00-x86|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
-<PlatformToolset>v141</PlatformToolset>
+<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
@@ -564,6 +564,7 @@
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\BasicYcdSetReader.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\InconsistentMetadataException.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\ParsingTools.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerTasks.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerUI2.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitWriters\BasicTextWriter.cpp" />
@@ -699,6 +700,7 @@
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\BasicYcdSetReader.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\InconsistentMetadataException.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\ParsingTools.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerTasks.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerUI2.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitWriters\BasicDigitWriter.h" />
6 changes: 6 additions & 0 deletions VSS - DigitViewer2/DigitViewer2/DigitViewer2.vcxproj.filters
@@ -401,6 +401,9 @@
<ClCompile Include="..\..\Source\PublicLibs\SystemLibs\FileIO\BufferredStreamFile.cpp">
<Filter>Source Files\PublicLibs\SystemLibs\FileIO</Filter>
</ClCompile>
<ClCompile Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\..\Source\DigitViewer2\DigitCount\DigitCount.h">
@@ -769,5 +772,8 @@
<ClInclude Include="..\..\Source\PublicLibs\SystemLibs\FileIO\BaseFile\BaseFile_Default.h">
<Filter>Source Files\PublicLibs\SystemLibs\FileIO\BaseFile</Filter>
</ClInclude>
<ClInclude Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.h">
<Filter>Header Files</Filter>
</ClInclude>
</ItemGroup>
</Project>