| 1 | +--- |
| 2 | +article: |
| 3 | + publishedTime: "2025-11-25T02:55:00-08:00" |
| 4 | + modifiedTime: "2025-11-25T02:55:00-08:00" |
| 5 | + authors: ["Violet Monserate"] |
| 6 | + section: Class Projects |
| 7 | + tags: ["c", "c++", "gdb", "valgrind", "gitlab"] |
| 8 | +layout: '@components/MarkdownProjectLayout.astro' |
| 9 | +title: File Explorer |
| 10 | +description: Individual project for CSE 333 Systems Programming course, linking a web server to an inverted index of a file directory |
| 11 | +seoDescription: Individual project for CSE 333 Systems Programming course, linking a web server to an inverted index of a file directory |
| 12 | +image: |
| 13 | + src: "@assets/333gle-homepage.png" |
| 14 | + alt: "Homepage for 333gle: web interface for file explorer. The query is currently 'hello world' and shows a couple links related to the query hello world" |
| 15 | +startDate: '2025-01' |
| 16 | +finishDate: '2025-03' |
| 17 | +icons: ["c", "c++", "gdb", "valgrind", "gitlab"] |
| 18 | +--- |
| 19 | + |
| 20 | + |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## Key Tools |
| 25 | + |
| 26 | +To verify that my code worked correctly, I relied on two debugging tools: |
| 27 | +- **GDB**: Let me step through the program one instruction at a time and inspect its state, which was especially useful when tracking down segfaults. |
| 28 | +- **Valgrind**: Let me see how the program used memory and confirm that every allocation was freed correctly. |
| 29 | + |
| 30 | + |
| 31 | +## Homework 1: C Data Structures Implementation |
| 32 | + |
| 33 | +### Overview |
| 34 | +Implemented two fundamental C data structures from scratch: |
| 35 | +- **Doubly-linked list** with iterator support |
| 36 | +- **Chained hash table** with dynamic resizing |
| 37 | + |
| 38 | +### Key Features |
| 39 | +- **Generic payload support** for storing arbitrary data types |
| 40 | +- **Memory management** with proper malloc/free handling |
| 41 | +- **Iterator abstractions** for safe data structure traversal |
| 42 | +- **Robust error handling** using Verify333 assertions |
| 43 | + |
| 44 | +### Technical Implementation |
| 45 | +- **LinkedList**: Managed head/tail pointers with node splicing logic |
| 46 | +- **HashTable**: Used FNV hashing with separate chaining collision resolution |
| 47 | +- **Memory safety**: Comprehensive Valgrind testing for leaks and errors |
| 48 | +- **Code quality**: Followed Google C++ style guide with cpplint validation |
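As an illustration of the hashing step above, here is a minimal sketch of FNV hashing feeding a bucket lookup. It assumes the 64-bit FNV-1a variant, and the names `fnv1a_64` and `bucket_index` are hypothetical; the course's actual hash function and table layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// FNV-1a, 64-bit variant (assumed here): XOR each byte into the hash,
// then multiply by the FNV prime.
uint64_t fnv1a_64(const void *data, size_t len) {
  assert(data != NULL || len == 0);
  const unsigned char *bytes = data;
  uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (size_t i = 0; i < len; i++) {
    hash ^= bytes[i];
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Maps a hash to one of num_buckets chains. With separate chaining, every
// key that lands on the same index is stored in that bucket's linked list.
size_t bucket_index(uint64_t hash, size_t num_buckets) {
  assert(num_buckets > 0);
  return (size_t)(hash % num_buckets);
}
```

A lookup would hash the key, pick the chain with `bucket_index`, and then walk that chain comparing keys. Resizing changes `num_buckets`, so every element has to be re-bucketed afterwards.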
| 49 | + |
| 50 | +## Homework 2: In-Memory Search Engine |
| 51 | + |
| 52 | +### Overview |
| 53 | +Built a file system crawler, indexer, and query processor using HW1 data structures. |
| 54 | + |
| 55 | +### Components Implemented |
| 56 | + |
| 57 | +#### Part A: File Parser |
| 58 | +- **Text file ingestion** with memory-efficient string handling |
| 59 | +- **Word parsing** using alphabetic character separation |
| 60 | +- **Position tracking** with byte offset recording |
| 61 | +- **Case normalization** converting all words to lowercase |
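The parsing steps above can be sketched roughly as follows. This is an illustrative version, not the assignment's actual code: `WordSpan` and `parse_words` are hypothetical names, and the fixed-size output array stands in for whatever dynamic structure the real parser uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// One parsed word: its byte offset in the buffer and its length.
typedef struct {
  size_t offset;
  size_t length;
} WordSpan;

// Lowercases buf in place and records up to max_spans runs of alphabetic
// characters. Returns the number of spans written.
size_t parse_words(char *buf, size_t len, WordSpan *spans, size_t max_spans) {
  assert(buf != NULL && spans != NULL);
  size_t count = 0;
  size_t i = 0;
  while (i < len && count < max_spans) {
    // Skip separators: anything that is not a letter.
    while (i < len && !isalpha((unsigned char)buf[i])) {
      i++;
    }
    size_t start = i;
    // Consume one word, normalizing its case as we go.
    while (i < len && isalpha((unsigned char)buf[i])) {
      buf[i] = (char)tolower((unsigned char)buf[i]);
      i++;
    }
    if (i > start) {
      spans[count].offset = start;
      spans[count].length = i - start;
      count++;
    }
  }
  return count;
}
```

Recording offsets instead of copying each word keeps the parser from allocating per word; the caller can slice the buffer later using `offset` and `length`.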
| 62 | + |
| 63 | +#### Part B: Crawler and Indexer |
| 64 | +- **Recursive directory traversal** with document ID assignment |
| 65 | +- **Inverted index construction** mapping words → documents → positions |
| 66 | +- **Document table management** for filename ↔ docID bidirectional lookup |
| 67 | + |
| 68 | +#### Part C: Query Processor |
| 69 | +- **Multi-word query processing** with result intersection |
| 70 | +- **Ranking algorithm** based on term frequency summation |
| 71 | +- **Interactive shell** with console-based user interface |
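To make the intersection and ranking steps concrete, here is a small sketch for a two-word query. It assumes each word's matches arrive as an array sorted by document ID, which is a simplification of the hash-table-based index; `Posting` and `intersect_and_rank` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// One match: a document and how many times the query word(s) occur in it.
typedef struct {
  uint64_t doc_id;
  uint32_t rank;
} Posting;

// qsort comparator: higher rank first.
static int by_rank_desc(const void *x, const void *y) {
  const Posting *p = x;
  const Posting *q = y;
  return (p->rank < q->rank) - (p->rank > q->rank);
}

// Keeps only documents present in both a and b (each sorted by doc_id),
// sums their term frequencies, and orders the results by descending rank.
// out must have room for at least min(na, nb) entries.
size_t intersect_and_rank(const Posting *a, size_t na,
                          const Posting *b, size_t nb, Posting *out) {
  assert(a != NULL && b != NULL && out != NULL);
  size_t i = 0, j = 0, n = 0;
  while (i < na && j < nb) {
    if (a[i].doc_id < b[j].doc_id) {
      i++;
    } else if (a[i].doc_id > b[j].doc_id) {
      j++;
    } else {
      out[n].doc_id = a[i].doc_id;
      out[n].rank = a[i].rank + b[j].rank;
      n++;
      i++;
      j++;
    }
  }
  qsort(out, n, sizeof(Posting), by_rank_desc);
  return n;
}
```

A query with more than two words could feed the output of one call back in as an input to the next, after re-sorting it by `doc_id`.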
| 72 | + |
| 73 | +### Data Structures Used |
| 74 | +- Document table: Dual hash tables for bidirectional lookup |
| 75 | +- Inverted index: Nested hash tables (word → docID → positions) |
| 76 | +- Position tracking: Linked lists maintaining sorted offsets |
| 77 | + |
| 78 | +## Homework 3: Disk-Based Search Engine |
| 79 | + |
| 80 | +### Overview |
| 81 | +Extended the HW2 search engine to persistent storage using an architecture-neutral file format. |
| 82 | + |
| 83 | +### Components Implemented |
| 84 | + |
| 85 | +#### Part A: Index Marshaller |
| 86 | +- **Big-endian serialization** for cross-platform compatibility |
| 87 | +- **Complex file format** with header, doctable, and index regions |
| 88 | +- **Checksum validation** for data integrity verification |
| 89 | +- **Hierarchical data storage** maintaining in-memory structure relationships |
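The big-endian serialization listed above can be illustrated with a pair of helpers that write and read a 32-bit field byte by byte. This is a sketch under assumptions: `put_u32_be` and `get_u32_be` are hypothetical names, and the real marshaller may use byte-swapping routines instead of explicit shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes value into dst most-significant byte first, regardless of the
// host machine's byte order.
void put_u32_be(uint8_t *dst, uint32_t value) {
  assert(dst != NULL);
  dst[0] = (uint8_t)(value >> 24);
  dst[1] = (uint8_t)(value >> 16);
  dst[2] = (uint8_t)(value >> 8);
  dst[3] = (uint8_t)value;
}

// Reads a 32-bit big-endian field back into host order.
uint32_t get_u32_be(const uint8_t *src) {
  assert(src != NULL);
  return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
         ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}
```

Because the shifts operate on values rather than on memory layout, the same code produces identical bytes on little-endian and big-endian hosts. Writing the magic number 0xCAFEF00D this way yields the bytes `CA FE F0 0D` on disk.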
| 90 | + |
| 91 | +#### Part B: Index Reader |
| 92 | +- **C++ class hierarchy** for file-based data structure access |
| 93 | +- **Efficient lookup algorithms** for query processing |
| 94 | +- **Memory-mapped style access** without full file loading |
| 95 | + |
| 96 | +#### Part C: Multi-Index Search Shell |
| 97 | +- **Multiple index file support** for distributed searching |
| 98 | +- **Rank aggregation** across multiple corpora |
| 99 | +- **Interactive query interface** with result merging |
| 100 | + |
| 101 | +### File Format Features |
| 102 | +- Magic number identification (0xCAFEF00D) |
| 103 | +- Embedded hash tables with bucket chaining |
| 104 | +- Variable-length string storage |
| 105 | +- Position list compression and sorting |
| 106 | + |
| 107 | +## Final Project: Web Server Security & Session Management |
| 108 | + |
| 109 | +### Security Features Implemented |
| 110 | + |
| 111 | +#### Session Management |
| 112 | +- **Secure cookie generation** with session tracking |
| 113 | +- **HMAC-SHA256 protection** against cookie tampering |
| 114 | +- **Session validation** with cryptographic verification |
| 115 | + |
| 116 | +#### Authentication System |
| 117 | +- **Login page** with credential processing |
| 118 | +- **Admin cookie minting** for privileged access |
| 119 | +- **Plaintext authentication** (noted as a potential security concern) |
| 120 | + |
| 121 | +#### Access Control |
| 122 | +- **Admin-only routes** (`/quitquitquit` endpoint protection) |
| 123 | +- **Protected file access** for `$(BASE_DIR)/admin` contents |
| 124 | +- **Role-based authorization** using session cookies |
| 125 | + |
| 126 | +#### Administrative Features |
| 127 | +- **Server logging** of client requests and activities |
| 128 | +- **Admin dashboard** with system overview |
| 129 | +- **Navigation system** with role-appropriate links |
| 130 | + |
| 131 | +### Technical Implementation Details |
| 132 | +- **Cookie security**: HMAC verification prevents unauthorized modifications |
| 133 | +- **Access enforcement**: Session validation on protected endpoints |
| 134 | +- **User experience**: Seamless navigation between public and admin areas |
| 135 | +- **Monitoring**: Comprehensive request logging for administrative oversight |