Easy-Touch is a 2024-2025 College Student Innovation and Entrepreneurship Training Program project at Northeastern University. It aims to replace interaction methods that rely on depth sensors with a more easily deployable solution: an ordinary camera captures the laser point on the projected image, object detection and perspective transformation map the point's position to computer screen coordinates, and the system's low-level API is then called to perform click operations.
- YOLOv8-based Laser Point Detection: Uses a trained YOLO model to identify laser points in the projected image, minimizing reliance on specialized hardware.
- Automatic Projection Area Recognition: Combines OpenCV contour detection methods to extract the four corner points of the projection area for subsequent coordinate transformation.
- Perspective Transformation Coordinate Mapping: Uses perspective transformation to convert coordinates from the camera's perspective to the screen coordinate system, reducing position deviations caused by installation angles.
- Basic Interaction Simulation: Currently supports left-click, right-click, and double-click, utilizing a simple time threshold to reduce continuous false touches.
- Visual Control Panel: Provides a simple Tkinter-based GUI to pause recognition and adjust horizontal and vertical offsets.
This project mainly relies on the following core libraries:
- ultralytics (YOLOv8)
- opencv-python
- numpy
- keyboard
Installation method:
pip install -r src/requirements.txt

- Device Preparation: Project the computer screen onto a wall or curtain, connect a camera to the computer, and point it at the projection area.
- Model Preparation: Ensure that the trained laser pen recognition model weights are placed at src/model/best.pt.
- Start the System: Run the main program script.
python src/predictByCap.py
- Calibration and Operation:
- After startup, the program will automatically attempt to recognize the boundaries of the projection area.
- In the EasyTouch_V1.0 window that pops up, you can select the mouse operation you want to execute (Click / Right Click / Double Click).
- If you notice a slight deviation in the click position, you can calibrate in real time using the Horizontal Offset and Vertical Offset sliders at the bottom of the interface.
- Shine the laser pen within the projection area, and the system will respond with the corresponding mouse action in real time.
First, we collected and constructed a dataset of about 100 laser pen images.
💡 Tuning Tip: During the collection step, you can appropriately lower the camera's exposure. This can significantly highlight the relative brightness of the red laser without affecting the overall projection effect, thereby greatly improving the subsequent model's recognition accuracy.
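As a minimal sketch of how that exposure adjustment could be done with OpenCV (the device index and the exact property values are assumptions and are driver-dependent):

```python
import cv2

# Open the camera that watches the projection area (device index 0 is an assumption).
cap = cv2.VideoCapture(0)

# Many UVC cameras require disabling auto-exposure before a manual value takes effect.
# The values 0.25 and -7 are typical for some backends but may need tuning for your camera.
cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)
cap.set(cv2.CAP_PROP_EXPOSURE, -7)

ret, frame = cap.read()
if ret:
    cv2.imwrite("exposure_check.jpg", frame)  # inspect whether the laser dot stands out
cap.release()
```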
The collected images were uploaded to Roboflow for annotation and preprocessing, mainly including:
- Bounding box annotation for the laser points.
- Data augmentation such as rotation, scaling, and brightness adjustment.
- Splitting the data into training, validation, and test sets.
Figure 1: A single collection sample, with the yellow box marking the laser point position.

Figure 2: A collage of some collection samples, showing laser points against different backgrounds and positions.

Model training is based on the YOLOv8 framework provided by Ultralytics:
yolo task=detect mode=train model=yolov8n.pt data=path/to/your/data.yaml epochs=50 imgsz=640 device=0

The camera and the projector usually do not face the same plane perfectly, so the projection area seen by the camera is generally a tilted quadrilateral rather than a standard rectangle. If these coordinates are used directly to control the mouse, the deviation is quite noticeable.
To solve this problem, the program first identifies the boundary of the projection area and extracts the four corner points to serve as inputs for the subsequent perspective transformation.
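One plausible way to extract those corner points with OpenCV is sketched below. It assumes the projected screen is the largest bright quadrilateral in the frame; the thresholding strategy and the helper name `find_projection_corners` are illustrative, not the exact code in `predictByCap.py`.

```python
import cv2
import numpy as np

def find_projection_corners(frame):
    """Return the 4 corners of the projection area as a (4, 2) float32 array, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The projected screen is usually much brighter than the surrounding wall.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Take the largest contour and approximate it with a polygon.
    largest = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(largest, 0.02 * cv2.arcLength(largest, True), True)
    if len(approx) != 4:
        return None  # not a clean quadrilateral; retry on the next frame
    return approx.reshape(4, 2).astype(np.float32)
```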
After obtaining the four corner points, the program calculates the perspective transformation matrix using cv2.getPerspectiveTransform to map the projection area in the camera frame to the computer screen coordinate system. The purpose of doing this is to minimize the distortion effects caused by the shooting angle.
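A minimal sketch of that mapping step is shown below, assuming the corners are ordered top-left, top-right, bottom-right, bottom-left and a 1920x1080 screen; both assumptions would need to match the actual setup.

```python
import cv2
import numpy as np

SCREEN_W, SCREEN_H = 1920, 1080  # assumed screen resolution

def build_mapping(corners):
    """corners: the 4 projection-area corners in the camera frame,
    ordered top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [SCREEN_W, 0], [SCREEN_W, SCREEN_H], [0, SCREEN_H]])
    return cv2.getPerspectiveTransform(src, dst)

def camera_to_screen(point, matrix):
    """Map a single (x, y) point from camera coordinates to screen coordinates."""
    pt = np.float32([[point]])  # shape (1, 1, 2), as cv2.perspectiveTransform expects
    mapped = cv2.perspectiveTransform(pt, matrix)
    return int(mapped[0, 0, 0]), int(mapped[0, 0, 1])
```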
The left side of the figure below shows the projection area from the camera's perspective, and the right side is a schematic diagram corresponding to the standard screen plane after perspective transformation.
When YOLO detects a laser point, the system calculates its center position and triggers a click, right-click, or double-click according to the current mode. To prevent a laser point from continuously triggering multiple operations in a short period, a simple debounce logic based on a time threshold is added to the program.
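The sketch below illustrates this step: running the trained model on a frame, taking the center of the most confident detection, and gating clicks with a time threshold. The 0.5 s cooldown is an assumed value, and the helper names are illustrative.

```python
import time
from ultralytics import YOLO

model = YOLO("src/model/best.pt")   # weights path described in the setup section
COOLDOWN = 0.5                      # seconds; assumed debounce threshold
last_trigger = 0.0

def detect_laser_center(frame):
    """Return the (x, y) center of the most confident laser detection, or None."""
    results = model(frame, verbose=False)[0]
    if len(results.boxes) == 0:
        return None
    box = max(results.boxes, key=lambda b: float(b.conf))
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    return (x1 + x2) / 2, (y1 + y2) / 2

def should_trigger():
    """Simple debounce: allow at most one mouse action per COOLDOWN window."""
    global last_trigger
    now = time.time()
    if now - last_trigger < COOLDOWN:
        return False
    last_trigger = now
    return True
```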
The simulation of mouse events currently relies on the Windows low-level API ctypes.windll.user32, so this interaction logic only works in Windows environments for now.
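A minimal Windows-only sketch of such a click via ctypes.windll.user32 is shown below; the flag constants are standard Win32 values, and the helper names are illustrative rather than the project's actual functions.

```python
import ctypes

user32 = ctypes.windll.user32  # Windows only

# Standard Win32 mouse_event flags
MOUSEEVENTF_LEFTDOWN  = 0x0002
MOUSEEVENTF_LEFTUP    = 0x0004
MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP   = 0x0010

def click_at(x, y, button="left"):
    """Move the cursor to screen position (x, y) and simulate a click."""
    user32.SetCursorPos(int(x), int(y))
    if button == "left":
        down, up = MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP
    else:
        down, up = MOUSEEVENTF_RIGHTDOWN, MOUSEEVENTF_RIGHTUP
    user32.mouse_event(down, 0, 0, 0, 0)
    user32.mouse_event(up, 0, 0, 0, 0)

def double_click_at(x, y):
    click_at(x, y)
    click_at(x, y)
```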
- It is still quite sensitive to ambient light and camera parameters. Moving to a different venue may require re-adjusting the exposure and offsets.
- Currently, it mainly supports click-based operations; the interaction methods are not rich enough yet.
- Kinect Depth Sensing Solution, by czaoth
- Perspective Transformation, by Arthur Wang
- Perspective Transformation, by MaWB
This project is licensed under the MIT License - see the LICENSE file for details.


