This Flask application finds and verifies email addresses for the domains listed in a CSV file, using Common Crawl data. It is built with Flask and Flask-RESTful, uses Flask-SSE (backed by Redis) to stream server-sent events, and uses SQLAlchemy for database access. Logging is configured to capture application activity.
Before running the application, ensure you have the following installed on your system:
- Python 3.7 or later
- Flask
- Flask-RESTful
- Flask-SSE
- SQLAlchemy
- Redis
- smtplib (part of the Python standard library, so no separate install is needed)
- dnspython
- requests
- warcio
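As a quick sanity check on the prerequisites above, a small script can report which packages are not importable in the current environment. This is only a sketch: the import names below (for example `dns` for dnspython and `flask_sse` for Flask-SSE) are the usual ones for these packages, `missing_modules` is a hypothetical helper that is not part of the application, and `redis` here refers to the Python client package, not the Redis server itself.

```python
import importlib.util

# Import names for the third-party prerequisites listed above.
# smtplib is omitted because it ships with the Python standard library.
# "redis" is the Python client package; the Redis server is installed separately.
REQUIRED_MODULES = [
    "flask",
    "flask_restful",
    "flask_sse",
    "sqlalchemy",
    "redis",
    "dns",        # provided by the dnspython package
    "requests",
    "warcio",
]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running this inside the virtual environment after installing the requirements should report no missing modules.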
Clone the repository from GitHub:
git clone <repository_url>
cd <repository_directory>
It is recommended to create a virtual environment to manage dependencies:
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install the required Python packages:
pip install -r requirements.txt
Ensure Redis is installed and running on your system. The application expects Redis to be running on localhost at port 6379.
To install Redis, follow the instructions on the Redis website.
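Because the application expects Redis on localhost at port 6379, it can help to confirm that something is listening there before starting Flask. The sketch below uses only the standard library; `redis_port_open` is a hypothetical helper, and it only checks that a TCP connection succeeds, not that the listener is actually Redis.

```python
import socket


def redis_port_open(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is reachable at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if redis_port_open():
        print("Something is listening on localhost:6379.")
    else:
        print("Nothing is listening on localhost:6379; start redis-server first.")
```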
Set up your database. By default, the application uses SQLite, but you can configure it to use any other SQL database by setting the DATABASE_URL environment variable.
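Run the following commands to create the database. The flask db commands are provided by the Flask-Migrate extension, so it must be installed as well: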
flask db init
flask db migrate -m "Initial migration."
flask db upgrade
Set the necessary environment variables. You can create a .env file in the project root with the following content:
DATABASE_URL=sqlite:///emails.db
REDIS_URL=redis://localhost:6379/0
Ensure the Redis server is running:
redis-server
Start the Flask application:
flask run
The application should now be running on http://localhost:5000.
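The two environment variables described above could be read along the following lines. This is a sketch rather than the application's actual configuration code: `load_settings` and the returned dictionary keys are hypothetical, and the defaults simply mirror the values shown in the .env example. Note that a .env file is not read by `os.environ` on its own; the `flask` command loads it only when python-dotenv is installed.

```python
import os
from urllib.parse import urlparse

# Defaults mirror the .env example; they are assumptions, not confirmed app defaults.
DEFAULT_DATABASE_URL = "sqlite:///emails.db"
DEFAULT_REDIS_URL = "redis://localhost:6379/0"


def load_settings(environ=None):
    """Read DATABASE_URL and REDIS_URL, falling back to the defaults above."""
    env = os.environ if environ is None else environ
    database_url = env.get("DATABASE_URL", DEFAULT_DATABASE_URL)
    redis_url = env.get("REDIS_URL", DEFAULT_REDIS_URL)
    parsed = urlparse(redis_url)
    return {
        "database_url": database_url,
        "redis_url": redis_url,
        # Host and port are split out so they can be used for a connectivity check.
        "redis_host": parsed.hostname or "localhost",
        "redis_port": parsed.port or 6379,
    }


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key} = {value}")
```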
The /find_emails endpoint processes the domains listed in domains.csv and finds email addresses associated with those domains.
- Create a CSV file named domains.csv in the project directory containing the list of domains you want to process. The file should have one domain per line.
- Start the Redis server. Ensure the Redis server is running:
redis-server
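The one-domain-per-line format of domains.csv described above can be read with a few lines of Python. This is a minimal sketch, not the application's own loader: `read_domains` is a hypothetical helper, and skipping blank lines, lowercasing, and dropping duplicates are assumptions about what a tolerant reader might do.

```python
def read_domains(path="domains.csv"):
    """Return the domains in `path`, one per line, in file order."""
    domains = []
    seen = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Assumption: surrounding whitespace and letter case are not significant.
            domain = line.strip().lower()
            if not domain or domain in seen:
                continue  # skip blank lines and repeated domains
            seen.add(domain)
            domains.append(domain)
    return domains


if __name__ == "__main__":
    found = read_domains()
    print(f"Read {len(found)} domains from domains.csv")
```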
To process the domains and find emails, send a GET request to the /find_emails endpoint. You can use curl for this:
curl http://localhost:5000/find_emails
The application streams real-time updates as it processes each domain. You can see the following types of messages in the stream:
- Reading Domains: Read N domains from domains.csv
- Processing Domain: Processing domain: example.com
- Found Emails: Found N emails for domain: example.com
- Verified Email: Verified email: email@example.com, valid: True/False
- Errors: Descriptive error messages if any issues occur
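A client that consumes the stream may want to tell these message types apart. The sketch below classifies a message by its text, assuming the wording matches the examples listed above exactly; `classify_message` and the category names are hypothetical, and how the text is wrapped on the wire (for example in server-sent-event `data:` lines) is not covered here.

```python
import re

# One pattern per message type listed above. The exact wording is assumed
# to match the examples; anything else falls through to "other".
PATTERNS = [
    ("reading", re.compile(r"^Read (?P<count>\d+) domains from (?P<file>\S+)$")),
    ("processing", re.compile(r"^Processing domain: (?P<domain>\S+)$")),
    ("found", re.compile(r"^Found (?P<count>\d+) emails for domain: (?P<domain>\S+)$")),
    ("verified", re.compile(r"^Verified email: (?P<email>\S+), valid: (?P<valid>True|False)$")),
]


def classify_message(text):
    """Return (kind, fields) for a stream message, or ("other", {...}) if unknown."""
    text = text.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.groupdict()
    # Error messages are free-form, so they end up here along with anything unexpected.
    return "other", {"text": text}


if __name__ == "__main__":
    print(classify_message("Found 2 emails for domain: example.com"))
```

Error messages have no fixed format in the description above, so they are reported as "other" together with the raw text.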
The results, including the validity of each email, are saved in found_emails.csv in the project directory. The log file app.log captures detailed information about the application's operations.
Below is an example of how to structure your domains.csv file, followed by example output from the found_emails.csv file.
domains.csv:
example.com
anotherdomain.com
found_emails.csv:
Domain,Email,Valid
example.com,email1@example.com,True
example.com,email2@example.com,False
anotherdomain.com,email3@anotherdomain.com,True
Logs are written to app.log in the project directory, capturing detailed information about the application's operations. This includes reading domains, processing domains, finding and verifying emails, and any errors encountered.
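For reference, file logging of the kind described above can be set up with the standard `logging` module. This is a sketch under assumptions, not the application's actual configuration: the logger name `email_finder`, the INFO level, and the message format are all hypothetical choices.

```python
import logging


def build_logger(log_path="app.log", name="email_finder"):
    """Return a logger that appends timestamped records to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("Processing domain: example.com")
    log.error("Example error message")
```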
- Check the found_emails.csv file for the results.
- Refer to app.log for detailed logs and troubleshooting.
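When checking found_emails.csv, a per-domain count of valid and invalid addresses is often more useful than the raw rows. The sketch below assumes the Domain,Email,Valid header and the True/False values shown in the example output; `summarize_results` is a hypothetical helper, not part of the application.

```python
import csv
from collections import defaultdict


def summarize_results(path="found_emails.csv"):
    """Count valid and invalid emails per domain in a found_emails.csv file."""
    summary = defaultdict(lambda: {"valid": 0, "invalid": 0})
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Assumption: the Valid column holds the literal text "True" or "False".
            key = "valid" if row["Valid"].strip() == "True" else "invalid"
            summary[row["Domain"]][key] += 1
    return dict(summary)


if __name__ == "__main__":
    for domain, counts in summarize_results().items():
        print(f"{domain}: {counts['valid']} valid, {counts['invalid']} invalid")
```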
To stop the application, you can use Ctrl+C in the terminal where the Flask server is running. Also, stop the Redis server if it is running separately.
By following these instructions, you should be able to run and use the Flask Email Finder application effectively.