🧠 Multimodal Classification Template (Image + Text)

A deep learning project demonstrating multimodal (image and text) classification using TensorFlow/Keras, specifically designed for execution in a Google Colab environment with persistent storage via Google Drive.

Launch the Notebook in Colab:

🚀 Setup & Execution

Prerequisites

- A Google account.
- A Google Drive where you can store your dataset and model outputs.

Google Drive Setup

Create a folder in your Google Drive to hold the project's data and outputs (e.g., Multimodal_Project_Data).

Inside this folder, place your dataset using the structure below. Your_Project_Folder_Name stands for the name of the folder you created; use the same name when you set the project path in the notebook.
- /content/drive/My Drive/Your_Project_Folder_Name/images/train/...
- /content/drive/My Drive/Your_Project_Folder_Name/images/test/...
- /content/drive/My Drive/Your_Project_Folder_Name/texts/train_titles.csv
- /content/drive/My Drive/Your_Project_Folder_Name/texts/test_titles.csv
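Before running the notebook, it can help to confirm that Drive is mounted and the dataset sits where the notebook expects it. The following is a minimal sketch, assuming exactly the layout above; `EXPECTED_PATHS` and `missing_dataset_paths` are hypothetical names, not part of the notebook. `drive.mount` is the standard Colab call and only works inside a Colab runtime. Recent Colab runtimes typically expose the Drive root as /content/drive/MyDrive (no space), so check which form your runtime shows and adjust the paths if needed.

```python
from pathlib import Path

# Hypothetical list of the entries the layout above calls for,
# relative to the project folder on Drive.
EXPECTED_PATHS = [
    "images/train",
    "images/test",
    "texts/train_titles.csv",
    "texts/test_titles.csv",
]


def missing_dataset_paths(project_dir):
    """Return the expected entries that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    try:
        # Only importable inside a Colab runtime.
        from google.colab import drive
        drive.mount("/content/drive")
    except ImportError:
        print("Not running in Colab; skipping drive.mount().")

    # Replace with your own folder; it must match the notebook's project path.
    project = "/content/drive/My Drive/Your_Project_Folder_Name"
    missing = missing_dataset_paths(project)
    print("All dataset paths found." if not missing else f"Missing: {missing}")
```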

🔄 Performance Optimization: Why Copy Data Locally?

A crucial step in this notebook is copying the image and text files from Google Drive to the local disk of the Colab runtime (/content/data). This matters for performance because:

- Google Drive Latency: even when mounted, Google Drive is accessed over the network, so every file read has much higher latency than a local read. Reading many small image files directly from Drive during training creates an input bottleneck in which the GPU sits idle waiting for the next batch.
- Speed: once the files are on the runtime's local disk, reads are fast enough for the input pipeline to keep the GPU supplied with batches.

The local copy is temporary: it is lost when the Colab runtime is recycled, so the copy runs again in each new session, and the training outputs are written back to Drive.
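The copy step can be as simple as a recursive directory copy. A minimal sketch, assuming the Drive layout from the setup section and the /content/data target named above; `copy_project_data` is a hypothetical helper, and the notebook's own copy cell may do this differently (for example with shell commands).

```python
import shutil
from pathlib import Path


def copy_project_data(drive_project_dir, local_dir="/content/data"):
    """Copy images/ and texts/ from the Drive project folder to local disk.

    Returns the local directory so later cells can build paths from it.
    """
    src = Path(drive_project_dir)
    dst = Path(local_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for sub in ("images", "texts"):
        # dirs_exist_ok=True (Python 3.8+) makes re-running the cell safe:
        # files already present are overwritten instead of raising an error.
        shutil.copytree(src / sub, dst / sub, dirs_exist_ok=True)
    return dst
```

For example, `local_root = copy_project_data("/content/drive/My Drive/Your_Project_Folder_Name")`, after which the data loaders would read from `local_root / "images" / "train"` and `local_root / "texts"` rather than from Drive.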

Running the Notebook

- Open the notebook (Multimodal_Training_Colab.ipynb) via the Colab badge link above.
- Check that the DRIVE_PROJECT_PATH variable in Section 1.2 matches the path to your folder in Google Drive.
- Run all cells in order (Runtime > Run all).

⚙️ Key Technologies

- Platform: Google Colaboratory (Colab)
- Storage: Google Drive (for persistent data and outputs)
- Deep Learning: TensorFlow 2.x / Keras
  - Keras was chosen for its ease of use and rapid prototyping. A two-branch architecture like this one, which combines a CNN image encoder with an RNN text encoder, can be defined and configured in a small amount of readable code, which makes it faster to iterate on experiments.
- Data Handling: Pandas, NumPy
- Vision Model: EfficientNetB0 (pretrained, from keras.applications, used for transfer learning)
- Text Model: an LSTM over word embeddings produced by a Keras Embedding layer
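To make the two-branch design concrete, here is a minimal sketch of how an EfficientNetB0 image branch and an Embedding + LSTM text branch can be fused with the Keras functional API. It is not the notebook's exact model: the 224×224 input size, sequence length, embedding and LSTM widths, dropout rate, concatenation-based fusion, integer class labels, and the name `build_multimodal_model` are all assumptions. Text is expected to arrive already converted to integer token ids.

```python
import tensorflow as tf
from tensorflow import keras


def build_multimodal_model(num_classes, vocab_size, seq_len,
                           image_size=224, weights="imagenet"):
    """Hypothetical two-branch classifier: EfficientNetB0 + Embedding/LSTM."""
    # Image branch: EfficientNetB0 as a frozen feature extractor.
    # keras.applications EfficientNet models rescale inputs internally and
    # expect raw pixel values in [0, 255].
    image_input = keras.Input(shape=(image_size, image_size, 3), name="image")
    backbone = keras.applications.EfficientNetB0(
        include_top=False, weights=weights,
        input_shape=(image_size, image_size, 3), pooling="avg")
    backbone.trainable = False
    image_features = backbone(image_input, training=False)

    # Text branch: token ids -> embeddings -> LSTM summary vector.
    # mask_zero=True tells the LSTM to skip padding id 0.
    text_input = keras.Input(shape=(seq_len,), dtype="int32", name="text")
    x = keras.layers.Embedding(vocab_size, 128, mask_zero=True)(text_input)
    text_features = keras.layers.LSTM(64)(x)

    # Fusion: concatenate both feature vectors, then classify.
    fused = keras.layers.Concatenate()([image_features, text_features])
    fused = keras.layers.Dropout(0.3)(fused)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(fused)

    model = keras.Model([image_input, text_input], outputs)
    # Sparse loss assumes integer class labels (not one-hot).
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the backbone first and unfreezing some top layers later at a low learning rate is a common transfer-learning pattern; whether the notebook fine-tunes is not stated here.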

✨ Results (Outputs Saved to Drive)

Upon successful training, the following files are saved to your Google Drive project path, so they persist after the Colab runtime shuts down:

- best_food_multimodal_model.h5: the best-performing model, saved via ModelCheckpoint.
- text_vectorizer_vocab.pkl: the pickled vocabulary for the text preprocessing layer.
- training_accuracy_plot_v2.png: plot of training and validation accuracy.
- training_loss_plot_v2.png: plot of training and validation loss.
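The vocabulary file matters because the model only understands the token ids the text layer produced during training, so inference needs the same word-to-id mapping. A minimal sketch of saving and restoring it, assuming the text preprocessing layer is a Keras TextVectorization layer (the file name suggests this, but it is an assumption); `save_vocab` and `load_vectorizer` are hypothetical helpers, and `output_sequence_length` must match the value used in training.

```python
import pickle

import tensorflow as tf
from tensorflow import keras


def save_vocab(vectorizer, path):
    """Pickle the adapted vocabulary (a list of strings) to path."""
    with open(path, "wb") as f:
        pickle.dump(vectorizer.get_vocabulary(), f)


def load_vectorizer(path, output_sequence_length):
    """Rebuild a TextVectorization layer from a pickled vocabulary.

    get_vocabulary() includes the padding ("") and OOV ("[UNK]") entries
    at the start; passing the list back as `vocabulary=` reproduces the
    same token-to-id mapping without calling adapt() again.
    """
    with open(path, "rb") as f:
        vocab = pickle.load(f)
    return keras.layers.TextVectorization(
        vocabulary=vocab, output_sequence_length=output_sequence_length)
```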
