Multimodal support for Gemini

Currently, it's only possible to send text messages using the Gemini adapter:

https://github.com/thmsmlr/instructor_ex/blob/1abd8473d05111c11a4d9033b6a88acc29737fa0/lib/instructor/adapters/gemini.ex#L61

The Gemini API supports image, video, and audio inputs. Unlike the OpenAI API, where you send the file contents base64-encoded, you [need to upload the file separately](https://ai.google.dev/gemini-api/docs/vision?lang=rest#upload-image).

Would you be open to a PR that adds support for uploading files, or would you say that is out of scope for this project?

If it's out of scope, I can create a smaller PR that allows media URLs (with the upload happening outside the library):

```elixir
Instructor.chat_completion(
  mode: :json_schema,
  model: "gemini-1.5-flash",
  response_model: VideoDesc,
  messages: [
    %{
      role: "user",
      content: [
        %{
          type: "video_url",
          video_url: %{
            url: "https://generativelanguage.googleapis.com/v1beta/files/..."
          }
        },
        %{
          type: "text",
          text: "What's going on in this video?"
        }
      ]
    }
  ]
)
```
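
On the adapter side, the smaller PR could come down to translating these content parts into Gemini `parts` entries. Here is a rough sketch of what I have in mind, not actual library code: the module name `GeminiPartsSketch`, the optional `:mime_type` key, and the `"video/mp4"` fallback are all assumptions of mine. The `file_data` / `file_uri` / `mime_type` field names follow the REST examples in the Gemini docs.

```elixir
defmodule GeminiPartsSketch do
  @moduledoc """
  Hypothetical helper, not part of instructor_ex: converts OpenAI-style
  message content into a list of Gemini `parts`.
  """

  # A plain string becomes a single text part.
  def to_parts(content) when is_binary(content), do: [%{text: content}]

  # A list of typed parts is converted element by element.
  def to_parts(content) when is_list(content), do: Enum.map(content, &to_part/1)

  defp to_part(%{type: "text", text: text}), do: %{text: text}

  # Assumption: callers may pass an optional :mime_type next to :url;
  # "video/mp4" is only an illustrative fallback.
  defp to_part(%{type: "video_url", video_url: %{url: url} = media}) do
    %{file_data: %{file_uri: url, mime_type: Map.get(media, :mime_type, "video/mp4")}}
  end
end
```

With this shape, existing callers that pass a string as `content` would keep working, and only list-style content would go through the new path. Image and audio parts could presumably be handled with additional `to_part/1` clauses.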
