Add Litellm watsonx integration #776
path: "{{ .BaseDir }}/models"
type: Directory
containers:
{{- if .Values.litellm.enable }}
This might be a workaround, considering we need to adapt to the new folder structure here: https://github.com/IBM/project-ai-services/tree/main/ai-services/assets/components/llm
Please add it there as well for the new structure.
I think it should be fine. We are out of sync with the other assets anyway. Once the deploy flow is ready, we will sync them, and from then on we need to make sure we keep them updated.
- name: WATSONX_APIKEY
  value: "{{ .Values.litellm.watsonxApiKey }}"
- name: WATSONX_PROJECT_ID
  value: "{{ .Values.litellm.watsonxProjectId }}"
- name: WATSONX_URL
  value: "{{ .Values.litellm.watsonxUrl }}"
- name: INSTRUCT_MODEL
  value: "watsonx/{{ .Values.litellm.instructModel }}"
So if my understanding of the manifest file is correct, we will have one litellm pod for each combination of project ID, URL, and API key?
What if we have a new API_KEY, ProjectID, or WATSONX_URL? Can I use it within the same pod, or does it always require a separate pod deployment?
It's highly unlikely that a user would want to connect to multiple watsonx services. IMO we can get started with a single instance of the watsonx service.
865ec88 to 3f9b04c
Signed-off-by: T K Chandra Hasan <t.k.chandra.hasan@ibm.com>
3f9b04c to e8a57f4
The following files should be merged into upstream litellm; after that we can stop applying these patches. Keeping them for now.
The files above contain the generic changes required to support the passthrough route for watsonx. We need this change because the custom callback (pre-API hook) doesn't let us customize the authentication header, and we need to set the bearer token dynamically.
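To illustrate what "setting the bearer token dynamically" could look like, here is a minimal sketch, not the code in the patches. `BearerTokenCache`, its `fetch` callable, and the `skew` parameter are hypothetical names; the assumption is that some function can exchange the API key for a token and report when that token expires.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class BearerTokenCache:
    """Caches a bearer token and refreshes it shortly before it expires.

    Hypothetical helper for illustration only; not part of litellm.
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch() is assumed to return (token, expires_at in epoch seconds).
        self._fetch = fetch
        self._skew = skew    # refresh this many seconds before expiry
        self._clock = clock  # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def auth_header(self) -> Dict[str, str]:
        """Return an Authorization header, refreshing the token if needed."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token, self._expires_at = self._fetch()
        return {"Authorization": f"Bearer {self._token}"}
```

The passthrough route would then merge `auth_header()` into the outgoing request headers on every call, so an expired token is replaced without restarting the pod.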
The request flow for the /tokenize endpoint in this case would be:
Client -> Litellm (/tokenize) -> Litellm (/watsonx/ml/v1/text/tokenization) -> IBM Watsonx
The custom_callbacks.py file would be used to perform the request & response translation for the /tokenize endpoint.
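For reference, the request and response translation could be sketched roughly as below. This is not the actual contents of custom_callbacks.py; the function names are hypothetical, and both payload shapes are assumptions: a client body with `model` and `prompt`, a watsonx body with `model_id`, `input`, `project_id`, and `parameters.return_tokens`, and a watsonx response carrying `result.token_count` and `result.tokens`.

```python
from typing import Any, Dict


def to_watsonx_tokenize(body: Dict[str, Any], project_id: str) -> Dict[str, Any]:
    """Translate an assumed client /tokenize body into a watsonx-style body."""
    model = body["model"]
    # Drop the litellm provider prefix, e.g. "watsonx/<model>" -> "<model>".
    if model.startswith("watsonx/"):
        model = model[len("watsonx/"):]
    return {
        "model_id": model,
        "input": body["prompt"],
        "project_id": project_id,
        "parameters": {"return_tokens": True},
    }


def from_watsonx_tokenize(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an assumed watsonx tokenization response for the client."""
    result = resp.get("result", {})
    return {
        "count": result.get("token_count", 0),
        "tokens": result.get("tokens", []),
    }
```

Keeping the two directions as plain functions means they can be unit tested without a running litellm proxy; the callback hooks would only need to call them.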