All is well, some insights #21

@HarvieKrumpet

You're able to keep up with ggerganov, who updates quite quickly. In the C# world we have limited choices for a fast, minimalist interface to ggerganov's work. Even ggerganov builds with CUDA 12.2 instead of the latest 12.6, so unless we build everything bare metal rather than rely on prebuilt libraries, we will always be behind the eight ball. Other C# bridge libraries impose excessive abstractions and bulk themselves out with third-party tools, and because of this they update quite slowly.
The minimalist approach you take keeps things cutting edge. One of your best techniques was to keep the inference loop in a separate worker thread and feed it through enqueue/dequeue, rather than tightly integrating it with async enumerators on the client side. This keeps it clean and allows other libraries, such as StableDiffusion.NET, to work in parallel. Not to mention the clean build toolchain you set up against ggerganov's repo: a quick pull and build-all that absorbs CUDA toolkit changes in a one-click solution. Normally it takes three separate steps to bare-metal compile all the associated libraries.
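
For anyone reading along, here is roughly what I mean by the worker thread with enqueue/dequeue. This is only a minimal sketch of the pattern, not the library's actual code: `InferenceWorker`, `Enqueue`, `ReadReply`, and the `generate` callback are all names made up for illustration, and the fake generator just splits the prompt into words where the real decode loop would sit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: every name here is illustrative, not the library's API.
public sealed class InferenceWorker : IDisposable
{
    // Unique string instance used as an end-of-reply marker (compared by reference).
    private static readonly string EndOfReply = new string('\0', 1);

    private readonly BlockingCollection<string> _requests = new BlockingCollection<string>();
    private readonly BlockingCollection<string> _tokens = new BlockingCollection<string>();
    private readonly Thread _thread;

    public InferenceWorker(Func<string, IEnumerable<string>> generate)
    {
        _thread = new Thread(() =>
        {
            // Blocks until a prompt arrives; the loop ends after CompleteAdding()
            // has been called and the request queue is drained.
            foreach (string prompt in _requests.GetConsumingEnumerable())
            {
                foreach (string token in generate(prompt))
                    _tokens.Add(token);
                _tokens.Add(EndOfReply);
            }
        }) { IsBackground = true };
        _thread.Start();
    }

    // Client side: hand a prompt to the worker and return immediately.
    public void Enqueue(string prompt) => _requests.Add(prompt);

    // Client side: pull tokens for one reply as the worker produces them.
    public IEnumerable<string> ReadReply()
    {
        while (true)
        {
            string token = _tokens.Take();
            if (ReferenceEquals(token, EndOfReply)) yield break;
            yield return token;
        }
    }

    public void Dispose()
    {
        _requests.CompleteAdding();
        _thread.Join();
        _requests.Dispose();
        _tokens.Dispose();
    }
}

public static class WorkerDemo
{
    // Stand-in for the real decode loop: emits one "token" per word.
    public static IEnumerable<string> FakeGenerate(string prompt) => prompt.Split(' ');

    public static void Main()
    {
        using var worker = new InferenceWorker(FakeGenerate);
        worker.Enqueue("hello from the worker");
        foreach (string token in worker.ReadReply())
            Console.WriteLine(token);
    }
}
```

The point of the design is that the client only ever touches the two queues, so the thread that owns the native context never has to share it with UI code or with another library running alongside.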

My wish, though, is that you include some of the higher-level features ggerganov has in his CLI. I have tried to get the cache load/store to work, but it requires some knowledge of how to deal with unsafe pointers and global heap memory beyond just calling the bindings. I know this probably takes just 8-10 lines of code, but apparently it is beyond me, and it also needs a separate tokenizing step to save the tokens, which are usually handled in that worker thread.
I do not foresee ggerganov creating a middleware stack that will be as useful as your current system, and I would like to keep the inference loop as you have it. I just want the cache load/save to function. Perhaps call this your middleware stack rather than leaving it as code snippets in your example clients. Adding this, plus embeddings, quantization, and the other things supported by ggerganov's CLI tool, would still be a minimalist approach, just with this extra toolkit.
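
To make the pointer part of the request concrete, here is a rough sketch of the buffer handling I think is involved. It is an assumption-laden illustration, not working session code: `SessionTokenBuffer`, `WithPinnedTokens`, and `ReadTokens` are hypothetical helpers, and the two delegates stand in for whatever P/Invoke bindings wrap llama.cpp's `llama_state_save_file` and `llama_state_load_file` (check the `llama.h` you build against, since the native `size_t` counts would map to `UIntPtr`/`nuint` in a real binding). The demo swaps in fake "native" callbacks so it runs without the library.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helpers; llama_token is a 32-bit int in llama.h, hence int[].
public static class SessionTokenBuffer
{
    // Save path: pin the managed token array so native code can read it in place.
    public static void WithPinnedTokens(int[] tokens, Action<IntPtr, int> nativeSave)
    {
        GCHandle handle = GCHandle.Alloc(tokens, GCHandleType.Pinned);
        try
        {
            nativeSave(handle.AddrOfPinnedObject(), tokens.Length);
        }
        finally
        {
            handle.Free(); // always unpin, even if the native call throws
        }
    }

    // Load path: give native code an unmanaged buffer to fill, then copy the
    // reported number of tokens back into a managed array.
    public static int[] ReadTokens(int capacity, Func<IntPtr, int, int> nativeLoad)
    {
        IntPtr buffer = Marshal.AllocHGlobal(capacity * sizeof(int));
        try
        {
            int count = nativeLoad(buffer, capacity);
            if (count < 0 || count > capacity)
                throw new InvalidOperationException("native call reported an invalid token count");
            var tokens = new int[count];
            Marshal.Copy(buffer, tokens, 0, count);
            return tokens;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // release the unmanaged block
        }
    }
}

public static class SessionDemo
{
    public static void Main()
    {
        int[] saved = null;

        // Fake "save": copies the pinned tokens out, where a real binding
        // would pass the pointer and count to the native save function.
        SessionTokenBuffer.WithPinnedTokens(new[] { 1, 15043, 3186 }, (ptr, n) =>
        {
            saved = new int[n];
            Marshal.Copy(ptr, saved, 0, n);
        });

        // Fake "load": fills the unmanaged buffer and reports the token count.
        int[] restored = SessionTokenBuffer.ReadTokens(16, (ptr, capacity) =>
        {
            Marshal.Copy(saved, 0, ptr, saved.Length);
            return saved.Length;
        });

        Console.WriteLine(string.Join(" ", restored));
    }
}
```

The separate tokenizing step would still be needed to produce the `int[]` that gets saved alongside the cache, since the worker thread normally keeps those tokens to itself.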

Labels: enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions