HostDMA/EMIFA basic driver #35
ipc driver basic test
I haven't had a chance to test this and I only had a brief look at the code.
Sorry, I missed your previous question about this. I think DSP HOST_ACK is connected to CPU GPIO Bank 6 Pin 1.
I'll look into this.
I haven't made a start on DMA for the CPU yet. We should start with a generic EDMA peripheral driver, then we can set up DMA transfers for most peripherals and memory.
I was thinking we should set the FIFO to its maximum (I think 16 x 16 bit) and pad smaller transfers if we need to. This will be less efficient at low load, but could be more efficient for larger sustained transfers when we really need the efficiency. Probably the only way to know is to profile.
I wouldn't worry about that. I've been learning how to do this by making it up as I go along. Can you split this PR so it contains only the code needed for the EMIFA/HostDMA driver? It looks like this branch also has an app/module and some changes to the build scripts. I'll try to have a proper look at this soon. Thank you for contributing to Freetribe.
I can confirm without HostDP enabled this pin will not fire an ISR, chances are you are correct. I would have never guessed that pin to be responsible for this! I will get back to you on the Interrupt vs Acknowledge mode matter.
Right, so I suspect there's a bit of a misunderstanding between what you mean by "padding" and what I mean by "padding". Indeed, the FIFO is 16 16-bit words. The idea is to transfer blocks that fill the FIFO, with the last block holding the remainder of the words. I think you mean "we always send full blocks of 16 16-bit words, and the remainder gets padded to a full FIFO." I mean: in burst mode, the FIFO must contain at least 2 words, hence the remainder cannot be 1 (transfer lengths 17, 33, 49, 65, etc.). Of course we can pad that, with a tiny extra overhead and a little extra complication.
Give me a couple of days for this :) An additional question: if I go through the trouble of setting up JTAG debugging, will it save me the pain of waiting a minute for the sysex transfer each time? Cheers,
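The block-splitting rule described here can be sketched in C as follows. This is only a sketch, not the code from the PR: `FIFO_WORDS`, `padded_word_count` and `next_block_words` are hypothetical names, and it assumes the only burst-mode constraint is that the final block must hold at least 2 words.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_WORDS 16u /* FIFO depth in 16-bit words, as stated in the thread */

/* Hypothetical helper: total words to send after padding.
 * A length whose remainder modulo the FIFO depth is 1 (1, 17, 33, 49, ...)
 * gets one pad word so the final block holds 2 words instead of 1. */
static uint32_t padded_word_count(uint32_t words)
{
    return (words % FIFO_WORDS == 1u) ? words + 1u : words;
}

/* Hypothetical helper: size in words of the block starting at `offset`.
 * Every block is a full FIFO except possibly the last one. */
static uint32_t next_block_words(uint32_t total, uint32_t offset)
{
    assert(offset <= total);
    uint32_t left = total - offset;
    return (left < FIFO_WORDS) ? left : FIFO_WORDS;
}
```

With this rule a 17-word request becomes 18 words, sent as one block of 16 followed by one block of 2. The caller has to own the pad word, so the buffer needs room for one extra word.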
Welp, I almost implemented interrupt mode properly, but it seems the host read does not work the way the manual claims it does. I highly suspect the EMIFA timing values are off, considering I used random timings, and setting them higher actually messes it up. So that's something to straight up rip from the E2 firmware.
Yes, I misunderstood what you were saying about burst mode. I think the factory firmware has burst mode disabled; the manual says this gives higher throughput at the expense of more DMA transactions. We should be able to present an interface that accepts an arbitrary number of bytes, with the driver extending or truncating the data as necessary for the transfer. I haven't had a good look at your implementation yet, so it's difficult to say more. For now, I agree that packing data to 32 bit makes sense, and it works with burst mode on or off.
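One way an arbitrary-length byte interface could sit on top of 32-bit packing is sketched below. The function names, the little-endian byte order and the zero fill are all assumptions for illustration; none of this is taken from the PR. Since each 32-bit word is two 16-bit FIFO words, the word count on the bus is always even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold `len` bytes, rounded up. */
static size_t packed_word_count(size_t len)
{
    return (len + 3u) / 4u;
}

/* Hypothetical helper: pack `len` bytes into 32-bit words, least
 * significant byte first (an assumption), zero-filling the unused tail
 * of the last word. Returns the number of words written, or 0 if `dst`
 * is too small (also 0 for an empty input). */
static size_t pack_bytes_u32(const uint8_t *src, size_t len,
                             uint32_t *dst, size_t dst_words)
{
    assert(src != NULL && dst != NULL);
    size_t words = packed_word_count(len);
    if (words > dst_words) {
        return 0;
    }
    for (size_t w = 0; w < words; w++) {
        uint32_t value = 0;
        for (size_t b = 0; b < 4u; b++) {
            size_t i = w * 4u + b;
            if (i < len) {
                value |= (uint32_t)src[i] << (8u * b);
            }
        }
        dst[w] = value;
    }
    return words;
}
```

The receiver still needs the original byte count to drop the padding, so a real interface would probably carry the length alongside the data.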
Don't feel any need to rush, it may be a few weeks before I can get much work done.
Yes, it will be much quicker. We only need CPU JTAG to load firmware, the DSP is booted by the CPU. J-Link is the best option and the power switch will need modifying. Ideally we should use the J-Link reset pin to cycle the power, but I've been doing it manually.
Try this:
EMIFAWaitTimingConfig(SOC_EMIFA_0_REGS, EMIFA_CHIP_SELECT_2, EMIFA_ASYNC_WAITTIME_CONFIG(
    0, // wset Write setup time or width in EMA_CLK cycles
    3, // wstb Write strobe time or width in EMA_CLK cycles
    0, // whld Write hold time or width in EMA_CLK cycles
    0, // rset Read setup time or width in EMA_CLK cycles
    3, // rstb Read strobe time or width in EMA_CLK cycles
    0, // rhld Read hold time or width in EMA_CLK cycles
    0  // ta Minimum Turn-Around time
));
This should set the
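For comparing these seven values against a constant found in a firmware dump, it may help to see how they could map onto the 32-bit CE2CFG register. The sketch below assumes the asynchronous configuration register layout from the OMAP-L138 reference manual (W_SETUP at bits 29:26, W_STROBE 25:20, W_HOLD 19:17, R_SETUP 16:13, R_STROBE 12:7, R_HOLD 6:4, TA 3:2); please verify the bit positions against the manual before relying on it. The struct and function names are hypothetical, and the SS, EW and ASIZE bits are deliberately left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Raw field values, in the same order as the macro arguments above. */
typedef struct {
    uint8_t wset, wstb, whld; /* write setup, strobe, hold */
    uint8_t rset, rstb, rhld; /* read setup, strobe, hold */
    uint8_t ta;               /* turn-around */
} emifa_async_timing_t;

/* Pack the timing fields into a CEnCFG-style word (assumed layout). */
static uint32_t emifa_pack_timing(const emifa_async_timing_t *t)
{
    assert(t->wset <= 0xFu && t->wstb <= 0x3Fu && t->whld <= 0x7u);
    assert(t->rset <= 0xFu && t->rstb <= 0x3Fu && t->rhld <= 0x7u);
    assert(t->ta <= 0x3u);
    return ((uint32_t)t->wset << 26) | ((uint32_t)t->wstb << 20) |
           ((uint32_t)t->whld << 17) | ((uint32_t)t->rset << 13) |
           ((uint32_t)t->rstb << 7)  | ((uint32_t)t->rhld << 4)  |
           ((uint32_t)t->ta << 2);
}

/* Inverse of emifa_pack_timing: decode a register value into fields. */
static emifa_async_timing_t emifa_unpack_timing(uint32_t reg)
{
    emifa_async_timing_t t;
    t.wset = (uint8_t)((reg >> 26) & 0xFu);
    t.wstb = (uint8_t)((reg >> 20) & 0x3Fu);
    t.whld = (uint8_t)((reg >> 17) & 0x7u);
    t.rset = (uint8_t)((reg >> 13) & 0xFu);
    t.rstb = (uint8_t)((reg >> 7) & 0x3Fu);
    t.rhld = (uint8_t)((reg >> 4) & 0x7u);
    t.ta   = (uint8_t)((reg >> 2) & 0x3u);
    return t;
}
```

Whether the StarterWare macro expects raw field values or cycle counts is not checked here, so treat the numbers as register fields only.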
At a rave, so yes, no rush. Let's take it slow and think things through properly; a deep breath always pays off. The alternative, and in my opinion the most useful mode to implement, is interrupt mode + stop mode, where transactions are done through a callback interface, introducing some latency at the mercy of precious CPU cycles.
I have to admit I still find burst mode, as described in the manual, a confusing concept. I know it seems to group word transfers with a certain timing, at the expense of smaller block sizes? I do know that it is enabled/encouraged by default according to the manual. The question is what kind of driver API Korg implemented, and what for?
I implemented a very bare bones IPC between CPU and DSP.
Key points:
I wanted to make sure it works first, so I used HostDMA in stop mode. Of course, autobuffer mode is preferable because the host wouldn't need to reconfigure after each FIFO transfer.
It took lots of effort, but I could not find or manage to trigger an IRQ on the host side for the HOST_ACK (also known as FRDY/HRDY) pin, so HostDMA runs in acknowledge mode. For the sake of efficiency, this sucks. It would be great to verify in Ghidra whether the E2 firmware runs HostDMA in acknowledge or interrupt mode. I still suspect an EMA_WAITx pin is wired to HOST_ACK, but the EMIFA sysint is rising edge only. I didn't get it to work as a GPIO interrupt either.
EMIFA timing has yet to be calculated and adjusted, but it may be best to pull the configuration straight from Ghidra; see if it writes to 0x68000010 (EMIFAWaitTimingConfig, CE2CFG register).
Would it be possible to set up a DMA channel on the host side as well, using the EDMA3 peripheral? That would bring us closer to the "shared memory illusion" you have talked about before. Currently, on the host side, the CPU still has to perform reads and writes in loops; in fact, the process of waiting for HostDMA completion is blocking!!! If a CPU-side DMA channel is not possible, this could be done using a dispatching state machine instead.
The next step in abstraction would be writing a device API layer with a command queue on top of the peripheral code. Currently, if the bus is busy, it just returns an error status.
Code requires an even buffer size, since the FIFO can never transfer a single word in burst mode. My opinion is that the best solution is to have the future device API on top of this only accept 32-bit buffers, as this would avoid complicating the peripheral code.
Without atomic MMR writes, I managed to add some prevention of bus contention, but more thorough testing is needed to verify its robustness.
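The dispatching state machine and the command queue from the key points could be combined roughly as below. This is a sketch under assumptions, not the driver in this PR: every name is hypothetical, the hardware is reached through two function pointers, and the `fake_*` functions only simulate a transfer that completes on the second completion check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8u /* hypothetical queue depth */

typedef struct { const uint32_t *data; size_t words; } xfer_cmd_t;
typedef enum { XFER_IDLE = 0, XFER_BUSY } xfer_state_t;

typedef struct {
    xfer_cmd_t cmds[QUEUE_LEN];
    size_t head, tail, count; /* ring buffer bookkeeping */
    xfer_state_t state;
    size_t completed;         /* transfers finished so far */
} xfer_queue_t;

/* Hooks the peripheral layer would provide (hypothetical). */
typedef struct {
    void (*start)(const xfer_cmd_t *cmd); /* kick off one transfer */
    bool (*done)(void);                   /* non-blocking completion check */
} xfer_hw_t;

/* Queue a command instead of failing while the bus is busy.
 * Returns false only when the queue itself is full. */
static bool xfer_enqueue(xfer_queue_t *q, xfer_cmd_t cmd)
{
    if (q->count == QUEUE_LEN) {
        return false;
    }
    q->cmds[q->tail] = cmd;
    q->tail = (q->tail + 1u) % QUEUE_LEN;
    q->count++;
    return true;
}

/* Call once per main-loop pass. It never spins waiting for hardware. */
static void xfer_poll(xfer_queue_t *q, const xfer_hw_t *hw)
{
    assert(q != NULL && hw != NULL);
    switch (q->state) {
    case XFER_IDLE:
        if (q->count > 0u) {
            hw->start(&q->cmds[q->head]);
            q->state = XFER_BUSY;
        }
        break;
    case XFER_BUSY:
        if (hw->done()) {
            q->head = (q->head + 1u) % QUEUE_LEN;
            q->count--;
            q->completed++;
            q->state = XFER_IDLE;
        }
        break;
    }
}

/* Simulated hardware for trying the sketch off-target. */
static int fake_ticks;
static void fake_start(const xfer_cmd_t *cmd) { (void)cmd; fake_ticks = 2; }
static bool fake_done(void) { return --fake_ticks <= 0; }
```

On target, `done` would read the HostDMA status (or a flag set by the HOST_ACK interrupt, if that gets working), and a completion callback could be invoked where `completed` is incremented.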
I learnt a lot from this ordeal and am well aware that I probably made a lot of mistakes, but for what it is, it does seem to work properly. Pull the code and see for yourself: look at per_emifa.c and per_hostdma.c
Cheers, 🔊Max꩜