OWRS Performance Enhancement #3

@minxinhao

I ran smart's code on my testbed and saw a huge performance boost. The testbed has a ConnectX-6 NIC and two Intel(R) Xeon(R) Gold 5218 CPUs.
To replicate the performance gains of OWRS, I wrote my own test code, which simply posts a batch of depth WRs and then polls for all of their completions. However, I could not get the same performance gain with my code.
Here are the throughputs at 8 bytes for smart and for my test code (post_and_poll).
[Figure: throughput of smart]
[Figure: throughput of post_and_poll]
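
For readers unfamiliar with the pattern, the post-and-poll loop described above can be sketched as follows. This is a hardware-free model in Python, not the actual test code: `FakeQueuePair`, `post_send`, and `poll_cq` are hypothetical stand-ins for a real QP/CQ pair, and the "at most two completions per poll" behavior is an assumption chosen only to exercise partial polls.

```python
from collections import deque
from typing import Callable, List


def post_and_poll(post_send: Callable[[int], None],
                  poll_cq: Callable[[int], List[int]],
                  depth: int,
                  rounds: int) -> int:
    """Post `depth` work requests, then poll until all of them complete.

    Returns the total number of completions observed over all rounds.
    """
    completed = 0
    next_id = 0
    for _ in range(rounds):
        # Post a full batch of `depth` work requests back to back.
        for _ in range(depth):
            post_send(next_id)
            next_id += 1
        # Drain the completion queue until the whole batch is done.
        pending = depth
        while pending > 0:
            done = poll_cq(pending)  # may return fewer entries than requested
            pending -= len(done)
            completed += len(done)
    return completed


class FakeQueuePair:
    """Hypothetical in-memory stand-in for a QP/CQ pair (no NIC involved)."""

    def __init__(self) -> None:
        self.in_flight: deque = deque()

    def post_send(self, wr_id: int) -> None:
        self.in_flight.append(wr_id)

    def poll_cq(self, max_entries: int) -> List[int]:
        # Assumption: complete at most two requests per poll (partial polls).
        n = min(2, max_entries, len(self.in_flight))
        return [self.in_flight.popleft() for _ in range(n)]


qp = FakeQueuePair()
total = post_and_poll(qp.post_send, qp.poll_cq, depth=8, rounds=3)
print(total)
```

Note that in this shape no new request is posted until the whole batch has completed, so the number of outstanding requests drops to zero at the end of every round.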

I turned off all of smart's optimization options except thread_aware_alloc, and made my test code match the QP optimization and OWRS optimization that smart uses as closely as possible. But no matter what I try, I cannot get a similar performance improvement above 24 threads and above a depth of 8. Could you give me some idea of where the performance improvement in smart comes from?
Also, my testing found that the QP allocation optimization on the doorbell register is not applied above 12 (which is the actual driver limit; I turned off preload), yet the smart code still gets a higher performance boost when shared_uuar is larger than 12. This is something I can't understand either.
Here is the smart_config I am using.

{
    "infiniband": {
        "name": "",
        "port": 1,
        "gid_idx": 1
    },

    "qp_param": {
        "max_cqe_size": 256,
        "max_wqe_size": 256,
        "max_sge_size": 1,
        "max_inline_data": 64
    },

    "max_nodes": 128,
    "initiator_cache_size": 4096,

    "use_thread_aware_alloc": true,
    "thread_aware_alloc": {
        "total_uuar": 100,
        "shared_uuar": 96,
        "shared_cq": true
    },

    "use_work_req_throt": false,
    "work_req_throt": {
        "initial_credit": 4,
        "max_credit": 12,
        "credit_step": 2,
        "execution_epochs": 60,
        "sample_cycles": 19200000,
        "inf_credit_weight": 1.05,
        "auto_tuning": false
    },

    "use_conflict_avoidance": false,
    "use_speculative_lookup": false,

    "experimental": {
        "qp_sharing": false
    }
}
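
As a quick sanity check that only thread_aware_alloc is switched on, the top-level `use_*` flags of the config can be listed with a few lines of Python. This is a minimal sketch: `enabled_options` is a hypothetical helper, not part of smart, and `CONFIG_TEXT` reproduces only the switches from the config above rather than the whole file.

```python
import json

# Subset of the config above: only the feature switches are reproduced here.
CONFIG_TEXT = """
{
    "use_thread_aware_alloc": true,
    "thread_aware_alloc": {"total_uuar": 100, "shared_uuar": 96, "shared_cq": true},
    "use_work_req_throt": false,
    "use_conflict_avoidance": false,
    "use_speculative_lookup": false,
    "experimental": {"qp_sharing": false}
}
"""


def enabled_options(config: dict) -> list:
    """Return the names of all top-level `use_*` switches that are set to true."""
    return sorted(
        key for key, value in config.items()
        if key.startswith("use_") and value is True
    )


config = json.loads(CONFIG_TEXT)
enabled = enabled_options(config)
print(enabled)  # ['use_thread_aware_alloc']
```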
