InfinibandVerbs.jl Documentation
Here are some of the more commonly used high-level API functions. They are presented in these sections:
General functions
This structure and these functions are useful for both sending and receiving packets.
InfinibandVerbs.Context — Type
Context(dev_name, port_num; <kwargs>)
Create a Context object to use Infiniband Verbs on port_num of dev_name.
Keyword arguments, described in the extended help, control various aspects of the created objects. Their default values request minimal resources, but non-trivial applications will need to request more resources depending on the application's needs. Upon successful return, the Context's queue pair will be in the IBV_QPS_INIT state.
Extended help
The Context structure manages the following fields (so you don't have to):
Field name | Type | Description |
---|---|---|
context | Ptr{ibv_context} | Context from C library |
pd | Ptr{ibv_pd} | Protection domain |
send_comp_channel | Ptr{ibv_comp_channel} | Send completion channel |
recv_comp_channel | Ptr{ibv_comp_channel} | Receive completion channel |
send_cq | Ptr{ibv_cq} | Send completion queue |
recv_cq | Ptr{ibv_cq} | Receive completion queue |
qp | Ptr{ibv_qp} | Queue pair |
send_wcs | Vector{ibv_wc} | Send work completions |
recv_wcs | Vector{ibv_wc} | Recv work completions |
The following fields of the Context structure hold sizing information:
Field name | Type | Description |
---|---|---|
send_cqe | UInt32 | Number of send completion queue events |
recv_cqe | UInt32 | Number of recv completion queue events |
max_send_wr | UInt32 | Max number of posted send work requests |
max_recv_wr | UInt32 | Max number of posted recv work requests |
max_send_sge | UInt32 | Max number of SGEs per send work request |
max_recv_sge | UInt32 | Max number of SGEs per recv work request |
(!) max_inline_data | UInt32 | Maximum amount of inline data |
The Context constructor supports the following keyword arguments:
Keyword argument | Default | Description |
---|---|---|
force | false | Allow dev_name:port_num to be inactive |
(-) send_cqe | 1 | Number of events for send completion queue |
(-) recv_cqe | 1 | Number of events for recv completion queue |
(-) max_send_wr | 1 | Maximum number of send work requests |
(-) max_recv_wr | 1 | Maximum number of recv work requests |
(-) max_send_sge | 1 | Maximum number of SGEs for send WR sg_lists |
(-) max_recv_sge | 1 | Maximum number of SGEs for recv WR sg_lists |
req_notify_send | true | Request send CQ notifications |
req_notify_recv | true | Request recv CQ notifications |
solicited_only_send | false | solicited_only for send CQ notifications |
solicited_only_recv | false | solicited_only for recv CQ notifications |
(!) comp_vector | 0 | Completion queue comp_vector |
(!) max_inline_data | 0 | Maximum inline data for QP |
(!) qp_type | IBV_QPT_RAW_PACKET | QP type |
InfinibandVerbs.hascapability — Function
hascapability(ctx::Context, cap)
Return true if the device corresponding to ctx has capability cap.
cap may be any one (and only one) of the ibv_device_cap_flags or ibv_raw_packet_caps flags.
Extended help
ibv_device_cap_flags |
---|
IBV_DEVICE_RESIZE_MAX_WR |
IBV_DEVICE_BAD_PKEY_CNTR |
IBV_DEVICE_BAD_QKEY_CNTR |
IBV_DEVICE_RAW_MULTI |
IBV_DEVICE_AUTO_PATH_MIG |
IBV_DEVICE_CHANGE_PHY_PORT |
IBV_DEVICE_UD_AV_PORT_ENFORCE |
IBV_DEVICE_CURR_QP_STATE_MOD |
IBV_DEVICE_SHUTDOWN_PORT |
IBV_DEVICE_INIT_TYPE |
IBV_DEVICE_PORT_ACTIVE_EVENT |
IBV_DEVICE_SYS_IMAGE_GUID |
IBV_DEVICE_RC_RNR_NAK_GEN |
IBV_DEVICE_SRQ_RESIZE |
IBV_DEVICE_N_NOTIFY_CQ |
IBV_DEVICE_MEM_WINDOW |
IBV_DEVICE_UD_IP_CSUM |
IBV_DEVICE_XRC |
IBV_DEVICE_MEM_MGT_EXTENSIONS |
IBV_DEVICE_MEM_WINDOW_TYPE_2A |
IBV_DEVICE_MEM_WINDOW_TYPE_2B |
IBV_DEVICE_RC_IP_CSUM |
IBV_DEVICE_RAW_IP_CSUM |
IBV_DEVICE_MANAGED_FLOW_STEERING |
ibv_raw_packet_caps |
---|
IBV_RAW_PACKET_CAP_CVLAN_STRIPPING |
IBV_RAW_PACKET_CAP_SCATTER_FCS |
IBV_RAW_PACKET_CAP_IP_CSUM |
IBV_RAW_PACKET_CAP_DELAY_DROP |
InfinibandVerbs.post_wrs — Function
post_wrs(ctx::Context, wr::Ptr; modify_qp=false) -> nothing
post_wrs(ctx::Context, wrs::Vector, idx=1; modify_qp=false) -> nothing
Post linked WRs to the Context's QP.
wr may be a Ptr{ibv_send_wr}/Ptr{ibv_recv_wr} or Vector{ibv_send_wr}/Vector{ibv_recv_wr}. Passing a Vector will only post the WRs that are part of the linked list headed by the WR wrs[idx] of the Vector, which may not be the same as wrs[idx:end]. Throws a SystemError if the underlying library call fails, otherwise returns nothing.
If modify_qp is true, the Context's QP will be transitioned appropriately for the work request type:
- For ibv_recv_wr, the QP will be transitioned to a "ready-to-receive" compatible state after posting the WRs.
- For ibv_send_wr, the QP will be transitioned to the "ready-to-send" state before posting the WRs.
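The difference between the linked list headed by wrs[idx] and the slice wrs[idx:end] can be illustrated with a toy model. This is only a sketch: FakeWR and linked_chain are hypothetical names that are not part of InfinibandVerbs.jl, and the model stores the index of the next WR where a real ibv_send_wr or ibv_recv_wr stores a pointer in its next field.

```julia
# Toy model only: FakeWR and linked_chain are hypothetical, not library API.
# `next` is the index of the next WR in the list, or 0 to terminate the list.
struct FakeWR
    id::Int
    next::Int
end

# Collect the ids of the WRs reachable from wrs[idx] by following `next`.
function linked_chain(wrs::Vector{FakeWR}, idx::Int=1)
    ids = Int[]
    while idx != 0
        push!(ids, wrs[idx].id)
        idx = wrs[idx].next
    end
    return ids
end

# Two separate linked lists stored in one Vector: 1 -> 2 and 3 -> 4.
wrs = [FakeWR(1, 2), FakeWR(2, 0), FakeWR(3, 4), FakeWR(4, 0)]

chain_from_1 = linked_chain(wrs)     # follows 1 -> 2, then stops
chain_from_3 = linked_chain(wrs, 3)  # follows 3 -> 4, then stops
```

In this model, starting at index 1 reaches only two of the four WRs, even though wrs[1:end] contains all four.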
InfinibandVerbs.repost_loop — Function
repost_loop(callback, ctx::Context, wrs, timeout_ms, callback_args...)
Run a loop that processes work completions and reposts work requests.
The callback argument is a user supplied callback function described in detail below. ctx is a Context struct. wrs is a Vector{ibv_recv_wr} or Vector{ibv_send_wr} containing all the work requests being used. callback_args... are user supplied arguments that will be passed to the user supplied callback function. Typically callback_args will include the packet buffers (or some other means of getting packet (fragment) buffers) and a Ref{Int} for accumulating the number of packets posted (useful when finding the next location in the packet buffers to use), but the caller is free to decide how to make use of callback_args.
The user supplied callback function callback is called for every work completion event that has a non-zero number of work completions. It is also called if no work completions are received for timeout_ms milliseconds. The callback function is passed the following arguments:
callback(wcs, num_wc, wrs, callback_args...)
- wcs: A Vector{ibv_wc} containing work completions (WCs)
- num_wc: Number of valid entries in wcs (i.e. wcs[1:num_wc] are valid). If the callback function is called after a timeout, num_wc will be 0.
- wrs: The same recv_wrs or send_wrs passed to repost_loop
- callback_args...: The same callback_args... parameters that were passed to repost_loop (the callback function may specify the individual extra arguments by name rather than splatting callback_args)
This function can block waiting for work completion events. It should be run in a separate thread from the main Julia thread so that it will not interfere with the Julia scheduler. Usually this will be done using Threads.@spawn. For example (using repost_loop to receive):
recvtask = Threads.@spawn repost_loop(recv_callback, ctx, recv_wrs, timeout, user1, user2)
Note that Threads.@spawn does not create new threads. To use multiple threads, Julia must be started with more than one thread (e.g. julia -t 4). Additional threads cannot be created after startup.
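Because Threads.@spawn only schedules a task on threads that already exist, it can help to check the thread count before spawning a blocking call. The following is a minimal sketch: spawn_blocking is a hypothetical helper, and sum stands in for a blocking call such as repost_loop so that the example runs without any Infiniband hardware.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper (not part of InfinibandVerbs.jl): warn when Julia has
# only one thread, then run f(args...) as a task on the available threads.
function spawn_blocking(f, args...)
    if nthreads() == 1
        @warn "Julia was started with one thread; a blocking task will share it with everything else"
    end
    return @spawn f(args...)
end

# `sum` is a stand-in for a blocking call such as repost_loop.
task = spawn_blocking(sum, 1:10)
result = fetch(task)  # wait for the task to finish and collect its return value
```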
Extended help
The main purpose of the callback function is to update the SG lists of the completed work requests (WRs) with pointers to the next available packet (fragment) buffer location(s). The callback function should return the number of work requests for repost_loop to repost. Returning 0 from the callback will keep repost_loop running but no WRs will be reposted. Returning a negative value will cause repost_loop to return (i.e. end). When num_wc is 0 (i.e. after a timeout), the callback should return a value less than 0 to end the loop or 0 to keep waiting for more work completions (i.e. packets).
For streaming applications that run "forever", the callback function should update the SG lists of the WRs for all num_wc WCs and return num_wc.
Applications that want to process exactly N WRs should pass N as part of callback_args... as well as two Ref{Int} parameters so the callback function can accumulate the number of WRs posted and the number of WRs done/processed. As the number of WRs posted approaches N, the callback may return a number less than num_wc. Such applications should also pay attention to the number of done/processed WRs and return a negative value when it equals N (or exceeds N, but that should never happen if exactly N WRs are posted).
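The return-value protocol described above can be sketched with two callbacks. This is an illustration under stated assumptions, not code from the package: the names stream_callback and count_callback, and the choice of n, nposted and ndone as the extra arguments, are hypothetical. The SG list updates are left as comments so the sketch can be exercised without hardware.

```julia
# Hypothetical streaming callback: repost every completed WR. After a timeout
# num_wc is 0, so this returns 0 and the loop keeps waiting.
function stream_callback(wcs, num_wc, wrs, callback_args...)
    # (update the SG lists of the num_wc completed WRs here)
    return num_wc
end

# Hypothetical callback that processes exactly n WRs. nposted must start at the
# number of WRs posted before the loop was started; ndone must start at 0.
function count_callback(wcs, num_wc, wrs, n::Int, nposted::Ref{Int}, ndone::Ref{Int})
    ndone[] += num_wc
    ndone[] >= n && return -1            # all n WRs are done: end the loop
    nrepost = min(num_wc, n - nposted[]) # never post more than n WRs in total
    # (update the SG lists of the nrepost WRs that will be reposted here)
    nposted[] += nrepost
    return nrepost                       # 0 after a timeout: keep waiting
end

# Drive the callback by hand: n = 10, with 4 WRs assumed posted up front.
nposted = Ref(4)
ndone = Ref(0)
returns = [count_callback(nothing, num_wc, nothing, 10, nposted, ndone)
           for num_wc in (3, 4, 0, 3)]
```

In this run the second call reposts fewer WRs than completed because only three remain to be posted, the timeout call returns 0, and the final call returns a negative value once all ten WRs are done.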
Send related functions
These functions are useful when sending packets.
InfinibandVerbs.create_send_wrs — Function
create_send_wrs(ctx::Context, bufs, num_wr[, npad]; offload=false, post=false) -> send_wrs, sges, mrs
create_send_wrs(mrs, bufs, num_wr[, npad]) -> send_wrs, sges
Return num_wr ibv_send_wr work requests (WRs) and their associated scatter-gather (SG) lists.
bufs and mrs, if given, must be Vectors. If the form with ctx::Context is called, the memory region(s) of bufs will be registered before calling the mrs form and the resulting MRs will be returned in addition to the WRs and SG lists. If memory regions mrs are given, their lkey values will be used when populating the SG lists so mrs must correspond to bufs. All of the Arrays in bufs must hold the same number of packets (i.e. packet fragments). See the doc string for create_sges for details about the npad parameter.
The Context form also accepts keyword arguments offload and post, both of which default to false. If offload is true, the WRs will be set up to offload the IP checksum calculation to the NIC. You can use hascapability to check whether your device supports this. See the man page of ibv_post_send for more details. If post is true, the Context's QP will be transitioned to the "ready-to-send" (RTS) state and the work requests will be posted.
Receive related functions
These functions are useful when receiving packets.
InfinibandVerbs.create_recv_wrs — Function
create_recv_wrs(ctx::Context, bufs, num_wr[, npad]; post=false) -> recv_wrs, sges, mrs
create_recv_wrs(mrs, bufs, num_wr[, npad]) -> recv_wrs, sges
Return num_wr ibv_recv_wr work requests (WRs) and their associated scatter-gather (SG) lists.
bufs and mrs, if given, must be Vectors. If the form with ctx::Context is called, the memory region(s) of bufs will be registered before calling the mrs form and the resulting MRs will be returned in addition to the WRs and SG lists. If memory regions mrs are given, their lkey values will be used when populating the SG lists so mrs must correspond to bufs. All of the Arrays in bufs must hold the same number of packets (i.e. packet fragments). See the doc string for create_sges for details about the npad parameter.
The Context form also accepts keyword argument post (default false). If post is true, the work requests will be posted and the Context's QP will be transitioned to a "ready-to-receive" (RTR) compatible state.
InfinibandVerbs.create_flow — Function
create_flow(qp, port_num; <kwargs>) -> Ptr{ibv_flow}
create_flow(ctx::Context; <kwargs>) -> Ptr{ibv_flow}
Create a flow rule to specify which packets to receive.
The flow rule will be created for queue pair qp, port number port_num (or ctx.qp and ctx.port_num).
Various attributes and selectors of the flow rule can be specified by keyword arguments as described in the extended help. The returned Ptr{ibv_flow} can be passed to ibv_destroy_flow to remove the flow rule.
Extended help
Attributes
These keyword arguments set general attributes of the flow rule. See the Linux manual page for ibv_create_flow for more details.
- General flow rule attributes:
  - flow_type one of :normal (default), :all, :mc, or :sniffer
  - priority defaults to 0
  - flow_flags defaults to 0
Selectors
Keyword arguments that are unspecified (or zero) are not used for matching. Integer values will be converted to network byte order as necessary.
- Ethernet layer selectors
  - dmac::NTuple{6, UInt8}: destination MAC address to match
  - smac::NTuple{6, UInt8}: source MAC address to match
  - ethtype: EtherType value to match (e.g. 0x0800 for IPv4 packets)
  - vlan: VLAN tag to match
- IPv4 layer selectors (can be UInt32 or IPv4)
  - sip: source IPv4 address (e.g. 0x0a000123 or ip"10.0.1.35")
  - dip: destination IPv4 address
  - proto: IP protocol
  - tos: Type of service (aka DSCP/ECN)
  - ttl: Time to live
  - flags: IP flags
- TCP/UDP layer selectors
  - dport: destination TCP/UDP port to match
  - sport: source TCP/UDP port to match
  - tcpudp: one of :udp (default) or :tcp. Specifies whether sport/dport should match UDP or TCP ports. To match all UDP or all TCP packets, use the IPv4 layer proto selector instead.
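Selector values of the types listed above can be built with the Julia standard library alone. The sketch below assumes a hypothetical helper, parse_mac, that is not part of InfinibandVerbs.jl, and uses made-up address and port values. The create_flow call is shown only as a comment because it needs a Context on real hardware; its keyword names are taken from the list above.

```julia
using Sockets  # standard library; provides the IPv4 type

# Hypothetical helper: turn "aa:bb:cc:dd:ee:ff" into an NTuple{6, UInt8}.
function parse_mac(mac::AbstractString)
    parts = split(mac, ':')
    length(parts) == 6 || throw(ArgumentError("expected 6 colon-separated octets"))
    return ntuple(i -> parse(UInt8, parts[i]; base=16), 6)
end

dmac = parse_mac("02:00:0a:00:01:23")  # made-up MAC address

# The two forms of the same IPv4 address mentioned for sip/dip.
sip_int = 0x0a000123
sip_ip = parse(IPv4, "10.0.1.35")      # same value as ip"10.0.1.35"
same_address = IPv4(sip_int) == sip_ip

# With a Context `ctx` (requires hardware), the selectors might be used as:
# flow = create_flow(ctx; dmac=dmac, sip=sip_ip, dport=4000, tcpudp=:udp)
```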
InfinibandVerbs.destroy_flow — Function
destroy_flow(flow) -> nothing
Destroy flow that was created by create_flow. Packets specified by flow will no longer be received from the flow's device and port.
Less frequently used functions
These functions are rarely called directly because the more commonly used functions call them. They are documented here because the more commonly used functions sometimes expose parameters that are passed on to them.
InfinibandVerbs.create_sges — Function
create_sges(bufs, lkeys, num_wr[, npad]) -> Matrix{ibv_sge}
Return a Matrix{ibv_sge} of size (length(bufs), num_wr).
Each column of the returned Matrix is an SG list initialized to point to the first num_wr packets of bufs. Both bufs and lkeys must be Vectors of the same length. The elements of bufs are Arrays that must be sized for the same number of packets and num_wr must not exceed this number. The elements of lkeys must be the lkey values corresponding to the registered memory regions for bufs.
The Arrays in bufs may include a number of padding bytes along their first dimension to improve alignment of the packet data. The number of padding bytes may be specified by passing npad. npad may be a single Int to apply the same value to all bufs Arrays or a Vector{Int} to specify a separate value for each bufs Array. The default value of npad is 0 (i.e. no padding). The value(s) of npad will be subtracted from the number of bytes in the first dimension of each bufs Array.
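The npad arithmetic can be sketched in plain Julia. This is an illustration of the subtraction described above, not the package's implementation: sge_lengths is a hypothetical helper, the buffer shapes are made up, and the sketch says nothing about where in the first dimension the padding bytes sit.

```julia
# Hypothetical helper (not part of InfinibandVerbs.jl): number of bytes in the
# first dimension of each buffer Array, minus that buffer's padding bytes.
function sge_lengths(bufs::Vector, npad::Union{Int,Vector{Int}}=0)
    # A single Int applies to every buffer; a Vector gives one value per buffer.
    pads = npad isa Int ? fill(npad, length(bufs)) : npad
    length(pads) == length(bufs) || throw(ArgumentError("npad needs one value per buffer"))
    return [size(buf, 1) * sizeof(eltype(buf)) - pad for (buf, pad) in zip(bufs, pads)]
end

# Made-up layout: 16 packets, each split into a header and a data fragment.
hdr_bufs = zeros(UInt8, 64, 16)     # 64 bytes per packet, 22 of them padding
data_bufs = zeros(UInt8, 8192, 16)  # 8192 bytes per packet, no padding

per_buf = sge_lengths([hdr_bufs, data_bufs], [22, 0])  # one npad per buffer
same_pad = sge_lengths([hdr_bufs, data_bufs], 8)       # one npad for all buffers
```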