Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that all of the rdma components have been pulled into a single
repository on github, remove all of the contents of this repo and
leave a pointer to the new location.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When creating an Address Handle from a Work Completion, source GID index must
be set. To enable RoCE v2, libibverbs should select the right GID index for
it.
RoCE v2 uses the UDP/IP layers, so a WC's GRH could be an IB GRH, an IPv4
header or an IPv6 header. Libibverbs should be able to distinguish between them
in a driver-agnostic way to avoid API changes.
This patch relies on the fact that for RoCE v2, the GRH is either an IPv6 header
or 20 garbled bytes followed by an IPv4 header, as defined in the RoCE v2 annex.
The annex also specifies that the version number is 4 for packets with an IPv4
header and 6 for packets with an IPv6 header; otherwise the packet is silently
dropped. This fact is also taken into account when parsing the GRH.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
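The GRH parsing described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the names classify_grh, grh_kind and ipv4_checksum_ok are hypothetical, and using the IPv4 header length and header checksum to resolve the ambiguous case (garbled leading bytes that happen to start with 6) is an assumption. The sketch also does not try to tell an IB GRH apart from an IPv6 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRH_LEN      40  /* a GRH and an IPv6 header are both 40 bytes */
#define IPV4_OFFSET  20  /* RoCE v2 over IPv4: 20 garbled bytes, then the header */
#define IPV4_HDR_LEN 20  /* IPv4 header without options */

/* Hypothetical result type for this sketch. */
enum grh_kind { GRH_UNKNOWN = -1, GRH_IPV4 = 4, GRH_IPV6 = 6 };

/* Ones'-complement sum over a 20-byte IPv4 header; a result of 0xffff
 * means the stored checksum is consistent with the header contents. */
static int ipv4_checksum_ok(const uint8_t *hdr)
{
    uint32_t sum = 0;

    for (int i = 0; i < IPV4_HDR_LEN; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return sum == 0xffff;
}

/* Classify the 40 bytes reported as the GRH of a work completion. */
enum grh_kind classify_grh(const uint8_t grh[GRH_LEN])
{
    assert(grh != NULL);

    unsigned first_version = grh[0] >> 4;
    unsigned v4_version = grh[IPV4_OFFSET] >> 4;
    unsigned v4_ihl = grh[IPV4_OFFSET] & 0x0f;

    /* Not 6 at offset 0: only an IPv4 header at offset 20 is acceptable. */
    if (first_version != 6)
        return v4_version == 4 ? GRH_IPV4 : GRH_UNKNOWN;

    /* 6 at offset 0 may still be garbage in front of an IPv4 header.
     * Assumption: treat it as IPv4 only if the header looks fully valid. */
    if (v4_version == 4 && v4_ihl == 5 && ipv4_checksum_ok(grh + IPV4_OFFSET))
        return GRH_IPV4;
    return GRH_IPV6;
}
```

A caller would pass the first 40 bytes of the receive buffer and use the result to pick between an IPv4-mapped and a native IPv6 source GID when selecting the GID index.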
|
|
Currently, libibverbs does not support UD traffic for RoCE v2 since
it can't distinguish between v1 and v2 GIDs (both have the same GID value;
only the version differs). This means that the GID index can't be
selected correctly.
This patch introduces the ibv_query_gid_type helper function, to be used
by libibverbs and its vendors, which returns the GID type for a given GID
index by reading the relevant sysfs entry.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
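As a rough sketch of what mapping the sysfs text to a GID type could look like: the function and enum names below are hypothetical, and the two strings ("IB/RoCE v1" and "RoCE v2") are an assumption about what the kernel writes into the per-index GID type attribute; the commit message does not spell them out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical GID type values for this sketch. */
enum gid_type {
    GID_TYPE_UNKNOWN = -1,
    GID_TYPE_IB_ROCE_V1 = 0,
    GID_TYPE_ROCE_V2 = 1
};

/* Map the text read from the sysfs attribute to a GID type.
 * Assumption: the attribute holds "IB/RoCE v1" or "RoCE v2",
 * optionally followed by a newline. */
enum gid_type parse_gid_type(const char *text)
{
    assert(text != NULL);

    size_t len = strcspn(text, "\n");   /* ignore a trailing newline */

    if (len == strlen("IB/RoCE v1") && strncmp(text, "IB/RoCE v1", len) == 0)
        return GID_TYPE_IB_ROCE_V1;
    if (len == strlen("RoCE v2") && strncmp(text, "RoCE v2", len) == 0)
        return GID_TYPE_ROCE_V2;
    return GID_TYPE_UNKNOWN;
}
```

Keeping the parsing separate from the file read makes the mapping easy to test without an RDMA device present.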
|
|
Update man pages for RSS usage. This includes:
- Add man pages for the new related verbs.
- Update man/ibv_create_qp_ex and man/ibv_query_device_ex
to include the RSS-related additions.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Expose RSS-related capabilities. This includes:
- QP types that support RSS on the device.
- Max number of receive work queue indirection tables that
could be opened on the device.
- Max size of a receive work queue indirection table.
- Max number of work queues of receive type that
could be opened on the device.
- Bit mask of the supported types of hash functions.
- Bit mask of the supported RX fields that can participate
in the RX hashing.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add support to track asynchronous events on a work queue object.
For now only IBV_EVENT_WQ_FATAL is applicable.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Extend create QP to accept RX hash data. This is needed
to enable RSS based on some RX configuration.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Extend create QP to accept a Receive Work Queue indirection table.
This is needed to enable RSS on some set of Receive Work Queues.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Introduce Receive Work Queue (WQ) indirection table and its
create/destroy verbs. This object can be used to spread incoming
traffic to different receive Work Queues.
A Receive WQ indirection table points to a variable number of WQs. This
table is given to a QP in downstream patches.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
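To illustrate how an indirection table can spread incoming traffic, here is a toy model. Everything in it is hypothetical: struct toy_ind_table, pick_wq, the power-of-two table size and the "hash modulo table size" selection are assumptions made for the sketch, not the verbs API or the hardware behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy indirection table: 1 << log_size entries, each naming a receive WQ.
 * The int identifiers stand in for real WQ objects. */
struct toy_ind_table {
    unsigned log_size;
    const int *wq_ids;
};

/* Pick the receive WQ for a packet from its RX hash value.
 * Assumption: the low bits of the hash index the table. */
int pick_wq(const struct toy_ind_table *table, uint32_t rx_hash)
{
    assert(table != NULL && table->wq_ids != NULL && table->log_size < 32);

    uint32_t mask = (1u << table->log_size) - 1;
    return table->wq_ids[rx_hash & mask];
}
```

With a four-entry table, packets whose hashes differ in the two low bits land on different WQs, which is the spreading effect the commit describes.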
|
|
Introduce Work Queue object and its create/destroy/modify verbs.
A QP can be created without internal WQs "packaged" inside it;
such a QP can be configured to use an "external" WQ object as its
receive/send queue.
WQ is a necessary component for RSS technology since the RSS mechanism
is supposed to distribute the traffic between multiple Receive Work
Queues.
A WQ is associated (many to one) with a Completion Queue and owns the WQ
properties (PD, WQ size, etc.).
A WQ has a type. This patch introduces IBV_WQT_RQ (i.e. receive
queue); it may be extended to others such as IBV_WQT_SQ (send queue).
A WQ of type IBV_WQT_RQ contains receive work requests and as such
exposes a post receive function to be used to post a list of work
requests (WRs) to its receive queue.
PD is an attribute of a work queue (i.e. send/receive queue). It is
used by the hardware for security validation before scattering to a
memory region which is pointed to by the WQ. For that, an external WQ
object needs a PD, letting the hardware make that validation.
When accessing a memory region that is pointed to by the WQ, its PD
is used and not the QP's PD. This behavior is similar to an SRQ and a QP.
WQ context is subject to well-defined state transitions done by
the modify_wq verb.
When a WQ is created, its initial state is IBV_WQS_RESET.
From IBV_WQS_RESET it can be modified to itself or to IBV_WQS_RDY.
From IBV_WQS_RDY it can be modified to itself, to IBV_WQS_RESET
or to IBV_WQS_ERR.
From IBV_WQS_ERR it can be modified to IBV_WQS_RESET.
Note: transition to IBV_WQS_ERR might occur implicitly in case there
was some HW error.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
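The state transitions listed in the commit above can be written down as a small table-like check. This is only a sketch of the rules as stated in the message; the enum and function names are local to the example and stand in for the IBV_WQS_* values.

```c
#include <stdbool.h>

/* Local stand-ins for IBV_WQS_RESET, IBV_WQS_RDY and IBV_WQS_ERR. */
enum wq_state { WQS_RESET, WQS_RDY, WQS_ERR };

/* Return true if an explicit modify from 'from' to 'to' is allowed
 * according to the transitions listed in the commit message. */
bool wq_transition_allowed(enum wq_state from, enum wq_state to)
{
    switch (from) {
    case WQS_RESET:
        return to == WQS_RESET || to == WQS_RDY;
    case WQS_RDY:
        return to == WQS_RDY || to == WQS_RESET || to == WQS_ERR;
    case WQS_ERR:
        return to == WQS_RESET;
    }
    return false;
}
```

The implicit move to the error state on a hardware error is outside this check, since it is not requested through the modify verb.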
|
|
Update relevant man pages for TSO:
- Add max_tso_header as part of ibv_create_qp_ex man page.
- Add IBV_WR_TSO opcode and update send operation support table for
RAW_PACKET QP as part of ibv_post_send man page.
- Add TSO capabilities as part of ibv_query_device_ex man page.
In addition, fixed a typo as part of updating ibv_post_send related to
ibv_odp_caps.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
1) Add a structure to define a TSO packet as part of struct ibv_send_wr.
2) Add IBV_WR_TSO opcode to be used as part of post_send.
3) Add IBV_WC_TSO to be used as part of poll_cq to report a TSO completion.
4) Add IBV_QP_INIT_ATTR_MAX_TSO_HEADER to define the maximum TSO header size
when creating a QP. This is needed to let providers prepare their SQ buffer
to fit application's usage.
5) Report TSO capabilities when querying a device.
In order to preserve the size of the ibv_send_wr structure and avoid a
performance penalty, the TSO definition was added under a union with the
memory window fields; those options are mutually exclusive.
The TSO definition should include:
- A pointer to the packet header.
- Header size.
- The maximum segment size (mss) that the hardware should generate in
its TSO engine.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
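The three pieces of the TSO definition (header pointer, header size, mss) are enough to reason about what the hardware will put on the wire. The sketch below uses a hypothetical struct toy_tso and helper names; the field widths and the assumption that the header is repeated once per generated segment are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the TSO definition described above. */
struct toy_tso {
    const void *hdr;   /* pointer to the packet header */
    uint16_t hdr_sz;   /* header size in bytes */
    uint16_t mss;      /* maximum segment size the TSO engine should use */
};

/* Number of segments needed to carry payload_len bytes. */
uint32_t tso_segment_count(const struct toy_tso *tso, uint32_t payload_len)
{
    assert(tso != NULL && tso->mss != 0);

    return payload_len / tso->mss + (payload_len % tso->mss != 0);
}

/* Total bytes on the wire, assuming one header copy per segment. */
uint64_t tso_wire_bytes(const struct toy_tso *tso, uint32_t payload_len)
{
    uint64_t segments = tso_segment_count(tso, payload_len);

    return (uint64_t)payload_len + segments * tso->hdr_sz;
}
```

For example, 4000 payload bytes with an mss of 1460 and a 54-byte header need three segments in this model.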
|
|
Make sure to return valid output by using memset to zero the extended fields.
There is no need to deal with specific fields any more.
Currently, extended fields are assigned some value or 0 depending
on the response from the command. When adding a new extended field, the
relevant variables must be cleared if no response was received from the kernel.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
I store release tarballs in a subdirectory of my git repo, so I needed
an update to my .gitignore file
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Currently there is a "generic" implementation for the memory barrier
macros in arch.h. These turned out to be insufficient for ARM64, causing
memory corruption problems when doing RDMA operations. So going forward,
fail a compile on a platform without platform-specific memory barrier macros.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
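The "fail the compile" approach can be sketched with a preprocessor chain that ends in #error. The macro names (mb, rmb, wmb) and the specific instructions chosen per architecture are assumptions for illustration and are not copied from arch.h; only two architectures are listed here to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__)
#define mb()  __asm__ __volatile__("mfence" : : : "memory")
#define rmb() __asm__ __volatile__("lfence" : : : "memory")
#define wmb() __asm__ __volatile__("sfence" : : : "memory")
#elif defined(__aarch64__)
#define mb()  __asm__ __volatile__("dsb sy" : : : "memory")
#define rmb() __asm__ __volatile__("dsb ld" : : : "memory")
#define wmb() __asm__ __volatile__("dsb st" : : : "memory")
#else
/* No silent generic fallback: an unknown platform must not build. */
#error "No memory barrier macros defined for this architecture"
#endif

/* Tiny usage example: store a value, order the store, then read it back. */
int publish_then_read(volatile int *slot, int value)
{
    assert(slot != NULL);

    *slot = value;
    wmb();   /* make the store visible before anything that follows */
    rmb();   /* order the following load after earlier loads */
    return *slot;
}
```

The point of the #error branch is that a port to a new architecture has to make an explicit decision about barriers instead of inheriting a definition that may be too weak.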
|
|
The default generic barriers are not correct for ARM64. This results
in data corruption. The correct macros are based on the ARM Compiler
Toolchain Assembler Reference documentation.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
We have internal tools that complain about any binaries that are missing
corresponding man pages, and all the other examples have 'em.
Shamelessly ripped off from ibv_srq_pingpong's man page.
CC: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Multiple conflicts with timestamp support. Hand resolved.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add timestamp support in rc_pingpong. It can serve as an example
of using the ibv_create_cq_ex verb and the new
ibv_wc_read_xxx accessors.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add ibv_query_rt_values_ex, an extension verb to query certain real
time values of a device. Currently, only IBV_VALUES_MASK_RAW_CLOCK is
supported, but this verb could support other flags like
IBV_VALUES_MASK_TEMP_SENSOR, IBV_VALUES_MASK_CORE_FREQ, etc.
This extension verb only calls the provider.
The provider has to query this value somehow and mark the queried
values in comp_mask.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When a CQ is used only from one thread, there's no need to waste cycles
on locking. Since this series introduces a mechanism which allows the
vendor to introduce different polling functions per CQ, it allows the
vendor to implement both locking and lockless CQs and assign them
accordingly.
Adding a new creation flag for this.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In order to report when a completion was created, we add a member
function to the CQ which reads the completion timestamp from the
current CQ. The time is given in raw format.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The fields timestamp_mask and hca_core_clock were added
to the extended version of ibv_query_device verb.
timestamp_mask represents the supported mask of the timestamp.
Users can use it to infer the accuracy of the reported
timestamp.
hca_core_clock represents the frequency of the HCA (in kHz).
Since timestamp is given in hardware cycles, knowing the frequency
is mandatory in order to convert this number into seconds.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
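As a worked example of the conversion mentioned above, the following sketch turns a raw cycle count into nanoseconds using a clock given in kHz. The function names are hypothetical, and treating timestamp_mask as a contiguous low-bit mask (so that it can be used to handle counter wraparound) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two raw timestamps, tolerating counter wraparound.
 * Assumption: timestamp_mask is of the form 2^n - 1. */
uint64_t cycles_elapsed(uint64_t start, uint64_t end, uint64_t timestamp_mask)
{
    return (end - start) & timestamp_mask;
}

/* Convert hardware cycles to nanoseconds.
 * cycles / kHz gives milliseconds, so scale by 1,000,000 for ns.
 * The split into quotient and remainder avoids overflowing 64 bits. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t hca_core_clock_khz)
{
    assert(hca_core_clock_khz != 0);

    uint64_t whole_ms = cycles / hca_core_clock_khz;
    uint64_t rest = cycles % hca_core_clock_khz;

    return whole_ms * 1000000u + rest * 1000000u / hca_core_clock_khz;
}
```

For instance, with a clock of 156250 kHz, 156250 cycles correspond to one millisecond, that is 1000000 ns.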
|
|
Currently, ibv_poll_cq is a one-stop shop for polling all of a
completion's attributes. The current implementation has a few
implications:
1. The vendor's completion format is transformed to an agnostic IB
format (copying data which might have been used directly).
2. Adding new completion attributes means wasting more time in copies.
3. Extensible functions require wasting cycles on logic which
determines whether a function is known and implemented in the
provider.
In order to solve these problems, we introduce a new poll_cq mechanism
for extensible CQs. Polling is done in batches, where every batch
can contain more than one completion. A batch is started by
calling the start_poll function. This already fetches a completion
that the user can now query using the various query functions.
Advancing to the next completion is done using next_poll.
After querying all completions (or when the user wants to stop
fetching completions in the current batch), end_poll is called.
As 'wr_id' and 'status' are the most commonly used fields, they can be
accessed directly and there is no function call for them.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
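The batch flow (start_poll, query, next_poll, end_poll) can be illustrated with a self-contained mock. Nothing below is the libibverbs API: struct toy_cq, the toy_* functions, and the convention that a failed start means no end_poll call are all assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_wc { uint64_t wr_id; int status; uint32_t byte_len; };

/* Mock CQ: a fixed array of completions plus a cursor. */
struct toy_cq {
    const struct toy_wc *ring;
    size_t count;
    size_t next;       /* index of the next unread completion */
    uint64_t wr_id;    /* current completion: read directly */
    int status;        /* current completion: read directly */
};

/* Load the next completion into the directly readable fields. */
static int toy_load(struct toy_cq *cq)
{
    if (cq->next >= cq->count)
        return ENOENT;
    cq->wr_id = cq->ring[cq->next].wr_id;
    cq->status = cq->ring[cq->next].status;
    cq->next++;
    return 0;
}

int toy_start_poll(struct toy_cq *cq) { return toy_load(cq); }
int toy_next_poll(struct toy_cq *cq)  { return toy_load(cq); }
void toy_end_poll(struct toy_cq *cq)  { (void)cq; }

/* Other attributes go through a query function. */
uint32_t toy_read_byte_len(const struct toy_cq *cq)
{
    return cq->ring[cq->next - 1].byte_len;
}

/* Drain one batch; return bytes of successful completions. */
uint64_t drain_batch(struct toy_cq *cq, size_t *polled)
{
    assert(cq != NULL && polled != NULL);

    uint64_t bytes = 0;

    *polled = 0;
    if (toy_start_poll(cq) != 0)
        return 0;                 /* nothing fetched, no batch to end */
    do {
        if (cq->status == 0)      /* direct field access, no call */
            bytes += toy_read_byte_len(cq);
        (*polled)++;
    } while (toy_next_poll(cq) == 0);
    toy_end_poll(cq);
    return bytes;
}
```

The loop shape is the part worth noting: one start, any number of next calls, and a single end for the whole batch.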
|
|
Add ibv_create_cq_ex. This extended verb follows the extension verbs
scheme and hence can be extended in the future for more features.
The new command requires the user to declare which fields are going
to be polled. This is mandatory in order to maintain compatibility
between new applications and old libraries.
The user shall only read completion fields that were requested
when the CQ was created.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Raw Packet QPs that were created with Scatter FCS will scatter
the FCS into the receive buffers.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
|
|
Since the legacy device capability flags are occupied, add new
device capability flags to the extended query device.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
|
|
Extend verbosity mode to print device capability flags in text format.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
|
|
Add GID index parameter, it's needed for RoCE.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
SRQ number is 24 bits and not 16, fix to have sufficient
size for it.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add an example rule to show typical usage of
the API.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add man page for ibv_create_flow/ibv_destroy_flow verbs, it includes
the new dont_trap flag.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add an option to create a normal flow steering rule that doesn't trap
received packets, allowing them to match lower prioritized rules.
When the don't trap rule exists and matches a packet, the underlying HCA
should pass the packet to the rule's assigned QP(s). However, the HCA
will continue looking for other matches at lower priority rules, which
may be assigned to other QPs. This will let them get the traffic as
well.
Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
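The effect of a don't-trap rule can be modelled as a walk over rules in priority order that stops at the first matching rule which does trap. The rule layout, the single destination-port match and the function name steer are hypothetical; real flow rules match on much richer specifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy steering rule: matches on a destination port only. */
struct toy_rule {
    uint16_t dst_port;
    int qp;            /* stand-in for the assigned QP */
    bool dont_trap;    /* keep looking at lower priorities after a match */
};

/* Walk rules ordered from highest to lowest priority and collect the
 * QPs that receive the packet. Returns the number of QPs written. */
size_t steer(const struct toy_rule *rules, size_t n, uint16_t dst_port,
             int *out, size_t out_cap)
{
    assert(rules != NULL && out != NULL);

    size_t hits = 0;

    for (size_t i = 0; i < n && hits < out_cap; i++) {
        if (rules[i].dst_port != dst_port)
            continue;
        out[hits++] = rules[i].qp;
        if (!rules[i].dont_trap)
            break;     /* a normal rule traps the packet */
    }
    return hits;
}
```

With a don't-trap rule at the top, both its QP and the QP of the next matching rule see the packet, while rules below the first trapping match see nothing.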
|
|
Memory re-registration is a feature that enables one to change
the attributes of a memory region, including PD, translation
(address and length) and access flags.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Currently, MR re-registration isn't implemented in libibverbs.
The only part which does exist is an API call between libibverbs
and the provider's library. Since there's no way for a user application
to invoke this API call, it's safe to assume it's unused.
Similarly to other verbs (for example, ibv_modify_qp), a modification
to the MR shouldn't change the user's handle. The current existing API is:
struct ibv_mr * (*rereg_mr)(struct ibv_mr *mr,
int flags,
struct ibv_pd *pd, void *addr,
size_t length,
int access);
As a result, this API call returns the exact same pointer it gets.
Instead, we propose returning a status int, which is far more useful
than the current return value.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add and update man pages to describe the Memory Window usage.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
At this step Memory Window type two is ready to be used, exposing
its device capabilities.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
To bind:
Application should directly post bind WR to the
QP using ibv_post_send.
To unbind/invalidate there are 2 options, local and remote
invalidations.
Local invalidation:
Send a work request where the immediate data contains the
MW's R_key and the opcode is IBV_WR_LOCAL_INV.
Upon success a completion with opcode of IBV_WC_LOCAL_INV is polled.
Remote invalidation:
Send with invalidate can be used to invalidate a remote memory
key.
The invalidation is done by posting a send work request, where the
opcode is IBV_WR_SEND_WITH_INV and the immediate data
contains the R_key.
Upon success, the responder's work completion will contain the
invalidated R_key in its immediate data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
At this step Memory Window type one is ready to be used, exposing
its device capability.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Type one:
- Bind through a verb.
- The R_key is allocated by the verb.
- To unbind, the bind verb should be used with size of 0.
- Belongs to a PD (no QP association).
Add above functionality by:
- Restructuring struct ibv_mw_bind.
- Expose ibv_bind_mw verb.
- Add a helper API to be used by drivers to increment the tag
part of the R_key.
- Add IBV_ACCESS_ZERO_BASED to expose the zero based address option.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
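A helper that increments the tag part of an R_key might look like the sketch below. The function name is hypothetical, and placing the tag in the low 8 bits of the key is an assumption of this example; the commit message only says that the tag part is incremented.

```c
#include <stdint.h>

/* Increment the tag portion of an R_key, leaving the index bits intact.
 * Assumption: the tag occupies the low 8 bits and wraps around. */
uint32_t inc_rkey_tag(uint32_t rkey)
{
    const uint32_t tag_mask = 0x000000ffu;
    uint32_t new_tag = (rkey + 1) & tag_mask;

    return (rkey & ~tag_mask) | new_tag;
}
```

Because only the tag changes, a key whose tag is 0xff wraps to tag 0x00 without carrying into the upper bits.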
|
|
Add alloc/dealloc Memory Window verbs, those verbs
are used by both MW types one and two.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Update man page to:
- Align the ibv_send_wr structure according to the infiniband/verbs.h
- Include IBV_QPT_XRC_SEND supported opcodes.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add QP creation flags, specifically add a flag to indicate that
the QP will not receive self multicast loopback traffic.
To pass the QP creation flags to the kernel, we need to add the
ibv_cmd_create_qp_ex2 API, which follows the extended scheme
and uses the CREATE_QP_EX command.
The ibv_cmd_create_qp_ex API doesn't follow the extended scheme;
it uses the CREATE_QP command and can't be used.
To prevent code duplication, the common code of the above two
functions is shared.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Setting the environment variable RDMAV_HUGEPAGES_SAFE tells the library
to check the underlying page size used by the kernel for memory regions.
This is required if an application uses huge pages either directly or
indirectly via a library such as libhugetlbfs.
The check of this variable was performed at the first call to
ibv_fork_init. This caused unpredictable behavior in complex
applications with multiple underlying libraries.
The proposed change allows support of huge pages without relying on
the order of ibv_fork_init calls.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The current code always monitors the first device found. This patch
allows specifying the monitored device with a command line argument.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Even though I need to manually update the changelog for each release,
there's no reason not to keep the version updated with the version
in configure.ac.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
One of the example programs issues compiler warnings with strict
aliasing enabled in the gcc options, so disable it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
make distcheck failed because it searched for headers
in the src directory.
Added noinst_HEADERS to fix that.
Change-Id: Ibc0949286a97ac8775156df6465e31fe301d27db
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add IBV_DEVICE_UD_IP_CSUM, IBV_DEVICE_RAW_IP_CSUM and IBV_DEVICE_RC_IP_CSUM to
the device capability enum field. These enum values denote IPv4 checksum offload
support for UD, RAW and RC QPs.
Flags IBV_SEND_IP_CSUM and IBV_WC_IP_CSUM_OK are added for utilizing this
capability for send and receive separately.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The on-demand paging feature allows registering memory regions without pinning
their pages. Unfortunately the feature doesn't work together with all
transports and all operations. This patch adds the ability to report on-demand
paging capabilities through ibv_query_device_ex.
The patch also adds the IBV_ACCESS_ON_DEMAND access flag to allow registration
of on-demand paging enabled memory regions.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add the verb ibv_query_device_ex which is extensible and allows following
commits to add new features to define additional properties.
Cc: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When adding this API, there had been consensus that having separate
lib_* and drv_* function pointers in the extended context struct
was not needed and should not be done. However, that snuck in anyway.
This backs that out and takes us back to a single pointer for each
function, but does so in a way as to preserve both back and forward
compatibility.
Fixes: 389de6a6ef4e (Add receive flow steering support)
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch adds the required platform specific code to allow execution of
the libibverbs functions on the s390x platform.
Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In order to implement RoCE IP based addressing for UD QPs, without introducing
uverbs changes, we need a way to resolve the L2 Ethernet addresses from user-space.
This is done with netlink through libnl, and in libibverbs such that multiple
vendor provider libraries can use the code.
This is implemented as a helper function ibv_resolve_eth_l2_from_gid.
If the GID is IP based, the provider shall call this utility function,
which resolves the IP to MAC and vlan.
The steps for resolution:
1. get sgid
2. from sgid, get the interface
3. query route from interface to the destination
4. query the neigh table, if the MAC for the destination is already known, done.
5. if not, loop until timeout:
a. send a UDP packet to that IP targeted to the "discard" port
b. listen to events from the neigh table
c. query neigh table in order to know if the neigh has shown up between
query until we started monitoring it
6. query vlan id from the interface
This solution depends on libnl-3 with backports to libnl-1.
Change-Id: I5c36145f8eaa46ab3890cc06da00d04686d68a1e
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add an enum that describes ibv_port_cap_flags that complies
with the respective kernel enum.
This value could be fetched when using ibv_query_port in
port_cap_flags.
Change-Id: I499565f17d378f525796ee187ef0fac91fd48c21
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Issue found by Coverity. When we read a sysfs file and the length of the
read exactly matches our buffer and we don't have a newline to replace
with a null termination, we either have to truncate the result, or fail
to null terminate. Either way, we will not get the desired behavior, so
in that case, fail the read entirely.
Signed-off-by: Doug Ledford <dledford@redhat.com>
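The termination rule from the commit above can be sketched as a pure function over a buffer that has already been filled by a read. The name terminate_sysfs_read and its signature are hypothetical; the logic follows the three cases in the message: replace a trailing newline, terminate if there is room, otherwise fail.

```c
#include <assert.h>
#include <stddef.h>

/* Null-terminate 'len' bytes that were read into 'buf' of capacity 'size'.
 * Returns the resulting string length, or -1 if the data cannot be
 * terminated without truncating it. */
long terminate_sysfs_read(char *buf, size_t size, long len)
{
    assert(buf != NULL);

    if (len <= 0 || (size_t)len > size)
        return -1;
    if (buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* newline becomes the terminator */
        return len - 1;
    }
    if ((size_t)len < size) {
        buf[len] = '\0';          /* room left for a terminator */
        return len;
    }
    return -1;                    /* buffer full, no newline: fail the read */
}
```

Failing in the last case avoids handing callers either a truncated value or a buffer that is not a valid C string.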
|
|
It's not a warning or an error if libibverbs cannot find a userspace
driver for kernel devices. Indeed, returning a num_devices of 0 is
sufficient -- the middleware shouldn't be unconditionally printing out
stderr message; let the upper layer application do that (if it wants
to).
For debugging purposes, if the environment variable IBV_SHOW_WARNINGS
is set (to any value), warnings will be emitted to stderr if a
corresponding userspace driver cannot be found for a kernel device.
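A minimal sketch of the opt-in check described above. Only the variable name IBV_SHOW_WARNINGS comes from this message; the helper names and the warning text are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if IBV_SHOW_WARNINGS is set to any value (even empty), else 0. */
static int show_warnings_enabled(void)
{
	return getenv("IBV_SHOW_WARNINGS") != NULL;
}

/*
 * Hypothetical reporting helper: prints only when the user opted in.
 * Returns 1 if a warning was printed, 0 if it stayed silent.
 */
static int warn_no_driver(const char *sysfs_path)
{
	if (!show_warnings_enabled())
		return 0;	/* silent by default */

	fprintf(stderr, "warning: no userspace driver found for %s\n",
		sysfs_path);
	return 1;
}
```

Running a program with IBV_SHOW_WARNINGS=1 in its environment would then make the warning visible, while the default run prints nothing.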
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Various arguments that should be unsigned were signed, giving rise to
the possibility of people passing negative numbers when they intended to
pass large positive numbers (this was mainly seen in real world usage
with the size argument). Switch the args that are never legitimately
negative to unsigned.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
QPs, which receive plain Ethernet packets -- specifically packets that
don't carry any QPN to be matched by the receiving side. Applications
using these QPs must be provided with a method to program some
steering rule into the HW so packets arriving at the local port can be
routed to them.
In a similar manner, when the device supports flow steering, IB UD QPs
created by IPoIB allow userspace applications to steer specific TCP/IP
flows to their QPs.
This patch adds ibv_create_flow(), which allows providing a flow
specification for a QP. When there's a match between the
specification and a received packet, the packet is forwarded to that
QP, in the same way one uses ibv_attach_mcast() for IB UD multicast
handling.
Flow specifications are provided as instances of struct
ibv_flow_spec_yyy, which describes L2, L3 and L4 headers. Currently
specs for Ethernet, IPv4, TCP and UDP are defined. Flow specs are
made of values and masks.
The input to ibv_create_flow() is a struct ibv_flow_attr, which contains
a few mandatory control elements and optional flow specs.
struct ibv_flow_attr {
	uint32_t comp_mask;
	enum ibv_flow_attr_type type;
	uint16_t size;
	uint16_t priority;
	uint8_t num_of_specs;
	uint8_t port;
	uint32_t flags;
	/* Following are the optional layers according to user request
	 * struct ibv_flow_spec_xxx [L2]
	 * struct ibv_flow_spec_yyy [L3/L4]
	 */
};
These flow specs are defined and used in a way which allows adding new
spec types without kernel/user ABI change, just with a little API
enhancement which defines the newly added spec.
The flow spec structures are defined with TLV (Type-Length-Value)
entries, which allows calling ibv_create_flow() with a list of variable
length of optional specs.
For the actual processing of ibv_flow_attr the kernel uses the number
of specs and the size mandatory fields along with the TLV nature of
the specs.
The returned value from ibv_create_flow() is a struct ibv_flow, which
contains a handle provided by the kernel to be used when calling
ibv_destroy_flow().
The ibv_flow_attr_type enum supports usage of flow steering for
promiscuous and sniffer purposes:
IBV_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule
specification
IBV_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule,
receive all Ethernet traffic which isn't steered to any QP
IBV_FLOW_ATTR_MC_DEFAULT - same as IBV_FLOW_ATTR_ALL_DEFAULT but
only for multicast
The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the
Ethernet link type.
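To illustrate why the TLV layout lets a consumer process a variable-length list of specs, here is a self-contained sketch of such a walk. It does not use the real libibverbs structures: struct spec_hdr and walk_specs() are stand-ins invented for this example, and the only assumption taken from the text is that each spec starts with a type and its own size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the common prefix of a spec: a type tag and the total size
 * of the spec in bytes (header included). Not the real ABI layout. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;
	uint16_t reserved;
};

/*
 * Walk `num_of_specs` entries packed back to back in `buf` (`len` bytes).
 * Returns the number of bytes consumed, or -1 if an entry is malformed or
 * runs past the end of the buffer. `visit` may be NULL.
 */
static long walk_specs(const unsigned char *buf, size_t len,
		       unsigned int num_of_specs,
		       void (*visit)(uint32_t type, const void *spec,
				     uint16_t size))
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof hdr)
			return -1;
		memcpy(&hdr, buf + off, sizeof hdr);	/* no alignment assumptions */
		if (hdr.size < sizeof hdr || hdr.size > len - off)
			return -1;
		if (visit)
			visit(hdr.type, buf + off, hdr.size);
		off += hdr.size;	/* the length field tells us where the next spec starts */
	}
	return (long)off;
}
```

Because every entry carries its own size, a walker like this can step over a spec type it does not recognize, which is the property that allows new spec types to be added later.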
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Each new verb using the verbs extensions approach includes an extended
header, containing:
__u16 provider_in_words;
__u16 provider_out_words;
__u32 cmd_hdr_reserved;
__u32 comp_mask;
The new macros IBV_INIT_CMD_EX() and IBV_INIT_CMD_RESP_EX() initialize
these fields.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Kernel commits 5db5765e255d ("IB/core: Add support for RDMA_NODE_USNIC_UDP")
and 180771a3707a ("IB/core: Add Cisco usNIC rdma node and transport types")
add IB_NODE_USNIC[_UDP] and IB_TRANSPORT_USNIC[_UDP]. Add the corresponding
support in libibverbs.
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Fix sync issue when clients go down. This prevents a case where the
client misses a response from the daemon and then waits forever.
Also fix a typo in error message.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
XRC receive QPs are shareable across multiple processes. Allow any
process with access to the XRC domain to open an existing QP. After
opening the QP, the process will receive events related to the QP and
be able to modify the QP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
XRC queue pairs: XRC defines two new types of QPs. The initiator, or
send-side, XRC QP behaves similar to a send-only RC QP. XRC send QPs
are managed through the existing QP functions. The send_wr structure
is extended in a backwards compatible way to support posting sends on
a send XRC QP, which require specifying the remote XRC SRQ.
The target, or receive-side, XRC QP behaves differently than other
implemented QPs. A recv XRC QP can be created, modified, and
destroyed like other QPs through the existing calls. The
qp_init_attr structure is extended for XRC QPs.
Because XRC recv QPs are bound to an XRCD, rather than a PD, they are
intended to be used among multiple processes. Any process with access
to an XRCD may allocate and connect an XRC recv QP. The actual XRC
recv QP is allocated and managed by the kernel. If the owning process
explicitly destroys the XRC recv QP, it is destroyed. However, if the
XRC recv QP is left open when the user process exits or closes its
device, then the lifetime of the XRC recv QP is tied to the lifetime
of the XRCD.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
XRC support requires the use of a new type of SRQ: XRC shared receive
queues. XRC SRQs are similar to normal SRQs, except that they are bound
to an XRCD, rather than to a protection domain. Based on the current
spec and implementation, they are only usable with XRC QPs. To support
XRC SRQs, we define a new srq_init_attr structure to include an SRQ type
and other needed information.
The user-kernel ABI is also updated to allow creating extended SRQs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
XRC introduces several new concepts and structures, one of which is
the XRC domain.
XRC domains: XRCDs are a type of protection domain used to associate
shared receive queues with XRC queue pairs. Since XRCDs are meant to be
shared among multiple processes, we introduce new APIs to open/close
XRCDs.
The user-kernel ABI is extended to account for opening/closing XRCDs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add infrastructure to support extended verbs capabilities in a
forward/backward manner.
Support for extensions is determined by the provider calling
verbs_register_driver in place of ibv_register_driver. When
extensions are enabled, ibverbs sets the current alloc_context /
free_context device operations to NULL. These are used to indicate
that the struct ibv_device may be cast to struct verbs_device.
With extensions, ibverbs allocates the ibv_context structure and calls
into the provider to initialize it. The init call is part of the
verbs_device struct.
The abi_compat field of struct ibv_context is used to determine
support of verbs extensions. As a result, support for ABI version < 2
is removed (corresponds to kernel releases 2.6.11-2.6.14 no longer
being supported). The lowest ABI now supported is 3 (really 4 since
2.6.15 was ABI 4, I don't see that ABI 3 was in a release).
Acked-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Tzahi Oved <tzahio@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
If the send size is less than the cap.max_inline_data reported by the
qp, use the IBV_SEND_INLINE flag. This not only gives an example of using
ibv_query_qp(), it also reduces the latency time shown by the pingpong
programs when the sends can be inlined.
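The decision can be sketched without the verbs headers as below. The SKETCH_* constants are local stand-ins so the fragment compiles on its own; real code would take IBV_SEND_SIGNALED and IBV_SEND_INLINE from <infiniband/verbs.h> and max_inline_data from the attributes returned by ibv_query_qp(). pick_send_flags() is a hypothetical name.

```c
#include <stdint.h>

/* Local stand-ins for the verbs send flags (assumption for this sketch). */
enum {
	SKETCH_SEND_SIGNALED = 1 << 1,
	SKETCH_SEND_INLINE   = 1 << 3,
};

/*
 * Choose send flags for a message of `msg_size` bytes, given the
 * max_inline_data value the QP reported.
 */
static unsigned int pick_send_flags(uint32_t msg_size, uint32_t max_inline_data)
{
	unsigned int flags = SKETCH_SEND_SIGNALED;

	/* Strictly smaller, following the wording of the commit message. */
	if (msg_size < max_inline_data)
		flags |= SKETCH_SEND_INLINE;

	return flags;
}
```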
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Following advice in "Autotool Mythbuster" [1], option subdir-objects
can be used to have Makefiles create object files in the same
directory as their source files.
It reduces clobbering in the build directory.
[1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò
http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
'autoupdate' is a tool to help developers update configure.ac.
This patch applies a few fixes as suggested by autoupdate.
It was tested on Debian 6.0.7 (Squeeze) and Fedora 17 (Beefy Miracle).
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Files opened by libibverbs are not supposed to be inherited across
exec*(), most of the files are of no use for another program, and
others cannot be used without the associated memory mapping.
This patch changes open() and fopen() to always set close on exec flag.
This patch also adds checks to configure to guess if fopen() supports the
"e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should
support "e". If not supported, it's discarded according to POSIX. Many
operating systems have support for fopen("e").
You might find more information about close on exec in the following articles:
- "Excuse me son, but your code is leaking !!!" by Dan Walsh
http://danwalsh.livejournal.com/53603.html
- "Secure File Descriptor Handling" by Ulrich Drepper
http://udrepper.livejournal.com/20407.html
Note: this patch won't set close on exec flag on file descriptors
created by the kernel for completion channel and such. This should be
addressed by a kernel patch.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add some entries to config/.gitignore for newer versions of the GNU
Autotools.
Also rename configure.in -> configure.ac to accommodate newer GNU Autotools.
(http://lists.gnu.org/archive/html/autotools-announce/2012-11/msg00000.html
announced the intent to drop support for "configure.in" in future
versions of Autoconf).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The old sequence of Autotools commands listed in autogen.sh is no
longer correct. Instead, just use the single "autoreconf" command,
which will invoke all the Right Autotools commands in the correct
order.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The physical link state on iWARP transports has no meaning, so don't
print it out at all.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The UD protocol doesn't support message sizes larger than the path
MTU. We don't go so far as to check path MTU, but we do check port
MTU. This prevents failed runs of the pingpong_ud program with large
MTUs.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The PPC architecture packs the work request struct 1.0 in such a way
that a straight memcpy won't work. Instead, break the copy out into
chunks whenever the sizes don't match for given portions of the struct.
Found by built in gcc memcpy buffer overflow checks.
Help on the right fix provided by Jakub Jelinek from the gcc
team inside Red Hat.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
We special case port == 0 to mean all ports, and it's the default,
so if a user passes in 0, they likely meant 1 instead. Throw an
error because they probably didn't mean to specify the default
behavior of scan all ports. Path of least surprise and all that.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Fix leaks in error paths.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add support for the following extended speeds:
FDR: IBA extended speed 14.0625 Gbps.
EDR: IBA extended speed 25.78125 Gbps.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.
This QP type is designed for userspace direct access to Ethernet; for
example, by applications that do TCP/IP themselves. Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
SCNxPTR is the correct format to parse uintptr_t hexadecimal values,
whatever the width of uintptr_t type.
This fixes a warning when building on a 32-bit system.
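For reference, a small sketch of parsing a hexadecimal uintptr_t with SCNxPTR. parse_hex_uintptr() is a hypothetical helper name used only for this example; the macro itself comes from <inttypes.h>.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a hexadecimal value such as "1f00" into *out.
 * Returns 0 on success, -1 if no hex digits could be converted.
 */
static int parse_hex_uintptr(const char *text, uintptr_t *out)
{
	/* SCNxPTR expands to the right length modifier for uintptr_t,
	 * so the same source is correct on 32-bit and 64-bit builds. */
	return sscanf(text, "%" SCNxPTR, out) == 1 ? 0 : -1;
}
```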
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Fix compiler warnings when compiling with NVALGRIND defined and the
latest Valgrind header files. Recently the Valgrind client request
implementation has been modified in order to not trigger compiler
warnings when building with gcc 4.6. A side effect of that change is
that Valgrind client request macros that return a value have to be
cast to void in order to avoid a compiler warning.
For more information, see also:
* Valgrind manual about VALGRIND_MAKE_MEM_DEFINED (http://valgrind.org/docs/manual/mc-manual.html).
* Valgrind trunk r11755 (http://article.gmane.org/gmane.comp.debugging.valgrind.devel/13489).
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add code to ibv_devinfo to display the following new speeds:
8: FDR-10 is a proprietary link speed which is 10.3125 Gbps with 64b/66b
encoding rather than 8b/10b encoding.
16: FDR - 14.0625 Gbps
32: EDR - 25.78125 Gbps
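The mapping listed above could be expressed as a lookup like the following. This is a sketch, not the ibv_devinfo code: extended_speed_str() is a hypothetical name, the strings are taken from the list in this message, and every other speed code is lumped into a placeholder.

```c
#include <stdint.h>

/* Map the new active_speed codes from this message to a description. */
static const char *extended_speed_str(uint8_t active_speed)
{
	switch (active_speed) {
	case 8:
		return "10.3125 Gbps (FDR-10)";
	case 16:
		return "14.0625 Gbps (FDR)";
	case 32:
		return "25.78125 Gbps (EDR)";
	default:
		return "other";	/* older speed codes are not covered here */
	}
}
```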
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Fix the following automake warning message:
Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')
A quote from the automake manual:
INCLUDES
This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
if it is used). It is an older name for the same functionality. This
variable is deprecated; we suggest using AM_CPPFLAGS and per-target
_CPPFLAGS instead.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
|
|
Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
automake that the libibverbs package does not follow the GNU
standards. This change makes it possible to use 'autoreconf' for the
libibverbs package.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Since IBoE requires usage of GRH, update ibv_*_pingpong examples to
accept GIDs. GIDs are given as an index to the local port's table and
are exchanged between the client and the server through the socket
connection.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
|
|
Add handling for GID change events, which are generated by the kernel
IBoE stack when the HW driver updates the GID table.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
|
|
Modify the code to handle returning the link layer of a port from the
kernel to the library. The kernel has done this since commit
2420b60b1dc4 ("IB/uverbs: Return link layer type to userspace for
query port operation"), merged in 2.6.37-rc1.
The new field does not change the size of struct ibv_query_port_resp
as it replaces a reserved field. Binary compatibility between the
kernel and the library is kept, since old kernels running below a new
library will not zero that field, so it will be read as "unspecified,"
while an old library running over new kernel will ignore the value
returned by the kernel.
The solution was suggested by Roland Dreier <roland@purestorage.com>
and Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
|
|
The new field has three possible values: IBV_LINK_LAYER_UNSPECIFIED,
IBV_LINK_LAYER_INFINIBAND, IBV_LINK_LAYER_ETHERNET. It can be used by
applications to know the link layer used by the port, which can be
either InfiniBand or Ethernet.
The addition of the new field does not change the size of struct
ibv_port_attr due to alignment of the preceding fields. Binary
compatibility between the library and applications is kept, since old
apps running over new library do not read this field, and new apps
running over old library will determine the link layer as unspecified
and hence take their IB code path.
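An application-side check might look like the sketch below. The enum is a local stand-in so the fragment compiles without <infiniband/verbs.h>; real code would compare the link_layer field of struct ibv_port_attr against the IBV_LINK_LAYER_* values, and the numeric ordering used here is an assumption of the sketch.

```c
#include <stdint.h>

/* Stand-ins for the three values named above (illustrative only). */
enum sketch_link_layer {
	SKETCH_LINK_LAYER_UNSPECIFIED,
	SKETCH_LINK_LAYER_INFINIBAND,
	SKETCH_LINK_LAYER_ETHERNET,
};

/* Returns 1 if the port should take the Ethernet code path, else 0. */
static int port_is_ethernet(uint8_t link_layer)
{
	switch (link_layer) {
	case SKETCH_LINK_LAYER_ETHERNET:
		return 1;
	case SKETCH_LINK_LAYER_INFINIBAND:
	case SKETCH_LINK_LAYER_UNSPECIFIED:	/* old library: treat as IB */
	default:
		return 0;
	}
}
```

Treating "unspecified" like InfiniBand matches the compatibility argument above: a new application on an old library falls back to its IB code path.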
The solution was suggested by Roland Dreier <roland@purestorage.com>
and Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Do not exit postinst if not configuring -- code added by debhelper needs
to run in all cases, not only the configure case.
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
When fork support is enabled in libibverbs, madvise() is called for
every memory page that is registered as a memory region. Memory
ranges that are passed to madvise() must be page aligned and the size
must be a multiple of the page size.
libibverbs uses sysconf(_SC_PAGESIZE) to find out the system page size
and rounds all ranges passed to reg_mr() according to this page size.
When memory from libhugetlbfs is passed to reg_mr(), this does not
work as the page size for this memory range might be different
(e.g. 16MB). So libibverbs would have to use the huge page size to
calculate a page aligned range for madvise.
As huge pages are provided to the application "under the hood" when
preloading libhugetlbfs, the application does not have any knowledge
about when it registers a huge page or a usual page.
To work around this issue, detect the use of huge pages in libibverbs
and align memory ranges passed to madvise according to the huge page
size. Determining the page size of a given memory range by watching
madvise() fail has proven to be unreliable. So we introduce the
RDMAV_HUGEPAGES_SAFE environment variable to let the user decide if
the page size should be checked on every reg_mr() call or not. This
requires the user to be aware if huge pages are used by the running
application or not.
I did not add an additional API call to enable this, as applications
can use setenv() + ibv_fork_init() to enable checking for huge pages
in the code.
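The alignment step itself is simple arithmetic, sketched below. struct page_range and align_to_pages() are hypothetical names, and the page size is passed in explicitly: it would be sysconf(_SC_PAGESIZE) for ordinary memory, or the huge page size when the range is backed by huge pages.

```c
#include <stddef.h>
#include <stdint.h>

/* A page-aligned range suitable for passing to madvise(). */
struct page_range {
	uintptr_t start;
	size_t length;
};

/*
 * Round [addr, addr + length) outwards to page boundaries.
 * `page_size` must be a power of two.
 */
static struct page_range align_to_pages(uintptr_t addr, size_t length,
					size_t page_size)
{
	struct page_range r;
	uintptr_t mask = (uintptr_t)page_size - 1;

	r.start = addr & ~mask;					/* round start down */
	r.length = ((addr + length + mask) & ~mask) - r.start;	/* round end up */
	return r;
}
```

With a 4 KB page size a 16-byte buffer at 0x1234 maps to one 4 KB page starting at 0x1000. With a 16 MB huge page the same buffer maps to the whole 16 MB page, which is why the wrong page size produces a range that madvise() rejects.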
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
[ Updated ibv_fork_init() manpage for RDMAV_HUGEPAGES_SAFE. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Dotan Barak <dotan@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
If there's no driver name, strsep() will set config to NULL and later
processing of the driver name will segfault.
Spotted with zzuf.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
If no devices are found, ibverbs_init() sets num_devices to 0. This
means the next call to __ibv_get_device_list() would call
ibverbs_init() again, which crashes because ibverbs_init() leaves
various internal pointers pointing to freed memory.
Fix this by using pthread_once() to call ibverbs_init() exactly once,
and then doing the right thing even if num_devices stays 0.
Tested-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add AC_PROG_LIBTOOL to configure.in to fix an autogen.sh warning about
LIBTOOL configuration.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Running autogen.sh with a new version of autotools and then building
on a system with an older version tends to explode. Unfortunately
this is sometimes necessary since the new version is required by the
package. The fix changes the autogen.sh output from:
+ aclocal -I config
+ libtoolize --force --copy
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
+ autoheader
+ automake --foreign --add-missing --copy
+ autoconf
to:
+ aclocal -I config
+ libtoolize --force --copy
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `config'.
libtoolize: copying file `config/libtool.m4'
libtoolize: copying file `config/ltoptions.m4'
libtoolize: copying file `config/ltsugar.m4'
libtoolize: copying file `config/ltversion.m4'
libtoolize: copying file `config/lt~obsolete.m4'
+ autoheader
+ automake --foreign --add-missing --copy
+ autoconf
And fixes various build problems in weird cases.
This is how GNU envisions this mess works at least...
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
ibv_asyncwatch defaults to block-buffering when stdout is redirected to
a file or pipe. Changing to line-buffered mode makes it more usable in
scripted environments.
Signed-off-by: Hakon Bugge <Haakon.Bugge@sun.com>
|
|
Add definitions for path record wire definition. This will be used by
the librdmacm and ib_acm service, and is exchanged with the kernel
using the newer set and query route functionality.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
ibv_madvise_range() doesn't cleanup if madvise() fails. This patch
rolls back changes already made in the memory range tracking tree by
madvise() calls before the one that failed. We can do this fairly
simply by restarting ibv_madvise_range() from the original
start to the current location with the opposite advice/inc values.
Signed-off-by: Alex Vainman <alexv@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
ibv_madvise_range() first manages (splits or merges) memory ranges in
the tree and only then calls madvise(). If madvise() fails, the
tree's memory range may contain incorrectly split or merged ranges.
The patch undoes the split and merge operations performed on the node
which caused the madvise() failure as well as on that node's
neighbors.
Signed-off-by: Alex Vainman <alexv@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
ibv_madvise_range() first updates the memory range reference count and
then calls madvise(). If madvise() fails, the reference count of
the failed node is incorrect. Fix this by updating the node's
reference count only after a successful call to madvise() (or if no
call to madvise() was needed).
Signed-off-by: Alex Vainman <alexv@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Clean up some code in ibv_madvise_range() by adding functions
merge_ranges(), split_range() and get_start_node().
Signed-off-by: Alex Vainman <alexv@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add an override_dh_strip target so that the -dbg package ends up with
actual debug information in it. This was broken in the dh7 transition.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The debian rules use an override_dh_makeshlibs target, so (as lintian
points out) we need a build dependency on debhelper >= 7.0.50.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Avoid casting from uint8_t* to uint16_t* and then dereferencing to avoid
warnings about type punning.
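One common way to avoid this kind of cast is to copy the bytes with memcpy(), sketched below. This shows the general technique only; the message does not say which approach the actual change took, and load_u16() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 16-bit value (host byte order) from an arbitrary byte pointer
 * without casting uint8_t * to uint16_t *.
 */
static uint16_t load_u16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof v);	/* no aliasing or alignment assumptions */
	return v;
}
```

Compilers typically turn the fixed-size memcpy() into a plain load, so the safer form normally costs nothing.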
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Get rid of the output to stderr on various failure cases from
ibv_get_device_list() such as no device driver found, so that
applications can control how to present errors. Fix up the examples
and the man page to match.
Code expecting this behavior linking to old libibverbs will
get the old fprintf and errno set to garbage (probably ESPIPE).
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix double free of sysfs_dev in find_sysfs_devs if ibv_read_sysfs_file()
fails (which is unlikely in practice).
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add missing breaks for the 'm' case of options handling.
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Arithmetic operations on enum members do not result in the enum type;
C++ is stricter about this than C. So using flag enums results in
compile errors when they are OR'd together in a C++ application.
To fix this, replace all flag enum objects with int. int was selected
to preserve the ABI; we checked that enum types are the same size as
int on at least i386, x86-64, ppc32, ppc64, ia64, and mips, and arm
and sparc also appear compatible with this choice.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
With debhelper 7 we can get just as simple a rules file without all of
the cdbs magic.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Conditionally use the new AM_SILENT_RULES macro in configure.in.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
This reverts commit 25ade84d1cd0b8b3a68872d3fc195e88cc7c4211. Rather
than using shave, we'll use automake 1.11's native quiet build.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
ibv_attach_mcast() and ibv_detach_mcast() don't change the gid
argument, so the arguments should be const to allow applications to
pass in constant gids. This constness flows through to the driver
call struct and into the drivers and back into
ibv_cmd_attach_mcast()/ibv_cmd_detach_mcast().
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
If the driver line starts with a / then no lib prefix is applied and
the full path is passed to dlopen(). This allows a completely
self-contained installation that relies on RPATH for the binaries and
this mechanism for the drivers.
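The prefix rule can be sketched as follows. driver_so_name() is a hypothetical helper, and the suffix is left as a caller-supplied parameter because the message does not say what, if anything, is appended to the name before dlopen() is called.

```c
#include <stdio.h>

/*
 * Build the library name for a driver line. Names starting with '/' are
 * treated as full paths and get no "lib" prefix; other names do.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int driver_so_name(const char *name, const char *suffix,
			  char *out, size_t out_len)
{
	const char *prefix = (name[0] == '/') ? "" : "lib";
	int n = snprintf(out, out_len, "%s%s%s", prefix, name, suffix);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

A bare name is left to the dynamic linker's search path, while an absolute path pins the driver to one location, which is what makes a self-contained installation possible.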
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
|
|
None of the changes 3.7.3 -> 3.8.2 affect us.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add shave (git://git.lespiau.name/shave) to make build output of libibverbs
much more readable by abbreviating the outputted commands so that
warnings become visible, etc.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Update Dotan's email in all of the files it appears.
Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
wmb() for PPC was incorrectly defined as an eieio instruction in
libibverbs. eieio only orders pure I/O memory or pure system memory
accesses. In a situation where the device drivers use the d_map
kernel services to share a portion of system memory with an I/O
adapter, we need to use sync() instead. See below link for reference:
http://www.ibm.com/developerworks/eserver/articles/powerpc.html
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Using ibv_port_state_str() changes the port state output of ibv_devinfo
(eg "PORT_DOWN" becomes "down"), which is reported to break scripts that
parse this output. Revert to using the old code in ibv_devinfo; we want
ibv_port_state_str() to continue producing the nicer-looking lower case
output, so just leave the open-coded alternative in ibv_devinfo.
Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Improve readability based on warnings from kernel's checkpatch.pl.
Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
There actually is no ".nl" macro defined in troff, so convert all uses
of it to ".sp", which seems to be what was intended.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
This fixes the rpmlint warning
libibverbs-devel.x86_64: W: no-dependency-on libibverbs
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add ibv_xxx_str() functions to convert node type, port state, event
type and wc status enum values to strings.
Signed-off-by: Ira K. Weiny <weiny2@llnl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
libibverbs works with both iWARP and InfiniBand devices, so update
various places that talk about InfiniBand to be more general.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
libibverbs sources are now in downloads/verbs/, not just downloads/
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add a --sl/-l command line parameter for the pingpong examples to set
the SL of the QP/AH. This can be used to test a QoS setup.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Need to mark response buffer as defined after write() succeeds.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
If the IBV_SEND_INLINE flag is set in a work request posted with
ibv_post_send(), the data buffers can be reused immediately after the
call returns. Document this.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Some fixes and updates to several man pages:
* Correct formatting in a few places.
* Add more "SEE ALSO" functions where appropriate.
* Document byte order of GUID and P_Key fields.
* Fix example code in ibv_get_cq_event.3
* Document GRH handling on receive.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix the following issues reported by valgrind in the examples:
* memory leaks
* uninitialized members of attribute structures
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix several issues that were reported by valgrind:
* Initialize reserved attributes of command structures
* Fix the pointer and size when calling VALGRIND_MAKE_MEM_DEFINED in
ibv_cmd_reg_mr() and ibv_cmd_create_cq_v2(): if we have struct
xxx_resp *resp and resp_size, we need to do
VALGRIND_MAKE_MEM_DEFINED(resp, resp_size)
rather than getting the parameters wrong as in
VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp)
VALGRIND_MAKE_MEM_DEFINED(resp, sizeof resp_size);
* Call VALGRIND_MAKE_MEM_DEFINED for buffers that are filled by
the kernel in ibv_cmd_query_srq(), ibv_cmd_destroy_srq() and
ibv_cmd_query_qp().
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
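
The pointer/size mix-up fixed here can be illustrated without Valgrind
installed. In this sketch EXAMPLE_MAKE_MEM_DEFINED is a stand-in that only
records its arguments (the real macro comes from <valgrind/memcheck.h>),
and struct example_resp is a hypothetical response layout.

```c
#include <stddef.h>

/* Stand-in for the real Valgrind macro; it just records what it was
 * asked to mark so the sketch builds anywhere. */
static const void *marked_addr;
static size_t marked_len;
#define EXAMPLE_MAKE_MEM_DEFINED(addr, len) \
    do { marked_addr = (addr); marked_len = (len); } while (0)

struct example_resp {          /* hypothetical response layout */
    unsigned int handle;
    unsigned int reserved[7];
};

/* Mark the whole kernel-filled buffer: pass the pointer itself and the
 * size of the buffer it points to. */
static inline void example_mark_resp(struct example_resp *resp,
                                     size_t resp_size)
{
    EXAMPLE_MAKE_MEM_DEFINED(resp, resp_size);
    /* Wrong variants: (&resp, sizeof resp) marks the local pointer
     * variable; (resp, sizeof resp_size) marks only sizeof(size_t)
     * bytes of the buffer. */
}
```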
|
|
Use DEB_DH_MAKESHLIBS_ARGS_ALL to pass appropriate -V option to
dh_makeshlibs, since new symbols were added in version 1.1.0.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
A bare "-" in a man page will be rendered as a hyphen; to get a minus
sign, "\-" must be used. Very pedantic people (or automatic checkers,
such as Debian's lintian tool) may notice the difference. The man page
for ibv_query_pkey incorrectly wrote a negative return value as "-1".
Fix this to be the correct "\-1".
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
New dpkg can actually parse Homepage: fields in debian/control.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
None of the changes 3.7.2 -> 3.7.3 affect us.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
There are error cases in the kernel's uverbs work request posting
functions where the return value is negative (i.e., an error) and yet a
non-zero resp.bad_wr is not written back to userspace. In this case,
ibv_cmd_post_send() should still set the bad_wr pointer.
Bug pointed out in ibv_post_send() by Ralph Campbell
<ralph.campbell@qlogic.com>, and noticed elsewhere by Dotan Barak
<dotanb@dev.mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
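
A sketch of the fallback described in this commit, using a hypothetical
work request type and helper name. It assumes the kernel reports the
failing request as a 1-based position in the list, with 0 meaning nothing
was reported; that convention is an assumption of this sketch.

```c
#include <stddef.h>

struct example_wr {            /* hypothetical work request node */
    struct example_wr *next;
};

/* ret: result of the kernel command (non-zero on error).
 * bad_index: 1-based position of the failing request as reported by
 * the kernel, or 0 if the kernel did not report one. */
static inline void example_set_bad_wr(struct example_wr *head, int ret,
                                      unsigned int bad_index,
                                      struct example_wr **bad_wr)
{
    if (bad_index) {
        struct example_wr *wr = head;

        /* Walk to the reported request. */
        while (--bad_index && wr->next)
            wr = wr->next;
        *bad_wr = wr;
    } else if (ret) {
        /* Error without a reported index: point at the first request
         * so the caller never sees an unset bad_wr. */
        *bad_wr = head;
    }
}
```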
|
|
Our license information is properly described as "GPLv2 or BSD".
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
When allocating a device structure, set the node_type member correctly.
Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Initialize the reserved attributes in modify QP command to eliminate
valgrind warnings like:
==23549== Syscall param write(buf) points to uninitialised byte(s)
==23549== at 0x316B1B933F: (within /lib64/tls/libc-2.3.4.so)
==23549== by 0x4A33AF7: ibv_cmd_modify_qp (cmd.c:782)
==23549== by 0x4F860D8: mlx4_modify_qp (verbs.c:480)
==23549== by 0x4A37A53: ibv_modify_qp@@IBVERBS_1.1 (verbs.c:441)
==23549== by 0x40972E: qp_reset_to_rtr (mr_test_fun.c:1189)
==23549== by 0x403AFC: mr_test_connect_qp (mr_test.c:232)
==23549== by 0x404956: do_test (mr_test.c:85)
==23549== by 0x402DF8: main (main.c:448)
==23549== Address 0x7FEFFF2AE is on thread 1's stack
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
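
One simple way to avoid this class of warning is to zero the whole command
structure before filling it in. The struct layout and function name below
are hypothetical, and the actual fix may instead assign the reserved
fields individually.

```c
#include <string.h>

struct example_modify_cmd {    /* hypothetical command layout */
    unsigned int qp_handle;
    unsigned int attr_mask;
    unsigned char reserved[8];
};

static inline void example_init_cmd(struct example_modify_cmd *cmd,
                                    unsigned int handle, unsigned int mask)
{
    /* Zero everything first so the reserved bytes are defined before
     * the structure is handed to write(). */
    memset(cmd, 0, sizeof *cmd);
    cmd->qp_handle = handle;
    cmd->attr_mask = mask;
}
```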
|
|
When the first memory range found in ibv_madvise_range() is merged
with the previous range before entering the loop that calls madvise(),
a too-big range could be passed to madvise(). This could lead to
trying to madvise() memory that has already been freed and unmapped,
which causes madvise() and therefore ibv_reg_mr() to fail.
Fix this by making sure we don't madvise() any memory outside the
range passed into ibv_madvise_range().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=682>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
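
The core of the fix is an intersection: whatever range the bookkeeping
holds after merging, only the part inside the caller's range may be passed
to madvise(). A minimal sketch with hypothetical types and names, assuming
inclusive bounds and that the two ranges overlap:

```c
#include <stdint.h>

struct example_range { uintptr_t start, end; };   /* inclusive bounds */

/* Intersect a (possibly merged, larger) tracked range with the range
 * the caller asked for, so memory outside the request is not touched. */
static inline struct example_range
example_clamp(struct example_range node, struct example_range req)
{
    struct example_range out;

    out.start = node.start > req.start ? node.start : req.start;
    out.end   = node.end   < req.end   ? node.end   : req.end;
    return out;
}
```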
|
|
The AC_CHECK_HEADER() test for <valgrind/memcheck.h> will never result
in HAVE_VALGRIND_MEMCHECK_H being defined, so ibverbs.h will never
include <valgrind/memcheck.h> and Valgrind annotations will never actually
get built. Fix this by adding an AC_DEFINE() of HAVE_VALGRIND_MEMCHECK_H
if the header is found.
Pointed out by Jeff Squyres <jsquyres@cisco.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Update configure.in so that the comment generated by autoheader for
NVALGRIND in config.h.in is a complete sentence to match the style of
the rest of the file.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
<infiniband/arch.h> uses uint64_t, so it needs to include <stdint.h>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Replace ${Source-Version} with the more-correct ${binary:Version}.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|