Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RDMA transport #11182

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from
Open

Conversation

pizhenwei
Copy link
Contributor

@pizhenwei pizhenwei commented Aug 24, 2022

Main changes in this patch:

  • introduce Redis Over RDMA protocol, see Protocol section in RDMA.md
  • implement server side of connection module only, this means we can NOT
    compile RDMA support as built-in.
  • add necessary information in RDMA.md
  • support 'CONFIG SET/GET', for example, 'CONFIG Set rdma.port 6380', then
    check this by 'rdma res show cm_id' and redis-cli(with RDMA support,
    but not implemented in this patch)
  • the full listeners show like():
    listener0:name=tcp,bind=,bind=-::,port=6379
    listener1:name=unix,bind=/var/run/redis.sock
    listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379
    listener3:name=tls,bind=,bind=-::,port=16379

valgrind test works fine:

valgrind --track-origins=yes --suppressions=./src/valgrind.sup
         --show-reachable=no --show-possibly-lost=no --leak-check=full
         --log-file=err.txt ./src/redis-server --port 6379
         --loadmodule src/redis-rdma.so port=6379 bind=xx.xx.xx.xx
         --loglevel verbose --protected-mode no --server_cpulist 2
         --bio_cpulist 3 --aof_rewrite_cpulist 3 --bgsave_cpulist 3
         --appendonly no

performance test:
server side:

./src/redis-server --port 6379 # TCP port 6379 has no conflict with RDMA port 6379
             --loadmodule src/redis-rdma.so port=6379 bind=xx.xx.xx.xx bind=yy.yy.yy.yy
             --loglevel verbose --protected-mode no --server_cpulist 2 --bio_cpulist 3
             --aof_rewrite_cpulist 3 --bgsave_cpulist 3 --appendonly no

build a redis-benchmark with RDMA support(not implemented in this patch), run
on a x86(Intel Platinum 8260) with RoCEv2 interface(Mellanox ConnectX-5):
client side:

./src/redis-benchmark -h xx.xx.xx.xx -p 6379 -c 30 -n 10000000 --threads 4
             -d 1024 -t ping,get,set --rdma


====== PING_INLINE ======
480561.28 requests per second, 0.060 msec avg latency.

====== PING_MBULK ======
540482.06 requests per second, 0.053 msec avg latency.

====== SET ======
399952.00 requests per second, 0.073 msec avg latency.

====== GET ======
443498.31 requests per second, 0.065 msec avg latency.

MR & Issues in history:
ISSUE Support RDMA as tranport layer protocol
MR Support RDMA as tranport layer protocol
MR Support RDMA as tranport layer protocol by rsocket
MR Fully abstract connection and make TLS dynamically loadable

@pizhenwei
Copy link
Contributor Author

Hi @yossigo @oranagra
Could you please give me any hint about this PR?
I'd like to know your concern/suggestion/plan ...

@xiezhq-hermann
Copy link

Hi Redis committee members @yossigo @oranagra. I'm a Ph.D. student at Stanford focusing on cloud computing. In my recent research project, I found this RDMA implementation for Redis to be both impressively organized and performant.
Given the widespread use of Redis as a benchmark in high-performance and I/O software stack studies, setting up a standard for RDMA implementation would greatly enhance the relevance and fairness of comparative analyses. Currently, many RDMA implementations reported in research papers are ad-hoc and infrequently publicly available, skewing benchmarks and comparisons.
I believe that this PR can potentially serve as a starting point and it might be worth having a core member steering this development with @pizhenwei. Please consider brining this up in an upcoming committee meeting, as it will undoubtedly benefit a broad audience.

@oranagra
Copy link
Member

Sorry for the lack of response here, @pizhenwei my sincere apologies, this PR is still marked in my followup list, but i never got to look at it.
The fact is that we simply lack the time and knowledge to really review it an make any informed decisions.
Without these, i don't think it'll be right to merge it and officially support it.
Not sure how we can proceed, sorry.
on the bright side, after all the work we previously did, the interfaces are stable, so this code can maybe be maintained elsewhere in the meanwhile..

@pizhenwei
Copy link
Contributor Author

Sorry for the lack of response here, @pizhenwei my sincere apologies, this PR is still marked in my followup list, but i never got to look at it. The fact is that we simply lack the time and knowledge to really review it an make any informed decisions. Without these, i don't think it'll be right to merge it and officially support it. Not sure how we can proceed, sorry. on the bright side, after all the work we previously did, the interfaces are stable, so this code can maybe be maintained elsewhere in the meanwhile..

Hi @oranagra
After Redis 7.2, I believe that Redis Over RDMA could be built as a shared library, and loaded by redis-server dynamically. As you mentioned, "this code can maybe be maintained elsewhere" works in theory.
However, Redis is ecosystem which has proxy and client implements, it needs "Redis Over RDMA" protocol to transfer payload. From the point of my view, Redis side SHOULD define "Redis Over RDMA" protocol at least. Then the implementation could be:

  • part of redis source code(like this PR).
  • an Out-Of-Tree connection library(maybe another repo).

Ether is fine to me, I volunteer to contribute/maintain this driver.

@pizhenwei
Copy link
Contributor Author

Hi Redis committee members @yossigo @oranagra. I'm a Ph.D. student at Stanford focusing on cloud computing. In my recent research project, I found this RDMA implementation for Redis to be both impressively organized and performant. Given the widespread use of Redis as a benchmark in high-performance and I/O software stack studies, setting up a standard for RDMA implementation would greatly enhance the relevance and fairness of comparative analyses. Currently, many RDMA implementations reported in research papers are ad-hoc and infrequently publicly available, skewing benchmarks and comparisons. I believe that this PR can potentially serve as a starting point and it might be worth having a core member steering this development with @pizhenwei. Please consider brining this up in an upcoming committee meeting, as it will undoubtedly benefit a broad audience.

Hi @xiezhq-hermann
Thanks for your feedback, glad to see this!

@oranagra
Copy link
Member

However, Redis is ecosystem which has proxy and client implements, it needs "Redis Over RDMA" protocol to transfer payload. From the point of my view, Redis side SHOULD define "Redis Over RDMA" protocol at least.

I understand that this part must be official in order for libraries to be able to adapt it, but i don't think i have the time or knowledge to make any informed decision about it.
@yossigo maybe you have an advise.

@yossigo
Copy link
Member

yossigo commented May 17, 2023

I agree there should be a formal specification of Redis over RDMA, but I also don't have the knowledge necessary to approach this. @pizhenwei Do you have a rough outine of how such spec needs to look like? For example, are there major decisions we'd need to make first? Are there other notable examples of similar protocols that have an RDMA-based spec?

@pizhenwei
Copy link
Contributor Author

pizhenwei commented May 23, 2023

I agree there should be a formal specification of Redis over RDMA, but I also don't have the knowledge necessary to approach this. @pizhenwei Do you have a rough outine of how such spec needs to look like? For example, are there major decisions we'd need to make first? Are there other notable examples of similar protocols that have an RDMA-based spec?

Hi @yossigo @oranagra

Let me enumerate several RDMA-based spec here(as far as I know):

In our proposal, we(with my colleagues @zhuojiang123 @zhangyiming1201 from RDMA team and friend @sigempty from Duke university) introduce a Ring buffer like mechanism for Redis(similar to papers, but not same).

I create a new PR to introduce the protocol only.
Maybe I should drop the protocol part from this PR.

Cc @xiezhq-hermann @Spartee suggestions/feedback are welcomed!

@zhangyiming1201
Copy link

In our proposal, we(with my colleagues @zhuojiang123 @zhangyiming1201 from RDMA team and friend @sigempty from Duke university) introduce a Ring buffer like mechanism for Redis(similar to papers, but not same).

As a key technology for high-speed networks, RDMA has been deployed on a large scale within ByteDance and has been applied in multiple services. We have been paying attention to the huge performance improvement of RDMA for Redis, and we are willing to provide more supplementary information about RDMA if necessary.

We will work with @pizhenwei to promote the optimization of the Redis over RDMA protocol continuously. Looking forward to its final realization. Believe it will play a certain role in various related applications.

@GuangguanWang
Copy link

GuangguanWang commented Dec 18, 2023

  • a common RPC lib. For example: brpc. This is designed as a common RPC framework. This is quite common to use, but not performance first.
  • SMC-R is ready since linux 4.11, from the test result, it seems not good enough.(up to ~1.4X QPS).

I am wondering whether performance is the only criterion for evaluation.

@pizhenwei
Copy link
Contributor Author

  • a common RPC lib. For example: brpc. This is designed as a common RPC framework. This is quite common to use, but not performance first.
  • SMC-R is ready since linux 4.11, from the test result, it seems not good enough.(up to ~1.4X QPS).

I am wondering whether performance is the only criterion for evaluation.

Yes, the performance is NOT good enough. But I'm not against SMC-R for Redis socket.

However this is not the entire reason. Generally, the protocol gets more close to the uplayer, the uplayer reaches higher performance.

  • SMC-R is designed as a general protocol, it's easy to use for user processes. For a regular/personal program which is NOT deployed widely, SMC-R may be a good choice.
  • But this one(Redis Over RDMA) is designed for Redis, it adapts to the Redis streaming transmission(from message transmission, also feature negotiation). As the test result shows, it gets 2.5x QPS. And Redis is almost deployed in the whole world, it's worth developing an exclusive protocol.

@CLAassistant
Copy link

CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@JSpewock
Copy link

JSpewock commented Apr 2, 2024

Hi @pizhenwei

I work with the OFA on maintaining the FSDP cluster and we were looking to get this patch tested on the appropriate hardware. When working with this patch, it still applies on the unstable branch, but it seems like it uses the variable MAX_ACCEPTS_PER_CALL which no longer exists in the Redis repository, so it no longer builds. I was wondering if you had any insight into this issue, or if you would be able to rebase the commit onto the tip of the unstable branch. I can rollback to the commit it was applied on for the time being and test it there, but it would be great to be able to test the patch on latest as well. Thank you in advance for any assistance!

Main changes in this patch:
* introduce *Redis Over RDMA* protocol, see *Protocol* section in RDMA.md
* implement server side of connection module only, this means we can *NOT*
  compile RDMA support as built-in.
* add necessary information in RDMA.md
* support 'CONFIG SET/GET', for example, 'CONFIG Set rdma.port 6380', then
  check this by 'rdma res show cm_id' and redis-cli(with RDMA support,
  but not implemented in this patch)
* the full listeners show like():
    listener0:name=tcp,bind=*,bind=-::*,port=6379
    listener1:name=unix,bind=/var/run/redis.sock
    listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379
    listener3:name=tls,bind=*,bind=-::*,port=16379

valgrind test works fine:
valgrind --track-origins=yes --suppressions=./src/valgrind.sup
         --show-reachable=no --show-possibly-lost=no --leak-check=full
         --log-file=err.txt ./src/redis-server --port 6379
         --loadmodule src/redis-rdma.so port=6379 bind=xx.xx.xx.xx
         --loglevel verbose --protected-mode no --server_cpulist 2
         --bio_cpulist 3 --aof_rewrite_cpulist 3 --bgsave_cpulist 3
         --appendonly no

performance test:
server side: ./src/redis-server --port 6379 # TCP port 6379 has no conflict with RDMA port 6379
             --loadmodule src/redis-rdma.so port=6379 bind=xx.xx.xx.xx bind=yy.yy.yy.yy
             --loglevel verbose --protected-mode no --server_cpulist 2 --bio_cpulist 3
             --aof_rewrite_cpulist 3 --bgsave_cpulist 3 --appendonly no

build a redis-benchmark with RDMA support(not implemented in this patch), run
on a x86(Intel Platinum 8260) with RoCEv2 interface(Mellanox ConnectX-5):
client side: ./src/redis-benchmark -h xx.xx.xx.xx -p 6379 -c 30 -n 10000000 --threads 4
             -d 1024 -t ping,get,set --rdma

====== PING_INLINE ======
480561.28 requests per second, 0.060 msec avg latency.

====== PING_MBULK ======
540482.06 requests per second, 0.053 msec avg latency.

====== SET ======
399952.00 requests per second, 0.073 msec avg latency.

====== GET ======
443498.31 requests per second, 0.065 msec avg latency.

Signed-off-by: zhenwei pi <[email protected]>
@pizhenwei
Copy link
Contributor Author

Hi @pizhenwei

I work with the OFA on maintaining the FSDP cluster and we were looking to get this patch tested on the appropriate hardware. When working with this patch, it still applies on the unstable branch, but it seems like it uses the variable MAX_ACCEPTS_PER_CALL which no longer exists in the Redis repository, so it no longer builds. I was wondering if you had any insight into this issue, or if you would be able to rebase the commit onto the tip of the unstable branch. I can rollback to the commit it was applied on for the time being and test it there, but it would be great to be able to test the patch on latest as well. Thank you in advance for any assistance!

Hi @JSpewock
I rebased code against the latest unstable (commit 4df0379), and force pushed.

I also tested this patch, it would be applied after commit c3f8b54(Manage number of new connections per cycle (#12178)). So if you want to apply this patch against a released version, I guest these commands would work successfully:

git format-patch -1 (in this branch, and xxx.patch should be generated)
git am xxx.patch (apply xxx.patch to the released version)

Any suggestoin&feedback is welcome! Have fun!

@JSpewock
Copy link

JSpewock commented Apr 20, 2024

Hi @pizhenwei

Out of curiosity, how have you been testing the server when you update the PR? When do a make RDMA_BUILD=module and then try to run the redis-benchmark that you provide in the original PR description, the benchmark hangs and the server is accepting the connection but then throwing the error # RDMA: FATAL error, unknown cmd. Do you know why this would be happening? I've tried Infiniband, Omni-Path, RoCE, and iwarp fabrics but they all throw the same message.

@pizhenwei
Copy link
Contributor Author

Hi @pizhenwei

Out of curiosity, how have you been testing the server when you update the PR? When do a make RDMA_BUILD=module and then try to run the redis-benchmark that you provide in the original PR description, the benchmark hangs and the server is accepting the connection but then throwing the error # RDMA: FATAL error, unknown cmd. Do you know why this would be happening? I've tried Infiniband, Omni-Path, RoCE, and iwarp fabrics but they all throw the same message.

Hi,
I forgot to update the client side URL in this PR. It should be https://github.com/pizhenwei/redis/tree/feature-rdma-with-cli.
PS: the new command is used for feature negotiation, once any new optional feature/command is supported in the future, the different version between server and client would not block the communication.

I only test this on RoCE/RXE(soft RoCE of linux), fixes on other platform are welcome!

@hz-cheng
Copy link

Many cloud providers offer RDMA acceleration on their cloud platforms, and I think that there is a foundational basis for the application of Redis over RDMA. We performed some performance tests on this PR on the 8th generation ECS instances (g8ae.2xlarge, 16 vCPUs, 32G DDR) provided by Alibaba Cloud. Test results indicate that, compared to TCP sockets, the use of RDMA can significantly enhance performance.

Test command of server side:

./src/redis-server --port 6379 \
  --loadmodule src/valkey-rdma.so port=6380 bind=11.0.0.114 \
  --loglevel verbose --protected-mode no \
  --server_cpulist 2 --bio_cpulist 3 --aof_rewrite_cpulist 3 \
  --bgsave_cpulist 3 --appendonly no 

Test command of client side:

# Test for RDMA
./src/redis-benchmark -h 11.0.0.114 -p 6380 -c 30 -n 10000000 \
  --threads 4 -d 1024 -t ping,get,set --rdma

# Test for TCP socket
./src/redis-benchmark -h 11.0.0.114 -p 6379 -c 30 -n 10000000 \
  --threads 4 -d 1024 -t ping,get,set

The performance test results are as shown in the following table. The throughput can be increased by at least 74%, and the average (AVG) and P99 latencies can be reduced by at least 33%.

RDMA TCP RDMA vs TCP
PING_INLINE
Throughput 644122.38 376704.59 170.99%
Latency-AVG 0.045 0.076 59.21%
Latency-P99 0.079 0.119 66.39%
PING_MBULK
Throughput 665601.69 380314.91 175.01%
Latency-AVG 0.044 0.075 58.67%
Latency-P99 0.079 0.119 66.39%
SET
Throughput 596979.31 314940.78 189.55%
Latency-AVG 0.049 0.092 53.26%
Latency-P99 0.079 0.143 55.24%
GET
Throughput 615308.88 335570.47 183.36%
Latency-AVG 0.047 0.085 55.29%
Latency-P99 0.079 0.135 58.52%

Besides, I seen the comment from @yossigo that RDMA networks are not easily accessible. If necessary, I could try reaching out to relevant colleagues to see if we can offer some Alibaba Cloud ECS instances to the community for free, so that you can use and test Redis over RDMA, as well as for future CI/CD purposes.

@pizhenwei
Copy link
Contributor Author

Hi @hz-cheng
I notice that you are the author of alibaba-cloud erdma driver for both linux kernel and rdma-core. Cooooooooool!

I'm happy to see the feedback from the cloud vendor side, this means lots of end-user will enjoy the improvement easily.

@pizhenwei
Copy link
Contributor Author

Many cloud providers offer RDMA acceleration on their cloud platforms, and I think that there is a foundational basis for the application of Redis over RDMA. We performed some performance tests on this PR on the 8th generation ECS instances (g8ae.2xlarge, 16 vCPUs, 32G DDR) provided by Alibaba Cloud. Test results indicate that, compared to TCP sockets, the use of RDMA can significantly enhance performance.

Test command of server side:

./src/redis-server --port 6379 \
  --loadmodule src/valkey-rdma.so port=6380 bind=11.0.0.114 \
  --loglevel verbose --protected-mode no \
  --server_cpulist 2 --bio_cpulist 3 --aof_rewrite_cpulist 3 \
  --bgsave_cpulist 3 --appendonly no 

Test command of client side:

# Test for RDMA
./src/redis-benchmark -h 11.0.0.114 -p 6380 -c 30 -n 10000000 \
  --threads 4 -d 1024 -t ping,get,set --rdma

# Test for TCP socket
./src/redis-benchmark -h 11.0.0.114 -p 6379 -c 30 -n 10000000 \
  --threads 4 -d 1024 -t ping,get,set

The performance test results are as shown in the following table. The throughput can be increased by at least 74%, and the average (AVG) and P99 latencies can be reduced by at least 33%.

RDMA TCP RDMA vs TCP
PING_INLINE
Throughput 644122.38 376704.59 170.99%
Latency-AVG 0.045 0.076 59.21%
Latency-P99 0.079 0.119 66.39%
PING_MBULK
Throughput 665601.69 380314.91 175.01%
Latency-AVG 0.044 0.075 58.67%
Latency-P99 0.079 0.119 66.39%
SET
Throughput 596979.31 314940.78 189.55%
Latency-AVG 0.049 0.092 53.26%
Latency-P99 0.079 0.143 55.24%
GET
Throughput 615308.88 335570.47 183.36%
Latency-AVG 0.047 0.085 55.29%
Latency-P99 0.079 0.135 58.52%
Besides, I seen the comment from @yossigo that RDMA networks are not easily accessible. If necessary, I could try reaching out to relevant colleagues to see if we can offer some Alibaba Cloud ECS instances to the community for free, so that you can use and test Redis over RDMA, as well as for future CI/CD purposes.

@yossigo @oranagra
PING!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

None yet

9 participants