Redis is an awesome key-value store with client libraries for many languages, but being a Python fan I will be discussing its Python client, redis-py. A great introductory article on Redis and redis-py has been provided by Adam at PlayNice.ly. The idea is simple: Redis lets you store a key (a string) together with its value, which can be a string, list, hash, set or sorted set.
Redis follows a client-server model and provides a basic TCP server. Requests to store or fetch key-value pairs are sent to the server by a client (in my case redis-py), so each request carries a round trip time (RTT): the time for the command to travel from the client to the server plus the time for the reply to travel back. Cutting down the number of round trips can therefore be a huge performance improvement.
So instead of executing commands one by one, if a number of commands are sent as a batch, the time spent on round trips can be reduced significantly. This is where the Redis pipeline comes into play. The pipeline is based on the concept of a queue (FIFO): the commands are queued up in the pipeline first and then sent to the server together, in the order they were added.
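To see why batching helps, here is a rough back-of-the-envelope model in Python. It assumes every network exchange costs one full RTT and that the server spends a fixed time on each command; the function name total_time and all the numbers are made up for illustration and are not measurements.

```python
def total_time(num_commands, rtt, exec_time, pipelined):
    """Estimate the total time (in seconds) to run num_commands commands.

    Simplified model: every network exchange costs one RTT and the server
    spends exec_time on each command. Bandwidth and buffering are ignored.
    """
    round_trips = 1 if pipelined else num_commands
    return round_trips * rtt + num_commands * exec_time


# Hypothetical figures, chosen only for illustration.
n = 10000
rtt = 0.0002         # 0.2 ms per round trip
exec_time = 0.00001  # 10 microseconds of server work per command

print("one by one: %.3f s" % total_time(n, rtt, exec_time, pipelined=False))
print("pipelined:  %.3f s" % total_time(n, rtt, exec_time, pipelined=True))
```

With these made-up figures the round trips dominate the one-by-one case, and that is the cost a pipeline removes.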
In Python we create an instance of the Redis class from the redis module, which represents the connection to the Redis server. The pipeline object is obtained by calling the pipeline() method on the Redis instance. The pipeline class inherits from Redis, so the pipeline object provides all the usual commands for working with key-value pairs, plus the ability to queue these commands and run them later by calling the execute() method on the pipeline object. Code comparing performance with and without a pipeline is provided on the Redis website, but it is in Ruby. So I decided to implement it in Python to understand it better, as I am a strong believer in learning by doing. Now time for some code:
Importing the relevant modules:
    import redis
    import time
Now defining the function without_pipeline():
    def without_pipeline():
        r = redis.Redis()
        # Each ping() is sent on its own and waits for its reply.
        for i in range(10000):
            r.ping()
ping() is a simple method used to check whether the server is running. If it is, the server replies with "PONG" (redis-py reports this as True). The PING command can also be tried from the Redis command line (redis-cli). In the above code we ping the server 10,000 times sequentially, waiting for each reply before sending the next command.
Now with_pipeline():
    def with_pipeline():
        r = redis.Redis()
        pipeline = r.pipeline()
        # ping() on the pipeline only queues the command.
        for i in range(10000):
            pipeline.ping()
        # All queued commands are sent to the server here.
        pipeline.execute()
The commands queued in the pipeline are sent to the server in one batch by the pipeline's execute() method, which returns a list with the reply to each command, in order.
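To make the queue-then-execute idea concrete without needing a running server, here is a toy model. It is not how redis-py is implemented; FakeServer and ToyPipeline are hypothetical names invented for this sketch, and the fake server simply counts how many round trips it receives.

```python
class FakeServer:
    """Stand-in for a Redis server that counts round trips (hypothetical)."""

    def __init__(self):
        self.round_trips = 0

    def send(self, commands):
        # One call to send() models one network round trip,
        # no matter how many commands it carries.
        self.round_trips += 1
        return ["PONG" if cmd == "PING" else None for cmd in commands]


class ToyPipeline:
    """Buffers commands in FIFO order and sends them in one batch."""

    def __init__(self, server):
        self.server = server
        self.queue = []

    def ping(self):
        # Nothing is sent yet; the command is only queued.
        self.queue.append("PING")
        return self

    def execute(self):
        # Send everything queued so far, then start with an empty queue.
        commands, self.queue = self.queue, []
        return self.server.send(commands)


server = FakeServer()
pipe = ToyPipeline(server)
for _ in range(3):
    pipe.ping()
replies = pipe.execute()
print(replies, server.round_trips)
```

Three commands go out, but the fake server sees only one round trip, which is the whole point of the pipeline.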
The bench function below times each of the two functions, which gives a rough estimate of what the round trips cost in both cases:
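    def bench(desc):
        # perf_counter() measures wall-clock time, so the time spent
        # waiting on the network is included.
        start = time.perf_counter()
        desc()
        stop = time.perf_counter()
        diff = stop - start
        print("%s has taken %s" % (desc.__name__, diff))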
The bench function takes a function as an argument, since functions in Python are first-class objects that can be passed around and called (a testament to Python's awesomeness). A simple timer is used to measure the elapsed time. So the final code is:
    import redis
    import time


    def bench(desc):
        # perf_counter() measures wall-clock time, so the time spent
        # waiting on the network is included.
        start = time.perf_counter()
        desc()
        stop = time.perf_counter()
        diff = stop - start
        print("%s has taken %s" % (desc.__name__, diff))


    def with_pipeline():
        r = redis.Redis()
        pipeline = r.pipeline()
        # Queue 10,000 PINGs, then send them in one batch.
        for i in range(10000):
            pipeline.ping()
        pipeline.execute()


    def without_pipeline():
        r = redis.Redis()
        # Send 10,000 PINGs one at a time.
        for i in range(10000):
            r.ping()


    if __name__ == "__main__":
        bench(without_pipeline)
        bench(with_pipeline)
Output obtained:
    without_pipeline has taken: 0.39
    with_pipeline has taken: 0.19
The results speak for themselves. The measurement is not very accurate, but it gives an idea of the pipeline's advantage. Feel free to leave your comments below or suggest any other methods for trying this out.
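As one such alternative, the standard library's timeit module can repeat a measurement and report the best run, which tends to be less noisy than a single timing. The sketch below is only a suggestion: bench_best_of and dummy_workload are hypothetical names, and dummy_workload is a placeholder so the example runs without a Redis server; in practice you would pass with_pipeline or without_pipeline instead.

```python
import timeit


def bench_best_of(func, repeat=3):
    """Call func `repeat` times and return the fastest time in seconds."""
    # number=1 means each measurement is a single call of func.
    timings = timeit.repeat(func, number=1, repeat=repeat)
    return min(timings)


def dummy_workload():
    # Placeholder for with_pipeline / without_pipeline.
    sum(range(100000))


best = bench_best_of(dummy_workload)
print("%s: best of 3 runs = %.4f s" % (dummy_workload.__name__, best))
```

Taking the minimum rather than the average is a common choice here, since slower runs usually reflect interference from other processes rather than the code being measured.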