patternpythonMinor
Efficient Ring Buffer (FIFO)
Viewed 0 times
efficientfiforingbuffer
Problem
I need to implement a Ring Buffer/FIFO for data coming from a TCP socket.
It must support the following operations:
I guess my needs are pretty standard for a TCP-based streaming protocol, but surprisingly I have never found a "best practice" method for doing this.
There are similar questions on SO already, most suggest to use
Mixing various suggestions I came up with the following implementation, which seems working but I wonder: can I do any better, performance-wise? Removing a single byte at a time in
If you are wondering why I am returning a string from
It must support the following operations:
- Append a the recv()'ed chunk of bytes.
- Allow me to peek at the beginning of the buffer, since i get differently-sized packets, and I must decode a small fixed-size header to know how many bytes to process.
- Remove a chunk of bytes from the beginning of the buffer for processing.
I guess my needs are pretty standard for a TCP-based streaming protocol, but surprisingly I have never found a "best practice" method for doing this.
There are similar questions on SO already, most suggest to use
collections.deque, which is fine but has some shortcomings that must be worked around for my needs:- It doesn't easily allow peeking.
- It doesn't allow removal of chunks of bytes.
Mixing various suggestions I came up with the following implementation, which seems working but I wonder: can I do any better, performance-wise? Removing a single byte at a time in
get() doesn't look optimal at all.import collections
import itertools
class RingBuffer (object):
"""Ring buffer"""
def __init__ (self, size = 4096):
self._buf = collections.deque (maxlen = size)
def put (self, data):
"""Adds data to the end of the buffer"""
self._buf.extend (data)
def get (self, size):
"""Retrieves data from the beginning of the buffer"""
data = str ()
for i in xrange (size):
data += self._buf.popleft ()
return data
def peek (self, size):
"""\"Peeks\" at the beginning of the buffer (i.e.: retrieves data without removing them from the buffer)"""
return str (bytearray (itertools.islice (self._buf, size)))
def len (self):
"""Returns the length of the buffer"""
return len (self._buf)If you are wondering why I am returning a string from
peek(), that is because I need to process its return valSolution
I suspect you’re stuck using one-pop-at-a-time if you want to get multiple elements at once.
I found this quote from one of the
There is no multi-pop method for deques.
The OP on that question had a very similar construction to you (they were saving the data to a list rather than a str), and he said:
Yes, it is correct. Yes, it is reasonably efficient though it can be further sped-up with boundmethods and itertools. No, it isn't stupid :-)
I don’t know enough about boundmethods or itertools to know what he was suggesting.
That doesn’t seem particularly helpful, so here are a few other comments on your general coding style:
And a few comments on your
-
In the for loop, your index variable is named
-
It used to be the case that it was better to construct a list of strings and then call
Regardless, I happen to have a personal preference for that construction. It makes the method slightly shorter:
But I think that’s entirely personal preference: I did an short performance test, and I couldn’t see any difference between this and your
-
Suppose I have an empty ring buffer, and I try to
The user sees the IndexError that arises from trying to pop an empty deque:
Unless you knew that RingBuffer() was using a deque under the hood, this error message would be confusing. I’d recommend having a custom error message to explain that somebody’s trying to get more items than you have to give.
It’s also worth noting that if you use a
Your
I found this quote from one of the
collections developers on Stack Overflow about using pop() for many elements in a deque:There is no multi-pop method for deques.
The OP on that question had a very similar construction to you (they were saving the data to a list rather than a str), and he said:
Yes, it is correct. Yes, it is reasonably efficient though it can be further sped-up with boundmethods and itertools. No, it isn't stupid :-)
I don’t know enough about boundmethods or itertools to know what he was suggesting.
That doesn’t seem particularly helpful, so here are a few other comments on your general coding style:
- You shouldn’t have a space immediately before the opening parens that starts a list of arguments to a function. (PEP 8: Pet Peeves)
- It’s also recommended that you don’t have a space around the equals sign for keyword arguments or default values. (PEP 8: Other Recommendations)
- In the interest of using descriptive variable names, I see no reason not to extend the length of the variable name
_bufto_buffer. Characters are cheap.
- Lines should be limited to 79 characters (PEP 8: Maximum Line Length), which includes docstrings: specifically, the docstring for
peek().
- If you have a single double-quote within a triple-quoted string, you don’t need to escape it. You only need to escape quotes within a triple-quoted string if you have three of them at once.
- The docstring for
RingBuffer()is almost useless. It tells me nothing that I couldn’t have learnt from the name of the class.
And a few comments on your
get() method in particular:-
In the for loop, your index variable is named
i, but you never use the value of this variable. It’s common practice to give unused index variables the name _, to make it explicit that the value of this variable is unimportant. That is,for _ in xrange(size):
data += self._buf.popleft()-
It used to be the case that it was better to construct a list of strings and then call
join() on the list than be continually appending to strings, but I think that’s fixed in newer versions of Python.Regardless, I happen to have a personal preference for that construction. It makes the method slightly shorter:
def alt_get(self, size):
"""Retrieves data from the beginning of the buffer."""
data = ''.join(self._buf.popleft() for _ in xrange(size))
return dataBut I think that’s entirely personal preference: I did an short performance test, and I couldn’t see any difference between this and your
get() method.-
Suppose I have an empty ring buffer, and I try to
get() an element from it:r = RingBuffer()
r.get()The user sees the IndexError that arises from trying to pop an empty deque:
Traceback (most recent call last):
File "buffer.py", line 29, in
r.get(1)
File "buffer.py", line 17, in get
data += self._buf.popleft ()
IndexError: pop from an empty deque
Unless you knew that RingBuffer() was using a deque under the hood, this error message would be confusing. I’d recommend having a custom error message to explain that somebody’s trying to get more items than you have to give.
It’s also worth noting that if you use a
try … except block here, the buffer will be emptied before you notice, and the contents could be lost. You need a check before you access anything from the buffer. Something like this:def get(self, size):
"""Retrieves data from the beginning of the buffer"""
if size > len(self._buf):
raise IndexError("Too many items: trying to access %d items "
"from a buffer of length %d" % (size, len(self)))
data = [self._buf.popleft() for _ in xrange(size)]
return dataYour
peak() method doesn't suffer from this problem.Code Snippets
for _ in xrange(size):
data += self._buf.popleft()def alt_get(self, size):
"""Retrieves data from the beginning of the buffer."""
data = ''.join(self._buf.popleft() for _ in xrange(size))
return datar = RingBuffer()
r.get()def get(self, size):
"""Retrieves data from the beginning of the buffer"""
if size > len(self._buf):
raise IndexError("Too many items: trying to access %d items "
"from a buffer of length %d" % (size, len(self)))
data = [self._buf.popleft() for _ in xrange(size)]
return dataContext
StackExchange Code Review Q#85068, answer score: 2
Revisions (0)
No revisions yet.