patternpythonMinor
Efficiently matching IP addresses with a list of CIDRs
Viewed 0 times
efficientlywithaddressescidrslistmatching
Problem
I am using Python3 to extract some information from a dictionary that contains 10k of CIDRs, but the running time is pretty long.
Here is my purpose: I have a dictionary of CIDRs (10000) and a list of IPs (5000), then I want to go through each IP in the list and check if it belongs to one CIDR in the dictionary.
Could you please give me some trick to reduce the running time?
Here is my purpose: I have a dictionary of CIDRs (10000) and a list of IPs (5000), then I want to go through each IP in the list and check if it belongs to one CIDR in the dictionary.
Could you please give me some trick to reduce the running time?
#python 3:
import os
import ipaddress
Dict = {127.0.0.0/8 : ABC, 169.128.0.0/16 : DEF}
List = {127.0.0.1, 126.1.1.1, 169.2.2.2, 150.0.22.2}
for IP in List:
for CIDR in Dict.keys():
if ipaddress.ip_address(IP) in ipaddress.ip_network(CIDR):
print(CIDR)
breakSolution
First of all, welcome to Code Review! This is a great first question. There are some things to learn about how to post a question in order to get the best response, so I'll cover these first and then cover the performance problems that you're seeing.
Readability
You will get more reviews if your code is readable. I find the layout of your code very clear and readable apart from two things.
I understand that these two variables are not part of the code you use in the real world setting and were just quickly added for the purposes of this question. However, it's worth using clear variable names and standard indenting because the reviewer won't have your familiarity with the code.
Note that your
Running your code before you post it
You didn't run this code before you posted it, and it contains syntax errors as pointed out by 200_success.
You can't be expected to spot every minor typo in your code before you post it. However, you are expected to run the code before you post it. This would have picked up the problem with
Since the site rules require working code, it would be perfectly reasonable for a reviewer to simply move on to the next question if this one has code that doesn't run. For this reason you are also likely to get more reviews and to see your first review sooner if you run your code before posting it.
For reference, the syntax errors on both lines are due to not enclosing strings in quotes, so that python tries to evaluate them as expressions.
should be:
Repeating work
You are repeating two things, each of which slows down your program.
-
You convert
-
You convert
You state that there are 10,000
The first problem will therefore repeat the
The second problem will repeat all 10,000
How to return an unconverted
I can see why you might be tempted to loop over unconverted
You can return the unconverted CIDR by creating a dict at the start which has converted
For example:
Now you can work with converted CIDRs to avoid all that recalculation, and still return an unconverted CIDR at the end by doing a quick dictionary lookup using the converted CIDR as the key.
Readability
You will get more reviews if your code is readable. I find the layout of your code very clear and readable apart from two things.
- Your indenting uses only 2 spaces instead of the recommended 4.
- Your variable names
DictandListare not descriptive and start with an uppercase letter (which is generally reserved for classes)
I understand that these two variables are not part of the code you use in the real world setting and were just quickly added for the purposes of this question. However, it's worth using clear variable names and standard indenting because the reviewer won't have your familiarity with the code.
Note that your
List is not a list, since you have used braces { and } instead of square brackets [ and ]. Braces denote a set, or a dict if key value pairs are used. In your case List is a set, which has different behaviour.Running your code before you post it
You didn't run this code before you posted it, and it contains syntax errors as pointed out by 200_success.
You can't be expected to spot every minor typo in your code before you post it. However, you are expected to run the code before you post it. This would have picked up the problem with
Dict and with List. It only takes a moment and means you will get more meaningful reviews, less confusion, and use up less of the reviewers time so they can help more people.Since the site rules require working code, it would be perfectly reasonable for a reviewer to simply move on to the next question if this one has code that doesn't run. For this reason you are also likely to get more reviews and to see your first review sooner if you run your code before posting it.
For reference, the syntax errors on both lines are due to not enclosing strings in quotes, so that python tries to evaluate them as expressions.
Dict = {127.0.0.0/8 : ABC, 169.128.0.0/16 : DEF}
List = {127.0.0.1, 126.1.1.1, 169.2.2.2, 150.0.22.2}should be:
Dict = {'127.0.0.0/8' : 'ABC', '169.128.0.0/16' : 'DEF'}
List = {'127.0.0.1', '126.1.1.1', '169.2.2.2', '150.0.22.2'}Repeating work
You are repeating two things, each of which slows down your program.
-
You convert
IP using ipaddress.ip_address(IP). This conversion always gives the same output for a given IP, so you can calculate it once at the start of the outer loop rather than recalculating the same thing every time you start the inner loop.-
You convert
CIDR using ipaddress.ip_network(CIDR). Each time through the inner loop you will be converting the same list of CIDRs. If you have sufficient memory you can calculate all of these once, and then use those precalculated values rather than recalculating.You state that there are 10,000
CIDRs and 5,000 IPs.The first problem will therefore repeat the
IP conversion on average 5,000 times for cases where a match is found (because on average it will get half way through the 10,000 CIDRs before finding the match). Where no match is found it will repeat the IP conversion 10,000 times, once for each CIDR. Between 5,000 and 10,000 conversions for each of 5,000 IPs is between 25,000,000 and 50,000,000 conversions when only 5,000 are needed (once per IP).The second problem will repeat all 10,000
CIDR conversions for any IP that does not find a match, and on average 5,000 CIDR conversions for any IP that does find a match. That's between 25,000,000 and 50,000,000 conversions when only 10,000 are needed (once per CIDR).How to return an unconverted
CIDRI can see why you might be tempted to loop over unconverted
CIDRs - you return an unconverted CIDR so working with a list of converted CIDRs would mean you'd have to find out which CIDR the converted CIDR came from before returning it.You can return the unconverted CIDR by creating a dict at the start which has converted
CIDR as the key and original CIDR as the value.For example:
converted_CIDRs = {ipaddress.ip_network(CIDR):CIDR for CIDR in Dict.keys()}Now you can work with converted CIDRs to avoid all that recalculation, and still return an unconverted CIDR at the end by doing a quick dictionary lookup using the converted CIDR as the key.
Code Snippets
Dict = {127.0.0.0/8 : ABC, 169.128.0.0/16 : DEF}
List = {127.0.0.1, 126.1.1.1, 169.2.2.2, 150.0.22.2}Dict = {'127.0.0.0/8' : 'ABC', '169.128.0.0/16' : 'DEF'}
List = {'127.0.0.1', '126.1.1.1', '169.2.2.2', '150.0.22.2'}converted_CIDRs = {ipaddress.ip_network(CIDR):CIDR for CIDR in Dict.keys()}Context
StackExchange Code Review Q#57493, answer score: 6
Revisions (0)
No revisions yet.