Area codes in the US

23 Apr 2011

related: snippet, python

Wikipedia has a list of the area codes used in the US. How many are used vs unused?

So copy and paste that crap into a text file, say areacodes.txt, and make sure it’s really plain text and not full of HTML badness.

It’s not so nice looking though. If you copied it into TextEdit and then converted to plain text, it has bullet points, region names, and area codes separated by slashes, and it just isn’t that machine-readable.

Clean it up with some sed:

# cat areacodes.txt | sed 's/[^0-9]*\([0-9/]*\).*/\1/' | tr "/" "\n" | sort | uniq > sortedareacodes.txt

# cat sortedareacodes.txt | wc -l
     288

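The sed expression keeps only the first run of digits and ’/’ separators on each line and throws away everything else (a line with no digits at all comes out empty). Then tr converts each ’/’ into a newline, and sort and uniq sort the codes and remove duplicates.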
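If sed isn’t your thing, the same cleanup can be sketched in Python. This is a rough equivalent rather than the pipeline above verbatim: the function name `extract_codes` and the sample lines are made up for illustration, and it drops lines without digits instead of leaving a blank entry.

```python
import re

def extract_codes(lines):
    """Return the sorted, de-duplicated area codes found in the given lines."""
    codes = set()
    for line in lines:
        # Like the sed expression: grab the first run of digits and slashes.
        match = re.search(r"[0-9][0-9/]*", line)
        if match:
            # Like tr "/" "\n": split slash-separated codes apart.
            codes.update(part for part in match.group(0).split("/") if part)
    return sorted(codes)

# Made-up sample lines in roughly the shape described above.
sample = ["* Alabama: 205/251/256", "* Alaska: 907", "no codes here", "* Again: 907"]
print(extract_codes(sample))
```

To run it on the real file, something like `extract_codes(open("areacodes.txt"))` should work, assuming the file looks like the pasted list.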

That’s great. Now Python it up.

# python
>>> f = open("sortedareacodes.txt")
>>> codes = f.read().split("\n")
>>> all = range(200,999)
>>> unused = [x for x in all if x and str(x) not in codes]
>>> len(unused)
512

That reads in the file and makes a list with one string per line, so one per area code. ‘all’ is meant to be a list of all the valid area codes (they are 3 digits and can’t start with 0 or 1), though range(200,999) actually stops at 998, so 999 itself is never checked. The list comprehension makes a list of every number in ‘all’ whose string form is not in ‘codes’.
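Here is a sketch of the same count as a reusable function, assuming Python 3 and using a set for the lookups. The name `unused_codes` and the tiny sample list are hypothetical. It uses range(200, 1000) so that 999 is included and it ignores blank entries, so its count on the real file may not match the session above exactly.

```python
def unused_codes(used):
    """Return every 3-digit code from 200 to 999 that is not in `used`."""
    # Strip whitespace and skip blank entries (e.g. from a trailing newline).
    used_set = {code.strip() for code in used if code.strip()}
    # range's upper bound is exclusive, so 1000 is needed to include 999.
    return [n for n in range(200, 1000) if str(n) not in used_set]

# Made-up sample standing in for the contents of sortedareacodes.txt.
sample_used = ["205", "907", "", "212"]
unused = unused_codes(sample_used)
print(len(unused))
```

With the real data you would pass in the lines of sortedareacodes.txt, for example `unused_codes(open("sortedareacodes.txt"))`.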

Wikipedia knows of 288 area codes. 512 valid ones remain, minus any reserved combos this does not consider, like 555.