New Tool: sortcanon.py

sortcanon.py is a tool to sort text files according to some canonicalization function. For example, sorting domains or ipv4 addresses.

This is actually an old tool, that I still had to publish. I just updated it to Python 3.

This is the man page:

Usage: sortcanon.py [options] [files]
Sort with canonicalization function

Arguments:
@file: process each file listed in the text file specified
wildcards are supported

Valid Canonicalization function names:
 domain: lambda x: '.'.join(x.split('.')[::-1])
 ipv4: lambda x: [int(n) for n in x.split('.')]
 length: lambda x: len(x)

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --man             Print manual
  -c CANONICALIZE, --canonicalize=CANONICALIZE
                        Canonicalization function
  -r, --reverse         Reverse sort
  -u, --unique          Make unique list
  -o OUTPUT, --output=OUTPUT
                        Output file

Manual:

sortcanon is a tool to sort the content of text files according to some
canonicalization function.
The tool takes input from stdin or one or more text files provided as argument.
All lines from the different input files are put together and sorted.

If no option is used to select a particular type of sorting, then normal
alphabetical sorting is applied.

Use option -o to write the output to the given file, in stead of stdout.

Use option -r to reverse the sort order.

Use option -u to produce a list of unique lines: remove all doubles before
sorting.

Option -c can be used to select a particular type of sorting.
For the moment, 2 options are provided:

domain: interpret the content of the text files as domain names, and sort them
first by TLD, then domain, then subdomain, and so on ...

length: sort the lines by line length. The longest lines will be printed out
last.

ipv4: sort IPv4 addresses.

You can also provide your own Python lambda function to canonicalize each line
for sorting.
Remark that this involves the use of the Python eval function: do only use this
with trusted input.


sortcanon_V0_0_1.zip (http)
MD5: CC20EA756E3E0796C617830C8F91AFF4
SHA256: 42EDE51EE70A39FD0933A77B8FE119F1CA8C174336C0DA4C079B1F02C1AB33EC

Article Link: New Tool: sortcanon.py | Didier Stevens