We have created an algorithm based on the specifications below and tested it on several hundred pages. The simple algorithm is:
ASCII art occurs within PRE or XMP elements
It must have more than 5 lines of text
It must contain at least one run of 5 identical consecutive characters
The algorithm works quite well at detecting ASCII art: it misses very few ASCII art images and gives a small number of false positives. Two kinds of content that it wrongly detects as ASCII art are programming source code examples and guitar tablature.
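The three rules above can be sketched in Python roughly as follows. This is an illustrative sketch, not the implementation that was tested: the names PreformattedCollector, looks_like_ascii_art and find_ascii_art are hypothetical, and it assumes that "lines of text" means non-blank lines and that a run of whitespace (plain indentation) does not count as a sequence of 5 same characters.

```python
import re
from html.parser import HTMLParser

MIN_TEXT_LINES = 6  # rule 2: "greater than 5 lines of text"
RUN_LENGTH = 5      # rule 3: "at least one sequence of 5 same characters"

# Assumption: a "sequence" is a consecutive run, and whitespace runs
# (plain indentation) do not count. Matches one non-whitespace character
# followed by RUN_LENGTH - 1 more copies of itself.
RUN_RE = re.compile(r"(\S)\1{%d}" % (RUN_LENGTH - 1))


class PreformattedCollector(HTMLParser):
    """Collects the text content of each PRE or XMP element (rule 1)."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # text of every completed PRE/XMP element
        self._depth = 0    # > 0 while inside a PRE or XMP element
        self._parts = []   # text chunks of the element being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <PRE> and <pre> both match.
        if tag in ("pre", "xmp"):
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("pre", "xmp") and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        if self._depth > 0:
            self._parts.append(data)


def looks_like_ascii_art(text):
    """Apply rules 2 and 3 to the text of one preformatted block."""
    text_lines = [line for line in text.splitlines() if line.strip()]
    return len(text_lines) >= MIN_TEXT_LINES and RUN_RE.search(text) is not None


def find_ascii_art(html):
    """Return the PRE/XMP blocks in an HTML page that satisfy all three rules."""
    collector = PreformattedCollector()
    collector.feed(html)
    collector.close()  # flush any text the parser is still buffering
    return [block for block in collector.blocks if looks_like_ascii_art(block)]
```

Under these assumptions, a six-line guitar tablature such as e|-----0--| inside a PRE element satisfies every rule, which is consistent with the false positives described above.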