A colleague recently asked how to extract specific text from a string that contains some desired information mixed with arbitrary other text. The API returns a string from which we need to extract the address and phone number. However, the address and phone number are intermingled with other, unrelated information.
As an example, consider the text “5439 Howe street call when you get to the lobby 412 760 7595 Pittsburgh, PA”. We want to get the address and the phone number and ignore the rest. My first thought was to use regular expressions to extract the phone number. This would be fairly straightforward, since all of the phone number's components are present in the text. We would need to account for parentheses and dashes as well as plain spaces. The address would be more challenging, since it would require some really complicated expressions to get the street, city, state, etc.
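To make the regular expression idea concrete, here is a minimal sketch of the phone number half in Swift. The function name findPhoneNumbers and the pattern itself are my own assumptions for illustration: the pattern only covers ten-digit US-style numbers with optional parentheses around the area code and a space, dot, or dash between groups, so it is not a general-purpose phone matcher.

```swift
import Foundation

// Hypothetical helper (not from any library): finds ten-digit US-style phone
// numbers such as "412 760 7595", "(412) 760-7595" or "412-760-7595".
func findPhoneNumbers(in text: String) -> [String] {
    // Optional "(" + 3 digits + optional ")", an optional separator,
    // 3 digits, an optional separator, then 4 digits.
    let pattern = #"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Convert the NSRange of each match back into a Swift substring.
        guard let range = Range(match.range, in: text) else { return nil }
        return String(text[range])
    }
}

let message = "5439 Howe street call when you get to the lobby 412 760 7595 Pittsburgh, PA"
print(findPhoneNumbers(in: message))
```

Even this small pattern hints at the problem: every additional phone format needs another tweak, and nothing comparable exists for picking the street, city, and state out of the same string.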
Apple engineers have mentioned before that one of the most common things they see is developers implementing features that Apple already provides. As luck would have it, Apple has already encountered this issue and provided a solution for us. Enter the NSDataDetector class (and its relative NSTextCheckingResult), which provides the functionality that we require. Apple’s documentation for NSDataDetector states:
“The NSDataDetector class is a specialized subclass of NSRegularExpression designed to match natural language text for predefined data patterns. Currently the NSDataDetector class can match dates, addresses, links, phone numbers and transit information.”
This sounds like just what we need. So how do we go about using this awesome new class? Well, it is actually quite straightforward.
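Here is a minimal sketch of how the detector could be applied to our example, written in Swift. The DetectedInfo struct and the detectContactInfo function are names I made up for this illustration. The NSDataDetector and NSTextCheckingResult calls are the standard Foundation ones. Keep in mind that what the detector actually recognizes depends on its own heuristics, so an address that is split by unrelated text may come back whole, in pieces, or not at all.

```swift
import Foundation

// Hypothetical container for the values we care about.
struct DetectedInfo {
    var phoneNumbers: [String] = []
    var addresses: [[NSTextCheckingKey: String]] = []
}

// Hypothetical helper: runs a data detector over the text and collects
// any phone numbers and address components it reports.
func detectContactInfo(in text: String) throws -> DetectedInfo {
    // Ask the detector for phone numbers and addresses only.
    let types: NSTextCheckingResult.CheckingType = [.phoneNumber, .address]
    let detector = try NSDataDetector(types: types.rawValue)
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    var info = DetectedInfo()
    for match in detector.matches(in: text, options: [], range: fullRange) {
        if match.resultType == .phoneNumber, let number = match.phoneNumber {
            info.phoneNumbers.append(number)
        } else if match.resultType == .address, let parts = match.addressComponents {
            // parts is keyed by NSTextCheckingKey (.street, .city, .state, .zip, ...).
            info.addresses.append(parts)
        }
    }
    return info
}

let sample = "5439 Howe street call when you get to the lobby 412 760 7595 Pittsburgh, PA"
if let info = try? detectContactInfo(in: sample) {
    print("Phone numbers:", info.phoneNumbers)
    for address in info.addresses {
        print("Street:", address[.street] ?? "n/a")
        print("City:", address[.city] ?? "n/a")
        print("State:", address[.state] ?? "n/a")
    }
}
```

Each match is an NSTextCheckingResult, so the same loop could be extended to dates or links by adding more checking types and reading the corresponding properties.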
Apple has made it easy for us to process text that would otherwise be a challenge without a good understanding of regular expressions. For more information on NSDataDetector, see this NSHipster article.