Matching Explained

Understanding the need for data matching technology requires a basic familiarity with how databases access data and what their limitations are. Databases store, identify, and find data through the use of keys. A key is simply a unique value assigned to a record in a database, much like an account number. Different organizations, such as mortgage companies, title companies, and county tax offices, often gather and store information about the very same physical entities (homes, homeowners, loans, etc.) but do not use the same key values to identify that information in their respective databases. This presents a challenge when data from one database must be used to supplement data from another (for parcelization, title searches, deed updates, etc.), since databases have no built-in way to merge records that do not share a key.

Certainty versus uncertainty

Databases are perfectly accurate and extremely efficient at matching data when records share the same key. Because a key, by definition, identifies one and only one record in a database, querying the database for a given key value returns a "yes" or "no" answer: either a record with that key value exists or it doesn't. In other words, the outcome is absolutely certain.
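As a minimal illustration (a sketch, not how any particular database engine works internally), a Python dictionary keyed by parcel number behaves like a keyed table: a lookup either returns exactly one record or returns nothing. The stored record mirrors the county example later in this section; the helper name `find_by_key` and the second parcel number queried are hypothetical.

```python
# A keyed table: each parcel number (the key) maps to exactly one record.
# The record mirrors the county example; find_by_key is a hypothetical helper.
county_records = {
    "1000002355897": {"owner": "Frances Jones", "address": "123 Main Street Dallas, TX 75202"},
}

def find_by_key(table, key):
    """Exact key lookup: returns the single matching record, or None."""
    return table.get(key)

print(find_by_key(county_records, "1000002355897") is not None)  # True: "yes"
# "1000002355898" is a hypothetical parcel number that is not in the table.
print(find_by_key(county_records, "1000002355898") is not None)  # False: "no"
```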

When databases do not share the same key, however, non-unique identifiers such as addresses, names, and legal descriptions are the only way to merge the datasets (match records). With data elements like these, any given record in database A could have hundreds, if not thousands, of potentially matching counterparts in database B, or no matching record at all, so selecting the right one becomes much more difficult. Instead of "yes" or "no", the answer to the question "Does a matching record exist?" becomes "maybe", and is therefore uncertain.
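In contrast to a key lookup, the sketch below searches on a non-unique field. Only the first record mirrors the county example in this section; the other two records, their placeholder parcel numbers, and the helper `find_candidates` are hypothetical. One query on a ZIP code returns several plausible candidates, and a different ZIP code returns none, so the answer is no longer a simple yes or no.

```python
# Searching on a non-unique field can return zero, one, or many candidate records.
# Only the first record mirrors the county example; the others are hypothetical.
county_records = [
    {"parcel": "1000002355897", "owner": "Frances Jones", "address": "123 Main Street", "zip": "75202"},
    {"parcel": "HYPOTHETICAL-0001", "owner": "Frank Jones", "address": "125 Main Street", "zip": "75202"},
    {"parcel": "HYPOTHETICAL-0002", "owner": "F. Johnson", "address": "123 Maine Avenue", "zip": "75202"},
]

def find_candidates(records, field, value):
    """Return every record whose field equals value (possibly none, possibly many)."""
    return [r for r in records if r[field] == value]

print(len(find_candidates(county_records, "zip", "75202")))  # 3: each one is a "maybe"
print(len(find_candidates(county_records, "zip", "10001")))  # 0: no counterpart at all
```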

The difference between "yes/no" and "maybe" represents the large gap between the two main camps of matching technology: deterministic and probabilistic. Historically, the real estate industry has relied on deterministic technology for its matching needs. These systems use traditional nested logic to separate potentially good matches from bad ones, then back that logic up with a safety net of rules and exceptions to weed out likely mistakes. To many companies that demand accuracy in matching, a deterministic, or rules-based, approach seems simple and straightforward, since individual rules and exceptions are easy to define and enforce. Good matches are simply those that do not violate any of the rules or exceptions.

While some deterministic systems (sometimes referred to in the real estate industry as triangulated systems) are reliable, the approach has drawbacks. Over the years, the sets of rules and exceptions typically grow into a tangle of overlapping conditions that tend to work against one another. As a result, these systems hit a ceiling of sorts that keeps match rates down.
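The sketch below shows the rules-based idea in miniature, assuming each rule is a function that returns a violation message or `None`, and that a candidate is accepted only when no rule is violated. The rules, field names, and helper names are hypothetical and are not taken from any particular product. Applied to the two example records later in this section, an exact-name rule rejects the pair because of a one-letter spelling difference.

```python
# Each hypothetical rule returns a violation message, or None if the pair passes.
def zip_must_match(a, b):
    return None if a["zip"] == b["zip"] else "ZIP codes differ"

def house_number_must_match(a, b):
    # Assumes the house number is the first word of the street address.
    same = a["address"].split()[0] == b["address"].split()[0]
    return None if same else "house numbers differ"

def full_name_must_match(a, b):
    return None if a["name"].lower() == b["name"].lower() else "names differ"

RULES = [zip_must_match, house_number_must_match, full_name_must_match]

def deterministic_match(a, b, rules=RULES):
    """Accept the pair only if it violates no rule; also report the violations."""
    violations = [msg for rule in rules if (msg := rule(a, b)) is not None]
    return (not violations), violations

loan = {"name": "Francis Jones", "address": "123 Main St.", "zip": "75202"}
parcel = {"name": "Frances Jones", "address": "123 Main Street", "zip": "75202"}
print(deterministic_match(loan, parcel))  # (False, ['names differ'])
```

Accepting this pair would require yet another exception (for instance, tolerating one-letter name differences), which is how rule sets grow into the tangle of overlapping conditions described above.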

Mimicking the Human Thought Process

To increase match rates and better manage the uncertainty of matching on non-unique identifiers, we at Accumatch have taken a different approach: a statistically based, probabilistic one. As we developed our matching software, Searchlight, we painstakingly studied the way a human being looks up and verifies matching real estate records. We found that experienced 'Searchers' don't just adhere to a set of rules and exceptions when deciding which matches are good ones; they trust their instincts. As we say, they take a 'whole record approach'. Drawing on their expertise with real estate data, they weigh the positive and negative aspects of every data element that contributes to or detracts from a potential match, until they settle on a sort of subconscious score. With a skilled appreciation of what is acceptable to their company, they then compare this 'score' to their company's business rules to separate correct matches from mismatches.
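One well-known way to formalize this kind of whole-record weighing is the Fellegi-Sunter model of record linkage: each field contributes a positive weight when it agrees and a negative weight when it disagrees, the weights are summed into a single score, and the score is compared to a threshold. The sketch below is a generic illustration of that model, not a description of Searchlight's algorithms; the per-field m/u probabilities and the threshold are hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not measured data):
#   m = P(field agrees | the two records really describe the same property)
#   u = P(field agrees | the two records describe different properties)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "address": (0.90, 0.005),
    "legal": (0.85, 0.002),
}

MATCH_THRESHOLD = 10.0  # hypothetical business threshold on the summed score

def field_weight(field, agrees):
    """Log-likelihood weight: positive when the field agrees, negative when it disagrees."""
    m, u = FIELD_PROBS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def whole_record_score(agreement):
    """Sum the weights of every compared field into one whole-record score."""
    return sum(field_weight(field, agrees) for field, agrees in agreement.items())

# Name disagrees (Francis vs. Frances); address and legal description agree.
score = whole_record_score({"name": False, "address": True, "legal": True})
print(f"score = {score:.2f}, match = {score >= MATCH_THRESHOLD}")
```

Here the name disagreement pulls the score down, but strong agreement on address and legal description keeps the total above the threshold, much as an experienced searcher would still accept the match.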

The advanced algorithms embedded in Searchlight were designed to mimic those same human thought processes. Unlike most deterministic systems, Searchlight overcomes misspellings, formatting issues, data mutilation, and other data discrepancies the way a skilled human would, but without the mistakes caused by fatigue or typos. Through proper weighting, scoring, and threshold determination, Searchlight matches records that would have fallen through the cracks of traditional deterministic systems. The result is higher match rates with virtually no errors.
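Overcoming misspellings requires comparing fields by degree rather than by exact equality. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure (record linkage systems often use purpose-built measures such as Jaro-Winkler instead); the helper names and the 0.85 cutoff are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase and collapse whitespace so formatting differences don't count."""
    return " ".join(s.lower().split())

def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def fuzzy_agrees(a, b, cutoff=0.85):
    """Treat two values as agreeing if they are similar enough (cutoff is hypothetical)."""
    return similarity(a, b) >= cutoff

print(fuzzy_agrees("Francis Jones", "Frances Jones"))  # True: one-letter misspelling
print(fuzzy_agrees("Francis Jones", "Robert Smith"))   # False: different people
```

A graded comparison like this can feed a weighted score, so that "Francis" versus "Frances" counts as near-agreement instead of a hard rule violation.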

A mortgage company's loan record:

Loan account number: 182-48974-12
Borrower name: Francis Jones
Property address: 123 Main St. Dallas, TX 75202
Property legal description: Main Pl. Blk 1 Lot 3

A county tax office's real property record:

Parcel number: 1000002355897
Homeowner name: Frances Jones
Property address: 123 Main Street Dallas, TX 75202
Property legal description: Bk 1 Lt 3 Main Place Subdivision
Annual taxes due: $2,780.25

*The loan record 'matched' the real property record on name, address, and legal description, despite misspellings and formatting differences.
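Formatting differences such as "St." versus "Street", or "Main Pl. Blk 1 Lot 3" versus "Bk 1 Lt 3 Main Place Subdivision", can often be neutralized by standardizing values before they are compared. The sketch below is a hypothetical illustration: the abbreviation table, the noise-word list, and the assumption that block and lot numbers directly follow the words "block" and "lot" are all simplifications, and a real table would be far larger.

```python
import re

# Hypothetical standardization tables; a production list would be far larger and
# would have to resolve ambiguous abbreviations (e.g., "St" for Street vs. Saint).
ABBREVIATIONS = {"st": "street", "pl": "place", "blk": "block", "bk": "block", "lt": "lot"}
NOISE_WORDS = {"subdivision"}

def tokens(text):
    """Lowercase, strip punctuation, drop noise words, and expand abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [ABBREVIATIONS.get(w, w) for w in words if w not in NOISE_WORDS]

def parse_legal(text):
    """Split a legal description into (subdivision, block, lot).

    Assumes block and lot numbers directly follow the words "block" and "lot"."""
    words = tokens(text)
    fields = {"block": None, "lot": None}
    rest = []
    i = 0
    while i < len(words):
        if words[i] in fields and i + 1 < len(words) and words[i + 1].isdigit():
            fields[words[i]] = words[i + 1]
            i += 2
        else:
            rest.append(words[i])
            i += 1
    return " ".join(rest), fields["block"], fields["lot"]

print(parse_legal("Main Pl. Blk 1 Lot 3"))              # ('main place', '1', '3')
print(parse_legal("Bk 1 Lt 3 Main Place Subdivision"))  # ('main place', '1', '3')
print(tokens("123 Main St. Dallas, TX 75202") == tokens("123 Main Street Dallas, TX 75202"))  # True
```

Once both records are reduced to the same standardized form, the name, address, and legal description can be compared field by field.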

But what if the mortgage company needs to enhance some of the records in its database with data from the county's database?

In the example above, you see a record from the database of Big Nation, a mortgage company, followed by the best-matching record from the Dallas County Tax Office's database. Because Francis Jones is a Big Nation customer, Big Nation is responsible for paying the $2,780.25 in property taxes due to Dallas County at year's end on Francis' behalf. To make sure the taxes are paid on the correct piece of property, the mortgage company must include the parcel number with its payment. The tax office requires the parcel number because it identifies one and only one piece of property in the office's own database; that is, it is a unique identifier.

But the mortgage company uses a different number to identify Francis Jones' home in its own database, and it cannot know the parcel number without looking it up in the tax office's database. Because the mortgage company's unique identifier, the loan account number, is completely unrelated to the county's parcel number, it is useless as a one-to-one matching key.

Using imprecise identifiers

The only way for Big Nation to find the parcel number for Francis Jones' home is to search the county's database using her name, address, and perhaps the legal description of her property. This is where things get tricky. The root 'matching problem' here is that Big Nation must find the parcel number that uniquely identifies a piece of property in the tax office's database using data, such as names and addresses, that does not uniquely identify that property in the same database. Furthermore, Big Nation must be very accurate; otherwise, it runs the risk of paying someone else's taxes by mistake, which can be very costly.
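One way to guard against paying the wrong parcel's taxes is to accept an automated match only when the best candidate both scores highly and clearly beats the runner-up, and to send everything else to a human searcher. The sketch below illustrates that policy under stated assumptions: the scoring function (an average of `difflib` similarity ratios), the `accept` and `margin` values, the pre-normalized address strings, and the second candidate record with its placeholder parcel number are all hypothetical.

```python
from difflib import SequenceMatcher

def field_sim(a, b):
    """0..1 similarity of two field values, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(loan, parcel):
    """Hypothetical whole-record score: mean similarity of name and address."""
    return (field_sim(loan["name"], parcel["owner"])
            + field_sim(loan["address"], parcel["address"])) / 2

def find_parcel_number(loan, candidates, accept=0.95, margin=0.05):
    """Return the winning parcel number, or None to route the loan to manual review."""
    scored = sorted(((score(loan, p), p) for p in candidates),
                    key=lambda pair: pair[0], reverse=True)
    if not scored:
        return None
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    # Accept only a high score that also clearly beats the next-best candidate.
    if best_score >= accept and best_score - runner_up >= margin:
        return best["parcel"]
    return None

# Addresses are assumed to be normalized already ("St." expanded, punctuation removed).
loan = {"name": "Francis Jones", "address": "123 Main Street Dallas TX 75202"}
candidates = [
    {"parcel": "1000002355897", "owner": "Frances Jones", "address": "123 Main Street Dallas TX 75202"},
    {"parcel": "HYPOTHETICAL-0001", "owner": "Frank Jones", "address": "125 Main Street Dallas TX 75202"},
]

print(find_parcel_number(loan, candidates))      # 1000002355897
print(find_parcel_number(loan, candidates[1:]))  # None: send to a human searcher
```

Returning `None` rather than the best available guess reflects the cost asymmetry described above: a missed match costs a manual lookup, while a wrong match can mean paying someone else's taxes.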

For a mortgage company the size of Big Nation, which processes 500 or more loans per day, doing all this matching by hand is slow and expensive, both in salaries and in the cost of human errors. A human searcher could take a couple of days to look up parcel numbers for 500 loans and would probably make four or five mistakes in the process. Searchlight's automated matching software can match thousands of records per hour with virtually no errors.
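Put as a rate, the manual error figure above works out to roughly 1% of lookups:

```latex
\text{error rate} = \frac{4}{500} = 0.8\% \quad\text{to}\quad \frac{5}{500} = 1.0\%
```

At Big Nation's volume of 500 or more loans per day, that rate translates into four or five erroneous lookups for every day's worth of loans.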

Our technology sets us apart

Accumatch boasts a tremendous inventory of proprietary intellectual property, including the industry's most advanced matching technology, a huge nationwide database of real property data, and a tax servicing system, all of which are second to none.