The Math Behind The FullContact Name API

Given name analysis is based on an estimate of the living population for each name. Name records were collected from the US Social Security Administration by year of birth, and actuarial period life table data was then used to estimate the number of living people with each given name.
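The estimate described above can be sketched as births per year weighted by the probability of surviving to the present. This is only an illustration: the function name, the table layout, and every number below are hypothetical placeholders, not SSA data or the actual FullContact implementation.

```python
# Hypothetical inputs for a single given name:
#   births_by_year   -> number of births recorded with this name, per birth year
#   survival_by_year -> probability that someone born that year is still alive
# All values are made-up placeholders, not real SSA or actuarial figures.

def estimate_living(births_by_year, survival_by_year):
    """Sum the births for each year, weighted by the chance of still being alive."""
    return sum(
        count * survival_by_year.get(year, 0.0)  # unknown years contribute nothing
        for year, count in births_by_year.items()
    )

births = {1950: 1000, 1980: 2000, 2010: 500}
survival = {1950: 0.5, 1980: 0.75, 2010: 1.0}

living = estimate_living(births, survival)
print(living)
```

In practice, life tables are usually published separately by sex, so a fuller version would likely run this calculation once per sex and add the results.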

Family name analysis was based on 2010 US Census data. Because of the methods used by the US SSA and the US Census, no personally identifiable information is available via this API.

The population of “living people by given name” does not exactly match the family name data (most Americans have both a first and a last name, so ideally the totals would agree), but the two are considered reasonably close, so no corrections were made to account for the difference.

Below is a high-level overview of the math behind the Name Stats, Name Parser, and Name Normalization endpoints.

Name Stats

Given the following query:

https://api.fullcontact.com/v2/name/stats.json?name=michael&apiKey=xxxx

The response will be:


{
  "status": 200,
  "name": {
    "value": "Michael",  
    "given": {
      "likelihood": 0.99,
      "count": 3906405,     
      "rank": 1, 
      "male": {
        "count": 3886055,     
        "likelihood": 0.995,
        "rank": 1,
        "frequencyRatio": "0.029339617",
        "age": {
          "densityCurve": {
            "mode": {
              "count": 83567,
              "modeAge": 
              [
                54
              ]
            },
            "meanAge": 45.4,
            "quartiles": {
              "q1": 24.8,
              "q2": 39.5, 
              "q3": 52
            }
          }
        }
      },
      "female": {
        "count": 20350,
        "likelihood": 0.005,
        "rank": 784,
        "frequencyRatio": "0.000154322",
        "age": {
          "densityCurve": {
            "mode": {
              "count": 583,
              "modeAge": 
              [
                25
              ]
            },
            "meanAge": 40.8,
            "quartiles": {
              "q1": 25.7,
              "q2": 35.2,
              "q3": 47.4
            }
          }
        }
      }
    },
    "family": {
      "likelihood": 0.01,
      "count": 39369,
      "rank": 798,
      "frequencyRatio": "0.0001626"
    }
  }
}
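The article does not spell out how the likelihood fields are computed, but the numbers in the sample response are consistent with simple count shares. The short check below assumes that reading; the formulas are an inference from the sample, not a documented definition.

```python
# Counts copied from the sample response for "Michael".
given_count = 3906405
male_count = 3886055
female_count = 20350
family_count = 39369

# Assumption: each likelihood is a plain share of the relevant counts.
given_likelihood = given_count / (given_count + family_count)
family_likelihood = family_count / (given_count + family_count)
male_likelihood = male_count / given_count
female_likelihood = female_count / given_count

print(round(given_likelihood, 2))   # 0.99
print(round(family_likelihood, 2))  # 0.01
print(round(male_likelihood, 3))    # 0.995
print(round(female_likelihood, 3))  # 0.005
```

The male and female counts also sum to the overall given name count (3886055 + 20350 = 3906405), which supports reading the per-sex likelihoods as shares of that total.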

Name Parser

The Name Parser API endpoint uses the data above to determine the likelihood of name order. It looks at the frequency ratios of two ambiguously ordered names in isolation and finds the most probable fit for the names.

Given the following query:

https://api.fullcontact.com/v2/name/parser.json?q=michael%20james&apiKey=xxxx

The response will be:


{
  "status": 200,
  "ambiguousName": "Michael James",
  "result": {
    "givenName": "Michael",
    "familyName": "James",
    "likelihood": 0.975
  }
}

The likelihood value is the frequency ratio of the name combination in comparison to the inverted case.
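One way to read "the frequency ratio of the name combination in comparison to the inverted case" is as a comparison of two products of frequency ratios. The sketch below assumes that interpretation. The function name is hypothetical and the frequency ratios are invented for illustration, so it will not reproduce the 0.975 in the response.

```python
def order_likelihood(first_as_given, second_as_family,
                     second_as_given, first_as_family):
    """Likelihood that 'first second' is 'given family' rather than the reverse.

    Each argument is a frequency ratio (share of the population) for one
    token in one role. This treats the two tokens as independent.
    """
    forward = first_as_given * second_as_family    # e.g. given=Michael, family=James
    inverted = second_as_given * first_as_family   # e.g. given=James, family=Michael
    total = forward + inverted
    if total == 0:
        return 0.5  # no data for either ordering: no preference
    return forward / total

# Invented frequency ratios, for illustration only.
likelihood = order_likelihood(
    first_as_given=0.03,     # "michael" as a given name
    second_as_family=0.002,  # "james" as a family name
    second_as_given=0.02,    # "james" as a given name
    first_as_family=0.0002,  # "michael" as a family name
)
print(likelihood)
```

Under this reading, the likelihood of one ordering and the likelihood of the inverted ordering always sum to 1.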


Name Normalization

The Name Normalization endpoint uses somewhat more subjective math. It assumes that the query is quasi-structured data rather than randomly organized text strings. Thus, by default, "Michael James" is likely given name Michael, family name James, and "James Michael" is likely given name James, family name Michael. The values are reversed only when the probability that the input order is wrong is very high. Nonetheless, the likelihood is calculated using the ambiguous name (Name Parser) endpoint as the starting point. Additional tokens beyond the given name and family name (prefix, suffix, nickname, etc.) reduce the confidence. A comma in the input, as in "James, Michael", actually increases the confidence.
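The adjustments described above might look roughly like the following. Everything here is an assumption made for illustration: the function name, the swap threshold, and the size of the penalty and bonus are hypothetical, since the article gives no actual values.

```python
def normalize_confidence(order_likelihood, extra_tokens=0, has_comma=False,
                         swap_threshold=0.05,  # hypothetical cut-off for reversing
                         token_penalty=0.05,   # hypothetical reduction per extra token
                         comma_bonus=0.05):    # hypothetical boost for a comma
    """Return (swapped, confidence) for a name taken in the order it was written.

    order_likelihood is the parser-style likelihood that the written order
    is the correct one.
    """
    # Trust the written order unless it is very likely to be wrong.
    swapped = order_likelihood < swap_threshold
    confidence = 1.0 - order_likelihood if swapped else order_likelihood

    # Prefixes, suffixes, nicknames and similar tokens reduce confidence.
    confidence -= token_penalty * extra_tokens

    # A comma ("James, Michael") signals deliberate structure, so boost it.
    if has_comma:
        confidence += comma_bonus

    # Keep the result inside the valid probability range.
    return swapped, max(0.0, min(1.0, confidence))

print(normalize_confidence(0.9))
print(normalize_confidence(0.02))
print(normalize_confidence(0.9, extra_tokens=2))
print(normalize_confidence(0.9, has_comma=True))
```

The design point this illustrates is that the written order wins by default. The parser likelihood only overrides it past a high bar, and the other signals only nudge the confidence up or down.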