Pocketburgers.com: Social Security Numbers

Tuesday, July 7, 2009

New algorithm guesses SSNs using date and place of birth

Two researchers have found that a pair of antifraud methods intended to increase the chances of detecting bogus social security numbers has actually allowed the statistical reconstruction of the number using information that many people place on social networking sites.

By John Timmer

For citizens of the US, the social security number (SSN) is the gateway to all things financial. It fills its government purpose of helping us pay our taxes and track our (in many cases, hypothetical) government benefits, and it has also been widely adopted as a means of verifying identity by a huge range of financial institutions. As a result, anytime you disclose an SSN you run a real risk of enabling identity theft. So far, most of the SSN-related ID theft problems have resulted from institutions that were careless with their record keeping, allowing SSNs to be harvested in bulk. But a pair of Carnegie Mellon researchers has now demonstrated a technique that uses publicly available information to reconstruct SSNs with a startling degree of accuracy.

The irony of their method is that it relies on two practices adopted by the federal government that were intended to reduce the ability of fraudsters to craft a bogus SSN. The first is that the government now maintains a publicly available database called a Death Master File, which indicates which SSNs were the property of individuals who are now deceased. This record provided the researchers with the raw material to perform a statistical analysis of how SSN assignments related to two other pieces of personal information: date and state of birth.

The second is that the government has centralized its handling of SSN assignments and provided documentation of the procedures. The first three digits are based on the state where the SSN was originally assigned, and the next two are what's termed a group number. The last four digits are ostensibly assigned at random. Since the late 1980s, the government has promoted an initiative termed "Enumeration at Birth" that seeks to ensure that SSNs are assigned shortly after birth, which should limit the circumstances under which individuals apply for them later in life (and hence, make fraudulent applications easier to detect).
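The three-part layout described above can be sketched in a few lines of Python. This is only an illustration: the names `SsnParts` and `split_ssn` are made up for this example, and the sample number is a well-known placeholder, not a real SSN.

```python
from typing import NamedTuple


class SsnParts(NamedTuple):
    """Hypothetical container for the three fields of an SSN."""
    area: str    # first three digits, tied to the state of assignment
    group: str   # next two digits, the "group number"
    serial: str  # last four digits, ostensibly assigned at random


def split_ssn(ssn: str) -> SsnParts:
    """Split a nine-digit SSN string (hyphens optional) into its fields."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected nine digits, optionally hyphenated")
    return SsnParts(digits[:3], digits[3:5], digits[5:])


# Placeholder number, used here only to show the field boundaries.
parts = split_ssn("123-45-6789")
```

Here `parts.area` holds the state-linked prefix, `parts.group` the group number, and `parts.serial` the final four digits.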

That last program proved to be the key feature that allowed the new research, as it ensured that SSN assignments were more tightly correlated to date of birth. The researchers used the Death Master File to split out data from individual states (which determine the first three digits), then ordered the records by date. At that point, they searched for statistical patterns within the resulting data.
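The split-then-order step can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the record layout (state, birth date, first five digits) is invented rather than taken from the Death Master File, and the toy values use the prefix 000, which is never issued.

```python
from collections import defaultdict
from datetime import date


def order_by_state_and_date(records):
    """Group (state, birth_date, first_five) tuples by state,
    then sort each state's rows chronologically."""
    by_state = defaultdict(list)
    for state, birth_date, first_five in records:
        by_state[state].append((birth_date, first_five))
    return {state: sorted(rows) for state, rows in by_state.items()}


# Made-up records; "000xx" prefixes are deliberately invalid.
toy_records = [
    ("VT", date(1996, 3, 2), "00002"),
    ("DE", date(1996, 1, 15), "00011"),
    ("VT", date(1996, 1, 9), "00001"),
]
ordered = order_by_state_and_date(toy_records)
```

With the rows for each state in date order, any sequential pattern in the leading digits becomes visible simply by reading down the list.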

Even in data from before the 1990s, rough patterns were apparent in the assignment of region and group numbers. By the mid-90s, it is obvious that, with a few exceptions, individual region and group numbers were used in a clear sequential order for most SSNs. The patterns are even easier to pick out in less populous states. Patterns in the final four digits were harder to detect, but the authors created an algorithm that predicted them with a lower degree of confidence.

The accuracy of these algorithms is positively disturbing. Using a separate pool of data from the Death Master File, the authors were able to get the first five digits right for seven percent of those with an SSN assigned before 1988; after that, the success rate goes up to a staggering 44 percent. For a smaller state, like Vermont, they could get it right over 90 percent of the time.

Getting the last four digits right was substantially harder. The authors used a standard of getting the whole SSN right within 10 tries, and could only manage that about 0.1 percent of the time even in the later period. Still, small states were somewhat easier—for Delaware in 1996, they had a five percent success rate.
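The "right within 10 tries" standard is a form of top-k accuracy. The sketch below shows how such a score could be computed, assuming each test record comes with a ranked list of guesses. The function name and the toy strings are hypothetical, and this is not the authors' evaluation code.

```python
def hit_rate_within_k(ranked_guesses, truths, k=10):
    """Fraction of records whose true value appears among the
    first k guesses in that record's ranked guess list."""
    if len(ranked_guesses) != len(truths):
        raise ValueError("need one guess list per true value")
    if not truths:
        return 0.0
    hits = sum(
        truth in guesses[:k]
        for guesses, truth in zip(ranked_guesses, truths)
    )
    return hits / len(truths)


# Toy labels standing in for full numbers; no real data involved.
toy_guesses = [["a", "b", "c"], ["x", "y", "z"]]
toy_truths = ["b", "q"]
rate = hit_rate_within_k(toy_guesses, toy_truths, k=2)
```

Raising `k` can only keep the rate the same or increase it, which is why the number of allowed tries has to be stated alongside any success figure.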

That might still seem moderately secure were it not for some realities of the modern online world. The authors point out that many credit card verification services, recognizing the challenges of data entry from illegible forms, may allow up to two digits of the SSN to be wrong, provided the date and place of birth are accurate. They often allow several failed verification attempts per IP address before blacklisting it. Given these numbers, the authors estimate that even a moderate-sized botnet of 10,000 machines could successfully obtain identity verifications for younger residents of West Virginia at a rate of 47 a minute.

All of that requires that the botnet master have access to date and place of birth information, and a number of commercial services will happily provide that data for a price. But the authors also point out that it may not be necessary to pay; they cite a publication in progress that indicates it's easy to harvest a lot of that information from social networking sites like Facebook.

Social Security Numbers Deduced From Public Data

By Hadley Leggett

For years, government officials have urged people to protect their Social Security numbers by giving out the nine-digit codes only when absolutely necessary. Now it turns out that all the caution in the world may not be enough: New research shows that Social Security numbers can be predicted from publicly available birth information with a surprising degree of accuracy.

By analyzing a public data set called the “Death Master File,” which contains SSNs and birth information for people who have died, computer scientists from Carnegie Mellon University discovered distinct patterns in how the numbers are assigned. In many cases, knowing the date and state of an individual’s birth was enough to predict a person’s SSN.

“We didn’t break any secret code or hack into an undisclosed data set,” said privacy expert Alessandro Acquisti, co-author of the study published Monday in the journal Proceedings of the National Academy of Sciences. “We used only publicly available information, and that’s why our result is of value. It shows that you can take personal information that’s not sensitive, like birth date, and combine it with other publicly available data to come up with something very sensitive and confidential.”

With just two attempts, the researchers correctly guessed the first five digits of SSNs for 60 percent of deceased Americans born between 1989 and 2003. With fewer than 1,000 attempts, they could identify the entire nine digits for 8.5 percent of the group.

There are only a few short steps between making a statistical prediction about a person’s SSN and verifying their actual number, Acquisti said. Through a process called “tumbling,” hackers can exploit instant online credit approval services — or even the Social Security Administration’s own verification database — to test multiple numbers until they find the right one. Although these services usually block users after several failed attempts, criminals can use networks of compromised computers called botnets to scan thousands of numbers at a time.

“A botnet can be programmed to try variations of a Social Security number to apply for an instant credit card,” Acquisti said. “In 60 seconds, these services tell you whether you are approved or not, so they can be abused to tell whether you’ve hit the right social security number.”

To keep identity thieves from exploiting their research, the scientists left a few key details about their method out of the paper, and they released the document to government agencies before making it public.

After developing an algorithm using the Death Master File, the researchers tested their results using information on birthday and hometown taken from a social networking site (the researchers declined to say which one). Again, they were able to predict Social Security numbers with a high degree of accuracy.

“It worked a little worse in the online social test for obvious reasons,” Acquisti said. “Some people may not reveal the right date of birth, or they may call hometown where they went to high school, not where they were born. There’s more noise in online social networking, but nevertheless the two studies confirmed each other.”

It also turns out that some SSNs are easier to predict than others. Because of the way numbers are assigned, younger people and those born in less populated states are more at risk, Acquisti said. Before 1988, many people didn’t apply for an SSN until they left for college or got their first job. But thanks to an anti-fraud effort in 1988 called the “Enumeration at Birth” initiative, parents started applying for their child’s number at birth, making it much easier to predict based on a person’s birthday.

The new findings remind consumers that they should use caution when sharing data online, even when the information itself doesn’t seem particularly sensitive. But Acquisti said his real message is for policymakers.

“We really wanted to come public with this result because the issue goes way beyond individual response,” he said. “It’s not just about remembering to shred your documents or to remove personal identification off your mail. As much as you try to protect your personal info, the info is already out there.”

According to information privacy experts, Social Security numbers were never meant to be used for authentication purposes, and using them as passwords puts all consumers at risk for identity theft.

“I have long argued that Congress or the Federal Trade Commission should prohibit companies from using SSNs as a means to verify identity,” Daniel J. Solove, professor of law at George Washington University Law School, wrote in an e-mail. “Merely protecting against their disclosure is insufficient since Acquisti and Gross demonstrate that they can readily be predicted.”

As a first step, the researchers suggest that the Social Security Administration start randomizing the assignment of SSNs. But randomization is only a Band-Aid, Acquisti said.

“It can buy us more time, but it isn’t going to change the underlying problem,” he said. “These numbers are supposed to be secret, but your bank has it, your insurance company has it, even your doctor has it. As long as we rely on numbers that are used as both identifiers and authenticators, then we are a system that remains insecure.”

Privacy law expert Chris Hoofnagle of the University of California, Berkeley, says the response must be drastic. “Their paper points to a radical solution: Perhaps we should stop trying to protect the secrecy of the SSN, and just publish all of them to prevent their use as passwords.”
