Big Data Captcha 22

Advances in the investigation of the physical universe we live in.
Doc
Posts: 12562
Joined: Sat Nov 24, 2012 6:10 pm

Big Data Captcha 22

Post by Doc »

http://finance.yahoo.com/blogs/the-exch ... 44085.html
Big Data Could Create an Era of Big Discrimination
By Aaron Pressman | The Exchange – Fri, Oct 11, 2013 3:14 PM EDT

Kate Crawford is a Principal Researcher at Microsoft Research, a Visiting Professor at the MIT Center for Civic Media and a Senior Fellow at the Information Law Institute at NYU.

Personal data harvested by marketers is growing so vast and far reaching that it is threatening to unleash a new wave of digital discrimination, one that ordinary people won't even be able to see happening, Microsoft principal researcher Kate Crawford is warning.
Combining the troves of information collected by retailers, mobile carriers, Internet companies and others into massive databases creates so-called big data sets. Computers then troll the data looking for patterns that can be used to make predictions about consumer habits.

“Some people think that big data is really quite fantastic because you're working at a mass level and therefore you can't actually conduct group-based discrimination,” Crawford said, speaking at the EmTech conference at MIT this week. “It's actually quite the opposite. Big data is not color blind, it's not gender blind and, in fact, marketers are using big data to have ever-more precise categories about you.”

A recent study at Cambridge University looking at almost 60,000 people’s Facebook “likes” was able to predict with high degrees of accuracy their gender, race, sexual orientation and even a tendency to drink excessively. The model could tell a gay man from a straight man correctly 88% of the time and predict race with 95% accuracy, for example. Government agencies, employers or landlords could easily obtain such data, Crawford warns.

A lender, for example, who didn't want borrowers of a certain race could show online offers only to people whose social network activity fit certain parameters. Banks must report detailed statistics about their actual lending activity to regulators, but web advertising parameters are seemingly free of discrimination. By never putting offers in front of unwanted groups, and thus never formally rejecting them, those who engage in online discrimination could sidestep fair lending and redlining laws that apply in the physical world.

Most concern about data collection has focused on the government, particularly after the revelations from former National Security Agency contractor Edward Snowden. Crawford welcomed the increased skepticism following the Snowden leaks but warns there is much potential harm from commercial misuse of data, as well.

“It's not that big data is effectively discriminating -- it is, we know that it is,” says Crawford. “It's that you will never actually know what those discriminations are.”

Another problem can arise when collected data isn’t representative of the entire population. For example, well-off people are more likely to carry smartphones than the poor. Two years ago, the City of Boston released an app called Street Bump that automatically sends reports about potholes using data from smartphone sensors. But the city had to be mindful that reports were more likely to come from areas with higher phone ownership rates.

Big data predictions and pigeon-holing can also be harmful when wrong. A decade ago, some TiVo users spent weeks trying to convince their machines to stop recording shows aimed at demographic groups they weren't in. "If TiVo Thinks You Are Gay, Here's How to Set It Straight," read one Wall Street Journal headline from 2002. Mistaken algorithms today could scare off employers, college admissions officers or others screening candidates via big data. "If I predict something about you and I'm right, that can be just as dangerous as if I predict something about you and I am wrong," Crawford says.

Crawford also wants to temper the excitement around studying real-time Twitter activity to guide rescue efforts during natural disasters. A review of activity on the social network during Hurricane Sandy last year, for example, found that the peaks of activity occurred not in places with the most damage or need for help, like the outskirts of Queens and Staten Island, but in areas where Twitter use was most prevalent, like Manhattan.

Databases are now combining a vast array of different sources – everything from the output of mobile apps and Web searches to radio tags on items bought at a store and phone-location trackers.

Even data scrubbed to remove personal references can be reconnected to individuals. Cellphone carriers are selling collections of data about phone movements, for instance, with all personal details removed. But a group of researchers from MIT, the Université catholique de Louvain in Belgium and other institutions looked at one such collection and were able to pinpoint 95% of the unique users by analyzing just four GPS time and location stamps per person.
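To make the re-identification idea concrete, here is a minimal Python sketch of a "unicity" check: pick a few time-and-location points from one person's trace and see whether any other trace also contains all of them. The function names, the trace format and the toy data are assumptions for illustration only; this is not the researchers' actual method or dataset.

```python
import random

def is_unique(traces, user, k, rng):
    """True if k randomly chosen points from `user`'s trace match no other trace."""
    points = rng.sample(sorted(traces[user]), k)
    matches = [u for u, trace in traces.items()
               if all(p in trace for p in points)]
    return matches == [user]

def unicity(traces, k, seed=0):
    """Fraction of users singled out by k of their own points."""
    rng = random.Random(seed)
    unique = sum(is_unique(traces, u, k, rng) for u in traces)
    return unique / len(traces)

# Hypothetical toy data: each point is (hour_of_day, cell_id).
toy_traces = {
    "a": {(8, "c1"), (9, "c2"), (12, "c3"), (18, "c1")},
    "b": {(8, "c1"), (9, "c2"), (12, "c4"), (18, "c1")},
    "c": {(7, "c5"), (9, "c6"), (13, "c3"), (19, "c5")},
}

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, unicity(toy_traces, k))
```

Even though no names appear in the toy data, a handful of points is enough to separate one trace from the others, which is the intuition behind the result quoted above.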

Several years ago, researchers at Carnegie Mellon University were able to create a system to uncover Social Security numbers from birthday and hometown information listed on social networking sites like Facebook.

All the studies point to a need for additional protections and awareness, Crawford says. “We can't afford to set up a system with no opt out and no protections for its citizens,” she says. “Frankly, it doesn't take a science-fiction scenario to realize what is at stake.”
"I fancied myself as some kind of god....It is a sort of disease when you consider yourself some kind of god, the creator of everything, but I feel comfortable about it now since I began to live it out.” -- George Soros
Typhoon
Posts: 27242
Joined: Mon Dec 12, 2011 6:42 pm
Location: 関西

Re: Big Data Captcha 22

Post by Typhoon »

The Prisoner's problem now seems quaint by comparison.

[Embedded video, ID zalndXdxriI]

as do the data acquisition methods.
May the gods preserve and defend me from self-righteous altruists; I can defend myself from my enemies and my friends.
noddy
Posts: 11318
Joined: Tue Dec 13, 2011 3:09 pm

Re: Big Data Captcha 22

Post by noddy »

“We can't afford to set up a system with no opt out and no protections for its citizens,” she says
ahahhahahahahaohohohohohohohahahahaha

sorry, i see that as about as realistic an option as opting out of any of the modern madness. they couldn't afford to allow it; the takeup rate would shock the twoo beliebers.
ultracrepidarian
Doc
Posts: 12562
Joined: Sat Nov 24, 2012 6:10 pm

Re: Big Data Captcha 22

Post by Doc »

"we know who you are and what you like"

Quaint indeed
http://www.wirelessdesignmag.com/news/2 ... rack-users

Websites Using Device Fingerprinting to Secretly Track Users
Fri, 10/11/2013 - 9:22am
KU Leuven

A new study by KU Leuven-iMinds researchers has uncovered that 145 of the Internet’s 10,000 top websites track users without their knowledge or consent. The websites use hidden scripts to extract a device fingerprint from users’ browsers. Device fingerprinting circumvents legal restrictions imposed on the use of cookies and ignores the Do Not Track HTTP header. The findings suggest that secret tracking is more widespread than previously thought.

Device fingerprinting, also known as browser fingerprinting, is the practice of collecting properties of PCs, smartphones and tablets to identify and track users. These properties include the screen size, the versions of installed software and plugins, and the list of installed fonts. A 2010 study by the Electronic Frontier Foundation (EFF) showed that, for the vast majority of browsers, the combination of these properties is unique, and thus functions as a ‘fingerprint’ that can be used to track users without relying on cookies. Device fingerprinting targets either Flash, the ubiquitous browser plugin for playing animations, videos and sound files, or JavaScript, a common programming language for web applications.
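As a rough illustration of how a combination of properties can act as an identifier, the Python sketch below hashes a set of device properties into a single value. The property names, the sample values and the choice of SHA-256 are assumptions made for this example; real fingerprinting scripts run in the browser and may combine properties quite differently.

```python
import hashlib
import json

def device_fingerprint(properties):
    """Hash a dict of device properties into one stable hex identifier."""
    # Sort list values (fonts, plugins) so their order does not change the hash.
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in properties.items()
    }
    # Canonical JSON: sorted keys and fixed separators give a repeatable string.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical property set, for illustration only.
example = {
    "screen": "1920x1080",
    "plugins": ["Flash 11.8", "QuickTime 7.7"],
    "fonts": ["Arial", "Calibri", "Verdana"],
}

if __name__ == "__main__":
    print(device_fingerprint(example))
```

No cookie is stored anywhere: the same device produces the same value on every visit, and changing a single property (one extra font, say) produces a different value.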

This is the first comprehensive effort to measure the prevalence of device fingerprinting on the Internet. The team of KU Leuven-iMinds researchers analysed the Internet’s top 10,000 websites and discovered that 145 of them (almost 1.5%) use Flash-based fingerprinting. Some Flash objects included questionable techniques such as revealing a user's original IP address when visiting a website through a third party (a so-called proxy).

The study also found that 404 of the top 1 million sites use JavaScript-based fingerprinting, which allows sites to track non-Flash mobile phones and devices. The fingerprinting scripts were found to be probing a long list of fonts – sometimes up to 500 – by measuring the width and the height of secretly-printed strings on the page.

Do Not Track

The researchers identified a total of 16 new providers of device fingerprinting, only one of which had been identified in prior research. In another surprising finding, the researchers found that users are tracked by these device fingerprinting technologies even if they explicitly request not to be tracked by enabling the Do Not Track (DNT) HTTP header.
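For context, Do Not Track is just a request header (`DNT: 1`) that the browser sends; nothing enforces it. The Python sketch below shows what a cooperating server-side check could look like. The function name and the plain-dict header format are assumptions for illustration, not part of the study or of any particular web framework.

```python
def wants_no_tracking(headers):
    """Return True if the request headers carry a Do Not Track opt-out (DNT: 1)."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"

if __name__ == "__main__":
    print(wants_no_tracking({"User-Agent": "ExampleBrowser", "DNT": "1"}))
    print(wants_no_tracking({"User-Agent": "ExampleBrowser"}))
```

A site that never performs a check like this, or that ignores the result, tracks the user regardless of the setting, which is the behaviour the researchers report.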

The researchers also evaluated Tor Browser and Firegloves, two privacy-enhancing tools offering fingerprinting resistance. New vulnerabilities – some of which give access to users’ identity – were identified.

Device fingerprinting can be used for various security-related tasks, including fraud detection, protection against account hijacking and anti-bot and anti-scraping services. But it is also being used for analytics and marketing purposes via fingerprinting scripts hidden in advertising banners and web widgets.

To detect websites using device fingerprinting technologies, the researchers developed a tool called FPDetective. The tool crawls and analyses websites for suspicious scripts. This tool will be freely available at http://homes.esat.kuleuven.be/~gacar/fpdetective/ for other researchers to use and build upon.

The findings will be presented at the 20th ACM Conference on Computer and Communications Security this November in Berlin.

For more information visit http://www.kuleuven.be
"I fancied myself as some kind of god....It is a sort of disease when you consider yourself some kind of god, the creator of everything, but I feel comfortable about it now since I began to live it out.” -- George Soros