On Message with Ben Gross

SSL Wildcard and Multi-Domain Certificates

For better and, more recently, often for worse, SSL certificates provide the security for data in transit for many modern protocols, most commonly the web. Every time you see a URL that starts with https rather than http, you are using SSL. Typically, SSL certificates are designated for only a single host such as www.example.com or shoppingcart.example.com. However, most modern businesses have many hosts running services that they wish to secure with SSL, and these hosts and services may have different security requirements and different public visibility. Certificate authorities offer multiple types of SSL certificates to match different requirements. In particular, there are several kinds of certificates that can secure multiple hosts with a single certificate. These include types known as multi-domain certificates, Subject Alternative Name (SAN) certificates, Unified Communications Certificates (UCC), and wildcard certificates. One of the most common situations is that example.com is considered a separate host from www.example.com, so most certificate authorities will provide you with a certificate that works for both www.example.com and example.com for the price of a single certificate.

The second name on the SSL certificate, such as example.com, is an important technicality that unfortunately needs some additional explanation. It is accomplished with the Subject Alternative Name or subjectAltName (SAN) extension that appeared in version 3 of the X.509 PKI specification, which defines much of what we think of as SSL. For the purposes of this article, think of the SAN as a field on the SSL certificate. The alternative names may be other hosts, other domains, or a hostname wildcard. All modern browsers support SAN fields; however, Symbian OS 9.1 and earlier and Palm Treo devices do not. In practice, supporting multiple domains on a single certificate is difficult, although it offers a great advantage for web servers that run virtual hosts, since a single server may support multiple secure domains using a single IP address. The Server Name Indication (SNI) extension is one solution to this problem; however, support is not universal, as it will not work with older operating systems such as Windows XP.
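To make the SAN field a little more concrete, here is a minimal Python sketch. It assumes the dictionary layout that the standard library's ssl.SSLSocket.getpeercert() returns. The helper names dns_names and fetch_certificate and the sample certificate are my own illustrations, not part of any certificate authority's tooling.

```python
import socket
import ssl


def dns_names(cert):
    """Return the DNS entries from the subjectAltName field of a certificate.

    `cert` is a dictionary in the layout returned by
    ssl.SSLSocket.getpeercert().
    """
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_certificate(hostname, port=443):
    """Connect to a live server and return its certificate dictionary.

    Passing server_hostname makes the client send the name via SNI, so a
    server hosting several secure sites on one IP address can pick the
    matching certificate.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


# Hypothetical certificate that covers both names discussed in the text.
sample_cert = {
    "subject": ((("commonName", "www.example.com"),),),
    "subjectAltName": (("DNS", "www.example.com"), ("DNS", "example.com")),
}
print(dns_names(sample_cert))
```

Calling dns_names(fetch_certificate("www.example.com")) against a real server would list the names on its certificate, although that part needs network access and a client new enough to speak SNI.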

Unified Communications Certificates (UCC) are SAN certificates that are typically intended for use on Microsoft Exchange Server 2007 or 2010 installations that need multiple hostnames such as mail.example.com, owa.example.com, smtp.example.com, and autodiscover.example.com. In addition, UCC certificates can include NetBIOS names for configurations with older clients. UCC certificates can also be used with Microsoft Lync installations.

Another type of multi-domain certificate is known as a wildcard certificate, which allows a single certificate to secure an arbitrary number of hosts in a single subdomain such as *.example.com that could include www.example.com, mail.example.com, calendar.example.com, portal.example.com, and so forth. In addition, some certificate authorities will provide the option to include hosts via Subject Alternative Names on the certificate that can also secure example.com as well as other hosts that may need to be used with older mobile devices such as those running Windows Mobile 5. Some certificate authorities may restrict the number of different servers that are allowable for use with a single wildcard certificate via a license agreement even if it is technically possible to use the certificate on an unlimited number of machines. If this aspect is a consideration, you will want to check your license agreement carefully.
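The reason example.com needs its own entry alongside *.example.com is that the wildcard stands in for exactly one label. The following is a simplified Python sketch of that matching rule. The function names matches and covered are hypothetical, and real clients apply further checks (partial wildcards, public suffixes) that I have left out.

```python
def matches(pattern, hostname):
    """Return True if a certificate name covers the hostname.

    Simplified rule: a leading "*" stands in for exactly one label, so
    "*.example.com" covers "www.example.com" but neither "example.com"
    nor "a.b.example.com".
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def covered(hostname, names):
    """Return True if any name on the certificate covers the hostname."""
    return any(matches(name, hostname) for name in names)


# A wildcard certificate with the bare domain added as a SAN entry.
names = ["*.example.com", "example.com"]
for host in ("www.example.com", "mail.example.com", "example.com",
             "a.b.example.com", "www.example.org"):
    print(host, covered(host, names))
```

Without the second entry in names, the bare example.com would not be covered, which is why certificate authorities commonly add it as a Subject Alternative Name.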

Most certificate authorities offer a wildcard certificate product. Not all certificate types are available as wildcards. For example, one of the certificate authority industry associations, the CA/Browser Forum, says that the domain name field “must contain one or more host domain name(s) owned or controlled by the Subject and to be associated with Subject’s publicly accessible server. Such server may be owned and operated by the Subject or another entity (e.g., a hosting service). Wildcard certificates are not allowed for EV SSL Certificates.”

Pros and Cons of Wildcard Certificates

I commonly read arguments against wildcard certificates that I think are a bit specious. For example, I regularly hear that wildcard certificates are more expensive. This is true in the sense that they cost more than a single certificate, but if you have many hosts you wish to secure, such as many development server machines, the cost can be quite competitive. I recently purchased a RapidSSL wildcard certificate for $150, while the least expensive UCC certificate I found was about $300.

I also frequently hear the argument that wildcard certificates are less secure. The argument follows that if one machine has a compromised SSL private key then all machines with that certificate would also be compromised. This is true, but mostly a red herring. If a typical web server is compromised so badly that someone can extract the private keys for the SSL certificates, then you likely have far greater problems and should probably reissue your certificates in any case. The extra work is minimal compared to the overall remediation problem. Some vendors such as DigiCert will issue unique variants of a single wildcard SSL certificate to reduce the damage from any one key leaking. That said, wildcard certificates are frequently used in a manner that is insecure, and any certificate that is used on more than one machine should be treated with additional caution.

Wildcard certificates are best used when it is desirable to secure a large number of independent services and the cost of purchasing certificates would be prohibitive otherwise. As I mentioned earlier, support for SNI is still not as widespread as one would hope, so you will likely need one IP address or port per server unless you are certain that your user base is modern enough to support SNI. One additional note of caution: wildcard certificates can be finicky with Microsoft Exchange, especially versions older than 2010, and it is not currently possible to use a wildcard certificate with Microsoft Lync.

I have written about SSL certificates a number of other times, including: Purchasing SSL Certificates, No Frills SSL Certificates Are Inexpensive and Useful, Smartphone Anti-Phishing Protection Leaves Much to Be Desired, and SSL Is Critical Infrastructure at Risk.

Checklist for Evaluating Enterprise Backup Products

I recently evaluated a number of enterprise backup products and services. Once I started the process, it quickly became obvious I needed a systematic checklist of items in order to compare the different offerings. Given that World Backup Day is March 31, the topic is timely. The requirements for the project may not match your own, but I hope that it will be a good starting point. Please let me know if you have questions of your own that you would add.

Requirements:

  • Cross platform (Windows, Mac OS X, Linux)
  • Supports both server and workstation clients
  • Supports backup of individual directories (as opposed to just whole disk)
  • Supports backup of open files (aside from Exchange, SQL Server, and QuickBooks)
  • Supports encryption of data in transit
  • Supports encryption of data stored on remote server
  • Supports Exchange 2007 and Exchange 2010
  • Supports brick or granular restore for Exchange

Business questions:

  • What are the licensing costs for client or server software?
  • What is the cost for maintenance on licenses?
  • What is the cost per machine and/or per gigabyte per month for hosted server storage?
  • What Service Level Agreement (SLA) is offered?
  • Where are the data centers located?
  • How many times are the backups replicated in how many different physical locations?
  • What certifications are the service and data center in compliance with?

Reseller questions:

  • Do they offer reseller accounts? What are the requirements?
  • Is it possible to private label / white label, or co-brand the service?
  • Who is responsible for support for the clients of resellers?
  • Who is responsible for billing for the clients of resellers?

Supported platforms, file systems, and network protocols:

  • What platforms and file systems are natively supported for both workstation and server clients?
  • What native agents are available on each platform?
  • What platforms are supported non-natively via file system or network drive copies?
  • Do users on workstation clients need to be logged in for the backup to run?

Backup management:

  • Does each machine require a separate account or can multiple machines be backed up under the same account?
  • Is there support for directory services integration (e.g., LDAP or Active Directory)?
  • What functionality for reporting, monitoring, and alerting is available?

Backup details:

  • Is there support for continuous backups in addition to timed intervals or scheduled backups?
  • Is there support for backing up individual files or directories instead of a whole disk?
  • Is there support for backing up to network drives?
  • Is there support for backing up to locally attached drives?
  • Is there support for backing up open files (aside from Exchange, SQL Server, and QuickBooks)?
  • Is Volume Snapshot Service (VSS) / Shadow Copy used to back up open files?
  • Is data encrypted in transit?
  • Is data encrypted on the remote server?
  • Is there support for private encryption keys?
  • Is there support for bandwidth limiting or traffic shaping?
  • Is there a way to seed the initial backup by shipping in a hard disk?

Application specific backup details:

  • Is there support for backing up Microsoft Exchange? How is this accomplished?
  • Is there support for granular / brick level / message level recovery for Microsoft Exchange?
  • Is there support for archiving Microsoft Exchange?
  • Is there support for backing up Microsoft SQL Server?
  • Is there support for backing up Microsoft SharePoint Server?
  • Is there support for backing up Intuit QuickBooks?
  • Is there support for backing up VMware virtual machines?

Restoring data:

  • Is there support for accessing backups via a web interface?
  • Is there support for end users to restore their own data?
  • Is there support for performing bare metal restores?
  • Is there support for shipping a hard disk of restored data for disaster recovery?
  • What other restoration options are available?

Archiving data:

  • What are the options for granular policies for archiving and retention?
  • Is there support for storing versions of a file?
  • How long can an archived version of a file that has been deleted be recovered?

Practical Data Cleaning for Mailing Lists

Data cleaning is often one of the more frustrating aspects of a data-intensive project. If you have ever had to import a dataset of any significant size, you will know what I mean. The first project where I needed to clean data from a CSV dump containing tens of millions of rows was a particularly good learning experience. After innumerable parsing errors trying to import the file, I realized that the data contained all the common delimiters (commas, tabs, colons, pipes, spaces, etc.). I was inexperienced enough that I had not known to use custom delimiters, and so all my tools were failing to import the data. It was also here that I learned about the limit of 65,535 records in many applications, which is the largest value an unsigned 16-bit integer can hold. I also learned about the 3.5-gigabyte practical limit of addressable memory on most 32-bit machines, which limited the amount of data I could load at one time.
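Both lessons, custom delimiters and not loading everything at once, fit in a few lines of Python. This is a minimal sketch rather than the tooling I used at the time. It assumes the export was written with a delimiter that never occurs in the data (here the ASCII unit separator), and the function name read_rows and the sample data are hypothetical.

```python
import csv
import io


def read_rows(handle, delimiter, expected_fields):
    """Stream rows from an open file one at a time.

    Yields (row_number, ok, row) so the caller can set aside rows whose
    field count is wrong instead of aborting the whole import.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        yield row_number, len(row) == expected_fields, row


# Sample data whose fields contain commas, pipes, and colons, separated by
# the ASCII unit separator (0x1F) so none of them are mistaken for delimiters.
sample = (
    "Doe, Jane\x1fjane|doe@example.com\x1fNew York: NY\n"
    "broken row without delimiter\n"
)

for row_number, ok, row in read_rows(io.StringIO(sample), "\x1f", 3):
    print(row_number, "ok" if ok else "REJECT", row)
```

Because read_rows is a generator, only one row is held in memory at a time, which sidesteps the addressable memory limit as long as each row is processed and discarded as it arrives.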

I have come to appreciate the need for data cleaning and de-duplication tools that have user interfaces which are accessible to non-programmers, but are also capable of handling larger datasets without crashing or making a modern machine grind to a halt. Microsoft Excel is the standard multi-tool for corporations. It is not ideal for many tasks, but it is more than capable of getting the job done. Fields such as email addresses are difficult to validate in Excel in more than a cursory way. The problem is compounded by the fact that many web forms do not adequately validate input, which can leave datasets with substantial numbers of user input errors. I wrote about this issue in Validating Email Address in Web Forms—The Hazards of Complexity.

It is possible to validate large numbers of addresses by attempting to connect to the mail server for each address, but this is inefficient since it means people end up testing a great many addresses that were simply errors in data collection or improper exporting, etc. It would be far more efficient to analyze the data and remove or fix the clearly invalid addresses before submitting them for testing.
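As a rough illustration of that pre-filtering step, here is a small Python sketch. The pattern is deliberately loose and is my own assumption about what counts as "clearly invalid". It is a cheap sanity check, not a full address validator, and the function name triage is hypothetical.

```python
import re

# Deliberately loose pattern: exactly one "@", no whitespace, and at least
# one dot in the domain with no empty labels.
PLAUSIBLE = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")


def triage(addresses):
    """Split addresses into (plausible, rejected).

    Plausible addresses are trimmed, lowercased, and de-duplicated.
    Lowercasing the local part is an assumption that suits most mailing
    lists. Rejected addresses are returned as they were entered so they
    can be reviewed or fixed by hand.
    """
    plausible, rejected, seen = [], [], set()
    for raw in addresses:
        candidate = raw.strip().lower()
        if not PLAUSIBLE.match(candidate):
            rejected.append(raw)
        elif candidate not in seen:
            seen.add(candidate)
            plausible.append(candidate)
    return plausible, rejected


sample = [
    "jane@example.com",
    " Jane@Example.com ",      # duplicate after normalization
    "bob@example",             # no top-level domain
    "two@@example.com",        # doubled @ from a data entry error
    "carol@mail.example.org",
]
plausible, rejected = triage(sample)
print(plausible)
print(rejected)
```

Only the plausible list would then be worth the cost of a connection to the mail server for each address.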

Excel has some built-in data cleaning functions, but they are limited. Recent versions of Excel on Windows are capable of handling more than 65,535 rows. ASAP Utilities for Excel on Windows is not dedicated to data cleaning, but it has a number of cleaning features as well as many other handy functions to search, replace, normalize, sort, and add and remove formatting. The tool is quite fast, which I appreciate. ASAP Utilities is free for non-commercial use and $49 for commercial use. I recommend it.

WinPure Clean & Match, WinPure ListCleaner Pro, and WinPure ListCleaner Lite are straightforward data cleaning and normalization tools that are primarily targeted at large mailing lists. The tools include features that will be compelling for large list owners, such as statistics on missing fields and automated email address validation. The software ranges from $225 for the version aimed at smaller list cleaning operations (under 100,000 records) to $1199 for up to 500,000 records in one go. These tools are particularly attractive if you would like to allow non-programmers to clean larger mailing list datasets.

R10Clean offers a good number of tools to perform standard data cleaning. While it does not have any facilities specifically for email lists, the application does make it straightforward to find duplicate rows and empty strings and to remove extraneous text. The main web page is sparse, which makes it difficult to get a good idea of the feature set. The R10Clean Manual (PDF) offers a better overview. The application is cross-platform and works on Windows, Mac OS X, and Linux. It costs about 50 pounds (around $80).

XTabulator for Mac OS X is a nice and simple solution for manipulating tabular data. The feature set is fairly basic, but it makes it possible to quickly modify column-based data, and it is useful for a first pass at cleaning up mailing lists. For example, the application makes it simple to rapidly move columns around and perform basic autofill. Exporting to a new output format is also simple. In addition to the standard delimited formats, the application will export to XML, an HTML table, or a SQLite 3 database. XTabulator is $20.

Password Managers Relieve Password Headaches

Passwords Are a Hassle

I’ll be the first to admit I can’t remember all my passwords. Most of us can’t, so we pick a few passwords that are easy to remember and then use them with multiple sites. This results in two immediate problems. First, passwords that are easy to remember are typically also easy to guess. Second, a compromised password is a risk to every site where it has been reused. A password manager helps alleviate both of these problems since it can generate a secure and unique password for each site, but only requires that you remember a single password to unlock the database. While it is possible to create passwords that are secure and memorable, it is more difficult to do this with the significant number of passwords we frequently use in modern life. I detailed some additional problems with passwords in the previous articles Your NYE Resolution—Pick Better Passwords and Data Evaporation and the Security of Recycled Accounts. I find that a password manager with solid browser integration is well worth the initial setup time and expense.
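The "secure and unique password for each site" part is simple to sketch in Python with the standard library's secrets module. This is only an illustration of the idea, not how any particular password manager works. The alphabet, the default length, and the site names are my own assumptions.

```python
import secrets
import string

# Assumed character set; many sites restrict which symbols they accept.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_"


def generate_password(length=20):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One independent password per site (hypothetical site names), so that a
# breach at one site reveals nothing about the others.
vault = {site: generate_password()
         for site in ("bank.example", "mail.example", "shop.example")}
for site, password in vault.items():
    print(site, password)
```

A real password manager would also encrypt the vault under the single password you do have to remember, which is the part I have left out here.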

While there are many good options, my password manager of choice is 1Password from AgileBits, which is available for Mac OS X, Windows, and the iPhone, iPad, and iPod Touch. I consider it an indispensable tool and I use it daily both on my desktop and my phone. 1Password integrates with many popular browsers, which makes logging into web sites faster and more convenient. The application allows me to easily switch between multiple browsers and multiple devices without worrying about which browser I might have saved a particular password in.

When I first looked at 1Password in 2006, I thought there was no way I would be willing to pay for it since all modern browsers ship with password management functionality. Shortly after I started testing the application I found it so convenient that I changed my mind and purchased it. Nearly six years and many major upgrades later, I have no regrets. I have nearly eight hundred logins saved in 1Password. Even though I regularly clean out duplicates and entries for dead services, this is still a ridiculous number of accounts. Look at it this way: I test services so you don’t have to.

We All Forget Passwords

A 2007 paper, A Large-Scale Study Of Web Password Habits, which covered more than half a million users, found that about 1.5% of all Yahoo! users forgot their password each month. Yahoo Mail alone has more than 200 million accounts, so this is a significant number. The authors found that the “average user has 6.5 passwords, each of which is shared across 3.9 different sites. Each user has about 25 accounts that require passwords, and types an average of 8 passwords per day.”

Complicated Passwords and Compact Keyboards Don’t Mix

The current crop of smartphones ship with highly capable browsers, but entering lengthy passwords on a phone keyboard is even more error prone and frustrating than on the desktop. Here again, a password manager can reduce the complexities of entering many different password strings on a mobile device. The application allows you to set a password that is optimized for a mobile keyboard, and possibly simplified, to protect your longer, more complex passwords and notes. This is of course a security tradeoff.

Mobile Safari on the iPhone and iPad does not permit plugins, so the 1Password application on iOS devices embeds a browser that is able to offer the automatic login feature. I prefer the default browser, but unfortunately there is no option for direct integration. The 1Password bookmarklet makes it relatively quick to look up an entry in the database and then copy and paste long passwords, which is far easier than trying to type them in by hand.

Other Advantages of 1Password

I regularly use multiple browsers. I also frequently delete my cookies and browser settings when I test services. This would typically cause a nightmare of needing to re-authenticate to each web site where I deleted the cookies. Since all of my login information is stored in 1Password rather than the browser, I don’t have to care about which browser I am currently using or even if my cookies still exist.

Since 1Password is also a general form filler, it can cope with login forms that have partial entries or multiple stages. For example, many services require that users re-enter their password to access account management features even if they are already logged in. This is to prevent another person from simply walking up to your unattended computer and viewing or making changes to billing information, email forwarding, and passwords. In most cases, 1Password is able to treat the re-authentication sign in forms exactly like a standard sign in form.

Some sign in forms are multi-stage, where the login process is split across several forms. For example, many online banks use multi-stage sign in forms. In the first stage, the user enters a username and their browser must acquire a cookie from the bank. If the user does not already have a cookie from a previous session, the user must enter a second authentication factor, such as responding to a text message with a unique code or entering the code from a hardware token. Next, on a second form on a separate page, the user enters a password.

In cases where 1Password is confused by multiple stage forms, the workaround is to simply make two separately named entries in 1Password. For example, the first entry would contain the username and the second entry would contain the password. The user must go through the full sign in process the first time to receive a cookie from the bank by completing the two-factor authentication process, and has to create a 1Password entry for each step in the form. Each subsequent login to the bank will be treated like all other sites and can be automated with the auto-login and auto-submit features.

Here is a small laundry list of other features I regularly use and appreciate about 1Password.

  • General form saving support. 1Password can save and replay many kinds of web forms, which is a useful feature if you find yourself filling out the same information over and over again.
  • Support for “identities” where the application stores commonly used bits of information such as name, email, phone number and can populate this information into many types of forms with little effort.
  • Basic anti-phishing protection since by default 1Password will only post usernames, passwords, and other forms back to the same domain name as the original.
  • The application can generate random passwords with several different templates that will satisfy most password requirements.
  • In addition to usernames, passwords, forms and identities, 1Password also supports encrypted notes.
  • The Mac OS X desktop application will sync over the local wired network and WiFi for iOS devices.
  • 1Password will sync with Dropbox for all desktop and mobile applications including Windows and Android.
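The anti-phishing item in the list above comes down to comparing the host a login was saved for with the host of the page asking for it. Here is a minimal Python sketch of that idea. It is my own simplification and not 1Password's actual logic, and the function names host_of and may_fill are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, or "" if there is none."""
    return (urlsplit(url).hostname or "").lower()


def may_fill(saved_url, current_url):
    """Allow filling only when the page's host matches the saved host exactly."""
    saved, current = host_of(saved_url), host_of(current_url)
    return bool(saved) and saved == current


saved = "https://www.example.com/login"
for page in (
    "https://www.example.com/account",          # same host, different path
    "https://www.example.com.evil.test/login",  # look-alike prefix
    "https://examp1e.com/login",                # look-alike spelling
):
    print(page, may_fill(saved, page))
```

A person can be fooled by a look-alike address, but an exact string comparison cannot, which is why this check is useful even in such a basic form.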

Limitations of 1Password

There are several important limitations with 1Password. The application cannot handle login forms built with Adobe Flash. Previous generations of 1Password supported login forms with HTTP basic authentication; however, the new plugin architecture for Safari and Chrome does not offer support for HTTP basic. AgileBits says it is working on a solution for Firefox.

The features of the Windows version of 1Password are not quite yet on par with the Mac version. For example, it only supports 32-bit Internet Explorer, 32-bit Firefox, Chrome, and Safari. That said, this covers most browsers that users need.

Pricing

1Password for Mac and 1Password for Windows are $49.99. 1Password Pro, which is available for iPhone, iPad, and iPod touch, is $14.95.

1Password Bookmarklet Gone Missing

If you are a frequent 1Password user, particularly on iOS devices, you may have noticed that AgileBits discontinued support for the 1Password bookmarklet, which was the best option for integrating with Mobile Safari rather than the integrated browser in the application. Fortunately, Kevin Yank and * have produced a working 1Password bookmarklet. I have reproduced it here:

javascript:window.location='onepassword://'+window.location.href.substring(window.location.href.indexOf('//')+2)
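For anyone wondering what that one-liner does, here is the same string manipulation as a small Python sketch. The function name to_onepassword_url is hypothetical. It only mirrors the rewriting step and does not open the application.

```python
def to_onepassword_url(href):
    """Mirror the bookmarklet's rewriting step.

    Drop everything up to and including the first "//" in the page address
    and put the onepassword:// scheme in front of what is left.
    """
    # str.find returns -1 when "//" is missing, like indexOf in JavaScript.
    return "onepassword://" + href[href.find("//") + 2:]


print(to_onepassword_url("https://www.example.com/login?next=/account"))
```

In other words, the bookmarklet swaps the page's http or https scheme for onepassword and keeps the rest of the address intact, so iOS hands the current page to the 1Password application.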

Your New Year's Resolution--Pick Better Passwords

As we near the end of 2011, I can’t help but think this is the year I had the most trouble telling the difference between actual news stories and pieces from “America’s Finest News Source”, The Onion. As I write this article, details are still unfolding from the data breach at the private intelligence firm Stratfor.

According to reports, the Stratfor hackers found a weakly protected database of usernames and passwords and an unencrypted database of credit card information. The hackers proceeded to make donations to charitable organizations with the credit cards in the database. As any story benefits from more absurdity, there were claims and counter claims of whether or not the attack was associated with Anonymous, the discerning hacker’s first choice of affiliation.

According to Identity Finder, the Stratfor database contained approximately 44,000 hashed passwords, roughly half of which have already been exposed. Unfortunately, another 20,000 or so passwords on pastebin would not even be newsworthy if it were not for the notoriety of Stratfor. Note: if you think you might have been on the list of compromised accounts in the Stratfor database, you can check at Dazzlepod.

There is plenty of blame to go around. First, Stratfor stored user passwords as basic unsalted MD5 hashes, which is simply irresponsible. There are well-regarded and widely available solutions for storing passwords such as bcrypt, which is nicely summarized in Coda Hale’s How To Safely Store A Password. Second, and more importantly, storing customers’ credit cards in clear text is unconscionable. Never mind the question of why on earth Stratfor stored CCVs in their database, which is never OK.
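To show why unsalted MD5 is such a poor choice, here is a short Python sketch that contrasts it with a salted, deliberately slow hash. bcrypt is not in the Python standard library, so to keep the example self-contained I have substituted PBKDF2-HMAC-SHA256, which is a different algorithm built on the same two ideas (a per-user salt and a tunable work factor). The iteration count and function names are my own assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def md5_unsalted(password):
    """What not to do: fast, unsalted, identical for identical passwords."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


# Two users who chose the same weak password.
print(md5_unsalted("stratfor") == md5_unsalted("stratfor"))  # identical hashes
salt_a, digest_a = hash_password("stratfor")
salt_b, digest_b = hash_password("stratfor")
print(digest_a == digest_b)                                  # salts make them differ
print(verify_password("stratfor", salt_a, digest_a))
```

With unsalted hashes, cracking one copy of a common password cracks every account that used it. With a salt and a slow hash, each account has to be attacked separately and each guess costs far more.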

Given the recent attacks against Sony, Gawker, HBGary Federal, and Infragard Atlanta, one could reasonably expect that Stratfor would pay more attention to the operational security side of their business. To put the Stratfor hack in a more global context, the 2011 Verizon Data Breach Investigations Report aggregates data from Verizon RISK, the U.S. Secret Service and the Dutch High Tech Crime Unit. DataLossDB Statistics collected data from open sources including news reports, Freedom of Information Act (FOIA) requests, and public records. These reports give a more nuanced breakdown of the types of breaches and data exposed across many industries.

As much as it pains me to blame the victim, a great many of the subscribers to Stratfor’s service clearly could and should have picked better passwords. According to Stratfor Confidential Customer’s passwords analysis, we could start with the 418 users who picked “stratfor” as their password or even the 71 users who picked “123456.” The database was full of weak passwords, which was why the clear text of nearly half the passwords followed in a post shortly after the original password hashes appeared online.

In Data Evaporation and the Security of Recycled Accounts, I described how passwords for email accounts are frequently the weak link in the security chain. It is common for sites to allow users to reset their passwords to the email address listed on the account. This means that a compromised email account may be the only method an attacker needs to gain access to other accounts.

In my dissertation interviews, I talked with people about how they managed their accounts and passwords. Many of my interviewees told me they effectively had 2-3 passwords they used for most accounts with some minor variations due to password complexity rules. The interviewees frequently reported using a set of low, medium, and high security passwords. Unfortunately, the email accounts were often given the low security passwords.

It pains me to think how many of the customers in Stratfor’s database likely reuse the same password on multiple sites. In Measuring password re-use empirically, Joseph Bonneau analyzed the overlap between rootkit.com and gawker.com passwords in addition to other studies and found a wide spread, ranging from 10% to 50% overlap. Even with 10% overlap, there are significant benefits from leveraging one exploited password database to compromise another. As always, XKCD keeps track of the pulse of the internet and has informative comics for both Password Reuse and Password Strength.

Realistically, it’s getting to the point where, unless you have a pretty fantastic password, if your password is in a database of poorly hashed passwords then someone with a bit of time can discover it. Why is that, you might ask? Whitepixel, the purveyors of fine open source GPU accelerated password hashing software, report that it currently achieves 33.1 billion passwords/sec on 4 x AMD Radeon HD 5970 for MD5 hashes. This is fast enough to make rainbow tables (pre-computed hashes for a dictionary attack) much less compelling. If the attacker has any additional personal information, this significantly increases the chance of a successful attack, since so many people use bits of personal information in their passwords. Bruce Schneier describes commercial software that exploits personal information when attempting to compromise password hashes in Secure Passwords Keep You Safer.
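To put the Whitepixel figure in perspective, here is a back-of-the-envelope Python sketch of how long it would take to try every password of a given length at that rate. The alphabet sizes and lengths are my own illustrative choices, and the calculation assumes a plain exhaustive search with no dictionary or personal information to shorten it.

```python
RATE = 33.1e9  # MD5 guesses per second, the Whitepixel figure quoted above


def seconds_to_exhaust(alphabet_size, length, rate=RATE):
    """Worst-case time to try every password of exactly this length."""
    return alphabet_size ** length / rate


def human(seconds):
    """Format a duration using the largest unit that fits."""
    for unit, size in (("years", 365 * 24 * 3600), ("days", 24 * 3600),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for label, alphabet, length in (
    ("8 lowercase letters", 26, 8),
    ("8 letters and digits", 62, 8),
    ("12 letters and digits", 62, 12),
    ("20 printable ASCII characters", 95, 20),
):
    print(f"{label}: {human(seconds_to_exhaust(alphabet, length))}")
```

At this rate every eight-character lowercase password falls within seconds, while each extra character multiplies the work by the size of the alphabet. That is the arithmetic behind preferring long random strings.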

In general, unless your password or pass phrase is quite long you are far better off with a long randomly generated string that you manage with a password manager. There are many good options including my personal favorite 1Password, UsableLogin, LastPass, RoboForm, or the open source projects PwdHash or Password Safe. PasswordCard is a nice alternative if you would prefer a solution you can always carry with you that does not require any dependencies besides what you can carry in your wallet.

Unfortunately, none of the password managers are magic. You will still have to deal with a depressingly large number of services that force you to choose poor passwords with arbitrary restrictions. Troy Hunt names some offenders in the Who’s who of bad password practices – banks, airlines and more. Still, if you simply use a password manager and a different password with each service, you will dramatically limit any potential damage, as an attacker cannot reuse your password on another service.

Security, Productivity, and Usability in the Enterprise

During interviews I conducted for my dissertation research, I asked individuals how security policies and systems affected their daily life in terms of productivity and work and personal communication. Interviewees gave many examples of tradeoffs between security and usability. People understood the reasoning behind many of the security restrictions. However, these implementations often significantly reduced productivity and frustrated employees’ everyday work practices and basic personal communications needs. Many implementations actively motivated employees to subvert security protections. The lengths to which people went to “work around” what they perceived as overly restrictive security and compliance implementations led to distinctly counterproductive measures in terms of overall security.

Security implementations in systems and security policies vary widely across the enterprise. These systems can help prevent unauthorized access, dissemination of proprietary business information, and confidential customer data. Security and compliance systems are also essential to passing an audit. The effectiveness of a system’s security is directly related to the overall user experience of the system. Security implementations that do not adequately consider a range of factors including existing work practices, the overall usability of the system, and basic social communication requirements may have serious negative consequences for morale, productivity, and information security.

Unsurprisingly, interviewees often responded that they were more concerned with job performance and completing the tasks at hand than with complying with corporate security policies. In short, they were far more worried about losing a job or a promotion for not getting their work done than about violating security policies. Don Norman summarized the problem nicely as “The more secure you make something, the less secure it becomes.”

People did not distinguish between the technology failing, not understanding how the technology works, and not realizing that a task was technically infeasible. In one example, an employee had tried to work from home over the weekend. This employee was not able to access the corporate network, because the VPN was inoperable over the weekend and the situation was possibly complicated due to a user misconfiguration. The following Monday morning, the employee was rebuked for not completing the project by the deadline.

Institutions that implement security policies without paying attention to whether employees feel they can still be productive and efficient may find their employees at odds with those policies. The employee perceived the situation as a technological failure that prevented the work from being completed. This had significant consequences, as the employee began to regularly copy data to an external device or send it via a personal email account to ensure he would be able to work. It is easy to criticize employees who violate security policies and argue they should be reprimanded or fired. However, in nearly every case in my interviews, the employees who violated policies did so to work around situations the company could have avoided through a more nuanced implementation that took productivity into account. In the particular case of the VPN, it was clear there were widespread problems with remote access that led to undesirable methods of replicating data.

Companies would be rewarded with higher levels of job satisfaction and productivity if they made greater efforts both to explain security policies and to ensure that users, especially mobile users, were not regularly prevented from communicating or managing documents. When companies did this, employees were appreciative of how productive the system allowed them to be while remaining mindful of the risks involved. Explaining the reasoning behind the policies and implementations goes a long way toward improving compliance. In the now classic paper “Users Are Not the Enemy,” Adams and Sasse found that individuals did not have an adequate understanding of security issues and that security mechanisms were not adequately explained to them. In addition, the authors found that security departments did not understand their users’ perceptions of security or their needs. The lack of understanding combined with the lack of communication resulted in reduced security overall.

Many businesses could reduce the risk of compliance violations by taking into consideration their employees’ everyday communication needs and practices. Internal needs assessments, possibly including surveys and interviews, can be used to determine how well corporate needs for security and compliance align with employees’ work practices and other communication needs. Security policies and compliance systems that take social factors, work practices, and overall understanding of the reasoning behind the requirements into consideration will be far more effective than those that do not. Unfortunately, it seems that this is the exception and not the rule.

References

A. Adams and M. A. Sasse. Users are not the enemy. Communications of the ACM, 42(12):40–46, 1999.

D. Norman. When Security Gets in the Way.

 

You should follow me on Twitter.

The World Is Not Flat and Neither Are Social Networks

Now that the rest of the Internet and I have grown accustomed to Google Plus and Facebook’s most recent friend categorization features, I thought it was time to revisit and revise a previously unpublished piece of mine. Take a moment and think about your friends, family, colleagues, friends of friends, acquaintances, and members of the same social club. These six groups could comprise a large part, but certainly not all, of the people that you know. You may also have extended family, classmates, fellow members of sports teams, religious associations, and the familiar strangers you recognize but whose names you do not know. To further complicate matters, the people in these groups often change over time as we move through life. How we conduct ourselves depends on the situation. It is highly unlikely that you act the same way around your grandmother as you do at a party with your friends, and people do not expect you to act the same way. Your friends, work colleagues, and extended family do not all know each other, and I suspect that in many cases you would like to keep it that way. For this reason, it seems odd to expect that our interactions in online social networks would be any different.

I had the final word in Erica Naone’s Technology Review article Can Google Get Social Networking Right?. Naone’s piece argues that Google needed to dramatically improve its social offerings to compete against Facebook. She asked me to comment on Google’s social services such as Buzz and Profiles and how they might interact with users’ search history. It is interesting to see how much the discussion has changed since the article appeared. Disclosure: I worked as an engineering intern on Google Accounts during 2005-2006, but this was well before any of Google’s social options existed. I responded with a discussion of broad problems I saw with social network services. The following quote in Naone’s article mostly reflects my statements, although the quote makes it appear that I am singling out Facebook for criticism, which misses the point that I think this is a fundamental problem across many social networks.

“Facebook, meanwhile, has its own problems, and some of these could turn out to be opportunities for Google. Ben Gross, an expert in online identity, notes that Facebook and other social networks don’t accurately differentiate between people’s social connections, making their social graph information less valuable to users and advertisers. For example, social networks tend to put all of a user’s connections into a single group of “friends,” and expect users to manage complex privacy settings to sort out family, work connections, and bar buddies. “Social network services should not assume that networks are flat, or that people are willing to put in the effort to articulate these networks or that they even want to,” he says.”

My full response from which the quote was taken follows below. I fixed a few typos, but it is otherwise unedited.

“I see several consistent problems with many of the social network services. First, they often unify disparate social networks in ways that do not match people’s actual experience and may not even make sense to them. In order to have a real representation of people’s social networks, they would have to fully articulate these networks to the service, which is a pretty unnatural thing to do. For many people the edges of the network shift regularly. Most social network services do not make it easy to maintain multiple independent networks on the service. It is common for people to maintain independent social networks, where individuals may not want the networks unified and people may not even care or wish to know about the other networks. For example, one’s extended family vs. one’s work colleagues vs. one’s friends they have brunch with on the weekend. The idea that there is a single flat network is sort of ridiculous.

I often hear people say that people who want to maintain independent identities or networks are somehow up to no good. I have interviewed quite a few people about this topic for my dissertation. It’s clear that people’s lives are complicated and their identifiers and networks reflect this. If you think about it, it is not at all strange for someone to want to separate their work life, from their family life, from their friend, or all manner of combinations. The boundaries of these relationships shift and behaviors vary widely. Social network services should not assume that networks are flat, that people are willing to put in the effort to articulate these networks, or that they even want to. Also for many people, they may have portions of their network that they are connected to online and therefore the online representation of their network may be very skewed. Even if people are connected to multiple networks online, they may use different social network services for different social networks. For example, it is not at all unusual for people to primarily have email conversations with some connections, use AIM for others, Google Talk for others, SMS for another group, and Facebook for yet another. Each service would be missing the chunk of connections for the other service.”

You need context to create a meaningful representation of a person’s social network. To make matters worse, that context shifts constantly, as do people’s social relations, particularly those with whom we have weak connections. This is why people often see online social network representations as a cartoonish view of their own complex and ever-changing social worlds. This is not a new revelation about social relations. William James published the following in 1890.

Properly speaking, a man has as many social selves as there are individuals who recognize him and carry an image of him in their mind. To wound any one of these his images is to wound him. But as the individuals who carry the images fall naturally into classes, we may practically say that he has as many different social selves as there are distinct groups of persons about whose opinion he cares. He generally shows a different side of himself to each of these different groups. Many a youth who is demure enough before his parents and teachers, swears and swaggers like a pirate among his ‘tough’ young friends. We do not show ourselves to our children as to our club-companions, to our customers as to the laborers we employ, to our own masters and employers as to our intimate friends. From this there results what practically is a division of the man into several selves; and this may be a discordant splitting, as where one is afraid to let one set of his acquaintances know him as he is elsewhere; or it may be a perfectly harmonious division of labor, as where one tender to his children is stern to the soldiers or prisoners under his command.

It is important to recognize that forcing people to interact with their social relations as a flat network has many undesirable consequences. Figuring out how to restore a more natural balance to social relations is a grand challenge for social networks. The people we think of as friends, enemies, and acquaintances change over time as friendships intensify and cool and we move through life phases. Also, complete visibility in networks is not always desirable or healthy. When we remove people’s choice to disclose their relationships and group memberships, we strip them of something that is fundamentally human. Providing people with only one option for presenting themselves at a time denies them an important means of self-expression that is also fundamentally human.

I find it heartening to see how much has changed over the last year, as both Google Plus and Facebook have dramatically improved the situation by giving us more options to interact naturally with different social spheres. Framing choices about self-presentation as choices about privacy misses the point that the issue is usually about context. Previously, the issue with online social networks was that they typically lacked this context. Far too often this forced people to articulate everyone who should be included in or excluded from a particular interaction. In these cases, the cognitive overhead of potentially making this judgment for each interaction is staggeringly high. Unless you are a public figure, you likely never need to decide if what you say is appropriate or even remotely interesting to someone you went to grade school with, someone you went to college with, a work colleague, your aunt, your next door neighbor, and a dear friend. We should not force people to work this hard unnecessarily.

References

danah michele boyd. Friendster and publicly articulated social networking. In CHI ‘04 extended abstracts on Human factors in computing systems, pages 1279–1282, New York, NY, USA, 2004. ACM. Articulated Social Networks: An Ethnographic Study of Friendster

Erving Goffman. Presentation of Self in Everyday Life. Anchor Books, New York, 1959.

Francesca Grippa, Antonio Zilli, Robert Laubacher, and Peter A. Gloor. E-mail may not reflect the social network. In Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2006.

Ido Guy, Michal Jacovi, Noga Meshulam, Inbal Ronen, and Elad Shahar. Public vs. private: Comparing public social network information with email. In CSCW ‘08: Proceedings of the ACM 2008 conference on Computer supported cooperative work, pages 393–402, New York, NY, USA, 2008. ACM.

Kai Fischbach, Peter A. Gloor, and Detlef Schoder. Analysis of informal communication networks – a case study. Business & Information Systems Engineering, 1:140–149, 2009.

William James. The Principles of Psychology, volume 1. Henry Holt & Co., 1890.

Hat tip to Gaurav Mishra, whose similarly titled 2008 article The World is Not Flat and Neither is the Social Web (site is currently offline) I found after I finished writing this post.

Tracking, Geolocation and Digital Exhaust

You are unique… In so many ways…

The accounting systems on which modern society depends are surveillance systems when viewed through another lens. All administrative, financial, logistics, public health, and intelligence systems rely on the ability to track people, objects, and data. Efficiency and effectiveness in tracking have been greatly aided by improvements in data analysis, computational capabilities, and greater aggregations of data.

Advances in social network analysis, traffic analysis, fingerprinting, profiling, de-anonymization/re-identification, and behavioral modeling techniques have all contributed to better tracking capabilities. In addition, modern technological artifacts typically contain one or more unique hardware device identifiers. These identifiers (particularly in mobile devices, but also RFIDs, and soon Intelligent Vehicle-Highway Systems) are widespread, but also effectively unmodifiable and relatively unknown to most of their owners. For example, with mobile devices, each network interface (cellular, Bluetooth, WiFi) requires a minimum of one unique hardware identifier, and each of these is uniquely trackable. On one hand, aggregating these unique identifiers allows services like Google, Skyhook, and others to associate geolocation data with WiFi access points and provide useful services. On the other hand, Samy Kamkar’s work described in Hack pinpoints where you live: How I met your girlfriend shows the potentially awkward and invasive side effects.

Individuals generate transactional data from common offline interactions, such as card key systems, and from nearly every online transaction. Techniques to correlate disparate data have improved, as have techniques that analyze the characteristics of software, hardware, and network traffic to form a fingerprint that is frequently unique. For example, a large-scale analysis of web browsers from the Panopticlick project showed that over 90% of seemingly common consumer configurations were effectively unique. IP geolocation data can be used to increase security, as with Detecting Malice with ModSecurity: GeoLocation Data, or it can be used in ways that are quite Creepy.
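
To make the fingerprinting idea concrete, here is a minimal sketch in Python. It is not Panopticlick’s actual method. The attribute names, the sample values, and the function names are all made up for illustration. It hashes a handful of browser attributes into a fingerprint and measures what fraction of a sample ends up with a fingerprint that nobody else shares.

```python
import hashlib
from collections import Counter


def fingerprint(attributes):
    """Hash a dict of attributes (hypothetical names) into a short, stable ID."""
    # Sort the keys so the same configuration always yields the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def unique_fraction(configs):
    """Return the share of configurations whose fingerprint appears only once."""
    if not configs:
        return 0.0
    prints = [fingerprint(config) for config in configs]
    counts = Counter(prints)
    return sum(1 for p in prints if counts[p] == 1) / len(prints)


# Invented sample: two identical machines, one that differs only by an
# extra installed font, and one that differs in every attribute.
SAMPLE = [
    {"user_agent": "BrowserA/5.0", "screen": "1280x800", "timezone": "-480",
     "fonts": "Arial,Helvetica"},
    {"user_agent": "BrowserA/5.0", "screen": "1280x800", "timezone": "-480",
     "fonts": "Arial,Helvetica"},
    {"user_agent": "BrowserA/5.0", "screen": "1280x800", "timezone": "-480",
     "fonts": "Arial,Helvetica,Futura"},
    {"user_agent": "BrowserB/3.6", "screen": "1440x900", "timezone": "-300",
     "fonts": "Arial,Helvetica"},
]

if __name__ == "__main__":
    print(unique_fraction(SAMPLE))
```

Even in this toy sample, a single extra font is enough to separate one otherwise ordinary machine from its twins, which is the intuition behind why seemingly common configurations turn out to be distinguishable.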

Another major shift is the widespread collection and aggregation of geolocation information from mobile devices. Location can be a highly unique identifier, even if the mobile device changes. Philippe Golle and Kurt Partridge show that two data points sampled during the day, one at home and one at work, are enough to uniquely identify many individuals, even in anonymized data. Geolocation data can also reveal significant information about the people someone spends time with and provide a view of their social network. Jeff Jonas sums this up well in Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food! In a sense, the mobile phone has caused an enormous increase in uniquely identifiable data that can be used for tracking.
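
The home and work observation can be illustrated with a small Python sketch. It is not Golle and Partridge’s analysis. The records, the region labels, and the function names are invented for the example. It groups people by their (home, work) pair and reports the size of each person’s anonymity set, meaning the number of people who share that pair. A set of size one means the pair alone identifies the person.

```python
from collections import Counter

# Invented, already "anonymized" records: (pseudonym, home region, work region).
RECORDS = [
    ("p1", "tract-A", "tract-X"),
    ("p2", "tract-A", "tract-X"),
    ("p3", "tract-A", "tract-Y"),
    ("p4", "tract-B", "tract-X"),
    ("p5", "tract-B", "tract-Z"),
]


def anonymity_set_sizes(records):
    """Map each pseudonym to the number of people sharing its (home, work) pair."""
    pair_counts = Counter((home, work) for _, home, work in records)
    return {pid: pair_counts[(home, work)] for pid, home, work in records}


def uniquely_identified(records):
    """Return the pseudonyms whose (home, work) pair is shared with nobody else."""
    sizes = anonymity_set_sizes(records)
    return sorted(pid for pid, size in sizes.items() if size == 1)


if __name__ == "__main__":
    print(anonymity_set_sizes(RECORDS))
    print(uniquely_identified(RECORDS))
```

In this made-up data no home region is unique on its own, yet three of the five people become unique once the work region is added. That is the pattern the paragraph above describes: each location is coarse, but the combination is not.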

An average person now generates a constant stream of geolocation data that is collected by mobile carriers. Geolocation information is generated from cellular triangulation, geolocated IP addresses, and integrated GPS units, which can be accurate to within 10 meters. Geolocated mobile transaction data aggregated across multiple carriers is increasingly available for commercial use. It is possible to accurately track large numbers of individuals in constrained environments simply by sniffing the TMSI (a temporary ID), as Path Intelligence does in malls. They could sniff the IMEI just as easily, but they say they do not in order to protect privacy. Still, large-scale analysis of geolocation data is in its infancy. ReadWriteWeb describes how Developers Can Now Access Locations of 250 Million Phones Across U.S. Carriers.

Tracking technologies—particularly when combined with geolocation information—have matured far beyond tracking individuals and are rapidly becoming capable of tracking groups and larger populations, which could be applied to entire enterprises or political organizations. Tools and techniques have made it feasible to correlate geolocation information, commercially aggregated profiles of online use, digital fingerprints, and offline transactional data. In addition, analysis of current anonymization techniques has repeatedly shown that simply adding another source of data is enough to re-identify a large percentage of the population. The Spatial Law and Policy blog is doing a nice job of tracking the policy implications of geolocation data.

The immense potential value of geolocation and other tracking data may well provide enough incentive for it to be used in ways counter to our own interests. Potential threats for misuse of the data need to be taken into account when designing systems. For example, what is the value of highly accurate logistical data about a US corporation derived from geolocation data and social network analysis to a foreign industrial competitor? Even a small amount of data that allowed a rudimentary analysis of external individuals meeting with internal high-level executives would be a worthwhile target. Similarly, both foreign industrial interests and foreign states may be willing to spend significant resources to acquire details on the movements and meetings of political parties.

More broadly, I have been thinking about this question: what does it mean for a third party to acquire better logistics about an organization than the organization has itself? What are the policy implications when and if these tracking tools are deployed in places without the rule of law, stable transitions of government, and low levels of corruption that we assume in the US? Could changes in the design and implementation of these systems mitigate the risks outlined? For example, should these design changes include internal controls, data scrubbing capabilities, and user interfaces that more clearly indicate a big picture of what data is being given off? Are there behavioral strategies that would reduce risks? To what extent can user education reduce risk?

Dragon Dictation Mobile: A Transcriber in Your Pocket

Dragon Dictation is the mobile version of Nuance Communications’ flagship Dragon Dictate voice recognition product, made for Apple iOS devices. Even after a year, using the application often makes me smile and think “It’s nice to live in the future.”

The simple user interface and high quality transcription are a winning combination. To use the application, you press the record button, speak until you are finished, and then press the done button. That is all. The recording of your voice is sent to Nuance’s servers via a Wi-Fi or cellular connection and processed, and the text is returned to the application.

Once Dragon Dictation has finished transcribing, the application offers choices to send the transcribed text via an SMS message, email, Facebook, or Twitter, or to copy the text to the clipboard. You may also edit the text using the built-in keyboard after the transcription is complete. I was able to produce the first draft of this entire article using only Dragon Dictation on my iPhone. In some ways the product is similar to the MacSpeech Scribe application, which allows you to take pre-recorded audio and transcribe it after the fact.

Overall, the quality of the transcription is quite good and I recommend it highly. Unlike the desktop version of Dragon Dictate, you don’t see the transcription until it is complete. This means there is no real-time feedback mechanism or method to edit or correct in real time. The application does not require training to transcribe, but it will adapt to your voice over time, and it is possible to correct misrecognized words to improve future accuracy. You can record up to sixty seconds at a time, although Dragon will continue to append to existing text if you press record again. The application needs low levels of ambient noise. This unfortunately meant that I had limited success with transcriptions made while walking, a situation in which I would find dictation particularly useful. The Dragon Dictation support documentation is brief, but it provides a set of useful tips and tricks for improving accuracy and lists the spoken commands for punctuation and cursor movement.

The Dragon Dictation mobile application is available for the iPhone, iPad, and iPod Touch. Dragon Dictate for Email is available on BlackBerry App World. The product is currently free, although it is ad-supported. Advertisements for Nuance’s own transcription products appear at the bottom of the screen, but this has no impact on usability. In short, if you are looking for an easy-to-use transcription product for your mobile device, Dragon Dictation mobile is an excellent option.

Paper in, PDF out: Fujitsu ScanSnap S1500M

The Fujitsu ScanSnap line of scanners is an impressive combination of good design, usability, and smoothly integrated hardware and software. This is unfortunately a rare occurrence in business devices. The Fujitsu ScanSnap S1500M has earned a prominent place on my desk.

Simple and Straightforward to Use

The ScanSnap makes the process of converting stacks of paper into PDF files simple. It is a sheet-fed scanner, not much bigger than a toaster, that can process twenty sheets a minute in duplex mode. This means you can scan forty pages a minute if all your sheets are double-sided. That is fast for a consumer device. The user interface for the scanner is a single button. The ScanSnap will scan color and grayscale documents at up to 300 DPI and black and white documents at up to 600 DPI. The scanner connects via USB. There is no on or off button; if it is plugged in, it is on. When you close up the device, it will go to sleep. There are no options on the hardware to fiddle with, and it all just works.

The bundled software is large, but relatively painless to install. The sheet feeder is convenient, although it may sometimes grab a couple of sheets at once if the paper is in poor condition. The default output is PDF. You can optionally choose to OCR the text from the scans, but this makes the process take considerably longer.

Bundled Software Is Obsolete

The ScanSnap is available in two models: a Mac version (white), the ScanSnap S1500M, and a PC version (black), the S1500. Both models ship with the ScanSnap software (which cannot be found online) in addition to ABBYY FineReader, Acrobat, and business card scanning software: Cardiris for the Mac and CardMinder for Windows. Unfortunately, the bundled software is now mostly obsolete.

The Macintosh model includes a copy of ABBYY FineReader 4, Acrobat Professional 8, and Cardiris 3.6 (upgradable to version 4). ABBYY released FineReader Express Edition for Mac version 8 (they skipped a few versions) in 2010. Acrobat Professional 8 is more problematic, as it does not work on recent versions of Mac OS X and is now only useful for obtaining a discount on more recent versions. The Windows model of the S1500 ships with ABBYY FineReader 4, Acrobat Standard 9, and CardMinder 4.

The outdated versions make the bundle of hardware and software less attractive than when the product was first released. Adobe does not offer combo updaters for versions of Acrobat prior to version 10, so I had to install many incremental updates individually, and the older version of Adobe Updater can be finicky. Updating old versions of Adobe Acrobat is overall a tiresome and unpleasant experience. Luckily, the most recent versions have improved dramatically. The downside is that the older versions are only useful as a discount on modern versions.

Overall Recommended

The ScanSnap S1500 and S1500M retail for $495. Fujitsu makes two other ScanSnap lines: a highly compact S1100 model ($199) meant for use while traveling, which scans about 8 pages a minute, and a mid-range model, the S1300 ($295), which will scan 16 pages a minute. The S1100 and S1300 models ship only with the ScanSnap software and no third-party software. All in all, I highly recommend the ScanSnap S1500. My only significant complaint is the outdated software bundle; in all other respects the scanner is an excellent product.