On Message with Ben Gross

Lightweight Scheduling With Doodle

Doodle is one of the few online scheduling services that I find worthwhile. The web interface is straightforward and minimalist. Most scheduling applications add enough overhead and complexity that I fall back to scheduling via email. The problem is that inevitably the email results in a flurry of back and forth negotiation that makes me wish I never tried to schedule the event in the first place. The planning process is even more difficult when participants from different organizations do not have access to common scheduling applications.

There are two types of polls in Doodle, one to schedule events and one to present a series of choices. You start the scheduling process by creating a poll with potential dates and times and decide whether you want to send a link to the poll yourself or have Doodle send out the email. Participants open the URL for the poll and simply select check boxes with their desired day and time combinations. Choice-based polls display a simple list of selections. Participants may also add comments or files to both types of polls. It really takes longer to describe the process than it does to complete it. The service is free and ad supported, although some features require paid premium accounts.

Options for Doodle polls include limiting the number of selections for each participant, enabling “if need be” time slots, limiting comments or changes to responses, and support for time zones. Paid premium accounts are $28 a year without ads and include features such as hiding responses, requiring additional information such as email or phone numbers, avatars, and support for custom designs. Doodle corporate accounts called Branded Doodle start at $240 a year for custom corporate branding without ads. Additional corporate options are response tracking and the ability to request additional information for $240 a year, and additional security and SSL access for $240 a year.

Doodle supports direct integration with Google Calendar and provides calendar feeds for use with Google Calendar, Yahoo Calendar, Microsoft Live Calendar, Apple iCal, Outlook and others. Doodle provides calendar plugins for Microsoft Outlook and Lotus Notes. Registration is required for calendar integration. Polls may be exported to PDF, Excel, or .ics calendar files.

Doodle is available as a widget on iGoogle, as an application on Facebook, as a mobile web application, and as a $2.99 iPhone application. The iPhone application is well done and is integrated with the iPhone address book. However, due to restrictions on the iPhone OS, it cannot integrate directly with the calendar application on the iPhone. The workaround is to simply subscribe to the Doodle calendar feed from the iPhone application.

Overall, I highly recommend Doodle for simple meeting scheduling. The one feature I wish Doodle would add is support for multiple email addresses. This would take the guesswork out of selecting the right email address for people with more than one address. People scheduling events with complicated requirements, such as matching meeting rooms with specific audio visual configurations to particular time slots, will want to stick to traditional corporate scheduling applications. For everyday use, I find Doodle to be the right balance of functionality and simplicity.

Notational Velocity - Elegant Note Taking for the Mac

Notational Velocity is a free and open source note taking application for Mac OS X that is extremely simple, fast, and stable. I find the minimalist interface very functional and pleasant to use. It is one of my favorite applications.

I mentioned Notational Velocity’s ability to sync with the Simplenote iPhone note taking application in my Messaging News Magazine column Great iPhone and iPad Apps for Reading and Sharing Docs. The combination of Notational Velocity and Simplenote allows me to create, edit, and manage notes that are seamlessly synchronized between my desktop and iPhone without worrying about whether I have the latest version on the other device.

Dropbox and SimpleText.ws allow for synchronizing Notational Velocity across multiple machines. The author of SimpleText.ws provides the source code so you can run your own private server on Google App Engine.

Aside from the ease of use and speed, some of the features of Notational Velocity I like are:

  • Makes no distinction between searching notes and creating new notes
  • Displays search results incrementally to help rapidly filter documents
  • Saves automatically, no save button needed
  • Allows data export with a single click
  • Preserves creation and modification timestamps for both import and export
  • Optionally stores notes as plain text, rich text, or HTML
  • Optionally stores notes as a single database or as plain text files in a directory
  • Optionally encrypts the database and provides secure text entry mechanism
  • All commands have keyboard equivalents

Preparing Your Site for the iPad

The Apple iPad does an excellent job of displaying most web sites. However, there are a few obstacles you may want to avoid. There are also a few customizations that will make your site look even better on the iPad. I will summarize the most important issues you should start to plan for and the differences between the iPad browser, the iPhone browser, and desktop browsers. As an added benefit, most improvements made for the iPad will also benefit users with an iPhone or an iPod Touch. There is a list of resources to find more information and a list of tools to help you test your site at the end of the article.

Differences in Mobile Safari on the iPad

The primary differences you should account for first are:

  • No support for plugins such as Adobe’s Flash or Sun’s Java for ads, navigation, and multimedia
  • The fixed viewable screen size (viewport) may affect your layout
  • The touch screen is the primary means of interaction and offers different modes of user control

Unlike most desktop browsers, the iPad does not support plugins such as Flash or Java. Any navigation elements, embedded audio and video, or banner ads written in Flash or Java will not appear. Based on public statements, Apple is unlikely to support either language in the future. This means you will need to provide alternative or fallback navigation elements and multimedia embedding options. Apple’s official recommendation is to avoid plugins entirely and use HTML5 elements across your site. Navigation elements may be implemented with standard AJAX techniques. If your revenue depends on banner advertising delivered via Flash or Java, you will need to make some changes. If your ad server supports mobile devices, you can turn this on for iPad users. An alternative is to treat mobile users the same as email campaign advertisements. Today at the iPhone OS 4.0 press event, Apple announced its own mobile ad platform and ad network called iAd, implemented entirely in HTML5. The mobiThinking Guide to Mobile Advertising Networks in the references surveys most of the available mobile ad network options.

The standards and implementations of HTML5 audio and video tags are still evolving and making your content available in all browsers is still complicated. Supporting HTML5 H.264 encoded video with a fallback to Flash for browsers that do not support it is likely your most straightforward solution. In the references, I have linked to some of John Gruber’s articles on H.264 and Flash that explain the problem in more detail. Video for Everybody from Camen Design and the upcoming SublimeVideo from Jilion are two options for hosting HTML5 friendly video on your site.

The iPad has a 9.7-inch touch-sensitive screen, a fast processor, and fast network connectivity. It provides a web browser experience that is much closer to the desktop experience than a smartphone. This means you should avoid sending iPad users to versions of your site optimized for mobile phones if you are sniffing for iPhone or mobile user agents. If you look at the user-agent strings for the iPad and the iPhone, you will notice that the iPad user-agent starts with “iPad” and lists “CPU OS 3_2 like Mac OS X” rather than “CPU iPhone OS 3_1_3 like Mac OS X.” Both browsers include “Mobile” in the user-agent string. Most browsers have mechanisms to change the user agent string. I’ve listed some of these in the references.

The current version of iPhone OS (version 3.1.3) uses the following user agent string:

Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1_3 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7E18 Safari/528.16

While the iPad with iPhone OS 3.2 uses the following user agent string:

Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B367 Safari/531.21.10
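
As a rough illustration of the sniffing advice above, the following Python sketch classifies the two user agent strings shown. The function names are hypothetical, and the "(iPod;" token for the iPod Touch is an assumption that does not come from the strings listed here. The point is simply that the iPad should be checked for explicitly before any generic test for "Mobile", since both strings contain that word.

```python
IPHONE_UA = ("Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1_3 like Mac OS X; en-us) "
             "AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7E18 Safari/528.16")
IPAD_UA = ("Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) "
           "AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B367 Safari/531.21.10")


def classify_device(user_agent):
    """Return 'ipad', 'iphone', 'ipod', or 'other' (hypothetical helper)."""
    # Test for the iPad first: its string also contains "Mobile",
    # so a generic mobile check would misclassify it.
    if "(iPad;" in user_agent:
        return "ipad"
    if "(iPhone;" in user_agent:
        return "iphone"
    if "(iPod;" in user_agent:  # assumed token for the iPod Touch
        return "ipod"
    return "other"


def wants_mobile_site(user_agent):
    """Only phone-sized devices get the phone-optimized site."""
    return classify_device(user_agent) in ("iphone", "ipod")


print(classify_device(IPAD_UA), wants_mobile_site(IPAD_UA))
print(classify_device(IPHONE_UA), wants_mobile_site(IPHONE_UA))
```

A real site would more likely do this in its web server or framework configuration; the logic is the same.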

The iPad viewport is set to 980 pixels wide. In portrait mode the iPad screen is 768 pixels wide, so content laid out for the 980 pixel viewport is scaled down to fit. If you have content that is wider than the viewport and uses fixed CSS positioning, that content may end up off screen, and your users will not see it since they cannot resize the window in Mobile Safari.
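
To make the scaling concrete, here is a small worked example in Python using only the two widths given above (980 and 768). The helper name is hypothetical, and the calculation assumes the page is scaled uniformly to fit the screen width in portrait mode.

```python
VIEWPORT_WIDTH = 980   # layout width in CSS pixels, from the text above
PORTRAIT_WIDTH = 768   # screen width in portrait mode, from the text above


def on_screen_size(css_pixels, screen_width=PORTRAIT_WIDTH,
                   viewport_width=VIEWPORT_WIDTH):
    """Approximate on-screen size of something css_pixels wide or tall."""
    return css_pixels * screen_width / viewport_width


scale = PORTRAIT_WIDTH / VIEWPORT_WIDTH
print(f"scale factor: {scale:.3f}")
print(f"a 12 pixel font is drawn about {on_screen_size(12):.1f} screen pixels tall")
```

In other words, everything on a page laid out for the default viewport is drawn at a little under 80% of its nominal size in portrait mode, which is worth keeping in mind for small text and small touch targets.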

Users control the iPad with a multi-touch interface and a touch screen keyboard. The “Apple iPhone Human Interface Guidelines: Introduction” is a great document for starting to think about multi-touch user interaction as the metaphors and modes of physical interaction differ. For example, a flick action rather than a mouse controls scrolling and a pinching action controls how a page scales up and down.

There are other issues, some of which Apple may resolve in a future update. In John Gruber’s review of the iPad, he points out that often only a single page is held in memory at one time; subsequent pages often take all the memory available for web pages. This means that you could lose form data on a page that you have not submitted if you open another page. The memory problem could also appear on AJAX heavy pages.

iPhone OS User Base

Apple announced the iPad at the end of January and released specifications, documentation, and a software development kit (SDK) to paid members of the iPhone developer program under a non-disclosure agreement. The WiFi only model of the iPad began shipping this week and Apple released the SDK to everyone registered in the Apple Developer Program. Apple announced that it sold more than 300,000 iPads on the first day and more than 450,000 as of April 8th. The iPhone OS platform user base is significant. Steve Jobs announced at the iPad launch in January that there were 75 million iPhones and iPod Touch devices running iPhone OS. Apple’s 2010 Q1 filing said that it had sold more than 42 million iPhones total. Today at the iPhone OS 4.0 launch, Jobs announced that there were 85 million iPhone OS devices.

Mobile Safari on the iPad uses the open source WebKit rendering engine, as do iPhone and iPod Touch devices. Testing your site with the WebKit rendering engine is now essential. Desktop versions of the Safari browser, Google’s Chrome browser, all iPad, iPhone, and iPod Touch devices, Android devices, Palm webOS devices, and Symbian Series 60 (S60) devices all use WebKit. RIM has stated that future BlackBerry devices will use WebKit. This means that every major smartphone browser aside from Windows Mobile will be WebKit-based in 2010.

Testing Your Site on the iPad

Testing your site directly on an iPad is the only way to guarantee that your experience will match that of your visitors with iPads. There are numerous reports by developers of minor differences between the iPad and the iPad in a simulator.

However, next to owning an iPad, the iPhone simulator comes closest to rendering your site as an iPad would. The iPhone simulator that ships with the iPhone SDK 3.2 has an iPad mode under the device option. Anyone can register as an Apple Developer for free and then download the SDK. The iPhone SDK includes the Xcode development environment and is nearly a 2.5 GB download. It also only works on Mac OS X 10.6.2 (Snow Leopard) or higher.

The paid iPhone Developer Program is $99 a year. The subscription allows developers to submit native iPhone and iPad applications to Apple’s App Store. Apple also allows paid developers early access to upcoming versions of its SDK such as the iPhone OS 4.0 SDK announced today.

iPad Peek by Pavol Rusnak is a web service that allows you to see what your web site will look like on an iPad. It is free and the source code is available under an open source license. Three things will make your experience with iPad Peek closer to that of an actual iPad.

  • Use a browser with a WebKit-based rendering engine, preferably Safari, since it is the most similar to the iPad browser. Chrome works too.
  • Disable all plugins in your browser. Otherwise your browser will still load the plugins even though an iPad would not.
  • Change your user agent string in your browser to match the iPad one listed earlier.
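
The last step, changing the user agent, can also be scripted if you only want to check which HTML your server sends to an iPad (for example, whether it redirects to a phone-optimized page). The sketch below uses Python's standard urllib module and the iPad user agent string listed earlier. The function names are hypothetical and any URL you pass in is your own; note that this only retrieves the markup and does not render anything.

```python
import urllib.request

IPAD_UA = ("Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) "
           "AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B367 Safari/531.21.10")


def build_ipad_request(url):
    """Build a request that identifies itself with the iPad user agent."""
    return urllib.request.Request(url, headers={"User-Agent": IPAD_UA})


def fetch_as_ipad(url, timeout=10):
    """Return the final URL (after redirects) and the raw response body."""
    with urllib.request.urlopen(build_ipad_request(url), timeout=timeout) as response:
        return response.geturl(), response.read()
```

Comparing the result of fetch_as_ipad for your home page against a fetch with a desktop user agent is a quick way to spot unintended mobile redirects.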

Resources

From Apple’s official developer documentation:

Other resources:

Tools

The easiest way to change your user agent in Safari is to use the option in the developer menu. The easiest way to change the user agent in Chrome and Firefox (uses the Gecko rendering engine, not WebKit) is to use an extension.

Further Reading

John Gruber at Daring Fireball has written a series of posts about Flash, HTML5, and H.264 video. They are really worth reading for background on the technical and political issues related to HTML5.

March 2010 MessageLabs Intelligence Report Highlights

Many security vendors produce internet security reports that summarize the attacks and threats seen from the vendor’s vantage point of the network. The concise analysis of trending security threats and predictions of future threats make taking a look at the reports worthwhile. The reports are available at no cost as they also promote the vendor’s services.

I will discuss these reports in a series of posts starting with the monthly MessageLabs Intelligence Report. Highlights from March 2010 MessageLabs Intelligence Report (PDF) (podcast) include:

  • Spam has increased to 90.7%, which is up 1.4% since February.
  • Viruses and malware in email decreased to one in 358.3 emails, which is down 0.05% since February.
  • Phishing decreased to one in 513.7 emails, which is down 0.02% since February.
  • Malicious websites down to 1,919 websites blocked per day, which is down 61.6% since February.
  • The Rustock botnet sent 77% of its spam using TLS encrypted connections during March.
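
The report mixes percentages with "one in N" rates, which makes the figures hard to compare at a glance. A quick conversion in Python, using only the two rates quoted above (the helper name is hypothetical):

```python
def one_in_n_to_percent(n):
    """Convert a 'one in n' rate to a percentage."""
    return 100.0 / n


for label, n in [("virus/malware", 358.3), ("phishing", 513.7)]:
    print(f"{label}: one in {n} emails is about {one_in_n_to_percent(n):.3f}%")
```

Both rates come out well under half a percent of email, compared with the 90.7% figure for spam.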

The report also discusses malware that targets senior officials, the roles most often targeted and the most common countries of origin for the malware.

The top four roles targeted are:

  • Director 8.7%
  • Senior Official 7.3%
  • Vice President 4.4%
  • Manager 4.3%

Top four sources of targeted email attacks based on IP address of the sender:

  • China 28.2%
  • Romania 21.1%
  • United States 13.8%
  • Taiwan 12.9%

New Ways to Read Messaging News: Twitter, Facebook, and RSS

We regularly look for new ways to make our content more accessible to our readers. We are pleased to announce that we now have more options for reading Messaging News. We want to help you keep track of the latest industry news, events, webinars, whitepapers, commentary, and analysis.

As always, you can find everything we publish on the Messaging News website. In addition you can sign up for one of our weekly newsletters, the print or digital edition of Messaging News magazine, or one of our webinars. You will also find job listings, our annual resource directory, whitepapers, and industry briefings.

Privacy, Large Dataset Research, and the Netflix Prize

Netflix recently announced the cancellation of the second Netflix Prize in a post on its blog. A large number of researchers entered the first contest as it offered an opportunity to work with a large real world dataset combined with the promise of a one million dollar prize and worldwide publicity.

The company’s decision to cancel the contest settled a private lawsuit described by Ryan Singel in his Wired article Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims and closed an inquiry from the Federal Trade Commission explained in a Wall Street Journal blog post FTC’s Privacy Worries Prompt Netflix to Cancel Contest by Jennifer Valentino-DeVries.

In an earlier column, The State of User Tracking and the Impossibility of Anonymizing Data, I described current research on de-anonymization and re-identification and in particular problems with the Netflix contest. Arvind Narayanan and Vitaly Shmatikov wrote An open letter to Netflix from the authors of the de-anonymization paper. The authors say they hope Netflix will continue to work with researchers in a way that allows for further advances, but that also preserves privacy through techniques such as differential privacy.

Bellkor’s Pragmatic Chaos, the team that claimed the prize, wrote about the contest and the implications of their findings in the IEEE Spectrum article The Million Dollar Programming Prize. Dan Gillick further describes the winning solution in Predicting Movie Ratings: The Math That Won The Netflix Prize.

Markdown Simplifies Writing for the Web

Why I like Markdown

Several months ago I began formatting my articles using Markdown, a lightweight syntax designed to emulate the simple markup style commonly used in email messages. For example, if you would like to make text bold, just put two asterisks on each side of it (a single pair of asterisks gives emphasis). If you would like to make a list, just put a dash in front of each item. Overall, I’m happy with the change, as it has simplified the process for me to publish online. I can write with any text editor or word processor and then Markdown will convert my text to nicely formatted HTML.
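
To give a feel for how little machinery the basic syntax needs, here is a toy Python converter for just the constructs mentioned above: asterisks for bold and emphasis, and dashes for list items. It is only a sketch for illustration, not the real Markdown implementation; the function name is hypothetical, and it ignores everything else in the syntax, including HTML escaping.

```python
import re


def tiny_markdown(text):
    """Convert a tiny Markdown subset (bold, emphasis, dash lists) to HTML."""
    def inline(s):
        # Handle double asterisks first so they are not read as emphasis.
        s = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", s)
        s = re.sub(r"\*(.+?)\*", r"<em>\1</em>", s)
        return s

    html, in_list = [], False
    for line in text.splitlines():
        if line.startswith("- "):
            if not in_list:
                html.append("<ul>")
                in_list = True
            html.append("<li>" + inline(line[2:]) + "</li>")
        else:
            if in_list:
                html.append("</ul>")
                in_list = False
            if line.strip():
                html.append("<p>" + inline(line) + "</p>")
    if in_list:
        html.append("</ul>")
    return "\n".join(html)


print(tiny_markdown("This is **bold** and *emphasized*.\n- one\n- two"))
```

For real documents, use one of the full implementations described later in this article.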

Markdown is both a markup language and a tool to convert Markdown to HTML. The syntax for Markdown is simple and adds very little bulk to my text. Effectively, the only change I made to the way I write was to add a small amount of formatting for the Markdown hyperlinks and headings. John Gruber, Markdown’s primary developer, wrote Dive Into Markdown, an essay describing his design goals, soon after he released the software in 2004. It is well written and worth reading.

I now prefer to keep my documents in Markdown over HTML as they are smaller, easier to read, and I can convert them to modern standards-based HTML on demand. I prefer this setup to WYSIWYG tools or graphical HTML editors as viewing the content in the same browser version is the only way to ensure that you will see the same HTML rendering as your readers. When the W3C updates the HTML specification or the Markdown conversion tools add new features, I can just install a new version of Markdown. I don’t need to modify my original text. Markdown is great for producing basic HTML documents like blog entries or simple web pages, but it is not well suited for long, complex, or highly formatted documents. There are several extensions to Markdown that add features to publish more specialized and complex documents.

If you would like to try Markdown for yourself right now, the Markdown Web Dingus or PHP Markdown Dingus will both give you a live preview of any Markdown formatted text you type. Markdown works on Mac OS X, Windows, and Unix/Linux and is widely supported as a plugin for most popular blog and wiki software. The reference version is written in Perl and developers have ported Markdown to Python, C, JavaScript, and other languages.

Gruber also wrote SmartyPants, which transforms plain text to include nice typographic elements such as curly quotes, en-dashes, em-dashes, and ellipses. Many implementations of Markdown include support for SmartyPants by default. Markdown has a liberal BSD-style license that makes it easy for developers to embed it in other packages. There are several Markdown test suites that can test compatibility between versions, including one that ships with the reference version of Markdown. Wikipedia has a good technical comparison between lightweight markup languages if you would like to see how Markdown differs from similar projects.

Markdown Implementations and Utilities

These days, I write almost everything using the TextMate editor on Mac OS X, which includes support for SmartyPants, Markdown, and PHP Markdown Extra. I use the QuickLook Markdown plugin when I want to quickly see a formatted version of a Markdown file from the Finder.

Markdownify converts from HTML to Markdown. The script is available as a web-based conversion tool or you can run the script on your own machine. It supports PHP Markdown Extra as well.

PHP Markdown Extra by Michel Fortin is a PHP implementation of Markdown that supports definition lists, footnotes, tables, and intermixing HTML with Markdown. The developer has also created a PHP version of SmartyPants, unsurprisingly called PHP SmartyPants.

MultiMarkdown by Fletcher Penney is an implementation of Markdown with additional conversion options and supports extensions to the Markdown syntax such as footnotes, tables, bibliographic citations, image attributes, internal cross-references, glossary entries, and definition lists. MultiMarkdown first converts the plain text to XHTML and then uses XSLT transforms to convert the XHTML into HTML, LaTeX, PDF, or RTF. It includes many features similar to PHP Markdown Extra. Penney’s MultiMarkdown Bundle for TextMate adds support for the MultiMarkdown variant.

Discount by David Parsons is a C version of Markdown, PHP Markdown Extra, and SmartyPants that focuses on speed.

Pandoc by John MacFarlane can convert from Markdown, HTML, reStructuredText, and LaTeX to “reStructuredText, HTML, LaTeX, ConTeXt, PDF, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, groff man pages, and S5 HTML slide shows.” Pandoc includes Markdown extensions for definition lists, embedded LaTeX equations, footnotes, and tables. Pandoc is written in Haskell, which currently requires a bit of tweaking to make it work on Mac OS X 10.6/Snow Leopard.

Babelmark, the Markdown Testbed, allows you to compare the output of different Markdown implementations.

Printing Mailing Labels and Envelopes from Address Books and Spreadsheets

I recently spent some time researching how to print mailing labels and envelopes for a family member. I found that depending on your configuration the process could be simple or frustratingly complex. It is the time of year when many people are still agonizing over whether or not they should send tardy holiday cards or last minute New Year’s cards. If this description fits you or you want to print labels or envelopes for another reason, read on, and hopefully I will be able to give you some tips or software recommendations to make the process go faster and more smoothly.

I primarily investigated printing labels and envelopes from the Mac OS X Address Book, but along the way I found a number of other solutions for the Mac, Windows, as well as options to generate label sheets on the Web.

Generating label sheets from the web, no software needed

The Avery Design and Print Online service is a free tool (registration required) that allows anyone to create label sheets online and download them as PDF suitable for printing. In order to use the online tool, you will need to export your address book as a CSV. How to do this can vary greatly depending on which system you use. For example, Apple Address Book cannot export directly to CSV, only to vCard and the Address Book Archive backup file. Microsoft Outlook can export addresses into a CSV file.

FileMaker Bento 3 ($50) can open the Apple Address Book directly and export it to a number of formats including CSV. Address Book to CSV Exporter can export the entire Apple Address Book or groups to CSV. It’s free and open source. There are a number of additional packages that offer additional flexibility for exporting from Apple Address Book. I’ll address those in a future article.
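
If you are comfortable with a little scripting, the vCard export can also be turned into a CSV file by hand. The Python sketch below is a minimal illustration, not a replacement for the tools above: the function name and the CSV column names are hypothetical, it keeps only the formatted name and the first address of each card, and it ignores folded lines, escaped characters, and other details of the vCard format. It relies on the standard order of the vCard ADR field (post office box, extended address, street, city, region, postal code, country).

```python
import csv
import io


def vcards_to_csv(vcf_text):
    """Turn simple vCard text into CSV with name and first postal address."""
    rows, card = [], {}
    for raw in vcf_text.splitlines():
        line = raw.strip()
        if line.upper() == "BEGIN:VCARD":
            card = {}
        elif line.upper() == "END:VCARD":
            rows.append(card)
        elif ":" in line:
            key, value = line.split(":", 1)
            # Drop parameters (";type=HOME") and group prefixes ("item1.").
            name = key.split(";")[0].split(".")[-1].upper()
            if name == "FN":
                card["name"] = value
            elif name == "ADR" and "street" not in card:
                parts = (value.split(";") + [""] * 7)[:7]
                (card["street"], card["city"],
                 card["state"], card["zip"]) = parts[2:6]
    out = io.StringIO()
    writer = csv.DictWriter(out, ["name", "street", "city", "state", "zip"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = """BEGIN:VCARD
VERSION:3.0
FN:Jane Doe
item1.ADR;type=HOME:;;123 Main St;Springfield;IL;62701;USA
END:VCARD
"""
print(vcards_to_csv(sample))
```

The sample card is made up for the example. Check the output against a few real entries before printing a full sheet of labels.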

Mac OS X and Apple Address Book

My original goal was to figure out how to generate labels from data that was stored both in Apple Address Book and in spreadsheets. The Mac OS X Address Book has built-in functionality to print Avery Standard, Avery A4, and DYMO type mailing labels. Apple has two primary documents that discuss printing labels and envelopes. The documents are clear and useful, but surprisingly hard to locate either in the local Help Viewer or on Apple’s support site. Both documents have been recently updated for Snow Leopard and may be more up to date than versions in the local Help Viewer. The first document, AddressBook 5.0 Help: Printing, is a brief and useful introduction. The second document, Address Book: Printing mailing labels or envelopes with multiple names, covers a topic that people clearly find confusing given the number of posts to various support forums.

Tech Talk Point has a longer introduction with screenshots that better explains the process on its page Printing Labels, Mailing List & Envelopes in Mac OSX with Address Book.

PostCheck ($10) is a great little plugin for the Mac OS X Address Book by Brian Toth that will look up and add missing zip codes (and optionally zip+4 codes) to address book entries. PostCheck can also bulk validate addresses and reformat addresses to conform to preferred USPS guidelines.

If the Apple Address Book does not print labels as you wish, you still have a number of options, some of which offer additional flexibility over the built in features.

  • Microsoft Office for Mac will print labels and envelopes using a mail merge in Microsoft Word with an address list stored in Microsoft Excel.

  • The Print Shop for Mac ($70) from Software MacKiev can print envelopes and Avery, CD Stomper, Memorex, and NEATO labels using addresses taken directly from Apple Address Book. The application makes it simple to add graphics to labels.

  • pearLabelizer from pearworks is a free application that can take contacts or groups of contacts directly from the Mac OS X Address Book. It has an option to print individual labels from plain text using the Services Menu.

  • EasyEnvelopes from Ambrosia is a free Dashboard widget for printing envelopes. EasyEnvelopes is attractive and straightforward to use. It integrates with the Apple Address Book and includes support for USPS bar codes.

  • Apple Pages can print mailing labels using free label templates from Avery. This MacFixIt article, How-To: Using label templates in Pages, provides a step by step guide. My recommendation is to try the free Avery Design and Print Online service first, although you will need to export your contacts as CSV.

  • The iWorkCommunity Templates Exchange section on labels includes a variety of additional label types. The templates are free.

  • Avery DesignPro for Mac is a free application for designing labels, business cards, greeting cards, and many other types of print work. The application is a large download and can be somewhat unstable. Like the online version, you will need to export your address book as a CSV file. I would recommend trying the online version first.

Microsoft Windows

Microsoft Windows users will likely find that using a combination of Microsoft Word and Excel or Outlook is the easiest path to printing mailing labels. The following support documents on the Microsoft Support site clearly describe how to create mailing labels and envelopes using the mail merge functionality of Microsoft Word with data stored in Microsoft Outlook or Microsoft Excel. The examples provided are for Office 2007, although similar documents are available for older versions of Microsoft Office.

Why Does My Text Look Funny? Character Set Encoding Detection and Conversion

Character set encoding

Character encoding is the low-level representation of the letters, numbers, and symbols we see in our daily interactions with computers. Common encodings for documents in English are ISO-8859-1 (a superset of ASCII), UTF-8 (an 8 bit Unicode character encoding), and Windows-1252. There are a great number of character set encodings in use and a long and complicated history of how they came to be. This complexity often leads to problems. Typically, these problems are caused when the document is encoded with one encoding, but is interpreted as another.
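
A short Python illustration of such a mismatch, using an accented letter and a curly apostrophe as the sample text: the same bytes are written as UTF-8 and then read back as Windows-1252, which produces the familiar garbage, and the reverse mistake produces a replacement character.

```python
text = "caf\u00e9 it\u2019s"               # "café it’s"

# Written as UTF-8 but interpreted as Windows-1252.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("windows-1252")
print(misread)                              # cafÃ© itâ€™s

# Written as ISO-8859-1 but interpreted as UTF-8: the lone 0xE9 byte is
# not valid UTF-8, so it becomes the U+FFFD replacement character.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
print(latin1_bytes.decode("utf-8", errors="replace"))

# If nothing was lost, reversing the wrong step recovers the original.
repaired = misread.encode("windows-1252").decode("utf-8")
print(repaired == text)
```

The repair in the last step only works when the misread text has not been re-saved or otherwise altered along the way.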

If you don’t ever have to deal with character encoding issues, then consider yourself fortunate, as it can be a royal pain to decipher and correct large numbers of character encoding issues.

Why you might care

It is likely that you see character set encoding problems all the time. If you have ever opened an email, a web page, or document and some of the letters looked wrong then there is a good chance this is due to a character set encoding mismatch. You are most likely to notice problems with curly quotes, bullets, and accented characters. If you are interested in learning more, there are some excellent sources at the end of this article.

Just to illustrate the extent of the problem: A composite approach to language/encoding detection (http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html) is the original research paper by the Netscape employees who wrote the character detection algorithm that is still used in Firefox. The page is encoded as ISO-8859-1, but the meta tags in the page are set to UTF-8. In most browsers, you should see the resulting funny looking characters due to the character encoding mismatch. Email can have character set encoding problems as well. RFC 2047 defines MIME extensions for non-ASCII text and HTML email has the same problems as web pages.

The best tools I have found are primarily open source command line-based utilities. Specialized GUIs are hard to come by, although, as I will describe, a browser and some text editors will work for many basic tasks. I only tested the command line tools under Mac OS X, Linux, and FreeBSD Unix variants, although most can be compiled under Windows with Cygwin or similar systems. Some of the tools are available as pre-compiled Windows binaries.

Detecting character set encodings

The absolute quickest way to check to see if you have a character encoding problem is to open the web page or file in Firefox and go to the Character Encodings option under the view menu. You can experiment by changing to a different character encoding and see if your document displays correctly.

If you are unsure of which character set your document is encoded in then that is a good place to start. I would first try the file command. It is a standard utility in every modern Unix system I have used. The program attempts to determine many characteristics about the file including types of line ending and the text encoding of the file.

If you need more sophisticated tests for character encoding than the file command offers, then chardet, the Universal Encoding Detector, is your most sophisticated option. The software is a Python port of the code from the Mozilla/Firefox code base that includes multiple character encoding auto-detection mechanisms. The most recent version now has a limited command line interface. Previously, it was only accessible to developers willing to wrap their own code around the library. rchardet is a Ruby variant.
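
For comparison, the simplest possible detection strategy is trial decoding, sketched below in Python using only the standard library. This is not how chardet or the file command work; it is a naive heuristic with a hypothetical function name, and its final answer (Windows-1252) is only a fallback guess for Western text, since almost any byte sequence will decode as some single-byte encoding.

```python
def guess_encoding(data):
    """Naive guess at the encoding of a bytes object by trial decoding."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"            # UTF-8 with a byte order mark
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    # Fallback guess only: this is an assumption about Western text,
    # not something the bytes can confirm.
    return "windows-1252"


print(guess_encoding(b"plain text"))
print(guess_encoding("caf\u00e9".encode("utf-8")))
print(guess_encoding(b"caf\xe9"))
```

Valid UTF-8 is distinctive enough that the first two checks are fairly reliable; telling the single-byte encodings apart is where statistical tools such as chardet earn their keep.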

Converting between character set encodings

It is possible to use a text editor for many character encoding conversions, if you know or can guess the original encoding. Simply open your text file in your favorite editor, such as the built-in TextEdit, or TextMate, on Mac OS X, TextPad or the E - TextEditor on Windows, Yudit on Unix systems with X-Windows, or GNU Emacs on most systems. Then simply select a different encoding in the editor and re-save the file.

Uni2ascii can perform both ends of the conversion between UTF-8 and a large number of encodings and formats including many ASCII variants, quoted printable, HTML, XML, and escapes for POSIX and many programming languages. I like its many options for decomposing UTF-8 into other encodings. The -B flag creates best effort ASCII by decomposing UTF-8 characters into a reasonable plain ASCII alternative. For example, the copyright symbol becomes (C). In my experiments, there were minor problems where the following characters were not converted: middle dot (0x00B7/U+00B7), next line (0x0085/U+0085), and line separator (0x2028/U+2028). Aside from these, the program did a tremendous job.
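
The general idea behind best effort ASCII can be sketched in a few lines of Python. This is not uni2ascii's algorithm, just an illustration of the same approach: a small hand-written replacement table (an assumption, far smaller than what a real tool ships with) followed by Unicode decomposition to strip accents. Characters that are covered by neither step are simply dropped.

```python
import unicodedata

# Hypothetical, deliberately tiny replacement table.
REPLACEMENTS = {
    "\u00a9": "(C)",   # copyright sign
    "\u201c": '"',     # left curly double quote
    "\u201d": '"',     # right curly double quote
    "\u2018": "'",     # left curly single quote
    "\u2019": "'",     # right curly single quote
    "\u2022": "*",     # bullet
}


def best_effort_ascii(text):
    """Approximate text in plain ASCII, dropping what cannot be mapped."""
    text = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with errors="ignore" then discards the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(best_effort_ascii("caf\u00e9 \u00a9 2010"))   # cafe (C) 2010
```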

iconv/libiconv is the standard for character set conversion. The application writes its output to standard output, so it needs to be used as a filter, which can be less convenient if you would prefer to convert files in place.
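One way around the filter-only workflow is to wrap iconv in a small shell function that writes to a temporary file and then copies the result back. This is only a sketch: convert_in_place is a made-up name, and you still have to supply the correct source encoding yourself.

```shell
# Hypothetical helper: convert one file in place by running iconv as a filter.
# Usage: convert_in_place FROM_CHARSET TO_CHARSET FILE
convert_in_place() {
    from=$1
    to=$2
    file=$3
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t "$to" < "$file" > "$tmp"; then
        # Copy the bytes back instead of using mv so the original
        # file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Conversion failed: leave the original file untouched.
        rm -f "$tmp"
        return 1
    fi
}

# Usage (oldfile.txt is a placeholder name):
#   convert_in_place iso-8859-1 utf-8 oldfile.txt
```

The original file is only overwritten when iconv exits successfully, so a wrong guess that produces a conversion error does not destroy the data. A wrong guess that happens to convert cleanly will still produce garbage, so keep a backup.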

I have used GNU Recode for a number of projects. Recode relies on libiconv and can process files directly. The release version of Recode has not been updated in many years; however, it is under active development and a recent beta of Recode can be found on the author’s site.

convmv converts the character encoding of filenames (not the contents of the files) and can work on entire directories of files.

The Cometdocs service (formerly known as iconv.com) allows you to convert between many character sets and file types. The service is currently free.

I have not tried either extensively, but Enca, the “Extremely Naive Charset Analyzer”, and UTRAC, the “Universal Text Recognizer and Converter”, both provide extensive support for conversion between non-Western character encodings.

Examples

Example — Convert files to UTF-8:

iconv -f original_charset -t utf-8 oldfile.txt > newfile.txt

recode original_charset..utf-8 file.txt
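Converting a file that is already UTF-8 as if it were in some other character set produces double-encoded text, so it can help to check first. The sketch below relies on iconv reporting an error when its input is not valid in the source encoding. The function name is_utf8 is made up for illustration.

```shell
# Hypothetical helper: succeed when iconv can read FILE as UTF-8 without errors.
# Usage: is_utf8 FILE
is_utf8() {
    iconv -f utf-8 -t utf-8 < "$1" > /dev/null 2>&1
}

# Usage (oldfile.txt is a placeholder name):
#   if is_utf8 oldfile.txt; then
#       echo "already valid UTF-8, leave it alone"
#   else
#       echo "not UTF-8, convert it"
#   fi
```

Plain ASCII is also valid UTF-8, so this check passes for pure ASCII files. A failed check only tells you the file is not UTF-8. It does not tell you what the original encoding is.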

Example — Convert UTF-8 into readable 7-bit ASCII. The -B option is equivalent to the flag combination -cdefx.

uni2ascii -B file.txt

find . -type f -exec recode utf8..ascii {} \;

Example: use convmv to convert the filenames of a directory of files from ISO-8859-1 to UTF-8. By default convmv only performs a dry run and prints what it would rename, which can be very useful for testing. The --notest flag tells it to actually rename the files.

convmv -r -f iso-8859-1 -t utf8 --notest directory/

The Future

In general, I recommend that people use UTF-8 for all new documents. UTF-8 is capable of representing the vast majority of alphabets and is a mature, internationally accepted standard. More than a year ago, Google found that the majority of the pages on the web used UTF-8 character encoding.

References

If you want to learn more about character encoding, the following sources are good places to start

Three Pew Research Reports Analyzing Messaging Use

Three recent reports from the Pew Research Center’s Internet & American Life Project analyze current topics related to messaging. Pew reports are noteworthy for the size of their samples and the rigor they apply to both the data collection process and the analysis. Their surveys are conducted using random digit dialing and include both landline and cellular numbers in the United States.

The report Teens and Distracted Driving analyzes the prevalence of text messaging and cell phone use by teenage drivers and teen passengers. Three quarters of American teenagers own a cell phone. Approximately one third of these have texted while driving, half have been a passenger while the driver was texting, and forty percent say they have been in a car while the driver was distracted enough by talking on a cell phone to put them in danger. The population for this report included 800 teens between the ages of 12 and 17, along with a parent or guardian.

Pew’s report on Social Isolation and New Technology takes a fresh look at early research on computer-mediated communication and social isolation. Contrary to previously well-publicized studies, Pew researchers found that Americans are not nearly as isolated as previously reported. They also found that online participation and social media often improve people’s civic participation and that cell phone users have larger core social networks. The population for this report included a representative national sample of more than 2,500 adults.

The Twitter and Status Updating, Fall 2009 report found that 19% of US adults use Twitter or another service that allows status updates. This is a rapid increase from late 2008 and early 2009 surveys that found 11% use. Mobile phone owners are more likely to use status update services and social networks than those without. Individuals with multiple Internet-connected devices use Twitter at a much greater rate. Only 10% of users with a single device use Twitter, while 40% of those with four or more devices do. The population for this report included more than 2,200 adults.