Vincent Granville's Posts (15)

  • Originally posted on DataScienceCentral.

    AWS - Amazon Web Services - allows you to deploy your analytic app or API on the cloud, and make it public.

    However, if you are co-located on a shared server and your "neighbors" are criminals engaging in click fraud or email spamming, your IP address will be blocked by most IP blacklist vendors such as Spamhaus. This is nothing new, but what is new here is the very high proportion of bad neighbors found on AWS, and Amazon does not have the technology to detect them.

    A classified study by Vincent Granville shows that 20% of click fraud on large ad networks comes from hundreds of AWS IP addresses, making AWS the largest single source of fraud, bigger than any single traditional botnet. Spamhaus catches about 15% of the bad IP addresses in question. The remaining IP addresses can easily be detected using IP address clustering techniques or large-scale (distributed) nslookups. In some security circles, people have suggested blocking all Internet traffic from Ashburn, Virginia, as this is where a large AWS server farm (infested by criminals) is located, with new IP addresses popping up every day.
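
    As a rough illustration of the IP address clustering mentioned above, the sketch below groups flagged IPv4 addresses by /24 block and reports the blocks that contain several of them. The study's actual method is not described here, so the /24 granularity, the threshold, the function names and the sample addresses (taken from ranges reserved for documentation) are all assumptions.

```javascript
// Count flagged IPv4 addresses per /24 block (hypothetical granularity).
function clusterBySubnet(ips) {
  const blocks = new Map();
  for (const ip of ips) {
    // Keep the first three octets: "192.0.2.17" -> "192.0.2.0/24"
    const prefix = ip.split(".").slice(0, 3).join(".") + ".0/24";
    blocks.set(prefix, (blocks.get(prefix) || 0) + 1);
  }
  return blocks;
}

// Return the blocks holding at least `minCount` flagged addresses.
function suspiciousBlocks(flaggedIps, minCount) {
  return [...clusterBySubnet(flaggedIps)]
    .filter(([, count]) => count >= minCount)
    .map(([prefix]) => prefix);
}

// Sample data only: documentation address ranges, not real fraud sources.
const flagged = [
  "192.0.2.1", "192.0.2.7", "192.0.2.200",
  "198.51.100.4",
  "203.0.113.9", "203.0.113.10",
];
console.log(suspiciousBlocks(flagged, 3));
```

    A block that keeps accumulating flagged addresses is a candidate for blocking as a whole, including addresses in it that no blacklist has caught yet.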

    If you share your IP address with one of these criminals (even though you are a good guy), your clients might just not be able to access your services, as your AWS public website/folders will be blocked by most browsers.

    So what are the solutions (a safer cloud) for hosting your analytic app if you don't have a budget for a dedicated IP address? Even a dedicated IP address is not great if it's located in an IP address block filled with blacklisted IP addresses.

    There are plenty of articles about this issue.

    Here are two comments posted by members:

    • It is based on analyzing web traffic from a large ad network, and finding that a significant proportion of the click fraud is from AWS IP addresses blacklisted by Spamhaus and several other similar providers (Adometry, etc.) and by our own rules. A bad IP address results in email blacklisting and other issues for AWS users on the wrong address, just as happens for any shared IP address that gets blacklisted because of a single (very nasty) black sheep. I can't share the document; I signed an NDA with the client.
    • This is a sidebar. It's worse than that. Amazon is probably trying to mitigate malevolent users internally; however, there is a nasty exploit affecting VMs that are popular with Amazon users. Xen and VMware, according to this report from security researchers, can be cross-hacked from a malevolent VM. The paper is straightforward and I post it here for your delectation: http://eprint.iacr.org/2014/248.pdf. In general, the paper says that if you are co-located on the same metal, malevolent clients can grab all your AES crypto key info from your VM app and penetrate your other infrastructure via the Bernstein correlation methods. There are ways to prevent it, but like everything else it is costly (not every chipset has AES-NI enabled) or counter to the purpose of public clouds (no co-location). Security is hard to do on cloud infrastructure. It's the reason why I was told to stop working with "sensitive" "big" data on the cloud for now.

  • Join us for our latest Messaging News / DSC Webinar on July 17th, 2014

    Space is limited.

    Reserve your Webinar seat now
     
    Join us July 17th at 9am PDT for our latest Messaging News Webinar Series: Rethinking Email Security: Best Practices to Protect and Maintain Private Communications, Sponsored by Voltage Security.

    2014 has been called “The Year of Encryption”. The recent data breaches, personal identity theft cases, and email snooping concerns have put a spotlight on the importance of protecting sensitive data, both inside and outside the enterprise. Email is invaluable to enterprises. It’s the easiest mode of communication which also makes it an easy target for data theft.

    In this webinar we will explore the key concepts and best practices to protect and maintain private email communications and why you need to rethink your email security. You will learn:
    • Best practices for securing sensitive email communications.
    • When is the right time to replace your legacy email security.
    • How to protect sensitive email information as it flows to and from the enterprise.
    • Latest insights on Identity-Based Encryption and Stateless Key Management.
    • How other enterprises deploy the world’s most popular email security solution.
    Panelists:
    Michael Osterman of Osterman Research
    Mark Schweighardt of Voltage Security

    Hosted by: Tim Matteson, Cofounder, Messaging News
     
    Title:  Rethinking Email Security: Best Practices to Protect and Maintain Private Communications
    Date:  Thursday, July 17th, 2014
    Time:  9:00 AM - 10:00 AM PDT
     
    After registering you will receive a confirmation email containing information about joining the Webinar.
  • I am doing some research to compress data available as tables (rows and columns, or cubes) more efficiently. This is the reverse data science approach: instead of receiving compressed data and applying statistical techniques to extract insights, here we look at uncompressed data, extract all possible insights, and eliminate everything but the insights, to compress the data.

    In this process, I was wondering if one can design an algorithm that can compress any data set by at least one bit. Intuitively, the answer is clearly no, otherwise you could recursively compress any data set down to 0 bits. Any algorithm will compress some data sets, and make some other data sets bigger after compression. Data that looks random, that has no pattern, cannot be compressed. I have seen contests offering an award if you find a compression algorithm that defeats this principle, but participating would be a waste of time.
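
    The intuition above is a counting argument, and it can be checked by brute force for small sizes. The sketch below (function names are mine, not part of the original post) counts the bit strings of a given length and the bit strings that are strictly shorter: there is always one more input than there are shorter outputs, so a lossless, one-to-one compressor cannot shrink every input.

```javascript
// Number of distinct bit strings of exactly `bits` bits.
function countInputs(bits) {
  let total = 1;
  for (let i = 0; i < bits; i++) total *= 2;
  return total;
}

// Number of distinct bit strings strictly shorter than `bits` bits
// (lengths 0, 1, ..., bits - 1), counted length by length.
function countShorterOutputs(bits) {
  let total = 0;
  for (let len = 0; len < bits; len++) {
    total += countInputs(len);
  }
  return total;
}

for (let bits = 1; bits <= 8; bits++) {
  // Each line shows: length, number of inputs, number of shorter outputs.
  console.log(bits, countInputs(bits), countShorterOutputs(bits));
}
```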

    But what if you design an algorithm that, when a data set cannot be compressed, leaves the data set unchanged? Would you then be able, on average, to compress any data set? Note that if you assemble numbers together to create a data set, the resulting data set would be mostly random. In fact, the vast majority of all data sets are almost random and not compressible. Data sets resulting from experiments are usually not random, but they represent a tiny minority of all potential data sets. In practice, this tiny minority represents all the data sets that data scientists are confronted with.

    It turns out that the answer is no. Even if you leave incompressible data sets "as is" and compress those that can be compressed, on average, the compression gain (of any data compression algorithm) will be negative. The explanation is as follows: you need to add 1 bit to every data set; this extra bit tells you whether the data set is compressed using your algorithm or left uncompressed. This extra bit makes the whole thing impossible. Interestingly, patents have been granted that claim all data can be compressed. These are snake oil (according to the author of the gzip compression tool); it is amazing that they were approved by the patent office.

    Anyway, here's the mathematical proof, in simple words.

    Theorem

    There is no algorithm that, on average, will successfully compress any data set, even if it leaves incompressible data sets uncompressed.

    Proof

    Let y be a multivariate vector with integer values, representing the compressed data. Let's say that y can take on m different values. Let x be the original data; for any x, y = f(x) represents the compressed data.

    Let S be the set of the k distinct values that the uncompressed data x can take. How many elements x of S can be compressed efficiently by f? Let's denote this number as n. In other words, how large can n be, if the uncompressed data can take on k potential values? Note that n depends on k and m. Now we need to prove that:

    [1] n * (1 + log2 m) + (k - n) * (1 + log2 k) ≥ k log2 k

    where: 

    • log2 m is the number of bits required to store the compressed data
    • log2 k is the number of bits required to store the uncompressed data 
    • the number 1 corresponds to the extra bit necessary to tell whether we store the data in compressed or uncompressed format
    • k log2 k represents the number of bits required to store ALL data sets of size k, as is, without using any compression whatsoever
    • n * (1 + log2 m) + (k - n) * (1 + log2 k) represents the number of bits required to store ALL data sets, compressing data sets if and only if efficient, leaving them uncompressed when compression is inefficient
    • n is the number of data sets (out of k) that can be compressed efficiently
    • log2 is the logarithm, in base 2

    The proof consists of showing that the left-hand side of inequality [1] is never smaller than the right-hand side (k log2 k).

    In practice, m < k, otherwise the result is obvious and meaningless (if m > k, it means that your compression algorithm ALWAYS increases the size of the initial data set, regardless of the data set). As a result, we have

    [2] n ≤ m, and n ≤ k

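    Expanding the left-hand side of [1] gives k log2 k + k + n * log2 (m / k), so inequality [1] can be written as n * log2 (m / k) + k ≥ 0. Since m < k, log2 (m / k) is negative, and this is equivalent to

    [3] n ≤ k / log2 (k / m).

    Inequality [3] is always verified when m < k and [2] is satisfied. Indeed, log2 r < r for any r > 0; taking r = k / m gives m * log2 (k / m) < k, that is, m < k / log2 (k / m). Since n ≤ m by [2], we get n < k / log2 (k / m), and the theorem is proved.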
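
    Inequality [1] can also be checked numerically. The sketch below is only a sanity check over small values, not a replacement for the proof; the function names and the range of k are arbitrary choices of mine. It evaluates both sides of [1] for every n ≤ m < k.

```javascript
// Left-hand side of [1]: n data sets stored compressed (log2 m bits each),
// k - n stored as is (log2 k bits each), plus 1 flag bit per data set.
function bitsWithFlag(n, m, k) {
  return n * (1 + Math.log2(m)) + (k - n) * (1 + Math.log2(k));
}

// Right-hand side of [1]: all k data sets stored as is, with no flag bit.
function bitsUncompressed(k) {
  return k * Math.log2(k);
}

// Check [1] for all 2 <= k <= maxK, 1 <= m < k and 0 <= n <= m.
function inequalityHolds(maxK) {
  for (let k = 2; k <= maxK; k++) {
    for (let m = 1; m < k; m++) {
      for (let n = 0; n <= m; n++) {
        // Small tolerance for floating-point rounding.
        if (bitsWithFlag(n, m, k) < bitsUncompressed(k) - 1e-9) return false;
      }
    }
  }
  return true;
}

console.log(inequalityHolds(64));
```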

  • Proposal for bulk email processing

    Bulk email represents one of the largest portions of legitimate emailing (spam is not included in this category). Sending bulk email requires a lot of bandwidth, and technical expertise to obtain high delivery rates. Newsletters that you subscribe to are typically sent via newsletter management companies, such as Vertical Response, MailChimp, Constant Contact or iContact. It is also expensive, at $10,000 per year to manage a mailing list of 100,000 subscribers (including mailing, unsubscribes, reporting, A/B testing, resolving issues with ISPs and blacklisting services such as Spamhaus, and so on).

    What if Gmail, Yahoo Mail and Hotmail (they account for more than 60% of email addresses targeted by bulk email) offered the following service to make bulk emailing less bandwidth-consuming and easier to monitor? Any time you send a newsletter to more than (say) 50,000 Gmail recipients, here is how it would work:

    1. You upload (automatically, via an API) your list of Gmail addresses to a specified Google server
    2. You email your message to a single gmail address (managed by Google for this purpose) 
    3. Google then distributes your single message to all 50,000 Gmail recipients that are in your list.

    This achieves the following goal: Gmail actually distributes the message (not you), using Google servers that are close to their Gmail servers. There is also one fewer node between the sender (the mailing list management company) and the recipient, thus saving considerable bandwidth. In short, it benefits the sender, Gmail, and the recipient (the latter benefits thanks to Gmail's better monitoring capability, which can block a message when deemed spammy).

    There is a problem: what if you send a customized email to 50,000 recipients? For instance, the message starts with "Hi [Your Name]". The workaround is simple: Gmail could accept a few macros in your message, such as [Your Name], and deliver the customized version to all 50,000. All that is needed is a very rudimentary macro language. And of course, for this type of customized message, the mailing list uploaded on Google servers must contain not only the email address but also the first name.
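
    A very rudimentary macro language of this kind could be as simple as the following sketch. The bracket syntax comes from the example above; the function name, the shape of the recipient records and the sample addresses are assumptions made for illustration, not an existing Gmail feature.

```javascript
// Replace each [Macro Name] in the template by the matching field of the
// recipient record; unknown macros are left untouched.
function expandMacros(template, recipient) {
  return template.replace(/\[([^\]]+)\]/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(recipient, name) ? recipient[name] : match
  );
}

// Hypothetical uploaded list: email address plus first name.
const mailingList = [
  { email: "ann@example.com", "Your Name": "Ann" },
  { email: "bob@example.com", "Your Name": "Bob" },
];

const template = "Hi [Your Name], here is this week's newsletter.";
for (const recipient of mailingList) {
  console.log(recipient.email + ": " + expandMacros(template, recipient));
}
```

    The sender still transmits a single message; only the expansion step runs once per recipient, on the mailbox provider's side.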

  • This is potentially one of the worst nightmares for security experts. This type of fraud has been observed in the context of click fraud, but the payload potential is far bigger if it ever gets implemented to steal bank account login/password.

    About the scheme:

    An infected user (his computer has been infected by a virus and, say, Firefox is now corrupt on his computer) tries to log on to his bank account. He types the correct domain name (say www.key.com) in the Firefox URL box, and the real key.com webpage shows up. But when the key.com page shows up in the browser, everything is legit except the key.com login box, which was substituted on the fly by a script on the hijacked computer, planted by a botnet client who wants to access the bank account to make wire transfers to his own account.

    Once you enter your login/password in the box, your info gets transferred to the criminals. If the criminals are smart enough, you won't notice anything: after entering your credentials, maybe you get served a genuine key.com error page, but it's too late: the criminals have your login/password and are now wiring all your money to external bank accounts.

    A potential strategy for criminals to make this system more effective is to have the botnet operator send millions of email messages to users known to be infected by its botnet. The botnet operator just has to send a message (that will look very legitimate) providing the real URL for you to sign in to your real key.com account, knowing that your browser is infected.

    While I haven't seen any scheme like this so far (involving hijacking your bank account via browser sign-on Trojan through browser infection), I've seen the exact same scheme used in the context of click fraud, deployed by a company known as MediaForce.com, still operating as of today, substituting genuine banner ads by fake ones - to promote their porn and Viagra ads from their clients.

  • Here's a new idea for Google to make money and cover the costs of processing / filtering billions of messages per day.

    This is also a solution to eliminate spam, with fewer false positives than we currently see.

    Solution: Google to create its own newsletter management system!

    Or at least, Google works with major providers (Vertical Response, Constant Contact, iContact, Mail Chimp etc.) to allow their clients (the companies sending billions of messages each day, such as LinkedIn) to pay a fee based on volume. The fee would help the sender to not end up in Gmail spam box, as long as it complies with Google policies. Even better: let Google offer this newsletter management service directly to clients who want to reach Gmail more effectively, under Google's controls and conditions.

    I believe Google is now in position to offer this service, as more than 50% of new personal email accounts currently created are Gmail, and they last much longer than any corporate email accounts (you don't lose your Gmail account when you lose your job). Indeed, we would be one of the first clients to sign up with Gmail Contact (that's the name I have invented for the Google newsletter management service). Google could reasonably charge $100 per 20,000 messages sent to Gmail accounts: the potential revenue is huge.

    If Google would offer this service internally (rather than through a 3rd party such as Constant Contact), they would make more money and have more control, and the task of eliminating spam would be easier and less costly.

    Currently, since Google offers none of these services, we face the following issues:

    • A big component in Gmail anti-spam technology is collaborative filtering algorithms: your newsletter quickly ends up in the spam box, a few milliseconds after the delivery process has started, if too many users complain about it, do not open it, or don't click
    • Thus fraudsters can create tons of fake Gmail accounts to boost the "open" and "click" rates so that their spam goes through, leveraging collaborative filtering to their advantage
    • Fraudsters can also use tons of fake Gmail accounts to fraudulently and massively flag email received from real companies or competitors as spam.
    • Newsletters are delivered way too fast: 100,000 messages are typically delivered in 5 minutes by newsletter management companies. If Gmail were delivering these newsletters via its own system (say Gmail Contact), then it could deliver much more slowly, and thus do a much better job at controlling spam without creating tons of false positives.

    In the meantime, a solution for companies regularly sending newsletters to a large number of subscribers is to:

    1. Create a special segment for all Gmail accounts, and use that segment more sparingly. In our case, it turns out that our Gmail segment is the best one (among all our segments), in terms of low churn, open and click rate - if we do not use it too frequently, and reserve it for our best messages.
    2. Ask your newsletter management vendor to use a dedicated IP to send messages
    3. Every three months, remove all subscribers who never open, or even those who never click (though you will lose good subscribers whose email clients have images turned off)
    4. Create SPF records.
  • Guest blog by Vincent Granville, first posted here.

    Here's some simple JavaScript code to encode numbers, such as credit card numbers, passwords made up of digits, phone numbers, social security numbers, dates such as 20131014 etc.

    How does it work?

    1. Open our web app in a different browser tab
    2. Enter number to encode / decode in box, on the web page in question
    3. Select Encrypt / Decrypt
    4. Email the encoded number (it should start with e) to your contact
    5. Your contact uses the same form, enters the encoded number, selects Encrypt / Decrypt, and then the original number is immediately retrieved.

    This code is very simple; it is by no means strong encryption. It is indeed less sophisticated than uuencode. But uuencode is for geeks, while our app is easy to use for mainstream users. The encoded value is also a text string, easy to copy and paste in any email client. The encoded value has some randomness, in the sense that encoding the same value twice will result in two different encoded values. Finally, it is more secure than it seems at first glance, if you don't tell anyone (except over the phone) where the decoder can be found. I will create a version that accepts parameters, to make it even more secure.

    Here's the JavaScript / HTML code for those interested (this is the source code of the web page where our application is hosted). You could save it as an HTML document on your local machine, with file name (say) encode.html in a folder (say) C:\Webpages, and then open and run it from a browser on your local machine: the URL for this local webpage would be file:///C:/Webpages/encode.html if you use Chrome.

    <html>
    <script language="Javascript">
    <!--
    function encrypt2() {
      var form=document.forms[0] 
      if (form.encrypt.checked) {
        form.cardnumber.value=crypt(form.cardnumber.value)
      } else {
        form.cardnumber.value=decrypt(form.cardnumber.value) 
      }
    }
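    // Encode a string of digits. The output starts with "e" so that decrypt()
    // can recognize it; input already starting with "e" is returned unchanged.
    function crypt(string) {
      var len=string.length
      var intCarlu, newIntCarlu, rnd, carlu
      var newString="e"
      if ((len>0)&&(string.charCodeAt(0)!=101)) {
        for (var i=0; i<len; i++) {
          intCarlu=string.charCodeAt(i)
          // Random multiple of 10: changes the output, not its value modulo 10
          rnd=Math.floor(Math.random()*7)
          newIntCarlu=30+10*rnd+intCarlu+i-48
          // Shift by multiples of 10, away from control and punctuation characters
          if (newIntCarlu<48) { newIntCarlu+=50 }
          if (newIntCarlu>=58 && newIntCarlu<=64) { newIntCarlu+=10 }
          if (newIntCarlu>=90 && newIntCarlu<=96) { newIntCarlu+=10 }
          carlu=String.fromCharCode(newIntCarlu)
          newString=newString.concat(carlu)
        }
        return newString
      } else {
        return string
      }
    }
    // Decode a string produced by crypt(); anything else is returned unchanged.
    function decrypt(string) {
      var len=string.length
      var intCarlu
      var carlu
      var newString=""

      if (string.charCodeAt(0)==101) {
        // Start at 1 to skip the "e" marker; character i encodes digit i-1
        for (var i=1; i<len; i++) {
          intCarlu=string.charCodeAt(i)
          // Remove the position offset; the digit is what remains modulo 10
          carlu=String.fromCharCode(48+(intCarlu-i+1)%10)
          newString=newString.concat(carlu)
        }
        return newString
      } else {
        return string
      }
    }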
    // -->
    </script>


    <form>
    Enter Number <input type=text name=cardnumber size=19><p>
    Encrypt / Decrypt <input type=checkbox name=encrypt onClick="encrypt2()">
    </form> 
    </html>
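
    To check the round trip outside the browser, the same arithmetic can be restated as two standalone functions and run, for instance, under Node.js. This is a compact restatement for testing purposes; the names encodeDigits and decodeDigits are mine and do not appear in the page above.

```javascript
// Same arithmetic as crypt(): each digit is shifted by its position and by a
// random multiple of 10, then moved away from punctuation character codes.
function encodeDigits(digits) {
  let out = "e";
  for (let i = 0; i < digits.length; i++) {
    const rnd = Math.floor(Math.random() * 7);
    let code = 30 + 10 * rnd + (digits.charCodeAt(i) - 48) + i;
    if (code < 48) code += 50;
    if (code >= 58 && code <= 64) code += 10;
    if (code >= 90 && code <= 96) code += 10;
    out += String.fromCharCode(code);
  }
  return out;
}

// Same arithmetic as decrypt(): skip the "e" marker, remove the position
// offset, and keep the value modulo 10.
function decodeDigits(encoded) {
  let out = "";
  for (let i = 1; i < encoded.length; i++) {
    out += String.fromCharCode(48 + ((encoded.charCodeAt(i) - i + 1) % 10));
  }
  return out;
}

const sample = "20131014";
const first = encodeDigits(sample);
const second = encodeDigits(sample);
// The two encodings usually differ, yet both decode to the original digits.
console.log(first, second, decodeDigits(first), decodeDigits(second));
```

    Every adjustment is a multiple of 10, which is why the decoder can ignore the random part entirely. It is also why the scheme offers obfuscation rather than secrecy: anyone with the decoder recovers the number without needing a key.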

  • A new type of weapons-grade secure email

    Guest blog by Vincent Granville, first posted here.

    With email encryption being targeted by the government as if it was criminal activity (read the story about the Lavabit platform), this could be a great opportunity for mathematicians and data scientists: creating a startup that offers encrypted email that no government or entity could ever decrypt, offering safe solutions to corporations who don't want their secrets stolen by competitors, criminals or the government.

    Here's the kind of email platform that I have in mind:

    • It is offered as a web app, for text-only messages limited to 100 KB. You copy and paste your text on some web form hosted on some web server (referred to as A). You also create a password for retrieval, maybe using a different app that creates long, random, secure passwords. When you click on submit, the text is encrypted and made accessible on some other web server (referred to as B). A shortened URL is displayed on your screen: that's where you or the recipient can read the encrypted text.
    • You call (or fax) the recipient, possibly from and to a public phone, provide him with the shortened URL and password necessary to retrieve and decrypt the message. 
    • The recipient visits the shortened URL, enters your password, and can read the unencrypted message online (on server B). The encrypted text is deleted once the recipient has read it, or 48 hours after the encrypted message was created, whichever comes first.
    • The encryption algorithm (which adds semi-random text to your message prior to encryption, also has an encrypted time stamp, and won't work if no semi-random text is added first) is such that (i) the message can never be decrypted after 48 hours (if the encrypted version is intercepted), as a self-destruction mechanism is embedded into the encrypted message and into the executable file itself, and (ii) if you encrypt the same message twice (even an empty message or one consisting of just one character), the two encrypted versions will be very different, of random length and at least 1 KB in size, to make reverse-engineering next to impossible. Maybe the executable file that performs the encryption would change every 3-4 days for increased security and to make sure a previously encrypted message can no longer be decrypted (you would have the old version and new version simultaneously available on B for just 48 hours).
    • The executable file (on A) tests if it sits on the right IP address before doing any encryption, to prevent it from being run on (say) a government server. This feature is encrypted within the executable code. The same feature is incorporated into the executable file used to decrypt the message, on B.
    • A crime detection system is embedded in the encryption algorithm, to prevent criminals from using the system, by detecting and refusing to encrypt messages that seem suspicious (child pornography, terrorism, fraud, hate speech etc.)
    • The platform is monetized via paid advertising, by advertisers such as bitcoin and anti-virus software.
    • The URL associated with B can be anywhere, change all the time, or be based on the password provided by the user, and be located outside the US.
    • The URL associated with A must be more static. This is a weakness as it can be taken down by the government. However a workaround consists in using several specific keywords for this app, such as (say) ArmuredMail, so that if A is down, a new website based on the same keywords will emerge elsewhere, allowing for uninterrupted service (the user would have to do a Google search for ArmuredMail to find one website - a mirror of A - that works).
    • Finally, no unencrypted text is stored anywhere.
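
    Part of this wish list can be sketched with standard building blocks. The code below is not the algorithm described above, which is left unspecified; it swaps in AES-256-GCM with a password-derived key from Node.js's built-in crypto module, adds random-length padding so that two encryptions of the same message differ and exceed 1 KB, and embeds a time stamp that the decrypting code checks. Function names and the payload layout are assumptions, and the 48-hour limit is only enforced by this decryption routine. The IP address check and the crime detection system are not covered.

```javascript
const crypto = require("node:crypto");

const MAX_AGE_MS = 48 * 60 * 60 * 1000; // 48-hour lifetime from the proposal

// Encrypt `text` with a key derived from `password`; returns a base64 string.
function encryptMessage(text, password, now = Date.now()) {
  const salt = crypto.randomBytes(16); // fresh salt: new key for every message
  const iv = crypto.randomBytes(12);   // fresh nonce for AES-GCM
  const key = crypto.scryptSync(password, salt, 32);
  // Random padding of random length (at least 1 KB), added before encryption.
  const pad = crypto.randomBytes(1024 + crypto.randomInt(1024)).toString("base64");
  const payload = JSON.stringify({ createdAt: now, pad, text });
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(payload, "utf8"), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), body]).toString("base64");
}

// Decrypt a message; throws on a wrong password, tampering, or expiry.
function decryptMessage(blob, password, now = Date.now()) {
  const raw = Buffer.from(blob, "base64");
  const salt = raw.subarray(0, 16);
  const iv = raw.subarray(16, 28);
  const tag = raw.subarray(28, 44);
  const body = raw.subarray(44);
  const key = crypto.scryptSync(password, salt, 32);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  const plain = Buffer.concat([decipher.update(body), decipher.final()]);
  const payload = JSON.parse(plain.toString("utf8"));
  // The time stamp is inside the encrypted payload, so it cannot be edited.
  if (now - payload.createdAt > MAX_AGE_MS) throw new Error("message expired");
  return payload.text;
}

const secret = encryptMessage("Meet at noon.", "a long random password");
console.log(decryptMessage(secret, "a long random password"));
```

    In this sketch, someone who intercepts the ciphertext and knows the password can still decrypt it after 48 hours with their own code, so the proposal's stronger self-destruction property would need something beyond this, such as deleting the message from server B.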

    Indeed, the government could create such an app and disguise it as a private enterprise: it would in this case be a honeypot app. Some people worry that the government is tracking everyone and that you could get in trouble (your Internet connection shut down, bank account frozen) because you posted stuff that the government algorithms deem extremely dangerous, maybe a comment about pressure cookers. At the same time, I believe the threat is somewhat exaggerated. While there is a risk of false positives, you will never be sent to jail for talking about pressure cooker recipes (at worst, you'll get a visit from the NSA; someone indeed did). While big data and big brother are getting bigger and more powerful every second, the number of available cells in prison is not increasing. Maybe it is even decreasing. So even if, magically, millions of people suddenly wanted to become law enforcement, NSA, CIA or FBI agents (and the money was available to train and hire them), there are simply not enough prison cells to accommodate more prisoners (the US has the largest prison population of any country, measured as the proportion of people incarcerated at any given time).

    On the other hand, many people seem to be OK with increased regulations and more police. I think this is a side effect of living in an over-crowded world, with unsustainable population growth: the younger generation accepts or is forced into a lower quality of life, having to share a small apartment with many roommates in over-crowded cities. They are more risk-averse on average, and worry about all sorts of real issues such as increased terrorism, the risk of an epidemic, giant financial systems that could collapse under their own weight, pollution killing people at a younger age, etc. I believe people will eventually find solutions to escape from this environment, maybe by building floating cities, cities under the sea, or underground cities. In my case, after many years of cubicle life and the morning and afternoon rat race (AKA the commute), I no longer drive to work, and have a much better lifestyle working from home 100% of the time, in the safest job one could ever wish to have: one that you created yourself, an adaptive, lean, agile enterprise that you founded yourself with a few great partners. But this is another story.

    Anyone interested in building this encryption app? Note that no system is perfectly safe. If there's an invisible camera behind you, filming everything you do on your computer, then my system offers no protection for you - though it would still be safe for the recipient, unless he also has a camera tracking all his computer activity. But the link between you and the recipient (the fact that both of you are connected) would be invisible to any third party. And increased security can be achieved if you use the web app from an anonymous computer - maybe from a public computer in some hotel lobby.

  • Originally posted on Analyticbridge.

    Here are two multiple-choice questions that could be used to uniquely characterize each human that will ever exist on Earth. Even twins will have different answers. No two human beings are expected to have the same answers.

    First question: Order the following types of food, from your favorite (#1) to the one you like least (#9). Possible choices: fruit, vegetable, dairy, carbohydrate, red meat, poultry, fish, seafood, dessert.

    Second question: Order the following types of environment, from your favorite (#1) to the one you like least (#9). Possible choices: beach, mountain, desert, plain / rural, urban, small town, lake / river bank, hills, forest.

    The number of potential answers (that is, the number of potential orderings) for each question is factorial 9. The total number of potential answers for both questions is square of factorial 9, that is 132 billion.

    Of course some combinations are more likely to appear than others, some people will have a hard time ranking and would rather allow for ties, and if you've lived all your life in the same place eating the same food, you can't correctly answer these questions. Same if you are a little kid. But for most of us, this works and could even be used by companies such as match.com or advertisers. Also, this type of ID has the following advantages:

    • It is universal (it could even apply to dogs),
    • It is personal unlike arbitrary social security numbers,
    • You know what's in your ID (government IDs such as SSN might be hiding some encoded data about you, in your ID, for profiling purposes) 
    • It's easy to retrieve if lost (at least partially, which might be good enough) by answering the two questions
    • Unlike the genome, this ID is (to a large extent) independent of gender and race (or age)

    It may change over time as tastes change, but I think this is OK, your ID follows your personality. You might want to add a third question (maybe about favorite colors or climates) to increase the discriminating power, but I think it is not necessary.

    Potential Improvement

    Another option is to have more questions with fewer choices. For instance, 8 questions each with 4 choices (rather than 2 questions, each with 9 choices) would allow for pretty much the same number of unique IDs (a bit above 100 billion) but would be less error-prone, as people are more likely to correctly remember how they rank 4 items (e.g. colors) than 9 items. If you allow for only 2 choices per question, then you would need to ask 37 questions to cover 100+ billion unique IDs.
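
    The counts quoted in this post can be verified with a few lines of code. In the sketch below (function names are mine), each question asks for a full ranking of its choices, so a question with c choices has c! possible answers, and q such questions give (c!) to the power q possible IDs.

```javascript
// n! for a small non-negative integer n.
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// Number of distinct IDs when each of `questions` questions asks for a
// full ranking of `choices` items.
function idCount(questions, choices) {
  let total = 1;
  for (let q = 0; q < questions; q++) total *= factorial(choices);
  return total;
}

// Smallest number of questions (each ranking `choices` items) that yields
// at least `target` distinct IDs.
function questionsNeeded(choices, target) {
  let questions = 0;
  while (idCount(questions, choices) < target) questions++;
  return questions;
}

console.log(idCount(2, 9));              // 131681894400 (about 132 billion)
console.log(idCount(8, 4));              // 110075314176 (about 110 billion)
console.log(questionsNeeded(2, 100e9));  // 37
```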

    Experimental design to choose good questions and good choices 

    The possible choices (answers) should be determined using experimental design and testing, not the other way around. Let's say that your first question is about food, with two choices: fish versus dirt. You do a test, and you realize everybody ranks fish as #1. The test tells you that this is not a good question: there will be lots of people with the same ID. You change your choices from fish/dirt to fish/meat. Now you see that the distribution is more uniform. You continue testing until you have something good enough.

    You can even test choice stability: ask a person to rank 9 choices today and again in 7 days, and retain the choices that

    1. are most stable over time and
    2. provide an even distribution (or as close as possible to uniform distribution)
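
    One possible way to quantify "as close as possible to uniform" is the Shannon entropy of the observed answers, divided by its maximum value. This metric is my suggestion rather than something prescribed above, and the function name and sample answers are made up for illustration. A score of 0 means everybody gave the same answer; a score of 1 means the answers are spread evenly.

```javascript
// Shannon entropy of the observed answers, normalized by log2(numOptions),
// where numOptions is the number of possible answers to the question.
function normalizedEntropy(answers, numOptions) {
  const counts = new Map();
  for (const answer of answers) {
    counts.set(answer, (counts.get(answer) || 0) + 1);
  }
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / answers.length;
    entropy -= p * Math.log2(p);
  }
  return entropy / Math.log2(numOptions);
}

// Fish versus dirt: everybody ranks fish first, so the question is useless.
console.log(normalizedEntropy(["fish", "fish", "fish", "fish"], 2));
// Fish versus meat: answers split evenly, so the question discriminates well.
console.log(normalizedEntropy(["fish", "meat", "meat", "fish"], 2));
```

    For a ranking question, each answer would be the full ordering (for instance the ranked items joined into one string) and numOptions the number of possible orderings.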

  • Useful if you occasionally send/request credit card information by email or as text messages. This article was first published on DataScienceCentral.

    Here's some simple JavaScript code to encode numbers, such as credit card numbers, passwords made up of digits, phone numbers, social security numbers, dates such as 20131014 etc.

    How does it work?

    1. Open our web app in a different browser tab
    2. Enter number to encode / decode in box, on the web page in question
    3. Select Encrypt / Decrypt
    4. Email the encoded number (it should start with e) to your contact
    5. Your contact uses the same form, enters the encoded number, selects Encrypt / Decrypt, and then the original number is immediately retrieved.

    This code is very simple; it is by no means strong encryption. It is indeed less sophisticated than uuencode. But uuencode is for geeks, while our app is easy to use for mainstream users. The encoded value is also a text string, easy to copy and paste in any email client. The encoded value has some randomness, in the sense that encoding the same value twice will result in two different encoded values. Finally, it is more secure than it seems at first glance, if you don't tell anyone (except over the phone) where the decoder can be found. I will create a version that accepts parameters, to make it even more secure.

    Here's the JavaScript / HTML code for those interested (this is the source code of the web page where our application is hosted). You could save it as an HTML document on your local machine, with file name (say) encode.html in a folder (say) C:/Webpages, and then open and run it from a browser on your local machine: the URL for this local webpage would be file:///C:/Webpages/encode.html if you use Chrome.

    <html>
    <script language="Javascript">
    <!--
    function encrypt2() {
      var form=document.forms[0] 
      if (form.encrypt.checked) {
        form.cardnumber.value=crypt(form.cardnumber.value)
      } else {
        form.cardnumber.value=decrypt(form.cardnumber.value) 
      }
    }
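    function crypt(string) {
      // Encode a string of digits; the result is prefixed with "e"
      var len=string.length
      var intCarlu
      var carlu
      var rnd, newIntCarlu
      var newString="e"
      // Skip empty input and input that already starts with "e" (char code 101)
      if ((string.charCodeAt(0)!=101)&&(len>0)) {
        for (var i=0; i<len; i++) {
          intCarlu=string.charCodeAt(i)
          // Random multiple of 10: changes the output character, not its value mod 10
          rnd=Math.floor(Math.random()*7)
          newIntCarlu=30+10*rnd+intCarlu+i-48
          // Shift by multiples of 10 to steer clear of most punctuation characters
          if (newIntCarlu<48) { newIntCarlu+=50 }
          if (newIntCarlu>=58 && newIntCarlu<=64) { newIntCarlu+=10 }
          if (newIntCarlu>=90 && newIntCarlu<=96) { newIntCarlu+=10 }
          carlu=String.fromCharCode(newIntCarlu)
          newString=newString.concat(carlu)
        }
        return newString
      } else {
        return string
      }
    }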
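    function decrypt(string) {
      // Decode a string produced by crypt(); other input is returned unchanged
      var len=string.length
      var intCarlu
      var carlu
      var newString=""

      // Only strings starting with "e" (char code 101) are treated as encoded
      if (string.charCodeAt(0)==101) { 
        for (var i=1; i<len; i++) {
          intCarlu=string.charCodeAt(i)
          // Remove the position offset; the remainder mod 10 is the original digit
          carlu=String.fromCharCode(48+(intCarlu-i+1)%10) 
          newString=newString.concat(carlu)
        }
        return newString
      } else {
        return string
      }
    }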
    // -->
    </script>


    <form>
    Enter Number <input type=text name=cardnumber size=19><p>
    Encrypt / Decrypt <input type=checkbox name=encrypt onClick="encrypt2()">
    </form> 
    </html>

    Read more…
  • How to benchmark a metric?

    In our IT activities, we have to deal with numbers, tests and metrics almost daily. How do you decide on a specific metric to measure some activity, such as spam score, server performance, or compression factor when archiving data? Sometimes the decision is straightforward (compression factor), sometimes not (spam score).
    Read more…
  • Originally posted on DataScienceCentral. We recommend sending a message similar to the following one to your subscribers, using a different mailing list management system, and testing with small batches of email addresses - 1,000 at a time, initially.

    If you have a Gmail account (and most of us do), you might have noticed a new feature that Google is slowly deploying across all users. Not everybody is impacted yet, though more and more users are every day, and it is still possible to revert to the old Gmail if you don't like it.

    Google, in the new Gmail version, displays three tabs at the top: Primary, Social and Promotions. Google automatically assigns a category to each message sent to your Gmail account. I invite you to check your Promotions tab: if you find any message (from us or other senders) that should be under the Primary tab, you can easily move the message from Promotions to Primary and even change your settings so that all messages from the sender in question are directed to your Primary tab. That way you will never miss our career alerts, weekly digests and other announcements (conferences, webinars, data science programs, visualization tools, new product releases, new books etc.).

    Here's what the new Gmail looks like, with the three tabs at the top:

    Read more…
  • A Few Encryption Ideas

    Originally posted on DataScienceCentral.

    With email encryption being targeted by the government as if it was criminal activity (read the story about the Lavabit platform shut down by the government because it was used by Edward Snowden in the recent NSA leak), this could be a great opportunity for mathematicians and data scientists: creating a startup that offers encrypted email that no government or entity could ever decrypt, offering safe solutions to corporations who don't want their secrets stolen by competitors, criminals or the government.

    Here's the kind of email platform that I have in mind:

    • It is offered as a web app, for text-only messages limited to 100 KB. You copy and paste your text on some web form hosted on some web server (referred to as A). You also create a password for retrieval, maybe using a different app that creates long, random, secure passwords. When you click on submit, the text is encrypted and made accessible on some other web server (referred to as B). A shortened URL is displayed on your screen: that's where you or the recipient can read the encrypted text.
    • You call (or fax) the recipient, possibly from and to a public phone, provide him with the shortened URL and password necessary to retrieve and decrypt the message. 
    • The recipient visits the shortened URL, enters your password, and can read the unencrypted message online (on server B). The encrypted text is deleted once the recipient has read it, or 48 hours after the encrypted message was created, whichever comes first.
    • The encryption algorithm (which adds semi-random text to your message prior to encryption, also has an encrypted time stamp, and won't work if no semi-random text is added first) is such that (i) the message can never be decrypted after 48 hours (if the encrypted version is intercepted), as a self-destruction mechanism is embedded into the encrypted message and into the executable file itself, and (ii) if you encrypt the same message twice (even an empty message or one consisting of just one character), the two encrypted versions will be very different, of random length and at least 1 KB in size, to make reverse-engineering next to impossible. Maybe the executable file that performs the encryption would change every 3-4 days for increased security and to make sure a previously encrypted message can no longer be decrypted (you would have the old version and new version simultaneously available on B for just 48 hours).
    • The executable file (on A) tests if it sits on the right IP address before doing any encryption, to prevent it from being run on (say) a government server. This feature is encrypted within the executable code. The same feature is incorporated into the executable file used to decrypt the message, on B.
    • A crime detection system is embedded in the encryption algorithm, to prevent criminals from using the system, by detecting and refusing to encrypt messages that seem suspicious (child pornography, terrorism, fraud, hate speech etc.)
    • The platform is monetized via paid advertising, by advertisers such as bitcoin and anti-virus software.
    • The URL associated with B can be anywhere, can change all the time or be based on the password provided by the user, and is located outside the US.
    • The URL associated with A must be more static. This is a weakness, as it can be taken down by the government. However, a workaround consists of using several specific keywords for this app, such as (say) ArmuredMail, so that if A is down, a new website based on the same keywords will emerge elsewhere, allowing for uninterrupted service (the user would have to do a Google search for ArmuredMail to find one website - a mirror of A - that works).
    • Finally, no unencrypted text is stored anywhere.
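
    Two of the properties listed above (random padding, so that encrypting the same message twice gives very different outputs of at least 1 KB, and a 48-hour lifetime based on an encrypted time stamp) can be sketched with standard building blocks. This is not the algorithm described above, only a minimal illustration under stated assumptions: it uses Node.js and its built-in crypto module, swaps in AES-256-GCM with an scrypt-derived key, and the names encryptMessage, decryptMessage and MAX_AGE_MS, as well as the padding range, are hypothetical. Note that the expiry here is enforced only by the decoding function; it is not a cryptographic self-destruction mechanism.

```javascript
const crypto = require('crypto');

const MAX_AGE_MS = 48 * 60 * 60 * 1000; // 48-hour lifetime, as in the list above

// Encrypt text with a password. A fresh salt, IV and random-length padding
// make two encryptions of the same text differ in content and in length.
function encryptMessage(text, password, now = Date.now()) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const key = crypto.scryptSync(password, salt, 32);
  const padLength = 1024 + crypto.randomInt(0, 1024); // hypothetical padding range
  const payload = JSON.stringify({
    createdAt: now, // the time stamp travels inside the encrypted payload
    text: text,
    pad: crypto.randomBytes(padLength).toString('base64'),
  });
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(payload, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([salt, iv, tag, body]).toString('base64');
}

// Decrypt and refuse to return messages older than MAX_AGE_MS.
// A wrong password or tampered data makes decipher.final() throw.
function decryptMessage(encoded, password, now = Date.now()) {
  const raw = Buffer.from(encoded, 'base64');
  const salt = raw.subarray(0, 16);
  const iv = raw.subarray(16, 28);
  const tag = raw.subarray(28, 44);
  const body = raw.subarray(44);
  const key = crypto.scryptSync(password, salt, 32);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const plain = Buffer.concat([decipher.update(body), decipher.final()]);
  const payload = JSON.parse(plain.toString('utf8'));
  if (now - payload.createdAt > MAX_AGE_MS) {
    throw new Error('message expired');
  }
  return payload.text;
}
```

    In the setup described above, encryptMessage would run on server A and decryptMessage on server B, with the password shared over the phone.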

    Indeed, the government could create such an app and disguise it as a private enterprise: it would in this case be a honeypot app. Some people worry that the government is tracking everyone and that you could get in trouble (your Internet connection shut down, bank account frozen) because you posted stuff that the government algorithms deem extremely dangerous, maybe a comment about pressure cookers. At the same time, I believe the threat is somewhat exaggerated. While there is a risk of false positives, you will never be sent to jail for talking about pressure cooker recipes (at worst, you'll get a visit from the NSA - someone indeed did). While big data and big brother are getting bigger and more powerful every second, the number of available cells in prison is not increasing. Maybe it is even decreasing. So even if, magically, millions of people suddenly wanted to become law enforcement, NSA, CIA or FBI agents (and the money was available to train and hire them), there are simply not enough prison cells to accommodate more prisoners (the US has the largest prison population of any country, measured as the proportion of people incarcerated at any given time).

    On the other hand, many people seem to be OK with increased regulations and more police. I think this is a side effect of living in an over-crowded world, with unsustainable population growth: the younger generation accepts or is forced into a lower quality of life, having to share a small apartment with many roommates in over-crowded cities. They are more risk-averse on average, and worry about all sorts of real issues such as increased terrorism, the risk of an epidemic, giant financial systems that could collapse under their own weight, pollution killing people at a younger age, etc. I believe eventually people will find solutions to escape from this environment, maybe by building floating cities, cities under the sea, or underground cities. In my case, after many years of cubicle life and the morning and afternoon rat race (AKA the commute), I no longer drive to work, and have a much better lifestyle working from home 100% of the time - for the safest job one could ever wish to have: one that you created yourself, an adaptive, lean, agile enterprise that you founded yourself with a few great partners. But this is another story.

    Anyone interested in building this encryption app? Note that no system is perfectly safe. If there's an invisible camera behind you, filming everything you do on your computer, then my system offers no protection for you - though it would still be safe for the recipient, unless he also has a camera tracking all his computer activity. But the link between you and the recipient (the fact that both of you are connected) would be invisible to any third party. And increased security can be achieved if you use the web app from an anonymous computer - maybe from a public computer in some hotel lobby.

    Read more…
  • This is related to data encryption and security. The article was originally posted on DataScienceCentral. Imagine that you need to transmit the details of a patent or a confidential financial transaction over the Internet. There are three critical issues:
    Read more…
  • Teleworking: Good or Bad?

    There is an interesting June 24, 2013 piece from David Amerland on the problems associated with teleworking entitled The Real Problem In Working From Home (It’s Not What You Think). Among the good points made by the author are:
    Read more…