20 Conversion Optimization Tips for Zooming Past Your Competition

Conversion rate optimization (CRO) is one of the most impactful things you can do as a marketer.

I mean, bringing traffic to a website is important (because without traffic you’re designing for an audience of crickets). But without at least a basic understanding of conversion optimization (including research, data-driven hypotheses, A/B tests, and analytics), you risk making decisions about your website traffic using only gut feel.

CRO can give your marketing team ideas for converting more visitors into leads or customers, and A/B tests can help you discover which experiences truly perform best.

However, as with many marketing disciplines, conversion optimization is constantly misunderstood. It’s definitely not about testing button colors, and it’s not about proving to your colleagues that you’re right.

I’ve learned a lot about how to do CRO properly over the years, and below I’ve compiled 20 conversion optimization tips to help you do it well, too.

Conversion Optimization Tip 1:
Learn how to run an A/B test properly

Running an A/B test (an online controlled experiment) is one of the core practices of conversion optimization.

Testing two or more variations of a page to see which performs best can seem easy, because testing software keeps getting simpler. However, it’s still a methodology that uses statistical inference to decide which variant is best to deliver to your audience, and there are a lot of fine distinctions that can throw things off.

There are many nuances we could get into here (Bayesian vs. frequentist statistics, one-tailed vs. two-tailed tests, etc.), but to keep things simple, here are a few testing rules that should help you avoid the most common testing mistakes:

  • Always determine a sample size in advance and wait until your experiment is over before looking at “statistical significance.” You can use one of several online sample size calculators to get yours figured out.
  • Run your experiment for a few full business cycles (usually weekly cycles). A normal experiment may run for three or four weeks before you call your result.
  • Choose an overall evaluation criterion (or north star metric) that you’ll use to determine the success of an experiment. We’ll get into this more in Tip 4.
  • Before running the experiment, clearly write your hypothesis (here’s a good article on writing a true hypothesis) and how you plan to follow up on the experiment, whether it wins or loses.
  • Make sure your data tracking is implemented correctly so you’ll be able to pull the right numbers after the experiment ends.
  • Avoid interaction effects if you’re running multiple concurrent experiments.
  • QA your test setup and watch the early numbers for any wonky technical mistakes.
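To make the first two rules concrete, here is a minimal Python sketch of a fixed-horizon sample size estimate for a two-variant test on a conversion rate, using the standard two-proportion formula. It assumes a two-sided test at 95% confidence and 80% power; the function names, the 5% baseline, the 10% relative lift, and the 3,000 daily visitors are all hypothetical, and an online calculator may use a slightly different formula and give a slightly different number.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest relative change worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def weeks_to_run(n_per_variant: int, num_variants: int, daily_visitors: int) -> int:
    """Round the required duration up to whole weeks (full weekly business cycles)."""
    days = math.ceil(n_per_variant * num_variants / daily_visitors)
    return math.ceil(days / 7)


n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.10)
print(n, "visitors per variant;", weeks_to_run(n, 2, daily_visitors=3000), "weeks")
```

With these inputs the estimate lands a little above 31,000 visitors per variant, which at the assumed traffic rounds up to three full weekly cycles.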

I like to put all of the above fine details in an experiment document with a unique ID so that it can be reviewed later—and so the process can be improved upon with time.

An example of experiment documentation using a unique ID.
Tip 1: Ensure you take the time to set up the parameters of your A/B test properly before you begin. Early mistakes and careless testing can compromise the results.

Conversion Optimization Tip 2:
Learn how to analyze an A/B test

The ability to analyze your test after it has run is obviously important as well (and can be pretty nuanced depending on how detailed you want to get).

For instance, do you call a test a winner if it’s above 95% statistical significance? Well, that’s a good place to begin, but there are a few other considerations as you develop your conversion optimization chops:

  • Does your experiment have a sample ratio mismatch?
    Basically, if your test was set up so that 50% of traffic goes to the control and 50% goes to the variant, your end results should reflect this ratio. If the ratio is pretty far off, you may have had a buggy experiment. (Here’s a good calculator to help you determine this.)
  • Bring your data outside of your testing tool.
    It’s nice to see your aggregate data trends in your tool’s dashboard, and their math is a good first look, but I personally like to have access to the raw data. This way you can analyze it in Excel and verify the numbers yourself. You can also import your data to Google Analytics to view the effects on key segments.
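A sample ratio mismatch check is usually a chi-square goodness-of-fit test on the visitor counts. The sketch below is one way to compute it with only the Python standard library; the function name and visitor counts are hypothetical, and the alarm threshold is a judgment call (a strict cutoff such as p < 0.001 is common) rather than a fixed rule.

```python
import math


def srm_p_value(control_n: int, variant_n: int, expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test for sample ratio mismatch (two groups)."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (variant_n - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(50_000, 48_500))  # tiny p-value: this split looks suspicious
print(srm_p_value(50_000, 50_100))  # large p-value: consistent with a 50/50 split
```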

This can also open up opportunities for further insights-driven experiments and personalization. Does one segment react overwhelmingly positively to a test you’ve run? That might be a good opportunity to implement personalization.

Checking your overall success metric first (winner, loser, inconclusive) and then moving to a more granular analysis of segments and secondary effects is common practice among CRO practitioners.
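For that first step, checking the overall success metric, a pooled two-proportion z-test is one common frequentist way to label a result as a winner, loser, or inconclusive at the 95% level. This is a simplified sketch that assumes the test already reached its planned sample size (no peeking) and that conversions are independent per visitor; the function name and numbers are illustrative.

```python
import math
from statistics import NormalDist


def ab_test_result(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.05):
    """Pooled two-proportion z-test on the OEC; returns (label, relative lift, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    lift = (p_b - p_a) / p_a                      # relative lift of B over A
    if p_value >= alpha:
        label = "inconclusive"
    else:
        label = "winner" if p_b > p_a else "loser"
    return label, lift, p_value


# Control: 1,000 conversions from 20,000 visitors; variant: 1,120 from 20,000
print(ab_test_result(1_000, 20_000, 1_120, 20_000))
```

Here the variant converts at 5.6% versus 5.0%, a 12% relative lift with a p-value below 0.01, so it is labeled a winner.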

Here’s how Chris McCormick from PRWD explains the process:

Once we have a high level understanding of how the test has performed, we start to dig below the surface to understand if there are any patterns or trends occurring. Examples of this would be: the day of the week, different product sets, new vs returning users, desktop vs mobile etc.
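Once you have the raw export, a segment breakdown like this can be sketched as a simple group-by. The row format (segment, variant, converted) is an assumption about what your export looks like, and the tiny dataset is made up. With real data, keep in mind that slicing into many segments multiplies your comparisons, so a segment difference is best treated as a hypothesis for a follow-up test rather than a result.

```python
from collections import defaultdict


def conversion_by_segment(rows):
    """rows: iterable of (segment, variant, converted) tuples from a raw export."""
    counts = defaultdict(lambda: [0, 0])  # (segment, variant) -> [conversions, visitors]
    for segment, variant, converted in rows:
        cell = counts[(segment, variant)]
        cell[0] += int(converted)
        cell[1] += 1
    return {key: conv / visitors for key, (conv, visitors) in counts.items()}


rows = [
    ("mobile", "control", False), ("mobile", "variant", True),
    ("desktop", "control", True), ("desktop", "variant", False),
    ("mobile", "variant", False), ("desktop", "control", False),
]
for (segment, variant), rate in sorted(conversion_by_segment(rows).items()):
    print(f"{segment:8} {variant:8} {rate:.0%}")
```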

Also, there are tons of great A/B test analysis tools out there, like this one from CXL:

AB Test Calculator
Tip 2: Analyze your data carefully by ensuring that your sample ratio is correct. Then export it to a spreadsheet where you can check your overall success metric before moving on to more granular indicators.

Conversion Optimization Tip 3:
Learn how to design your experiments

At the beginning, it’s important to consider the kind of experiment you want to run. There are a few options in terms of experimental design (at least, these are the most common ones online):

  1. A/B/n test
  2. Multivariate test
  3. Bandit test

A/B/n test

An A/B/n test is what you’re probably most used to.

It splits traffic equally among two or more variants, and you determine which variant won based on its effect size (assuming that other factors like sample size and test duration were sufficient).
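Under the hood, an equal split has to be random across visitors but stable for each visitor, so someone who returns sees the same variant. One common approach is to hash a user ID together with an experiment ID into a bucket; the sketch below assumes that approach (individual testing tools may assign traffic differently), and the IDs are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment_id: str, variants: list[str]) -> str:
    """Deterministically bucket a user so they see the same variant on every visit."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


variants = ["A", "B", "C", "D"]
split = Counter(assign_variant(f"user-{i}", "exp-042", variants) for i in range(100_000))
print(split)  # each variant should receive roughly 25,000 users
```

Including the experiment ID in the hash keeps assignments in one experiment independent of assignments in another.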

ABCD Test Example
An A/B test with four variants.

Multivariate test

In a multivariate test, on the other hand, you can test several variables on a page and hope to learn what the interaction effects are among elements.

In other words, if you were changing a headline, a feature image, and a CTA button, in a multivariate test you’d hope to learn which is the optimal combination of all of these elements and how they affect each other when grouped together.
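A full-factorial multivariate test serves every combination of the changed elements as its own variant, which is why it needs far more traffic than an A/B test. This short sketch enumerates the combinations for the headline, image, and CTA example; the element values and the 80,000 monthly visitors are placeholders.

```python
from itertools import product

headlines = ["Headline A", "Headline B"]
hero_images = ["Photo", "Illustration"]
cta_buttons = ["Start free trial", "Get a demo"]

# Full factorial: every headline x image x CTA combination becomes its own variant.
combinations = list(product(headlines, hero_images, cta_buttons))
print(len(combinations))  # 2 x 2 x 2 = 8 variants

monthly_visitors = 80_000
print(monthly_visitors // len(combinations), "visitors per combination per month")
```

Three elements with two versions each already produce 8 variants, so each one gets only an eighth of the traffic.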

A Multivariate Test

Generally speaking, it seems like experts run about ten A/B tests for every multivariate test. The strategy I go by is:

  • Use A/B testing to determine best layouts at a more macro-level.
  • Use MVT to polish the layouts to make sure all the elements interact with each other in the best possible way.

Bandit test

Bandits are a bit different. They are algorithms that seek to automatically update their traffic distribution based on indications of which result is best. Instead of waiting for four weeks to test something and then exposing the winner to 100% traffic, a bandit shifts its distribution in real time.

Experimental Design: Bandits
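To show what “shifting its distribution in real time” can look like, here is a small simulation of Thompson sampling, one widely used bandit algorithm (the article doesn’t name a specific one, so this choice is an assumption). Each variant keeps a Beta posterior over its conversion rate, and each visitor is served whichever variant wins a random draw from those posteriors. The true conversion rates are invented for the simulation.

```python
import random


def thompson_sampling(true_rates, visitors=10_000, seed=7):
    """Simulate a Beta-Bernoulli Thompson sampling bandit over a few variants."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(visitors):
        # Draw a plausible conversion rate for each arm from its Beta(1 + s, 1 + f) posterior
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))       # serve the arm that looks best on this draw
        if rng.random() < true_rates[arm]:  # simulated visitor converts or not
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]  # traffic each arm received


print(thompson_sampling([0.04, 0.05, 0.07]))
```

Because the draws favor variants that have converted well so far, most of the simulated traffic should drift toward the 7% variant instead of staying split evenly.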

Bandits are great for campaigns where you’re looking to minimize regret, such as short-term holiday campaigns and headline tests. They’re also good for automation at scale and targeting, specifically when you have lots of traffic and targeting rules and it’s tough to manage them all manually.

Unfortunately, while they are simpler from an experimental design perspective, they are much harder for engineers to implement technically. This is probably why they’re less common in the general marketing space, but an interesting topic nonetheless. If you want to learn more about bandits, read this article I wrote on the topic a few years ago.

Tip 3: Consider the kind of experiment you want to run. Depending on your needs, you might run an A/B/n test, a multivariate test, a bandit test, or some other form of experimental design.

Conversion Optimization Tip 4:
Choose your OEC

Returning to a point made earlier, it’s important to choose which north star metric you care about: this is your OEC (Overall Evaluation Criterion). If you don’t state this and agree upon it up front as stakeholders in an experiment, you’re welcoming the opportunity for ambiguous results and cherry-picked data.

Basically, we want to avoid the problem of HARKing: hypothesizing after results are known.

Twitter, for example, wrote on their engineering blog that they solve this by stating their overall evaluation criterion up front:

One way we guide experimenters away from cherry-picking is by requiring them to explicitly specify the metrics they expect to move during the set-up phase….An experimenter is free to explore all the other collected data and make new hypotheses, but the initial claim is set and can be easily examined.
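One lightweight way to mirror this setup-phase rule in your own tooling is to record the OEC and any other expected metrics before launch, then tag every metric you analyze afterward. The class, field names, and metric names below are hypothetical; the point is only that anything not declared up front gets labeled exploratory instead of being reported as a finding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the plan can't be edited once the experiment starts
class ExperimentPlan:
    oec: str                     # the single decision metric, agreed up front
    expected_to_move: frozenset  # other metrics declared during setup


def label_findings(plan: ExperimentPlan, observed_metrics):
    """Tag each analyzed metric so exploratory results aren't reported as claims."""
    labels = {}
    for metric in observed_metrics:
        if metric == plan.oec:
            labels[metric] = "decision metric (OEC)"
        elif metric in plan.expected_to_move:
            labels[metric] = "pre-registered secondary"
        else:
            labels[metric] = "exploratory: new hypothesis, not a result"
    return labels


plan = ExperimentPlan(oec="trial_signups", expected_to_move=frozenset({"pricing_page_views"}))
print(label_findings(plan, ["trial_signups", "pricing_page_views", "blog_clicks"]))
```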

The term OEC was popularized by Ronny Kohavi at Microsoft, and he’s written many papers that include the topic, but the sentiment is widely known by people who run lots of experiments. You need to choose which metric really matters, and which metric you’ll make decisions with.

Tip 4: In order to avoid ambiguous or compromised data, state your OEC (Overall Evaluation Criterion) before you begin and hold yourself to it. And never hypothesize after results are known.

Conversion Optimization Tip 5:
Some companies shouldn’t A/B test

You can still do optimization without A/B testing, but not every company can or should run A/B tests.

It’s a simple mathematical limitation:

Some businesses just don’t have the volume of traffic or discrete conversion events to make it worth running experiments.

Getting adequate traffic to a test ultimately helps ensure its validity: you need enough visitors to reach your predetermined sample size before a test is fully cooked.
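A quick way to check whether you have enough traffic is to flip the sample size question around: given the visitors you can realistically send to each variant, what is the smallest relative lift you could reliably detect? The sketch below uses a common normal approximation (two-sided, 95% confidence, 80% power, baseline variance used for both arms); the 3% baseline conversion rate and 4,000 visitors per variant are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_lift(baseline_rate, visitors_per_variant, alpha=0.05, power=0.80):
    """Approximate smallest relative lift a two-variant test can reliably detect."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_mde = z * math.sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_variant)
    return absolute_mde / baseline_rate


# Hypothetical low-traffic site: 2,000 visitors a week, split 50/50, for 4 weeks
print(f"{minimum_detectable_lift(0.03, visitors_per_variant=4_000):.0%}")
```

Under these assumptions the test could only reliably detect a relative lift of roughly 36%. If realistic changes on your site move conversions far less than that, the test is unlikely to give you a trustworthy answer.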

In addition, even if you could squeeze out a valid test here and there, the marginal gains may not justify the costs compared with other marketing activities you could pursue.

That said, if you’re in this boat, you can still optimize. You can still set up adequate analytics, run user tests on prototypes and new designs, watch session replays, and fix bugs.

Running experiments is a ton of fun, but not every business can or should run them (at least not until they bring some traffic and demand through the door first).

Tip 5: Determine whether your company can or even should run A/B tests. Consider both your volume of traffic and the resources you’ll need to allocate before investing the time.

Conversion Optimization Tip 6:
Landing pages help you accelerate and simplify testing

Using landing pages is correlated with greater conversions, largely because using them makes it easier to do a few things:

  • Measure discrete transitions through your funnel/customer journey.
  • Run controlled experiments (reducing confounding variables and wonky traffic mixes).
  • Test changes across templates to more easily reach a large enough sample size to get valid results.

To the first point, having a distinct landing page (i.e. something separate and easier to update than your website) gives you an easy tracking implementation, no matter what your user journey is.

For example, if you have a sidebar call to action that brings someone to a landing page, and then when they convert, they are brought to a “Thank You” page, it’s very easy to track each step of this and set up a funnel in Google Analytics to visualize the journey.

Google Analytics Funnel
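Whatever tool draws the funnel, the underlying numbers are just step-to-step ratios. This sketch computes them for the sidebar CTA, landing page, and “Thank You” page journey described above; the step counts are invented, and the function name is hypothetical.

```python
def funnel_report(step_counts):
    """step_counts: ordered list of (step_name, users_who_reached_it)."""
    first = step_counts[0][1]
    report = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        # (transition, conversion from the previous step, conversion from the first step)
        report.append((f"{prev_name} -> {name}", n / prev_n, n / first))
    return report


steps = [("Sidebar CTA click", 2_000), ("Landing page view", 1_800), ("Thank You page", 360)]
for transition, step_rate, overall in funnel_report(steps):
    print(f"{transition:40} step {step_rate:.0%}  overall {overall:.0%}")
```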

Landing pages also help you scale your testing results while minimizing the resource cost of running the experiment. Ryan Farley, co-founder and head of growth at LawnStarter, puts it this way:

At LawnStarter, we have a variety of landing pages….SEO pages, Facebook landing pages, etc. We try to keep as many of the design elements such as the hero and explainer as similar as possible, so that way when we run a test, we can run it sitewide.

That is, if you find something that works on one landing page, you can apply it to several you have up and running.

Tip 6: Use landing pages to make it easier to test. Unbounce lets you build landing pages in hours—no coding required—and conduct unlimited A/B tests to maximize conversions.

Conversion Optimization Tip 7:
Build a growth model for your conversion funnel

Creating a model like this requires stepping back and asking, “how do we get customers?” From there, you can model out a funnel that best represents this journey.

Most of the time, marketers set up a simple goal funnel visualization in Google Analytics to see this:

Google Analytics Funnel Visualization

This gives you a lot of leverage for future analysis and optimization.

For example, if one of the steps in your funnel is to land on a landing page, and your landing pages all have a similar format (e.g. offers.site.com), then you can see the aggregate conversion rate of that step in the funnel.

More importantly, you can run interesting analyses, such as path analysis and landing page comparison. These let you compare apples to apples across your landing pages and see which ones are underperforming:

Landing Page Comparison
The bar graph on the right allows you to quickly see how landing pages are performing compared to the site average.
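The comparison in the bar graph boils down to each page’s conversion rate relative to an average. In this sketch the “site average” is assumed to be the pooled conversion rate across the listed pages (your analytics tool may define it differently), and the page paths and counts are made up.

```python
def compare_to_site_average(pages):
    """pages: dict of landing page path -> (conversions, sessions)."""
    total_conv = sum(c for c, _ in pages.values())
    total_sessions = sum(s for _, s in pages.values())
    site_rate = total_conv / total_sessions
    # Relative difference of each page's conversion rate from the pooled rate
    return {path: (c / s - site_rate) / site_rate for path, (c, s) in pages.items()}


pages = {
    "/offers/ebook": (120, 2_000),
    "/offers/webinar": (45, 1_500),
    "/offers/checklist": (90, 1_500),
}
for path, diff in sorted(compare_to_site_average(pages).items(), key=lambda kv: kv[1]):
    print(f"{path:20} {diff:+.0%} vs. site average")
```

The webinar page converts about 41% below the pooled average here, which would make it the first candidate for a closer look.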

I talk more about the process of finding underperforming landing pages in my piece on content optimization if you want to learn step-by-step how to do that.

Tip 7: Model out a funnel that represents the customer journey so that you can more easily target underperforming landing pages and run instructive analyses focused on growth.

Conversion Optimization Tip 8:
Pick low-hanging fruit in the beginning

This is mostly advice from personal experience, so it’s anecdotal: when you first start working on a project or in an optimization role, pick off the low-hanging fruit. By that, I mean over-index on the “ease” side of things and get some points on the board.

It may be more impactful to set up and run complex experiments that require many resources, but you won’t have the political capital to set these up until people have confidence in your ability to get results and in the CRO process in general.

To inspire trust and to be able to command more resources and confidence, look for the easiest possible implementations and fixes before moving onto the complicated or risky stuff.

And fix bugs and clearly broken things first! Persuasive copywriting is pretty useless if your site takes days to load or pages are broken on certain browsers.

Tip 8: Score some easy wins by targeting low-hanging fruit before you move on to more complex optimization tasks. Early wins give you the clout to drive bigger experiments later on.

Conversion Optimization Tip 9:
Where possible, reduce friction

Most conversion optimization falls under two categories (this is simplified, but mostly true):

  • Increasing motivation
  • Decreasing friction

Friction occurs when visitors become distracted, when they can’t accomplish a task, or simply when a task is arduous to accomplish. Generally speaking, the more “nice to have” your product is, the more friction matters to the conversion. This is reflected in BJ Fogg’s behavior model:

BJ Fogg’s Behavior Model

In other words, if you need to get a driver’s license, you’ll put up with pure hell at the DMV to get it, but you’ll drop out of the funnel at the most innocent error message if you’re only trying to buy something silly on drunkmall.com.

A few things that cut down on friction:

  • Make your site faster.
  • Trim needless form fields.
  • Cut down the number of steps in your checkout or signup flow.

As an example of the last one, I like how Wordable designed their signup flow. You start out on the homepage:

Wordable

Click “Try It Free” and get a Google OAuth screen:

Wordable OAuth

Give permissions:

Wordable permissions

And voila! You’re in:

Wordable Dashboard

You can decrease friction by reducing feelings of uncertainty as well. Most of the time, this is done with copywriting or reassuring design elements.

One example is HubSpot’s form builder. We emphasize that it’s “effortless” and that there is “no technical expertise required” to set it up:

HubSpot Form Builder

(And here’s a little reminder that HubSpot integrates beautifully with Unbounce, so you’ll be able to automatically populate your account with lead info collected on your Unbounce landing pages.)

Tip 9: Cut down on anything that makes it harder for users to convert. This includes making sure your site is fast and trimming any forms or steps that aren’t necessary for checkout or signup.

Conversion Optimization Tip 10:
Help increase motivation

The second side of the conversion equation, as I mentioned, is motivation.

An excellent way to increase a visitor’s motivation is simply to make the process of conversion…fun. Most online tasks don’t need to be arduous or frustrating; we’ve just made them that way through apathy and error.

Take, for example, your standard form or survey. Pretty boring, right?

Well, today, enough technological solutions exist to implement interactive or conversational forms and surveys.

One such solution is Survey Anyplace. I asked their founder and CEO, Stefan Debois, about how their product helps motivate people to convert, and here’s what he said:

An effective and original way to increase conversion is to use an interactive quiz on your website. Compared to a static form, people are more likely to engage in a quiz, because they get back something useful. An example is Eneco, a Dutch Utility company: in just 6 weeks, they converted more than 1000 website visitors with a single quiz.

Full companies have been built on the premise that the typical form is boring and could be made more fun and pleasant to complete (e.g. Typeform). Just think, “how can I compel more people to move through this process?”

Other common approaches invoke psychological triggers to compel forward momentum:

  • Implement social proof on your landing pages.
  • Use urgency to compel users to act more quickly.
  • Build out testimonials with well-known users to showcase authority.

There are many more ways to use psychological triggers to motivate conversions. Check out Robert Cialdini’s classic book, Influence, to learn more. Also, check out The Wheel of Persuasion for inspiration on persuasive triggers.

Tip 10: Make your conversion process fun in order to compel your visitors to keep moving forward. Increased interactivity, social proof, urgency, and testimonials that showcase authority can all help you here too.

Conversion Optimization Tip 11:
Clarity > Persuasion

While persuasion and motivation are really important, often the best way to convert visitors is to ensure they understand what you’re selling.

Stated differently, clarity trumps persuasion.

Use a five-second test to find out how clear your messaging is.

Conversion Optimization Tip 12:
Consider the “Pre-Click” Experience

People forget the pre-click experience. What does a user do before they hit your landing pages? What ad did they click? What did they search in Google to get to your blog post?

Knowing this can help you create a strong message match between your pre-click experience and your landing page.

Sergiu Iacob, SEO Manager at Bannersnack, explains their process for factoring in keywords:

When it comes to organic traffic, we establish the user intent by analyzing all the keywords a specific landing page ranks for. After we determine what the end result should look like, we adjust both our landing page and our in app user journey. The same process is used in the optimization of landing pages for search campaigns.

I’ve recommended the same thing before when it comes to capturing email leads. If you can’t figure out why people aren’t converting, figure out what keywords are bringing them to your site.

Usually, this results in a sort of passive “voice of customer” mining, where you can message match the keywords you’re ranking for with the offer on that page.

It makes it much easier to predict what messages your visitors will respond to. And it is, in fact, one of the cheapest forms of user research you can conduct.

AHRefs Keywords
Using Ahrefs to determine what keywords brought traffic to a page.
Tip 12: Don’t forget the pre-click experience. What do your users do before they hit your landing page? Make sure you have a strong message match between your ads (or emails) and the pages they link to.

Conversion Optimization Tip 13:
Build a repeatable CRO process

Despite what some popular blog posts suggest, conversion optimization isn’t about a series of “conversion tactics” or “growth hacks.” It’s about a process and a mindset.

Here’s how Peep Laja, founder of CXL, put it:

The quickest way to figure out whether someone is an amateur or a pro is this: amateurs focus on tactics (make the button bigger, write a better headline, give out coupons etc) while pros have a process they follow.

And, ideally, the CRO process is a never-ending one:

CRO Process

Conversion Optimization Tip 14:
Invest in education for your team

CRO people have to know a lot about a lot:

  • Statistics
  • UX design
  • User research
  • Front end technology
  • Copywriting

No one comes out of the gate as a 10 out of 10 in all of those areas (most never end up there either). You, as an optimizer, need to be continuously learning and growing. If you’re a manager, you need to make sure your team is continuously learning and growing.

Conversion Optimization Tip 15:
Share insights

The fastest way to scale and leverage experimentation is to share your insights and learnings across the organization.

This becomes more and more valuable the larger your company grows. It also becomes harder and harder the more you grow.

Essentially, by sharing you can avoid reinventing the wheel, you can bring new teammates up to speed faster, and you can scale and spread winning insights to teams who then shorten their time to testing. Invest in some sort of insights management system, no matter how basic.

Full products have been built around this, such as GrowthHackers’ North Star and Effective Experiments.

Effective Experiments
Tip 15: Share what you learn within your organization. The bigger your company grows, the more important information sharing becomes—but the more difficult it will become as well.

Conversion Optimization Tip 16:
Keep your cognitive biases in check

As the great Richard Feynman once said, “The first principle is that you must not fool yourself and you are the easiest person to fool.”

We’re all afflicted by cognitive biases, ranging from confirmation bias to the availability heuristic. Some of these can really impact our testing programs, specifically confirmation bias (and its close cousin, the Texas Sharpshooter Fallacy) where you only seek out pieces of data that confirm your previous beliefs and throw out those that go against them.

Experimenter Bias

It may be worthwhile (and entertaining) simply to run down Wikipedia’s giant list of cognitive biases and gauge where you may currently be running blind or biased.

Tip 16: Be cognizant of your own cognitive biases. If you’re not careful, they can influence the outcome of your experiments and cause you to miss (or misinterpret) key insights in your data.

Conversion Optimization Tip 17:
Evangelize CRO to your greater org

Having a dedicated CRO team is great. Evangelizing the work you’re doing to the rest of the organization? Even better.

Evangelize your CRO
Spread the word about the importance of CRO within your org.

When an entire organization buys into the value of data-informed decision making and experimentation, magical things can happen. Ideas burst forth, and innovation becomes easy. Annoying roadblocks are deconstructed. HiPPO-driven decision making is deprioritized behind proper experiments.

Things you can do to evangelize CRO and experimentation:

  • Write down your learnings each week on a company wiki.
  • Send out a newsletter with live experiments and experiment results each week to interested parties.
  • Recruit an executive sponsor with lots of internal influence.
  • Sing your praises when you get big wins. Sing it loud.
  • Make testing fun, and make it easier for others to join in and pitch ideas.
  • Make it easier for people outside of the CRO team to sponsor tests.
  • Say the word “hypothesis” a lot (who knows, it might work).

This is all a kind of art; there are no universal methods for spreading the good gospel of CRO. But it’s important that you know it’s probably going to be something of an uphill battle, depending on how big your company is and what the culture has traditionally been like.

Tip 17: Spread the gospel of CRO across your organization in order to ensure others buy into the value of data-driven decision making and experimentation.

Conversion Optimization Tip 18:
Be skeptical with CRO case studies

This isn’t so much a conversion optimization tip as it is life advice: be skeptical, especially when marketing is involved.

I say this as a marketer. Marketers exaggerate stuff. Some marketers omit important details that derail a narrative. Sometimes, they don’t understand p-values, or how to set up a proper test (maybe they haven’t read Tip 1 in this article).

In short, especially in content marketing, marketers are incentivized to publish sensational case studies regardless of their statistical merit.

All of that results in a pretty grim standard for the current CRO case study.

Don’t get me wrong, some case studies are excellent, and you can learn a lot from them. Digital Marketer lays out a few rules for detecting quality case studies:

  • Did they publish total visitors?
  • Did they share the lift percentage correctly?
  • Did they share the raw conversions? (Does the lack of raw conversions hurt my case study?)
  • Did they identify the primary conversion metric?
  • Did they publish the confidence rate? Is it >90%?
  • Did they share the test procedure?
  • Did they only use data to justify the conclusion?
  • Did they share the test timeline and date?
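Several of these checks are simple arithmetic once a case study publishes visitors and raw conversions. This sketch recomputes the relative lift and adds a normal-approximation (Wald) 95% confidence interval for the difference in conversion rates; the function name and the 500-visitor example are hypothetical.

```python
import math
from statistics import NormalDist


def check_case_study(visitors_a, conv_a, visitors_b, conv_b, confidence=0.95):
    """Recompute a reported lift from raw numbers, with a Wald interval on the difference."""
    rate_a, rate_b = conv_a / visitors_a, conv_b / visitors_b
    relative_lift = (rate_b - rate_a) / rate_a
    diff = rate_b - rate_a
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(rate_a * (1 - rate_a) / visitors_a + rate_b * (1 - rate_b) / visitors_b)
    return relative_lift, (diff - z * se, diff + z * se)


# A hypothetical "50% lift!" headline built on 10 vs. 15 conversions
lift, (low, high) = check_case_study(500, 10, 500, 15)
print(f"relative lift {lift:.0%}, 95% CI for the difference: {low:+.2%} to {high:+.2%}")
```

The “50% lift” comes with an interval running from below zero to roughly +3 percentage points, so the published numbers are also consistent with no effect at all.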

Without context or knowledge of the underlying data, a case study might be a whole lot of nonsense. And if you want a good cathartic rant on bad case studies, then Andrew Anderson’s essay is a must-read.

According to a study...
Tip 18: Approach existing material on CRO with a skeptical mindset. Marketers are often incentivized to publish case studies with sensational results, regardless of the quality of the data that supports them.

Conversion Optimization Tip 19:
Calculate the cost of additional research vs. just running it

Matt Gershoff, CEO of Conductrics, is one of the smartest people I know regarding statistics, experimentation, machine learning, and general decision theory. He has stated some version of the following on a few occasions:

  • Marketing is about decision-making under uncertainty.
  • It’s about assessing how much uncertainty is reduced with additional data.
  • It must consider, “What is the value in that reduction of uncertainty?”
  • And it must consider, “Is that value greater than the cost of the data/time/opportunity costs?”

Yes, conversion research is good. No, you shouldn’t run blind and just test random things.

But at the end of the day, we need to calculate how much additional value a reduction in uncertainty via additional research gives us.
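One simplified way to frame that calculation is as an expected value: how much loss would the extra research help you avoid, and is that more than the research costs? The sketch below only illustrates the idea and is not Matt Gershoff’s own method; every probability and dollar figure is a made-up assumption, and real decisions would also weigh time and opportunity cost.

```python
def worth_more_research(prob_change_hurts, loss_if_it_hurts,
                        prob_research_catches_it, research_cost):
    """Return (worth it?, expected loss avoided) under a simple expected-value model."""
    # Expected damage the research would let you avoid by catching a bad change first
    expected_loss_avoided = prob_change_hurts * loss_if_it_hurts * prob_research_catches_it
    return expected_loss_avoided > research_cost, expected_loss_avoided


# Hypothetical checkout redesign: big downside if it goes wrong
print(worth_more_research(0.2, 500_000, 0.6, research_cost=15_000))
# Hypothetical low-stakes test: small downside, same research cost
print(worth_more_research(0.2, 2_000, 0.6, research_cost=15_000))
```

With these placeholder numbers, the risky checkout change justifies the research while the cheap, low-stakes test does not, in line with the two scenarios that follow.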

If you can run a cheap A/B test that takes almost no time to set up? And it doesn’t interfere with any other tests or present an opportunity cost? Ship it. Because why not?

But if you’re changing an element of your checkout funnel that could prove to be disastrous to your bottom line, well, you probably want to mitigate any possible downside. Bring out the heavy guns—user testing, prototyping, focus groups, whatever—because this is a case where you want to reduce as much uncertainty as possible.

Tip 19: Balance the value of doing more research with the costs (including opportunity costs) associated with it. Sometimes running a quick and dirty A/B test will be sufficient for your needs.

Conversion Optimization Tip 20:
CRO never ends

You can’t just run a few tests and call it quits.

The big wins from the early days of working on a relatively unoptimized site may taper off, but CRO never ends. Times change. Competitors and technologies come and go. Your traffic mix changes. Hopefully, your business changes as well.

As such, even the best test results are perishable, given enough time. So plan to stick it out for the long run and keep experimenting and growing.

Think Kaizen.

Kaizen

Conclusion

There you go, 20 conversion optimization tips. That’s not all there is to know; this is a never-ending journey, just like the process of growth and optimization itself. But these tips should get you started and moving in the right direction.

Experimentation in product development: How you can maximize the customer experience

Hila Qu, Vice President of Growth at Acorns, has a theory. She says there are two kinds of product managers:…

The post Experimentation in product development: How you can maximize the customer experience appeared first on WiderFunnel Conversion Optimization.

Suffering From Analysis Paralysis? You Should See An Optimization Specialist

Have you ever faced down a giant table or spreadsheet of data and thought, “I have no idea what to do with this”? As marketers we’ve all probably had those deer-in-the-headlights moments once or twice, where we’ve floundered to figure out what the hell we’re looking at. Crazy Egg was built on the premise of simplicity and ease of use, for those that I fondly like to call “Google Analytics-averse” – but there’s always room for improvement when it comes to helping folks switch from analysis to action mode. Whether you’re a UX designer, small business owner, SEO expert or…

The post Suffering From Analysis Paralysis? You Should See An Optimization Specialist appeared first on The Daily Egg.

Getting Started With Machine Learning

Alvin Wan



The goal of machine learning is to find patterns in data and use those patterns to make predictions. This view also gives us a framework for discussing machine learning problems and solutions, as you’ll see in this article.

First, we will start with definitions and applications for machine learning. Then, we will discuss abstractions in machine learning and use them to frame our discussion: data, models, optimization models, and optimization algorithms. Later on in the article, we will discuss fundamental topics that underlie all machine learning methods and conclude with practical guidance for getting started with using machine learning. By the end, you should understand how to advance your practice and study of machine learning.

Let’s begin.

So, What Exactly Is Machine Learning?

Broadly, machine learning is a set of techniques for finding patterns in data. Applications range from self-driving cars to personal AI assistants, from translating between French and Taiwanese to translating between voice and text. A few common applications already permeate, or could soon permeate, your day-to-day life:

  1. Detecting anomalies
    Recognize spikes in website traffic or highlight abnormal bank activity.
  2. Recommending similar content
    Find products you may be looking for, or even relevant Smashing Magazine articles.
  3. Predicting the future
    Plan the path of neighboring vehicles or identify and extrapolate market trends for stocks.

These are a few of many applications of machine learning, but most of them tie back to learning the underlying distribution of the data. A distribution specifies the possible events and the probability of each event. For example:

  • With 50% probability, you buy an item costing $5 or less.
  • With 25% probability, you buy an item costing between $5 and $10.
  • With 24% probability, you buy an item costing between $10 and $100.
  • With 1% probability, you buy an item costing more than $100.

Using this distribution, we can accomplish all of our tasks above:

  1. Detecting anomalies
    A purchase over $100 falls in a bucket with only 1% probability, so we can reasonably flag it as an anomaly.
  2. Recommending similar content
    A purchase of $3 means we should recommend more items that cost $5 or less.
  3. Predicting the future
    Without any other information, the most likely outcome is that the next purchase will be $5 or less.
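A minimal Python sketch of the three tasks, assuming the purchase distribution above is already known (in practice it would be learned from data). The bucket labels, the upper-bound-inclusive boundaries, and the 5% anomaly threshold are illustrative assumptions, not part of the original example.

```python
from typing import List, Tuple

# Hypothetical encoding of the purchase distribution above:
# (label, inclusive upper bound in dollars, probability).
BUCKETS: List[Tuple[str, float, float]] = [
    ("$5 or less", 5.0, 0.50),
    ("$5 to $10", 10.0, 0.25),
    ("$10 to $100", 100.0, 0.24),
    ("over $100", float("inf"), 0.01),
]


def bucket_of(price: float) -> Tuple[str, float]:
    """Return (label, probability) of the bucket containing `price`."""
    if price < 0:
        raise ValueError("price must be non-negative")
    for label, upper, prob in BUCKETS:
        if price <= upper:
            return label, prob
    raise ValueError(f"no bucket for price {price}")


def is_anomaly(price: float, threshold: float = 0.05) -> bool:
    """Detecting anomalies: flag purchases that fall in a rare bucket."""
    return bucket_of(price)[1] < threshold


def recommend_range(price: float) -> str:
    """Recommending similar content: suggest items from the same bucket."""
    return bucket_of(price)[0]


def predict_next_bucket() -> str:
    """Predicting the future: with no other information, pick the likeliest bucket."""
    return max(BUCKETS, key=lambda bucket: bucket[2])[0]


print(is_anomaly(250.0))      # True
print(recommend_range(3.0))   # $5 or less
print(predict_next_bucket())  # $5 or less
```

With these boundaries, a purchase of exactly $100 lands in the 24% bucket, so it is not flagged; only purchases above $100 are.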

With a distribution of data, we can accomplish a myriad of tasks. In sum, one goal in machine learning is to learn this distribution.

More generally, our goal is to learn a specific function with particular inputs and outputs. We call this function our model. We denote our input by x. Say our model, which accepts input x, is

f(x) = ax

Here, a is a parameter of our model. Each value of the parameter corresponds to a different instance of our model; in other words, the model where a=2 is different from the model where a=3. In machine learning, our goal is to learn this parameter, changing it until we do “well.” How do we determine which values of a do “well”?

We need a way to evaluate our model for each value of a. To start, the output of f(x) is our prediction. We will refer to y as our label, meaning the true, desired output. With our predictions and our labels, we can define a loss function. One such loss function is simply the absolute difference between our prediction and our label, |f(x) - y|. Using this loss function, we can then evaluate different parameters for our model. Picking the best parameter for our model is known as training. If there are only a few possible parameter values, we can simply try each one and pick the one with the smallest loss!
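When the candidate parameters form a short list, training really can be a brute-force search. The sketch below assumes a handful of made-up (x, y) pairs and three candidate values of a; it sums the absolute-difference loss |f(x) - y| over all samples and keeps the candidate with the smallest total.

```python
from typing import List, Sequence, Tuple

# Hypothetical labeled samples (x, y), roughly following y = 2x plus a little noise.
DATA: List[Tuple[float, float]] = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]


def f(a: float, x: float) -> float:
    """The model f(x) = a * x with parameter a."""
    return a * x


def total_loss(a: float, data: Sequence[Tuple[float, float]]) -> float:
    """Sum of absolute differences |f(x) - y| over all samples."""
    return sum(abs(f(a, x) - y) for x, y in data)


def train_by_search(candidates: Sequence[float],
                    data: Sequence[Tuple[float, float]]) -> float:
    """Try every candidate parameter and return the one with the smallest loss."""
    return min(candidates, key=lambda a: total_loss(a, data))


best_a = train_by_search([1.0, 2.0, 3.0], DATA)
print(best_a)  # 2.0
```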

However, most problems are not as simple. What happens if there are infinitely many possible parameter values? Say, all decimal values between 0 and 1? Or between 0 and infinity? This brings us to our next topic: abstractions in machine learning. We will discuss different facets of machine learning, to compartmentalize your knowledge into data, models, objectives, and methods of solving objectives. Beyond learning the right parameter, there are plenty of other challenges: how do we break down a problem as complex as controlling a robot? How do we control a self-driving car? What does it mean to train a model that identifies faces? The section below will help you organize answers to these questions.

Abstractions

There are countless topics in machine learning, at various levels of specificity. To better understand where each piece fits in the larger picture, consider the following abstractions for machine learning. These abstractions compartmentalize our discussion of machine learning topics, and knowing them will make it easier for you to frame new topics. The following classifications are taken from Professor Jonathan Shewchuk at UC Berkeley:

  1. Application and Data
    Consider the possible inputs and the desired output for the problem.
    Questions: What is your goal? How is your data structured? Are there labels? Is it reasonable for us to extract output from the provided inputs?

    Example: The goal is to classify pictures of handwritten digits. The input is an image of a handwritten number. The output is a number.

  2. Model
    Determine the class of functions under consideration.
    Questions: Are linear functions sufficient? Quadratic functions? Polynomials? What types of patterns are we interested in? Are neural networks appropriate? Logistic regression?

    Example: Linear regression

  3. Optimization Problem
    Formulate a concrete objective in mathematics.
    Questions: How do we define loss? How do we define success? Should we apply additional penalties to bias our algorithm? Are there imbalances in the data our objective needs to consider?

    Example: Find x that minimizes |Ax-b|^2

  4. Optimization Algorithm
    Determine how you will solve the optimization problem.
    Questions: Can we compute a solution by hand? Do we need an iterative algorithm? Can we convert this problem to an equivalent but easier-to-solve objective, and solve that one?

    Example: Take derivative of the function. Set it to zero. Solve for our optimal parameter.

Abstraction 1: Data

In practice, collecting, managing, and packaging data is 90% of the battle. The data consists of samples, each of which is a specific realization of our input. For example, our input may generically be images of dogs. The first sample is specifically a picture of Maxie, my Bernese mountain dog and chow chow mix at home. The second sample is specifically a picture of Charlie, a young corgi.

While training your model, it is important to handle your data properly. This means splitting the data into separate sets and not peeking prematurely at any of them. In general, our data is split into three portions:

  1. Training set
    This is the dataset you train your model on. The model may see this set hundreds of times.
  2. Validation set
    This is the dataset you evaluate your model on, to assess accuracy and tune your model or method accordingly.
  3. Test set
    This is the dataset you evaluate on to assess accuracy, once, at the very end. Evaluating on the test set prematurely could mean your model overfits to the test set as well, so evaluate on it only once. We will discuss the notion of “overfitting” in more detail below.
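A minimal sketch of the three-way split in plain Python. The 70/15/15 proportions and the fixed random seed are assumptions for illustration; the important property is that the three sets are disjoint and the test set is set aside until the very end.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    samples: Sequence[T],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```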

Abstraction 2: Models

Machine learning methods are split into the following two categories:

Supervised Learning

In supervised learning, our algorithm has access to labeled data. Within supervised learning, we explore the following two classes of problems:

  • Classification
    Determine which of k classes C_1, C_2, ... C_k to which each sample belongs, e.g. “Which breed of dog is this?” The dog could be one of "corgi", "bernese mountain dog", "chow chow"...
  • Regression
    Determine a real-valued output (which are often probabilities), e.g. “What is the probability this patient has neuroblastoma (eye cancer)?”
Unsupervised Learning

In unsupervised learning, our algorithm does not have access to labels, and we explore the following classes of problems:

  • Clustering
    Cluster samples into k clusters. We do not have a label for the resulting clusters. “Which DNA sequences are most similar?”
  • Dimensionality reduction
    Reduce the number of “unique” (linearly independent) features we consider. “What are common features of faces?”

Abstraction 3: Optimization Objective

Before discussing optimization objectives and algorithms, we’ll need an example to discuss. Least squares is the canonical example. We will restrict our attention to a specific form of least squares: let us return to our grade-school problem of fitting a line to some points.

Let’s recall the equation of a line:

y = m * x + b

Assume we have such a line. This is the true underlying model.


True model. The line that generates our data.

Now, sample points from this line.


True data. Data that is sampled from the true model.

For each point, jiggle it a little bit. In other words, add noise: small random perturbations. This noise is due to real-world processes.


Noise. Real-world perturbations that affect our data. This may be due to imprecision in measurements, lossy compression, and so on.

This gives us our observed data. We will call these points (x_1, y_1), (x_2, y_2), (x_3, y_3).... This is the training data we are given to train a model on. We do not have access to the underlying line that generated this data (the original green line).


Observations. Our true data with noise and ultimately what we will use to train a model.

Say we have an estimate for the parameters of a line. In this case, the parameters are m and b. This gives us a predicted line, drawn in blue below.


Proposed model. The result of training a model on our observations.

We wish to evaluate our blue line, to see how accurate it is. To start, we use m and b to estimate y. We compute a set of ŷ values.

ŷ_i = m * x_i + b

The squared error for a single prediction ŷ_i and true label y_i is simply

(ŷ_i−y_i)^2

Our total error is then the sum of squared differences, across all samples. This yields our loss.

∑(ŷ_i−y_i)^2

Presented visually, each term is the squared vertical distance between an observed point and our predicted line.


Observed error. The distance between our observed data and our proposed model.

Plugging in ŷ_i from above, we then have the total error in terms of m and b.

∑(m * x_i + b − y_i)^2

Finally, we want to minimize this quantity. This yields our objective function, abstraction 3 from our list of abstractions above.

$$\min_{m,\,b} \sum_i (m x_i + b - y_i)^2$$

The above states in mathematics that the goal is to minimize the loss by changing the values of m and b. The purpose of this section was to motivate fitting a line of best fit, a special case of least squares. Additionally, we examined the least squares objective. Next, we need to solve this objective.
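To make the objective concrete, here is a small Python sketch that mimics the figures: it samples points from a true line, adds Gaussian noise, and evaluates the sum of squared errors for a proposed m and b. The Gaussian noise model, the x range of 0 to 10, and the parameter values are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def make_observations(m: float, b: float, n: int, noise: float, seed: int = 0) -> List[Point]:
    """Sample x values, evaluate the true line y = m*x + b, then add Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = m * x + b + rng.gauss(0.0, noise)  # the "jiggle"
        points.append((x, y))
    return points


def squared_loss(m: float, b: float, data: Sequence[Point]) -> float:
    """Sum over samples of (y_hat_i - y_i)^2, where y_hat_i = m * x_i + b."""
    return sum((m * x + b - y) ** 2 for x, y in data)


observations = make_observations(m=2.0, b=1.0, n=50, noise=0.5)
# The true line should score far better than a wrong slope on the same observations.
print(squared_loss(2.0, 1.0, observations) < squared_loss(3.0, 1.0, observations))  # True
```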

Abstraction 4: Optimization Algorithm

How do we minimize this? We take the partial derivatives with respect to m and b, set them to 0, and solve. This gives us the analytical solution. Solving for an analytical solution was our optimization algorithm, the fourth and final abstraction in our list of abstractions.

Note: The important takeaway from this section is that least squares has a closed-form solution, meaning that the optimal solution for our problem can be computed explicitly. To understand why this is significant, consider equations that lack one. For example, x = log x (with a standard base-10 logarithm) has no real solution at all: graph the two curves and you’ll see that they never intersect. Other equations, such as x = cos x, do have a solution (near 0.739) but no simple closed-form expression for it, so in practice the solution is approximated numerically. Ordinary least squares, on the other hand, has a closed form, which is good news: for any problem reduced to least squares, we can compute the optimal solution directly, given our data and assumptions.
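For the line-fitting case, setting both partial derivatives to zero gives a well-known closed form: m = Σ(x_i - x_mean)(y_i - y_mean) / Σ(x_i - x_mean)^2 and b = y_mean - m * x_mean, where x_mean and y_mean are the sample means. A minimal sketch in plain Python (no numerical linear algebra package assumed):

```python
from typing import Sequence, Tuple


def fit_line(data: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    s_xx = sum((x - x_mean) ** 2 for x, _ in data)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = s_xy / s_xx          # slope that zeroes both partial derivatives
    b = y_mean - m * x_mean  # intercept from d(loss)/db = 0
    return m, b


print(fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]))  # (2.0, 1.0)
```

No iteration is involved: the answer is computed in one pass over the data, which is exactly what having a closed form buys us.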

Fundamental Topics

Before studying more methods, it is necessary to understand the undercurrents of machine learning. These will govern the initial study of machine learning:

Bias-Variance Tradeoffs

One of machine learning’s most dreaded evils is overfitting, in which a model is too closely tailored to the training data. In the limit, the most overfit model will memorize the data. This might mean that if one does well on exam A, one repeats every detail for exam B, down to the duration of an inter-exam restroom trip and whether or not one used the urinal.

A related but less common evil is underfitting, where the model is not sufficiently expressive to capture important information in the data. This could mean that one looks only at homework scores to predict exam scores, ignoring the effects of reading notes, completing practice exams, and more. Our goal is to build a model that generalizes to new examples while making the appropriate distinctions.

There are a variety of approaches to fighting both evils. One is modifying your optimization objective to include a term that penalizes model complexity. Another is tuning hyperparameters that govern either your objective or your algorithm, which may correspond to notions such as “training speed” or “momentum.” The bias-variance tradeoff gives us a precise way of defining and handling both overfitting and underfitting.

Maximum Likelihood Estimation (MLE) + Maximum A Posteriori (MAP)

Say we have ice cream flavors A, B, and C. We observe different recipes. Our goal is to predict which flavor each recipe produces.

One way to predict flavors based on recipes is to first estimate the following probability:

P(flavor|recipe)

Given this probability and a new recipe, how can we predict the flavor? Given a recipe, simply consider the probability of each of the flavors A, B, C.

P(flavor=A|recipe) = 0.4
P(flavor=B|recipe) = 0.5
P(flavor=C|recipe) = 0.1

Then, pick the flavor that has the highest probability. Above, flavor B has the highest probability, given our recipe. Thus, we predict flavor B. Restating the above rule in mathematics, we have:

argmax_flavor P(flavor|recipe)

Here, argmax means we take the flavor that corresponds to the maximum value.

However, the only information at our disposal is the reverse: the probability of some recipe given the flavor.

P(recipe|flavor)

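For maximum likelihood estimation, we assume that every flavor is equally likely before we see the recipe (a uniform prior over flavors). By Bayes’ rule, P(flavor|recipe) = P(recipe|flavor) P(flavor) / P(recipe). For a fixed recipe, P(recipe) is the same for every flavor, and under the uniform prior so is P(flavor), so the two values are proportional:

P(recipe|flavor) ∝ P(flavor|recipe)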

Since we only care about the flavor with the maximum probability P(flavor|recipe), we can instead find the flavor that maximizes the proportional quantity P(recipe|flavor).

argmax_flavor P(recipe|flavor)

MLE offers the above objective as one way to predict, using the probability of data given the labels.
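A small Python sketch of the MLE rule. The likelihood and prior values are made up for illustration. Keeping the prior term P(flavor) from Bayes’ rule, instead of assuming it is uniform, gives the maximum a posteriori (MAP) rule, argmax_flavor P(recipe|flavor) P(flavor); the sketch shows how the two rules can disagree.

```python
from typing import Dict


def mle(likelihood: Dict[str, float]) -> str:
    """Maximum likelihood: the flavor that maximizes P(recipe | flavor)."""
    return max(likelihood, key=likelihood.get)


def map_estimate(likelihood: Dict[str, float], prior: Dict[str, float]) -> str:
    """Maximum a posteriori: the flavor that maximizes P(recipe | flavor) * P(flavor)."""
    return max(likelihood, key=lambda flavor: likelihood[flavor] * prior[flavor])


# Hypothetical numbers for a single recipe (not from the article).
likelihood = {"A": 0.2, "B": 0.3, "C": 0.5}  # P(recipe | flavor)
prior = {"A": 0.45, "B": 0.45, "C": 0.10}    # P(flavor): C is a rare flavor

print(mle(likelihood))                  # C
print(map_estimate(likelihood, prior))  # B
```

With a uniform prior, every product is scaled by the same constant, so MAP and MLE pick the same flavor; that is precisely the assumption behind treating the two probabilities as proportional.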

However, allow me to convince you that it’s reasonable to assume we have P(x|y): we can estimate it from observed, real-world data. For example, say we wish to estimate the number of marbles each student in your class carries, based on the number of rubber ducks the student carries.

Each student’s number of rubber ducks is the data x, and the number of marbles she or he has is y. We will use this sample data below.

| x | y |
|---|---|
| 1 | 2 |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 1 | 2 |

For every value of y, we can count how often each value of x occurs, giving us P(x|y). For the first one, P(x=1|y=1), consider all of the rows where y=1. There are 2, and only one of them has x=1. Therefore, P(x=1|y=1) = 1/2. We can repeat this for all values of x and y.

P(x=1|y=1) = 1/2
P(x=2|y=1) = 1/2
P(x=1|y=2) = 3/4
P(x=2|y=2) = 1/4
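Counting like this is easy to automate. A minimal Python sketch that reproduces the four probabilities from the table using exact fractions, then applies the MLE rule above to predict the number of marbles from the number of rubber ducks. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Tuple

# (x, y) rows from the table: x = rubber ducks, y = marbles.
ROWS: List[Tuple[int, int]] = [(1, 2), (1, 1), (1, 2), (2, 1), (2, 2), (1, 2)]


def conditional_probabilities(rows: List[Tuple[int, int]]) -> Dict[Tuple[int, int], Fraction]:
    """Estimate P(x | y) as (# rows with this x and y) / (# rows with this y)."""
    pair_counts = Counter(rows)
    y_counts = Counter(y for _, y in rows)
    return {(x, y): Fraction(count, y_counts[y]) for (x, y), count in pair_counts.items()}


def mle_predict(x: int, probs: Dict[Tuple[int, int], Fraction]) -> int:
    """Predict the y that maximizes P(x | y); only (x, y) pairs seen in the data are considered."""
    candidates = {y: p for (seen_x, y), p in probs.items() if seen_x == x}
    if not candidates:
        raise ValueError(f"x={x} never appears in the data")
    return max(candidates, key=candidates.get)


probs = conditional_probabilities(ROWS)
print(probs[(1, 1)], probs[(2, 1)], probs[(1, 2)], probs[(2, 2)])  # 1/2 1/2 3/4 1/4
print(mle_predict(1, probs))  # 2
```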

Featurizations, Regularization

Least squares draws lines of best fit for us. Note that least squares can fit the model whenever the model is linear in its inputs x and outputs y.

Say m=1. We have the following equation:

y = x + b

However, what if we had data that doesn’t generally follow a line? Specifically, consider a set of data sampled along a circle. Recall that the equation for a circle is:

x^2 + y^2 = r^2

Can least squares fit this well? As it stands, no. The model is not linear in its inputs x and outputs y; instead, it is quadratic in x and y. However, it turns out that we can still use least squares, just with a modification. To accomplish this, we featurize our samples.

Consider the following: what if the inputs to our model were x_ = x^2 and y_ = y^2? Then our model is trying to learn the following model:

x_ + y_ = r^2

Is this linear in the model’s input x_ and output y_? Yes. Note the subtlety: the model is still quadratic in x and y, but it is linear in x_ and y_. This means that least squares can fit the data if we square x and y (computing x_ and y_) before running least squares.
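A minimal Python sketch of this trick, assuming points on a circle centered at the origin (the points are made up). After featurizing each sample into x_ = x^2 and y_ = y^2, the model x_ + y_ = r^2 has a single unknown, c = r^2, and minimizing the sum of squared residuals (x_ + y_ - c)^2 over all samples gives c equal to the average of x_ + y_.

```python
import math
from typing import Sequence, Tuple


def featurize(point: Tuple[float, float]) -> Tuple[float, float]:
    """Map (x, y) to the squared features (x_, y_) = (x^2, y^2)."""
    x, y = point
    return x * x, y * y


def fit_circle_radius(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares fit of x_ + y_ = c on featurized points; returns r = sqrt(c)."""
    features = [featurize(p) for p in points]
    # Setting d/dc of sum((x_ + y_ - c)^2) to zero gives c = mean(x_ + y_).
    c = sum(xs + ys for xs, ys in features) / len(features)
    return math.sqrt(c)


points = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
print(fit_circle_radius(points))  # 2.0
```

This only estimates the radius of a circle centered at the origin; the point is that a model quadratic in x and y becomes linear in the featurized inputs, so the least-squares machinery still applies.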

More generally, we can apply non-linear featurizations so that least squares can fit labels that are non-linear in the original inputs. This fairly powerful tool is known as featurization.

However, featurizations lead to more complex models. Regularization allows us to penalize model complexity, which helps keep us from overfitting the training data.
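One common form of regularization is ridge (L2) regularization; the article does not say which penalty it has in mind, so treat this as one example. It adds λ * m^2 to the least-squares objective, so we minimize Σ(m * x_i + b - y_i)^2 + λ * m^2. For a line this still has a closed form: b = y_mean - m * x_mean, and m = Σ(x_i - x_mean)(y_i - y_mean) / (Σ(x_i - x_mean)^2 + λ). A minimal sketch, assuming the intercept b is left unpenalized:

```python
from typing import Sequence, Tuple


def ridge_fit_line(data: Sequence[Tuple[float, float]], lam: float) -> Tuple[float, float]:
    """Minimize sum((m*x + b - y)^2) + lam * m^2; returns (m, b). b is not penalized."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if not data:
        raise ValueError("need at least one point")
    n = len(data)
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    s_xx = sum((x - x_mean) ** 2 for x, _ in data)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    if s_xx + lam == 0:
        raise ValueError("slope is undefined: all x values are identical and lam is 0")
    m = s_xy / (s_xx + lam)  # a larger lam shrinks the slope toward 0
    b = y_mean - m * x_mean
    return m, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
print(ridge_fit_line(data, lam=0.0))  # (2.0, 1.0): plain least squares
print(ridge_fit_line(data, lam=5.0))  # (1.0, 2.5): slope shrunk by the penalty
```

The penalty weight λ is a hyperparameter: it is tuned on the validation set, not learned from the training set.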

Conclusion

In this article, you’ve touched on major topics in the fundamentals of machine learning. Using the abstractions above, you now have a framework to discuss machine learning problems and solutions. Using the fundamental topics above, you now also have quintessential concepts to learn more about, giving you the necessary tools to evaluate risk and other concerns in a machine learning application.

Further Reading

We will continue to explore these topics in depth, both the undercurrents of machine learning and specific methods. In the interim, here are resources to further your study and exploration of machine learning:

Smashing Editorial
(ra, il)




Best Practices For Mobile Form Design





Nick Babich



(This article is kindly sponsored by Adobe.) Forms are the linchpin of all mobile interactions; they stand between the person and what they’re looking for. Every day, we use forms for essential online activities. Recall the last time you bought a ticket, booked a hotel room or made a purchase online: most likely, that interaction included a step where you filled out a form.

Forms are just a means to an end. Users should be able to complete them quickly and without confusion. In this article, you’ll learn practical techniques that will help you design an effective form.

What Makes For An Effective Form

The primary goal with every form is completion. Two factors have a major impact on completion rate:

  • Perception of complexity
    The first thing users do when they see a new form is estimate how much time is required to complete it. Users do this by scanning the form. Perception plays a crucial role in the process of estimation. The more complex a form looks, the more likely users are to abandon the process.
  • Interaction cost
    Interaction cost is the sum of efforts — both cognitive and physical — that the users put into interacting with an interface in order to reach their goal. Interaction cost has a direct connection with form usability. The more effort users have to make to complete a form, the less usable the form is. A high interaction cost could be the result of data that is difficult to input, an inability to understand the meaning of some questions, or confusion about error messages.

The Components Of Forms

A typical form has the following five components:

  • Input fields
    These include text fields, password fields, checkboxes, radio buttons, sliders and any other fields designed for user input.
  • Field labels
    These tell users what the corresponding input fields mean.
  • Structure
    This includes the order of fields, the form’s appearance on the page, and the logical connections between different fields.
  • Action buttons
    The form will have at least one call to action (the button that triggers data submission).
  • Feedback
    Feedback notifies the user about the result of an operation. Feedback can be positive (for example, indicating that the form was submitted successfully) or negative (saying something like, “The number you’ve provided is incorrect”).

This article covers many aspects related to structure, input fields, labels, action buttons and validation. Most points mentioned in this article have visual do and don’t examples; all such examples were created using Adobe XD.

Input Fields

When it comes to form design, the most important thing a designer can do is to minimize the need for typing. Reducing input effort is essential. Designers can achieve this goal by focusing on form field design.

Minimize The Total Number Of Fields

Every field you ask users to fill out requires some effort. The more effort needed to fill out a form, the less likely users are to complete it. That’s why the foundational rule of form design is that shorter is better: get rid of all inessential fields.

Baymard Institute analyzed checkout forms and found that an overly long or complicated checkout process is one of the top reasons for abandonment during checkout. The study found that the average checkout contains almost 15 form fields. Most online services could reduce the number of fields displayed by default by 20 to 60%.




Top reasons for abandonment during checkout. (Image: Baymard Institute)

Many designers are familiar with the “less is more” rule; still, they ask additional questions in an attempt to gather more data about their users. It might be tempting to collect more data about your users during the initial signup, but resist that temptation. Think about it this way: With every additional field you add to your form, you increase the chance of losing a prospective user. Is the information you gain from a field worth losing new users? Remember that, as long as you’ve collected a user’s contact information, you can always follow up with a request for more data.

Clearly Distinguish All Optional Fields

Before optimizing optional fields, ask yourself whether you really need to include them in your form. Think about what information you really need, not what you want. Ideally, the number of optional fields in your form should be zero.

If, after a brainstorming session, you still want to include a few optional questions in your form, make it clear to users that those fields are optional:

  • Mark optional fields instead of mandatory ones.
    If you ask as little as possible, then the vast majority of fields in your form will be mandatory. Therefore, mark only those fields in the minority. For instance, if five out of six fields are mandatory, then it makes sense to mark only one field as optional.
  • Use the “Optional” label to denote optional fields.
    Avoid using the asterisk (*) to mean “optional.” Not all users will associate the asterisk with optional information, and some users will be confused by the meaning (an asterisk is often used to denote mandatory fields).

Clearly distinguish all optional fields.

Size Fields Accordingly

When possible, use field length as an affordance. The length of an input field should be in proportion to the amount of information expected in the field. The size of the field will act as a visual constraint — the user will know how much text is expected to be entered just by looking at the field. Generally, fields such as ones for area codes and house numbers should be shorter than ones for street addresses.


The size of a field is used as a visual constraint.

Offer Field Focus

Auto-focus the first input field in your form. Auto-focusing a field gives the user an indication and a starting point, so that they are able to quickly start filling out the form. By doing that, you reduce the interaction cost — saving the user one unnecessary tap.

Make the active input field prominent and focused. The field focus itself should be crystal clear — users should be able to understand at a glance where the focus is. It could be an accented border color or a fade-in of the box.


Amazon puts strong visual focus on the input field.

Don’t Ask Users To Repeat Their Email Address

The reason why an extra field for the email address is so popular among product developers is apparent: Every company wants to minimize the risk of hard bounces (non-deliverables caused by invalid email addresses). Unfortunately, following this approach doesn’t guarantee that you’ll get a valid address. Users often copy and paste their address from one field to another.


Avoid asking users to retype their email address.

Provide “Show Password” Option

Duplicating the password input field is another common mistake among product designers. Designers follow this approach because they believe it will prevent users from mistyping a password. In reality, a second field for a password not only increases interaction cost, but also doesn’t guarantee that users will proceed without mistakes. Because users don’t see what they’ve entered in the field, they can make the same mistake twice (in both fields) and will face a problem when they try to log in using a password. As Jakob Nielsen summarized:

Usability suffers when users type in passwords and the only feedback they get is a row of bullets. Typically, masking passwords doesn’t even increase security, but it does cost you business due to login failures.

Instead of duplicating the password field, provide an option that allows users to view the password they have chosen to create. Have an icon or checkbox that unmasks the password when clicked. A password preview can be an opportunity for users to check their data before sending.


Not being able to see what you’re typing is a huge issue. Providing a ‘Show password’ option next to the password field will help to solve this problem.

Don’t Slice Data Fields

Do not slice fields when asking for a full name, phone number or date of birth. Sliced fields force the user to make additional taps to move to the next field. For fields that require some formatting (such as phone numbers or a date of birth), it’s also better to have a single field paired with clear formatting rules as its placeholder.


Avoid splitting input fields; don’t make people jump between fields. Instead of asking for a first name and last name in two separate fields, have a single ‘Full name’ field.
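A single field is still straightforward to validate after submission. A minimal Python sketch, assuming the placeholder asks for a US-style MM/DD/YYYY date of birth; the format string and function name are illustrative, not from the article.

```python
from datetime import date, datetime
from typing import Optional

# Assumed format; keep it in sync with the field's placeholder text ("MM/DD/YYYY").
DOB_FORMAT = "%m/%d/%Y"


def parse_date_of_birth(text: str) -> Optional[date]:
    """Parse a single date-of-birth field; return None if it doesn't match the format."""
    try:
        return datetime.strptime(text.strip(), DOB_FORMAT).date()
    except ValueError:
        return None  # wrong format, or an impossible date such as 02/30


print(parse_date_of_birth("07/04/1990"))  # 1990-07-04
print(parse_date_of_birth("1990-07-04"))  # None
```

Returning None lets the form show one specific error message next to the single field, instead of asking users to work out which of three sliced fields is wrong.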

Avoid Dropdown Menus

Luke Wroblewski famously said that dropdowns should be the UI of last resort. Dropdowns are especially bad for mobile because collapsed elements make the process of data input harder on a small screen: Placing options in a dropdown requires two taps and hides the options.

If you’re using a dropdown for selection of options, consider replacing it with radio buttons. They make all options glanceable and also reduce the interaction cost: users can see and select an option with a single tap.





Use Placeholders And Masked Input

Formatting uncertainty is one of the most significant problems of form design. This problem has a direct connection with form abandonment — when users are uncertain of the format in which they should provide data, they can quickly abandon the form. There are a few things you can do to make the format clear.

Placeholder Text

The text in an input field can tell users what content is expected. Placeholder text is not required for simple fields such as “Full name”, but it can be extremely valuable for fields that require data in a specific format. For example, if you design search functionality for tracking a parcel, it would be good to provide a sample tracking number as a placeholder for the tracking-number field.





It’s vital that your form should have a clear visual distinction between the placeholder text and the actual value entered by the user. In other words, placeholder text shouldn’t look like a preset value. Without clear visual distinction, users might think that the fields with placeholders already have values.

Masked Input

Field masking is a technique that helps users format inputted text. Many designers confuse field masking with placeholder text — they are not the same thing. Unlike placeholders, which are basically static text, masks automatically format the data provided by the user. In the example below, the parentheses, spaces and dashes appear on the screen automatically as a phone number is entered.

Masked input also makes it easy for users to validate information. When a phone number is displayed in chunks, it makes it easier to find and correct a typo.

Masked input for a phone number. (Image: Josh Morony)
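In practice a mask runs in the browser on every keystroke, but the formatting rule itself is small. A minimal sketch in Python, assuming a 10-digit North American number formatted as (555) 123-4567; other countries need different patterns, and the function name is illustrative.

```python
def mask_phone(raw: str) -> str:
    """Format whatever digits were typed so far as (XXX) XXX-XXXX."""
    digits = "".join(ch for ch in raw if ch in "0123456789")[:10]
    if not digits:
        return ""
    if len(digits) <= 3:
        return f"({digits}"
    if len(digits) <= 6:
        return f"({digits[:3]}) {digits[3:]}"
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


# The mask updates as the user types, one digit at a time.
for typed in ["5", "555", "5551", "555123", "5551234", "5551234567"]:
    print(mask_phone(typed))
```

Because the mask strips everything except digits before formatting, it also tolerates users who type their own dashes or parentheses.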

Provide Matching Keyboard

Mobile users appreciate apps and websites that provide an appropriate keyboard for the field, because it saves them additional actions. For example, when users need to enter a credit card number, your app should display only the numeric dialpad. It’s essential to implement keyboard matching consistently throughout the app (all forms in your app should have this feature).

Set HTML input types to show the correct keypad. Seven input types are relevant to form design:

  • input type="text" displays the mobile device’s normal keyboard.
  • input type="email" displays the normal keyboard and ‘@’ and ‘.com’.
  • input type="tel" displays the numeric 0 to 9 keypad.
  • input type="number" displays a keyboard with numbers and symbols.
  • input type="date" displays the mobile device’s date selector.
  • input type="datetime-local" displays the mobile device’s date and time selector.
  • input type="month" displays the mobile device’s month and year selector.



When users tap into a field with credit card number, they should see a numerical dialpad: all numbers, no letters.

Use A Slider When Asking For A Specific Range

Many forms ask users to provide a range of values (for example, a price range, distance range, etc.). Instead of using two separate fields, “from” and “to”, for that purpose, use a slider to allow users to specify the range with a thumb interaction.


Sliders are good for touch interfaces because they allow users to specify a range without typing.

Clearly Explain Why You’re Asking For Sensitive Information

People are increasingly concerned about privacy and information security. When users see a request for information they consider private, they might think, “Hm, why do they need this?” If your form asks users for sensitive information, make sure to explain why you need it. You can do that by adding support text below relevant fields. As a rule of thumb, the explanation text shouldn’t exceed 100 characters.


A request for a phone number in a booking form might confuse users. Explain why you are asking for it.

Be Careful With Static Defaults

Unlike smart defaults, which are calculated by the system based on the information the system has about users, static defaults are preset values in forms that are the same for all users. Avoid static defaults unless you believe a significant portion of your users (say, 95%) would select those values — particularly for required fields. Why? Because you’re likely to introduce errors — people scan forms quickly, and they won’t spend extra time parsing all of the questions; instead, they’ll simply skip the field, assuming it already has a value.

Protect User Data

Jef Raskin once said, “The system should treat all user input as sacred.” This is absolutely true for forms. It’s great when you start filling in a web form and then accidentally refresh the page but the data remains in the fields. Tools such as Garlic.js help you to persist a form’s values locally until the form is submitted. This way, users won’t lose any precious data if they accidentally close the tab or browser.

Automate Actions

If you want to make the process of data input as smooth as possible, it’s not enough to minimize the number of input fields; you should also pay attention to the user effort required for data input. Typing has a high interaction cost: it’s error-prone and time-consuming, even with a physical keyboard, and on mobile screens the problem is even more critical. More typing increases the user’s chance of making errors. Strive to prevent unnecessary typing, because it will improve user satisfaction and decrease error rates.

Here are a few things you can do to achieve this goal:

Autocomplete

Most users experience autocompletion when typing a question in Google’s search box. Google provides users with a list of suggestions related to what the user has typed in the field. The same mechanism can be applied to form design. For example, a form could autocomplete an email address.

This form suggests the email host and saves users from typing a complete address. (Image: GitHub)
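Here is a minimal TypeScript sketch of this kind of email suggestion. It assumes a hand-picked list of common providers. The list and the `suggestEmail` name are illustrative and are not taken from GitHub’s form.

```typescript
// Illustrative list only; a real form might derive it from its own sign-up data.
const COMMON_EMAIL_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"];

// Returns a completed address when the text after "@" is the start of a known
// domain, or null when there is nothing useful to suggest.
function suggestEmail(input: string, domains: readonly string[] = COMMON_EMAIL_DOMAINS): string | null {
  const at = input.lastIndexOf("@");
  if (at <= 0) return null; // no "@" yet, or nothing before it
  const local = input.slice(0, at);
  const partial = input.slice(at + 1).toLowerCase();
  if (partial.length === 0) return null; // wait for at least one domain character
  const match = domains.find((domain) => domain.startsWith(partial) && domain !== partial);
  return match ? `${local}@${match}` : null;
}
```

The suggestion is only a hint. The user can keep typing or accept it with a tap, and the field stays freely editable.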

Autocapitalize

Autocapitalizing makes the first letter a capital automatically. This feature is excellent for fields like names and street addresses, but avoid it for password fields.

Autocorrect

Autocorrection modifies words that appear to be misspelled. Turn this feature off for unique fields, such as names, addresses, etc.

Auto-filling of personal details

Typing an address is often the most cumbersome part of any online signup form. Make this task easier by relying on the browser’s autofill feature, which fills fields based on previously entered values. According to Google’s research, auto-filling helps people fill out forms 30% faster.

Address prefill. (Image: Google)
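The autocapitalize, autocorrect and autofill advice above corresponds to HTML input attributes. The TypeScript sketch below collects reasonable values for each field type. The `autocomplete` tokens (`name`, `email`, `current-password`, `address-line1`, `postal-code`) come from the HTML autofill vocabulary. The `FieldKind` names and the `renderInput` helper are hypothetical. Note that `autocorrect` has historically been a Safari attribute, so other browsers may ignore it.

```typescript
type FieldKind = "name" | "email" | "password" | "addressLine1" | "postalCode";

// Suggested attributes per field type. `autocomplete` holds HTML autofill tokens.
// `autocapitalize` is "words" for names and addresses and "none" for email and
// passwords. `autocorrect` is off everywhere because these values are unique to
// the user and must not be "fixed".
const FIELD_ATTRIBUTES: Record<FieldKind, Record<string, string>> = {
  name: { type: "text", autocomplete: "name", autocapitalize: "words", autocorrect: "off", spellcheck: "false" },
  email: { type: "email", autocomplete: "email", autocapitalize: "none", autocorrect: "off", spellcheck: "false" },
  password: { type: "password", autocomplete: "current-password", autocapitalize: "none", autocorrect: "off", spellcheck: "false" },
  addressLine1: { type: "text", autocomplete: "address-line1", autocapitalize: "words", autocorrect: "off" },
  postalCode: { type: "text", autocomplete: "postal-code", autocapitalize: "characters", autocorrect: "off" },
};

// Builds an <input> tag, e.g. for a server-side template.
// Assumes `id` is a plain identifier that needs no HTML escaping.
function renderInput(kind: FieldKind, id: string): string {
  const attrs: Record<string, string> = { id, name: id, ...FIELD_ATTRIBUTES[kind] };
  const serialized = Object.entries(attrs)
    .map(([attr, value]) => `${attr}="${value}"`)
    .join(" ");
  return `<input ${serialized}>`;
}
```

For example, `renderInput("email", "email")` produces an email input with `autocomplete="email"` and `autocapitalize="none"`. That lets the browser offer a saved address and stops it from capitalizing the first letter.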

Use The Mobile Device’s Native Features To Simplify Data Input

Modern mobile devices have a wide range of capabilities. Designers can use a device’s native features (such as camera or geolocation) to streamline the task of inputting data.

Below are just a few tips on how to make use of sensors and device hardware.

Location Services

It’s possible to preselect the user’s country based on their geolocation data. But sometimes prefilling a full address can be problematic due to accuracy issues. Google’s Places API can help solve this problem. It uses both geolocation and address prefilling to provide accurate suggestions based on the user’s exact location.

Address lookup using Google Places API. (Image: Chromatic HQ)

Using location services, it’s also possible to provide smart defaults. For example, for a “Find a flight” form, it’s possible to prefill the “From” field with the nearest airport to the user based on the user’s geolocation.
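Once the browser has supplied coordinates, choosing the nearest airport is a small distance calculation. Here is a TypeScript sketch that uses the haversine (great-circle) formula. The three-airport list is illustrative data with approximate coordinates, and `nearestAirport` is a hypothetical helper. It is not part of any geolocation API.

```typescript
interface Airport {
  code: string;
  lat: number;
  lon: number;
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two points, via the haversine formula.
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const h = Math.sin(dLat / 2) ** 2 + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Returns the closest airport to use as the default "From" value,
// or undefined for an empty list.
function nearestAirport(lat: number, lon: number, airports: readonly Airport[]): Airport | undefined {
  let best: Airport | undefined;
  let bestDistance = Infinity;
  for (const airport of airports) {
    const d = distanceKm(lat, lon, airport.lat, airport.lon);
    if (d < bestDistance) {
      best = airport;
      bestDistance = d;
    }
  }
  return best;
}

// Illustrative sample; a real form would use a full airport dataset.
const AIRPORTS: Airport[] = [
  { code: "JFK", lat: 40.6413, lon: -73.7781 },
  { code: "ORD", lat: 41.9742, lon: -87.9073 },
  { code: "LAX", lat: 33.9416, lon: -118.4085 },
];
```

In the browser, the coordinates would come from `navigator.geolocation.getCurrentPosition`, whose success callback receives `position.coords.latitude` and `position.coords.longitude`. The prefilled airport should remain a default that the user can change.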

Biometric Authorization

The biggest problem with text passwords today is that people forget them: 82% of people can’t remember their passwords, and 5 to 10% of sessions require users to reset a password. Password recovery is a big deal in e-commerce: 75% of users wouldn’t complete a purchase if they had to attempt to recover their password while checking out.

The future of passwords is no passwords. Even today, mobile developers can take advantage of biometric technologies. Users shouldn’t need to type a password; they should be able to use biometric readers for authentication — signing in using a fingerprint or face scanning.

eBay took advantage of the biometrics functionality on smartphones. Users can use their thumbprint to log in to their eBay account.

Camera

If your form asks users to provide credit card details or information from their driver’s license, it’s possible to simplify the process of data input by using the camera as a scanner. Provide an option to take a photo of the card and fill out all details automatically.

Let users scan their identity card instead of having to fill out its details manually. (Image: blinkid)

But remember that no matter how well your app fills out the fields, it’s essential to leave them available for editing. Users should be able to modify the fields whenever they want.

Voice

Voice-controlled devices, such as Apple HomePod, Google Home and Amazon Echo, are rapidly gaining market share. The number of people who prefer to use voice for common operations has grown significantly. According to comScore, 50% of all searches will be voice searches by 2020.

How people in the US use smart speakers (according to comScore)

As users get more comfortable and confident using voice commands, they will become an expected feature of mobile interactions. Voice input provides a lot of advantages for mobile users — it’s especially valuable in situations when users can’t focus on a screen, for example, while driving a car.

When designing a form, you can provide voice input as an alternative method of data input.


Google Translate provides an option to enter the text for translation using voice.

Field Labels

Write Clear And Concise Labels

The label is the text that tells users what data is expected from them in a particular input field. Writing clear labels is one of the best ways to make a form more accessible. Labels should help the user understand what information is required at a glance.

Avoid using complete sentences to explain. A label is not help text. Write succinct and crisp labels (a word or two), so that users can quickly scan your form.

Place The Label And Input Close Together

Put each label close to its input field, so that the eye perceives them as tied together.

A label and its field should be visually grouped, so that users can understand which label belongs to which field.

Don’t Use Disappearing Placeholder Text As Labels

While inline labels look good and save valuable screen real estate, these benefits are far outweighed by the significant usability drawbacks, the most critical of which is the loss of context. When users start entering text in a field, the placeholder text disappears and forces people to recall this information. While it might not be a problem for simple two-field forms, it could be a big deal for forms that have a lot of fields (say, 7 to 10). It would be tough for users to recall all field labels after inputting data. Not surprisingly, user testing continually shows that placeholders in form fields often hurt usability more than help.

Don’t use placeholder text that disappears when the user interacts with the field.

There’s a simple solution to the problem of disappearing placeholders: the floating (or adaptive) label. After the user taps on the field with the label placeholder, the label doesn’t disappear; it moves up to the top of the field and makes room for the user to enter their data.

Floating labels assure the user that they’ve filled out the fields correctly. (Image: Matt D. Smith)

Top-Align Labels

Putting field labels above the fields in a form improves the way users scan the form. In an eye-tracking study, Google found that with top-aligned labels, users needed fewer fixations, less fixation time and fewer saccades before submitting a form.

Another important advantage of top-aligned labels is that they provide more space for labels: long labels and localized versions fit more easily in the layout. This makes top alignment especially suitable for small mobile screens, where form fields can extend the full width of the screen and be large enough to display the user’s entire input.

Sentence Case Vs. Title Case

There are two general ways to capitalize words:

  • Title case: Capitalize every word. “This Is Title Case.”
  • Sentence case: Capitalize the first word. “This is sentence case.”

Using sentence case for labels has one advantage over title case: It is slightly easier (and, thus, faster) to read. While the difference for short labels is negligible (there’s not much difference between “Full Name” and “Full name”), for longer labels, sentence case is better. Now You Know How Difficult It Is to Read Long Text in Title Case.

Avoid Using Caps For Labels

All-caps text, meaning text with all of the letters capitalized, is OK in contexts that don’t involve substantive reading (such as acronyms and logos), but avoid all caps otherwise. As mentioned by Miles Tinker in his work Legibility of Print, all-capital print dramatically slows the speed of scanning and reading compared to lowercase type.

All-capitalized letters are hard to scan and read.

Layout

You know by now that users scan web pages rather than read them. The same goes for filling out forms. That’s why designers should create forms that are easy to scan. Allowing for efficient, effective scanning is crucial to making the process of filling out a form as quick as possible.

Use A Single-Column Layout

A study by CXL Institute found that single-column forms are faster to complete than multi-column forms. In that study, test participants were able to complete a single-column form an average of 15.4 seconds faster than a multi-column form.

Multiple columns disrupt a user’s vertical momentum; with multiple columns, the eyes start zigzagging. This dramatically increases the number of eye fixations and, as a result, the completion time. Moreover, multiple-column forms might raise unnecessary questions in the user, like “Where should I begin?” and “Are questions in the right column equal in importance to questions in the left one?”

In a one-column design, the eyes move in a natural direction, from top to bottom, one line at a time. This helps to set a clear path for the user. One column is excellent for mobile because the screens are longer vertically, and vertical scrolling is a natural motion for mobile users.

There are some exceptions to this rule. It’s possible to place short and logically related fields on the same row (such as for the city and area code).

If a form has horizontally adjacent fields, the user has to scan the form following a Z pattern. When the eyes start zigzagging, it slows the speed of comprehension and increases completion time.

Create A Flow With Your Questions

The way you ask questions also matters. Questions should be asked in an order that is logical from the user’s perspective, not according to the application or database’s logic, because this helps create a sense of conversation with the user. For example, if you design a checkout form that asks for details such as full name, phone number and credit card, the first question should be for the full name. Changing the order (for example, starting with a phone number instead of a name) leads to discomfort. In real-world conversations, it would be unusual to ask for someone’s phone number before asking their name.

Defer In-Depth Questions To The End

When it comes to designing a flow for questions you want to ask, think about prioritization. Follow the rule “easy before difficult” and place in-depth or personal questions last. This eases users into the process; they will be more likely to answer complex and more intrusive questions once they’ve established a rapport. This has a scientific basis: Robert Cialdini’s principle of consistency stipulates that when someone takes a small action or step towards something, they feel more compelled to finish.

Group Related Fields Together

One of the principles of Gestalt psychology, the principle of proximity, states that related elements should be near each other. This principle can be applied to the order of questions in a form. The more related questions are, the closer they should be to each other.

Designers can group related fields into sections. If your form has more than six questions, group related questions into logical sections. Don’t forget to provide a good amount of white space between sections to distinguish them visually.

Generally, if your form has more than six questions, it’s better to group related questions into logical sections. Put things together that make sense together.

Make A Long Form Look Simpler

How do you design a form that asks users a lot of questions? Of course, you could put all of the questions on one screen. But this can hinder your completion rate. If users don’t have enough motivation to complete a form, the form’s complexity could scare them away. The first impression plays a vital role. Generally, the longer or more complicated a form seems, the less likely users will be to start filling in the blanks.

Minimize the number of fields visible at one time. This creates the perception that the form is shorter than it really is.

There are two techniques to do this.

Progressive Disclosure

Progressive disclosure is about showing users the right information at the right time. On a small screen, that means:

  • Initially, show users only a few of the most important options.
  • Reveal parts of your form as the user interacts with it.

Using progressive disclosure to reduce cognitive load and keep the user focused on a task. (Image: Ramotion)
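One way to implement progressive disclosure is to attach an optional visibility rule to each field and recompute the visible set whenever an answer changes. The checkout fields and the `visibleFields` helper below are a hypothetical TypeScript sketch. They are not the approach used by the form pictured above.

```typescript
type Answers = Record<string, string | boolean>;

interface FieldDef {
  id: string;
  label: string;
  // Optional rule: the field is shown only while this returns true.
  showIf?: (answers: Answers) => boolean;
}

// Hypothetical checkout form: the billing address appears only when the user
// unticks "same as shipping", and the VAT number only for business orders.
const CHECKOUT_FIELDS: FieldDef[] = [
  { id: "shippingAddress", label: "Shipping address" },
  { id: "sameBilling", label: "Billing address same as shipping" },
  { id: "billingAddress", label: "Billing address", showIf: (a) => a.sameBilling === false },
  { id: "isBusiness", label: "This is a business order" },
  { id: "vatNumber", label: "VAT number", showIf: (a) => a.isBusiness === true },
];

// Re-run after every change to decide which fields to render.
function visibleFields(fields: readonly FieldDef[], answers: Answers): string[] {
  return fields.filter((field) => field.showIf === undefined || field.showIf(answers)).map((field) => field.id);
}
```

With the default answers, only three fields are visible. The extra fields appear only for the users who need them.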

Chunking

Chunking entails breaking a long form into steps. It’s possible to increase the completion rate by splitting a form into a few steps. Chunking can also help users process, understand and remember information. When designing multi-step forms, always inform users of their progress with a completeness meter.

Progress tracker for e-commerce form. (Image: Murat Mutlu)

Designers can use either a progress tracker (as shown in the example above) or a “Step # out of #” indicator to show both how many steps there are in total and how far along the user is at the moment. The latter approach can be great for mobile forms because a step indicator doesn’t take up much space.
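The “Step # out of #” indicator is easy to derive from the current step and the total. Here is a minimal TypeScript sketch. The `stepProgress` name and the decision to count the current step as progress are assumptions.

```typescript
interface StepProgress {
  label: string; // compact text such as "Step 2 out of 4"
  percent: number; // position of the current step, e.g. for a progress bar width
}

// Assumes steps are numbered from 1 and counts the current step as progress.
function stepProgress(current: number, total: number): StepProgress {
  if (!Number.isInteger(current) || !Number.isInteger(total) || total < 1 || current < 1 || current > total) {
    throw new RangeError(`invalid step ${current} of ${total}`);
  }
  return {
    label: `Step ${current} out of ${total}`,
    percent: Math.round((current / total) * 100),
  };
}
```

`percent` can drive the width of a progress bar. On narrow screens, `label` alone is often enough.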

Action Buttons

A button is an interactive element that directs users to take an action.

Make Action Buttons Descriptive

A button’s label should explain what the button does; users should be able to understand what happens after a tap just by looking at the button. Avoid generic labels such as “Submit” and “Send”, using instead labels that describe the action.

The label should help users finish the sentence ‘I want to…’. For example, if it’s a form to create an account, the call to action could be ‘Create an account’.

Don’t Use Clear Or Reset Buttons

Clear or reset buttons allow users to erase their data in a form. These buttons almost never help users and often hurt them. The risk of deleting all of the information a user has entered outweighs the small benefit of being able to start again. If a user fills in a form and accidentally hits the wrong button, there’s a good chance they won’t start over.

Use Different Styles For Primary And Secondary Buttons

Avoid secondary actions if possible. But if your form has two call-to-action buttons (for example, an e-commerce form that has “Apply discount” and “Submit order”), ensure a clear visual distinction between the primary and secondary actions. Visually prioritize the primary action by adding more visual weight to the button. This will prevent users from tapping on the wrong button.

Ensure a clear visual distinction between primary and secondary buttons.

Design Finger-Friendly Touch Targets

Tiny touch targets create a horrible user experience because they make it hard for users to hit interactive elements accurately. It’s vital to design finger-friendly touch targets: bigger input fields and buttons.

The image below shows that the width of the average adult finger is about 11 mm.

People often blame themselves for having “fat fingers”. But even baby fingers are wider than most touch targets. (Image: Microsoft)

According to material design guidelines, touch targets should be at least 48 × 48 DP. A touch target of this size results in a physical size of about 9 mm, regardless of screen size. It might be appropriate to use larger touch targets to accommodate a wider spectrum of users.

Not only is target size important, but sufficient space between touch targets matters, too. The main reason to maintain a safe distance between touch targets is to prevent users from touching the wrong button and invoking the wrong action. The distance between buttons becomes extremely important when binary choices such as “Agree” and “Disagree” are located right next to each other. Material design guidelines recommend separating touch targets with 8 DP of space or more, which will create balanced information density and usability.

Disable Buttons After Tap

Form actions commonly require some time to be processed. For example, data calculation might be required after a submission. It’s essential not only to provide feedback when an action is in progress, but also to disable the submit button to prevent users from accidentally tapping the button again. This is especially important for e-commerce websites and apps. By disabling the button, you not only prevent duplicate submissions, which can happen by accident, but you also provide a valuable acknowledgment to users (users will know that the system has received their submission).

This form disables the button after submission. (Image: Michaël Villar)
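Disabling the button and ignoring repeat taps can be combined in a small wrapper around the submit handler. This TypeScript sketch assumes the submission is an async function that returns a promise. `guardSubmit` and the `setBusy` callback are hypothetical names.

```typescript
// Wraps an async submit handler so that repeated taps while a request is in
// flight are ignored. `setBusy` is where a UI would disable the button and
// show progress feedback.
function guardSubmit<T>(
  submit: () => Promise<T>,
  setBusy: (busy: boolean) => void,
): () => Promise<T | undefined> {
  let inFlight = false;
  return async () => {
    if (inFlight) return undefined; // duplicate tap: do nothing
    inFlight = true;
    setBusy(true);
    try {
      return await submit();
    } finally {
      inFlight = false;
      setBusy(false); // re-enable so the user can retry if the request failed
    }
  };
}
```

On a page, you might write `button.onclick = guardSubmit(sendOrder, (busy) => { button.disabled = busy; })`, where `sendOrder` stands in for your own request function. The server should still reject duplicate orders, because a client-side guard cannot cover every case (for example, a user who submits from two tabs).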

Assistance And Support

Provide Success State

When users successfully complete a form, it’s critical to let them know. You can provide this information in the context of the existing form (for example, by showing a green checkmark above the refreshed form) or direct users to a new page that confirms their submission was successful.

Example of success state. (Image: João Oliveira Simões)

Errors And Validation

Users will make mistakes. It’s inevitable. It’s essential to design a user interface that supports users in those moments of failure.

While the topic of errors and validation deserves its own article, it’s still worth mentioning a few things that should be done to improve the user experience of mobile forms.

Use Input Constraints for Each Field

Prevention is better than a cure. If you’re a seasoned designer, you should be familiar with the most common cases that can lead to an error state (error-prone conditions). For example, it’s usually hard to correctly fill out a form on the first attempt, or to properly sync data when the mobile device has a poor network connection. Take these cases into account to minimize the possibility of errors. In other words, it’s better to prevent users from making errors in the first place by utilizing constraints and offering suggestions.

For instance, if you design a form that allows people to search for a hotel reservation, you should prevent users from selecting check-in dates that are in the past. As shown in the Booking.com example below, you can simply use a date selector that allows users only to choose today’s date or a date in the future. Such a selector would force users to pick a date range that fits.

You can significantly decrease the number of mistakes or incorrectly inputted data by putting constraints on what can be inputted in the field. The date picker in Booking.com’s app displays a full monthly calendar but makes past dates unavailable for selection.
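In web forms, the same constraint can be expressed with the `min` attribute of `<input type="date">`, which takes a `YYYY-MM-DD` value. The TypeScript sketch below computes that value for today and re-checks a stay on submit. It assumes both dates arrive as well-formed `YYYY-MM-DD` strings. `toDateInputValue` and `isValidStay` are illustrative names.

```typescript
// Formats a Date as "YYYY-MM-DD" in local time, the format that the `min`
// attribute of <input type="date"> expects.
function toDateInputValue(date: Date): string {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, "0");
  const day = String(date.getDate()).padStart(2, "0");
  return `${year}-${month}-${day}`;
}

// Check-in may be today or later, and check-out must come after check-in.
// Plain string comparison works because "YYYY-MM-DD" sorts chronologically.
function isValidStay(checkIn: string, checkOut: string, today: Date = new Date()): boolean {
  const earliest = toDateInputValue(today);
  return checkIn >= earliest && checkOut > checkIn;
}
```

For example, `checkInInput.min = toDateInputValue(new Date())` blocks past dates in browsers that honor the attribute. The server should repeat the check, because the attribute is only a client-side hint.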

Don’t Make Data Validation Rules Too Strict

While there might be cases where it’s essential to use strict validation rules, in most cases, strict validation is a sign of lazy programming. Showing errors on the screen when the user provides data in a slightly different format than expected creates unnecessary friction. And this would have a negative impact on conversions.

It’s very common for a few variations of an answer to a question to be possible; for example, when a form asks users to provide information about their state, and a user responds by typing their state’s abbreviation instead of the full name (for example, CA instead of California). The form should accept both formats, and it’s the developer’s job to convert the data into a consistent format.
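Here is a minimal TypeScript sketch of that conversion. It accepts either the abbreviation or the full name, regardless of case and extra spaces. The lookup table is deliberately truncated to four states for illustration. A real form would list all of them.

```typescript
// Illustrative subset only; a production form would list every state and territory.
const US_STATES: Record<string, string> = {
  AL: "Alabama",
  CA: "California",
  NY: "New York",
  TX: "Texas",
};

// Accepts "CA", "ca", "California" or " california " and returns the canonical
// full name, or null when the input matches no known state.
function normalizeState(input: string): string | null {
  const cleaned = input.trim().replace(/\s+/g, " ");
  const byCode = US_STATES[cleaned.toUpperCase()];
  if (byCode !== undefined) return byCode;
  const lower = cleaned.toLowerCase();
  const byName = Object.values(US_STATES).find((name) => name.toLowerCase() === lower);
  return byName ?? null;
}
```

The form can then store the canonical value, so `CA`, `ca` and `California` all end up as `California`. Only input that matches nothing needs an error message.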

Clear Error Messages

When you write error messages, focus on minimizing the frustration users feel when they face a problem in interacting with a form. Here are a few rules on writing effective error messages:

  • Never blame the user.
    The way you deliver an error message can have a tremendous impact on how users perceive it. An error message like, “You’ve entered a wrong number” puts all of the blame on the user; as a result, the user might get frustrated and abandon the app. Write copy that sounds neutral or positive. A neutral message sounds like, “That number is incorrect.”
  • Avoid vague or general error messages.
    Messages like “Something went wrong. Please, try again later” don’t say much to users. Users will wonder what exactly went wrong. Always try to explain the root cause of a problem. Make sure users know how to fix errors.
  • Make error messages human-readable.
    Error messages like “User input error: 0x100999” are cryptic and scary. Write like a human, not like a robot. Use human language, and explain what exactly the user or system did wrong, and what exactly the user should do to fix the problem.

Display Errors Inline

When it comes to displaying error messages, designers opt for one of two locations: at the top of the form or inline. The first option can make for a bad experience. Javier Bargas-Avila and Glenn Oberholzer conducted research on online form validation and discovered that displaying all error messages at the top of the form puts a high cognitive load on user memory. Users need to spend extra time matching error messages with the fields that require attention.

Avoid displaying errors at the top of the form. (Image: John Lewis)

It’s much better to position error messages inline. First, this placement corresponds with the user’s natural top-to-bottom reading flow. Secondly, the errors will appear in the context of the user’s input.

eBay uses inline validation.

Use Dynamic Validation

The time at which you choose to display an error message is vital. Seeing an error message only after pressing the submit button might frustrate users. Don’t wait until users finish the form; provide feedback as data is being entered.

Use inline validation with real-time feedback. This validation instantly tells people whether the information they’ve typed is compatible with the form’s requirements. In 2009, Luke Wroblewski tested inline validation against post-submission validation and found the following results for the inline version:

  • 22% increase in success rate,
  • 22% decrease in errors made,
  • 31% increase in satisfaction rating,
  • 42% decrease in completion times,
  • 47% decrease in the number of eye fixations.

But inline validation should be implemented carefully:

  • Avoid showing inline validation on focus.
    In this case, as soon as the user taps a field, they see an error message. The error appears even when the field is completely empty. When an error message is shown on focus, it might look like the form is yelling at the user before they’ve even started filling it out.
  • Don’t validate after each character typed.
    This approach not only increases the number of unnecessary validation attempts, but it also frustrates users (because users will likely see error messages before they have completed the field). Ideally, inline validation messages should appear around 500 to 1000 milliseconds after the user has stopped typing or after they’ve moved to the next field. This rule has a few exceptions: It’s helpful to validate inline as the user is typing when creating a password (to check whether the password meets complexity requirements), when creating a user name (to check whether a name is available) and when typing a message with a character limit.

“Reward early, punish late” is a solid validation approach. (Image: Mihael Konjević)
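The timing rules above can be sketched in TypeScript as two small pieces. The first is a debounce helper that runs a check only after typing pauses. The second is a per-field object that follows “reward early, punish late”: it can clear an error on any keystroke, but it raises a new error only on blur or after a pause. `debounce`, `FieldValidation` and the `Validator` type are hypothetical names. The delay is left to the caller (the text suggests roughly 500 to 1000 ms).

```typescript
// Runs `fn` only after `waitMs` milliseconds pass without another call,
// e.g. to validate once the user pauses typing.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Returns an error message, or null when the value is acceptable.
type Validator = (value: string) => string | null;

// "Reward early, punish late" for one field. There is deliberately no focus
// handler: focusing a field never produces an error.
class FieldValidation {
  private readonly validate: Validator;
  private error: string | null = null;

  constructor(validate: Validator) {
    this.validate = validate;
  }

  // On every keystroke: re-check only a field that is already showing an error,
  // so the message disappears as soon as the value becomes valid (reward early).
  onInput(value: string): string | null {
    if (this.error !== null) this.error = this.validate(value);
    return this.error;
  }

  // On blur, or from a debounced handler after typing pauses: the only place
  // where a new error can appear (punish late).
  onSettle(value: string): string | null {
    this.error = this.validate(value);
    return this.error;
  }
}
```

To wire this up, you might call `onInput` from the field’s `input` event and `onSettle` from its `blur` event. You could also create a debounced `onSettle` once with `debounce(..., 800)` and call it from the `input` event, so that an error can appear after a pause in typing as well.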

Accessibility

Users of all abilities should be able to access and enjoy digital products. Designers should strive to incorporate accessibility needs as much as they can when building a product. Here are a few things you can do to make your forms more accessible.

Ensure The Form Has Proper Contrast

Your users will likely interact with your form outdoors. Ensure that it is easy to use both in sun glare and in low-light environments. Check the contrast ratio of fields and labels in your form. The W3C recommends the following contrast ratios for body text:

  • Small text should have a contrast ratio of at least 4.5:1 against its background.
  • Large text (at 14-point bold, 18-point regular and up) should have a contrast ratio of at least 3:1 against its background.

Measuring color contrast can seem overwhelming. Fortunately, some tools make the process simple. One of them is WebAIM’s Color Contrast Checker, which helps designers measure contrast levels.
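The ratio that these tools report is defined by WCAG in terms of the relative luminance of the two colors. Here is a TypeScript sketch of that calculation for `#rrggbb` colors, using the thresholds listed above. It is a simplified reading of the WCAG 2 definition (it does not support alpha or shorthand `#rgb`), and the function names are illustrative.

```typescript
// Converts one 0-255 sRGB channel to linear light, as in the WCAG 2
// definition of relative luminance (0.04045 is the sRGB breakpoint).
function linearChannel(value: number): number {
  const s = value / 255;
  return s <= 0.04045 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a "#rrggbb" color, from 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(hex.slice(1), 16);
  const r = linearChannel((n >> 16) & 0xff);
  const g = linearChannel((n >> 8) & 0xff);
  const b = linearChannel(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// (L1 + 0.05) / (L2 + 0.05), with L1 the lighter color; ranges from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// Thresholds from the list above: 4.5:1 for small text, 3:1 for large text.
function meetsContrastMinimum(foreground: string, background: string, largeText: boolean): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

For instance, mid-gray `#777777` on white comes out at roughly 4.48:1, so it passes only for large text.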

Do Not Rely On Color Alone To Communicate Status

Color blindness (or color vision deficiency) affects approximately 1 in 12 men (8%) and 1 in 200 women in the world. While there are many types of color blindness, the most common two are protanomaly, or reduced sensitivity to red light, and deuteranomaly, or reduced sensitivity to green light. When displaying validation errors or success messages, don’t rely on color alone to communicate the status (i.e. by making input fields green or red). As the W3C guidelines state, color shouldn’t be used as the only visual means of conveying information, indicating an action, prompting a response or distinguishing a visual element. Designers should use color to highlight or complement what is already visible. Support colorblind people by providing additional visual cues that help them understand the user interface.

Use icons and supportive text to show which fields are invalid. This will help colorblind people fix the problems.

Allow Users To Control Font Size

Allow users to increase font size to improve readability. Mobile devices and browsers include features to enable users to adjust the font size system-wide. Also, make sure that your form has allotted enough space for large font sizes.

WhatsApp provides an option to change the font size in the app’s settings.

Test Your Design Decisions

All of the points mentioned above can be considered industry best practices. But just because something is called a “best practice” doesn’t mean it is always the optimal solution for your form. Apps and websites largely depend on the context in which they are used. Thus, it’s always essential to test your design decisions: make sure that the process of filling out a form is smooth, that the flow is not disrupted and that users can solve any problems they face along the way. Conduct usability testing sessions on a regular basis, collect all valuable data about user interactions, and learn from it.

Conclusion

Users can be hesitant to fill out forms. So, our goal as designers is to make the process of filling out a form as easy as possible. When designing a form, strive to create fast and frictionless interactions. Sometimes a minor change — such as properly writing an error message — can significantly increase the form’s usability.

This article is part of the UX design series sponsored by Adobe. The Adobe XD tool is made for a fast and fluid UX design process, as it lets you go from idea to prototype faster. Design, prototype and share, all in one app. You can check out more inspiring projects created with Adobe XD on Behance, and also sign up for the Adobe experience design newsletter to stay updated and informed on the latest trends and insights for UX/UI design.

Smashing Editorial
(al, yk, il)


Excerpt from: 

Best Practices For Mobile Form Design

How to Use a Website Click Tracking Tool to Improve the User Experience

When it comes to understanding your audience, you can’t get more granular than a website click tracking tool. Instead of looking at big picture metrics, you can drill down to the basics and get to know what works with your audience — and what doesn’t. While many site tracking tools exist, website click tracking tools offer the most depth when you want to better understand user behavior. To see what I mean, visit a website you’ve never been to before. Just Google a broad topic such as “marathon training” or “best thriller novels.” It doesn’t matter. Click on one of…

The post How to Use a Website Click Tracking Tool to Improve the User Experience appeared first on The Daily Egg.

How to Track User Behavior on a Website Using CRO Tools

You have the ability to compile more data than you could ever need about your website visitors using conversion rate optimization (CRO) tools. But why should you care about user behavior? After all, CRO is just about maximizing clicks, conversions, and sales, right? Why should it matter what happens in between those actions? I’m here to tell you that it matters — a lot. User behavior gives you insight into how your website visitors act, think, and make decisions. While they’re contemplating a decision — for instance, about whether or not to sign up for your email newsletter — you…

The post How to Track User Behavior on a Website Using CRO Tools appeared first on The Daily Egg.

Introducing “The WebP Manual”

Markus Seyfferth

What’s WebP in the first place? Can we actually use it today? And if yes, how exactly? The role of media in performance, specifically images, is of huge concern. Images are powerful. Engaging visuals evoke visceral feelings. They can provide key information and context to articles, or merely add humorous asides. They do things for us that plain text can’t do by itself.

But when there’s too much imagery, it can be frustrating for users on slow connections, or run afoul of data plan allowances. In the latter scenario, that can cost users real money. This sort of inadvertent trespass can carry real consequences.

In this eBook, you’ll learn all about WebP: what it’s capable of, how it performs, how to convert images to the format in a variety of ways, and most importantly, how to use it. Of course, the eBook is, and always will be, free for all Smashing Members.

84 pages. Written by Jeremy Wagner. Cover Design by Ricardo Gimenes. Available in PDF, Kindle, and ePub formats.

The eBook costs $14.90 (PDF, ePUB, Kindle) and is free for Smashing Members, along with 12 webinars and 56 other eBooks.

What’s In The eBook

This guide will encourage you to experiment and see what’s possible with WebP:

  • WebP Basics
    WebP images usually use less disk space when compared to other formats at reasonably comparable visual similarity. Depending on your site’s audience and the browsers they use, this is an opportunity to deliver less data-intensive user experiences for a significant segment of your audience.

  • Performance
    We’ll cover how both lossy and lossless WebP compare to JPEGs and PNGs exported by a number of image encoders.

  • Converting Images To WebP (Excerpt)
    This can be done in a myriad of ways: from something as simple as exporting from your preferred design program, to using Cloudinary and similar services, to Node.js-based build systems. Here, we’ll cover all avenues.

  • Using WebP Images
    Because WebP isn’t supported in all browsers just yet, you’ll need to learn how to use it so that sites and applications gracefully fall back to established formats when WebP support is lacking. Here, we’ll discuss the many ways you can use WebP responsibly, starting by detecting browser support in the Accept request header.
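As a rough illustration of the Accept-header approach mentioned in the last item, here is a TypeScript sketch that a Node.js server could use to choose between a WebP file and a JPEG fallback. It is not taken from the eBook. The `.webp`/`.jpg` naming scheme and both function names are assumptions.

```typescript
// True when an HTTP Accept header lists image/webp with a non-zero quality,
// e.g. "image/avif,image/webp,image/apng,*/*;q=0.8". Wildcards such as "*/*"
// are deliberately ignored, since they do not confirm that WebP can be decoded.
function acceptsWebP(acceptHeader: string | undefined): boolean {
  if (!acceptHeader) return false;
  return acceptHeader.split(",").some((entry) => {
    const [mediaType, ...params] = entry.split(";").map((s) => s.trim());
    if (mediaType.toLowerCase() !== "image/webp") return false;
    const q = params.find((p) => p.toLowerCase().startsWith("q="));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}

// Chooses which variant to serve. Assumes each image exists both as
// `${basePath}.webp` and as a `${basePath}.jpg` fallback.
function chooseImage(basePath: string, acceptHeader: string | undefined): { path: string; contentType: string } {
  return acceptsWebP(acceptHeader)
    ? { path: `${basePath}.webp`, contentType: "image/webp" }
    : { path: `${basePath}.jpg`, contentType: "image/jpeg" };
}
```

In a Node.js `http` handler, you would pass `req.headers.accept` and also call `res.setHeader("Vary", "Accept")`, so that shared caches store the WebP and JPEG variants separately.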

About The Author

Jeremy Wagner is a performance-obsessed front-end developer, author and speaker living and working in the frozen wastes of Saint Paul, Minnesota. He is also the author of Web Performance in Action, a web developer’s companion guide for creating fast websites. You can find him on Twitter @malchata, or read his blog of ramblings.

Here’s Why This eBook Is For You

The WebP Manual will get you ready for the new image format, which is capable of delivering significantly less data-intensive user experiences for a majority of your audience:

  • Learn how lossy and lossless WebP compare to JPEGs and PNGs exported by a number of image encoders.
  • Learn which services and plugins you can use to export or convert images to WebP with your preferred design tool or command-line tool.
  • Learn how to use WebP in production, and how to implement proper fallbacks for browsers that don’t support WebP just yet.
  • Learn how to use the full potential of the WebP format. It will substantially improve loading performance for many of your users, customers, and clients, and it will become one of your favorite tools for making websites as lean as possible.

The eBook is free for Smashing Members (you can cancel anytime, of course).



See original article: Introducing “The WebP Manual”

Creative Data Visualization Techniques

Full-day workshop • June 28th
With so many tools available to visualize your data, it’s easy to get stuck in thinking about chart types, always just going for that bar or line chart, without truly thinking about effectiveness. In this workshop, Nadieh will teach you how you can take a more creative and practical approach to the design of data visualization.
Broaden your horizons beyond the basic set of charts.

See original article: Creative Data Visualization Techniques


Building A Pub/Sub Service In-House Using Node.js And Redis





Dhimil Gosalia



Today’s world operates in real time. Whether it’s trading stock or ordering food, consumers today expect immediate results. Likewise, we all expect to know things immediately, whether in news or sports. Zero latency, in other words, is the new hero.

This applies to software developers as well — arguably some of the most impatient people! Before diving into BrowserStack’s story, it would be remiss of me not to provide some background about Pub/Sub. For those of you who are familiar with the basics, feel free to skip the next two paragraphs.

Many applications today rely on real-time data transfer. Let’s look closer at an example: social networks. The likes of Facebook and Twitter generate relevant feeds, and you (via their app) consume them and spy on your friends. They accomplish this with a messaging feature: when a user generates data, it is posted for others to consume in nothing short of a blink. If there are significant delays, users will complain, usage will drop, and, if the delays persist, users will churn. The stakes are high, and so are user expectations. So how do services like WhatsApp, Facebook, TD Ameritrade, Wall Street Journal and GrubHub support high volumes of real-time data transfers?

At a high level, all of them use a similar software architecture called the “Publish-Subscribe” model, commonly referred to as Pub/Sub.

“In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.”

Wikipedia

Bored by the definition? Back to our story.

At BrowserStack, all of our products support (in one way or another) software with a substantial real-time dependency component, whether it’s Automate test logs, freshly baked browser screenshots, or 15fps mobile streaming.

In such cases, if a single message drops, a customer can lose information vital for preventing a bug. We also needed to scale for widely varying data sizes. For example, with device logger services, a single message can carry 50MB of data at a given point in time. Sizes like this could crash the browser. On top of that, BrowserStack’s system would need to scale for additional products in the future.

As the size of each message ranges from a few bytes up to 100MB, we needed a scalable solution that could support a multitude of scenarios. In other words, we sought a sword that could cut all cakes. In this article, I will discuss the why, the how, and the results of building our Pub/Sub service in-house.

Through the lens of BrowserStack’s real-world problem, you will get a deeper understanding of the requirements and process of building your very own Pub/Sub.

Our Need For A Pub/Sub Service

BrowserStack handles 100M+ messages, each somewhere between approximately 2 bytes and 100+ MB. These are passed around the world at any moment, all at different Internet speeds.

The largest generators of these messages, by message size, are our BrowserStack Automate and App Automate products. Both have real-time dashboards displaying all requests and responses for each command of a user test. So, if someone runs one test with 100 requests where the average request-response size is 10 bytes, this transmits 1×100×10 = 1,000 bytes.

Now let’s consider the larger picture as, of course, we don’t run just one test a day. Approximately 850,000 BrowserStack Automate and App Automate tests are run with BrowserStack each and every day. And yes, we average around 235 request-response pairs per test. Since users can take screenshots or ask for page sources in Selenium, our average request-response size is approximately 220 bytes.

So, going back to our calculator:

850,000×235×220 = 43,945,000,000 bytes (approx.) or only 43.945GB per day

Now let’s talk about BrowserStack Live and App Live. Automate is clearly the winner in terms of data size. However, Live products take the lead when it comes to the number of messages passed. For every live test, about 20 messages are passed each minute it runs. We run around 100,000 live tests a day, each averaging around 12 minutes, which means:

100,000×12×20 = 24,000,000 messages per day
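To double-check the two daily figures above, here is the same back-of-the-envelope arithmetic as a small script. The inputs are exactly the article's numbers; the per-second rate is an added, flat 24-hour average (an assumption that ignores bursts, so real peaks are higher).

```javascript
// Back-of-the-envelope daily volumes, using the article's figures.
const automateTestsPerDay = 850000;
const requestResponsesPerTest = 235;
const avgRequestResponseBytes = 220;
const automateBytesPerDay =
  automateTestsPerDay * requestResponsesPerTest * avgRequestResponseBytes;

const liveTestsPerDay = 100000;
const avgLiveTestMinutes = 12;
const liveMessagesPerMinute = 20;
const liveMessagesPerDay =
  liveTestsPerDay * avgLiveTestMinutes * liveMessagesPerMinute;

// Flat average over 24 hours; real traffic is bursty, so peaks are higher.
const secondsPerDay = 24 * 60 * 60;
const avgLiveMessagesPerSecond = Math.round(liveMessagesPerDay / secondsPerDay);

console.log(automateBytesPerDay / 1e9); // 43.945 (decimal GB, 1 GB = 10^9 bytes)
console.log(liveMessagesPerDay); // 24000000
console.log(avgLiveMessagesPerSecond); // 278
```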

Now for the awesome and remarkable bit: we build, run, and maintain the application behind all this, called pusher, on 6 t1.micro EC2 instances. The cost of running the service? About $70 per month.

Choosing To Build vs. Buying

First things first: As a startup, like most others, we were always excited to build things in-house. But we still evaluated a few services out there. The primary requirements we had were:

  1. Reliability and stability,
  2. High performance, and
  3. Cost-effectiveness.

Let’s leave the cost-effectiveness criterion aside, as I can’t think of any external services that cost under $70 a month (tweet me if you know one that does!). So our answer there is obvious.

In terms of reliability and stability, we found companies that provided Pub/Sub as a service with 99.9+ percent uptime SLAs, but there were many T&Cs attached. The problem is not as simple as you might think, especially when you consider the vast lands of the open Internet that lie between the system and the client. Anyone familiar with Internet infrastructure knows stable connectivity is the biggest challenge. Additionally, the amount of data sent depends on traffic. For example, a data pipe that’s at zero for one minute may burst during the next. Services providing adequate reliability during such bursts are rare (Google and Amazon being examples).

Performance for our project means obtaining and sending data to all listening nodes at near-zero latency. At BrowserStack, we utilize cloud services (AWS) along with co-location hosting. However, our publishers and/or subscribers could be placed anywhere. For example, a publisher may be an AWS application server generating much-needed log data, or a terminal (a machine where users can securely connect for testing). Coming back to the open Internet issue, to reduce our risk we would have to ensure our Pub/Sub leveraged the best hosting services and AWS.

Another essential requirement was the ability to transmit all types of data (bytes, text, weird media data, etc.). With all this considered, it did not make sense to rely on a third-party solution to support our products. So we decided to revive our startup spirit, rolling up our sleeves to code our own solution.

Building Our Solution

Pub/Sub by design means there will be a publisher, generating and sending data, and a subscriber accepting and processing it. This is similar to a radio: a radio channel broadcasts (publishes) content everywhere within range. As a subscriber, you can decide whether to tune into that channel and listen (or turn off your radio altogether).

Unlike the radio analogy, where data is free for all and anyone can decide to tune in, our digital scenario needs authentication, because data generated by the publisher may be intended only for a single, particular client or subscriber.


Basic working of Pub/Sub



Above is a diagram providing an example of a good Pub/Sub with:

  • Publishers
    Here we have two publishers generating messages based on pre-defined logic. In our radio analogy, these are our radio jockeys creating the content.
  • Topics
    There are two here, meaning there are two types of data. We can say these are our radio channels 1 and 2.
  • Subscribers
    We have three that each read data on a particular topic. One thing to notice is that Subscriber 2 is reading from multiple topics. In our radio analogy, these are the people who are tuned into a radio channel. 
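The roles in the diagram can be sketched as a tiny in-memory broker in plain JavaScript, with no Redis involved. The Broker class and the topic names are illustrative assumptions, not BrowserStack’s code. Only Subscriber 2’s two topics are stated above, so which topic Subscribers 1 and 3 read is also assumed.

```javascript
// Minimal in-memory Pub/Sub broker (illustrative; not BrowserStack's code).
// Publishers only know topic names; subscribers only know which topics they want.
class Broker {
  constructor() {
    this.topics = new Map(); // topic name -> Set of handler functions
  }

  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(handler);
    // Return an unsubscribe function: the "turn off your radio" option.
    return () => this.topics.get(topic).delete(handler);
  }

  publish(topic, message) {
    const handlers = this.topics.get(topic);
    if (!handlers) return 0; // nobody tuned in: the message is simply dropped
    for (const handler of handlers) handler(message, topic);
    return handlers.size; // how many subscribers received it
  }
}

// Two publishers, two topics, three subscribers; Subscriber 2 reads both topics.
const broker = new Broker();
const received = { sub1: [], sub2: [], sub3: [] };

broker.subscribe("topic1", (m) => received.sub1.push(m));
broker.subscribe("topic1", (m, t) => received.sub2.push(`${t}:${m}`));
broker.subscribe("topic2", (m, t) => received.sub2.push(`${t}:${m}`));
broker.subscribe("topic2", (m) => received.sub3.push(m));

broker.publish("topic1", "hello from publisher 1"); // Publisher 1
broker.publish("topic2", "hello from publisher 2"); // Publisher 2

console.log(received.sub2.length); // 2
```

Note that neither publisher references a subscriber, and neither subscriber references a publisher; the topic name is the only thing they share, which is the decoupling the Wikipedia definition describes.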

Let’s start by understanding the requirements for the service.

  1. An evented component
    This kicks in only when there is something to process.
  2. Transient storage
    This keeps data persisted for a short duration so if the subscriber is slow, it still has a window to consume it.
  3. Reducing the latency
    Connecting two entities over a network with minimum hops and distance.
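Requirements 1 and 2 can be illustrated together with a sketch of an evented channel built on Node’s EventEmitter that also keeps each message for a short window, so a slow or reconnecting subscriber can still catch up. In the article this retention job falls to Redis; the in-process buffer here, along with the names TransientChannel, retentionMs, and the injectable now() clock, is a hypothetical stand-in for illustration.

```javascript
const { EventEmitter } = require("events");

// Evented channel with a short retention window (illustrative sketch).
// Nothing runs until publish() is called: listeners fire only on new data.
class TransientChannel extends EventEmitter {
  constructor(retentionMs, now = () => Date.now()) {
    super();
    this.retentionMs = retentionMs;
    this.now = now;   // injectable clock, handy for deterministic tests
    this.buffer = []; // [{ seq, at, data }] in publish order
    this.nextSeq = 0;
  }

  publish(data) {
    const entry = { seq: this.nextSeq++, at: this.now(), data };
    this.buffer.push(entry);
    this.prune();
    this.emit("message", entry); // live subscribers get it immediately
    return entry.seq;
  }

  // Drop entries older than the retention window (the "auto-expire" part).
  prune() {
    const cutoff = this.now() - this.retentionMs;
    while (this.buffer.length && this.buffer[0].at < cutoff) this.buffer.shift();
  }

  // A slow subscriber asks for everything after the last sequence number it
  // saw, as long as it is still inside the window.
  since(lastSeq) {
    this.prune();
    return this.buffer.filter((e) => e.seq > lastSeq).map((e) => e.data);
  }
}

// Usage with a fake clock so the example is deterministic.
let fakeTime = 0;
const channel = new TransientChannel(5000, () => fakeTime);
channel.on("message", (entry) => console.log("live:", entry.data));

channel.publish("log line 1"); // prints "live: log line 1"
fakeTime = 3000;
channel.publish("log line 2"); // prints "live: log line 2"
fakeTime = 6000;
console.log(channel.since(-1)); // [ 'log line 2' ] (line 1 has expired)
```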

We picked a technology stack that fulfilled the above requirements:

  1. Node.js
    Because why not? It’s evented, we wouldn’t need heavy data processing, and it’s easy to onboard.
  2. Redis
    Perfectly supports short-lived data. It has all the capabilities to initiate, update, and auto-expire data. It also puts less load on the application.

Node.js For Business Logic Connectivity

Node.js is a nearly perfect fit when it comes to writing code that involves I/O and events. Our particular problem had both, making this option the most practical for our needs.

Granted, other languages such as Java could be more optimized, and a language like Python offers scalability. However, the cost of getting started with these languages is so high that a developer could finish writing the code in Node in the same amount of time.

To be honest, if the service had a chance of growing more complicated features, we could have looked at other languages or a complete stack. But here it is a marriage made in heaven. Here is our package.json; the bstack-analytics version is hidden for BrowserStack reasons. :)

{
  "name": "Pusher",
  "version": "1.0.0",
  "dependencies": {
    "bstack-analytics": "*****",
    "ioredis": "^2.5.0",
    "socket.io": "^1.4.4"
  },
  "devDependencies": {},
  "scripts": {
    "start": "node server.js"
  }
}

Very simply put, we believe in minimalism especially when it comes to writing code. On the other hand, we could have used libraries like Express to write extensible code for this project. However, our startup instincts decided to pass on this and to save it for the next project. Additional tools we used:

  • ioredis
    One of the best-supported libraries for connecting to Redis from Node.js, used by companies including Alibaba.
  • socket.io
    The best library for graceful connectivity and fallback with WebSocket and HTTP.

Redis For Transient Storage

Redis as a service scales well and is highly reliable and configurable. Plus, there are many reliable managed service providers for Redis, including AWS. Even if you don’t want to use a provider, Redis is easy to get started with.

Let’s break down the configurable part. We started off with the usual master-slave configuration, but Redis also comes with cluster or sentinel modes. Every mode has its own advantages.

If we could shard the data in some way, a Redis cluster would be the best choice. But if we sharded the data by some heuristic, we would have less flexibility, as every component would have to follow that same heuristic. Fewer rules, more control is good for life!
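To make that trade-off concrete: with sharding, every publisher and every subscriber must agree on the same key-to-node rule. The sketch below uses a 32-bit FNV-1a hash of the channel name modulo the node count. This is a hypothetical heuristic chosen for illustration, not what BrowserStack or Redis uses (Redis Cluster itself maps keys to 16384 hash slots using CRC16).

```javascript
// Hypothetical sharding heuristic: 32-bit FNV-1a hash of the channel name,
// modulo the number of nodes. (charCodeAt yields UTF-16 code units, which is
// fine for ASCII IDs such as session IDs.)
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// Every publisher AND every subscriber must call this with the same
// nodeCount, or they will look for the same channel on different nodes.
function nodeFor(channel, nodeCount) {
  return fnv1a32(channel) % nodeCount;
}

const channel = "session-42"; // hypothetical channel name
console.log(nodeFor(channel, 3) === nodeFor(channel, 3)); // true: same inputs, same node
```

The catch is visible as soon as the topology changes: nodeFor(id, 3) and nodeFor(id, 4) can return different nodes, so adding a node moves channels around, and every component has to switch rules in lockstep. That shared rule is exactly the loss of flexibility described above.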

Redis Sentinel works best for us: because the data is not sharded, every lookup is served by the single node we are connected to at a given point in time. This also means that even if multiple nodes are lost, the data is still present on other nodes, so you get higher availability (HA) and less chance of data loss. Of course, this gives up the advantages of a cluster, but our use case is different.

Architecture At 30,000 Feet

The diagram below provides a very high-level picture of how our Automate and App Automate dashboards work. Remember the real-time system that we had from the earlier section?


BrowserStack’s real-time Automate and App Automate dashboards.



In our diagram, our main workflow is highlighted with thicker borders. The “automate” section consists of:

  1. Terminals
    The pristine versions of Windows, OS X, Android, or iOS that you get while testing on BrowserStack.
  2. Hub
    The point of contact for all your Selenium and Appium tests with BrowserStack.

The “user service” section here is our gatekeeper, ensuring data is sent to and saved for the right individual. It is also our security keeper. The “pusher” section incorporates the heart of what we discussed in this article. It consists of the usual suspects including:

  1. Redis
    Our transient storage for messages, where in our case automate logs are temporarily stored.
  2. Publisher
    This is basically the entity that obtains data from the hub. All your requests and responses are captured by this component, which writes them to Redis with session_id as the channel.
  3. Subscriber
    This reads data from Redis generated for the session_id. It is also the web server that clients connect to via WebSocket (or HTTP) to get data, and it sends that data to authenticated clients.

Finally, we have the user’s browser section, representing an authenticated WebSocket connection over which the session_id logs are sent. This enables the front-end JS to parse and beautify them for users.
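The publisher, Redis, subscriber, and browser flow can be sketched without a live Redis by coding against a small client interface. The method names publish, subscribe, on("message", ...) and duplicate() mirror ioredis’s API, but everything else is an assumption for illustration: the in-memory FakeRedis, the isAuthorized check standing in for the user service, and the logs:<session_id> channel naming. The article does not say whether pusher uses Redis PUBLISH/SUBSCRIBE or stores logs in keys or lists.

```javascript
// In-memory stand-in for Redis Pub/Sub. Method names mirror ioredis:
// publish(channel, message), subscribe(channel), on("message", handler), duplicate().
class FakeRedis {
  constructor(bus = new Map()) {
    this.bus = bus; // shared "server": channel -> Set of subscribed connections
    this.handlers = [];
  }
  duplicate() {
    return new FakeRedis(this.bus); // a second connection to the same server
  }
  async subscribe(channel) {
    if (!this.bus.has(channel)) this.bus.set(channel, new Set());
    this.bus.get(channel).add(this);
  }
  on(event, handler) {
    if (event === "message") this.handlers.push(handler);
    return this;
  }
  async publish(channel, message) {
    const conns = this.bus.get(channel) || new Set();
    for (const conn of conns) conn.handlers.forEach((h) => h(channel, message));
    return conns.size; // like Redis PUBLISH: number of receiving connections
  }
}

// Publisher: writes one captured request/response to the session's channel.
function publishLog(redis, sessionId, entry) {
  return redis.publish(`logs:${sessionId}`, JSON.stringify(entry));
}

// Subscriber: forwards a session's messages only to sockets that the
// gatekeeper check (a hypothetical stand-in for the user service) approved.
async function startSubscriber(redis, isAuthorized) {
  const sockets = new Map(); // sessionId -> Set of { userId, send }
  redis.on("message", (channel, message) => {
    const sessionId = channel.slice("logs:".length);
    for (const socket of sockets.get(sessionId) || []) socket.send(message);
  });
  return {
    async attach(socket, sessionId) {
      if (!isAuthorized(socket.userId, sessionId)) return false;
      if (!sockets.has(sessionId)) {
        sockets.set(sessionId, new Set());
        await redis.subscribe(`logs:${sessionId}`);
      }
      sockets.get(sessionId).add(socket);
      return true;
    },
  };
}

// Usage: one authorized and one unauthorized viewer of the same session.
(async () => {
  const pub = new FakeRedis();
  const sub = pub.duplicate(); // ioredis needs a separate connection in subscriber mode
  const owners = { "sess-1": "alice" }; // hypothetical session ownership table
  const pusher = await startSubscriber(sub, (userId, sid) => owners[sid] === userId);

  const aliceInbox = [];
  const malloryInbox = [];
  await pusher.attach({ userId: "alice", send: (m) => aliceInbox.push(m) }, "sess-1");
  await pusher.attach({ userId: "mallory", send: (m) => malloryInbox.push(m) }, "sess-1");

  await publishLog(pub, "sess-1", { command: "GET /title", status: 200 });
  console.log(aliceInbox.length, malloryInbox.length); // 1 0
})();
```

Swapping FakeRedis for real ioredis connections (one for publishing, one duplicated for subscribing) and the send callbacks for socket.io emits would be the natural next step, but that wiring is left out here because the article does not describe it.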

Beyond the logs service, pusher is also used for other product integrations. Instead of session_id, these use another form of ID to represent the channel. It all runs on pusher!

Conclusion (TLDR)

We’ve had considerable success in building out Pub/Sub. To sum up why we built it in-house:

  1. Scales better for our needs;
  2. Cheaper than outsourced services;
  3. Full control over the overall architecture.

JS is also a perfect fit for this kind of scenario: an event loop and a massive amount of I/O are exactly what the problem needs. JavaScript’s single pseudo-thread is the magic that makes this work.

Events and Redis as a system keep things simple for developers, as you can obtain data from one source and push it to another via Redis. So we built it.

If the usage fits into your system, I recommend doing the same!

Smashing Editorial
(rb, ra, il)


See original article: Building A Pub/Sub Service In-House Using Node.js And Redis