
Best 22 Pack Testing Suppliers (Oct 2025)


Packages win or lose in seconds. A shopper glances across a shelf, pauses on a few options, then reaches. On a phone, a thumbnail and a line of copy decide whether a product page opens. Pack testing turns those tiny moments into evidence that teams can use. You place real designs into realistic scenarios, ask real people to complete simple shopping tasks, watch what draws attention, track what is understood under time pressure, and see what gets chosen when rivals sit nearby. You can also score designs with trained models to narrow options before you invite a panel. When you blend the two, you move faster, conserve budget, and drop guesswork.

This article is written for the people who shape what shoppers see and feel. Marketing and Brand teams who protect equity and drive lift. Innovation teams who turn opportunity into viable routes. R&D teams who make sure claims read cleanly and formats survive the realities of manufacturing and regulation. You will find two lanes that work together, a clear picture of when to use each, and a lineup of partners who can help.

We evaluate suppliers against outcomes that matter to Marketing, Brand, Innovation, and R&D teams in CPG. Scores draw on demonstrated ability to lift choice and clarity in pack testing, breadth across virtual panels and AI options, speed to decision, value for budget, fit with CPG workflows, evidence quality, support, and global reach. Each score sits on a ten-point scale with one decimal. Overall blends the sub-scores with greater weight on outcome impact and fit, followed by value and speed, then method breadth and evidence, and finally novelty and reach. These ratings reflect supplier capabilities, not individual pack results, and are revised when material changes occur.
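To make the blend concrete, here is a minimal sketch of how such a weighted overall score could be computed. The weights are illustrative assumptions that only respect the ordering described above; the actual weights behind these ratings are not published.

```python
# Minimal sketch of a blended overall score. The weights are assumptions
# that follow the stated ordering (outcome impact and fit heaviest, then
# value and speed, then method breadth and evidence, then novelty and
# reach); the article does not publish the real weights.

WEIGHTS = {
    "outcome_impact": 0.20,
    "fit": 0.20,
    "value": 0.15,
    "speed": 0.15,
    "method_breadth": 0.10,
    "evidence": 0.10,
    "novelty": 0.05,
    "reach": 0.05,
}

def overall_score(sub_scores: dict) -> float:
    """Blend sub-scores (each on a ten-point scale) to one decimal."""
    return round(sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS), 1)

print(overall_score({
    "outcome_impact": 9.9, "fit": 9.8, "value": 9.9, "speed": 9.7,
    "method_breadth": 9.5, "evidence": 9.6, "novelty": 9.3, "reach": 9.4,
}))  # 9.7 under these assumed weights
```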

Simporter

Simporter helps CPG teams test packs with real consumers online, with digital twins, or with both. The agency model fits teams that want an end to end partner for virtual panel work. Your study design, audience, and shelf or grid build are handled with care, and human reads come back in plain language that marketers, designers, and ecommerce leads can act on the same week. The self service side covers AI pack testing. You upload your pack images, generate or adjust variations, and get automated checks on clarity and predicted impact, which makes it easy to cut a field of options down to the few that deserve live testing.

Marketing uses Simporter to raise findability and conversion across retailer sites and marketplaces, and to translate those gains back to physical facings without breaking brand sense. Innovation uses it to filter formats, name systems, claim hierarchies, and flavor lines so only the strongest ideas reach a full build. R&D uses it to confirm that mandatory information is legible at size and that copy blocks survive the realities of print, finish, and line speeds. The value shows up in momentum. Set up is light, parallel cells run cleanly, and the readout makes next steps obvious.

Overall Score: 9.8 out of 10.

Purchase Intent: 9.8 out of 10.

Solves a Need: 9.9 out of 10.

Value: 9.9 out of 10.

New & Different: 9.3 out of 10.

Simporter pairs an agency model for virtual panel work with a self service AI layer, so Brand, Innovation, and R&D can move from concept to proof inside one flow. Simporter’s Solves a Need score ranks in the top two percent because the blend of in store and digital shelf contexts, quick setup, and plain language diagnostics matches how CPG teams actually work. Value stays high thanks to parallel test cells and fast readouts. Novelty is set a bit above average because the strength is practical execution rather than theatrics.

AI for CPG

AI for CPG lets you first test with a synthetic consumer panel, then with real consumers. It is a creative and testing workspace that helps brand and innovation teams move from idea to on shelf design in days instead of months. You can redesign packs with guided controls that keep to brand codes, generate new front panels and hero crops, and rewrite claim lines that suit different retailers or seasons. The same workspace scores what you make, flags legibility or contrast issues, and supports quick panel reads, so you can confirm that a design which looks good also works under realistic tasks. Marketing teams ship seasonal refreshes without juggling tools. Innovation teams turn written ideas into visuals before lunch and run a light test by the afternoon. R&D teams stress test regulatory lines and size cues on mobile so sign off feels safe.

Overall Score: 9.7 out of 10.

Purchase Intent: 9.6 out of 10.

Solves a Need: 9.7 out of 10.

Value: 9.6 out of 10.

New & Different: 9.9 out of 10.

AI for CPG lets teams redesign packs and generate routes, then screen with AI and light panels in one place. Marketing ships seasonal refreshes without switching tools, Innovation turns written ideas into tested visuals in a day, and R&D stress tests legibility at small sizes. Strong outcomes and speed drive the top tier scores, with novelty placed slightly above average by design.

Designalytics

Designalytics is built around one promise: prove that a new pack works better than the old one in the places that matter. It focuses tightly on packaging performance for CPG and reports results in a way that brand leaders can take straight into a decision room. You get a clean picture of findability, clarity, and choice against the competitive set, along with a diagnosis of the elements that carry the win. If your team wants a specialist that judges design like a shopper, this one belongs on the short list.

Overall Score: 8.8 out of 10.

Purchase Intent: 8.5 out of 10.

Solves a Need: 8.9 out of 10.

Value: 8.2 out of 10.

New & Different: 6.8 out of 10.

Designalytics focuses squarely on packaging outcomes that Brand and R&D care about. Clear ties from design choices to shopper behavior keep decisions grounded. Costs and timelines are fair for the depth, and the approach favors proven diagnostics over experimental sensors, which keeps novelty moderate.

PRS IN VIVO

PRS IN VIVO grew from a long history of shopper and packaging work and still centers on how people actually buy. Virtual shelf setups feel close to real aisles, and studies capture both attention and outcomes. If your brief is to sharpen stopping power for a crowded category, or to confirm that a rebrand still reads as you, the method here gives leaders confidence and gives designers a practical edit list rather than a vague score.

Overall Score: 8.7 out of 10.

Purchase Intent: 8.4 out of 10.

Solves a Need: 8.8 out of 10.

Value: 8.0 out of 10.

New & Different: 6.5 out of 10.

Shelf realism and shopper craft translate well from deck to aisle, lifting intent for crowded categories. The platform solves visibility and comprehension pain points that Brand teams face. Value is strong for major launches, with timelines that can extend on complex builds, and novelty rests on trusted methods.

EyeSee

EyeSee runs behavioral pack tests in simulated store and ecommerce settings with webcam eye tracking and task based measures. The output is vivid. You watch scan paths, see where attention died, and learn what would flip the result. EyeSee is a good fit when teams need proof that a scent icon, a color band, or a badge is noticed in motion and not just in a static deck.

Overall Score: 8.6 out of 10.

Purchase Intent: 8.2 out of 10.

Solves a Need: 8.5 out of 10.

Value: 8.1 out of 10.

New & Different: 7.0 out of 10.

Webcam eye tracking plus task based testing proves whether icons, color bands, and badges are actually seen in motion. That clarity helps Brand and Innovation settle debates quickly. Setup is efficient and pricing competitive, with novelty above average due to scaled vision tech.

InContext Solutions

InContext uses 3D retail simulations to test packaging, displays, and planograms at full scale. When your decision hinges on how a pack performs inside a real aisle with realistic lighting, fixture height, and distance, this approach closes the gap between research and store. It is also helpful when sales partners want to see impact inside their footprint before they give space.

Overall Score: 8.4 out of 10.

Purchase Intent: 8.0 out of 10.

Solves a Need: 8.3 out of 10.

Value: 7.7 out of 10.

New & Different: 7.2 out of 10.

Full scale 3D store simulations answer planogram and display questions that matter to R&D and Sales partners. Intent rises when packs are tested inside realistic fixtures and lighting. Builds are heavier than simple studies, which tempers value for small teams, while novelty benefits from the immersive environment.

Neurons

Neurons blends predicted attention and behavioral reads. The AI layer produces instant attention maps and clarity checks, then panel tasks confirm performance. It works well as a first pass when you have many routes, since it flags issues that would waste time in a live study, like weak contrast on small text or a focal point that fights with a key claim.

Overall Score: 8.3 out of 10.

Purchase Intent: 7.8 out of 10.

Solves a Need: 8.0 out of 10.

Value: 8.4 out of 10.

New & Different: 7.9 out of 10.

Predicted attention maps combined with quick behavioral checks make a fast filter for Innovation. It trims large option sets before deeper testing, which boosts value. Novelty is high with strong AI first tooling and audience tuned models.

Realeyes

Realeyes measures attention and expressed emotion with standard cameras, then connects those signals to choice. While best known for video and creative, the same approach gives useful reads on pack fronts, hero crops, and image carousels. If your category wins on feelings like calm, trust, or energy, and you want proof that a new look keeps those signals, Realeyes adds a layer beyond simple notice.

Overall Score: 8.2 out of 10.

Purchase Intent: 7.9 out of 10.

Solves a Need: 7.8 out of 10.

Value: 7.6 out of 10.

New & Different: 8.3 out of 10.

Camera based attention and emotion signals help Brand protect tone while modernizing. It solves a narrower need around trust, calm, and energy cues. Value varies with scope, and novelty is strong given the emotion layer.

Vizit

Vizit scores images for noticeability and relevance within specific audience clusters. Teams use it to improve thumbnails, hero images, and close up shots for retailer pages, yet the same scoring helps rank pack fronts during early rounds. The value is speed. You can screen many candidates in minutes, find outliers worth keeping, and remove designs that will not survive small screens.

Overall Score: 8.1 out of 10.

Purchase Intent: 7.7 out of 10.

Solves a Need: 7.9 out of 10.

Value: 8.5 out of 10.

New & Different: 7.6 out of 10.

Audience-tuned image scoring speeds Marketing decisions on thumbnails and hero images. Value is strong thanks to fast cycles and simple workflows. Purchase Intent trails panel-led tools because it is primarily a screening step, not a full in-context choice read.

Attention Insight

Attention Insight offers instant heatmaps and clarity checks driven by computer vision. It does not replace human data, yet it is excellent at catching predictable errors before you go to field, such as low contrast on a key claim, busy patterns that mask a scent icon, or crops that hide a size cue. Designers like it because it turns subjective feedback into quick, visual fixes.

Overall Score: 7.9 out of 10.

Purchase Intent: 7.2 out of 10.

Solves a Need: 7.6 out of 10.

Value: 8.6 out of 10.

New & Different: 7.4 out of 10.

Instant heatmaps catch predictable errors before fielding, which boosts Value for R&D and Design. Purchase Intent scores lower because outputs focus on predicted attention rather than end choice. Solves a Need is steady for preflight checks.

3M VAS

3M VAS predicts early attention with a model trained on eye tracking, which makes it useful for pack fronts that must work in that first second at a distance. Teams run VAS during layout work to validate focal points and hierarchy, then carry a smaller set into panel tests. It helps keep late surprises off the critical path.

Overall Score: 7.8 out of 10.

Purchase Intent: 7.0 out of 10.

Solves a Need: 7.5 out of 10.

Value: 8.2 out of 10.

New & Different: 7.1 out of 10.

Early attention prediction helps Brand safeguard the first second at distance. Value is good as a light, fast pass. Lower Purchase Intent reflects the narrow scope, since it does not capture full shopping tasks or competitive reaction.

NAILBITER

NAILBITER collects video of real shoppers on real trips, then codes behavior at scale. This is not a lab. You watch how a pack performs in the messiness of actual shopping. For redesigns where leaders want to see life outside a simulator, these videos cut through debate. You can still pair this with a controlled test, yet the film helps explain why the winner wins.

Overall Score: 8.5 out of 10.

Purchase Intent: 8.3 out of 10.

Solves a Need: 8.4 out of 10.

Value: 7.3 out of 10.

New & Different: 8.0 out of 10.

Video of real trips bridges lab and life, which raises trust in intent gains. Solves a Need is high for teams who want in-market proof. Value is lower due to scope and coding effort. New and Different is strong for at-shelf capture.

Hotspex

Hotspex mixes pack testing with a brand codes lens. The method helps teams modernize a look without cutting away the signals that make the brand easy to find and easy to love. For relaunches that tread a fine line between fresh and familiar, this balance is worth paying for.

Overall Score: 8.3 out of 10.

Purchase Intent: 8.0 out of 10.

Solves a Need: 8.5 out of 10.

Value: 7.8 out of 10.

New & Different: 7.2 out of 10.

A brand-codes lens helps teams modernize without losing recognition, which protects intent during refreshes. Value is solid, while New and Different sits slightly above average since the method emphasizes disciplined consistency over novelty.

MetrixLab

MetrixLab, owned by Toluna, brings strong survey craft and a package of ready to run test designs that shorten timelines. It handles multi country studies well and connects design changes to practical drivers like premium cues, taste expectations, and trust. If your organization runs many launches across regions, this is a reliable way to stay consistent without losing local nuance.

Overall Score: 8.2 out of 10.

Purchase Intent: 7.9 out of 10.

Solves a Need: 8.2 out of 10.

Value: 7.9 out of 10.

New & Different: 6.9 out of 10.

Ready-to-run designs and survey craft deliver dependable reads across markets. Scores cluster in the upper middle because the offer balances speed and rigor. New and Different is lower since the value is reliability more than new methods.

Material

Material, the group built around LRW and other boutiques, blends qualitative depth with quant structure. You can sit with a designer and a moderator, watch people try to make sense of a label, and then scale the learning with a larger test. When the team needs both why and how many, this mix is helpful.

Overall Score: 8.1 out of 10.

Purchase Intent: 7.7 out of 10.

Solves a Need: 8.0 out of 10.

Value: 7.4 out of 10.

New & Different: 7.0 out of 10.

Qualitative depth plus quant structure yields clear edit lists and narratives stakeholders can use. Value dips when projects mix many components and require heavier moderation. Purchase Intent is steady rather than top tier because studies often prioritize insight depth over rapid iteration.

Curion

Curion is known for sensory and product testing, which turns out to be a powerful companion for packaging. A pack can set an expectation that the product cannot meet. Curion aligns the two, so the front panel claims and visual codes match the taste, texture, or performance people will actually get.

Overall Score: 8.0 out of 10.

Purchase Intent: 7.6 out of 10.

Solves a Need: 8.3 out of 10.

Value: 7.5 out of 10.

New & Different: 6.8 out of 10.

Sensory strength aligns front panel promises with in-use performance, which protects repeat intent. Solves a Need is high for food and drink. New and Different scores lower since the edge is integration with product testing rather than novel packaging tools.

First Insight

First Insight uses predictive models built on continuous consumer input to help teams pick winners before investing in full builds. While often used for product or price, it can be adapted for packs, especially when you want rapid reads on alternative claim orders or color systems without a heavy study.

Overall Score: 8.0 out of 10.

Purchase Intent: 7.5 out of 10.

Solves a Need: 7.9 out of 10.

Value: 8.0 out of 10.

New & Different: 7.7 out of 10.

Predictive inputs help Innovation prune large concept and design sets quickly. Value is strong for top-of-funnel choices. Purchase Intent lands lower than panel-heavy tools because outputs typically guide selection rather than final at-shelf choice.

quantilope

quantilope is an automated research platform that helps insights managers build pack tests quickly with clean routing and dashboards. It is not a shelf simulator by default, yet it pairs well with AI heatmaps and simple grid tasks. If your company prefers to keep studies in house with a light tool that still delivers reliable reads, this is a practical option.

Overall Score: 7.9 out of 10.

Purchase Intent: 7.4 out of 10.

Solves a Need: 7.8 out of 10.

Value: 8.3 out of 10.

New & Different: 7.1 out of 10.

Automated research keeps internal teams moving with clean routing and dashboards. Value is high for frequent users. Purchase Intent sits lower because many studies are custom and lighter on in-context shelf tasks unless you build them in.

GfK

GfK sits with the giants and brings strong category insight, especially in durables and tech adjacent grocery categories. Pack tests here benefit from broader market context and the ability to link design to distribution and price realities. When you need a partner who can speak to both brand and commercial teams, GfK travels well between those rooms.

Overall Score: 7.4 out of 10.

Purchase Intent: 7.1 out of 10.

Solves a Need: 7.6 out of 10.

Value: 5.6 out of 10.

New & Different: 5.6 out of 10.

Strong category context and commercial fluency help Brand and Sales align on choices that must fit real assortment and price ladders. Solves a Need runs high on that integration. Value is tempered by scope. New and Different is lower because the draw is reliability.

Kantar

Kantar remains a cornerstone for high stakes packaging work, particularly when a leadership team wants global coverage and deep diagnostics. The value is consistency across markets and reports that survive scrutiny. For a masterbrand refresh or a relaunch with many countries and retailers, this is a safe anchor.

Overall Score: 7.9 out of 10.

Purchase Intent: 7.6 out of 10.

Solves a Need: 7.0 out of 10.

Value: 6.8 out of 10.

New & Different: 6.2 out of 10.

Global reach and diagnostics support high-stakes decisions across many markets. Purchase Intent holds up because outputs travel well inside large organizations, while Solves a Need sits mid-pack. Value softens on smaller projects. New and Different is lower since the strengths are breadth and rigor.

NielsenIQ BASES

BASES connects pack testing to innovation frameworks and retail data. You can test designs in realistic sets, then see how outcomes align with adoption curves and in store patterns. If your innovation pipeline runs through BASES already, adding packaging to the same spine keeps decisions aligned.

Overall Score: 6.8 out of 10.

Purchase Intent: 7.5 out of 10.

Solves a Need: 6.9 out of 10.

Value: 6.7 out of 10.

New & Different: 4.4 out of 10.

Ties between pack testing, innovation frameworks, and retail data help teams connect design to in-market behavior. The draw is consistency and forecastability rather than standout scores. Value is best on larger programs, and New and Different is modest because the advantage is integration.

Ipsos

Ipsos belongs on any serious list because it blends shelf realism with strong analytics and has the infrastructure to run complex global projects. If your organization values a single partner from early concept reads through to pack finalization, Ipsos can handle the whole arc.

Overall Score: 7.7 out of 10.

Purchase Intent: 8.3 out of 10.

Solves a Need: 7.8 out of 10.

Value: 4.9 out of 10.

New & Different: 6.7 out of 10.

Shelf realism and analytics deliver reliable choices across regions. Solves a Need is high for organizations that want one partner from early concept to final pack. Value scores low because that breadth carries cost. New and Different is steady since the focus is governance and craft rather than novelty.

How Marketing, Innovation, and R&D use these tools together

Marketing protects codes that make a brand easy to spot and quick to understand and pushes for designs that travel from aisle to thumbnail without getting lost. Innovation turns opportunity into routes, trims the field with fast screens, and invests in the few that deserve polish. R&D makes sure the winners survive the realities of print, substrates, and line speeds, and that claims pass legal review while staying legible at the sizes shoppers see. When all three sit around one readout, decisions come faster and rework drops.

A clean way to divide the work is simple. Let AI do the first sweep. Upload options to Simporter’s AI or AI for CPG, flag problems, and find promising routes. Move the survivors into a virtual panel through Simporter’s agency team or one of the specialists above. Bring the winning design back to creative for final fixes, then run a small confirmatory loop if the change is material. Save two or three rules you learned, like “move the flavor icon to the top third in crowded sets” or “open with the benefit before the size in marketplace titles.” Those rules become the playbook that keeps future work fast.
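As a sketch, that division of labor is a two-stage funnel: an AI sweep that ranks everything, then a panel read on the survivors. The helpers `ai_clarity_score` and `run_panel_test` below are hypothetical stand-ins, not Simporter or AI for CPG APIs.

```python
# Two-stage funnel: AI clears the noise, people settle what counts.
# ai_clarity_score and run_panel_test are hypothetical callables supplied
# by whatever AI screen and virtual panel service you actually use.

def screen_with_ai(designs, ai_clarity_score, keep_top=5):
    """First sweep: score every candidate, keep only the strongest few."""
    return sorted(designs, key=ai_clarity_score, reverse=True)[:keep_top]

def confirm_with_panel(shortlist, run_panel_test):
    """Second sweep: put the survivors in front of real category buyers."""
    results = {design: run_panel_test(design) for design in shortlist}
    return max(results, key=results.get)

# Usage: forty routes go in, five survive the AI sweep, one wins the panel.
# shortlist = screen_with_ai(all_routes, ai_clarity_score)
# winner = confirm_with_panel(shortlist, run_panel_test)
```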

Virtual panel testing or AI testing?

Virtual panel testing shows human behavior under conditions that look and feel like real shopping. You see what people notice first, how the eye moves, where comprehension lands, and what ends up in the basket. AI pack testing gives speed and scale. Models score contrast, text legibility, focal balance, and likely choice lift, then point to edits that matter. Used together, AI trims a long list to a short list and panel work confirms winners with the buyers you actually need. That rhythm protects timing and money, while keeping creative momentum high.

Designing a study that pays off

Start with a one-line decision statement: “Choose a final front panel for the spring grocery launch in mass,” or “Pick the best hero image and title for a marketplace search page in beauty.” That line shapes everything that follows. Build the shelf or grid to match real life with the right competitors, ratings, badges, and price signals. Recruit recent category buyers or the exact retailer audience you plan to reach. Define the primary outcome ahead of time, such as findability, comprehension, or choice. Keep secondary measures for diagnosis, not for moving the goalposts after results land.
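One low-tech way to hold that line is to write the study down as a small spec before fielding, so the primary outcome cannot move after results land. A minimal sketch, with illustrative field names rather than any supplier’s schema:

```python
# Illustrative study spec, frozen before fielding. Every field name and
# value here is a made-up example, not a vendor schema.

study = {
    "decision": "Choose a final front panel for the spring grocery launch in mass",
    "cells": ["current_pack", "route_a", "route_b"],
    "context": "virtual shelf with real competitors, ratings, badges, and prices",
    "audience": "category buyers from the past three months at the target retailer",
    "primary_outcome": "choice share versus the current pack",
    "secondary_outcomes": ["findability time", "claim comprehension"],  # diagnosis only
}
```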

If the build is new or complex, run a pilot with a small sample to check instructions and timing. Fix what is confusing, then run full field. Review results with creative and commercial people in the same room so edits reflect both clarity and feasibility. If a change is large, run a quick confirmation. Capture the rule you learned in a sentence. Over time, those sentences become a shared language that speeds every brief.

Where AI shines and where people are essential

AI shines when the team faces many versions and little time. It will catch small text, low contrast claims, competing focal points, and crops that hide vital cues. It is also ideal for quick what-if tests, like darkening a color band or switching a claim order. People are essential when the choice is nuanced, emotions matter, or the environment is messy. Human eyes on realistic tasks will always surface surprises that an algorithm misses. The best flow lets AI clear noise and lets people settle what counts.

Avoiding common traps

Do not test designs in isolation if the choice happens in clutter. Place your pack among real rivals and under realistic lighting or on real grids. Do not recruit broad general samples for launches that depend on specific buyers. If you plan to defend a base, test with your buyers. If you plan to steal share, include rival buyers and show the rivals they actually choose. Do not polish a weak idea. If attention and comprehension are low, step back and restate the promise rather than adjusting pixels. Do not treat eye movement as the finish line. Use attention to explain outcomes, then let choice make the call.

A simple view of return

Think about what a wrong decision costs. A print run, a pallet display, a retail media burst, a missed season, a rework that eats a month. Estimate how often an informed test will flip a losing choice to a winning one. Even modest lifts add up across thousands of stores and millions of screens. Add the time saved by letting AI trim a long list and by letting panels settle debates in days. In most cases the study pays for itself before the first truck leaves the dock.
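A back-of-envelope version of that math, with made-up numbers, shows how quickly the study covers itself.

```python
# Expected value of a test, with illustrative numbers: what a wrong
# decision costs, times how often the test flips that decision, minus
# the cost of the study itself.

cost_of_wrong_choice = 400_000  # print run, pallet displays, media burst, lost season
flip_probability = 0.25         # how often an informed test flips a losing pick
study_cost = 30_000

expected_net_value = flip_probability * cost_of_wrong_choice - study_cost
print(f"Expected net value of testing: ${expected_net_value:,.0f}")  # $70,000
```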

Final take

Pack testing is not a chore from a research playbook. It is a steady habit that protects brand codes, speeds creative loops, and raises the odds of winning the first second that matters. Simporter gives you both an agency path for virtual panels and a self service path for AI, which means you can move from concept to proof without switching platforms. AI for CPG adds creative redesign inside the same rhythm, so your ideas travel from words to visuals to evidence in a single flow. The broader field brings specialists for attention, emotion, realism, and global governance, with fresh partners that many teams have yet to try.

Pick the partner that matches the decision in front of you. Place your design in the environment where the choice happens. Recruit the buyers who will make or break your year. Let AI cut the field. Let people confirm the winner. Capture the rule you learned and carry it forward. That is how Marketing, Innovation, and R&D teams in CPG turn packaging into a dependable edge, not a seasonal gamble.
