What the agency needed

An influencer marketing agency managing campaigns for beauty, fashion, food, and fitness brands needed to vet creators at scale. They were spending $50,000 to $200,000 per campaign on creator fees and needed better data to inform which creators to partner with.

The SaaS influencer platforms they subscribed to covered the top of the market well. Large creators with obvious metrics were easy to find and compare. The gap was in the mid-tier and emerging creators, which is where the agency's campaigns actually delivered the best ROI.

Specifically, the agency needed:

  • Engagement rates computed from actual video performance, not profile-level vanity metrics
  • Audience demographic estimates at the niche-and-geo intersection (beauty enthusiasts in Southeast Asia, fitness content watched by 18-24 males in the US)
  • Brand partnership history showing which brands each creator had worked with in the last 6 months and how those sponsored posts performed
  • Custom scoring that weighted factors the agency cared about: posting consistency, audience match to the brand brief, growth velocity, and engagement trend (improving or declining)

No SaaS platform delivered all four in one place. The data existed on TikTok, but extracting it at scale required going beyond what any standard tool exposed.

What the pipeline delivers

Per creator:

  • Username, display name, follower count, verified status
  • Engagement rate (computed across last 20 videos, configurable)
  • Average video views, like-to-view ratio, comment-to-view ratio
  • Estimated audience age distribution and top countries
  • Posting cadence (posts per week, averaged over 60 days)
  • Content niche classification (from video content and hashtag patterns)
  • Brand partnerships detected in last 6 months (from sponsored content signals)
  • 30-day follower growth velocity
  • Composite quality score (0 to 100) weighted by the agency's factors
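
As an illustration of how the video-level fields above can be derived, here is a minimal sketch. It assumes engagement rate means likes plus comments plus shares divided by views, pooled over the most recent videos. The case study does not state the exact formula, and the Video record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Video:
    # Hypothetical per-video record; field names are illustrative only.
    posted_at: datetime
    views: int
    likes: int
    comments: int
    shares: int


def creator_metrics(videos, now, window=20, cadence_days=60):
    """Compute per-creator metrics from per-video stats.

    Assumption: engagement rate = (likes + comments + shares) / views,
    pooled across the `window` most recent videos.
    """
    recent = sorted(videos, key=lambda v: v.posted_at, reverse=True)[:window]
    views = sum(v.views for v in recent)
    if views == 0:
        return None  # no videos, or nothing has been watched yet

    likes = sum(v.likes for v in recent)
    comments = sum(v.comments for v in recent)
    shares = sum(v.shares for v in recent)

    # Posting cadence counts every post inside the cadence window,
    # independent of the engagement window above.
    cutoff = now - timedelta(days=cadence_days)
    posts_in_window = sum(1 for v in videos if v.posted_at >= cutoff)

    return {
        "engagement_rate": (likes + comments + shares) / views,
        "avg_views": views / len(recent),
        "like_to_view": likes / views,
        "comment_to_view": comments / views,
        "posts_per_week": posts_in_window / (cadence_days / 7),
    }
```

Pooling over views keeps a handful of low-view videos from dominating the rate. Averaging per-video rates instead is an equally defensible choice and changes only the first entry of the returned dictionary.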

Discovery mode: Given a seed set of trusted creators, the pipeline builds lookalike lists based on niche, audience overlap, and engagement patterns.

Tracking mode: For creators the agency is actively working with, weekly refreshes track performance trends, audience shifts, and posting consistency.
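
One way to turn weekly refreshes into an "improving or declining" label is a least-squares slope over the snapshots. The sketch below is an assumption about how such a trend could be computed, not a description of the pipeline's actual method. The function names and the tolerance value are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # a single snapshot has no trend
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def classify_trend(weekly_rates, tolerance=0.0005):
    """Label a series of weekly engagement rates.

    `tolerance` is an illustrative threshold: slopes smaller than this
    (in rate points per week) are treated as flat.
    """
    slope = trend_slope(weekly_rates)
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "declining"
    return "flat"
```

For example, classify_trend([0.030, 0.032, 0.034, 0.036]) returns "improving" because the rate rises by 0.002 per week, which is above the tolerance.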

The 14-niche problem

The agency worked across 14 content niches: beauty, fashion, fitness, food, travel, gaming, tech, parenting, pets, home decor, comedy, education, music, and wellness. Each niche has different engagement norms, different audience demographics, and different creator ecosystems.

A beauty creator with 500,000 followers and a 4% engagement rate is performing well. A comedy creator with the same numbers is about average. A parenting creator with those numbers is exceptional.

The scoring model had to be niche-aware. Engagement rates, posting cadence, and growth velocity were all normalized within niche before computing the composite score. This meant a creator's quality score reflected how they compared to peers in their niche, not to all creators globally.
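
A minimal sketch of niche-aware scoring follows. It assumes normalization by percentile rank against a per-niche baseline and a weighted average scaled to 0 to 100. The case study does not say which normalization was used, so treat the method, the function names, and the data shapes as assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, baseline):
    """Share of baseline values below `value`, counting ties as half (0..1)."""
    ordered = sorted(baseline)
    below = bisect_left(ordered, value)
    at_or_below = bisect_right(ordered, value)
    return (below + at_or_below) / (2 * len(ordered))


def composite_score(creator, baselines, weights):
    """Weighted 0-100 score from within-niche percentile ranks.

    creator:   dict with a "niche" key plus one value per metric
    baselines: {niche: {metric: [values from profiled creators]}}
    weights:   {metric: weight}; weights need not sum to 1
    """
    niche = baselines[creator["niche"]]
    total_weight = sum(weights.values())
    weighted = sum(
        weight * percentile_rank(creator[metric], niche[metric])
        for metric, weight in weights.items()
    )
    return round(100 * weighted / total_weight, 1)


# Made-up baselines, for illustration only.
baselines = {
    "beauty": {"engagement_rate": [0.02, 0.03, 0.04, 0.05, 0.06]},
    "parenting": {"engagement_rate": [0.01, 0.015, 0.02, 0.025, 0.04]},
}
weights = {"engagement_rate": 1.0}
same_rate = {"engagement_rate": 0.04}
beauty = composite_score({"niche": "beauty", **same_rate}, baselines, weights)
parenting = composite_score({"niche": "parenting", **same_rate}, baselines, weights)
```

With these made-up baselines, the same 4% engagement rate scores 50.0 in the beauty niche and 90.0 in the parenting niche, which is the effect the normalization is meant to produce.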

Building the niche normalization required profiling enough creators in each niche to establish statistical baselines. For the initial build, that was about 1,500 to 2,000 creators per niche, totaling roughly 25,000 profiles across all 14 niches.

The audience demographic challenge

TikTok does not publicly expose audience demographics for creator profiles. The Creator Marketplace provides first-party audience data, but only for creators enrolled in the program. Most mid-tier and emerging creators are not enrolled.

The pipeline estimates audience demographics from observable signals:

  • Comment language analysis indicates primary audience geography
  • Posting times (adjusted for the creator's timezone) correlate with audience activity patterns
  • Content context and hashtag language indicate audience interest segments
  • Engagement pattern timing reveals when the audience is most active, which correlates with age bracket

These estimates are disclosed as estimates in the output. They are not raw platform data. For creators enrolled in the Creator Marketplace, first-party demographics are more accurate and can be incorporated when available. For the 80%+ of mid-tier creators not enrolled, the modeled estimates are the best available signal.
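
To make the first signal concrete, here is a sketch of how comment languages could be rolled up into a labeled estimate. It assumes each comment already carries a language code from some upstream language-detection step, which is out of scope here. Language is only a proxy for geography, and the function name, output keys, and minimum sample size are hypothetical.

```python
from collections import Counter


def language_mix(comment_langs, min_comments=50, top_n=3):
    """Summarize detected comment languages as a modeled estimate.

    comment_langs: iterable of language codes (e.g. "id", "en"), with None
    for comments where detection failed. Returns None when the sample is
    too small to say anything useful.
    """
    known = [lang for lang in comment_langs if lang]
    if len(known) < min_comments:
        return None
    counts = Counter(known)
    return {
        "source": "modeled_estimate",  # never presented as platform data
        "sample_size": len(known),
        "top_languages": [
            (lang, round(n / len(known), 3))
            for lang, n in counts.most_common(top_n)
        ],
    }
```

Carrying the source label and sample size alongside the numbers lets a downstream report show how much evidence sits behind each estimate.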

What the agency got out of it

68% faster creator vetting. The prior process involved a strategist manually checking 50 to 100 profiles per campaign brief, copying metrics into a spreadsheet, and making gut-feel recommendations. The pipeline delivers a scored, filtered, ranked list in the agency's format within hours of receiving a brief.

4.8x campaign ROI improvement. Campaigns using pipeline-vetted creators delivered, on average, 4.8x the engagement per dollar spent of campaigns run under the agency's prior selection process. The improvement came from two factors: better creator selection (higher actual engagement, not just follower count) and better audience matching (demographic alignment to the brand brief).

Competitive differentiation. The agency now sells "data-driven creator selection" as a service to brand clients. The pipeline's output (the scoring model, the demographic estimates, the partnership history) is part of the pitch deck. It is a competitive advantage against agencies that still vet creators manually.

What broke

The sponsored content detection problem

Detecting brand partnerships from public data is imperfect. The pipeline looks for signals: #ad, #sponsored, #partner hashtags, @brand mentions in captions, and consistent patterns of a creator posting about a specific brand multiple times.
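
The signals above can be sketched as a small caption scanner. This is a simplified illustration, not the pipeline's detector. It assumes a caller-supplied set of lowercase brand handles, and the function name and the min_repeat threshold are hypothetical.

```python
import re
from collections import Counter

DISCLOSURE_TAGS = {"ad", "sponsored", "partner"}
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@([\w.]+)")


def detect_partnerships(captions, known_brands, min_repeat=2):
    """Split brand mentions into disclosed and repeated-but-undisclosed.

    known_brands: set of lowercase brand handles to look for.
    A brand counts as "disclosed" when it is @mentioned in a caption that
    also carries a disclosure hashtag, and as a "repeated mention" when it
    appears in at least `min_repeat` captions without any disclosure.
    """
    disclosed = set()
    mention_counts = Counter()
    for caption in captions:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(caption)}
        # Strip a trailing period picked up from sentence punctuation.
        mentions = {m.lower().rstrip(".") for m in MENTION_RE.findall(caption)}
        brands = mentions & known_brands
        mention_counts.update(brands)
        if tags & DISCLOSURE_TAGS:
            disclosed |= brands
    repeated = {b for b, n in mention_counts.items() if n >= min_repeat}
    return {
        "disclosed": sorted(disclosed),
        "repeated_mentions": sorted(repeated - disclosed),
    }
```

Keeping the two lists separate matters for review: a disclosure hashtag is a strong signal, while repeated mentions alone are only a hint that someone should look closer.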

Some creators do not disclose partnerships with hashtags. Some use branded content tools that are not visible in the public post metadata. The detection rate is estimated at about 70 to 80% of actual partnerships. The 20 to 30% that are missed are partnerships with no public disclosure signals.

The agency treats the partnership data as directional, not comprehensive. It is useful for spotting creators who are already working with competing brands, but it does not catch every deal.

The engagement spike problem

Some creators have one viral video that temporarily inflates their average engagement rate. A creator with a typical 3% engagement rate who gets one video with 10 million views can suddenly look like a 15% engagement creator in the scoring model.

The fix was adding an outlier detection step that flags viral spikes and offers two engagement rate calculations: with and without outliers. The agency uses the "without outliers" number for vetting and the "with outliers" number for identifying breakout potential.
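
A sketch of that step is shown below. It assumes spikes are flagged with a modified z-score built on the median and the median absolute deviation (MAD). The case study does not name the detection method, so the technique, the 3.5 threshold, and the function name are assumptions.

```python
from statistics import mean, median


def engagement_with_and_without_outliers(rates, threshold=3.5):
    """Return mean engagement with and without upward viral spikes.

    rates: non-empty list of per-video engagement rates.
    A video is flagged when its modified z-score, 0.6745 * (r - median) / MAD,
    exceeds `threshold`. Only upward spikes are flagged.
    """
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        # More than half the rates are identical, so the score is
        # undefined; this sketch flags nothing in that case.
        outliers = []
    else:
        outliers = [
            i for i, r in enumerate(rates)
            if 0.6745 * (r - med) / mad > threshold
        ]
    flagged = set(outliers)
    kept = [r for i, r in enumerate(rates) if i not in flagged]
    return {
        "with_outliers": mean(rates),
        "without_outliers": mean(kept),
        "outlier_indexes": outliers,
    }
```

Median-based statistics are used here because a single extreme video barely moves them, whereas a mean and standard deviation would be pulled toward the spike they are supposed to detect.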

How ScrapeBase fits

For agencies and brands building their own creator analysis tools, the ScrapeBase API at scrapebase.io offers TikTok endpoints for profiles, posts, comments, and search results. The API handles the platform access layer: you call the endpoint and get structured JSON back.

For agencies that want the full intelligence layer (scoring, demographics, discovery, tracking), the managed TikTok data service handles everything end to end. The TikTok data service page has the full output schema, pricing, and delivery details.

When this approach makes sense

Custom TikTok creator intelligence fits agencies and brands that:

  • Spend $10,000+ per campaign on creator fees and need data to justify the spend
  • Work across multiple niches and need niche-normalized scoring
  • Want audience demographic matching beyond what SaaS databases provide
  • Need competitive intelligence on which creators are working with competing brands
  • Are building "data-driven creator selection" as a service offering to their own clients

For smaller campaigns or broad discovery, the SaaS influencer platforms are a fine starting point. The managed service fills the gap when the SaaS data is not deep enough for the spend level.