ByteDance’s Bytespider: The Aggressive AI Data Scraper Raising Alarms
ByteDance, the parent company of TikTok, has launched an aggressive web scraper bot called Bytespider to collect data for training its generative AI models. Research from Kasada reveals that Bytespider has become one of the most aggressive scrapers on the internet since its release in April.
Key Findings:
- Bytespider scrapes data at 25 times the rate of OpenAI’s GPTBot
- It scrapes at 3,000 times the rate of Anthropic’s ClaudeBot
- The bot ignores robots.txt, the file websites use to tell crawlers which pages are off-limits
- Scraping activity has shown huge spikes over the past six weeks
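For readers unfamiliar with robots.txt, the sketch below shows how a site might ask Bytespider to stay away and how a compliant crawler would interpret that request. The rules, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not taken from any real site or from Kasada's research. The check uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Bytespider everywhere, allow all other bots.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain used only for illustration.
print(is_allowed("Bytespider", "https://example.com/articles/1"))    # False
print(is_allowed("SomeOtherBot", "https://example.com/articles/1"))  # True
```

Compliance with robots.txt is voluntary: the file only states the site owner's wishes. A crawler that never performs this check, as Bytespider reportedly does not, is not technically prevented from fetching the pages.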
ByteDance’s aggressive data collection comes despite the potential U.S. ban on TikTok. President Biden has signed legislation requiring ByteDance to sell TikTok or shut it down due to national security concerns.
Implications for AI Development:
- ByteDance appears to be rapidly catching up in the generative AI race
- The company is likely developing a new large language model (LLM)
- A new AI model could enhance TikTok’s search function for advertisers
This aggressive scraping raises concerns about copyright infringement and data ethics. It also highlights the intensifying competition in the AI industry, with ByteDance seemingly determined to close the gap with other major tech companies.