Empowering Data Professionals to Collect Clean Structured Web Data
Self-service platform for your team to code, scale and maintain your own data collection processes.
Code
Easily code, deploy & maintain your data collection processes.
Ruby Programming Language
Powerful yet easy-to-learn programming language.
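require 'nokogiri'

# `content` (the fetched HTML), `outputs`, and `pages` are assumed to be
# provided by the platform when this parser runs.

# initialize nokogiri
nokogiri = Nokogiri.HTML(content)
# get the listings
listings = nokogiri.css('ul.b-list__items_nofooter li.s-item')
# loop through the listings
listings.each do |listing|
  # save the product info to outputs
  outputs << {
    _collection: "products",
    title: listing.at_css('h3.s-item__title')&.text,
    price: listing.at_css('.s-item__price')&.text
  }
  # find the link to the product's detail page (selector assumed)
  item_link = listing.at_css('a.s-item__link')
  # skip enqueueing when the listing has no link
  next if item_link.nil?
  # enqueue the detail page to be scraped
  pages << {
    url: item_link['href'],
    page_type: 'details'
  }
end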
Save Time & Effort
A short learning curve and an easy-to-use platform for web scraping, API integrations, and ETL processes.
Integrated Development Flow
A robust end-to-end platform for your team to develop, run, and maintain data collection processes.
Export to Various Formats
Easily export to JSON, CSV, or other formats.
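To illustrate what an export involves, here is a minimal sketch using only Ruby's standard `json` and `csv` libraries. It assumes records shaped like the `outputs` hashes in the parser above (including the internal `_collection` key); the helper names `to_json_export` and `to_csv_export` are hypothetical and are not the platform's own export feature.

```ruby
require 'json'
require 'csv'

# Hypothetical helper: serialize records as a pretty-printed JSON array.
def to_json_export(records)
  JSON.pretty_generate(records)
end

# Hypothetical helper: serialize records as CSV. The header row is the union
# of all record keys, minus the internal _collection key.
def to_csv_export(records)
  headers = records.flat_map(&:keys).uniq - [:_collection]
  CSV.generate do |csv|
    csv << headers.map(&:to_s)
    records.each { |record| csv << headers.map { |h| record[h] } }
  end
end

records = [
  { _collection: "products", title: "Widget", price: "$9.99" },
  { _collection: "products", title: "Gadget", price: "$19.99" }
]

puts to_csv_export(records)
```

Taking the union of keys keeps the CSV columns stable even when some records are missing a field; those cells are simply left empty.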
Custom Rubygems
Use your favorite Ruby gems to help you collect data more effectively.
Ensure Clean & Accurate Data
Use JSON Schema specifications to validate your output and ensure clean, accurate data.
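Schema validation means checking every output record against a declared shape before it is saved. The sketch below is a deliberately tiny, hand-rolled check that supports only the `type`, `required`, and `properties` keywords of JSON Schema; it is not a full JSON Schema validator and not the platform's own mechanism. `PRODUCT_SCHEMA` is a hypothetical schema for the `products` collection from the parser above, and records are assumed to use string keys, as they would after a JSON round trip.

```ruby
# Hypothetical schema for the "products" collection, written as a Ruby hash
# that mirrors a small subset of JSON Schema (type, required, properties).
PRODUCT_SCHEMA = {
  "type" => "object",
  "required" => ["title", "price"],
  "properties" => {
    "title" => { "type" => "string" },
    "price" => { "type" => "string" }
  }
}.freeze

# Maps JSON Schema type names to the Ruby classes accepted for them.
TYPE_CLASSES = {
  "object"  => [Hash],
  "string"  => [String],
  "number"  => [Integer, Float],
  "integer" => [Integer],
  "boolean" => [TrueClass, FalseClass],
  "array"   => [Array],
  "null"    => [NilClass]
}.freeze

def type_ok?(value, type)
  TYPE_CLASSES.fetch(type).any? { |klass| value.is_a?(klass) }
end

# Returns an array of error messages; an empty array means the record passed.
def validate(record, schema)
  return ["record is not an object"] unless type_ok?(record, schema["type"])

  errors = []
  # "required" only checks that the key is present.
  schema.fetch("required", []).each do |key|
    errors << "missing required field '#{key}'" unless record.key?(key)
  end
  # Present keys must match their declared type (nil is not a string).
  schema.fetch("properties", {}).each do |key, rule|
    next unless record.key?(key)
    errors << "field '#{key}' should be a #{rule['type']}" unless type_ok?(record[key], rule["type"])
  end
  errors
end

p validate({ "title" => "Widget", "price" => "$9.99" }, PRODUCT_SCHEMA) # => []
p validate({ "title" => nil }, PRODUCT_SCHEMA)
# => ["missing required field 'price'", "field 'title' should be a string"]
```

The second record shows why validation matters for the parser above: `&.text` returns `nil` when a selector misses, and a type check catches that before the record reaches your dataset.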
Easy Troubleshooting
View the logs to pinpoint bugs in your code.
Scale
Scale your data-collection processes to millions of page requests with a few mouse clicks.
Parallel Processing
Whether you want to collect data from multiple sources at once or from a single source faster, we can handle it.
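Parallel collection generally means several workers pulling page requests from a shared queue. The sketch below shows that pattern with Ruby's built-in `Thread` and `Queue`; the `fetch` callable is a hypothetical stand-in for a real HTTP request, and the platform's own scheduler is not necessarily implemented this way.

```ruby
# Processes every URL with a fixed pool of worker threads pulling from a
# shared queue. `fetch` is any callable that takes a URL and returns a result.
def fetch_all(urls, workers: 3, fetch:)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # pop returns nil once a closed queue is drained

  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      while (url = queue.pop)
        body = fetch.call(url)
        lock.synchronize { results[url] = body }
      end
    end
  end
  threads.each(&:join)
  results
end

# Hypothetical fetcher that fakes a page body instead of making a request.
fake_fetch = ->(url) { "<html>#{url}</html>" }
urls = (1..10).map { |i| "https://example.com/page/#{i}" }
results = fetch_all(urls, workers: 3, fetch: fake_fetch)
puts results.size
```

Closing the queue before starting the workers lets each thread exit cleanly once the work runs out, without a race between checking `empty?` and calling `pop`.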
Auto Proxy Rotation
No need to worry about IP bans: we automatically rotate IPs on every request you make.
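Proxy rotation usually means routing each outgoing request through the next proxy in a pool. The platform does this for you; the round-robin sketch below, built on Ruby's standard `Net::HTTP`, only illustrates the idea. The proxy hosts are hypothetical placeholders, and no requests are actually sent.

```ruby
require 'net/http'
require 'uri'

# Cycles through a fixed list of proxy URLs in round-robin order.
# A mutex-guarded counter keeps it safe to share across threads.
class ProxyRotator
  def initialize(proxies)
    @proxies = proxies
    @index = 0
    @lock = Mutex.new
  end

  def next_proxy
    @lock.synchronize do
      proxy = @proxies[@index % @proxies.size]
      @index += 1
      proxy
    end
  end
end

# Builds (but does not start) a connection to the URL's host, routed through
# the next proxy in the rotation. A real caller would also set use_ssl for https.
def http_for(url, rotator)
  uri = URI(url)
  proxy = URI(rotator.next_proxy)
  Net::HTTP.new(uri.host, uri.port, proxy.host, proxy.port)
end

# Hypothetical proxy endpoints, for illustration only.
rotator = ProxyRotator.new([
  "http://proxy-a.example:8080",
  "http://proxy-b.example:8080",
  "http://proxy-c.example:8080"
])

4.times do
  http = http_for("https://www.example.com/", rotator)
  puts http.proxy_address
end
```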
Cron-Based Scheduler
Use cron's powerful scheduling syntax to run your processes at the times you specify.
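A cron expression has five fields: minute, hour, day of month, month, and day of week (0 = Sunday). To show how those fields are read, here is a hypothetical matcher that supports `*`, plain numbers, comma lists, ranges such as `1-5`, and steps such as `*/15`. It is a teaching sketch, not the platform's scheduler, and for simplicity it requires every field to match, whereas standard cron ORs the day-of-month and day-of-week fields when both are restricted.

```ruby
# Inclusive ranges for minute, hour, day of month, month, day of week.
FIELD_RANGES = [[0, 59], [0, 23], [1, 31], [1, 12], [0, 6]].freeze

# Expands one cron field (e.g. "*", "5", "1-5", "*/15", "0,30") into the
# list of integers it allows within [min, max].
def expand_field(field, min, max)
  field.split(",").flat_map do |part|
    range_part, step_str = part.split("/")
    step = step_str ? Integer(step_str, 10) : 1
    first, last =
      if range_part == "*"
        [min, max]
      elsif range_part.include?("-")
        range_part.split("-").map { |n| Integer(n, 10) }
      elsif step_str
        [Integer(range_part, 10), max] # "5/10" means every 10th value from 5
      else
        [Integer(range_part, 10), Integer(range_part, 10)]
      end
    (first..last).step(step).to_a
  end
end

# True if time `t` matches the five-field cron expression (all fields AND-ed).
def cron_match?(expr, t)
  fields = expr.split
  raise ArgumentError, "expected 5 fields" unless fields.size == 5

  values = [t.min, t.hour, t.day, t.month, t.wday]
  fields.each_with_index.all? do |field, i|
    min, max = FIELD_RANGES[i]
    expand_field(field, min, max).include?(values[i])
  end
end

# "At 09:00 every Monday"; 2024-01-01 was a Monday.
puts cron_match?("0 9 * * 1", Time.new(2024, 1, 1, 9, 0))     # => true
# "Every 15 minutes"
puts cron_match?("*/15 * * * *", Time.new(2024, 1, 1, 9, 30)) # => true
# "At 09:00 on weekdays"; 2024-01-06 was a Saturday.
puts cron_match?("0 9 * * 1-5", Time.new(2024, 1, 6, 9, 0))   # => false
```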
Connect
Connect your favorite Business Intelligence tools to your clean structured web data easily.
Full API Access
Integrate your apps with your recently collected data, or with deeper platform functionality.
Business Intelligence Connectivity
Connect Google Data Studio, Tableau, Microsoft Power BI, or other tools to your data via APIs and connectors.
Internet as a Database
You are no longer limited to the data already inside your company: the DataHen platform can collect and cleanse data for you from anywhere on the internet.
Testimonials
Don't take our word for it; read what others have to say.
Pricing
Our self-service platform comes in two flexible pricing models based on how much scale and support you need.
Professional Plan (Best Value)
From $149 per month (USD)
- Build unlimited scrapers
- Rotating proxies
- Export to JSON, CSV, and other formats
- Run up to 3 concurrent scrapers
- Extract data from up to 300,000 web pages per month
- Forum-based support
Enterprise Plan (Scalable)
From $1,000 per month (USD)
- Build unlimited scrapers
- Rotating proxies
- Export to JSON, CSV, and other formats
- Run up to 20 concurrent scrapers
- Extract data from up to 2,000,000 web pages per month
- Email/phone support
- Business Intelligence connectivity
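As a rough worked comparison, dividing each plan's starting price by its monthly page allowance gives a cost per 1,000 pages at full usage. The sketch below uses only the figures listed above and assumes the whole allowance is used at the starting price; the helper names are hypothetical, and actual quotes may differ.

```ruby
# Starting prices (USD/month), monthly page allowances, and concurrency
# limits taken from the plans above, cheapest plan first.
PLANS = {
  "Professional" => { price: 149.0,  pages: 300_000,   concurrency: 3 },
  "Enterprise"   => { price: 1000.0, pages: 2_000_000, concurrency: 20 }
}.freeze

# Cost per 1,000 pages if the full monthly allowance is used at the starting price.
def cost_per_thousand(plan)
  (plan[:price] / plan[:pages] * 1000).round(3)
end

# Returns the name of the first (cheapest) plan whose limits cover the
# workload, or nil if no listed plan does.
def smallest_plan_for(pages:, concurrency:)
  PLANS.find { |_, plan| pages <= plan[:pages] && concurrency <= plan[:concurrency] }&.first
end

PLANS.each do |name, plan|
  puts format("%-12s $%.3f per 1,000 pages", name, cost_per_thousand(plan))
end
puts smallest_plan_for(pages: 500_000, concurrency: 2)
```

At full usage both plans come to roughly $0.50 per 1,000 pages, so the choice between them mostly depends on volume, concurrency, and support needs.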