Client Success Story

Retail Energy Pricing Data Extraction and Related Data Support for a Consulting Firm, Reducing Costs by 40%

The client

A Management Consulting Firm in the Energy Sector

Our client is based in the United States and helps enterprises achieve strategic and operational excellence. It partners with energy companies to help them navigate complex industry challenges, drive innovation, and achieve sustainable growth.

PROJECT REQUIREMENTS

Data Collection and Data Management Support

Retail energy pricing (REP) refers to the cost that consumers pay for the electricity, gas, or other forms of energy they use. It is also an essential consideration in the broader context of energy policy and sustainability efforts. The client ran a REP feed that enabled end users to analyze direct market activity within the energy sector. For that feed, they needed detailed data on the prices and terms of natural gas and electricity plans across the United States, broken down by provider and region.

Our team had to:

  • Go through the websites of energy companies on a list provided by the client
  • Manually extract the necessary data from each website, including rates and terms for natural gas and electricity plans
  • Search by area ZIP code so that the collected data is relevant to specific regions and captures local variations in pricing and terms
  • Collect data weekly so that the information stays current and reflects any changes in rates and terms
  • Maintain a consistent data-entry format to make analysis and comparison easy
  • Validate the data to ensure 100% accuracy

PROJECT CHALLENGES

Anti-Scraping Measures, Website Variations, and Large Data Volumes

The client's own attempts at automated data collection had failed. Because the target websites differed widely in structure, layout, and navigation, a single standardized scraping script was impractical. Many websites also used dynamic content loading and anti-scraping measures (e.g., CAPTCHAs, IP blocking), which further complicated automation and reduced data accuracy. In addition, the client struggled to manage large volumes of data from numerous providers and regions and to keep that data organized and accessible for analysis.

We proposed a manual website data extraction approach to cover the sources that automation could not handle. This allowed for greater accuracy and relevance in the data collected, directly addressing the client's requirements.

OUR SOLUTION

A Specialized Approach to Data Collection and Management

Our team of three subject matter experts began manual data extraction, taking over wherever the client's scripts failed to produce results. Using the client's list of websites and ZIP codes, they searched for and updated each plan's offer rate, renewable percentage, early termination fee, price to compare, and monthly fee.

1. Customized Data Extraction

We developed a data collection routine tailored to the unique structure and layout of each energy provider's website. This ensured efficient and accurate data extraction from diverse sources.

2. Dealing with CAPTCHAs

CAPTCHAs are designed to differentiate between human users and automated bots. During manual data extraction, we solved CAPTCHAs and accessed the data that automated scripts were blocked from retrieving.

3. Bypassing IP Blocking with Rotation

IP blocking is a common anti-scraping measure in which websites block IP addresses that make too many requests in a short period. We got around this by using VPNs (virtual private networks) to change our IP address regularly and proxy servers to route our connection through different servers, so that the target website saw different IP addresses.

4. Handling Dynamic Content on Websites

Websites often use JavaScript to dynamically load content. While automated scripts have trouble interacting with these elements directly, an experienced professional can easily do so. By clicking buttons or scrolling down pages to trigger content loading, we were able to access and extract the needed data.

5. Dealing with Session-Based Anti-Scraping Measures

Some target websites tracked user behavior across sessions to detect scraping patterns. Our data extraction operators avoided being flagged by clearing cookies and cache regularly, logging in and out of websites as needed, and browsing as a normal user would, for example by spending a reasonable amount of time on each page.

6. Weekly Data Collection Workflow

We established a robust weekly data collection workflow to keep the information current. This included setting up schedules and assigning dedicated team members to ensure timely updates.
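One way to make weekly changes in rates and terms visible is to compare each week's snapshot with the previous one. The sketch below assumes, hypothetically, that each snapshot maps a (provider, ZIP code, plan name) key to that plan's collected fields; the `diff_weeks` helper and the sample data are illustrative, since the case study does not say how the team tracked changes.

```python
from typing import Dict, List, Tuple

# Assumed snapshot shape: (provider, zip_code, plan_name) -> {field: value}
Key = Tuple[str, str, str]
Snapshot = Dict[Key, Dict[str, float]]


def diff_weeks(previous: Snapshot, current: Snapshot) -> List[str]:
    """List plans that were added, dropped, or changed between two weekly snapshots."""
    changes: List[str] = []
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old is None:
            changes.append(f"NEW {key}")
        elif new is None:
            changes.append(f"DROPPED {key}")
        else:
            # Compare every field present in either week, so added or removed terms show up too.
            for field in sorted(old.keys() | new.keys()):
                if old.get(field) != new.get(field):
                    changes.append(f"CHANGED {key} {field}: {old.get(field)} -> {new.get(field)}")
    return changes


last_week = {("Example Energy", "02134", "Fixed 12"): {"offer_rate": 0.129, "monthly_fee": 4.95}}
this_week = {
    ("Example Energy", "02134", "Fixed 12"): {"offer_rate": 0.119, "monthly_fee": 4.95},
    ("Example Energy", "02134", "Green 24"): {"offer_rate": 0.139, "monthly_fee": 0.0},
}
for line in diff_weeks(last_week, this_week):
    print(line)
```

A change list like this gives reviewers a short set of rows to double-check each week instead of re-reading the whole dataset.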

7. Data Validation and Quality Assurance

We implemented rigorous data validation protocols, including manual checks and cross-verification, to ensure 100% accuracy and consistency. Regular audits and quality assurance processes were put in place to maintain the integrity of the data.
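The case study does not list the exact validation rules. As a rough sketch, assuming records are stored as Python dictionaries with hypothetical field names, per-record range checks plus a cross-verification step that compares two independent entries of the same plan (an assumed double-entry practice) might look like this:

```python
import re
from typing import Dict, List

ZIP_PATTERN = re.compile(r"\d{5}")


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(record: Dict[str, object]) -> List[str]:
    """Return the problems found in one record; an empty list means it passed.

    These rules are illustrative assumptions, not the project's actual checklist.
    """
    problems: List[str] = []
    if not ZIP_PATTERN.fullmatch(str(record.get("zip_code", ""))):
        problems.append("zip_code must be exactly five digits")
    renewable = record.get("renewable_pct")
    if not _is_number(renewable) or not 0 <= renewable <= 100:
        problems.append("renewable_pct must be between 0 and 100")
    for field in ("offer_rate", "price_to_compare"):
        if not _is_number(record.get(field)) or record[field] <= 0:
            problems.append(f"{field} must be a positive number")
    for field in ("early_termination_fee", "monthly_fee"):
        if not _is_number(record.get(field)) or record[field] < 0:
            problems.append(f"{field} must be zero or more")
    return problems


def cross_verify(entry_a: Dict[str, object], entry_b: Dict[str, object]) -> List[str]:
    """Return the fields on which two independent entries of the same plan disagree."""
    return [f for f in sorted(entry_a.keys() | entry_b.keys()) if entry_a.get(f) != entry_b.get(f)]


good = {"zip_code": "02134", "renewable_pct": 20.0, "offer_rate": 0.119,
        "price_to_compare": 0.125, "early_termination_fee": 150.0, "monthly_fee": 4.95}
print(validate(good))  # prints: []
print(validate(dict(good, zip_code="2134", renewable_pct=120)))
print(cross_verify(good, dict(good, monthly_fee=5.95)))  # prints: ['monthly_fee']
```

Returning a list of messages rather than raising on the first error lets a reviewer see every problem in a record at once.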

PROJECT OUTCOMES

On the strength of these results, the project, which began in early 2023, has been extended for another year. By giving the client access to accurate, timely, and comprehensive data, SunTec India supported its efforts to deliver strategic insights and operational excellence in the energy sector.

  • Zero data discrepancies reported in client audits over 12 months
  • 40% reduction in the client's overhead costs, achieved by filling the gaps in automated data extraction
  • Coverage of every provider website and ZIP code on the client's list, with 100% on-time data delivery
  • Successful extraction of the required data from websites with stringent anti-scraping defenses

The client has also requested data visualization support for a related project, in which our team will build executive dashboards of REP and other energy industry metrics for display to the client's end users.

CONTACT US

Get Ready-to-Use Data with Web Research Support from SunTec

We have managed website data collection and market research tasks for a variety of clients across the energy, healthcare, finance, and IT sectors. With a human-in-the-loop approach, we have helped tech-based market research platforms perform better. Our service range (including data visualization, lead research, and data management) also provides 360-degree support for consulting firms, helping them get to analysis quickly with readily available data.

Achieve the targeted results for your organization with the SunTec team.