ScrapingAnt

Branched out from !helpers/scraping-apis

14 Apr 2023

scraping-apis/scrapingant-pricing.jpg

Starting to test on free plan.

scraping-apis/scrapingant-dashboard.jpg

Basic Use

import requests

url = 'https://api.scrapingant.com/v2/general'
params = {
    'url': 'https://example.com',
    'x-api-key': '<YOUR-API-KEY>'
}

response = requests.get(url, params=params)

print(response.text)

working function

def get_soup(url):

    request_url = 'https://api.scrapingant.com/v2/general'
    params = {
        'url': url,
        'x-api-key': SCRAPINGANT_API_KEY,
        'browser': True, # default: True / Set to True for JS rendering (10 credits), False otherwise (1 credit)
        'proxy_country': 'GB', # default: world
        'return_page_source': False, # default: False
    }

    response = requests.get(request_url, params=params)

    html_text = response.text

    soup = BeautifulSoup(html_text, 'html.parser')

    return soup

using class selector

When website uses JS to load content, you can use the class selector option to ensure data is only returned once that CSS class is availabe.

CSS class (as .hero-unit) needs to be URL encoded.

    request_url = 'https://api.scrapingant.com/v2/general'

    params = {
        'url': url,
        'x-api-key': SCRAPINGANT_API_KEY,
        'browser': True,  # Set to True for JS rendering (10 credits), False otherwise (1 credit)
        'proxy_country': 'GB',
        'wait_for_selector': urllib.parse.quote('.hero-unit'),
    }

    response = requests.get(request_url, params=params)

See

API client

Library resources
PyPI ---
Github https://github.com/ScrapingAnt/scrapingant-client-python
Documentation https://docs.scrapingant.com/python-client

Getting Started

pip install scrapingant-client

Usage

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Scrape the example.com site.
result = client.general_request('https://example.com')
print(result.content)

Error codes

HTTP status code Reason
400 Wrong request format. Make sure that you're using a proper JSON input.
403 The API token is wrong or you have exceeded the API credits limit.
404 The requested URL is not reachable. Please, check it in your browser or try again.
405 The API endpoint can only be accessed using the following HTTP methods: GET, POST, PUT, DELETE.
409 Concurrent requests limit exceeded. Please, try again or upgrade to the paid plan
422 Invalid value provided. Please, look into detail for more info.
423 The anti-bot detection system has detected the request. Please, retry or change the request settings.
500 Something went wrong with the server side code. Rare case. We're recommending to contact us in this case.

Response example

apparent_encoding: utf-8
close: <bound method Response.close of <Response [404]>>
connection: <requests.adapters.HTTPAdapter object at 0x1069c6860>
content: b'{"detail":"This site can\xe2\x80\x99t be reached"}'
cookies: <RequestsCookieJar[]>
elapsed: 0:01:00.988698
encoding: utf-8
headers: {'Content-Length': '41', 'Content-Type': 'application/json', 'Date': 'Sat, 15 Apr 2023 11:11:07 GMT', 'Server': 'uvicorn', 'Vary': 'Accept-Encoding'}
history: []
is_permanent_redirect: False
is_redirect: False
iter_content: <bound method Response.iter_content of <Response [404]>>
iter_lines: <bound method Response.iter_lines of <Response [404]>>
json: <bound method Response.json of <Response [404]>>
links: {}
next: None
ok: False
raise_for_status: <bound method Response.raise_for_status of <Response [404]>>
raw: <urllib3.response.HTTPResponse object at 0x106c63040>
reason: Not Found
request: <PreparedRequest [GET]>
status_code: 404
text: {"detail":"This site can’t be reached"}
url: https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fbit.ly%2F3K8jfVx&x-api-key=c967d3da0a484315a5ecbb2aaaeb367a&browser=False&proxy_country=GB&return_page_source=True

AI extraction

27 Sep 2023

Launched AI data extraction.

Can handle requests like extracting product title, price(number), full description, reviews(list: review title, review content).

Instructions: https://docs.scrapingant.com/ai-data-extraction/ai-extractor

Price seems prohibitive though:

each 30 characters of input and output text cost 1 API credit

https://docs.scrapingant.com/credits-cost#ai-extractor-cost

So a webpage with 30k characters would cost 1k credits!

links

social