ScrapingAnt

Branched out from Scraping APIs

14 Apr 2023

scraping-apis/scrapingant-pricing.jpg

Started testing on the free plan.

scraping-apis/scrapingant-dashboard.jpg

Basic Use

import requests

url = 'https://api.scrapingant.com/v2/general'
params = {
    'url': 'https://example.com',
    'x-api-key': '<YOUR-API-KEY>'
}

response = requests.get(url, params=params)

print(response.text)

Working function

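import requests
from bs4 import BeautifulSoup

# Placeholder: replace with your own key (or load it from your configuration).
SCRAPINGANT_API_KEY = '<YOUR-API-KEY>'


def get_soup(url):
    """Fetch url through the ScrapingAnt API and return the parsed HTML."""
    request_url = 'https://api.scrapingant.com/v2/general'
    params = {
        'url': url,
        'x-api-key': SCRAPINGANT_API_KEY,
        'browser': True,  # default: True
        'proxy_country': 'GB',  # default: world
        'return_page_source': False,  # default: False
    }

    response = requests.get(request_url, params=params)

    html_text = response.text

    # Parse the returned HTML with Python's built-in parser.
    soup = BeautifulSoup(html_text, 'html.parser')

    return soup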
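
The params dictionary above ends up as a query string on the API endpoint, with Python booleans sent as the text True or False (compare the request URL in the Response example section). A minimal sketch of that encoding, using urllib.parse.urlencode from the standard library as a stand-in for what requests does internally; build_request_url is a hypothetical helper name, not part of any ScrapingAnt library:

```python
from urllib.parse import urlencode

API_ENDPOINT = 'https://api.scrapingant.com/v2/general'


def build_request_url(target_url, api_key, browser=True,
                      proxy_country='GB', return_page_source=False):
    """Build the full GET URL for a target page (illustrative helper)."""
    params = {
        'url': target_url,
        'x-api-key': api_key,
        'browser': browser,  # encoded as the text 'True' or 'False'
        'proxy_country': proxy_country,
        'return_page_source': return_page_source,
    }
    # urlencode percent-encodes each value, so the target URL is safe
    # to embed inside the API URL.
    return API_ENDPOINT + '?' + urlencode(params)


print(build_request_url('https://example.com', '<YOUR-API-KEY>'))
```

Printing the URL this way is handy for checking which options were actually sent, but note that it contains the API key, so avoid pasting it into shared notes or logs.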

API client

Library resources
PyPI ---
GitHub https://github.com/ScrapingAnt/scrapingant-client-python
Documentation https://docs.scrapingant.com/python-client

Getting Started

pip install scrapingant-client

Usage

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Scrape the example.com site.
result = client.general_request('https://example.com')
print(result.content)

Error codes

HTTP status code Reason
400 Wrong request format. Make sure that you're using a proper JSON input.
403 The API token is wrong or you have exceeded the API credits limit.
404 The requested URL is not reachable. Please, check it in your browser or try again.
405 The API endpoint can only be accessed using the following HTTP methods: GET, POST, PUT, DELETE.
409 Concurrent requests limit exceeded. Please, try again or upgrade to the paid plan.
422 Invalid value provided. Please, look into detail for more info.
423 The anti-bot detection system has detected the request. Please, retry or change the request settings.
500 Something went wrong with the server side code. Rare case. We're recommending to contact us in this case.
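
Several of the reasons above suggest simply trying again. A minimal retry sketch along those lines: treating 404, 409 and 423 as retryable is my own reading of the table wording ("try again", "retry"), not a documented policy, and should_retry and get_with_retries are hypothetical helper names. The HTTP call is passed in as http_get (for example requests.get) so the logic can be exercised without the network.

```python
import time

# Assumption: codes whose description in the table invites a retry.
RETRYABLE_STATUS_CODES = {404, 409, 423}


def should_retry(status_code):
    """Return True if the error table suggests a retry may help."""
    return status_code in RETRYABLE_STATUS_CODES


def get_with_retries(http_get, request_url, params,
                     max_attempts=3, pause_seconds=2.0):
    """Call http_get(request_url, params=params) up to max_attempts times.

    Stops early on success or on a non-retryable status code.
    Returns the last response, whether or not it succeeded.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = http_get(request_url, params=params)
        if response.ok or not should_retry(response.status_code):
            break
        if attempt < max_attempts:
            time.sleep(pause_seconds)  # short pause before the next try
    return response
```

With requests installed, a call would look like get_with_retries(requests.get, request_url, params). Codes such as 400, 403 and 422 are returned straight away, since repeating the same request is unlikely to change the outcome.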

Response example

apparent_encoding: utf-8
close: <bound method Response.close of <Response [404]>>
connection: <requests.adapters.HTTPAdapter object at 0x1069c6860>
content: b'{"detail":"This site can\xe2\x80\x99t be reached"}'
cookies: <RequestsCookieJar[]>
elapsed: 0:01:00.988698
encoding: utf-8
headers: {'Content-Length': '41', 'Content-Type': 'application/json', 'Date': 'Sat, 15 Apr 2023 11:11:07 GMT', 'Server': 'uvicorn', 'Vary': 'Accept-Encoding'}
history: []
is_permanent_redirect: False
is_redirect: False
iter_content: <bound method Response.iter_content of <Response [404]>>
iter_lines: <bound method Response.iter_lines of <Response [404]>>
json: <bound method Response.json of <Response [404]>>
links: {}
next: None
ok: False
raise_for_status: <bound method Response.raise_for_status of <Response [404]>>
raw: <urllib3.response.HTTPResponse object at 0x106c63040>
reason: Not Found
request: <PreparedRequest [GET]>
status_code: 404
text: {"detail":"This site can’t be reached"}
url: https://api.scrapingant.com/v2/general?url=https%3A%2F%2Fbit.ly%2F3K8jfVx&x-api-key=<YOUR-API-KEY>&browser=False&proxy_country=GB&return_page_source=True
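
In the dump above, the error body is a small JSON object with a detail field. A short sketch of pulling that message out, using only the standard json module; error_detail is a hypothetical helper name, and the assumption that other error responses share this shape is based on this one example only:

```python
import json


def error_detail(body):
    """Return the 'detail' field of a JSON error body, or None.

    Accepts bytes (like response.content) or str (like response.text).
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML page from a successful scrape.
        return None
    if isinstance(payload, dict):
        return payload.get('detail')
    return None


# The content value from the response example above.
body = b'{"detail":"This site can\xe2\x80\x99t be reached"}'
print(error_detail(body))
```

Checking response.ok first and calling error_detail(response.content) only on failures keeps the message handling separate from the HTML parsing.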
