This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

Guest Token fetching for twint run in AWS using proxy #1146

Open: dpakpdl wants to merge 5 commits into base: master

Conversation

@dpakpdl

dpakpdl commented Mar 12, 2021

Use the proxy host, port, and type from the config to route the guest token request through a proxy server, so that Twitter does not block requests from AWS.

@dpakpdl
Author

dpakpdl commented Mar 17, 2021

Example Usage:

import twint

config = twint.Config()
config.Proxy_host = "51.158.68.68"
config.Proxy_port = "8761"
config.Proxy_type = "http"
# `username` must be defined beforehand: the Twitter handle to look up.
config.Username = username
twint.run.Lookup(config)

@yemregundogmus

Hello, I got this error with your new update:

File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 410, in Search
    run(config, callback)
  File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 35, in __init__
    self.token = token.Token(config)
  File "/usr/local/lib/python3.7/site-packages/twint/token.py", line 23, in __init__
    self.proxies = self._get_proxies()
  File "/usr/local/lib/python3.7/site-packages/twint/token.py", line 29, in _get_proxies
    if not self.config.get('Proxy_host'):
AttributeError: 'Config' object has no attribute 'get'

I fixed this by replacing self.config.get('Proxy_host') with self.config.Proxy_host, but after making the replacement I got this error:

File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 410, in Search
    run(config, callback)
  File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/local/lib/python3.7/site-packages/twint/run.py", line 36, in __init__
    self.token.refresh()
  File "/usr/local/lib/python3.7/site-packages/twint/token.py", line 93, in refresh
    raise RefreshTokenException('Could not find the Guest token in HTML')
twint.token.RefreshTokenException: Could not find the Guest token in HTML
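
For context on the first traceback: the AttributeError suggests that Config is a plain object with attributes rather than a dict, so dict-style .get() is not available on it. A minimal sketch of an attribute-based check follows. FakeConfig and proxy_configured are hypothetical names for illustration, not twint's real classes; only the three Proxy_* attribute names come from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeConfig:
    """Hypothetical stand-in for twint.Config; only the Proxy_* names are from the thread."""
    Proxy_host: Optional[str] = None
    Proxy_port: Optional[str] = None
    Proxy_type: Optional[str] = None


def proxy_configured(config) -> bool:
    # getattr with a default works on attribute-style objects and does not
    # raise when an attribute is missing, unlike config.get(...) on a non-dict.
    fields = ("Proxy_type", "Proxy_host", "Proxy_port")
    return all(getattr(config, name, None) for name in fields)


print(proxy_configured(FakeConfig()))  # no proxy settings present
print(proxy_configured(FakeConfig("127.0.0.1", "7890", "http")))  # all three set
```

This only addresses the AttributeError; it does not explain the later "Could not find the Guest token in HTML" failure.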

@iHandle

iHandle commented Mar 29, 2021

I get the same error after using your fork of twint.
My code:

import twint

c = twint.Config()
c.Username = 'nytimes'
c.Proxy_host = '127.0.0.1'
c.Proxy_port = '7890'
c.Proxy_type = 'http'

twint.run.Lookup(c)

Error:

WARNING:root:Error retrieving https://twitter.com/: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001497FE4D6D8>, 'Connection to twitter.com timed out. (connect timeout=10)'))",),), retrying

I am sure my HTTP proxy works, because pip3 --proxy http://127.0.0.1:7890 install -r requirement.txt ran successfully.

@dpakpdl
Author

dpakpdl commented Mar 30, 2021

Check whether your local proxy is working with the code below. If it is not, use another proxy.

import requests
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'})
req = session.prepare_request(requests.Request('GET', 'https://twitter.com'))
proxies = {'http': '127.0.0.1:7890'}
session.send(req, allow_redirects=True, timeout=10, proxies=proxies, verify=False)

@iHandle

iHandle commented Apr 1, 2021

I cannot access Twitter with this code, regardless of which proxy I use, but I can visit Twitter in Firefox through any of my proxies.

@okkymabruri

Please accept this patch @pielco11 @haccer

@ybs5

ybs5 commented May 24, 2021

This commit raises an error when I test it. Here is my code:

token.py

import re
import time
import logging as logme
import requests


class TokenExpiryException(Exception):
    def __init__(self, msg):
        super().__init__(msg)


class RefreshTokenException(Exception):
    def __init__(self, msg):
        super().__init__(msg)


class Token:
    def __init__(self, config):
        self._session = requests.Session()
        self._session.headers.update(
            {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'})
        self.config = config
        self._proxies = self._set_proxies()
        self._retries = 5
        self._timeout = 10
        self.url = 'https://twitter.com'

    def _set_proxies(self) -> dict:
        settings = [self.config.Proxy_type, self.config.Proxy_host, self.config.Proxy_port]
        if not all(settings):
            logme.debug("No proxy in config")
            return {}

        proxy_type = self.config.Proxy_type.lower()
        proxy_val = f"{self.config.Proxy_host}:{self.config.Proxy_port}"
        proxies = {proxy_type: proxy_val}
        if proxy_type == 'http':
            proxies['https'] = proxy_val
        return proxies

    def _request(self):
        for attempt in range(self._retries + 1):
            # The request is newly prepared on each retry because of potential cookie updates.
            req = self._session.prepare_request(requests.Request('GET', self.url))
            logme.debug(f'Retrieving {req.url}')
            try:
                if self._proxies:
                    r = self._session.send(
                        req,
                        allow_redirects=True,
                        timeout=self._timeout,
                        proxies=self._proxies,
                        verify=False
                    )
                else:
                    r = self._session.send(req, allow_redirects=True, timeout=self._timeout)
            except requests.exceptions.RequestException as exc:
                if attempt < self._retries:
                    retrying = ', retrying'
                    level = logme.WARNING
                else:
                    retrying = ''
                    level = logme.ERROR
                logme.log(level, f'Error retrieving {req.url}: {exc!r}{retrying}')
            else:
                logme.debug(f'{req.url} retrieved successfully')
                return r
            if attempt < self._retries:
                # TODO : might wanna tweak this back-off timer
                sleep_time = 2.0 * 2 ** attempt
                logme.info(f'Waiting {sleep_time:.0f} seconds')
                time.sleep(sleep_time)
        else:
            msg = f'{self._retries + 1} requests to {self.url} failed, giving up.'
            logme.fatal(msg)
            self.config.Guest_token = None
            raise RefreshTokenException(msg)

    def refresh(self):
        logme.debug('Retrieving guest token')
        res = self._request()
        match = re.search(r'\("gt=(\d+);', res.text)
        if match:
            logme.debug('Found guest token in HTML')
            self.config.Guest_token = str(match.group(1))
        else:
            self.config.Guest_token = None
            raise RefreshTokenException('Could not find the Guest token in HTML')
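
The extraction step in refresh() can be tried in isolation. The sketch below reuses the same regular expression; extract_guest_token is a hypothetical helper name, and the sample string is made up for illustration rather than captured from a real Twitter response.

```python
import re
from typing import Optional

# Same pattern as in refresh(): a literal '("gt=' followed by digits and ';'.
GUEST_TOKEN_RE = re.compile(r'\("gt=(\d+);')


def extract_guest_token(html: str) -> Optional[str]:
    """Return the guest token digits, or None if the pattern is absent."""
    match = GUEST_TOKEN_RE.search(html)
    return match.group(1) if match else None


# Made-up sample input, not a captured Twitter response.
sample = 'document.cookie = decodeURIComponent("gt=1234567890; Max-Age=10800;");'
print(extract_guest_token(sample))
print(extract_guest_token("<html>no token here</html>"))
```

If the page arrives without that cookie snippet, the function returns None, which corresponds to the "Could not find the Guest token in HTML" branch above.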

@dpakpdl
Author

dpakpdl commented May 24, 2021

The last commit is correct. You can check the changes in the PR.
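
As a side note on the retry loop in the token.py posted above: with the values shown there (5 retries, sleep of 2.0 * 2 ** attempt), the waits between attempts can be worked out directly. This is a small sketch of that arithmetic; backoff_schedule is a hypothetical helper, not part of twint.

```python
from typing import List


def backoff_schedule(retries: int = 5, base: float = 2.0) -> List[float]:
    # One sleep after each failed attempt except the last, matching
    # `if attempt < self._retries` in _request().
    return [base * 2 ** attempt for attempt in range(retries)]


schedule = backoff_schedule()
print(schedule)       # [2.0, 4.0, 8.0, 16.0, 32.0]
print(sum(schedule))  # 62.0
```

So a proxy that never connects costs about a minute of sleeping, plus the time spent in the six timed-out requests, before RefreshTokenException is raised.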

@davewang

davewang commented Jan 7, 2022

Twitter is served over HTTPS, so the proxies dict needs an https entry. Change it to the following and it works. twint itself still has this bug, though.

proxies = {'https': '127.0.0.1:7890','http': '127.0.0.1:7890'}
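
The reason this helps is that requests picks the proxy by the scheme of the target URL, so a dict with only an 'http' key is not applied to https://twitter.com. The sketch below illustrates that lookup with the standard library only. build_proxies and pick_proxy are hypothetical helpers, and pick_proxy is a simplification: the real requests logic also considers host-specific and 'all' keys.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def build_proxies(proxy_type: str, host: str, port: str) -> Dict[str, str]:
    """Register one proxy for both http:// and https:// destinations."""
    proxy_url = f"{proxy_type.lower()}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def pick_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    # Simplified illustration of scheme-based lookup, keyed on the
    # scheme of the destination URL rather than the proxy's own type.
    return proxies.get(urlparse(url).scheme)


http_only = {"http": "127.0.0.1:7890"}
print(pick_proxy("https://twitter.com", http_only))  # None: the request would go direct

both = build_proxies("http", "127.0.0.1", "7890")
print(pick_proxy("https://twitter.com", both))  # http://127.0.0.1:7890
```

This would also be consistent with pip working through the same proxy, since pip was given the proxy URL explicitly for all of its requests.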
