InvalidSchema("No connection adapters were found for '%s'" % url)

I am able to collect data from this webpage with the following code:

import requests
import lxml.html
import re
url = "http://animesora.com/flying-witch-episode-7-english-subtitle/"
r = requests.get(url)
page = r.content
dom = lxml.html.fromstring(page)

for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
    down = re.findall('https://.*',link)     
    print (down)

When I tried to collect more data from those links, I got this error:

Traceback (most recent call last):
  File "/home/sven/PycharmProjects/untitled1/.idea/test4.py", line 21, in <module>
    r2 = requests.get(down)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 590, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 672, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9zTGZYZ0s=&c=1&user=51757']'

This is the code I used:

for link2 in down:
    r2 = requests.get(down)
    page2 = r.url
    dom2 = lxml.html.fromstring(page2)

for link2 in dom2('//div[@class="button green"]//onclick'):

    down2 = re.findall('.*',down2)
    print (down2)
Please include the FULL traceback with the exception. You only gave us the exception message, truncated at the point where it names the URL that failed. There is more data after were found for ', at least one more ', for instance. – Martijn Pieters
Why are you using a regex? Also you can see pretty clearly that the url is inside a list – Padraic Cunningham
@PadraicCunningham I didn't know it was a regex. I just saw it here on stackoverflow and modified it – Pande Lemon
@PandeLemon, there is absolutely no need for a regex; have you actually looked at what you get from the page? – Padraic Cunningham

1 Answer

  1. Score: 0

    You are passing in the whole list:

    for link2 in down:
        r2 = requests.get(down)
    

    Note how you pass in down, not link2. down is a list, not a single URL string.
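    The quoted "URL" in your traceback starts with [' because requests stringifies whatever it is given, and the string form of a list does not begin with a scheme like https://, so no connection adapter matches. A minimal stdlib-only illustration (the list values here are shortened stand-ins):

    ```python
    # Why requests raises InvalidSchema for a list: it needs a URL string,
    # and str() of a list starts with "['", not with "http://" or "https://".
    down = ['https://link.safelinkconverter.com/review.php?id=aHR0',
            'https://link.safelinkconverter.com/review.php?id=cDov']

    as_string = str(down)
    print(as_string[:10])                    # "['https://"
    print(as_string.startswith('https://'))  # False -> no adapter found
    ```
    
    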

    Pass in link2 instead:

    for link2 in down:
        r2 = requests.get(link2)
    

    I'm not sure why you are using a regular expression at all. In the loop

    for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
    

    each link is already a fully qualified URL:

    >>> for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
    ...     print link
    ...
    https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9FZEk2Qg==&c=1&user=51757
    https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC95Tmg2Qg==&c=1&user=51757
    https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC93dFBmVFg=&c=1&user=51757
    https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9zTGZYZ0s=&c=1&user=51757
    

    You don't need to do any further processing on that.

    The rest of your code has more errors; you confused r2.url with r2.content, and you forgot the .xpath in your dom2.xpath(...) query.
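    Putting those fixes together, a minimal sketch of the second loop. The markup below (a div with class "button green" carrying the next URL in an onclick attribute) is a hypothetical stand-in for r2.content; check the real page's layout:

    ```python
    import lxml.html

    # Hypothetical second-page markup standing in for r2.content.
    page2 = '''
    <div class="button green" onclick="window.open('http://ouo.io/abc')">
      DOWNLOAD
    </div>
    '''

    # In the real script, parse r2.content (the body), not r2.url.
    dom2 = lxml.html.fromstring(page2)

    # Note the .xpath(...) call and the /@onclick step, both missing
    # from the original code.
    for onclick in dom2.xpath('//div[@class="button green"]/@onclick'):
        print(onclick)
    ```
    
    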

    Martijn Pieters
    Should I save these to a file and open them from there? – Pande Lemon
    @PandeLemon: I'm not sure what you are asking. – Martijn Pieters
    @MartijnPieters my output should be something like http://ouo.io – Pande Lemon
    @PandeLemon: sorry, that's not how this site works; we can only answer specific questions, not help solve every problem that follows. If you don't fully understand what your own code does, you may want to study Python programming some more. – Martijn Pieters
    @MartijnPieters thank you very much. I just tried it today. You were a great help – Pande Lemon