

python - Scrapy: simulated login fails when crawling mobile Weibo (weibo.cn)

Views: 212  Date: 2022-08-03 14:03:35

Problem description

The code is below. I don't know why the login never succeeds.

# -*- coding: utf-8 -*-
import scrapy
import re
import requests
#import urllib
from bs4 import BeautifulSoup
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose, Join
from scrapy.http import Request, FormRequest
from getweibo.items import InformationItem, TweetsItem

loginURL = 'https://login.weibo.cn/login/'


# Fetch the captcha and the other hidden login-form values
def get_captchainfo(loginURL):
    html = requests.get(loginURL).content
    bs = BeautifulSoup(html, 'lxml')
    #print bs
    # Note: bs.select returns a list of matching elements
    password_name = (bs.select('input[type="password"]'))[0].get('name')
    vk = (bs.select('input[name="vk"]'))[0].get('value')
    capId = (bs.select('input[name="capId"]'))[0].get('value')
    #print password_name,vk,capId
    captcha_img = bs.find('img', src=re.compile('http://weibo.cn/interface/f/ttt/captcha/')).get('src')
    print captcha_img
    # The captcha id can be cut directly out of the captcha image URL
    #urllib.urlretrieve(captcha_img, 'weibo_spider/image/captcha.jpg')
    #print 'captcha download success!'
    captcha_input = raw_input('please input the captcha\n>')
    return (captcha_input, password_name, vk, capId)


class WeiboSpider(CrawlSpider):
    name = 'weibo'
    allowed_domains = ['weibo.cn']
    # Fixed to 精分君's Weibo for now; later start_urls can be read from a file
    start_urls = ['http://weibo.cn/dafendi']
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[@id="pagelist"]/form/p/a')),
        Rule(LinkExtractor(restrict_xpaths='//*[contains(@href,"repost")]'), callback='parse_item'),
    )
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'Content-Type': ' application/x-www-form-urlencoded',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
        'Referer': 'https://login.weibo.cn/login/'
    }

    # Start on the welcome page
    def start_requests(self):
        return [
            Request(loginURL, meta={'cookiejar': 1}, headers=self.headers, callback=self.parse_login)
        ]

    # Post welcome page's first form with the given user/pass
    def parse_login(self, response):
        print 'Preparing login'
        captcha = get_captchainfo(loginURL)
        print captcha
        return FormRequest.from_response(
            response,  # from loginURL
            method='POST',
            meta={'cookiejar': response.meta['cookiejar']},  # carry the cookies
            headers=self.headers,
            formdata={
                'mobile': '<account>',
                captcha[1]: '<password>',
                'code': captcha[0],
                'remember': 'on',
                'backurl': 'http%3A%2F%2Fweibo.cn',
                'backtitle': u'手机新浪网',
                'tryCount': '',
                'vk': captcha[2],
                'capId': captcha[3],
                'submit': u'登录'},
            callback=self.after_login,
            dont_filter=True)

    def after_login(self, response):
        for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def parse_start_url(self, response):  # handles the initial response
        html = response.xpath('/html').extract()
        print html
        # Create the loader using the response
        l = ItemLoader(item=InformationItem(), response=response)
        # Load fields using XPath expressions
        l.add_xpath('id_', '//title/text()', MapCompose(lambda i: i[0:len(i)-3]))
        l.add_xpath('Info', '//span[contains(@class,"ctt")][2]/text()')
        l.add_xpath('Num_Tweets', '//span[contains(@class,"tc")]/text()', MapCompose(lambda i: i[(i.index('[')+1):(i.index(']'))]))
        l.add_xpath('Num_Follows', '//a[contains(@href,"follow")]/text()', MapCompose(lambda i: i[(i.index('[')+1):(i.index(']'))]))
        l.add_xpath('Num_Fans', '//a[contains(@href,"fans")]/text()', MapCompose(lambda i: i[(i.index('[')+1):(i.index(']'))]))
        return l.load_item()

    def parse_item(self, response):
        l = ItemLoader(item=TweetsItem(), response=response)
        l.add_xpath('Content', '//span[contains(@class,"ctt")]/text()')
        #l.add_xpath('')
        return l.load_item()

Below are the contents of settings.py

ROBOTSTXT_OBEY = False
HTTPERROR_ALLOWED_CODES = [302,]  # treat a returned 400 as a normal response
REDIRECT_ENABLED = False  # turn off redirects; requests are not redirected to the new address
DOWNLOAD_DELAY = 3
COOKIES_ENABLED = True
COOKIES_DEBUG = True

Below is the output

2017-04-09 15:53:17 [scrapy] DEBUG: Sending cookies to: <POST https://login.weibo.cn/login/?rand=201282002&backURL=http%3A%2F%2Fweibo.cn&backTitle=%E6%89%8B%E6%9C%BA%E6%96%B0%E6%B5%AA%E7%BD%91&vt=4>
Cookie: _T_WM=6348fb8a523fe1bc486f14d1304cf0d2
2017-04-09 15:53:19 [scrapy] DEBUG: Received cookies from: <302 https://login.weibo.cn/login/?rand=201282002&backURL=http%3A%2F%2Fweibo.cn&backTitle=%E6%89%8B%E6%9C%BA%E6%96%B0%E6%B5%AA%E7%BD%91&vt=4>
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
Set-Cookie: SUB=_2A2517Zg9DeRhGeVG61ER8yrEwzyIHXVXETh1rDV6PUJbkdAKLRXgkW0wSZc8S6dp1d-NlyAraSqa-1-_0Q..; expires=Tue, 09-May-2017 07:53:17 GMT; path=/; domain=.weibo.cn; httponly
Set-Cookie: gsid_CTandWM=4uuCcdef1lRXUEnMtsgL1fXlgec; expires=Tue, 09-May-2017 07:53:19 GMT; path=/; domain=.weibo.cn; httponly
2017-04-09 15:53:19 [scrapy] DEBUG: Crawled (302) <POST https://login.weibo.cn/login/?rand=201282002&backURL=http%3A%2F%2Fweibo.cn&backTitle=%E6%89%8B%E6%9C%BA%E6%96%B0%E6%B5%AA%E7%BD%91&vt=4> (referer: https://login.weibo.cn/login/)
2017-04-09 15:53:20 [scrapy] DEBUG: Received cookies from: <200 http://weibo.cn/dafendi>
Set-Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0; expires=Tue, 09-May-2017 07:53:19 GMT; path=/; domain=.weibo.cn; httponly
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
2017-04-09 15:53:20 [scrapy] DEBUG: Crawled (200) <GET http://weibo.cn/dafendi> (referer: https://login.weibo.cn/login/?rand=201282002&backURL=http%3A%2F%2Fweibo.cn&backTitle=%E6%89%8B%E6%9C%BA%E6%96%B0%E6%B5%AA%E7%BD%91&vt=4)
2017-04-09 15:53:20 [scrapy] DEBUG: Scraped from <200 http://weibo.cn/dafendi>
{'Info': [u'\u8ba4\u8bc1\uff1a\u77e5\u540d\u5e7d\u9ed8\u535a\u4e3b \u5fae\u535a\u7b7e\u7ea6\u81ea\u5a92\u4f53'], 'Num_Fans': [u'2055326'], 'Num_Follows': [u'891'], 'Num_Tweets': [u'1958'], 'id_': [u'\u7cbe\u5206\u541b']}
2017-04-09 15:53:20 [scrapy] DEBUG: Sending cookies to: <GET http://weibo.cn/repost/EDsDTFqfJ?rl=0&uid=2626948743>
Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0
2017-04-09 15:53:20 [scrapy] DEBUG: Sending cookies to: <GET http://weibo.cn/repost/EDxAwrBrG?rl=0&uid=2626948743>
Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0
2017-04-09 15:53:20 [scrapy] DEBUG: Sending cookies to: <GET http://weibo.cn/repost/EDBmajRBl?rl=0&uid=2626948743>
Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0
2017-04-09 15:53:20 [scrapy] DEBUG: Sending cookies to: <GET http://weibo.cn/repost/CsN9LnQiG?rl=0&uid=2626948743>
Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0
2017-04-09 15:53:24 [scrapy] DEBUG: Received cookies from: <200 http://weibo.cn/repost/EDsDTFqfJ?rl=0&uid=2626948743>
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
2017-04-09 15:53:24 [scrapy] DEBUG: Crawled (200) <GET http://weibo.cn/repost/EDsDTFqfJ?rl=0&uid=2626948743> (referer: http://weibo.cn/dafendi)
2017-04-09 15:53:24 [scrapy] DEBUG: Scraped from <200 http://weibo.cn/repost/EDsDTFqfJ?rl=0&uid=2626948743>
{'Content': [u':', u' \u5047\u5982\u4efb\u4f55\u4e8b\u90fd\u80fd\u6210\u4e3a\u804c\u4e1a\uff0c\u4f60\u4f1a\u9009\u62e9\u4ec0\u4e48\u4f5c\u4e3a\u804c\u4e1a\uff1f \u200b\u200b\u200b']}
2017-04-09 15:53:28 [scrapy] DEBUG: Received cookies from: <200 http://weibo.cn/repost/EDxAwrBrG?rl=0&uid=2626948743>
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
2017-04-09 15:53:28 [scrapy] DEBUG: Crawled (200) <GET http://weibo.cn/repost/EDxAwrBrG?rl=0&uid=2626948743> (referer: http://weibo.cn/dafendi)
2017-04-09 15:53:28 [scrapy] DEBUG: Scraped from <200 http://weibo.cn/repost/EDxAwrBrG?rl=0&uid=2626948743>
{'Content': [u'\u7279\u522b\u7684\u751f\u65e5\u793c\u7269\u3002 \u200b\u200b\u200b']}
2017-04-09 15:53:32 [scrapy] DEBUG: Received cookies from: <200 http://weibo.cn/repost/EDBmajRBl?rl=0&uid=2626948743>
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
2017-04-09 15:53:32 [scrapy] DEBUG: Crawled (200) <GET http://weibo.cn/repost/EDBmajRBl?rl=0&uid=2626948743> (referer: http://weibo.cn/dafendi)
2017-04-09 15:53:32 [scrapy] DEBUG: Scraped from <200 http://weibo.cn/repost/EDBmajRBl?rl=0&uid=2626948743>
{'Content': [u'\u7231\u7b11\u7684\u5973\u5b69\u5b50\uff0c\u8fd0\u6c14\u4e00\u5b9a\u4e0d\u4f1a\u592a\u597d\u2026\u2026', u' \u200b\u200b\u200b']}
2017-04-09 15:53:36 [scrapy] DEBUG: Received cookies from: <200 http://weibo.cn/repost/CsN9LnQiG?rl=0&uid=2626948743>
Set-Cookie: WEIBOCN_FROM=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.weibo.cn
2017-04-09 15:53:36 [scrapy] DEBUG: Crawled (200) <GET http://weibo.cn/repost/CsN9LnQiG?rl=0&uid=2626948743> (referer: http://weibo.cn/dafendi)
2017-04-09 15:53:36 [scrapy] DEBUG: Scraped from <200 http://weibo.cn/repost/CsN9LnQiG?rl=0&uid=2626948743>
{'Content': [u':\u4e00\u4e2a\u957f\u5fae\u535a\u5408\u96c6\uff0c\u5927\u5bb6\u65e0\u804a\u53c8\u6ca1\u770b\u8fc7\u7684\u8bdd\u53ef\u4ee5\u770b\u770b[\u7f9e\u55d2\u55d2] \u200b\u200b\u200b']}
2017-04-09 15:53:36 [scrapy] INFO: Closing spider (finished)
2017-04-09 15:53:36 [scrapy] INFO: Stored json feed (5 items) in: wanghongmingdan.json
2017-04-09 15:53:36 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3029,
 'downloader/request_count': 7,
 'downloader/request_method_count/GET': 6,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 22746,
 'downloader/response_count': 7,
 'downloader/response_status_count/200': 6,
 'downloader/response_status_count/302': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 9, 7, 53, 36, 596076),
 'item_scraped_count': 5,
 'log_count/DEBUG': 27,
 'log_count/INFO': 8,
 'log_count/WARNING': 2,
 'request_depth_max': 3,
 'response_received_count': 7,
 'scheduler/dequeued': 7,
 'scheduler/dequeued/memory': 7,
 'scheduler/enqueued': 7,
 'scheduler/enqueued/memory': 7,
 'start_time': datetime.datetime(2017, 4, 9, 7, 53, 2, 180831)}
2017-04-09 15:53:36 [scrapy] INFO: Spider closed (finished)

2017-04-09 20:11:50 [scrapy] DEBUG: Redirecting (302) to <GET http://weibo.cn/crossDomain/?g=4uegcdef1d93rkj4S3ZomfXlgec&t=1491739909&m=9144&r=&u=http%3A%2F%2Fweibo.cn%3Fgsid%3D4uegcdef1d93rkj4S3ZomfXlgec%26PHPSESSID%3D%26vt%3D4&cross=1&st=ST-MzgwMzAzNDg4MA==-1491739909-tc-27ED8C8D7528C9185E75F7986B8050B7-1,ST-MzgwMzAzNDg4MA==-1491739909-tc-BED83CC16AC311D2BBA234E8F08BBD39-1> from <POST https://login.weibo.cn/login/?rand=842328789&backURL=http%3A%2F%2Fweibo.cn&backTitle=%E6%89%8B%E6%9C%BA%E6%96%B0%E6%B5%AA%E7%BD%91&vt=4>
2017-04-09 20:11:50 [scrapy] DEBUG: Redirecting (meta refresh) to <GET http://weibo.cn/> from <GET http://weibo.cn/crossDomain/?g=4uegcdef1d93rkj4S3ZomfXlgec&t=1491739909&m=9144&r=&u=http%3A%2F%2Fweibo.cn%3Fgsid%3D4uegcdef1d93rkj4S3ZomfXlgec%26PHPSESSID%3D%26vt%3D4&cross=1&st=ST-MzgwMzAzNDg4MA==-1491739909-tc-27ED8C8D7528C9185E75F7986B8050B7-1,ST-MzgwMzAzNDg4MA==-1491739909-tc-BED83CC16AC311D2BBA234E8F08BBD39-1>
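
In the output, the 302 login response sets SUB and gsid_CTandWM, while the later GET requests send only _T_WM. One way to check this kind of mismatch mechanically is sketched below. It is a minimal Python 3 sketch, not part of the original post: the helper names `received_cookie_names` and `sent_cookie_names` are hypothetical, and the cookie values in the sample lines are abbreviated.

```python
def received_cookie_names(lines):
    """Names the server set: the first name=value pair of each Set-Cookie line."""
    names = set()
    for line in lines:
        if line.startswith("Set-Cookie:"):
            first_pair = line[len("Set-Cookie:"):].split(";", 1)[0]
            names.add(first_pair.split("=", 1)[0].strip())
    return names


def sent_cookie_names(lines):
    """Names the client sent: every name=value pair of each Cookie line."""
    names = set()
    for line in lines:
        if line.startswith("Cookie:"):
            for pair in line[len("Cookie:"):].split(";"):
                name = pair.split("=", 1)[0].strip()
                if name:
                    names.add(name)
    return names


# Lines in the COOKIES_DEBUG format shown above (values abbreviated for illustration).
login_response = [
    "Set-Cookie: WEIBOCN_FROM=deleted; path=/; domain=.weibo.cn",
    "Set-Cookie: SUB=abbreviated; path=/; domain=.weibo.cn; httponly",
    "Set-Cookie: gsid_CTandWM=abbreviated; path=/; domain=.weibo.cn; httponly",
]
later_request = ["Cookie: _T_WM=80e15f38a0dfb65ea7bbcd00ebcaf1c0"]

# Cookies set at login that never appear on the later request.
missing = received_cookie_names(login_response) - sent_cookie_names(later_request)
print(sorted(missing))
```

Scrapy's documentation notes that the `cookiejar` meta key is not sticky and has to be passed along on subsequent requests. Whether that explains this particular log is an assumption, not something the post confirms.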

Answers

Answer 1:

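When you simulate a login, I recommend running a packet-capture tool and debugging against it. That is the only way to see whether the response the target server returns to your program differs from the one it returns when you log in by hand.

I have some experience collecting Weibo data. I just read your code and found that it differs in places from the Weibo login simulation I wrote earlier; this is my code, and I just checked that it still works. Comparing the two, I found that although you are crawling the WAP version of Weibo, your UA is a desktop (PC) UA. That is why the captcha pops up, and the submitted parameters differ as well.

The error in your code is most likely that one redirect step has to be requested manually and you never request it. You can confirm this with a packet capture. It seems Weibo has started taking anti-crawling seriously on the WAP side too.

If you want a better understanding of simulating a Weibo login, see this article of mine. As of now, the method still works.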
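
The two points in this answer (a mobile UA for the WAP site, and a redirect hop that has to be requested explicitly) can be sketched roughly as follows. This is a minimal Python 3 sketch under stated assumptions, not the answerer's code: `MOBILE_UA`, `mobile_headers`, and `next_hop` are hypothetical names, and the UA string is only an illustrative iPhone Safari value. Because the settings.py in the question sets REDIRECT_ENABLED = False, the 302 reaches the callback unfollowed, so the `Location` target would have to be requested by hand.

```python
from urllib.parse import urljoin

# Illustrative mobile User-Agent (an assumption; any real mobile browser UA could be used).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) "
    "AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 "
    "Mobile/14E277 Safari/602.1"
)

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def mobile_headers(referer):
    """Request headers that present the client as a mobile browser."""
    return {
        "User-Agent": MOBILE_UA,
        "Referer": referer,
        "Accept-Language": "zh-CN,zh;q=0.8",
    }


def next_hop(request_url, status, headers):
    """Return the absolute URL to request after a redirect response, or None.

    `headers` is a plain dict of response headers; names are matched
    case-insensitively. A relative Location is resolved against request_url.
    """
    if status not in REDIRECT_STATUSES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if not location:
        return None
    return urljoin(request_url, location)


# Usage: decide where to go after the login POST returns a 302.
url = next_hop(
    "https://login.weibo.cn/login/",
    302,
    {"Location": "http://weibo.cn/crossDomain/?g=example"},  # hypothetical value
)
print(url)
```

In a Scrapy callback the same inputs are available as `response.url`, `response.status`, and `response.headers`.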

Tags: Weibo Python