    .HTTPConnection.debuglevel = 1
    >>> request = urllib2.Request(
    ...     'http://diveintomark.org/redir/example301.xml')
    >>> opener = urllib2.build_opener()
    >>> f = opener.open(request)
    connect: (diveintomark.org, 80)
    send: 'GET /redir/example301.xml HTTP/1.0
    Host: diveintomark.org
    User-agent: Python-urllib/2.1'
    reply: 'HTTP/1.1 301 Moved Permanently\r\n'
    header: Date: Thu, 15 Apr 2004 22:06:25 GMT
    header: Server: Apache/2.49 (Debian GNU/Linux)
    header: Location: http://diveintomark.org/xml/atom.xml
    header: Content-Length: 338
    header: Connection: close
    header: Content-Type: text/html; charset=iso-8859-1
    connect: (diveintomark.org, 80)
    send: 'GET /xml/atom.xml HTTP/1.0
    Host: diveintomark.org
    User-agent: Python-urllib/2.1'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: Date: Thu, 15 Apr 2004 22:06:25 GMT
    header: Server: Apache/2.49 (Debian GNU/Linux)
    header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
    header: ETag: "e842a-3e53-55d97640"
    header: Accept-Ranges: bytes
    header: Content-Length: 15955
    header: Connection: close
    header: Content-Type: application/atom+xml
    >>> f.url
    'http://diveintomark.org/xml/atom.xml'
    >>> f.headers.dict
    {'content-length': '15955',
    'accept-ranges': 'bytes',
    'server': 'Apache/2.49 (Debian GNU/Linux)',
    'last-modified': 'Thu, 15 Apr 2004 19:45:21 GMT',
    'connection': 'close',
    'etag': '"e842a-3e53-55d97640"',
    'date': 'Thu, 15 Apr 2004 22:06:25 GMT',
    'content-type': 'application/atom+xml'}
    >>> f.status
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: addinfourl instance has no attribute 'status'

Dive Into Python 159

You'll be better able to see what's happening if you turn on debugging.

This is a URL which I have set up to permanently redirect to my Atom feed at http://diveintomark.org/xml/atom.xml.

Sure enough, when you try to download the data at that address, the server sends back a 301 status code, telling you that the resource has moved permanently.

The server also sends back a Location: header that gives the new address of this data.

urllib2 notices the redirect status code and
automatically tries to retrieve the data at the new location specified in the Location: header.

The object you get back from the opener contains the new permanent address and all the headers returned from the second request (retrieved from the new permanent address).

But the status code is missing, so you have no way of knowing programmatically whether this redirect was temporary or permanent. And that matters very much: if it was a temporary redirect, then you should continue to ask for the data at the old location. But if it was a permanent redirect (as this was), you should ask for the data at the new location from now on.

This is suboptimal, but easy to fix. urllib2 doesn't behave exactly as you want it to when it encounters a 301 or 302, so let's override its behavior. How? With a custom URL handler, just like you did to handle 304 codes.

Example 11.11. Defining the redirect handler

This class is defined in openanything.py.

    class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_301(self, req, fp, code, msg, headers):
            result = urllib2.HTTPRedirectHandler.http_error_301(
                self, req, fp, code, msg, headers)
            result.status = code
            return result

        def http_error_302(self, req, fp, code, msg, headers):
            result = urllib2.HTTPRedirectHandler.http_error_302(
                self, req, fp, code, msg, headers)
            result.status = code
            return result

Redirect behavior is defined in urllib2 in a class called HTTPRedirectHandler. You don't want to completely override the behavior; you just want to extend it a little, so you'll subclass HTTPRedirectHandler and call the ancestor class to do all the hard work.

When it encounters a 301 status code from the server, urllib2 will search through its handlers and call the http_error_301 method. The first thing ours does is just call the http_error_301 method in the ancestor, which handles the grunt work of looking for the Location: header and following the redirect to the new address.

Here's the key: before you return, you store the status code (301),
so that the calling program can access it later.

Temporary redirects (status code 302) work the same way: override the http_error_302 method, call the ancestor, and save the status code before returning.

So what has this bought us? You can now build a URL opener with the custom redirect handler, and it will still automatically follow redirects, but now it will also expose the redirect status code.

Example 11.12. Using the redirect handler to detect permanent redirects

    >>> request = urllib2.Request('http://diveintomark.org/redir/example301.xml')
    >>> import openanything, httplib
    >>> httplib.HTTPConnection.debuglevel = 1
    >>> opener = urllib2.build_opener(
    ...     openanything.SmartRedirectHandler())
    >>> f = opener.open(request)
    connect: (diveintomark.org, 80)
    send: 'GET /redir/example301.xml HTTP/1.0
    Host: diveintomark.org
    User-agent: Python-urllib/2.1'
    reply: 'HTTP/1.1 301 Moved Permanently\r\n'
    header: Date: Thu, 15 Apr 2004 22:13:21 GMT
    header: Server: Apache/2.49 (Debian GNU/Linux)
    header: Location: http://diveintomark.org/xml/atom.xml
    header: Content-Length: 338
    header: Connection: close
    header: Content-Type: text/html; charset=iso-8859-1
    connect: (diveintomark.org, 80)
    send: 'GET /xml/atom.xml HTTP/1.0
    Host: diveintomark.org
    User-agent: Python-urllib/2.1'
    reply: 'HTTP/1.1 200 OK\r\n'
    header: Date: Thu, 15 Apr 2004 22:13:21 GMT
    header: Server: Apache/2.49 (Debian GNU/Linux)
    header: Last-Modified: Thu, 15 Apr 2004 19:45:21 GMT
    header: ETag: "e842a-3e53-55d97640"
    header: Accept-Ranges: bytes
    header: Content-Length: 15955
    header: Connection: close
    header: Content-Type: application/atom+xml
    >>> f.status
    301
    >>> f.url
    'http://diveintomark.org/xml/atom.xml'

First, build a URL opener with the redirect handler you just defined.

You sent off a request, and you got a 301 status code in response. At this point, the http_error_301 method gets called. You call the ancestor method, which follows the redirect and sends a request at the new location (http://diveintomark.org/xml/atom.xml).

This is the payoff:
now, not only do you have access to the new URL, but you have access to the redirect status code, so you can tell that this was a permanent redirect. The next time you request this data, you should request it from the new location (http://diveintomark.org/xml/atom.xml, as specified in f.url). If you stored the location in a configuration file or a database, you need to update it so you don't keep pounding the server with requests at the old address. It's time to update your address book.

The same redirect handler can also tell you that you shouldn't update your address book.

Example 11.13. Using the redirect handler to detect temporary redirects

    >>> request = urllib2.Request(
    ...     'http://diveintomark.org/redir/example302.xml')
    >>> f = opener.open(request)
    connect: (diveintomark.org, 80)
    send: 'GET /redir/example302.xml HTTP/1.0
    Host: diveintomark
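
The examples in this section use urllib2 from Python 2. As a rough sketch only, here is how the same idea might look with urllib.request in Python 3. This port is an assumption and is not part of openanything.py. Because Python 3 response objects already carry a status attribute describing the final response, the sketch stores the redirect code under a hypothetical attribute name, redirect_status. It also adds a hypothetical helper, url_to_remember, that applies the address-book rule from the text: remember the new URL after a 301, keep the old one otherwise.

```python
import urllib.request


class SmartRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but record which redirect code was seen.

    Assumption: a Python 3 port of the urllib2-based handler in the text.
    `redirect_status` is a hypothetical attribute name chosen for this sketch.
    """

    def http_error_301(self, req, fp, code, msg, headers):
        # The ancestor finds the Location: header and follows the redirect.
        result = super().http_error_301(req, fp, code, msg, headers)
        if result is not None:
            # Save the redirect code so the calling program can read it later.
            result.redirect_status = code
        return result

    def http_error_302(self, req, fp, code, msg, headers):
        # Temporary redirects work the same way.
        result = super().http_error_302(req, fp, code, msg, headers)
        if result is not None:
            result.redirect_status = code
        return result


def url_to_remember(requested_url, final_url, redirect_status):
    """Return the URL to store for the next request (hypothetical helper).

    A 301 means the move is permanent, so remember the new address.
    Any other value (302, or None for no redirect) keeps the old address.
    """
    if redirect_status == 301:
        return final_url
    return requested_url


# Offline illustration using the URLs from the examples in the text.
old = 'http://diveintomark.org/redir/example301.xml'
new = 'http://diveintomark.org/xml/atom.xml'
print(url_to_remember(old, new, 301))  # permanent redirect: the new URL
print(url_to_remember(old, new, 302))  # temporary redirect: the old URL
```

With network access, you would build an opener with urllib.request.build_opener(SmartRedirectHandler()), open the URL, and pass f.url together with getattr(f, 'redirect_status', None) to url_to_remember. The getattr default covers responses that were not redirected at all.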