Fixing the EOFException jsoup throws when fetching certain gzip-compressed sites
Published: 2016-10-13 | Views: 1217
Today I found that jsoup throws the following error when connecting to one particular site:
java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:264)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:254)
    at java.util.zip.GZIPInputStream.readUInt(GZIPInputStream.java:246)
    at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:218)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.jsoup.helper.DataUtil.readToByteBuffer(DataUtil.java:154)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:567)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
The jsoup connection code I was using looked like this:
    Connection conn = Jsoup.connect(url);
    conn.userAgent("Mozilla/3.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/527.36 (KHTML, like Gecko) Chrome/28.0.1600.95 Safari/527.36");
    conn.timeout(5000);
    conn.referrer(referer);
    doc = conn.get();
To be safe, I changed it to the following, which falls back to a second fetch when the gzip read fails:
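    import java.io.EOFException;
    import java.io.InputStream;
    import java.net.URL;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;

    Connection conn = Jsoup.connect(url);
    conn.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36");
    conn.timeout(5000);
    conn.referrer(referer);
    try {
        doc = conn.get();
    } catch (EOFException e) {
        // gzip decoding failed, so try fetching the page another way
        try (InputStream in = new URL(url).openStream()) {
            doc = Jsoup.parse(in, "UTF-8", url);
        }
    }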
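Change UTF-8 to whatever encoding the target site actually uses. Note that the fallback goes through a plain java.net.URL stream, so it does not carry over the user agent, referrer, or timeout set on the jsoup connection. It presumably works because that plain request does not ask the server for a gzip response by default.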
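The stack trace ends in GZIPInputStream.readTrailer, which suggests the server's gzip body stopped before the 8-byte trailer (CRC32 and length) was fully sent. The following is a minimal sketch, using only the JDK, that reproduces that condition and shows one lenient way to keep whatever was decoded before the failure. The class name TolerantGzip and its two helper methods are hypothetical names chosen for illustration and are not part of jsoup; cutting off exactly 8 bytes is an assumption about what the site did.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of jsoup.
final class TolerantGzip {

    /** Compresses text into a complete gzip stream (demo helper). */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    /**
     * Decompresses gzip data, returning the bytes decoded so far if the
     * stream ends early instead of propagating the EOFException.
     */
    public static byte[] gunzipLenient(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // The stream ended early (for example, a missing trailer).
            // Keep what was already decoded; it may be incomplete.
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] full = gzip("hello gzip");
        // Assumption: simulate the faulty server by dropping the 8-byte trailer.
        byte[] cut = Arrays.copyOf(full, full.length - 8);
        byte[] decoded = gunzipLenient(cut);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

Catching EOFException this way also hides a body that was cut off partway through the compressed data, so the result may be incomplete. Treat it as a way to understand the error rather than a replacement for the fallback fetch above.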