This is the website I’m trying to scrape from.
I’m able to log in to the website fairly easily. However, I’m unable to retrieve and reuse the cookies or session ID to scrape a page other than the one the login page redirects to. I receive a 403 every time.
Here is an example of what I’ve tried:
    try {
        String userAgent = "User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0";
        Connection.Response res = Jsoup.connect("http://www.interpals.net/login.php")
                .data("action", "login")
                .data("login", username)
                .data("password", password)
                .data("auto_login", "1")
                .userAgent(userAgent)
                .method(Connection.Method.POST)
                .followRedirects(false)
                .execute();
        res.parse();
        String sessionID = res.cookie("interpals_sessid");
        Document doc = Jsoup.connect("http://www.interpals.net/friends.php")
                .cookie("interpals_sessid", sessionID)
                .get();
    } catch (IOException e) {
        e.printStackTrace();
    }
Answer
This code works for me:
    try {
        String url = "http://www.interpals.net/login.php";
        String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36";

        // Step 1: GET the login page to pick up the cookies the site sets before login.
        Connection.Response response = Jsoup.connect(url)
                .userAgent(userAgent)
                .method(Connection.Method.GET)
                .execute();

        // Step 2: POST the credentials, sending the cookies from step 1.
        response = Jsoup.connect(url)
                .cookies(response.cookies())
                .data("action", "login")
                .data("login", "login")
                .data("password", "password")
                .data("auto_login", "1")
                .userAgent(userAgent)
                .method(Connection.Method.POST)
                .followRedirects(true)
                .execute();

        // Step 3: request the protected page with the login cookies and the same user agent.
        Document doc = Jsoup.connect("http://www.interpals.net/friends.php")
                .cookies(response.cookies())
                .userAgent(userAgent)
                .get();
        System.out.println(doc);
    } catch (IOException e) {
        e.printStackTrace();
    }
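Comparing this with the code in the question, the visible differences are that it first GETs the login page and forwards the cookies from that response, it passes the whole cookie map instead of only interpals_sessid, it follows redirects on the login POST, and it sends the same user agent on every request (the question's last request sends none, and its user-agent string contains a literal "User-Agent: " prefix). Which of these actually removes the 403 is not established here. If the site sets some cookies on the initial GET and does not repeat them in the login response, one defensive variation is to merge both cookie maps before the final request. The sketch below is only an illustration: the class and method names (CookieMerge, merge) and the cookie name "initial_cookie" are hypothetical and are not part of Jsoup or of the site.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CookieMerge {

    /**
     * Returns a new map containing every cookie from both maps.
     * When a cookie name appears in both, the value from `later` wins.
     * Null arguments are treated as empty maps.
     */
    public static Map<String, String> merge(Map<String, String> earlier,
                                            Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>();
        if (earlier != null) {
            merged.putAll(earlier);
        }
        if (later != null) {
            merged.putAll(later);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the cookie maps of the GET and POST responses.
        Map<String, String> fromGet = new LinkedHashMap<>();
        fromGet.put("initial_cookie", "a");          // hypothetical cookie name
        fromGet.put("interpals_sessid", "old");

        Map<String, String> fromLogin = new LinkedHashMap<>();
        fromLogin.put("interpals_sessid", "new");

        System.out.println(merge(fromGet, fromLogin));
    }
}
```

With Jsoup, the merged map could then be passed to the last request via .cookies(merged), in place of .cookies(response.cookies()), keeping a reference to the first GET response so its cookies are still available.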