tanpenggood/xiaohongshu

🕷️xiaohongshu

Warning

This repository exists primarily for learning. Web crawling may be illegal in some circumstances, so do not put load on any website or use this tool for unauthorized activity.

Introduction

tanpenggood/xiaohongshu is a crawler application that extracts data from xiaohongshu pages.

Crawl scope: only the data embedded in window.__INITIAL_STATE__ of a xiaohongshu page is parsed.
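
To illustrate what parsing window.__INITIAL_STATE__ involves, here is a minimal sketch that pulls the embedded JSON text out of a page's HTML. It is not the project's actual parser: the class name InitialStateExtractor, the regular expression, and the sample HTML are all assumptions made for illustration, and a real page may need additional cleanup before the text can be handed to a JSON library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of com.itplh.xhs: extracts the raw JSON text
// assigned to window.__INITIAL_STATE__ inside a <script> element.
final class InitialStateExtractor {

    // Assumed page layout: window.__INITIAL_STATE__ = { ... }</script>
    // The lazy match stops at the first closing brace that is directly
    // followed (ignoring whitespace and an optional semicolon) by </script>.
    private static final Pattern STATE_PATTERN = Pattern.compile(
            "window\\.__INITIAL_STATE__\\s*=\\s*(\\{.*?\\})\\s*;?\\s*</script>",
            Pattern.DOTALL);

    private InitialStateExtractor() {
    }

    // Returns the JSON text if the assignment is present, otherwise empty.
    static Optional<String> extract(String html) {
        Matcher matcher = STATE_PATTERN.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample page, used only to exercise the extractor.
        String html = "<html><body><script>window.__INITIAL_STATE__="
                + "{\"user\":{\"redId\":\"12345\"}}</script></body></html>";
        System.out.println(extract(html).orElse("not found"));
    }
}
```

The extracted string can then be passed to whichever JSON parser the application uses. Returning an Optional keeps the "state not found" case explicit, which matters because a page fetched without the expected cookie or parameters may not contain the assignment at all.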

Development Environment

  • Windows 11
  • JDK 1.8
  • Maven 3.6.0

Usage

Use UI

Run com.itplh.xhs.XhsCrawlabUI

See:

home.png

Use API

  • crawl notes

    Reference test class: com.itplh.xhs.XhsCrawlabTest

    @Test
    public void crawlHome() {
        // When the URL carries query parameters, the request header does not need a cookie
        String url = "https://www.xiaohongshu.com/user/profile/64a91898000000001001e673?xhsshare=CopyLink&appuid=62064cd3000000001000acd1&apptime=1690553952";
        UserInfo userInfo1 = XhsCrawlab.getInstance().crawlHome(url);
        Assert.assertNotNull(userInfo1);
        Assert.assertNotNull(userInfo1.getRedId());
        // When the URL has no query parameters, the request header must carry a cookie
        String url2 = "https://www.xiaohongshu.com/user/profile/64a91898000000001001e673";
        UserInfo userInfo2 = XhsCrawlab.getInstance().crawlHome(url2);
        Assert.assertNotNull(userInfo2);
        Assert.assertNotNull(userInfo2.getRedId());
    }

  • save notes to Excel

    com.itplh.xhs.excel.ExcelGenerator.writeNotes2Excel(UserInfo userInfo)

Project Structure

xiaohongshu
├── src/main
│   ├── java/com.itplh.xhs       
│   │   ├── constant
│   │   ├── domain
│   │   ├── excel            # generate excel, use easyexcel
│   │   ├── parse            # parse json data (parse window.__INITIAL_STATE__)
│   │   ├── ui               # build ui, use javafx    
│   │   ├── util               
│   │   ├── XhsCrawlab       # core api   
│   │   └── XhsCrawlabUI     # ui
│   └── resources
│       ├── desktop          # response data of desktop access xiaohongshu
│       ├── mobile           # response data of mobile access xiaohongshu
│       └── logback.xml      # log config
├── src/test/java            # unit test
├── pom.xml
└── README.md

Technology Stack

Build

mvn clean package -Dmaven.test.skip=true

Download exe

https://github.com/tanpenggood/xiaohongshu/releases

About

A simple crawler for xiaohongshu notes, for learning purposes only.
