
 Crawling

  • maxwellapex
  • Oct 3
  • 1 min read

One of my friends is working on web crawling, and there is a lot to say about it. First, every website has different anti-crawling measures (including robots.txt), so you need a way to deal with them. Second, how often do you crawl? It is easy if you run a .bat file and crawl manually, but much harder to do it continuously. Another question is where to store the data after you have successfully crawled it. As mentioned in the previous article, online storage space is expensive. There is still a lot more to say, including applying NLP to the crawled data, and I'll discuss that later.
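To make the three questions above a bit more concrete, here is a rough Python sketch, not my friend's actual setup. It assumes the crawler chooses to obey robots.txt (checked with the standard library's `urllib.robotparser`) rather than work around it, waits a fixed delay between requests, and saves each page gzip-compressed on local disk to keep storage small. The function names, the `friend-bot` user agent, the `fetch` callable, and the one-second delay are all made up for illustration.

```python
import gzip
import hashlib
import time
from pathlib import Path
from urllib import robotparser


def build_robot_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a site's robots.txt into a rules object."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def allowed_urls(rules, user_agent, urls):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [u for u in urls if rules.can_fetch(user_agent, u)]


def save_compressed(page_html: str, url: str, out_dir: Path) -> Path:
    """Write one page as a gzip file named after a hash of its URL."""
    out_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html.gz"
    path = out_dir / name
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(page_html)
    return path


def crawl(urls, rules, user_agent, fetch, out_dir, delay_seconds=1.0):
    """Fetch each allowed URL, store it, and pause between requests.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page text; it is passed in so the download method can be swapped.
    """
    saved = []
    for url in allowed_urls(rules, user_agent, urls):
        saved.append(save_compressed(fetch(url), url, out_dir))
        time.sleep(delay_seconds)  # assumed fixed delay between requests
    return saved
```

For continuous crawling, a script like this could be started on a schedule by something like cron or Windows Task Scheduler instead of running a .bat by hand.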

 
 
 
