📌 About the Project
We are a company specializing in the analysis and negotiation of court-ordered payments. We are looking for a developer to build a robust crawler that accesses court websites (e.g., TJSP, TRF3, and others), automates the collection of case records, downloads documents (especially PDFs), and extracts the relevant data in a structured format.
🎯 Objective
Create a tool that:
- Performs automated queries by case number, CPF/CNPJ, or case type;
- Downloads available case documents (petitions, rulings, judgments);
- Extracts relevant information from these documents (parties, subject matter, values, procedural stages);
- Organizes data in JSON format or saves it to a relational or NoSQL database;
- Is scalable and maintainable, with error handling and basic logging.
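To make the expected output more concrete, here is a minimal sketch of what a structured case record with JSON output, error handling, and basic logging could look like. The field names, the `CaseRecord` class, and the helper functions are illustrative assumptions, not a fixed specification; the final schema will be agreed with the selected developer.

```python
import json
import logging
from dataclasses import dataclass, field, asdict
from typing import Callable, List, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("crawler")


@dataclass
class CaseRecord:
    """Hypothetical schema for one case; every field name is an assumption."""
    case_number: str
    court: str
    parties: List[str] = field(default_factory=list)
    subject: Optional[str] = None
    value: Optional[str] = None  # kept as text to avoid float rounding
    stages: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)  # paths of downloaded PDFs


def build_record(raw: dict) -> CaseRecord:
    """Map a raw scraped dict to a CaseRecord; raises KeyError if a required key is missing."""
    return CaseRecord(
        case_number=raw["case_number"],
        court=raw["court"],
        parties=list(raw.get("parties", [])),
        subject=raw.get("subject"),
        value=raw.get("value"),
        stages=list(raw.get("stages", [])),
        documents=list(raw.get("documents", [])),
    )


def to_json(record: CaseRecord) -> str:
    """Serialize a record to JSON, preserving accented characters."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


def safe_extract(extractor: Callable[[dict], CaseRecord], raw: dict) -> Optional[CaseRecord]:
    """Run an extractor, logging the failure and returning None instead of stopping the crawl."""
    try:
        return extractor(raw)
    except (KeyError, ValueError) as exc:
        logger.error("extraction failed for %r: %s", raw.get("case_number"), exc)
        return None


if __name__ == "__main__":
    # Placeholder input; the case number below is not a real case.
    sample = {"case_number": "0000000-00.0000.0.00.0000", "court": "TJSP", "parties": ["A", "B"]}
    record = safe_extract(build_record, sample)
    if record is not None:
        print(to_json(record))
```

Returning `None` on failure and logging the error lets a batch run continue past a malformed case rather than aborting the whole job.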
🧠 Technical Requirements
- Proficiency in Python
- Experience with web scraping using requests, BeautifulSoup, Selenium, Scrapy, or Playwright
- Knowledge of data extraction from PDFs (using libraries such as PyMuPDF, pdfplumber, or Apache Tika)
- Familiarity with CAPTCHA handling and authentication in public systems
- Experience with databases (PostgreSQL, MySQL, MongoDB, or similar)
- Basic knowledge of legal case systems (e-SAJ, PJe, Projudi, etc.) will be considered an asset
🎁 Preferred Qualifications
- Previous experience with legal crawlers or legaltech projects
- Knowledge of the structure of petitions and case movements
- Experience with IP/header management to avoid blocking during scraping
🛠️ Format and Conditions
- One-time project with potential for continuation
- Payment via PJ (pessoa jurídica) contract, with the amount to be negotiated based on experience and scope
- Remote work
- Deliverables organized in stages with progressive validation