Medium term plan
- New ingest method: Users must explicitly opt in to LLM Guided Ingest on each ingest request.
- New feature of our existing ingest methods: All docs, including regular ones, will be processed for metadata/insights extraction.
- LLM Providers support - Anthropic, Google, etc.
Frontend - Ruixin
- Ability to create New projects without uploading. Add description field.
- All new projects get an entry in the projects Supabase table (and Redis). Add the description there, too. Also create entries for all the projects we missed.
- Nit: Separate project name from project URL.
- On the homepage, when a user is an Admin or User of someone else's project, that project should appear in their “Your Projects” list, under a separate heading like “Shared with you”.
- Dropdown/switch to choose Academic PDF parsing on /materials page, as part of Ingest component.
- Delete docs → add support for new doc type. TBD (waiting on LLM guided retrieval DB structure)
- [Stretch, push to phase 2] On chat page, show traversal of various documents, and possibly a sneak peek of the documents (ambitious; just linking them would work too). LangSmith?
Backend - Asmita
- New Insights Table
- Create “insights” during ingest.
- Arbitrary JSON. Problem: same type of info is represented differently.
- Ex: list of strs vs list of dicts for author name.
- Prompt format: “Using this *project-level description*, generate insights and relevant metadata that might be useful to this project. Respond in JSON. Here’s the doc to evaluate: {document}”
- Connect insights across documents. Like deduplication.
- Cleanup at the end. User can analyze insights, and give specific cleanup instructions, like “always represent authors as a list of strs”
- Intelligent unification and dedup? What if a single “insight” comes from multiple documents or chunks?
- Use insights by having the LLM generate SQL that filters on metadata properties, or by running full-text search on insights directly in Postgres.
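
A minimal sketch of two of the backend items above: filling in the insights prompt, and a cleanup pass that applies the example rule “always represent authors as a list of strs”. The function names and the dict keys `name`, `first_name`, and `last_name` are assumptions made for illustration only. The real insight JSON is arbitrary, so the actual keys would have to come from inspecting what the LLM returns.

```python
# Prompt wording follows the notes above; the exact placement of the
# project description inside the prompt is an assumption.
PROMPT_TEMPLATE = (
    "Using this project-level description:\n{description}\n\n"
    "generate insights and relevant metadata that might be useful to this "
    "project. Respond in JSON. Here's the doc to evaluate: {document}"
)


def build_insight_prompt(description, document):
    """Fill the prompt template for one document (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(description=description, document=document)


def normalize_authors(raw):
    """Coerce an 'authors' value into a de-duplicated list of strs.

    Handles the two shapes mentioned in the notes (list of strs, list of
    dicts) plus a bare string or a single dict. The dict keys checked here
    are assumed, not taken from any real schema.
    """
    if raw is None:
        return []
    if isinstance(raw, (str, dict)):
        raw = [raw]  # treat a single author as a one-element list

    names = []
    for item in raw:
        if isinstance(item, dict):
            # Prefer a full "name"; otherwise join first and last name parts.
            name = item.get("name") or " ".join(
                str(part)
                for part in (item.get("first_name"), item.get("last_name"))
                if part
            )
        else:
            name = item
        name = str(name).strip()
        if name and name not in names:  # skip blanks and exact duplicates
            names.append(name)
    return names
```

For example, `normalize_authors([{"name": "Ada Lovelace"}, "Ada Lovelace"])` collapses both representations into one entry. This only removes exact string duplicates. Matching variants such as “A. Lovelace” would need the smarter unification step that is still an open question above.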
Recommendations for insights:
After parsing the documents, show potential insights to users. They can simply click an insight to create it.