Medium-term plan

  1. New ingest method: User must explicitly opt in to LLM Guided Ingest, per ingest request.
  2. New feature of our existing ingest methods: All docs will be processed for Metadata/insights extraction (even for regular docs).
  3. LLM Providers support - Anthropic, Google, etc.

Frontend - Ruixin

  1. Ability to create New projects without uploading. Add description field.
    1. All new projects get an entry in the projects Supabase table (and Redis). Add description there, too. Also create entries for the existing projects we missed.
    2. Nit: Separate project name from project URL.
    3. On homepage, if a user is an Admin or User of someone else's project, that project should appear in their “Your Projects” list, under a separate heading like “Shared with you”.
  2. Dropdown/switch to choose Academic PDF parsing on /materials page, as part of Ingest component.
  3. Delete docs → add support for new doc type. TBD (waiting on LLM guided retrieval DB structure)
  4. [Stretch, push to phase 2] On chat page, show traversal of various documents, and possibly a sneak peek of the documents. (ambitious, just linking them would work too). LangSmith?

Backend - Asmita

  1. New Insights Table
    1. Create “insights” during ingest.
    2. Arbitrary JSON. Problem: same type of info is represented differently.
      1. Ex: list of strs vs list of dicts for author name.
    3. Prompt format: “Using this *project-level description*, generate insights and relevant metadata that might be useful to this project. Respond in JSON. Here’s the doc to evaluate: {document}”
  2. Connect insights across documents. Like deduplication.
    1. Cleanup at the end. User can analyze insights, and give specific cleanup instructions, like “always represent authors as a list of strs”
    2. Intelligent unification and dedup? What if a single “insight” comes from multiple documents or chunks?
  3. Use insights via SQL generation by the LLM to filter on metadata properties. Or Full-Text search on insights directly in Postgres.
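
Rough sketch of the prompt format and the "always represent authors as a list of strs" cleanup from the backend items above. Everything named here is a placeholder, not existing code: `build_insights_prompt`, `normalize_authors`, and the dict keys `name` / `first` / `last` are assumptions about what the LLM might return. The prompt wording follows the notes, with an extra "Project description:" line added as an assumption about where the description gets injected.

```python
# Hypothetical helpers; names and dict keys are illustrative assumptions.

PROMPT_TEMPLATE = (
    "Using this project-level description, generate insights and relevant "
    "metadata that might be useful to this project. Respond in JSON.\n\n"
    "Project description: {description}\n\n"
    "Here's the doc to evaluate: {document}"
)


def build_insights_prompt(description: str, document: str) -> str:
    """Fill the prompt template with the project description and one document."""
    return PROMPT_TEMPLATE.format(description=description, document=document)


def normalize_authors(raw) -> list[str]:
    """Coerce an 'authors' value into a deduplicated list of strs.

    Accepts a single str, a single dict, or a list mixing both, since the
    same kind of info may come back in different shapes across documents.
    """
    if raw is None:
        return []
    if isinstance(raw, (str, dict)):
        raw = [raw]

    names: list[str] = []
    for item in raw:
        if isinstance(item, str):
            name = item
        elif isinstance(item, dict):
            # Assumed keys: either a full "name", or "first"/"last" parts.
            name = item.get("name") or " ".join(
                part for part in (item.get("first"), item.get("last")) if part
            )
        else:
            continue  # skip shapes we don't recognize
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names
```

Usage would look something like `normalize_authors(insight.get("authors"))` run over each stored insight during the end-of-ingest cleanup pass. This only handles exact-match duplicates; unifying variants like "A. Turing" vs "Alan Turing" is the open "intelligent unification" question and is not attempted here.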

Recommendations for insights:

Show potential insights to the users, after parsing through the documents. They can simply click on the insight and get it created.