Problem

Building "AI document generation" on its own is easy:

prompt
-> LLM
-> markdown

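But what OpenCairn wants to build is not a chat answer. It is a product artifact.

What users expect is closer to this:

ground it in workspace material
-> produce a document for a specific purpose
-> keep its sources and generation status
-> make it reopenable as a file, note, or workflow result

That is why document generation in OpenCairn is not a single route handler. It is split into an API, RAG source gathering, a Temporal worker, and an artifact callback.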

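Why Temporal

Document generation is an awkward fit for a short HTTP request:

Source retrieval can take a long time.
LLM calls may need to be retried.
Document conversion or file registration can fail.
The job has to keep running even if the user closes the page.
After a failure, we need to know which step to resume from.

A Temporal workflow is a good fit for this kind of long-running job.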
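Temporal records each completed activity's result in the workflow's event history. When a worker restarts, the workflow is replayed from that history and completed activities are not executed again, while a failed activity is retried according to its retry policy. The resume-from-the-failed-step idea can be shown without Temporal. The following is a minimal pure-Python sketch, not Temporal's API: a hypothetical `run_steps` helper uses a plain dict as a stand-in for durable history, skips steps that already have a recorded result, and resumes at the one that failed.

```python
from typing import Any, Callable


def run_steps(
    steps: list[tuple[str, Callable[[Any], Any]]],
    initial: Any,
    history: dict[str, Any],
) -> Any:
    """Run named steps in order, reusing results already recorded in `history`.

    `history` is a hypothetical stand-in for a durable event history:
    a step whose name is already recorded is not executed again.
    """
    value = initial
    for name, fn in steps:
        if name in history:
            value = history[name]  # finished earlier: reuse the recorded result
            continue
        value = fn(value)  # may raise; results of earlier steps stay recorded
        history[name] = value
    return value


# Usage: the first run fails at "register", the second resumes there.
calls: list[str] = []
attempts = {"register": 0}


def gather(request_id: str) -> list[str]:
    calls.append("gather")
    return [f"{request_id}:note-1", f"{request_id}:file-2"]


def generate(sources: list[str]) -> str:
    calls.append("generate")
    return "draft based on " + ", ".join(sources)


def register(draft: str) -> str:
    calls.append("register")
    attempts["register"] += 1
    if attempts["register"] == 1:
        raise RuntimeError("artifact registration failed")  # simulated failure
    return "agent-file-1"


steps = [("gather", gather), ("generate", generate), ("register", register)]
history: dict[str, Any] = {}

try:
    run_steps(steps, "req-42", history)
except RuntimeError:
    pass  # gather and generate are already recorded in history

result = run_steps(steps, "req-42", history)
```

After the first call fails at `register`, the second call reuses the recorded `gather` and `generate` results and only re-runs registration.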

A simplified view of the Python worker looks like this:

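from datetime import timedelta

from temporalio import workflow

# gather_document_sources, generate_document, register_document_artifact and the
# DocumentGenerationInput / GenerateDocumentInput / DocumentGenerationResult types
# are defined elsewhere in the worker and omitted from this condensed version.


@workflow.defn
class DocumentGenerationWorkflow:
    @workflow.run
    async def run(self, input: DocumentGenerationInput) -> DocumentGenerationResult:
        # Step 1: collect permission-checked, fresh sources for the requested scope.
        sources = await workflow.execute_activity(
            gather_document_sources,
            input,
            start_to_close_timeout=timedelta(minutes=2),
        )

        # Step 2: generate the draft from the request plus the gathered sources.
        draft = await workflow.execute_activity(
            generate_document,
            GenerateDocumentInput(input=input, sources=sources),
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Step 3: register the draft as a product artifact through the API.
        artifact = await workflow.execute_activity(
            register_document_artifact,
            draft,
            start_to_close_timeout=timedelta(minutes=1),
        )

        return DocumentGenerationResult(artifact=artifact)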

This is a condensed, conceptual version. What matters is that source gathering, generation, and registration are separate workflow steps (activities), each with its own timeout.
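The condensed workflow refers to several input and result types without defining them. One way to model them is with plain dataclasses, which the Temporal Python SDK's default data converter can serialize. In this sketch, only `workspace_id`, `project_id`, `source_ids`, `text`, `freshness`, `input`, `sources`, and `artifact` come from the code in this post; every other field, the freshness values other than `"missing"`, and the `DocumentDraft` and `AgentFileRef` names are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional

Freshness = Literal["fresh", "stale", "missing"]  # assumed set of states


@dataclass
class DocumentGenerationInput:
    workspace_id: str
    requested_by: str  # hypothetical: who asked for the document
    purpose: str  # hypothetical: e.g. "project brief"
    project_id: Optional[str] = None
    source_ids: list[str] = field(default_factory=list)


@dataclass
class DocumentSource:
    source_id: str
    kind: str  # hypothetical: "note", "file", "drive", ...
    text: str
    freshness: Freshness = "fresh"


@dataclass
class GenerateDocumentInput:
    input: DocumentGenerationInput
    sources: list[DocumentSource]


@dataclass
class DocumentDraft:  # hypothetical name for the `draft` value
    title: str
    mime_type: str
    body: str
    source_ids: list[str]  # keeps provenance attached to the draft


@dataclass
class AgentFileRef:  # hypothetical: what registration returns
    id: str
    workflow_run_id: str


@dataclass
class DocumentGenerationResult:
    artifact: AgentFileRef


# Usage: build a request and round-trip it through a plain dict.
request = DocumentGenerationInput(
    workspace_id="ws-1",
    requested_by="user-7",
    purpose="project brief",
    source_ids=["note-3"],
)
payload = asdict(request)
restored = DocumentGenerationInput(**payload)

generation_input = GenerateDocumentInput(
    input=request,
    sources=[DocumentSource("note-3", "note", "Q3 roadmap")],
)
nested = asdict(generation_input)  # nested dataclasses become nested dicts
```

Round-tripping through `asdict` is only a rough stand-in for JSON serialization across the API/worker boundary.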

Source gathering

Sources are the most important part of document generation.

Because OpenCairn is a workspace knowledge OS, generated documents need to stay connected to workspace material.

Source gathering roughly targets the following material:

notes
imported files
Google Drive documents
chat-selected context
project/page scoped evidence

The worker activity gathers sources according to the scope and retrieval policy provided by the API.

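from temporalio import activity


# fetch_sources_for_scope is a helper defined elsewhere; it is assumed to return
# only the sources the requester is allowed to read within the given scope.
@activity.defn
async def gather_document_sources(input: DocumentGenerationInput) -> list[DocumentSource]:
    readable_sources = await fetch_sources_for_scope(
        workspace_id=input.workspace_id,
        project_id=input.project_id,
        requested_source_ids=input.source_ids,
    )

    # Drop sources that have no text or whose content is marked as missing,
    # so generation only sees usable evidence.
    return [
        source
        for source in readable_sources
        if source.text and source.freshness != "missing"
    ]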

Here too, the point is not to "put in as much as possible" but to "put in only sources that have passed permission and freshness checks."
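In the activity above, permission filtering presumably happens inside `fetch_sources_for_scope` (its result is named `readable_sources`). The ordering this paragraph describes, scope and permission first and then content and freshness, can be sketched as a standalone filter. This is a minimal illustration, assuming a hypothetical in-memory ACL that maps each source id to the principals allowed to read it and a hypothetical `allow_stale` retrieval-policy flag; it is not OpenCairn's retrieval code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    source_id: str
    workspace_id: str
    text: str
    freshness: str  # assumed states: "fresh", "stale", "missing"


def select_sources(
    candidates: Iterable[Candidate],
    *,
    workspace_id: str,
    requester: str,
    readers: dict[str, set[str]],  # hypothetical ACL: source_id -> principals
    allow_stale: bool = True,  # hypothetical retrieval-policy knob
) -> list[Candidate]:
    selected = []
    for c in candidates:
        # 1. Scope and permission are checked before anything else.
        if c.workspace_id != workspace_id:
            continue
        if requester not in readers.get(c.source_id, set()):
            continue
        # 2. Only then content and freshness.
        if not c.text or c.freshness == "missing":
            continue
        if c.freshness == "stale" and not allow_stale:
            continue
        selected.append(c)
    return selected


# Usage
candidates = [
    Candidate("note-1", "ws-1", "Q3 roadmap", "fresh"),
    Candidate("file-2", "ws-1", "budget table", "stale"),
    Candidate("drive-3", "ws-1", "", "missing"),
    Candidate("note-4", "ws-1", "private memo", "fresh"),
    Candidate("note-5", "ws-2", "other workspace", "fresh"),
]
acl = {
    "note-1": {"user-7"},
    "file-2": {"user-7"},
    "drive-3": {"user-7"},
    "note-4": {"user-9"},
    "note-5": {"user-7"},
}
picked = [
    c.source_id
    for c in select_sources(candidates, workspace_id="ws-1", requester="user-7", readers=acl)
]
strict = [
    c.source_id
    for c in select_sources(
        candidates, workspace_id="ws-1", requester="user-7", readers=acl, allow_stale=False
    )
]
```

A private memo or another workspace's note never reaches the freshness check, so a stale but permitted source is a policy decision while an unreadable one is never a candidate at all.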

Internal callback

The worker could write the generated file directly to the DB. OpenCairn, however, keeps the API boundary intact.

Instead, the worker asks the API to register the artifact through an internal endpoint:

worker
-> POST /api/internal/document-generation/agent-files
-> API validates internal auth
-> create agent file
-> attach to workflow run
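The API side of this callback can be sketched as a small handler. In this minimal Python sketch, the `x-internal-token` header, the shared secret, the request body field names, and the in-memory store are all hypothetical stand-ins; a real deployment would use its own internal auth scheme and database. The created record follows the `AgentFile` shape shown later in this post.

```python
import hmac
import uuid
from dataclasses import dataclass, field

INTERNAL_SECRET = "change-me"  # hypothetical secret shared by worker and API


@dataclass
class InMemoryStore:  # hypothetical stand-in for the database
    agent_files: dict[str, dict] = field(default_factory=dict)
    run_outputs: dict[str, list[str]] = field(default_factory=dict)


class Unauthorized(Exception):
    pass


def handle_agent_file_callback(headers: dict[str, str], body: dict, store: InMemoryStore) -> dict:
    """Hypothetical handler for POST /api/internal/document-generation/agent-files."""
    # 1. Validate internal auth before touching the database
    #    (constant-time comparison of the shared secret).
    token = headers.get("x-internal-token", "")
    if not hmac.compare_digest(token.encode(), INTERNAL_SECRET.encode()):
        raise Unauthorized("invalid internal token")

    # 2. Create the agent file, i.e. the product object.
    file_id = str(uuid.uuid4())
    agent_file = {
        "id": file_id,
        "workspaceId": body["workspaceId"],
        "kind": "document",
        "title": body["title"],
        "mimeType": body["mimeType"],
        "sourceWorkflowRunId": body["workflowRunId"],
    }
    store.agent_files[file_id] = agent_file

    # 3. Attach it to the workflow run so the run's output points at it.
    store.run_outputs.setdefault(body["workflowRunId"], []).append(file_id)
    return agent_file


# Usage
store = InMemoryStore()
body = {
    "workspaceId": "ws-1",
    "title": "Project brief",
    "mimeType": "text/markdown",
    "workflowRunId": "run-123",
}
created = handle_agent_file_callback({"x-internal-token": "change-me"}, body, store)

try:
    handle_agent_file_callback({"x-internal-token": "wrong"}, body, store)
    rejected = False
except Unauthorized:
    rejected = True
```

Because Temporal may retry `register_document_artifact`, the same callback can arrive more than once; a production endpoint would likely also need an idempotency key (for example, one derived from the workflow run id), which this sketch leaves out.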

This structure splits responsibilities:

worker
-> long-running generation

API
-> auth, DB write, product object registration

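Even if more providers are added later, or a hosted/service split happens, this API boundary can stay the same.

Why artifacts matter

If a generated document only survives as a chat answer, users will struggle to find it again.

OpenCairn treats the result as an artifact: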

type AgentFile = {
  id: string;
  workspaceId: string;
  kind: "document";
  title: string;
  mimeType: string;
  sourceWorkflowRunId: string;
};

This lets a generated document feed into other features:

shown as Workflow Console output
attached to a note
exported
reused as a RAG source
tracked through version/history
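As an example of the "reused as a RAG source" item, a generated file can be turned back into a retrievable source while keeping a pointer to the workflow run that produced it. The sketch below mirrors the `AgentFile` fields in Python; `RagSource`, `artifact_as_source`, and the `agent-file:` / `workflow-run:` id prefixes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentFile:
    # Mirrors the TypeScript AgentFile type above, in snake_case.
    id: str
    workspace_id: str
    kind: str
    title: str
    mime_type: str
    source_workflow_run_id: str


@dataclass(frozen=True)
class RagSource:  # hypothetical shape of a retrievable source
    source_id: str
    workspace_id: str
    kind: str
    text: str
    provenance: str  # which workflow run produced this text


def artifact_as_source(file: AgentFile, content: str) -> RagSource:
    """Expose a generated document to later retrieval without losing its origin."""
    return RagSource(
        source_id=f"agent-file:{file.id}",
        workspace_id=file.workspace_id,  # stays inside the same workspace scope
        kind="agent_file",
        text=content,
        provenance=f"workflow-run:{file.source_workflow_run_id}",
    )


# Usage
doc = AgentFile("af-1", "ws-1", "document", "Project brief", "text/markdown", "run-123")
src = artifact_as_source(doc, "# Project brief\n...")
```

Keeping `provenance` means a later answer that cites this document can still be traced back to the run, and from there to the sources that run used.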

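Lessons learned

Document generation turned out to be closer to pipeline engineering than to prompt engineering.

In OpenCairn, the questions that mattered were:

Which sources was it based on?
Who requested it?
Where did generation fail?
Where does the result live in the product?
Can it be reopened, shared, and exported?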
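Most of these questions map to a field that a run record has to keep. A minimal sketch, assuming a hypothetical `GenerationRun` record rather than OpenCairn's actual schema: `requested_by` answers who asked, `source_ids` answers which evidence was used, `failed_step` answers where it broke, and `artifact_id` answers where the result lives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenerationRun:
    run_id: str
    requested_by: str  # Who requested it?
    source_ids: list[str] = field(default_factory=list)  # Which sources was it based on?
    completed_steps: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None  # Where did generation fail?
    artifact_id: Optional[str] = None  # Where does the result live?

    def record_step(self, step: str, ok: bool) -> None:
        if ok:
            self.completed_steps.append(step)
            if self.failed_step == step:
                self.failed_step = None  # a retry of the failed step succeeded
        else:
            self.failed_step = step

    @property
    def status(self) -> str:
        if self.failed_step is not None:
            return "failed"
        if self.artifact_id is not None:
            return "completed"
        return "running"


# Usage
run = GenerationRun(run_id="run-123", requested_by="user-7", source_ids=["note-1", "file-2"])
run.record_step("gather_document_sources", ok=True)
run.record_step("generate_document", ok=True)
run.record_step("register_document_artifact", ok=False)
status_after_failure = run.status

run.record_step("register_document_artifact", ok=True)
run.artifact_id = "af-1"
```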

So we designed document generation as a workflow + artifact feature rather than a chat feature.

OpenCairn Document Generation Pipeline: Turning Grounded Documents into Artifacts
