
Class: SentenceSplitter

SentenceSplitter is our default text splitter. It supports splitting into sentences, paragraphs, or fixed-length chunks with overlap.

One advantage of SentenceSplitter is that, even when producing fixed-length chunks, it tries to keep sentences together.
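The idea of keeping sentences together inside size-limited chunks can be sketched as a greedy packer. This is a minimal illustration, not the library's implementation: the helper names `countWords` and `packSentences` are hypothetical, sizes are counted in whitespace-separated words rather than tokenizer tokens, and sentence boundaries are detected with a simple punctuation regex.

```typescript
// Hypothetical helpers for illustration only; not part of the SentenceSplitter API.
// Assumption: size is measured in whitespace-separated words, not model tokens.
const countWords = (s: string): number => s.split(/\s+/).filter(Boolean).length;

function packSentences(text: string, chunkSize: number, chunkOverlap: number): string[] {
  // Naive sentence detection: break after ., ! or ? followed by whitespace.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentSize = 0;

  for (const sentence of sentences) {
    const size = countWords(sentence);

    if (current.length > 0 && currentSize + size > chunkSize) {
      // The next sentence does not fit: close the chunk on a sentence boundary.
      chunks.push(current.join(" "));

      // Carry whole trailing sentences forward as overlap, within the overlap budget.
      const carried: string[] = [];
      let carriedSize = 0;
      for (const prev of [...current].reverse()) {
        const prevSize = countWords(prev);
        if (carriedSize + prevSize > chunkOverlap) break;
        carried.unshift(prev);
        carriedSize += prevSize;
      }
      // Drop carried sentences that would push the new chunk over the limit.
      while (carried.length > 0 && carriedSize + size > chunkSize) {
        const dropped = carried.shift();
        carriedSize -= countWords(dropped ?? "");
      }
      current = carried;
      currentSize = carriedSize;
    }

    // A single sentence longer than chunkSize is kept whole in this sketch.
    current.push(sentence);
    currentSize += size;
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

For example, `packSentences("One two three. Four five six. Seven eight nine.", 6, 3)` yields two chunks, `"One two three. Four five six."` and `"Four five six. Seven eight nine."`, where the middle sentence is repeated as overlap and no sentence is cut.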

Constructors

new SentenceSplitter()

new SentenceSplitter(options?): SentenceSplitter

Parameters

options?

options.chunkOverlap?: number

options.chunkSize?: number

options.chunkingTokenizerFn?

options.paragraphSeparator?: string

options.splitLongSentences?: boolean

options.tokenizer?: any

options.tokenizerDecoder?: any

Returns

SentenceSplitter

Source

packages/core/src/TextSplitter.ts:78
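The options above can be pictured as a single object with all fields optional. The following is a sketch only: the type name `SentenceSplitterOptions` is hypothetical, the type of `chunkingTokenizerFn` is assumed to match the property signature listed under Properties, and the values are illustrative rather than the library defaults.

```typescript
// Hypothetical type name; field names mirror the parameter list above.
interface SentenceSplitterOptions {
  chunkSize?: number;
  chunkOverlap?: number;
  // Assumed to match the documented property type: (text) => string[]
  chunkingTokenizerFn?: (text: string) => string[];
  paragraphSeparator?: string;
  splitLongSentences?: boolean;
  tokenizer?: any;
  tokenizerDecoder?: any;
}

// Illustrative values only; these are not the library defaults.
const options: SentenceSplitterOptions = {
  chunkSize: 512,
  chunkOverlap: 32,
  paragraphSeparator: "\n\n",
  splitLongSentences: true,
};

// With the library available, the object would be passed to the constructor:
//   const splitter = new SentenceSplitter(options);
//   const chunks = splitter.splitText("Some long document text.");
```

Because every field is optional, `new SentenceSplitter()` with no argument is also valid according to the signature above.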

Properties

chunkOverlap

chunkOverlap: number

Source

packages/core/src/TextSplitter.ts:70


chunkSize

chunkSize: number

Source

packages/core/src/TextSplitter.ts:69


chunkingTokenizerFn()

private chunkingTokenizerFn: (text) => string[]

Parameters

text: string

Returns

string[]

Source

packages/core/src/TextSplitter.ts:75


paragraphSeparator

private paragraphSeparator: string

Source

packages/core/src/TextSplitter.ts:74


splitLongSentences

private splitLongSentences: boolean

Source

packages/core/src/TextSplitter.ts:76


tokenizer

private tokenizer: any

Source

packages/core/src/TextSplitter.ts:72


tokenizerDecoder

private tokenizerDecoder: any

Source

packages/core/src/TextSplitter.ts:73

Methods

combineTextSplits()

combineTextSplits(newSentenceSplits, effectiveChunkSize): TextSplit[]

Parameters

newSentenceSplits: SplitRep[]

effectiveChunkSize: number

Returns

TextSplit[]

Source

packages/core/src/TextSplitter.ts:215


getEffectiveChunkSize()

private getEffectiveChunkSize(extraInfoStr?): number

Parameters

extraInfoStr?: string

Returns

number

Source

packages/core/src/TextSplitter.ts:114


getParagraphSplits()

getParagraphSplits(text, effectiveChunkSize?): string[]

Parameters

text: string

effectiveChunkSize?: number

Returns

string[]

Source

packages/core/src/TextSplitter.ts:131


getSentenceSplits()

getSentenceSplits(text, effectiveChunkSize?): string[]

Parameters

text: string

effectiveChunkSize?: number

Returns

string[]

Source

packages/core/src/TextSplitter.ts:157


processSentenceSplits()

private processSentenceSplits(sentenceSplits, effectiveChunkSize): SplitRep[]

Splits sentences into chunks if necessary.

This behavior is not ideal: it can split in the middle of a word or, in non-English text, in the middle of a Unicode code point, so it is turned off by default. If you need it, set the splitLongSentences option to true.
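The hazard described here can be reproduced in isolation. The sketch below is an illustration of the general problem, not the library's code: both function names are hypothetical, and the split width is counted in string units rather than tokenizer tokens. It contrasts a fixed-width split over UTF-16 code units, which can cut a character in half, with one that iterates over whole code points.

```typescript
// Hypothetical helpers for illustration; not part of the SentenceSplitter API.

// Naive split: slices by UTF-16 code units, so a character stored as a
// surrogate pair (for example an emoji) can be cut in the middle.
function splitByCodeUnits(text: string, width: number): string[] {
  const parts: string[] = [];
  for (let i = 0; i < text.length; i += width) {
    parts.push(text.slice(i, i + width));
  }
  return parts;
}

// Safer split: Array.from iterates a string by code points, so each
// element is a whole character and surrogate pairs stay intact.
function splitByCodePoints(text: string, width: number): string[] {
  const points = Array.from(text);
  const parts: string[] = [];
  for (let i = 0; i < points.length; i += width) {
    parts.push(points.slice(i, i + width).join(""));
  }
  return parts;
}

const sample = "ab😀cd"; // the emoji occupies two UTF-16 code units

// First piece ends with a lone high surrogate (0xD83D): a broken character.
const broken = splitByCodeUnits(sample, 3);

// Pieces are "ab😀" and "cd": the emoji is preserved.
const intact = splitByCodePoints(sample, 3);
```

Splitting on code points still does not protect word boundaries or multi-code-point sequences such as combined emoji, which is consistent with long-sentence splitting being opt-in.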

Parameters

sentenceSplits: string[]

effectiveChunkSize: number

Returns

SplitRep[]

Source

packages/core/src/TextSplitter.ts:186


splitText()

splitText(text, extraInfoStr?): string[]

Parameters

text: string

extraInfoStr?: string

Returns

string[]

Source

packages/core/src/TextSplitter.ts:309


splitTextWithOverlaps()

splitTextWithOverlaps(text, extraInfoStr?): TextSplit[]

Parameters

text: string

extraInfoStr?: string

Returns

TextSplit[]

Source

packages/core/src/TextSplitter.ts:281