% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.R
\name{.is_chinese_char_cp}
\alias{.is_chinese_char_cp}
\title{Check whether cp is the codepoint of a CJK character.}
\usage{
.is_chinese_char_cp(cp)
}
\arguments{
\item{cp}{A Unicode codepoint, as an integer.}
}
\value{
Logical; \code{TRUE} if \code{cp} is the codepoint of a CJK character.
}
\description{
R implementation of \code{BasicTokenizer._is_chinese_char} from
BERT: tokenization.py. From that file:
This defines a "chinese character" as anything in the CJK Unicode block:
\url{https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)}
}
\details{
Note that the CJK Unicode block does NOT contain all Japanese and Korean
characters, despite its name. The modern Korean Hangul alphabet is in a
different block, as are Japanese Hiragana and Katakana. Those alphabets are
used to write space-separated words, so they are not treated specially and
are handled like the alphabets of other languages.
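
As an illustration only, the check can be sketched as a comparison against
a set of inclusive codepoint ranges. The helper name
\code{is_cjk_cp_sketch} is hypothetical, and the ranges are the ones tested
by \code{_is_chinese_char} in BERT's tokenization.py; the package's own
implementation may differ in detail.

\preformatted{
# Hypothetical helper for illustration; not the package's implementation.
is_cjk_cp_sketch <- function(cp) {
  # Inclusive range bounds, assumed to match BERT's tokenization.py.
  lower <- c(0x4E00, 0x3400, 0x20000, 0x2A700,
             0x2B740, 0x2B820, 0xF900, 0x2F800)
  upper <- c(0x9FFF, 0x4DBF, 0x2A6DF, 0x2B73F,
             0x2B81F, 0x2CEAF, 0xFAFF, 0x2FA1F)
  # TRUE if cp falls inside at least one range.
  any(cp >= lower & cp <= upper)
}

is_cjk_cp_sketch(0x4E2D)          # TRUE: CJK ideograph U+4E2D
is_cjk_cp_sketch(0xD55C)          # FALSE: Hangul syllable U+D55C
is_cjk_cp_sketch(0x3042)          # FALSE: Hiragana U+3042
is_cjk_cp_sketch(utf8ToInt("a"))  # FALSE: Latin letter
}

The Hangul and Hiragana codepoints fall outside every range, which matches
the behaviour described above: only CJK ideographs are treated specially.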
}
\keyword{internal}
