Natural Language Processing for Python

Embedding

  • CharEmbedding: embeds each token from its characters, capturing subword information.
  • PositionEmbedding: embeds the position of each token in the sequence.
  • WordEmbedding: embeds each token via a word-level lookup table (see the sketch after this list).
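
As an illustration of how these lookups combine, the sketch below builds a word-plus-position representation with plain NumPy. The table sizes, the random initialization, and the element-wise sum are illustrative assumptions, not this library's internals.

```python
import numpy as np

vocab_size, max_len, dim = 10_000, 128, 300

# One vector per vocabulary id (WordEmbedding) and one per position (PositionEmbedding).
# A CharEmbedding would add a third table indexed by character ids, pooled per token.
word_table = np.random.randn(vocab_size, dim).astype("float32")
pos_table = np.random.randn(max_len, dim).astype("float32")

token_ids = np.array([12, 7, 512, 3])   # a toy tokenized sentence
positions = np.arange(len(token_ids))

# A common choice is to sum (or concatenate) the word and position vectors
# for each token to form the model input.
inputs = word_table[token_ids] + pos_table[positions]
print(inputs.shape)  # (4, 300)
```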

Text classification

Available models

All of the following models include Dropout, Pooling, and Dense layers (see the sketch after the list below), with hyperparameters tuned for reasonable performance across standard text classification tasks. If necessary, they are a good basis for further performance tuning.

  • text_cnn: convolutional model that applies 1D convolutions and pooling over the word embeddings.
  • text_rnn: recurrent model over the token sequence.
  • attention_rnn: recurrent model with an attention mechanism over the hidden states.
  • text_rcnn: recurrent convolutional model combining recurrent context with convolution-style pooling.
  • text_han: Hierarchical Attention Network with word- and sentence-level attention.
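
For orientation, here is a minimal, self-contained sketch of the kind of stack these models share (Embedding, then convolution and pooling, then Dropout and Dense), written against tf.keras. The layer sizes and dropout rate are illustrative placeholders, not the library's tuned hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

def toy_text_cnn(vocab_size=20_000, max_len=400, embed_dim=300, num_classes=2):
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    x = layers.Conv1D(128, kernel_size=5, activation="relu")(x)   # n-gram features
    x = layers.GlobalMaxPooling1D()(x)                            # pooling over time
    x = layers.Dropout(0.5)(x)                                    # regularization
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = toy_text_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```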

Examples

Choose a pre-trained word embedding by setting embedding_type and the corresponding embedding dimensions. Set embedding_type=None to initialize the word embeddings randomly (but make sure to set trainable_embeddings=True so the embeddings are actually trained).
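
The snippet below sketches what those two settings amount to at the layer level, using a plain Keras Embedding layer. The random matrix is only a stand-in for real pre-trained vectors, and the layer construction is an assumption about the mechanism rather than this library's code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim = 20_000, 300

# embedding_type=None: embeddings start random, so they must stay trainable.
random_embedding = layers.Embedding(vocab_size, embed_dim, trainable=True)

# With a pre-trained embedding, the matrix initializes the layer and can be
# frozen (trainable_embeddings=False) or fine-tuned (True).
pretrained_matrix = np.random.randn(vocab_size, embed_dim).astype("float32")
pretrained_embedding = layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_matrix),
    trainable=False,
)
```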

FastText

Several pre-trained FastText embeddings are included. For now, only the word embeddings are available, not the subword n-gram features. All embeddings have 300 dimensions.
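
If you want to inspect such vectors outside the library, one option is gensim's KeyedVectors loader for plain-text .vec files; the file name below is a placeholder for whichever FastText vectors you have downloaded.

```python
from gensim.models import KeyedVectors

# Loads a plain-text FastText .vec file (word vectors only, no subword n-grams).
vectors = KeyedVectors.load_word2vec_format("wiki.en.vec")

print(vectors.vector_size)        # 300
print(vectors["language"][:5])    # first components of one 300-d word vector
```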

Dataset

  • segment:

Dataset and Model

Reading Comprehension

Dataset

  • HistoryQA: Joseon History Question Answering Dataset (SQuAD Style)
  • KorQuAD: KorQuAD is a dataset built for Korean Machine Reading Comprehension. The answer to every question is a span of the corresponding Wikipedia article paragraph. It follows the same format as the Stanford Question Answering Dataset (SQuAD) v1.0.
  • SQuAD: The Stanford Question Answering Dataset is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable (see the snippet after this list).
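
Since these datasets share the SQuAD-style layout, the snippet below shows a simplified record in that style and how an answer span is recovered from its answer_start offset. The example text is illustrative, not taken from the datasets, and real files wrap such records in additional "data"/"paragraphs" levels.

```python
import json

record = {
    "context": "The Joseon dynasty was founded in 1392.",
    "qas": [
        {
            "id": "q1",
            "question": "When was the Joseon dynasty founded?",
            "answers": [{"text": "1392", "answer_start": 34}],
        }
    ],
}

# Every answer is a span of the context, so it can be recovered by slicing.
answer = record["qas"][0]["answers"][0]
start, text = answer["answer_start"], answer["text"]
assert record["context"][start:start + len(text)] == text

print(json.dumps(record, indent=2))
```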

Semantic Parsing

Dataset

Pretrained Vector

English