The TF-IDF approach considers information about the occurrences of tokens in all documents of a text corpus: (True/False question)
A
True
B
False
Similar questions
We have 10 documents in the corpus, d1, d2, ..., d10. Calculate the IDF and TF-IDF (Term Frequency-Inverse Document Frequency) values for the following words and documents.
The word "machine" appears 10 times in d1 and appears in 5 documents.
The word "learning" appears 8 times in d2 and appears in 2 documents.
1. What is the IDF (Inverse Document Frequency) value of "machine"? idf_j = [select]: 0, 1, 2, 4, 5
2. What is the TF-IDF (Term Frequency-Inverse Document Frequency) value of "learning"? tf-idf("learning", d2) = [select]: log2(5), 10*log2(5), 8*log2(5), 2*log2(5)
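The arithmetic behind this question can be sketched in a few lines of Python. The question does not state which TF-IDF variant it uses, so the sketch assumes the plain, unsmoothed definition idf = log2(N / df) with the raw term count as tf. The base-2 logarithm is suggested by the answer options but is still an assumption. The function names `idf` and `tf_idf` are illustrative only. Other conventions (natural logarithm, smoothed IDF, normalized term frequency) would give different numbers.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    """Unsmoothed inverse document frequency, base 2 (assumed variant)."""
    return math.log2(n_docs / doc_freq)


def tf_idf(term_count: int, n_docs: int, doc_freq: int) -> float:
    """Raw term count multiplied by the IDF defined above (assumed variant)."""
    return term_count * idf(n_docs, doc_freq)


N_DOCS = 10  # corpus size given in the question

# "machine": appears in 5 of the 10 documents
idf_machine = idf(N_DOCS, 5)            # log2(10 / 5) = log2(2)

# "learning": appears 8 times in d2 and in 2 of the 10 documents
tfidf_learning = tf_idf(8, N_DOCS, 2)   # 8 * log2(10 / 2) = 8 * log2(5)

print(idf_machine)
print(tfidf_learning)
```

Under these assumptions, the IDF of "machine" is log2(2) and the TF-IDF of "learning" in d2 is 8*log2(5), which is roughly 18.58.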
When analyzing articles, the TF-IDF framework is used to:
By vectorizing text using TF-IDF approach we lose some information contained in the raw document:
The term frequency - inverse document frequency (TF-IDF) approach to text vectorization is based on the bag-of-words representation: