The following program vectorizes a set of documents, without stop-word filtering:

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = [
        'Jobs was the chairman of Apple Inc., and he was very famous',
        'I like to use apple computer',
        'And I also like to eat apple'
    ]
    vectorizer = CountVectorizer()
    # Fit first: vocabulary_ does not exist until the vectorizer has been fitted
    X = vectorizer.fit_transform(corpus)
    print(vectorizer.vocabulary_)
    print(X.todense())  # convert to the full (dense) feature matrix

Given that print(vectorizer.vocabulary_) outputs:

    {u'and': 1, u'jobs': 9, u'apple': 2, u'very': 15, u'famous': 6, u'computer': 4, u'eat': 5, u'he': 7, u'use': 14, u'like': 10, u'to': 13, u'of': 11, u'also': 0, u'chairman': 3, u'the': 12, u'inc': 8, u'was': 16}

what is the vector for document D1, i.e. 'Jobs was the chairman of Apple Inc., and he was very famous', in the output of the last print statement?
A. [0 1 1 1 0 0 1 1 1 1 0 1 1 0 0 1 2]
B. [0 0 1 0 1 0 0 0 0 0 1 0 0 1 1 0 0]
C. [1 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0]
D. None of the other answers is correct
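
One way to check the options by hand is to rebuild the count vector from the vocabulary without scikit-learn. The sketch below is only an approximation of CountVectorizer's default behavior (lowercasing, keeping tokens of two or more word characters, and assigning indices in alphabetical order). The helper names `tokenize` and `count_vector` are made up for this illustration and are not part of any library.

```python
import re
from collections import Counter

corpus = [
    'Jobs was the chairman of Apple Inc., and he was very famous',
    'I like to use apple computer',
    'And I also like to eat apple',
]

def tokenize(text):
    # Assumed approximation of CountVectorizer's defaults:
    # lowercase, then keep runs of at least two word characters
    # (so the single-letter token "I" is dropped).
    return re.findall(r"\b\w\w+\b", text.lower())

# Index the distinct terms in alphabetical order.
terms = sorted({token for doc in corpus for token in tokenize(doc)})
vocabulary = {term: index for index, term in enumerate(terms)}

def count_vector(text):
    # Position i holds how often the term with index i occurs in the text.
    counts = Counter(tokenize(text))
    return [counts.get(term, 0) for term in terms]

d1_vector = count_vector(corpus[0])
print(vocabulary)
print(d1_vector)
```

Each position in the vector is the number of times the term with that index occurs in D1. 'was' (index 16) appears twice, so the last entry is 2, while terms that occur only in the other documents, such as 'also' (index 0), get 0. The result matches option A.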