What are the common strategies for reducing the dimensionality of the bag-of-words representation? Choose the one best answer (single-choice question):

A

Removing tokens present in almost all documents

B

Diligent preprocessing: removing stopwords and applying stemming or lemmatization

C

Removing tokens with very low occurrence

D

All of the above
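
The techniques named in the options can be combined in a single vocabulary-building pass. The following is a minimal sketch in plain Python, not a reference implementation: the stopword list, the `crude_stem` helper, and the `min_df` / `max_df_ratio` thresholds are all illustrative assumptions, and a real pipeline would normally use a proper stemmer or lemmatizer.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}


def crude_stem(token):
    """Hypothetical stand-in for stemming: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tokenize(doc):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = [t.lower() for t in doc.split()]
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms that appear in at least `min_df` documents and in at most
    `max_df_ratio` of all documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    return sorted(
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )


docs = [
    "the loan is approved",
    "the loan payment is rejected",
    "loans approved in bulk",
    "a loan payment is pending",
]
vocab = build_vocabulary(docs)
print(vocab)
```

In this toy corpus, "loan" occurs in every document and is dropped by the upper threshold, while terms seen in only one document are dropped by the lower threshold, so the vocabulary shrinks from eleven raw tokens to two stems.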

Similar questions

In the preprocessing of text data for natural language processing, what transformations are commonly applied? (Select all that apply)

In text data cleaning, which steps can be applied? (Select all that apply)

Which of the following is typically NOT part of standard NLP text preprocessing? 

Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders. Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data--including telco and transactional information--to predict their clients' repayment abilities.

Home Credit can provide three types of loans:

Cash loans are one-time loans for any purpose.
Consumer loans will be for a specific item such as a refrigerator, washing machine or car.
Revolving loans allow a client to borrow up to a limit, repay the loan and then borrow again.

The company would like to improve their ability to select clients who will successfully repay loans, so that additional money can be loaned to future borrowers.

Here are the variables available for the analysis:

Variable: Description
Application_id: ID of the loan application
Loan_type: Cash, Consumer or Revolving (see description above)
Loan_term_months: Number of months until loan maturity (pay off due date)
Education_level: Highest education level (None, High School, 2-Year College, etc.)
Own_car_flag: Client owns a car (true/false)
Own_home_flag: Client owns a home (true/false)
Months_in_current_residence: Number of months residing at current apartment or home
Monthly_income_amount: Average total monthly income for the household, including tips and informal payments (ex. Venmo)
Total_consumer_debt: Total debt for the household, including home, car and credit card debt
Credit_bureau_score: Credit rating score (FICO)
Cash_savings_total: Total amount of money available as cash
Cell_phone_payments_last_12_months: Number of completed cell phone payments in the last year
Profession: Description of the employment of the primary borrower
Loan_amount: Amount of the requested loan
Default: Final outcome of the loan account ('true' indicates that the account was not paid in full by the end of the loan term)
Underwriter_notes: Text notes from interviews with the prospective client
Loan_purpose: Description of the reason for the loan

How could the underwriter_notes variable be transformed so it can be used as a predictor in a supervised model, given that it contains free-text notes from loan officers?
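
One plausible way to turn free-text notes into numeric predictors is a bag-of-words encoding with TF-IDF weighting. The sketch below is only an illustration in plain Python: the `tfidf_features` function and the sample notes are invented for this example, and the weighting uses the simple unsmoothed variant tf × log(N / df), which differs from the smoothed formulas some libraries apply.

```python
import math
from collections import Counter


def tfidf_features(notes):
    """Return (vocabulary, matrix), where matrix[i][j] is the TF-IDF weight
    of vocabulary term j in note i."""
    tokenized = [note.lower().split() for note in notes]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n_notes = len(notes)
    # Document frequency: number of notes that contain each term.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_notes / doc_freq[t]) for t in vocab}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty notes
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


# Made-up notes, purely for illustration.
notes = [
    "stable job steady income",
    "irregular income missed payments",
    "stable job no missed payments",
]
vocab, X = tfidf_features(notes)
print(vocab)
print(X[0])
```

Each row of `X` is a fixed-length numeric vector that could be joined to the other variables as model inputs. Terms that are rare across notes receive larger weights than terms shared by many notes.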
