Which of the following is typically NOT part of standard NLP text preprocessing? 单项选择题
A
Removing stop words
B
Tokenization
C
Sorting words alphabetically
D
Lower-casing
登录即可查看完整答案
我们收录了全球超50000道真实原题与详细解析,现在登录,立即获得答案。
类似问题
In the preprocessing of text data for natural language processing, what transformations are commonly applied? (Select all that apply)
In text data cleaning, which are the steps can be applied? (Select all that apply)
Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders. Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data--including telco and transactional information--to predict their clients' repayment abilities. Home Credit can provide three types of loans: Cash loans are one-time loans for any purpose Consumer loans will be for a specific item such as a refrigerator, washing machine or car. Revolving loans allow a client to borrow up to a limit, repay the loan and then borrow again. The company would like to improve their ability to select clients who will successfully repay loans, so that additional money can be loaned to future borrowers. Here are the variables available for the analysis: Variable Description Application_id ID of the loan application Loan_type Cash, Consumer or Revolving (see description above) Loan_term_months Number of months until loan maturity (pay off due date) Education_level Highest education level (None, High School, 2-Year College, etc.) Own_car_flag Client owns a car (true/false) Own_home_flag Client owns a home (true/false) Months_in_current_residence Number of months residing at current apartment or home Monthly_income_amount Average total monthly income for the household, including tips and informal payments (ex. Venmo) Total_consumer_debt Total debt for the household, including home, car and credit card debt Credit_bureau_score Credit rating score (FICO) Cash_savings_total Total amount of money available as cash Cell_phone_payments_last_12_months Number of completed cell phone payments in the last year Profession Description of the employment of the primary borrower Loan_amount Amount of the requested loan Default Final outcome of the loan account ('true' indicates that the account was not paid in full by the end of the loan term) Underwriter_notes Text notes from interviews with the prospective client Loan_purpose Description of the reason for the loan How could the underwriter_notes variable be transformed so it can be used as a predictor in a supervised model, given that it contains free-text notes from loan officers?
You are working for a storage company named StoreItAll, with a national footprint of storage units. They are interested in learning more about their customer feedback using text analytics. StoreItAll has collected thousands of customer reviews from their website and wants to convert this unstructured text into something that can be analyzed. Which of the following is the most appropriate first step in the text mining process?
更多留学生实用工具
希望你的学习变得更简单
加入我们,立即解锁 海量真题 与 独家解析,让复习快人一步!