Porter stem token filter

Provides algorithmic stemming for the English language, based on the Porter stemming algorithm.

This filter tends to stem more aggressively than other English stemmer filters, such as the kstem filter.

The porter_stem filter is equivalent to the stemmer filter’s english variant.

The porter_stem filter uses Lucene’s PorterStemFilter.
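To make the equivalence with the stemmer filter concrete, the sketch below builds two analyze API request bodies, one using porter_stem and one using an inline stemmer filter with its language set to english. The helper name analyze_request is hypothetical, and the client call is left as a comment because it assumes a configured Python client and a reachable cluster.

```python
def analyze_request(text, stem_filter):
    """Build keyword arguments for an analyze API call with one token filter."""
    return {
        "tokenizer": "standard",
        "filter": [stem_filter],
        "text": text,
    }


text = "the foxes jumping quickly"

# Built-in filter referenced by name.
porter_request = analyze_request(text, "porter_stem")

# Inline stemmer filter definition using the english variant.
stemmer_request = analyze_request(text, {"type": "stemmer", "language": "english"})

# Assumption: `client` is an already-configured Elasticsearch Python client.
# resp_a = client.indices.analyze(**porter_request)
# resp_b = client.indices.analyze(**stemmer_request)
```

Given the equivalence stated above, both requests should return the same tokens for the same input.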

Example

The following analyze API request uses the porter_stem filter to stem "the foxes jumping quickly" to "the fox jump quickli":

Python:

    resp = client.indices.analyze(
        tokenizer="standard",
        filter=[
            "porter_stem"
        ],
        text="the foxes jumping quickly",
    )
    print(resp)

Ruby:

    response = client.indices.analyze(
      body: {
        tokenizer: 'standard',
        filter: [
          'porter_stem'
        ],
        text: 'the foxes jumping quickly'
      }
    )
    puts response

JavaScript:

    const response = await client.indices.analyze({
      tokenizer: "standard",
      filter: ["porter_stem"],
      text: "the foxes jumping quickly",
    });
    console.log(response);

Console:

    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [ "porter_stem" ],
      "text": "the foxes jumping quickly"
    }

The filter produces the following tokens:

    [ the, fox, jump, quickli ]
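The token list above is a condensed view; the analyze API returns one object per token. As a minimal sketch, the hypothetical helper token_texts below pulls out just the token strings. The sample_response dictionary is an illustrative stand-in for a real response, not captured output, so its offsets and type values should be read as assumptions.

```python
def token_texts(response):
    """Return the token strings from an analyze API response body."""
    return [entry["token"] for entry in response["tokens"]]


# Illustrative stand-in for an analyze API response (not captured from a cluster).
sample_response = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 4, "end_offset": 9, "type": "<ALPHANUM>", "position": 1},
        {"token": "jump", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2},
        {"token": "quickli", "start_offset": 18, "end_offset": 25, "type": "<ALPHANUM>", "position": 3},
    ]
}

print(token_texts(sample_response))
```

Note that the offsets still refer to the original text, so "fox" spans the five characters of "foxes".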

Add to an analyzer

The following create index API request uses the porter_stem filter to configure a new custom analyzer.

To work properly, the porter_stem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the porter_stem filter in the analyzer configuration.

Python:

    resp = client.indices.create(
        index="my-index-000001",
        settings={
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "porter_stem"
                        ]
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.indices.create(
      index: 'my-index-000001',
      body: {
        settings: {
          analysis: {
            analyzer: {
              my_analyzer: {
                tokenizer: 'whitespace',
                filter: [
                  'lowercase',
                  'porter_stem'
                ]
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.indices.create({
      index: "my-index-000001",
      settings: {
        analysis: {
          analyzer: {
            my_analyzer: {
              tokenizer: "whitespace",
              filter: ["lowercase", "porter_stem"],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    PUT /my-index-000001
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "porter_stem"
              ]
            }
          }
        }
      }
    }
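Because the ordering requirement is easy to get wrong when filter lists are assembled by hand, a small client-side check can catch it before the index is created. The function below is a hypothetical helper, not part of any Elasticsearch client. It only inspects filters referenced by name, so a custom filter of type lowercase under a different name would not be recognized.

```python
def lowercase_precedes_porter_stem(filters):
    """Return True if porter_stem is absent or lowercase appears before it.

    Only string entries (filters referenced by name) are considered;
    inline filter definitions given as dicts are ignored.
    """
    names = [f for f in filters if isinstance(f, str)]
    if "porter_stem" not in names:
        return True
    stem_index = names.index("porter_stem")
    return "lowercase" in names[:stem_index]


settings = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "whitespace",
                "filter": ["lowercase", "porter_stem"],
            }
        }
    }
}

# Check every analyzer in the settings before sending them to the cluster.
for name, analyzer in settings["analysis"]["analyzer"].items():
    ok = lowercase_precedes_porter_stem(analyzer.get("filter", []))
    print(name, "ok" if ok else "lowercase must come before porter_stem")
```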