Recipes Dataset

The RecipeNLG dataset is available for download at https://recipenlg.cs.put.poznan.pl/dataset. It contains 2.2 million recipes and is slightly less than 1 GB in size.

Download and Unpack the Dataset

  1. Go to the download page https://recipenlg.cs.put.poznan.pl/dataset.
  2. Accept the Terms and Conditions and download the zip file.
  3. Unpack the zip file with unzip. You will get the full_dataset.csv file.

Create a Table

Run clickhouse-client and execute the following CREATE query:

  CREATE TABLE recipes
  (
      title String,
      ingredients Array(String),
      directions Array(String),
      link String,
      source LowCardinality(String),
      NER Array(String)
  ) ENGINE = MergeTree ORDER BY title;

Insert the Data

Run the following command:

  clickhouse-client --query "
      INSERT INTO recipes
      SELECT
          title,
          JSONExtract(ingredients, 'Array(String)'),
          JSONExtract(directions, 'Array(String)'),
          link,
          source,
          JSONExtract(NER, 'Array(String)')
      FROM input('num UInt32, title String, ingredients String, directions String, link String, source LowCardinality(String), NER String')
      FORMAT CSVWithNames
  " --input_format_with_names_use_header 0 --format_csv_allow_single_quote 0 --input_format_allow_errors_num 10 < full_dataset.csv

This example shows how to parse a custom CSV file, which requires several settings to be tuned.

Explanation:
- The dataset is in CSV format, but it requires some preprocessing on insertion; we use the input table function to perform the preprocessing;
- The structure of the CSV file is specified in the argument of the input table function;
- The num field (row number) is not needed: we parse it from the file and ignore it;
- We use FORMAT CSVWithNames, but the header in the CSV is ignored (via the command-line parameter --input_format_with_names_use_header 0), because the header does not contain a name for the first field;
- The file uses only double quotes to enclose CSV strings; some strings are not enclosed in double quotes, and a single quote must not be parsed as a string delimiter, which is why we also add the --format_csv_allow_single_quote 0 parameter;
- Some strings in the CSV cannot be parsed because they contain the \M/ sequence at the beginning of the value; the only value in CSV that can start with a backslash is \N, which is parsed as SQL NULL. We add the --input_format_allow_errors_num 10 parameter so that up to ten malformed records can be skipped;
- The ingredients, directions and NER fields are arrays, represented in an unusual form: they are serialized into strings as JSON and then placed in the CSV. We parse them as String and then use the JSONExtract function to transform them into Array.
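
To make the points above more concrete, here is a minimal Python sketch (not ClickHouse code) of the same transformation applied to a single CSV row: the header is skipped, the unnamed row-number column is dropped, and the JSON-encoded array fields are decoded into lists. The sample row, the COLUMNS list and the parse_rows helper are made up for illustration; they only mimic the layout described above.

```python
import csv
import io
import json

# Hypothetical two-line sample that mimics the layout of full_dataset.csv:
# the header has no name for the first (row number) column, and the array
# fields are JSON strings embedded in double-quoted CSV fields.
SAMPLE = (
    ',title,ingredients,directions,link,source,NER\n'
    '0,Jam Toast,"[""1 slice bread"", ""1 tbsp jam""]",'
    '"[""Toast the bread."", ""Spread the jam.""]",'
    'example.org/jam-toast,Gathered,"[""bread"", ""jam""]"\n'
)

# Column names are supplied explicitly, as in the argument of input(...).
COLUMNS = ["num", "title", "ingredients", "directions", "link", "source", "NER"]


def parse_rows(text):
    """Yield one dict per data row, decoding the JSON array columns."""
    reader = csv.reader(io.StringIO(text), quotechar='"')
    next(reader)  # skip the header instead of trusting its column names
    for fields in reader:
        row = dict(zip(COLUMNS, fields))
        del row["num"]  # the row number is read but not kept
        for name in ("ingredients", "directions", "NER"):
            row[name] = json.loads(row[name])  # plays the role of JSONExtract
        yield row


rows = list(parse_rows(SAMPLE))
print(rows[0]["title"], rows[0]["NER"])
```

The sketch does not model the error tolerance of --input_format_allow_errors_num; a malformed row would simply raise an exception here.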

Validate the Inserted Data

By checking the row count:

Query:

  SELECT count() FROM recipes;

Result:

  ┌─count()─┐
  │ 2231141 │
  └─────────┘

Example Queries

Top Components by the Number of Recipes:

In this example, we learn how to use the arrayJoin function to expand an array into a set of rows.

Query:

  SELECT
      arrayJoin(NER) AS k,
      count() AS c
  FROM recipes
  GROUP BY k
  ORDER BY c DESC
  LIMIT 50

Result:

  ┌─k────────────────────┬──────c─┐
  │ salt                 │ 890741 │
  │ sugar                │ 620027 │
  │ butter               │ 493823 │
  │ flour                │ 466110 │
  │ eggs                 │ 401276 │
  │ onion                │ 372469 │
  │ garlic               │ 358364 │
  │ milk                 │ 346769 │
  │ water                │ 326092 │
  │ vanilla              │ 270381 │
  │ olive oil            │ 197877 │
  │ pepper               │ 179305 │
  │ brown sugar          │ 174447 │
  │ tomatoes             │ 163933 │
  │ egg                  │ 160507 │
  │ baking powder        │ 148277 │
  │ lemon juice          │ 146414 │
  │ Salt                 │ 122557 │
  │ cinnamon             │ 117927 │
  │ sour cream           │ 116682 │
  │ cream cheese         │ 114423 │
  │ margarine            │ 112742 │
  │ celery               │ 112676 │
  │ baking soda          │ 110690 │
  │ parsley              │ 102151 │
  │ chicken              │ 101505 │
  │ onions               │  98903 │
  │ vegetable oil        │  91395 │
  │ oil                  │  85600 │
  │ mayonnaise           │  84822 │
  │ pecans               │  79741 │
  │ nuts                 │  78471 │
  │ potatoes             │  75820 │
  │ carrots              │  75458 │
  │ pineapple            │  74345 │
  │ soy sauce            │  70355 │
  │ black pepper         │  69064 │
  │ thyme                │  68429 │
  │ mustard              │  65948 │
  │ chicken broth        │  65112 │
  │ bacon                │  64956 │
  │ honey                │  64626 │
  │ oregano              │  64077 │
  │ ground beef          │  64068 │
  │ unsalted butter      │  63848 │
  │ mushrooms            │  61465 │
  │ Worcestershire sauce │  59328 │
  │ cornstarch           │  58476 │
  │ green pepper         │  58388 │
  │ Cheddar cheese       │  58354 │
  └──────────────────────┴────────┘

  50 rows in set. Elapsed: 0.112 sec. Processed 2.23 million rows, 361.57 MB (19.99 million rows/s., 3.24 GB/s.)
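
For readers who want to see the arrayJoin semantics outside ClickHouse, the following Python sketch mimics the query above on three made-up rows: every array element becomes its own row, and the rows are then grouped and counted. The array_join and top_components helpers are hypothetical names used only for this illustration.

```python
from collections import Counter

# Made-up miniature of the recipes table: each row carries an NER array.
recipes = [
    {"title": "Pancakes", "NER": ["flour", "eggs", "milk", "salt"]},
    {"title": "Omelette", "NER": ["eggs", "salt", "butter"]},
    {"title": "Shortbread", "NER": ["flour", "butter", "sugar", "salt"]},
]


def array_join(rows, column):
    """Emit one output value per array element, like arrayJoin(column)."""
    for row in rows:
        for element in row[column]:
            yield element


def top_components(rows, limit):
    """Count the expanded elements and return the most frequent ones."""
    counts = Counter(array_join(rows, "NER"))  # GROUP BY k with count()
    return counts.most_common(limit)           # ORDER BY c DESC LIMIT limit


top = top_components(recipes, 3)
print(top)
```

As in the real result, values are compared literally, so salt and Salt would be counted as two different components.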

The Most Complex Recipes with Strawberry

  SELECT
      title,
      length(NER),
      length(directions)
  FROM recipes
  WHERE has(NER, 'strawberry')
  ORDER BY length(directions) DESC
  LIMIT 10

Result:

  ┌─title────────────────────────────────────────────────────────────┬─length(NER)─┬─length(directions)─┐
  │ Chocolate-Strawberry-Orange Wedding Cake                         │          24 │                126 │
  │ Strawberry Cream Cheese Crumble Tart                             │          19 │                 47 │
  │ Charlotte-Style Ice Cream                                        │          11 │                 45 │
  │ Sinfully Good a Million Layers Chocolate Layer Cake, With Strawb │          31 │                 45 │
  │ Sweetened Berries With Elderflower Sherbet                       │          24 │                 44 │
  │ Chocolate-Strawberry Mousse Cake                                 │          15 │                 42 │
  │ Rhubarb Charlotte with Strawberries and Rum                      │          20 │                 42 │
  │ Chef Joey's Strawberry Vanilla Tart                              │           7 │                 37 │
  │ Old-Fashioned Ice Cream Sundae Cake                              │          17 │                 37 │
  │ Watermelon Cake                                                  │          16 │                 36 │
  └──────────────────────────────────────────────────────────────────┴─────────────┴────────────────────┘

  10 rows in set. Elapsed: 0.215 sec. Processed 2.23 million rows, 1.48 GB (10.35 million rows/s., 6.86 GB/s.)

In this example, we use the has function to filter by array elements and sort by the number of directions.
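
The same filter-and-sort logic can be sketched in plain Python on a few made-up rows. The most_complex_with helper is a hypothetical name; the membership test stands in for has(NER, 'strawberry'), which matches whole array elements rather than substrings.

```python
# Made-up rows standing in for the recipes table.
recipes = [
    {"title": "Fruit Salad", "NER": ["strawberry", "banana"],
     "directions": ["Chop.", "Mix."]},
    {"title": "Toast", "NER": ["bread"],
     "directions": ["Toast."]},
    {"title": "Layer Cake", "NER": ["flour", "strawberry", "cream"],
     "directions": ["Bake.", "Cool.", "Fill.", "Frost."]},
]


def most_complex_with(rows, ingredient, limit):
    """Rows whose NER array contains `ingredient`, longest directions first."""
    # Equivalent of WHERE has(NER, ingredient): exact element match.
    matching = [r for r in rows if ingredient in r["NER"]]
    # Equivalent of ORDER BY length(directions) DESC.
    matching.sort(key=lambda r: len(r["directions"]), reverse=True)
    # Equivalent of SELECT title, length(NER), length(directions) LIMIT limit.
    return [(r["title"], len(r["NER"]), len(r["directions"]))
            for r in matching[:limit]]


result = most_complex_with(recipes, "strawberry", 10)
print(result)
```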

There is a wedding cake that requires a whole 126 steps to produce! Let's show those directions:

Query:

  SELECT arrayJoin(directions)
  FROM recipes
  WHERE title = 'Chocolate-Strawberry-Orange Wedding Cake'

Result:

  ┌─arrayJoin(directions)───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  Position 1 rack in center and 1 rack in bottom third of oven and preheat to 350F.
  Butter one 5-inch-diameter cake pan with 2-inch-high sides, one 8-inch-diameter cake pan with 2-inch-high sides and one 12-inch-diameter cake pan with 2-inch-high sides.
  Dust pans with flour; line bottoms with parchment.
  Combine 1/3 cup orange juice and 2 ounces unsweetened chocolate in heavy small saucepan.
  Stir mixture over medium-low heat until chocolate melts.
  Remove from heat.
  Gradually mix in 1 2/3 cups orange juice.
  Sift 3 cups flour, 2/3 cup cocoa, 2 teaspoons baking soda, 1 teaspoon salt and 1/2 teaspoon baking powder into medium bowl.
  using electric mixer, beat 1 cup (2 sticks) butter and 3 cups sugar in large bowl until blended (mixture will look grainy).
  Add 4 eggs, 1 at a time, beating to blend after each.
  Beat in 1 tablespoon orange peel and 1 tablespoon vanilla extract.
  Add dry ingredients alternately with orange juice mixture in 3 additions each, beating well after each addition.
  Mix in 1 cup chocolate chips.
  Transfer 1 cup plus 2 tablespoons batter to prepared 5-inch pan, 3 cups batter to prepared 8-inch pan and remaining batter (about 6 cups) to 12-inch pan.
  Place 5-inch and 8-inch pans on center rack of oven.
  Place 12-inch pan on lower rack of oven.
  Bake cakes until tester inserted into center comes out clean, about 35 minutes.
  Transfer cakes in pans to racks and cool completely.
  Mark 4-inch diameter circle on one 6-inch-diameter cardboard cake round.
  Cut out marked circle.
  Mark 7-inch-diameter circle on one 8-inch-diameter cardboard cake round.
  Cut out marked circle.
  Mark 11-inch-diameter circle on one 12-inch-diameter cardboard cake round.
  Cut out marked circle.
  Cut around sides of 5-inch-cake to loosen.
  Place 4-inch cardboard over pan.
  Hold cardboard and pan together; turn cake out onto cardboard.
  Peel off parchment.Wrap cakes on its cardboard in foil.
  Repeat turning out, peeling off parchment and wrapping cakes in foil, using 7-inch cardboard for 8-inch cake and 11-inch cardboard for 12-inch cake.
  Using remaining ingredients, make 1 more batch of cake batter and bake 3 more cake layers as described above.
  Cool cakes in pans.
  Cover cakes in pans tightly with foil.
  (Can be prepared ahead.
  Let stand at room temperature up to 1 day or double-wrap all cake layers and freeze up to 1 week.
  Bring cake layers to room temperature before using.)
  Place first 12-inch cake on its cardboard on work surface.
  Spread 2 3/4 cups ganache over top of cake and all the way to edge.
  Spread 2/3 cup jam over ganache, leaving 1/2-inch chocolate border at edge.
  Drop 1 3/4 cups white chocolate frosting by spoonfuls over jam.
  Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge.
  Rub some cocoa powder over second 12-inch cardboard.
  Cut around sides of second 12-inch cake to loosen.
  Place cardboard, cocoa side down, over pan.
  Turn cake out onto cardboard.
  Peel off parchment.
  Carefully slide cake off cardboard and onto filling on first 12-inch cake.
  Refrigerate.
  Place first 8-inch cake on its cardboard on work surface.
  Spread 1 cup ganache over top all the way to edge.
  Spread 1/4 cup jam over, leaving 1/2-inch chocolate border at edge.
  Drop 1 cup white chocolate frosting by spoonfuls over jam.
  Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge.
  Rub some cocoa over second 8-inch cardboard.
  Cut around sides of second 8-inch cake to loosen.
  Place cardboard, cocoa side down, over pan.
  Turn cake out onto cardboard.
  Peel off parchment.
  Slide cake off cardboard and onto filling on first 8-inch cake.
  Refrigerate.
  Place first 5-inch cake on its cardboard on work surface.
  Spread 1/2 cup ganache over top of cake and all the way to edge.
  Spread 2 tablespoons jam over, leaving 1/2-inch chocolate border at edge.
  Drop 1/3 cup white chocolate frosting by spoonfuls over jam.
  Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge.
  Rub cocoa over second 6-inch cardboard.
  Cut around sides of second 5-inch cake to loosen.
  Place cardboard, cocoa side down, over pan.
  Turn cake out onto cardboard.
  Peel off parchment.
  Slide cake off cardboard and onto filling on first 5-inch cake.
  Chill all cakes 1 hour to set filling.
  Place 12-inch tiered cake on its cardboard on revolving cake stand.
  Spread 2 2/3 cups frosting over top and sides of cake as a first coat.
  Refrigerate cake.
  Place 8-inch tiered cake on its cardboard on cake stand.
  Spread 1 1/4 cups frosting over top and sides of cake as a first coat.
  Refrigerate cake.
  Place 5-inch tiered cake on its cardboard on cake stand.
  Spread 3/4 cup frosting over top and sides of cake as a first coat.
  Refrigerate all cakes until first coats of frosting set, about 1 hour.
  (Cakes can be made to this point up to 1 day ahead; cover and keep refrigerate.)
  Prepare second batch of frosting, using remaining frosting ingredients and following directions for first batch.
  Spoon 2 cups frosting into pastry bag fitted with small star tip.
  Place 12-inch cake on its cardboard on large flat platter.
  Place platter on cake stand.
  Using icing spatula, spread 2 1/2 cups frosting over top and sides of cake; smooth top.
  Using filled pastry bag, pipe decorative border around top edge of cake.
  Refrigerate cake on platter.
  Place 8-inch cake on its cardboard on cake stand.
  Using icing spatula, spread 1 1/2 cups frosting over top and sides of cake; smooth top.
  Using pastry bag, pipe decorative border around top edge of cake.
  Refrigerate cake on its cardboard.
  Place 5-inch cake on its cardboard on cake stand.
  Using icing spatula, spread 3/4 cup frosting over top and sides of cake; smooth top.
  Using pastry bag, pipe decorative border around top edge of cake, spooning more frosting into bag if necessary.
  Refrigerate cake on its cardboard.
  Keep all cakes refrigerated until frosting sets, about 2 hours.
  (Can be prepared 2 days ahead.
  Cover loosely; keep refrigerated.)
  Place 12-inch cake on platter on work surface.
  Press 1 wooden dowel straight down into and completely through center of cake.
  Mark dowel 1/4 inch above top of frosting.
  Remove dowel and cut with serrated knife at marked point.
  Cut 4 more dowels to same length.
  Press 1 cut dowel back into center of cake.
  Press remaining 4 cut dowels into cake, positioning 3 1/2 inches inward from cake edges and spacing evenly.
  Place 8-inch cake on its cardboard on work surface.
  Press 1 dowel straight down into and completely through center of cake.
  Mark dowel 1/4 inch above top of frosting.
  Remove dowel and cut with serrated knife at marked point.
  Cut 3 more dowels to same length.
  Press 1 cut dowel back into center of cake.
  Press remaining 3 cut dowels into cake, positioning 2 1/2 inches inward from edges and spacing evenly.
  Using large metal spatula as aid, place 8-inch cake on its cardboard atop dowels in 12-inch cake, centering carefully.
  Gently place 5-inch cake on its cardboard atop dowels in 8-inch cake, centering carefully.
  Using citrus stripper, cut long strips of orange peel from oranges.
  Cut strips into long segments.
  To make orange peel coils, wrap peel segment around handle of wooden spoon; gently slide peel off handle so that peel keeps coiled shape.
  Garnish cake with orange peel coils, ivy or mint sprigs, and some berries.
  (Assembled cake can be made up to 8 hours ahead.
  Let stand at cool room temperature.)
  Remove top and middle cake tiers.
  Remove dowels from cakes.
  Cut top and middle cakes into slices.
  To cut 12-inch cake: Starting 3 inches inward from edge and inserting knife straight down, cut through from top to bottom to make 6-inch-diameter circle in center of cake.
  Cut outer portion of cake into slices; cut inner portion into slices and serve with strawberries.
  └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  126 rows in set. Elapsed: 0.011 sec. Processed 8.19 thousand rows, 5.34 MB (737.75 thousand rows/s., 480.59 MB/s.)

Online Playground

The dataset is also available in the Online Playground.