{
    "componentChunkName": "component---src-templates-blog-post-js",
    "path": "/posts/feature-generating-framework/",
    "result": {"data":{"sanityPost":{"id":"-c884edc1-7150-5759-b861-66f02c77f631","slug":{"current":"feature-generating-framework"},"title":"Feature Generating Framework","mainImage":{"altText":null,"asset":{"path":"images/znqtjj88/production/e73a4aebf336efa83f9c8b09a17ffae5a1583e7b-3622x2212.jpg","metadata":{"dimensions":{"width":3622,"height":2212,"aspectRatio":1.6374321880650995},"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAMABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAUEBgf/xAAeEAACAgIDAQEAAAAAAAAAAAABAgARAwQFEiExYf/EABcBAAMBAAAAAAAAAAAAAAAAAAECAwX/xAAWEQEBAQAAAAAAAAAAAAAAAAAAEQH/2gAMAwEAAhEDEQA/ANbAtbBFGLeT5NNNetHuflyIm9mCAWPPyLeRytsZgctGZsxBYOO2GbUxtme3b2ETjKyqAPBUIsB//9k=","palette":{"dominant":{"background":"#bccca4"}}}}},"categories":[{"_id":"20b9cbd8-4962-471c-8b5f-6b396cf95c26","title":"Analytics"},{"_id":"baa46497-c2bf-4eeb-bf51-bd08238af629","title":"Data Engineering"}],"publishedAt":"2022-06-02T17:28:00.000Z","noHeaderImage":null,"authors":[{"name":"Snezana Jeremic","_rawBio":[{"_key":"76d07acdcca2","_type":"block","children":[{"_key":"a1033d1f3d000","_type":"span","marks":[],"text":"Senior Data Engineer, Product Partnering"}],"markDefs":[],"style":"normal"}]}],"_rawBodyCopy":[{"_key":"52b09e15f1f9","_type":"block","children":[{"_key":"24d48f53ed410","_type":"span","marks":["strong"],"text":"Background"}],"markDefs":[],"style":"h2"},{"_key":"1a952988a12b","_type":"block","children":[{"_key":"86596e27f1960","_type":"span","marks":[],"text":"At Sonos, we face at least three defining dimensions of big data: volume, variety and velocity; the Product Data Engineering team has to think outside of the box to provide efficient ways to work with the company’s complex IoT 
data."}],"markDefs":[],"style":"normal"},{"_key":"b5711e04d685","_type":"block","children":[{"_key":"552cfe985e940","_type":"span","marks":[],"text":"Every day, when people use their Sonos products and opt in to data sharing, a wide range of data and events, which we use to enhance the user experience, flows from their devices to us."}],"markDefs":[],"style":"normal"},{"_key":"007a88833a01","_type":"block","children":[{"_key":"84007379108e0","_type":"span","marks":[],"text":"Raw events are typically ingested into our cloud "},{"_key":"84007379108e1","_type":"span","marks":["strong"],"text":"warehouse"},{"_key":"84007379108e2","_type":"span","marks":[],"text":" as packed, semi-structured JSON documents. These JSON documents are partially unpacked and stored in tables in the cloud warehouse in real time. Each event type is usually stored in its own large table, which over time grows to hundreds of billions of rows."}],"markDefs":[],"style":"normal"},{"_key":"3960d8193995","_type":"sonosImage","altText":"IoT Data 
Flow","asset":{"_id":"image-01513114028091195be2d0fec4321d1a5d3402ea-2238x597-jpg","_type":"sanity.imageAsset","_rev":"8F4yeKwlkKXWnqQAG6EeJB","_createdAt":"2022-06-07T17:40:44Z","_updatedAt":"2022-06-07T17:40:44Z","assetId":"01513114028091195be2d0fec4321d1a5d3402ea","extension":"jpg","metadata":{"_type":"sanity.imageMetadata","blurHash":"D9O43ixuofxut7~qj[j[fQfQ","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":3.748743718592965,"height":597,"width":2238},"hasAlpha":false,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAFABQDASIAAhEBAxEB/8QAFwABAAMAAAAAAAAAAAAAAAAAAAECB//EAB4QAAICAgIDAAAAAAAAAAAAAAABAhESIgQTMkFR/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/ANslGSzebd+vhEb12fiABfj31q3YAA//2Q==","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#5f5f5f","foreground":"#fff","population":1.07,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#424242","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"sanity.imagePaletteSwatch","background":"#5f5f5f","foreground":"#fff","population":1.07,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0.99,"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":0.77,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#7f7f7f","foreground":"#fff","population":0,"title":"#fff"}}},"mimeType":"image/jpeg","originalFilename":"feat_gen_framework_01.jpg","path":"images/znqtjj88/production/01513114028091195be2d0fec4321d1a5d3402ea
-2238x597.jpg","sha1hash":"01513114028091195be2d0fec4321d1a5d3402ea","size":154367,"uploadId":"z6O9zyR1jlNqMuEVi0K2hRsHM5jYIYOy","url":"https://cdn.sanity.io/images/znqtjj88/production/01513114028091195be2d0fec4321d1a5d3402ea-2238x597.jpg","filename":"feat_gen_framework_01.jpg","width":2238,"height":597,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/01513114028091195be2d0fec4321d1a5d3402ea-2238x597.jpg?rect=920,0,398,597&w=%width%&h=%height%&q=80","id":"image-01513114028091195be2d0fec4321d1a5d3402ea-2238x597-jpg","children":[],"parent":null},"caption":"IoT Data Flow","mediaOpacity":1},{"_key":"52455d66462d","_type":"block","children":[{"_key":"fd7b1bbb4ed80","_type":"span","marks":[],"text":"Each individual event is described by a large number of attributes representing a variety of characteristics; it is not uncommon for an event to have hundreds of attributes. In addition, there are thousands of event types, many of which are organized hierarchically as so-called nested events. Flattening this kind of structure produces even more data."}],"markDefs":[],"style":"normal"},{"_key":"006192c15eeb","_type":"block","children":[{"_key":"ffb0edde4e370","_type":"span","marks":[],"text":"Raw data may also contain errors generated by humans or devices. The complexity and granularity of the events make it difficult to extract information and isolate trends from raw data, and in turn to create the knowledge and value that support decision-making processes."}],"markDefs":[],"style":"normal"},{"_key":"bc942764054a","_type":"block","children":[{"_key":"45c5ead2b6e00","_type":"span","marks":[],"text":"Possible Approaches"}],"markDefs":[],"style":"h2"},{"_key":"f0cb623deedd","_type":"block","children":[{"_key":"2501d37e8c290","_type":"span","marks":[],"text":"At a high level, data contains both signals and noise. 
Large volumes of data can carry a considerable amount of noise, which can compromise the performance of daily analytical tasks. There are two broad approaches to mining data and extracting value:"}],"markDefs":[],"style":"normal"},{"_key":"180351755c9a","_type":"block","children":[{"_key":"02a246f751540","_type":"span","marks":[],"text":"Use complex algorithms, which often have low explanatory power"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"287ae453ec4e","_type":"block","children":[{"_key":"a477bd3c1a0d0","_type":"span","marks":[],"text":"Engineer a library of smart features that carry the signals"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"c5b39e83787b","_type":"block","children":[{"_key":"57ecbc3dbad60","_type":"span","marks":[],"text":"The first approach relies on complex "},{"_key":"57ecbc3dbad61","_type":"span","marks":["strong"],"text":"algorithms,"},{"_key":"57ecbc3dbad62","_type":"span","marks":[],"text":" so the processes and results are difficult to explain. Additionally, our event data is only minimally preprocessed, if at all. Even when it is preprocessed, this is done in a highly customized manner that does not allow it to be reused or repurposed."}],"markDefs":[],"style":"normal"},{"_key":"29df5b0baf72","_type":"block","children":[{"_key":"f79e9c2719670","_type":"span","marks":[],"text":"The second approach focuses on smart use of the "},{"_key":"f79e9c2719671","_type":"span","marks":["strong"],"text":"data"},{"_key":"f79e9c2719672","_type":"span","marks":[],"text":". This approach uses carefully crafted "},{"_key":"f79e9c2719673","_type":"span","marks":["strong"],"text":"features"},{"_key":"f79e9c2719674","_type":"span","marks":[],"text":" as inputs, which makes it possible to use simpler algorithms and leads to highly explainable processes and results. 
In some sense, this is a more "},{"_key":"f79e9c2719675","_type":"span","marks":["strong"],"text":"data-centric"},{"_key":"f79e9c2719676","_type":"span","marks":[],"text":" approach."}],"markDefs":[],"style":"normal"},{"_key":"ee77eb3b2d91","_type":"block","children":[{"_key":"3e6efc2783680","_type":"span","marks":[],"text":"A quick reminder of what a feature is:"}],"markDefs":[],"style":"normal"},{"_key":"3afc61ef8e8b","_type":"block","children":[{"_key":"bdd0fbc7ce370","_type":"span","marks":[],"text":"A characteristic, property or attribute extracted from raw data"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"a29581f7c286","_type":"block","children":[{"_key":"974430952c1c0","_type":"span","marks":[],"text":"Preferably in base units\nFor example: count, mean, median, standard deviation, variance"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"45b9455c8c57","_type":"block","children":[{"_key":"c1e21cbdfcee0","_type":"span","marks":[],"text":"Let us take a brief look at the "},{"_key":"c1e21cbdfcee1","_type":"span","marks":["strong"],"text":"Analytics Lifecycle:"}],"markDefs":[],"style":"normal"},{"_key":"7f93f9253f8d","_type":"sonosImage","altText":"Analytics 
Lifecycle","asset":{"_id":"image-44b1c1f714dd82d5016275930e471ac3d1d8e9a9-2964x411-jpg","_type":"sanity.imageAsset","_rev":"8F4yeKwlkKXWnqQAG6FAkp","_createdAt":"2022-06-07T17:41:42Z","_updatedAt":"2022-06-07T17:41:42Z","assetId":"44b1c1f714dd82d5016275930e471ac3d1d8e9a9","extension":"jpg","metadata":{"_type":"sanity.imageMetadata","blurHash":"44Nm.*M{?b~q?b","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":7.211678832116788,"height":411,"width":2964},"hasAlpha":false,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAADABQDASIAAhEBAxEB/8QAFwABAAMAAAAAAAAAAAAAAAAAAAIEB//EAB0QAAIBBAMAAAAAAAAAAAAAAAABAwQRE5ExMoH/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/8QAFBEBAAAAAAAAAAAAAAAAAAAAAP/aAAwDAQACEQMRAD8A2CpbUjs3snE3j5ewALUPT0AAf//Z","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#646464","foreground":"#fff","population":1.65,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#424242","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":3.14,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":3.14,"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":1.08,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#7f7f7f","foreground":"#fff","population":0,"title":"#fff"}}},"mimeType":"image/jpeg","originalFilename":"feat_gen_framework_02.jpg","path":"images/znqtjj88/production/44b1c1f714dd82d5016275930e471ac3d1d8e9a9-2964x411.jpg
","sha1hash":"44b1c1f714dd82d5016275930e471ac3d1d8e9a9","size":158192,"uploadId":"yoyB7Q2deGMRVezV5ZdpwSmau9R1qpIc","url":"https://cdn.sanity.io/images/znqtjj88/production/44b1c1f714dd82d5016275930e471ac3d1d8e9a9-2964x411.jpg","filename":"feat_gen_framework_02.jpg","width":2964,"height":411,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/44b1c1f714dd82d5016275930e471ac3d1d8e9a9-2964x411.jpg?rect=1345,0,274,411&w=%width%&h=%height%&q=80","id":"image-44b1c1f714dd82d5016275930e471ac3d1d8e9a9-2964x411-jpg","children":[],"parent":null},"caption":"Analytics Lifecycle","mediaOpacity":1},{"_key":"4bfb7fb160ae","_type":"block","children":[{"_key":"c92fc03177240","_type":"span","marks":[],"text":"In the image above, three phases are directly related to data preparation, and together they take up at least 37.5% of an analytics project, a large portion of the whole process."}],"markDefs":[],"style":"normal"},{"_key":"df055dbedf35","_type":"block","children":[{"_key":"ca479ad449c70","_type":"span","marks":[],"text":"If we focus on extracting the signals from the data, we give downstream users more flexibility."}],"markDefs":[],"style":"normal"},{"_key":"6d3ad7c47f1f","_type":"block","children":[{"_key":"4fb11904d15c0","_type":"span","marks":[],"text":"A few key benefits of well-designed features:"}],"markDefs":[],"style":"normal"},{"_key":"ff3d4f901dc7","_type":"block","children":[{"_key":"c3d8d85bf1260","_type":"span","marks":[],"text":"Flexibility"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"b11c63e02da6","_type":"block","children":[{"_key":"620e81b185ff0","_type":"span","marks":[],"text":"Simpler models"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"7bd753061008","_type":"block","children":[{"_key":"48a526cab1460","_type":"span","marks":[],"text":"Better 
results"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"fd0d9d238b30","_type":"block","children":[{"_key":"79fed9e391ff0","_type":"span","marks":[],"text":"Lower resource usage"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"5e0bbd7a99bf","_type":"block","children":[{"_key":"240c0e4075cb0","_type":"span","marks":[],"text":"A crucial benefit of this approach is that these generic smart features can play a "},{"_key":"240c0e4075cb1","_type":"span","marks":["strong"],"text":"multi-purpose role"},{"_key":"240c0e4075cb2","_type":"span","marks":[],"text":" and be used as inputs for predictive modeling and/or any other analytics task at hand."}],"markDefs":[],"style":"normal"},{"_key":"a0e771a56c26","_type":"block","children":[{"_key":"47d1df98c1e50","_type":"span","marks":[],"text":"While this requires more development and investigation work, it provides flexible and reusable data structures that can be combined and tailored in seemingly endless ways."}],"markDefs":[],"style":"normal"},{"_key":"cded8dcb7323","_type":"block","children":[{"_key":"2bfc912275cb0","_type":"span","marks":[],"text":"Our Solution"}],"markDefs":[],"style":"h2"},{"_key":"fd323f576e62","_type":"block","children":[{"_key":"7e8eb3aa6a8d0","_type":"span","marks":[],"text":"We decided to build a library of customizable, plug-and-play features using highly automated, metadata-driven processes. We call this the "},{"_key":"7e8eb3aa6a8d1","_type":"span","marks":["strong"],"text":"feature generating framework (FGF)"},{"_key":"7e8eb3aa6a8d2","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"025ec48b940d","_type":"block","children":[{"_key":"4ea3af7b0aac0","_type":"span","marks":[],"text":"Our guiding idea is that"},{"_key":"4ea3af7b0aac1","_type":"span","marks":["strong"],"text":" high-quality data can be more powerful than the 
algorithm."}],"markDefs":[],"style":"normal"},{"_key":"214a63013bbb","_type":"block","children":[{"_key":"701429c7e185","_type":"span","marks":[],"text":""}],"markDefs":[],"style":"normal"},{"_key":"2d515cd232fa","_type":"block","children":[{"_key":"74a180b757150","_type":"span","marks":["strong"],"text":"What does the feature engineering process look like?"}],"markDefs":[],"style":"normal"},{"_key":"a66f308820e0","_type":"block","children":[{"_key":"ca8ada9898960","_type":"span","marks":[],"text":"Brainstorming"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"393f90e3f740","_type":"block","children":[{"_key":"7ef8c09f9ee10","_type":"span","marks":[],"text":"Finding patterns that lead to feature creation (Exploratory Data Analysis)"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"c740740d9baa","_type":"block","children":[{"_key":"dae75a5205fd0","_type":"span","marks":[],"text":"Deciding what features to create"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"bda9f90f3b9d","_type":"block","children":[{"_key":"985f9f00af200","_type":"span","marks":[],"text":"Creating features"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"9b7b52ce85a0","_type":"block","children":[{"_key":"c710098545440","_type":"span","marks":[],"text":"Testing the impact of the identified features on the problem (Analytics Lifecycle)"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"865e4aef0c3d","_type":"block","children":[{"_key":"19fa7d2f9d760","_type":"span","marks":[],"text":"Monitoring and improving the quality of the features (a results-driven 
process)"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"3a3eb7aaa7ed","_type":"block","children":[{"_key":"988f2bb10afb0","_type":"span","marks":[],"text":"Repeat"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"40d0a3073078","_type":"block","children":[{"_key":"ba9ed8805b670","_type":"span","marks":[],"text":"What is the Feature Generating Framework?"}],"markDefs":[],"style":"h3"},{"_key":"9d92a2bf9af4","_type":"block","children":[{"_key":"6b16c55b7ff90","_type":"span","marks":[],"text":"The "},{"_key":"6b16c55b7ff91","_type":"span","marks":["strong"],"text":"FGF"},{"_key":"6b16c55b7ff92","_type":"span","marks":[],"text":" is a custom-built "},{"_key":"6b16c55b7ff93","_type":"span","marks":["strong"],"text":"infrastructure"},{"_key":"6b16c55b7ff94","_type":"span","marks":[],"text":" that generates metadata-driven feature creation processes. It allows us to reduce a body of data, efficiently and systematically, into smaller parts or views that are more informative."}],"markDefs":[],"style":"normal"},{"_key":"7cf33ed74467","_type":"block","children":[{"_key":"5d77beb9285a0","_type":"span","marks":[],"text":"It offers a streamlined way to create bite-sized data points that are trustworthy, accessible and understandable."}],"markDefs":[],"style":"normal"},{"_key":"4327031f39a4","_type":"block","children":[{"_key":"a9b2c92060880","_type":"span","marks":[],"text":"Most importantly, it "},{"_key":"a9b2c92060881","_type":"span","marks":["strong"],"text":"automates"},{"_key":"a9b2c92060882","_type":"span","marks":[],"text":" the way features are extracted from the data, so that feature generation is performed in a systematic and organized manner. That’s why we call it a declarative feature engineering framework: it uses parameters to define what to generate and how, and it supports highly configurable quantitative or qualitative output features. 
The input parameters are saved in the metadata model, from which the data wrangling processes are generated dynamically. The features are organized in a dimensional model to offer the best performance when selecting data bites for analysis and/or predictive modeling."}],"markDefs":[],"style":"normal"},{"_key":"67b2fca75d8a","_type":"block","children":[{"_key":"1b0e56bc0ddd0","_type":"span","marks":["strong"],"text":"Input"}],"markDefs":[],"style":"h3"},{"_key":"4c8c112b385f","_type":"block","children":[{"_key":"0ed79c69e9010","_type":"span","marks":[],"text":"The framework uses a Jupyter notebook UI to accept a combination of parameters (variables), which are saved into the "},{"_key":"0ed79c69e9011","_type":"span","marks":["f088aa4f20aa"],"text":"relational metadata model"},{"_key":"0ed79c69e9012","_type":"span","marks":[],"text":". This front-end UI triggers the execution of automatically generated SQL statements on the backend. These operations are stitched together using Python and JavaScript."}],"markDefs":[{"_key":"f088aa4f20aa","_type":"link","href":"https://en.wikipedia.org/wiki/Relational_model"}],"style":"normal"},{"_key":"d0ce8bf2126a","_type":"block","children":[{"_key":"5e4e1078dd440","_type":"span","marks":[],"text":"The input parameters (metadata) are used and reused to assemble DDL, DQL and DML statements that perform the following:"}],"markDefs":[],"style":"normal"},{"_key":"4117ba60ffae","_type":"block","children":[{"_key":"4658a5750c4c0","_type":"span","marks":[],"text":"Create data structures"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"43c0b252a7aa","_type":"block","children":[{"_key":"6a7c487e8d170","_type":"span","marks":[],"text":"Load the data"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"ebfdf6582c18","_type":"block","children":[{"_key":"4c42423f007f0","_type":"span","marks":[],"text":"Create and start the jobs that drive data loading processes on 
schedules"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"b782545ca01c","_type":"block","children":[{"_key":"a3edcaf820f30","_type":"span","marks":[],"text":"Monitor data quality"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"386a0f1c9b70","_type":"block","children":[{"_key":"2ab4e771ed6d0","_type":"span","marks":[],"text":"Export data quality metrics for continuous visual data quality monitoring (Datadog, Tableau)"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"12af6f5ade3a","_type":"block","children":[{"_key":"15729247f4450","_type":"span","marks":["strong"],"text":"Data Sources"}],"markDefs":[],"style":"h3"},{"_key":"b47a07e73208","_type":"block","children":[{"_key":"818c4efd56780","_type":"span","marks":[],"text":"Any single table, or any complex SQL query that references multiple tables, can be used as a data source for deriving a feature."}],"markDefs":[],"style":"normal"},{"_key":"4aeadfd4bbde","_type":"block","children":[{"_key":"fc3945bb54fc0","_type":"span","marks":[],"text":"There is a one-to-one relationship between a data source and a feature. This allows deductive logic to be implemented for each feature and, later, a bottom-up approach to combining the features into custom datasets for analysis and/or predictive modeling."}],"markDefs":[],"style":"normal"},{"_key":"f2182d18ae0f","_type":"block","children":[{"_key":"eea7c54d34530","_type":"span","marks":[],"text":"Each data source includes a set of default "},{"_key":"eea7c54d34531","_type":"span","marks":["strong"],"text":"key attributes"},{"_key":"eea7c54d34532","_type":"span","marks":[],"text":" that provide standardization and enable reuse with other in-house datasets."}],"markDefs":[],"style":"normal"},{"_key":"0a2b38936e87","_type":"block","children":[{"_key":"942753b16b2c0","_type":"span","marks":[],"text":"The 
"},{"_key":"942753b16b2c1","_type":"span","marks":["strong"],"text":"key"},{"_key":"942753b16b2c2","_type":"span","marks":[],"text":" attributes serve as "},{"_key":"942753b16b2c3","_type":"span","marks":["strong"],"text":"consistent identifiers"},{"_key":"942753b16b2c4","_type":"span","marks":[],"text":" across all the internal data and can be used to stitch together datasets and produce "},{"_key":"942753b16b2c5","_type":"span","marks":["strong"],"text":"rich high quality data"},{"_key":"942753b16b2c6","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"933d0b436a91","_type":"sonosImage","altText":"Feature Generation Process Flow","asset":{"_id":"image-a0a4b7d29d4f736e92dea8192b678f16753b603d-2880x2580-jpg","_type":"sanity.imageAsset","_rev":"tbmL6Oue3Qo9G06ZuT4zCz","_createdAt":"2022-06-07T17:42:40Z","_updatedAt":"2022-06-07T17:42:40Z","assetId":"a0a4b7d29d4f736e92dea8192b678f16753b603d","extension":"jpg","metadata":{"_type":"sanity.imageMetadata","blurHash":"e4ONB[-;?b~qRjIUWBofofxu%MIUt7j[of~qRjt7ofRj%MxuWBt7Rj","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":1.1162790697674418,"height":2580,"width":2880},"hasAlpha":false,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAASABQDASIAAhEBAxEB/8QAGAABAQADAAAAAAAAAAAAAAAAAAIDBAf/xAAhEAACAwABAwUAAAAAAAAAAAABAgADETEEMkESEyFRcf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/xAAUEQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIRAxEAPwDtV5djwVC/XmTU7isOAzb42bT9jfkxdMoNC5xAqm33E9WZ85zEpKkRcUADmIFN2mTSAKxkRAuIiB//2Q==","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#676767","foreground":"#fff","population":0.71,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#424242","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"
sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":1.04,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":1.04,"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":0.39,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#7f7f7f","foreground":"#fff","population":0,"title":"#fff"}}},"mimeType":"image/jpeg","originalFilename":"feat_gen_framework_03.jpg","path":"images/znqtjj88/production/a0a4b7d29d4f736e92dea8192b678f16753b603d-2880x2580.jpg","sha1hash":"a0a4b7d29d4f736e92dea8192b678f16753b603d","size":555558,"uploadId":"ivMGDEE7FYrq9THcIyvXmGNQ033nhv7q","url":"https://cdn.sanity.io/images/znqtjj88/production/a0a4b7d29d4f736e92dea8192b678f16753b603d-2880x2580.jpg","filename":"feat_gen_framework_03.jpg","width":2880,"height":2580,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/a0a4b7d29d4f736e92dea8192b678f16753b603d-2880x2580.jpg?rect=580,0,1720,2580&w=%width%&h=%height%&q=80","id":"image-a0a4b7d29d4f736e92dea8192b678f16753b603d-2880x2580-jpg","children":[],"parent":null},"caption":"Feature Generation Process Flow","mediaOpacity":1},{"_key":"eaf4d5f33d4d","_type":"block","children":[{"_key":"ddb7e1dbc8b50","_type":"span","marks":["strong"],"text":"The feature generation "},{"_key":"ddb7e1dbc8b51","_type":"span","marks":[],"text":"process starts with new or existing parameters entered into the front-end UI. 
These parameters are then processed by generic code that assembles SQL statements to either create a new feature or refresh an existing feature with newly arrived raw data."}],"markDefs":[],"style":"normal"},{"_key":"4c164e4fb48f","_type":"block","children":[{"_key":"84c0f72353020","_type":"span","marks":[],"text":"The new set of raw data is cleaned, transformed, and loaded into a "},{"_key":"84c0f72353021","_type":"span","marks":["strong"],"text":"dimensional model"},{"_key":"84c0f72353022","_type":"span","marks":[],"text":". Once a feature is created, the "},{"_key":"95df19e9640b1","_type":"span","marks":["strong"],"text":"data quality"},{"_key":"95df19e9640b2","_type":"span","marks":[],"text":" (DQ) phase starts. We monitor feature quality by applying custom-defined calculations and storing the resulting statistics. These statistics are imported into a cloud observability service, where historical trends are visualized over various time frames for anomaly detection and other DQ analytics."}],"markDefs":[],"style":"normal"},{"_key":"3851a2300630","_type":"block","children":[{"_key":"adfb99f4ac770","_type":"span","marks":[],"text":"Finally, using the available metadata, we assemble features into merged, custom datasets, adding new features or removing existing ones in an automated fashion. 
At this point, the raw "},{"_key":"84b9161ba91b1","_type":"span","marks":["strong"],"text":"features"},{"_key":"84b9161ba91b2","_type":"span","marks":[],"text":" and the "},{"_key":"84b9161ba91b3","_type":"span","marks":["strong"],"text":"statistical features"},{"_key":"84b9161ba91b4","_type":"span","marks":[],"text":" are available for data analysis and/or predictive modeling."}],"markDefs":[],"style":"normal"},{"_key":"1f082f08a951","_type":"block","children":[{"_key":"4181e8c436870","_type":"span","marks":[],"text":"Data Model"}],"markDefs":[],"style":"h3"},{"_key":"a5167c48dde0","_type":"block","children":[{"_key":"9fee4c0f99110","_type":"span","marks":[],"text":"Let’s take a closer look at the “Load Data in DM” phase of the process flow diagram above. The "},{"_key":"9fee4c0f99111","_type":"span","marks":["strong"],"text":"FGF"},{"_key":"9fee4c0f99112","_type":"span","marks":[],"text":" shapes and models the data and loads the generated features into a "},{"_key":"9fee4c0f99113","_type":"span","marks":["4e0966a0156c"],"text":"dimensional model"},{"_key":"9fee4c0f99114","_type":"span","marks":[],"text":"."}],"markDefs":[{"_key":"4e0966a0156c","_type":"link","href":"https://en.wikipedia.org/wiki/Dimensional_modeling"}],"style":"normal"},{"_key":"ee83938753c8","_type":"block","children":[{"_key":"b3cef095b5960","_type":"span","marks":[],"text":"The dimensional model consists of "},{"_key":"b3cef095b5961","_type":"span","marks":["strong"],"text":"facts"},{"_key":"b3cef095b5962","_type":"span","marks":[],"text":", which contain measures, and dimensions, which contain descriptive data about those measures. The "},{"_key":"69547daa29a51","_type":"span","marks":["strong"],"text":"FGF"},{"_key":"69547daa29a52","_type":"span","marks":[],"text":" dimensional model organizes and standardizes features into data structures. These structures provide a variety of connectors used to merge features into complex, customized datasets. 
Quantitative features reside in fact tables and categorical features reside in dimensions."}],"markDefs":[],"style":"normal"},{"_key":"63024be4b098","_type":"block","children":[{"_key":"0691f96792620","_type":"span","marks":[],"text":"The "},{"_key":"0691f96792621","_type":"span","marks":["strong"],"text":"keys"},{"_key":"0691f96792622","_type":"span","marks":[],"text":" are ingrained into the tables during data transformation processes. Resulting tables are daily aggregates grouped by the key attributes. Each feature is time-stamped with the standard date attribute which can be customized based on the rules."}],"markDefs":[],"style":"normal"},{"_key":"065401e1502a","_type":"block","children":[{"_key":"c1131f8b56b20","_type":"span","marks":[],"text":"\n"}],"markDefs":[],"style":"normal"},{"_key":"c10740d6aa6c","_type":"sonosImage","altText":"FGF Data Model","asset":{"_id":"image-e1d7553d699d44cf0960ad61ab2c3424f9f5bc93-1920x1720-gif","_type":"sanity.imageAsset","_rev":"fz7lDlMAARX7CuemrYPOiN","_createdAt":"2022-06-07T17:44:00Z","_updatedAt":"2022-06-07T17:44:00Z","assetId":"e1d7553d699d44cf0960ad61ab2c3424f9f5bc93","extension":"gif","metadata":{"_type":"sanity.imageMetadata","blurHash":"e3OWvn-;~q_3t7%Moft7RjRj~qRj9FofWB_3RjWBt7xuWBj[WBRj%M","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":1.1162790697674418,"height":1720,"width":1920},"hasAlpha":true,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAASABQDASIAAhEBAxEB/8QAGAABAAMBAAAAAAAAAAAAAAAAAAECAwf/xAAeEAADAAICAwEAAAAAAAAAAAAAAQIDERIhBEFRcf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/xAAUEQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIRAxEAPwDuFvXbf4ZYs11vmkvhbyI5zqXqmVWDdTTp9IDeXtAlLSAD2SAAAAH/2Q==","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#6a6a6a","foreground":"#fff","populat
ion":0.39,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#424242","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0.51,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0.51,"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":0.31,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#7f7f7f","foreground":"#fff","population":0,"title":"#fff"}}},"mimeType":"image/gif","originalFilename":"feat_gen_framework_04.gif","path":"images/znqtjj88/production/e1d7553d699d44cf0960ad61ab2c3424f9f5bc93-1920x1720.gif","sha1hash":"e1d7553d699d44cf0960ad61ab2c3424f9f5bc93","size":148880,"uploadId":"uGzroTvjo7BQLa05WgoYmn5FMVHbSGik","url":"https://cdn.sanity.io/images/znqtjj88/production/e1d7553d699d44cf0960ad61ab2c3424f9f5bc93-1920x1720.gif","filename":"feat_gen_framework_04.gif","width":1920,"height":1720,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/e1d7553d699d44cf0960ad61ab2c3424f9f5bc93-1920x1720.gif?rect=387,0,1147,1720&w=%width%&h=%height%&q=80","id":"image-e1d7553d699d44cf0960ad61ab2c3424f9f5bc93-1920x1720-gif","children":[],"parent":null},"caption":"FGF Data Model","mediaOpacity":1},{"_key":"2e1b4dd96728","_type":"block","children":[{"_key":"84a0f652b84d0","_type":"span","marks":[],"text":"Metadata Store"}],"markDefs":[],"style":"h3"},{"_key":"e337ca5f7b6a","_type":"block","children":[{"_key":"a7d0586b8c750","_type":"span","marks":[],"text":"The metadata store organizes input parameters into a normalized relational data structure. 
Metadata is used to support "},{"_key":"1cd5b4f9954e1","_type":"span","marks":["strong"],"text":"data-centric automation"},{"_key":"1cd5b4f9954e2","_type":"span","marks":[],"text":" of the feature generation processes."}],"markDefs":[],"style":"normal"},{"_key":"d034e010c6c7","_type":"block","children":[{"_key":"09bfdacc3d340","_type":"span","marks":[],"text":"We use a "},{"_key":"09bfdacc3d341","_type":"span","marks":["strong"],"text":"dynamic programming "},{"_key":"09bfdacc3d342","_type":"span","marks":[],"text":"approach in which the "},{"_key":"09bfdacc3d343","_type":"span","marks":["strong"],"text":"code"},{"_key":"09bfdacc3d344","_type":"span","marks":[],"text":" for each phase of the feature generation process (see the diagram above) is "},{"_key":"09bfdacc3d345","_type":"span","marks":["strong"],"text":"generalized"},{"_key":"09bfdacc3d346","_type":"span","marks":[],"text":" and "},{"_key":"09bfdacc3d347","_type":"span","marks":["strong"],"text":"reused"},{"_key":"09bfdacc3d348","_type":"span","marks":[],"text":" in "},{"_key":"09bfdacc3d349","_type":"span","marks":["strong"],"text":"conjunction"},{"_key":"09bfdacc3d3410","_type":"span","marks":[],"text":" with the "},{"_key":"09bfdacc3d3411","_type":"span","marks":["strong"],"text":"metadata"},{"_key":"09bfdacc3d3412","_type":"span","marks":[],"text":" to create highly customized yet "},{"_key":"09bfdacc3d3413","_type":"span","marks":["strong"],"text":"standardized"},{"_key":"09bfdacc3d3414","_type":"span","marks":[],"text":" data structures and transformations."}],"markDefs":[],"style":"normal"},{"_key":"18751a53f532","_type":"block","children":[{"_key":"be5ef1f72a6e0","_type":"span","marks":[],"text":"The metadata model is designed to allow:"}],"markDefs":[],"style":"normal"},{"_key":"4390e46a20b7","_type":"block","children":[{"_key":"6ba74a8033ee0","_type":"span","marks":[],"text":"Activation and deactivation of most FG processes.\nFor example: stopping temporary data load jobs, stopping 
statistics generation, deactivating a feature, and so on."}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"3eae3b489831","_type":"block","children":[{"_key":"78e1e446107b0","_type":"span","marks":[],"text":"Metadata modifications that can affect a feature's content/data."}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"2fb4cf678c25","_type":"block","children":[{"_key":"34ea40b27b200","_type":"span","marks":[],"text":"Extension or modification of DQ statistics."}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"ec1ec60098a1","_type":"block","children":[{"_key":"d2252999e4cf0","_type":"span","marks":[],"text":"Most importantly, the "},{"_key":"d2252999e4cf1","_type":"span","marks":["strong"],"text":"metadata store"},{"_key":"d2252999e4cf2","_type":"span","marks":[],"text":" is directly connected to the dimensional model via "},{"_key":"d2252999e4cf3","_type":"span","marks":["strong"],"text":"unique keys"},{"_key":"d2252999e4cf4","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"80be90144192","_type":"block","children":[{"_key":"48245e5163e6","_type":"span","marks":[],"text":""}],"markDefs":[],"style":"normal"},{"_key":"64ce3a481501","_type":"sonosImage","altText":"FGF Metadata 
Model","asset":{"_id":"image-67122ef2d96737e9d30a29c5f86980b774e52c4b-2283x2043-jpg","_type":"sanity.imageAsset","_rev":"8F4yeKwlkKXWnqQAG6FUmj","_createdAt":"2022-06-07T17:44:50Z","_updatedAt":"2022-06-07T17:44:50Z","assetId":"67122ef2d96737e9d30a29c5f86980b774e52c4b","extension":"jpg","metadata":{"_type":"sanity.imageMetadata","blurHash":"e3OgKN?b_3?bIUxuRjj[t7xu-;IUM{xut7~qRjayt7M{%M%MRjM{j[","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":1.117474302496329,"height":2043,"width":2283},"hasAlpha":false,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAASABQDASIAAhEBAxEB/8QAGAABAQADAAAAAAAAAAAAAAAAAAMBAgf/xAAdEAACAgIDAQAAAAAAAAAAAAAAAQIREiEDEzEi/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/AO3z7cnjVGsOdK1N7LvwhwQX1kk9gWjJSVrwBJJUtADIAAAAD//Z","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#656565","foreground":"#fff","population":0.53,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#424242","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":0.53,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0.31,"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#bcbcbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#7c7c7c","foreground":"#fff","population":0.53,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#7f7f7f","foreground":"#fff","population":0,"title":"#fff"}}},"mimeType":"image/jpeg","originalFilename":"feat_gen_framework_05.jpg","path":"images/znqtjj88/pr
oduction/67122ef2d96737e9d30a29c5f86980b774e52c4b-2283x2043.jpg","sha1hash":"67122ef2d96737e9d30a29c5f86980b774e52c4b","size":241125,"uploadId":"VhUZEnLyU9AFpw7yovqqPI2geJiCXVAk","url":"https://cdn.sanity.io/images/znqtjj88/production/67122ef2d96737e9d30a29c5f86980b774e52c4b-2283x2043.jpg","filename":"feat_gen_framework_05.jpg","width":2283,"height":2043,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/67122ef2d96737e9d30a29c5f86980b774e52c4b-2283x2043.jpg?rect=461,0,1362,2043&w=%width%&h=%height%&q=80","id":"image-67122ef2d96737e9d30a29c5f86980b774e52c4b-2283x2043-jpg","children":[],"parent":null},"caption":"FGF Metadata Model","mediaOpacity":1},{"_key":"a4c02da5feec","_type":"block","children":[{"_key":"476c2998caaf0","_type":"span","marks":[],"text":"Data Quality Monitoring"}],"markDefs":[],"style":"h3"},{"_key":"abb315499295","_type":"block","children":[{"_key":"0fa8c0b1d8900","_type":"span","marks":[],"text":"This component provides an at-a-glance view of historical and current insights into the central tendency and distribution of the data/features. 
It supplies two kinds of DQ monitoring:"}],"markDefs":[],"style":"normal"},{"_key":"77643ad01540","_type":"block","children":[{"_key":"1a465715a14f0","_type":"span","marks":[],"text":"Feature value auditing"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"8f8f8dabc298","_type":"block","children":[{"_key":"be850336e8660","_type":"span","marks":[],"text":"Data loading metrics"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"2177e8669e6b","_type":"block","children":[{"_key":"1bec3ef7dba50","_type":"span","marks":[],"text":"It calculates daily statistics for generated features, such as:"}],"markDefs":[],"style":"normal"},{"_key":"1a938245348d","_type":"block","children":[{"_key":"2cd8b99193410","_type":"span","marks":[],"text":"Mean, median, mode, standard deviation, variance, IQR, etc."}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"444497ee4816","_type":"block","children":[{"_key":"b14e5b04b36a0","_type":"span","marks":[],"text":"Daily data loading metrics for each fact:"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"565f1ef8a0a8","_type":"block","children":[{"_key":"d6b4e380eddd0","_type":"span","marks":[],"text":"Null ratios"}],"level":2,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"1f34eab9136f","_type":"block","children":[{"_key":"a4159d273ad40","_type":"span","marks":[],"text":"Row counts"}],"level":2,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"5a414c2a7b81","_type":"block","children":[{"_key":"2adfba9bdd360","_type":"span","marks":[],"text":"Unique ratios"}],"level":2,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"d7dcd34bef0f","_type":"block","children":[{"_key":"f851e55906520","_type":"span","marks":[],"text":"Unique key ratios"}],"level":2,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"06753de855a7","_type":"block","children":[{"_key":"5f0de4d500cf0","_type":"span","marks":[],"text":"The 
"},{"_key":"5f0de4d500cf1","_type":"span","marks":["strong"],"text":"feature statistics"},{"_key":"5f0de4d500cf2","_type":"span","marks":[],"text":" are saved in the data store, with reference keys linking them to the raw features as well as to the metadata."}],"markDefs":[],"style":"normal"},{"_key":"e7d81c069437","_type":"block","children":[{"_key":"d18d9242ce4a0","_type":"span","marks":[],"text":"DQ monitoring provides a wide range of feature statistics that are used to immediately "},{"_key":"d18d9242ce4a1","_type":"span","marks":["strong"],"text":"detect"},{"_key":"d18d9242ce4a2","_type":"span","marks":[],"text":" any "},{"_key":"d18d9242ce4a3","_type":"span","marks":["strong"],"text":"qualitative"},{"_key":"d18d9242ce4a4","_type":"span","marks":[],"text":" and "},{"_key":"d18d9242ce4a5","_type":"span","marks":["strong"],"text":"quantitative"},{"_key":"d18d9242ce4a6","_type":"span","marks":[],"text":" "},{"_key":"d18d9242ce4a7","_type":"span","marks":["strong"],"text":"shifts"},{"_key":"d18d9242ce4a8","_type":"span","marks":[],"text":" in data. 
We use it to remedy data issues promptly, before they dramatically affect any downstream processes."}],"markDefs":[],"style":"normal"},{"_key":"e18733943569","_type":"block","children":[{"_key":"6e8348172ada0","_type":"span","marks":[],"text":"The image below shows a sample of the median and mean feature statistics as time series:"}],"markDefs":[],"style":"normal"},{"_key":"711cb4667a76","_type":"sonosImage","altText":"Graph of daily trending median versus mean fact measure values","asset":{"_id":"image-e6e3587fbb705dc3f23b222ad523d193718c90b0-1964x1460-jpg","_type":"sanity.imageAsset","_rev":"tbmL6Oue3Qo9G06ZuT5EVG","_createdAt":"2022-06-07T17:45:49Z","_updatedAt":"2022-06-07T17:45:49Z","assetId":"e6e3587fbb705dc3f23b222ad523d193718c90b0","extension":"jpg","metadata":{"_type":"sanity.imageMetadata","blurHash":"V6SY{p?v-;?Ix]?cj[t7j[oJ?dRjtQaeRi~XWBt7ayof","dimensions":{"_type":"sanity.imageDimensions","aspectRatio":1.3452054794520547,"height":1460,"width":1964},"hasAlpha":false,"isOpaque":true,"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAPABQDASIAAhEBAxEB/8QAGQAAAgMBAAAAAAAAAAAAAAAAAAIBAwQI/8QAHxAAAgICAQUAAAAAAAAAAAAAAREAAgMSIQQTMUFx/8QAFQEBAQAAAAAAAAAAAAAAAAAAAAH/xAAVEQEBAAAAAAAAAAAAAAAAAAAAIf/aAAwDAQACEQMRAD8A6ezZqPW5rx6IlXdxAlaP5NNhYnwDFTaqJKIx9S686wj0BARAhA//2Q==","palette":{"_type":"sanity.imagePalette","darkMuted":{"_type":"sanity.imagePaletteSwatch","background":"#6e6e6f","foreground":"#fff","population":0.04,"title":"#fff"},"darkVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#634421","foreground":"#fff","population":0,"title":"#fff"},"dominant":{"_type":"sanity.imagePaletteSwatch","background":"#c69156","foreground":"#000","population":0.26,"title":"#fff"},"lightMuted":{"_type":"sanity.imagePaletteSwatch","background":"#b3d5b9","foreground":"#000","population":0.16,
"title":"#fff"},"lightVibrant":{"_type":"sanity.imagePaletteSwatch","background":"#fc9cbc","foreground":"#000","population":0,"title":"#fff"},"muted":{"_type":"sanity.imagePaletteSwatch","background":"#8c8c5c","foreground":"#fff","population":0,"title":"#fff"},"vibrant":{"_type":"sanity.imagePaletteSwatch","background":"#c69156","foreground":"#000","population":0.26,"title":"#fff"}}},"mimeType":"image/jpeg","originalFilename":"Data Quality Monitoring.jpg","path":"images/znqtjj88/production/e6e3587fbb705dc3f23b222ad523d193718c90b0-1964x1460.jpg","sha1hash":"e6e3587fbb705dc3f23b222ad523d193718c90b0","size":366137,"uploadId":"sgHWn01QUDtw4wOP27rVbS75Q42c6U0Q","url":"https://cdn.sanity.io/images/znqtjj88/production/e6e3587fbb705dc3f23b222ad523d193718c90b0-1964x1460.jpg","filename":"Data Quality Monitoring.jpg","width":1964,"height":1460,"placeholderUrl":"https://cdn.sanity.io/images/znqtjj88/production/e6e3587fbb705dc3f23b222ad523d193718c90b0-1964x1460.jpg?rect=495,0,973,1460&w=%width%&h=%height%&q=80","id":"image-e6e3587fbb705dc3f23b222ad523d193718c90b0-1964x1460-jpg","children":[],"parent":null},"mediaOpacity":1},{"_key":"5ad8e9cee680","_type":"block","children":[{"_key":"bb87fd6cb3110","_type":"span","marks":["strong"],"text":"Results"}],"markDefs":[],"style":"h3"},{"_key":"a85a5d7974e5","_type":"block","children":[{"_key":"e4ebba8eb65e0","_type":"span","marks":["strong"],"text":"The FGF"},{"_key":"e4ebba8eb65e1","_type":"span","marks":[],"text":" creates and maintains a "},{"_key":"e4ebba8eb65e2","_type":"span","marks":["strong"],"text":"library"},{"_key":"e4ebba8eb65e3","_type":"span","marks":[],"text":" of unique"},{"_key":"e4ebba8eb65e4","_type":"span","marks":["strong"],"text":" features"},{"_key":"e4ebba8eb65e5","_type":"span","marks":[],"text":" containing "},{"_key":"e4ebba8eb65e6","_type":"span","marks":["strong"],"text":"fundamental"},{"_key":"e4ebba8eb65e7","_type":"span","marks":[],"text":" 
"},{"_key":"e4ebba8eb65e8","_type":"span","marks":["strong"],"text":"information"},{"_key":"e4ebba8eb65e9","_type":"span","marks":[],"text":" about product data that can be used across the company due to the "},{"_key":"e4ebba8eb65e10","_type":"span","marks":["strong"],"text":"universal structure"},{"_key":"e4ebba8eb65e11","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"8501e3a57f10","_type":"block","children":[{"_key":"58cdb143930a0","_type":"span","marks":[],"text":"The "},{"_key":"58cdb143930a1","_type":"span","marks":["strong"],"text":"universal property"},{"_key":"58cdb143930a2","_type":"span","marks":[],"text":" comes from the relational keys that act as the "},{"_key":"58cdb143930a3","_type":"span","marks":["strong"],"text":"connectors"},{"_key":"58cdb143930a4","_type":"span","marks":[],"text":" to the rest of the wide range of internal data."}],"markDefs":[],"style":"normal"},{"_key":"dea70083ee80","_type":"block","children":[{"_key":"0c0eb4fdf3e50","_type":"span","marks":[],"text":"The "},{"_key":"0c0eb4fdf3e51","_type":"span","marks":["strong"],"text":"simplicity"},{"_key":"0c0eb4fdf3e52","_type":"span","marks":[],"text":" "},{"_key":"0c0eb4fdf3e53","_type":"span","marks":["strong"],"text":"of this"},{"_key":"0c0eb4fdf3e54","_type":"span","marks":[],"text":" "},{"_key":"0c0eb4fdf3e55","_type":"span","marks":["strong"],"text":"solution"},{"_key":"0c0eb4fdf3e56","_type":"span","marks":[],"text":" allows us to effectively "},{"_key":"0c0eb4fdf3e57","_type":"span","marks":["strong"],"text":"extract useful signals"},{"_key":"0c0eb4fdf3e58","_type":"span","marks":[],"text":" from a large volume of data. The "},{"_key":"0c0eb4fdf3e59","_type":"span","marks":["strong"],"text":"consistent structure"},{"_key":"0c0eb4fdf3e510","_type":"span","marks":[],"text":" of the outputs allows us to easily isolate and combine the most relevant information for a given company problem. 
With metadata tracking and quality monitoring in place, users can feel "},{"_key":"c6a781262e081","_type":"span","marks":["strong"],"text":"confident"},{"_key":"c6a781262e082","_type":"span","marks":[],"text":" using these features and trust their "},{"_key":"c6a781262e083","_type":"span","marks":["strong"],"text":"accuracy"},{"_key":"c6a781262e084","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"d9a03edbcd4d","_type":"block","children":[{"_key":"3c91592869ab0","_type":"span","marks":[],"text":"The "},{"_key":"3c91592869ab1","_type":"span","marks":["strong"],"text":"engineered features"},{"_key":"3c91592869ab2","_type":"span","marks":[],"text":" offer a more focused, simplified view of the raw data than joining multiple raw datasets together, which is unwieldy and complex and may pull in unwanted information that is not currently monitored for DQ."}],"markDefs":[],"style":"normal"},{"_key":"0b5a25603e38","_type":"block","children":[{"_key":"57a53a2cc93a0","_type":"span","marks":[],"text":"The results are:"}],"markDefs":[],"style":"normal"},{"_key":"b5c9ad5192f1","_type":"block","children":[{"_key":"4ff4e1dead0e0","_type":"span","marks":["strong"],"text":"Accurate"},{"_key":"4ff4e1dead0e1","_type":"span","marks":[],"text":" and "},{"_key":"4ff4e1dead0e2","_type":"span","marks":["strong"],"text":"reliable"},{"_key":"4ff4e1dead0e3","_type":"span","marks":[],"text":" unit features"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"46a804845a61","_type":"block","children":[{"_key":"00cbdc3471fa0","_type":"span","marks":["strong"],"text":"Statistical features"},{"_key":"00cbdc3471fa1","_type":"span","marks":[],"text":" generated by DQ monitoring"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"94daefa0a6ab","_type":"block","children":[{"_key":"2ac575197d6a0","_type":"span","marks":["strong"],"text":"Plug & Play 
"},{"_key":"2ac575197d6a1","_type":"span","marks":[],"text":"property of the features using consistent identifiers"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"9aad138276b5","_type":"block","children":[{"_key":"b98177c055c20","_type":"span","marks":["strong"],"text":"Metadata"},{"_key":"b98177c055c21","_type":"span","marks":[],"text":" to track and reproduce processes"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"5b90da9aacf3","_type":"block","children":[{"_key":"85a3b95461d00","_type":"span","marks":["strong"],"text":"DQ monitoring"},{"_key":"85a3b95461d01","_type":"span","marks":[],"text":" and alerting"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"3e8db581acb4","_type":"block","children":[{"_key":"f679d15229100","_type":"span","marks":["strong"],"text":"Jobs"},{"_key":"f679d15229101","_type":"span","marks":[],"text":" that consistently maintain data & metadata"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"5d36a5c77afe","_type":"block","children":[{"_key":"9ddc789aeb560","_type":"span","marks":["strong"],"text":"Documentation"}],"level":1,"listItem":"bullet","markDefs":[],"style":"normal"},{"_key":"17c452377d75","_type":"block","children":[{"_key":"3746060419fe0","_type":"span","marks":[],"text":"Finally, thanks to all of the above benefits, the "},{"_key":"3746060419fe1","_type":"span","marks":["strong"],"text":"FGF"},{"_key":"3746060419fe2","_type":"span","marks":[],"text":" has proven to be an invaluable "},{"_key":"3746060419fe3","_type":"span","marks":["strong"],"text":"saver"},{"_key":"3746060419fe4","_type":"span","marks":[],"text":" of "},{"_key":"3746060419fe5","_type":"span","marks":["strong"],"text":"Data Engineering"},{"_key":"3746060419fe6","_type":"span","marks":[],"text":" "},{"_key":"3746060419fe7","_type":"span","marks":["strong"],"text":"time and 
resources"},{"_key":"3746060419fe8","_type":"span","marks":[],"text":"."}],"markDefs":[],"style":"normal"},{"_key":"04cac87f0591","_type":"block","children":[{"_key":"94581fe331260","_type":"span","marks":[],"text":""}],"markDefs":[],"style":"normal"}]},"allCategoryMatchedPost":{"nodes":[{"id":"-b563eca1-33f5-5613-ab9b-edcb04190456","slug":{"current":"better-data-better-products-building-confidence-into-the-data-behind-every-sonos-experience"},"title":"Better Data, Better Products: Building Confidence into the Data Behind Every Sonos Experience","mainImage":{"altText":"Better Data, Better Products: Building Confidence into the Data Behind Every Sonos Experience","asset":{"path":"images/znqtjj88/production/e6c38316865378f860bfc383ee38b5db521da18b-5433x3318.jpg","metadata":{"dimensions":{"width":5433,"height":3318,"aspectRatio":1.6374321880650995},"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAMABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAMFBAf/xAAgEAACAQQBBQAAAAAAAAAAAAAAAQIDBBESMRMhUVKR/8QAFwEAAwEAAAAAAAAAAAAAAAAAAQIDBf/EABYRAQEBAAAAAAAAAAAAAAAAAAABEf/aAAwDAQACEQMRAD8A7btH2X0YnkiVLaDq1HtPu88lCx26LzJyx5NqXVbGsBcJNrkBgf/Z","palette":{"dominant":{"background":"#d4ccfc"}}}}},"categories":[{"_id":"baa46497-c2bf-4eeb-bf51-bd08238af629","title":"Data Engineering"}],"publishedAt":"2026-03-28T02:50:00.000Z"},{"id":"-eeea6166-14a4-53cb-9403-2a9176d634ea","slug":{"current":"reproducing-on-device-data-accurately-for-private-by-design-voice-control"},"title":"Reproducing On-Device Data Accurately for Private-by-Design Voice 
Control","mainImage":{"altText":null,"asset":{"path":"images/znqtjj88/production/87105e334cbb0c83b019c207e1c4d5481f6c8a40-3622x2212.jpg","metadata":{"dimensions":{"width":3622,"height":2212,"aspectRatio":1.6374321880650995},"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAMABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAQCAwf/xAAfEAADAQACAQUAAAAAAAAAAAABAgMAETEFBBNBUWH/xAAVAQEBAAAAAAAAAAAAAAAAAAABBf/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/AMhfx9lgtQA6t8L3l6RpMA0RlB+xnU8l6hI+2pULxx1k6WpQAUdmH6dcSleNNgOesZD/2Q==","palette":{"dominant":{"background":"#14345c"}}}}},"categories":[{"_id":"2a760cee-ab0b-432c-866e-eae71039e09d","title":"Machine Learning"},{"_id":"baa46497-c2bf-4eeb-bf51-bd08238af629","title":"Data Engineering"}],"publishedAt":"2023-04-26T13:40:20.236Z"},{"id":"-2d5a4a71-3f16-5bfa-82ba-f920a95180be","slug":{"current":"automating-data-engineering-and-data-discovery"},"title":"Automating Data Engineering and Data Discovery at 
Sonos","mainImage":{"altText":null,"asset":{"path":"images/znqtjj88/production/532a11b7c8d43f9da74045a11e738af2d7f15860-3622x2212.jpg","metadata":{"dimensions":{"width":3622,"height":2212,"aspectRatio":1.6374321880650995},"lqip":"data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAMABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAMFBAb/xAAjEAABBAIBAwUAAAAAAAAAAAABAgMEEQASBQYTISIxMkFR/8QAFwEAAwEAAAAAAAAAAAAAAAAAAQMEBv/EABkRAAMBAQEAAAAAAAAAAAAAAAECEQADof/aAAwDAQACEQMRAD8A6OIhUnWK01s6tQpQ9xlbqHhUcLGZGwdddHqN/A/lZIgTX4DxdjK1XVXWLcfdff7jy1LUVWSTebZk6HoIYo91MNyz4+jhmzkXC7I2KUg6geBWGOVqAcRv/9k=","palette":{"dominant":{"background":"#9c7cf4"}}}}},"categories":[{"_id":"baa46497-c2bf-4eeb-bf51-bd08238af629","title":"Data Engineering"}],"publishedAt":"2021-06-29T15:36:00.000Z"}]}},"pageContext":{"id":"-c884edc1-7150-5759-b861-66f02c77f631","categories":["Analytics","Data Engineering"]}},
    "staticQueryHashes": ["4145174575"]}