AWS Machine Learning Certification Specialty Exam Prep

The AWS Certified Machine Learning Specialty validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.

Use this App to learn about Machine Learning on AWS and prepare for the AWS Machine Learning Specialty Certification MLS-C01.

Download AWS Machine Learning Specialty Exam Prep App on iOS

Download AWS Machine Learning Specialty Exam Prep App on Android/Web/Amazon

AWS MLS-C01 Machine Learning Specialty Exam Prep PRO

The App provides hundreds of quizzes and practice exams covering:

– Machine Learning Operations on AWS

– Modeling

– Data Engineering

– Computer Vision

– Exploratory Data Analysis

– ML implementation & Operations

– Machine Learning Basics Questions and Answers

– Machine Learning Advanced Questions and Answers

– Scorecard

– Countdown timer

– Machine Learning Cheat Sheets

– Machine Learning Interview Questions and Answers

– Machine Learning Latest News

The App covers Machine Learning Basics and Advanced topics including: NLP, Computer Vision, Python, linear regression, logistic regression, Sampling, dataset, statistical interaction, selection bias, non-Gaussian distribution, bias-variance trade-off, Normal Distribution, correlation and covariance, Point Estimates and Confidence Interval, A/B Testing, p-value, statistical power of sensitivity, over-fitting and under-fitting, regularization, Law of Large Numbers, Confounding Variables, Survivorship Bias, univariate, bivariate and multivariate, Resampling, ROC curve, TF/IDF vectorization, Cluster Sampling, etc.

Domain 1: Data Engineering

Create data repositories for machine learning.

Identify data sources (e.g., content and location, primary sources such as user data)

Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)

Identify and implement a data ingestion solution.

Data job styles/types (batch load, streaming)

Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads), etc.

Domain 2: Exploratory Data Analysis

Sanitize and prepare data for modeling.

Perform feature engineering.

Analyze and visualize data for machine learning.

Domain 3: Modeling

Frame business problems as machine learning problems.

Select the appropriate model(s) for a given machine learning problem.

Train machine learning models.

Perform hyperparameter optimization.

Evaluate machine learning models.

Domain 4: Machine Learning Implementation and Operations

Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.

Recommend and implement the appropriate machine learning services and features for a given problem.

Apply basic AWS security practices to machine learning solutions.

Deploy and operationalize machine learning solutions.

Machine Learning Services covered:

Amazon Comprehend

AWS Deep Learning AMIs (DLAMI)

AWS DeepLens

Amazon Forecast

Amazon Fraud Detector

Amazon Lex

Amazon Polly

Amazon Rekognition

Amazon SageMaker

Amazon Textract

Amazon Transcribe

Amazon Translate

Other Services and topics covered are:



Data analysis/visualization

Model training

Model deployment/inference


AWS ML application services

Languages relevant to ML (for example, Python, Java, Scala, R, SQL)

Notebooks and integrated development environments (IDEs)

S3, SageMaker, Kinesis, Lake Formation, Athena, Kibana, Redshift, Textract, EMR, Glue; data in CSV, JSON, image, or Parquet format, or in databases

Amazon EC2, Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service, Amazon Elastic Kubernetes Service, Amazon Redshift

Important: To succeed in the real exam, do not memorize the answers in this app. It is very important that you understand why each answer is right or wrong, and the concepts behind it, by carefully reading the reference documents in the answers.

Note and disclaimer: We are not affiliated with Microsoft or Azure or Google or Amazon. The questions are put together based on the certification study guide and materials available online. The questions in this app should help you pass the exam but it is not guaranteed. We are not responsible for any exam you did not pass.

  • [P] Confidence Intervals for binary classification
    by /u/chkthat (Machine Learning) on May 28, 2022 at 10:17 pm

    Hello r/machinelearning,

    The situation: I am trying to estimate the accuracy of a machine vision system. The goal is to automate surface inspection in an industrial environment, e.g. detecting scratches and dents on a flat surface. I ran quite a few trials in order to estimate the system's accuracy, but a point estimate isn't worth much without a confidence interval. The literature does not recommend approximating the binomial distribution of the Bernoulli trials I conducted with a normal distribution when the probability of success is near 1 (or 0), because the central limit theorem does not apply here. Instead, the Agresti-Coull interval is recommended.

    The problem: at confidence level 1 - alpha = 95%, the upper bound of my confidence interval exceeds 100%, which strikes me as illogical.

    The question: Can you give me advice on how to estimate the probability of (in)correct classification? Is the Agresti-Coull interval a good method for constructing a confidence interval with n > 100 trials, and if so, is it possible to get an upper bound above 100%, or does this result hint at a miscalculation?

    Appendix: CI_AgrestiCoull = p ± 2*sqrt((p - p²)/n), with p = k/n, k = correct classifications + 2, n = number of trials + 4 (values for confidence level 95%).
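The interval from the appendix can be checked numerically. The sketch below is only an illustration, not the poster's code: the function names and the 199-out-of-200 example are assumptions, and the Wilson score interval is added as a commonly used alternative for comparison. With z = 2 the Agresti-Coull adjustment reduces to the "+2 successes, +4 trials" rule quoted above.

```python
import math


def agresti_coull(successes, trials, z=1.96):
    """Agresti-Coull interval: add z^2/2 successes and z^2 trials, then use the normal formula."""
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


def wilson(successes, trials, z=1.96):
    """Wilson score interval (comparison only; not mentioned in the post)."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return centre - half_width, centre + half_width


def clip01(interval):
    """Truncate an interval to the valid probability range [0, 1]."""
    low, high = interval
    return max(0.0, low), min(1.0, high)


if __name__ == "__main__":
    # Hypothetical trial counts: 199 correct classifications out of 200.
    print("Agresti-Coull:", agresti_coull(199, 200))
    print("clipped:      ", clip01(agresti_coull(199, 200)))
    print("Wilson:       ", wilson(199, 200))
```

For these hypothetical counts the unclipped Agresti-Coull upper bound lands slightly above 1, so a bound over 100% does not by itself indicate a miscalculation. Truncating to [0, 1] is a common remedy, and the Wilson interval stays inside [0, 1] by construction.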

  • [D] Face Recognition and Verification System Integrated With Database
    by /u/Ambitious_Phantom (Machine Learning) on May 28, 2022 at 9:52 pm

    I'm at the beginning of my programming and machine learning studies, but I want to build an AI that recognizes a face and checks whether that face is in a database of users. I don't know how to approach this. Can someone please suggest what I need to study to be able to build such a system? Any help is welcome!

  • [R][P] Gradio Web Demo for HairCLIP: Design Your Hair by Text and Reference Image
    by /u/Illustrious_Row_9971 (Machine Learning) on May 28, 2022 at 8:54 pm


  • [D] Fairness of comparison of superiority and efficiency between different neural network architectures
    by /u/SpiridonSunRotator (Machine Learning) on May 28, 2022 at 8:21 pm

    During the last 10 years, deep learning has made impressive progress in various domains, but here I would like to be concrete and focus on computer vision, and in particular on ImageNet as a popular benchmark. Computer vision models have evolved from vanilla CNNs consisting only of convolutional layers, activations, and pooling to more advanced designs with skip connections, depthwise separable convolutions, squeeze-and-excitation blocks and, recently, vision transformers and derivative models. Many papers propose an architecture change and claim that, for a given number of parameters and FLOPs, they achieve the best result, i.e. lie highest of all on the Pareto front.

    (Figure from MobileNet V3.)

    However, the final performance depends not only on how efficient the design of a given architecture is, but on the training procedure as well. For example, ResNet-50 in the original paper, using 90 epochs of training with relatively weak augmentation, achieved ~75.2% top-1 accuracy on ImageNet-1k, whereas a more elaborate training procedure with RandAugment, Mixup, and CutMix reaches 80.4% top-1 accuracy with the same architecture.

    (Figure from ConvNeXt.)

    If the models are trained in different ways, it is difficult to tell whether the advantage of one over the other is due to better model design or to the improved training procedure. Some papers, such as ConvNeXt, give all the details of the training procedure (number of epochs, augmentations used, optimizer settings), whereas others, such as MobileNet V1 and MobileNet V3, provide only part of the details and do not mention the number of epochs or steps. It is hard to guess whether training was stopped once performance had saturated.

    What would be a fair way to compare different architectures?

    – Train with the same procedure (optimizer parameters, augmentation). But a recipe that is good for one model may lead to divergent training of another.

    – Try to find the optimal training procedure for each model and then compare. This seems like a fairer setting, but it can be hard to tune, and for one model the final set of parameters could be closer to optimal than for another.

  • [R] OnePose can estimate 6D poses of arbitrary household objects without instance/category-specific training or CAD models
    by /u/SpatialComputing (Machine Learning) on May 28, 2022 at 6:20 pm


  • [P] I reviewed 50+ open-source MLOps tools. Here’s the result
    by /u/Academic_Arrak (Machine Learning) on May 28, 2022 at 4:13 pm

    I spent the past weeks researching the most popular open-source MLOps tools and I would like to share the results with you. I created a website listing the tools, explaining when to use each of them, and pointing out pitfalls to watch out for. You can fill in your stack based on our template.

    Why did I do it? I feel the current MLOps landscape has amazing tools, but so many of them! Right now, picking a solution feels like a puzzle. It's confusing and incredibly hard to put the pieces together to fit your needs. This is just a small contribution.

    If you want me to be more helpful, I have a small ask. What are the things you currently struggle with but haven't found guidance on anywhere? Here are some of my thoughts:

    – Examples of MLOps stacks: tools that work well together and are popular combinations
    – Tool comparisons with code snippets: see tools in action
    – Stack "cookiecutter": code templates of tools working together

    I would be grateful if you could point me in the right direction. No opinion is too small! 🙂

  • [R] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
    by /u/koolaidman123 (Machine Learning) on May 28, 2022 at 3:22 pm


  • [P] Homemade algorithm for building faces in < 1 minute with 80 images.
    by /u/Singular23 (Machine Learning) on May 28, 2022 at 1:26 pm

    I was inspired by the use of Wasserstein distances to come up with a new method for generating, e.g., faces without using neural networks.

    (Figure: raw output + noise reduction.)

    Specifications

    – Training data: ~80 images of 128x128 pixels (B/W)
    – Build time: Fast
    – RAM usage: Extremely high
    – Parallelizable: Yes (step 1 of the algorithm) and no (steps 2 and 3)
    – Scalability: Poor

    Algorithm used

    1) Build a dictionary with a pool of vectors/fragments from the training set. Start by sliding over each image with a 9x9 grid (other sizes are possible). For each sub-frame, flatten the content to a vector and save it to the dict. Also save metadata such as the relative location where the vector originated (inside the image) and the source image name. This metadata is used in step 3.

    2) Initiate a new canvas. Pick a random fragment (vector) known to originate from the very first coordinate (top-left corner of the image) and insert it into an empty NumPy array (the canvas). Call this first fragment (A).

    3) Assemble the new image. After initiating the canvas, the following iterative steps are followed to assemble the final image:

    a. The source image that gave rise to the most recent fragment (A) is put on a temporary penalty list, where it stays for ~20 iterations.
    b. Next, the most recent fragment (A) is compared against all candidate fragments (B) that originate from the next neighboring coordinate (except for fragments whose sources are on the penalty list).
    c. The fragment (B) with the smallest Wasserstein distance to fragment (A) is selected and inserted at the next neighboring position in the canvas. Fragment (B) then becomes fragment (A), and the loop starts over until the image has been assembled.

    Possible optimizations

    – Run this on a system with more RAM to see whether the resemblance to an actual face changes. The 80-100 images I experimented with easily ate up 6-7 GB of RAM when building the pool of fragments/vectors.
    – Limit the RAM usage by randomly downsampling the pool of vectors, perhaps with a metric for downsampling that keeps the degree of exploration high.
    – The results could possibly be improved by blending/averaging the values of overlapping fragments.

    It also appears I might have some algorithm issues that result in odd shifting of pixels once in a while.
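The three steps above can be sketched in plain Python. This is a rough reading of the description, not the poster's implementation, and several choices are assumptions: images are nested lists of pixel values, tiles do not overlap, positions are visited in raster order, the 1-D Wasserstein distance is computed between the sorted pixel values of two fragments, and the fallback used when every candidate is penalised is invented here. All function names are hypothetical.

```python
import random


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-length samples of pixel values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def build_pool(images, size):
    """Step 1: map each tile position to a list of (flattened tile, source name)."""
    pool = {}
    for name, img in images.items():
        for r in range(0, len(img) - size + 1, size):
            for c in range(0, len(img[0]) - size + 1, size):
                vec = [img[r + i][c + j] for i in range(size) for j in range(size)]
                pool.setdefault((r, c), []).append((vec, name))
    return pool


def assemble(pool, penalty_len=20, rng=random):
    """Steps 2 and 3: start from a random top-left fragment, then pick greedily."""
    positions = sorted(pool)  # raster order (assumption)
    current, source = rng.choice(pool[positions[0]])
    chosen = {positions[0]: (current, source)}
    penalty = [source]
    for pos in positions[1:]:
        # Skip fragments whose source is on the penalty list; if that leaves
        # nothing, fall back to all candidates (assumption, not in the post).
        candidates = [f for f in pool[pos] if f[1] not in penalty] or pool[pos]
        current, source = min(
            candidates, key=lambda f: wasserstein_1d(current, f[0])
        )
        chosen[pos] = (current, source)
        penalty = (penalty + [source])[-penalty_len:]
    return chosen


if __name__ == "__main__":
    # Two tiny hypothetical 4x4 "images" with 2x2 tiles.
    images = {
        "dark": [[0] * 4 for _ in range(4)],
        "light": [[10] * 4 for _ in range(4)],
    }
    result = assemble(build_pool(images, size=2), penalty_len=1, rng=random.Random(0))
    for pos, (vec, src) in result.items():
        print(pos, src, vec)
```

The pool keeps every fragment of every image in memory at once, which is consistent with the high RAM usage reported above. Downsampling the candidates per position is one place where that cost could be cut.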

  • [R] Happy to share our latest Research paper: Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges
    by /u/aadityaura (Machine Learning) on May 28, 2022 at 11:22 am

    Hello everyone,

    We are pleased to share our new research paper with the ML community: Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges. Our paper was accepted at ACM Transactions on Computing for Healthcare 2022 and published in the ACM Digital Library.

    Federated learning (FL) is a novel paradigm that allows deploying large-scale machine learning models trained in different data centres without transferring data. The sensitive and distributed nature of EHR (Electronic Health Records) in real-world scenarios stimulates a need for an effective mechanism to learn from data residing in health-related institutions and hospitals while accounting for data privacy. This motivates us to examine the potential and value of federated learning in the healthcare domain.

    (Figure: Applications of federated learning in healthcare.)

    The main contributions:

    – We demonstrate the components of the federated learning setup and discuss the communication architecture and building blocks of a federated learning system.
    – We examine the various challenges that a federated learning setup faces in terms of privacy, data, and communication in the healthcare system.
    – We survey existing works on federated learning in the health sector and propose a comprehensive list of applications classified into prognosis, diagnosis, and clinical workflow.

    The paper is open access. If you find this helpful, please cite the paper.

    @article{10.1145/3533708,
      author = {Joshi, Madhura and Pal, Ankit and Sankarasubbu, Malaikannan},
      title = {Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges},
      year = {2022},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      issn = {2691-1957},
      url = {},
      doi = {10.1145/3533708},
      abstract = {Federated learning is the process of developing machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research and studies on federated learning in the healthcare sector across a range of use cases and applications. Our survey shows what challenges, methods, and applications a practitioner should be aware of in the topic of federated learning. This paper aims to lay out existing research and list the possibilities of federated learning for healthcare industries.},
      note = {Just Accepted},
      journal = {ACM Trans. Comput. Healthcare},
      month = {apr},
      keywords = {federated learning, transfer learning, GDPR}
    }

  • [R] Guidance: a cheat code for diffusion models (Blog post)
    by /u/hardmaru (Machine Learning) on May 28, 2022 at 5:28 am


  • [D] which small data problems pique your interest
    by /u/SpookyTardigrade (Machine Learning) on May 28, 2022 at 5:00 am

    Motivated by a recent post (I can't seem to find it now; maybe it was in another subreddit) about how DL architecture research is gatekept by intensive computational requirements, I'd like to ask: what are your favorite small data problems? Why do you find them interesting?

  • [D][R] How to create/tag the dataset for the sentence similarity task?
    by /u/aadityaura (Machine Learning) on May 28, 2022 at 3:36 am

    Hello everyone,

    I have a large corpus of domain-specific documents. I have gone through the Quora Question Pairs and BIOSSES datasets for reference. I am trying to tag the dataset for different language understanding tasks, including sentence similarity, but I am having difficulty creating the STS dataset for such a task. If I have 2-3 experts in the field, what is the best way to create the dataset?

    My thoughts:

    Approach 1

    – Convert all documents into paragraphs and encode them using USE, ELMo, or domain-specific BERT embeddings.
    – Extract the top 10 similar sentences for each sentence (using cosine or Levenshtein distance).
    – A UI shows the expert a sentence and its similar sentences. The expert chooses the most similar sentence for the given sentence, and in the backend both sentences are tagged with label 1 (which indicates that they are similar).
    – Later, a second expert reviews this tagged dataset (tagged by expert 1) and rates on a scale of 0-5 how similar the sentences are.

    Approach 2

    – Replace random keywords in sentences with synonyms using the domain pre-trained BERT model.
    – Show both sentences (the modified sentence and the original) to the expert, and let the expert rate the pair on a scale of 0-5.

    I'd love to hear suggestions from members of this subreddit. Any kind of input would be appreciated.
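The candidate-extraction step of Approach 1 could look roughly like the sketch below. It assumes the sentence embeddings have already been computed by some encoder and are available as plain lists of floats. The function names and the toy vectors are hypothetical, and a real corpus would likely need a vectorised or approximate nearest-neighbour search instead of this brute-force loop.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(embeddings, query_idx, k=10):
    """Return (index, similarity) for the k sentences most similar to the query sentence."""
    query = embeddings[query_idx]
    scored = [
        (cosine(query, emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query_idx  # never propose a sentence as its own match
    ]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real sentence vectors.
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    for idx, score in top_k_similar(embeddings, query_idx=0, k=2):
        print(idx, round(score, 3))
```

The ranked candidates are what the UI would show to the first expert. Because only high-similarity pairs are proposed, the resulting labels will be skewed toward the upper end of the 0-5 scale unless some lower-ranked or random pairs are mixed in.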

  • [D] Best Tech Stack for Machine Learning Web Applications
    by /u/XhoniShollaj (Machine Learning) on May 28, 2022 at 2:02 am

    Based on your experience deploying ML solutions, what would be the best tech stack for deploying a web application that integrates various ML algorithms in the background? Currently I'm looking at FARM (FastAPI, React, MongoDB), but I'm not sure what your take on this would be. Also, since most of our algorithms run in notebooks, what would be the best practice for moving their outputs to production? Any kind of input would be appreciated.

  • [R] Reconnaissance Blind Chess - Join the NeurIPS Competition!
    by /u/rwgardner (Machine Learning) on May 27, 2022 at 9:58 pm

    Create a bot for the NeurIPS 2022 competition in Reconnaissance Blind Chess!

    Reconnaissance Blind Chess (RBC) is a chess variant designed for new research in artificial intelligence. RBC includes imperfect information, long-term strategy, explicit observations, and almost no common knowledge. These features appear in real-world scenarios and challenge even state-of-the-art algorithms, including those used to create super-human bots in chess, Go, and poker. Each player of RBC controls traditional chess pieces but cannot directly see the locations of her opponent's pieces. Rather, she learns partial information each turn by privately sensing a 3x3 area of the board. RBC's foundation in traditional chess makes it familiar and entertaining to human players, too!

    There is no cost to enter this tournament. Winners will receive a small monetary prize, and authors of the best AIs will be invited to talk about their bots at NeurIPS, the world's largest AI conference.

    Learn more, play a game of RBC yourself, and join our research community!

    Organized by: Johns Hopkins University Applied Physics Laboratory with

    – Ashley J. Llorens (Microsoft Research)
    – Todd W. Neller (Gettysburg College)
    – Raman Arora (Johns Hopkins University)
    – Bo Li (University of Illinois)
    – Mykel J. Kochenderfer (Stanford University)

  • [D] How to train a model to identify ranked classes?
    by /u/rsandler (Machine Learning) on May 27, 2022 at 9:45 pm

    Hi, I am trying to train a model to estimate the severity shown in an image, using classes like "normal", "mild", "moderate", "severe". One approach would be multiclass classification, but that seems suboptimal since it doesn't encode the knowledge that the classes are not arbitrary but ranked (i.e. normal < mild < moderate < severe). Another approach is to encode these classes as numbers (i.e. normal=0, mild=1, moderate=2, severe=3) and perform regression. This seems sensible, but I have never seen it done. Is there any literature on this topic? Is there another approach I am missing?
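The two label encodings in play can be made concrete as follows. The integer rank is the encoding described in the post. The cumulative binary encoding ("is the severity greater than level k?") is a different, commonly used ordinal-classification trick that the post does not mention, and it is included here only for comparison. The helper names and the 0.5 threshold are assumptions.

```python
LEVELS = ["normal", "mild", "moderate", "severe"]


def to_rank(label):
    """Integer encoding for regression: normal=0, mild=1, moderate=2, severe=3."""
    return LEVELS.index(label)


def from_regression(value):
    """Decode a regression output by rounding to the nearest rank and clamping."""
    rank = min(max(int(round(value)), 0), len(LEVELS) - 1)
    return LEVELS[rank]


def to_cumulative(label):
    """Ordinal encoding: one binary target per threshold, 'is severity > level k?'."""
    rank = to_rank(label)
    return [1 if rank > k else 0 for k in range(len(LEVELS) - 1)]


def from_cumulative(probabilities, threshold=0.5):
    """Decode by counting how many consecutive thresholds the prediction passes."""
    rank = 0
    for p in probabilities:
        if p < threshold:
            break
        rank += 1
    return LEVELS[rank]


if __name__ == "__main__":
    print(to_rank("moderate"), to_cumulative("moderate"))
    print(from_regression(2.6), from_cumulative([0.9, 0.7, 0.2]))
```

Regression on the integer rank implicitly assumes the classes are equally spaced, whereas the cumulative encoding only assumes an order. Either way, a model trained on these targets keeps the information that confusing "normal" with "severe" is a worse error than confusing "mild" with "moderate".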

  • [P] TensorFlow Similarity 0.16 is out
    by /u/ebursztein (Machine Learning) on May 27, 2022 at 9:26 pm

    Happy Friday,

    Just a quick note that TensorFlow Similarity 0.16 is out. Besides adding the XMB loss, this release is mostly focused on refactoring and optimizing the core components to ensure everything works smoothly and accurately. Details are in the changelog as usual, and a simple pip install -U tensorflow_similarity should just work.

    We spent a lot of time behind the scenes making sure results from SOTA papers can be reproduced, and we fixed a lot of bugs (including in augmentations), which should give you some accuracy boost compared to the previous release. Next, we're going to keep working toward providing a strong foundation and extensive benchmarking capabilities so you can rely on it for your research.

    The last missing piece before 1.0 is how we do storage, so that it scales past 10M points and works with many ANN backends. If you are interested in helping, let us know.

    Have a great weekend!

  • [R] Flexible Diffusion Modeling of Long Videos
    by /u/Wiskkey (Machine Learning) on May 27, 2022 at 9:11 pm

    Abstract: We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.

    A blog post (including generated videos) and a Twitter thread from some of the authors are also available.

  • On the Paradox of Learning to Reason from Data - Language models only learn a facsimile of reasoning based off of inherent statistical features
    by /u/stressed-nb (Machine Learning) on May 27, 2022 at 7:55 pm


  • [P] BrainAgent: Open Source for SOTA Performance on DMLab-30 of Multi-Task RL !
    by /u/leedoyup (Machine Learning) on May 27, 2022 at 5:54 pm

    Hello. I'd like to introduce an awesome project, "Brain Agent", available on GitHub. Brain Agent is a codebase for large-scale RL. The key contribution is the SOTA result, together with open-sourced code and checkpoints for the DMLab-30 environment.

    DMLab-30 for Multi-task RL

    DMLab-30 is an environment for multi-task RL developed by DeepMind, consisting of 30 different tasks. The tasks are hard to solve and important for multi-task RL research, but there was no reproducible codebase for SOTA performance on it. As a result, SOTA performance has always been reported in papers by DeepMind, while RL researchers outside DeepMind could not conduct cutting-edge research on the environment.

    In this project, Kakao Brain succeeded in achieving state-of-the-art performance on DMLab-30. In addition, we released the code for evaluation and the pretrained checkpoints, and we hope our project helps many RL researchers focus on how to solve the difficult multi-task RL tasks in DMLab-30.

    Enjoy and ⭐️

  • [D] Can AI Replace Our Graphic Designer?
    by /u/takuonline (Machine Learning) on May 27, 2022 at 5:37 pm

    A nice video by MKBHD that gives some really nice insights into the effect that DALL-E 2, or ML in general, will have in the future. Considering that the model can produce multiple examples in a very short space of time, things are getting very interesting, and scary, I will say.

  • Feasibility of aggregating text messages to train a system for natural language/chat bot [D]
    by /u/WordJord (Machine Learning) on May 27, 2022 at 5:31 pm

    I wonder if this is possible. I know that getting messages from random people could create a lot of noise and varied inputs, but does it seem feasible to clean and prepare the texts so they can be used for this purpose?

  • [R] Training ReLU networks to high uniform accuracy is intractable
    by /u/julbern (Machine Learning) on May 27, 2022 at 2:09 pm

    PDF on ResearchGate / arXiv

    Abstract: Statistical learning theory provides bounds on the necessary number of training samples needed to reach a prescribed accuracy in a learning problem formulated over a given target class. This accuracy is typically measured in terms of a generalization error, that is, an expected value of a given loss function. However, for several applications -- for example in a security-critical context or for problems in the computational sciences -- accuracy in this sense is not sufficient. In such cases, one would like to have guarantees for high accuracy on every input value, that is, with respect to the uniform norm. In this paper we precisely quantify the number of training samples needed for any conceivable training algorithm to guarantee a given uniform accuracy on any learning problem formulated over target classes containing (or consisting of) ReLU neural networks of a prescribed architecture. We prove that, under very general assumptions, the minimal number of training samples for this task scales exponentially both in the depth and the input dimension of the network architecture. As a corollary we conclude that the training of ReLU neural networks to high uniform accuracy is intractable. In a security-critical context this points to the fact that deep learning based systems are prone to being fooled by a possible adversary. We corroborate our theoretical findings by numerical results.

  • [D] I don't really trust papers out of "Top Labs" anymore
    by /u/MrAcurite (Machine Learning) on May 27, 2022 at 5:46 am

    I mean, I trust that the numbers they got are accurate and that they really did the work and got the results. I believe those. It's just that, take the recent "An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems" paper. It's 18 pages of talking through this pretty convoluted evolutionary and multitask learning algorithm, it's pretty interesting, solves a bunch of problems. But two notes.

    One, the big number they cite as the success metric is 99.43 on CIFAR-10, against a SotA of 99.40, so woop-de-fucking-doo in the grand scheme of things.

    Two, there's a chart towards the end of the paper that details how many TPU core-hours were used for just the training regimens that result in the final results. The sum total is 17,810 core-hours. Let's assume that for someone who doesn't work at Google, you'd have to use on-demand pricing of $3.22/hr. This means that these trained models cost $57,348.

    Strictly speaking, throwing enough compute at a general enough genetic algorithm will eventually produce arbitrarily good performance, so while you can absolutely read this paper and collect interesting ideas about how to use genetic algorithms to accomplish multitask learning by having each new task leverage learned weights from previous tasks by defining modifications to a subset of components of a pre-existing model, there's a meta-textual level on which this paper is just "Jeff Dean spent enough money to feed a family of four for half a decade to get a 0.03% improvement on CIFAR-10."

    OpenAI is far and away the worst offender here, but it seems like everyone's doing it. You throw a fuckton of compute and a light ganache of new ideas at an existing problem with existing data and existing benchmarks, and then if your numbers are infinitesimally higher than their numbers, you get to put a lil' sticker on your CV. Why should I trust that your ideas are even any good? I can't check them, I can't apply them to my own projects.

    Is this really what we're comfortable with as a community? A handful of corporations and the occasional university waving their dicks at everyone because they've got the compute to burn and we don't? There's a level at which I think there should be a new journal, exclusively for papers in which you can replicate their experimental results in under eight hours on a single consumer GPU.
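The cost figure in the post follows directly from the two numbers it quotes. A short check, using only those numbers (the variable names are made up for the illustration, and the hourly price is the poster's assumption, not a verified list price):

```python
tpu_core_hours = 17_810        # total core-hours quoted from the paper's chart
price_per_core_hour = 3.22     # USD per hour, the on-demand price assumed in the post

cost = tpu_core_hours * price_per_core_hour
gain_points = round(99.43 - 99.40, 2)  # CIFAR-10 accuracy gain in percentage points

print(f"${cost:,.0f} for a gain of {gain_points} percentage points")
# prints: $57,348 for a gain of 0.03 percentage points
```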

  • [D] Are there any existing algorithms that apply kNN in a bootstrapping manner?
    by /u/TrepidEd0601 (Machine Learning) on May 27, 2022 at 2:24 am

    Just out of curiosity, I was wondering what would happen if we combined kNN and bootstrapping in a certain manner. Specifically, say we have N points that we want to classify and a labeled set of points that are used for kNN. After applying kNN to these N points individually, we iterate over a process defined as:

    Repeat until class change is arbitrarily minimal:
        For each point in N observed points:
            Look at L nearest neighbors within N observed points
            Alter the class of the current point by majority vote of the L points

    In other words, we apply kNN iteratively on the unlabeled points until "convergence" (i.e. the change in point assignments is minimal).

    Going ahead with the idea, I implemented a quick snippet of code to test this on some toy datasets. For datasets where vanilla kNN (without altering the representation of the data) obtains an accuracy of 65%, this iterative method improves it a bit, up to around 70%.

    I did a quick search for an existing algorithm similar to this and couldn't find anything, but it seems like a simple enough idea that others must have considered it in the past.

    EDIT: Not too relevant, but it might help to add that this was inspired a bit by PU learning.
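A minimal pure-Python reading of the procedure is sketched below. It is not the poster's snippet. Several details are assumptions: squared Euclidean distance, a synchronous update (all points are relabelled from the previous round's labels), exclusion of a point from its own neighbourhood, and a cap on the number of rounds. The function names and the toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def knn_predict(point, ref_points, ref_labels, k):
    """Plain kNN: majority vote over the k nearest labelled reference points."""
    order = sorted(range(len(ref_points)), key=lambda i: sq_dist(point, ref_points[i]))
    return majority([ref_labels[i] for i in order[:k]])


def iterative_knn(train_x, train_y, test_x, k=3, l=3, max_iter=20, tol=0):
    """Initial kNN labels, then repeated majority votes among the test points themselves."""
    labels = [knn_predict(p, train_x, train_y, k) for p in test_x]
    for _ in range(max_iter):
        new_labels = []
        for i, p in enumerate(test_x):
            others = sorted(
                (j for j in range(len(test_x)) if j != i),
                key=lambda j: sq_dist(p, test_x[j]),
            )
            new_labels.append(majority([labels[j] for j in others[:l]]))
        changed = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels
        if changed <= tol:  # "class change is arbitrarily minimal"
            break
    return labels


if __name__ == "__main__":
    train_x = [(0, 0), (0, 1), (5, 5), (5, 6)]
    train_y = ["a", "a", "b", "b"]
    test_x = [(0.2, 0.1), (0.1, 0.4), (0.3, 0.3), (5.1, 5.2), (4.9, 5.5), (5.2, 5.8)]
    print(iterative_knn(train_x, train_y, test_x, k=3, l=2))
```

The refinement loop resembles label smoothing over the neighbourhood graph of the unlabeled points, which is close in spirit to graph-based semi-supervised methods such as label propagation. That family may be a useful starting point when searching for prior work.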

  • [R] An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems - Google 2022 - Jeff Dean
    by /u/Singularian2501 (Machine Learning) on May 26, 2022 at 7:37 pm

    Abstract: "Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. Though, state of the art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks. Also, continual learning, that adds the temporal aspect to multitask, is often focused to the study of common pitfalls such as catastrophic forgetting instead of being studied at a large scale as a critical component to build the next generation artificial intelligence. We propose an evolutionary method that can generate a large scale multitask model, and can support the dynamic and continuous addition of new tasks. The generated multitask model is sparsely activated and integrates a task-based routing that guarantees bounded compute cost and fewer added parameters per task as the model expands. The proposed method relies on a knowledge compartmentalization technique to achieve immunity against catastrophic forgetting and other common pitfalls such as gradient interference and negative transfer. We empirically show that the proposed method can jointly solve and achieve competitive results on 69 image classification tasks, for example achieving the best test accuracy reported for a model trained only on public data for competitive tasks such as cifar10: 99.43%."

  • [D] Machine Learning - WAYR (What Are You Reading) - Week 138
    by /u/ML_WAYR_bot (Machine Learning) on May 22, 2022 at 9:49 pm

    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks: Week 1 through Week 137. Most upvoted papers two weeks ago: /u/joyful_reader: Article 1 /u/need___username: /u/CatalyzeX_code_bot: Paper link. Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot

  • [D] Simple Questions Thread
    by /u/AutoModerator (Machine Learning) on May 22, 2022 at 3:00 pm

    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator

  • Monkey Patching Python Code
    by Adrian Tam (Blog) on May 21, 2022 at 2:00 pm

    Python is a dynamic scripting language. Not only does it have a dynamic type system where a variable can be assigned to one type first and changed later, but its object model is also dynamic. This allows us to modify its behavior at run time. A consequence of this is the possibility of monkey patching. The post Monkey Patching Python Code appeared first on Machine Learning Mastery.
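
    A minimal sketch of what monkey patching looks like in practice, assuming a hypothetical `Greeter` class and replacement function `shout` (neither comes from the linked post): a method is swapped on the class at run time and then restored.

```python
class Greeter:
    """Hypothetical class standing in for code we do not control."""

    def greet(self):
        return "hello"


def shout(self):
    """Replacement behavior to be patched in at run time."""
    return "HELLO"


g = Greeter()

original = Greeter.greet      # keep a reference so the patch can be undone
Greeter.greet = shout         # monkey patch: rebind the method on the class
patched_result = g.greet()    # existing instances see the new behavior

Greeter.greet = original      # restore the original method
restored_result = g.greet()
```

    Keeping a reference to the original attribute and restoring it afterwards limits the patch's reach, which matters because the change is otherwise visible to every user of the class in the process.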

  • Detect social media fake news using graph machine learning with Amazon Neptune ML
    by Hasan Shojaei (AWS Machine Learning Blog) on May 19, 2022 at 4:12 pm

    In recent years, social media has become a common means for sharing and consuming news. However, the spread of misinformation and fake news on these platforms has posed a major challenge to the well-being of individuals and societies. Therefore, it is imperative that we develop robust and automated solutions for early detection of fake news

  • Optimize F1 aerodynamic geometries via Design of Experiments and machine learning
    by Pablo Hermoso Moreno (AWS Machine Learning Blog) on May 19, 2022 at 4:02 pm

    FORMULA 1 (F1) cars are the fastest regulated road-course racing vehicles in the world. Although these open-wheel automobiles are only 20–30 kilometers (or 12–18 miles) per hour faster than top-of-the-line sports cars, they can speed around corners up to five times as fast due to the powerful aerodynamic downforce they create. Downforce is the vertical force

  • Build a risk management machine learning workflow on Amazon SageMaker with no code
    by Peter Chung (AWS Machine Learning Blog) on May 19, 2022 at 3:47 pm

    Since the global financial crisis, risk management has taken a major role in shaping decision-making for banks, including predicting loan status for potential customers. This is often a data-intensive exercise that requires machine learning (ML). However, not all organizations have the data science resources and expertise to build a risk management ML workflow. Amazon SageMaker

  • Logging in Python
    by Daniel Chung (Blog) on May 18, 2022 at 8:00 pm

    Logging is a way to store information about your script and track events that occur. When writing any complex script in Python, logging is essential for debugging software as you develop it. Without logging, finding the source of a problem in your code may be extremely time consuming. After completing this tutorial, you will know: The post Logging in Python appeared first on Machine Learning Mastery.
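
    As a small illustration of tracking events with the standard `logging` module, the sketch below writes records to an in-memory stream so the output can be inspected. The logger name `demo`, the `divide` function and the format string are assumptions chosen for the example, not taken from the tutorial.

```python
import io
import logging

# Send log records to an in-memory buffer instead of the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("demo")   # hypothetical logger name
logger.setLevel(logging.INFO)        # DEBUG records will be dropped
logger.addHandler(handler)
logger.propagate = False             # keep records out of the root logger


def divide(a, b):
    """Divide a by b, logging the call and any division by zero."""
    logger.info("dividing %s by %s", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        logger.error("division by zero")
        return None


divide(4, 2)
divide(1, 0)
logger.debug("not recorded")         # below the INFO threshold
log_text = stream.getvalue()
```

    After this runs, `log_text` holds two INFO lines and one ERROR line, while the DEBUG message is filtered out by the logger's level.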

  • Use Amazon Lex to capture street addresses
    by Brian Yost (AWS Machine Learning Blog) on May 18, 2022 at 6:18 pm

    Amazon Lex provides automatic speech recognition (ASR) and natural language understanding (NLU) technologies to transcribe user input, identify the nature of their request, and efficiently manage conversations. Lex lets you create sophisticated conversations, streamline your user experience to improve customer satisfaction (CSAT) scores, and increase containment in your contact centers. Natural, effective customer interactions require

  • Customize pronunciation using lexicons in Amazon Polly
    by Ratan Kumar (AWS Machine Learning Blog) on May 17, 2022 at 3:36 pm

    Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize natural-sounding human speech. It is used in a variety of use cases, such as contact center systems, delivering conversational user experiences with human-like voices for automated real-time status check, automated account and billing inquiries, and by news agencies like The Washington

  • Personalize your machine translation results by using fuzzy matching with Amazon Translate
    by Narcisse Zekpa (AWS Machine Learning Blog) on May 16, 2022 at 5:48 pm

    A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with their customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when

  • Profiling Python Code
    by Adrian Tam (Blog) on May 14, 2022 at 10:00 am

    Profiling is a technique to figure out how time is spent in a program. With these statistics, we can find the “hot spot” of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well. In this tutorial, we will The post Profiling Python Code appeared first on Machine Learning Mastery.
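
    One way to find such a hot spot is the standard library's `cProfile` and `pstats` modules. In this sketch the functions `slow_sum`, `fast_path` and `main` are made-up workloads, assumed only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow loop that should dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_path():
    """Cheap call, included for contrast."""
    return sum(range(100))


def main():
    slow_sum(200_000)
    fast_path()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Format the statistics into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

    Printing `report` lists the most expensive calls first; in a run like this, `slow_sum` should appear near the top, which is the kind of signal used to decide where optimization effort goes.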

  • Enhance the caller experience with hints in Amazon Lex
    by Kai Loreck (AWS Machine Learning Blog) on May 13, 2022 at 10:36 pm

    We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up

  • Run automatic model tuning with Amazon SageMaker JumpStart
    by Doug Mbaya (AWS Machine Learning Blog) on May 13, 2022 at 12:09 am

    In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). In March 2022, we also announced the support for APIs in JumpStart. JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across

  • Image classification and object detection using Amazon Rekognition Custom Labels and Amazon SageMaker JumpStart
    by Pashmeen Mistry (AWS Machine Learning Blog) on May 12, 2022 at 10:07 pm

    In the last decade, computer vision use cases have been a growing trend, especially in industries like insurance, automotive, ecommerce, energy, retail, manufacturing, and others. Customers are building computer vision machine learning (ML) models to bring operational efficiencies and automation to their processes. Such models help automate the classification of images or detection of objects

  • Intelligently search your Jira projects with Amazon Kendra Jira cloud connector
    by Shreyas Subramanian (AWS Machine Learning Blog) on May 12, 2022 at 8:37 pm

    Organizations use agile project management platforms such as Atlassian Jira to enable teams to collaborate to plan, track, and ship deliverables. Jira captures organizational knowledge about the workings of the deliverables in the issues and comments logged during project implementation. However, making this knowledge easily and securely available to users is challenging due to it

  • The Intel®3D Athlete Tracking (3DAT) scalable architecture deploys pose estimation models using Amazon Kinesis Data Streams and Amazon EKS
    by Han Man (AWS Machine Learning Blog) on May 12, 2022 at 6:42 pm

    This blog post is co-written by Jonathan Lee, Nelson Leung, Paul Min, and Troy Squillaci from Intel.  In Part 1 of this post, we discussed how Intel®3DAT collaborated with AWS Machine Learning Professional Services (MLPS) to build a scalable AI SaaS application. 3DAT uses computer vision and AI to recognize, track, and analyze over 1,000

  • Moderate, classify, and process documents using Amazon Rekognition and Amazon Textract
    by Jay Rao (AWS Machine Learning Blog) on May 12, 2022 at 5:38 pm

    Many companies are overwhelmed by the abundant volume of documents they have to process, organize, and classify to serve their customers better. Examples of such can be loan applications, tax filing, and billing. Such documents are more commonly received in image formats and are mostly multi-paged and in low-quality format. To be more competitive and

  • Achieve in-vehicle comfort using personalized machine learning and Amazon SageMaker
    by Joshua Levy (AWS Machine Learning Blog) on May 11, 2022 at 4:24 pm

    This blog post is co-written by Rudra Hota and Esaias Pech from Continental AG. Many drivers have had the experience of trying to adjust temperature settings in their vehicle while attempting to keep their eyes on the road. Whether the previous driver preferred a warmer cabin temperature, or you’re now wearing warmer clothing, or the

  • Create video subtitles with Amazon Transcribe using this no-code workflow
    by Jason O'Malley (AWS Machine Learning Blog) on May 10, 2022 at 6:23 pm

    Subtitle creation on video content poses challenges no matter how big or small the organization. To address those challenges, Amazon Transcribe has a helpful feature that enables subtitle creation directly within the service. There is no machine learning (ML) or code writing required to get started. This post walks you through setting up a no-code

  • Utilize AWS AI services to automate content moderation and compliance
    by Lauren Mullennex (AWS Machine Learning Blog) on May 9, 2022 at 4:01 pm

    The daily volume of third-party and user-generated content (UGC) across industries is increasing exponentially. Startups, social media, gaming, and other industries must ensure their customers are protected, while keeping operational costs down. Businesses in the broadcasting and media industries often find it difficult to efficiently add ratings to content pieces and formats to comply with

  • Content moderation design patterns with AWS managed AI services
    by Nate Bachmeier (AWS Machine Learning Blog) on May 9, 2022 at 4:00 pm

    User-generated content (UGC) grows exponentially, as well as the requirements and the cost to keep content and online communities safe and compliant. Modern web and mobile platforms fuel businesses and drive user engagement through social features, from startups to large organizations. Online community members expect safe and inclusive experiences where they can freely consume and

  • Static Analyzers in Python
    by Adrian Tam (Blog) on May 9, 2022 at 5:09 am

    Static analyzers are tools that help you check your code without really running your code. The most basic form of static analyzers is the syntax highlighters in your favorite editors. If you need to compile your code (say, in C++), your compiler, such as LLVM, may also provide some static analyzer functions to warn you The post Static Analyzers in Python appeared first on Machine Learning Mastery.

  • Process larger and wider datasets with Amazon SageMaker Data Wrangler
    by Haider Naqvi (AWS Machine Learning Blog) on May 6, 2022 at 5:30 pm

    Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler can simplify your data preparation and feature engineering processes and help you with data selection, cleaning, exploration, and visualization. Data Wrangler has over 300 built-in transforms written in PySpark,

  • Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker
    by Arnav Khare (AWS Machine Learning Blog) on May 6, 2022 at 5:22 pm

    Approximately 7,000 languages are in use today. Despite attempts in the late 19th century to invent constructed languages such as Volapük or Esperanto, there is no sign of unification. People still choose to create new languages (think about your favorite movie character who speaks Klingon, Dothraki, or Elvish). Today, natural language processing (NLP) examples are

  • Build a custom Q&A dataset using Amazon SageMaker Ground Truth to train a Hugging Face Q&A NLU model
    by Jeremy Feltracco (AWS Machine Learning Blog) on May 6, 2022 at 4:29 pm

    In recent years, natural language understanding (NLU) has increasingly found business value, fueled by model improvements as well as the scalability and cost-efficiency of cloud-based infrastructure. Specifically, the Transformer deep learning architecture, often implemented in the form of BERT models, has been highly successful, but training, fine-tuning, and optimizing these models has proven to be

  • Use custom vocabulary in Amazon Lex to enhance speech recognition
    by Kai Loreck (AWS Machine Learning Blog) on May 5, 2022 at 10:34 pm

    In our daily conversations, we come across new words or terms that we may not know. Perhaps these are related to a new domain that we’re just getting familiar with, and we pick these up as we understand more about the domain. For example, home loan terminology (“curtailment”), shortened words (“refi”, “comps”), and acronyms (“HELOC”)

  • Setting Breakpoints and Exception Hooks in Python
    by Stefania Cristina (Blog) on May 5, 2022 at 4:21 pm

    There are different ways of debugging code in Python, one of which is to introduce breakpoints into the code at points where one would like to invoke a Python debugger. The statements used to enter a debugging session at different call sites depend on the version of the Python interpreter that one is working with, The post Setting Breakpoints and Exception Hooks in Python appeared first on Machine Learning Mastery.

  • Using Kaggle in Machine Learning Projects
    by Zhe Ming Chng (Blog) on May 2, 2022 at 2:02 pm

    You’ve probably heard of Kaggle data science competitions, but did you know that Kaggle has many other features that can help you with your next machine learning project? For people looking for datasets for their next machine learning project, Kaggle allows you to access public datasets by others and share your own datasets. For those The post Using Kaggle in Machine Learning Projects appeared first on Machine Learning Mastery.

  • Techniques to Write Better Python Code
    by Adrian Tam (Blog) on April 29, 2022 at 2:47 pm

    We write a program to solve a problem or to make a tool with which we can repeatedly solve similar problems. For the latter, it is inevitable that we come back to revisit the program we wrote, or that someone else reuses the program we wrote. There is also a chance that we will encounter data The post Techniques to Write Better Python Code appeared first on Machine Learning Mastery.

  • Take Your Machine Learning Skills Global
    by MLM Team (Blog) on April 28, 2022 at 2:48 am

    Sponsored Post: In our interconnected world, a decision made thousands of miles away can have lasting consequences for entire organizations or economies. When small changes have big effects, it is unsurprising that companies and governments are turning to machine learning and AI to accurately predict risk. How the Global Community is Applying Machine Learning The post Take Your Machine Learning Skills Global appeared first on Machine Learning Mastery.

  • Google Colab for Machine Learning Projects
    by Zhe Ming Chng (Blog) on April 27, 2022 at 7:39 pm

    Have you ever wanted an easy-to-configure interactive environment to run your machine learning code that came with access to GPUs for free? Google Colab is the answer you’ve been looking for. It is a convenient and easy-to-use way to run Jupyter notebooks on the cloud, and their free version comes with some limited access to The post Google Colab for Machine Learning Projects appeared first on Machine Learning Mastery.

  • Multiprocessing in Python
    by Daniel Chung (Blog) on April 25, 2022 at 2:02 pm

    When you work on a computer vision project, you probably need to preprocess a lot of image data. This is time-consuming, and it would be great if you could process multiple images in parallel. Multiprocessing is the ability of a system to run multiple processors at one time. If you had a computer with a The post Multiprocessing in Python appeared first on Machine Learning Mastery.
